Weekly Wrap up – Data cleaning
I spent a lot of time this week cleaning up some data used to synchronize two databases for one of my clients. Essentially, this involves taking inconsistent or otherwise messy data from system A and applying changes or rules to it when updating system B. Ideally, data would be clean from the start, but that is never the case. NEVER!
For tasks like this, my go-to tool is Perl (see www.perl.org). It’s old school, I know, and many programmers today have never used it, or even heard of it. It’s sort of like the COBOL of the ’90s. Everybody says it’s obsolete, but there’s a lot of legacy code out there written in Perl.
Perl is ideally suited for text / data processing, in my opinion. It’s a scripting language, so it’s quick to make changes and re-run. It’s great with regular expressions, and generally pretty fast. Oh, and it’s ubiquitous. That means practically every Unix system you run into already has it installed, and it installs easily on Windows if for some reason you have to use it there. That makes it easy to copy and reuse code without lots of setup.
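To give a rough idea of what this kind of cleanup looks like, here is a minimal sketch. The pipe-delimited layout (name|phone|state), the sample record, and the clean_record name are all made up for illustration; they are not my client's actual data or rules.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical cleanup rules for one record exported from "system A".
# The name|phone|state layout is an assumption for this example only.
sub clean_record {
    my ($line) = @_;
    my ($name, $phone, $state) = split /\|/, $line, -1;

    # Treat missing fields as empty strings so the rules below don't warn.
    for ($name, $phone, $state) { $_ = '' unless defined $_ }

    # Trim the name and collapse runs of whitespace to a single space.
    $name =~ s/^\s+|\s+$//g;
    $name =~ s/\s+/ /g;

    # Keep only the digits in the phone number.
    $phone =~ tr/0-9//cd;

    # Trim the state code and normalize it to upper case.
    $state =~ s/^\s+|\s+$//g;
    $state = uc $state;

    return join '|', $name, $phone, $state;
}

print clean_record("  Jane   Doe |(555) 123-4567| ny "), "\n";
# prints: Jane Doe|5551234567|NY
```

Each rule is a line or two of regex or tr, so when a new kind of mess shows up in the data, it's a quick edit and re-run.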
So, use Perl. Or ask me for help!