Clean data can make or break a business. Quality data is data that is usable, current and accurate. While we all strive for perfectly clean data, the reality is that data will never be 100% clean. So how do you decide when to clean your data? Here are a few scenarios in which data cleaning is a must.
System Migration – When you migrate from one system to another, you often have to clean your data first. One reason to clean before migrating is to start with a clean slate, which usually means removing bogus and outdated records as well as de-duping.
The second reason is that fields may be stored differently in the new system, so the data needs to be standardized. Dates are a common example of mismatched data that can be fixed automatically: a value stored as ‘12/10/2017’ may need to become ‘2017-12-10’. Other fields may have to be standardized by converting free text into values from a unified drop-down list.
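A date conversion like this is easy to script. Here is a minimal Python sketch, assuming the old system stores dates as US-style month/day/year strings (so ‘12/10/2017’ means December 10, 2017). Values that do not parse are set aside for manual review instead of being guessed at. The function and constant names are hypothetical.

```python
from datetime import datetime

# Assumption: the legacy system stores dates as MM/DD/YYYY strings.
LEGACY_FORMAT = "%m/%d/%Y"
TARGET_FORMAT = "%Y-%m-%d"  # ISO 8601, e.g. 2017-12-10

def standardize_dates(values):
    """Return (converted, rejected): ISO date strings and values that failed to parse."""
    converted, rejected = [], []
    for value in values:
        try:
            parsed = datetime.strptime(value.strip(), LEGACY_FORMAT)
        except ValueError:
            rejected.append(value)  # keep for manual review rather than guessing
            continue
        converted.append(parsed.strftime(TARGET_FORMAT))
    return converted, rejected

converted, rejected = standardize_dates(["12/10/2017", "1/5/2018", "2017-13-45"])
print(converted)  # ['2017-12-10', '2018-01-05']
print(rejected)   # ['2017-13-45']
```

Rejecting rather than auto-fixing unparseable values matters here: a value like ‘10/12/2017’ coming from a system that used day/month order would silently become the wrong date, so it is worth confirming the legacy format before running a conversion like this.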
Other common scenarios include:
- A New Year
- Marketing Campaign
- New Data
- Running Analysis
A New Year – Some companies do a yearly clean-up to keep their data clean. Typical ongoing cleaning tasks include retiring historical data, de-duping records and normalizing fields.
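Retiring historical data usually comes down to a retention rule. A minimal sketch, assuming each record carries a `last_activity` date and using a hypothetical three-year cutoff; stale records are moved to an archive list rather than deleted, so the decision can still be reviewed. The field name, the cutoff and `retire_stale` are all illustrative.

```python
from datetime import date

# Hypothetical retention rule: no activity for more than three years means "retire".
RETENTION_DAYS = 3 * 365

def retire_stale(records, today):
    """Split records into (active, archived) based on days since last activity."""
    active, archived = [], []
    for rec in records:
        age_days = (today - rec["last_activity"]).days
        if age_days > RETENTION_DAYS:
            archived.append(rec)
        else:
            active.append(rec)
    return active, archived

records = [
    {"email": "a@example.com", "last_activity": date(2017, 11, 1)},
    {"email": "b@example.com", "last_activity": date(2013, 6, 15)},
]
active, archived = retire_stale(records, today=date(2018, 1, 1))
print([r["email"] for r in active])    # ['a@example.com']
print([r["email"] for r in archived])  # ['b@example.com']
```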
Marketing Campaign – Companies running direct mail campaigns typically clean their data before sending a costly mailing. Cleaning can include de-duping and address verification. You may also want to run a segmentation analysis to decide whom to target; segmentation is usually done before the file is sent for data cleaning.
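True address verification needs a postal reference database, but de-duping a mailing list can start with normalizing addresses so that trivially different spellings collapse to one key. A minimal sketch, assuming a hypothetical four-entry abbreviation table; official postal standards define far more rules than this, so treat it as an illustration of the idea rather than a verification step.

```python
import re

# Hypothetical abbreviation table; real postal standards are much larger.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "APARTMENT": "APT"}

def address_key(address: str, postal_code: str) -> str:
    """Build a comparison key: uppercase, punctuation removed, common words abbreviated."""
    words = re.sub(r"[^\w\s]", " ", address.upper()).split()
    words = [ABBREVIATIONS.get(w, w) for w in words]
    return " ".join(words) + "|" + postal_code.replace(" ", "").upper()

def dedupe_mailing_list(rows):
    """Keep the first row for each normalized address."""
    seen, unique = set(), []
    for row in rows:
        key = address_key(row["address"], row["postal_code"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

rows = [
    {"name": "J. Smith", "address": "12 Main Street, Apt. 4", "postal_code": "M5V 2T6"},
    {"name": "John Smith", "address": "12 MAIN ST APT 4", "postal_code": "m5v2t6"},
    {"name": "A. Lee", "address": "90 King Avenue", "postal_code": "M5H 1A1"},
]
print(len(dedupe_mailing_list(rows)))  # 2
```

Keying on the address rather than the name also catches the common case of one household receiving two copies of the same mailing.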
New Data – When you come back from a big trade show with a new list, it is a good time to clean the list before uploading it to your CRM or analyzing it. Cleaning may involve de-duping the new list against your CRM, normalizing fields to match your CRM’s formats, appending missing data and so on.
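De-duping a new list against the CRM is essentially a set lookup on a normalized key. A minimal sketch, assuming email address is the matching key and that you have exported the CRM’s existing emails; `split_new_leads` is a hypothetical helper, not a CRM feature.

```python
def normalize_email(email: str) -> str:
    # Assumption: compare case-insensitively after trimming whitespace.
    return email.strip().lower()

def split_new_leads(crm_emails, tradeshow_rows):
    """Return (new, already_in_crm); repeats within the new list itself are dropped."""
    existing = {normalize_email(e) for e in crm_emails}
    new, already_in_crm = [], []
    seen_in_list = set()
    for row in tradeshow_rows:
        key = normalize_email(row["email"])
        if key in existing:
            already_in_crm.append(row)  # candidates to update instead of insert
        elif key not in seen_in_list:
            seen_in_list.add(key)
            new.append(row)
    return new, already_in_crm

crm = ["ann@acme.com", "Bob@Widgets.io"]
leads = [{"email": "bob@widgets.io "}, {"email": "cara@example.org"}, {"email": "CARA@example.org"}]
new, existing = split_new_leads(crm, leads)
print([r["email"] for r in new])  # ['cara@example.org']
print(len(existing))              # 1
```

Keeping the matched rows instead of discarding them lets you append any fresh details (a new title or phone number, say) to the existing CRM record.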
Running Analysis – As an analyst, you will often receive a new dataset that is not in the format you need. For example, a field may contain product names but no product categories, while your task is to run a predictive analysis of which product categories are growing. In that case, you will need to extract the category from the product name.
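One simple way to extract a category is a keyword lookup. A minimal sketch, assuming product names contain a recognizable keyword; the rules below are hypothetical and in practice would come from a product master list or be refined together with the business. Anything unmatched is flagged rather than forced into a category.

```python
# Hypothetical keyword-to-category rules, checked in order.
CATEGORY_RULES = [
    ("laptop", "Computers"),
    ("monitor", "Computers"),
    ("headphones", "Audio"),
    ("speaker", "Audio"),
]

def extract_category(product_name: str) -> str:
    """Return the first category whose keyword appears in the product name."""
    name = product_name.lower()
    for keyword, category in CATEGORY_RULES:
        if keyword in name:
            return category
    return "Uncategorized"  # flag for manual review

products = ["UltraBook 14 Laptop", "Studio Headphones X2", "Desk Lamp"]
print([extract_category(p) for p in products])
# ['Computers', 'Audio', 'Uncategorized']
```

Tracking how many products land in "Uncategorized" is a quick way to judge whether the rules are good enough for the growth analysis.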
If you do not have an ongoing data cleaning plan and you are not deploying a big marketing campaign or migrating systems, you may not be sure when it is time to clean your data. After all, your marketing has been successful and sales have been up so far. But before you decide against data cleaning, consider these basic indicators that it may be time:
- Bounce Rate is increasing
- Number of duplicates is greater than 3%
- Inability to use your data
- Other Departments complain about data quality
Bounce Rate is increasing – If your bounce rate was 3% but has slowly climbed to 5% or more over the past few months, it is a good indication that either the new email addresses you are collecting are low quality or your database is aging. Either way, it is time to validate and clean your historical data.
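Bounce rate is simply bounced emails divided by emails sent, so the check described here can be automated on monthly campaign stats. A minimal sketch, assuming you can export (sent, bounced) counts per month; the numbers below are hypothetical, and the 5% alert level is the rule of thumb from this section, not an industry standard.

```python
def bounce_rate(sent: int, bounced: int) -> float:
    """Bounce rate as a percentage of emails sent."""
    return 100.0 * bounced / sent

def needs_cleanup(monthly, alert_pct=5.0):
    """monthly: list of (sent, bounced) tuples, oldest first.

    Flags the list when the latest rate is above the first month's rate
    and has reached alert_pct.
    """
    rates = [bounce_rate(sent, bounced) for sent, bounced in monthly]
    return rates[-1] > rates[0] and rates[-1] >= alert_pct

# Hypothetical stats for four consecutive months.
history = [(10_000, 300), (10_000, 360), (9_800, 441), (9_900, 515)]
for sent, bounced in history:
    print(f"{bounce_rate(sent, bounced):.1f}%")
print(needs_cleanup(history))  # True
```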
Number of duplicates is greater than 3% – Conduct a quick duplicate-rate test every quarter to see whether duplicate records are becoming a problem. A quick way to test is to export all your data to Excel (assuming you do not have millions of records) and use conditional formatting to highlight duplicate values in one key field. For contacts or leads, you can use the email address as a unique identifier; for account data, you can use the website. If more than 3% of the cells are highlighted, it is time to de-dupe your data, either with a de-duping tool or by outsourcing the work to a data cleansing company.
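If your data is too large for Excel, the same test is a few lines of Python. A minimal sketch that mirrors the highlighting approach: it counts every record whose key appears more than once (including the first occurrence) as a share of all non-blank keys. Trimming whitespace and ignoring case before comparing is an assumption you may want to adjust for your data.

```python
from collections import Counter

def duplicate_rate(keys):
    """Percentage of non-blank records whose key value appears more than once."""
    normalized = [k.strip().lower() for k in keys if k and k.strip()]
    if not normalized:
        return 0.0
    counts = Counter(normalized)
    duplicated = sum(n for n in counts.values() if n > 1)
    return 100.0 * duplicated / len(normalized)

emails = [f"user{i}@example.com" for i in range(98)]
emails += ["user1@example.com", "USER2@example.com "]  # two repeated contacts
rate = duplicate_rate(emails)
print(f"{rate:.1f}%")  # 4.0%
if rate > 3:
    print("Time to de-dupe.")
```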
Inability to use your data – Talk to your marketing manager or an analyst and find out whether they can use the data the way they would like. Your marketing manager may want to send an email to a specific industry but be unable to, either because the data is not in a usable state and needs to be normalized, or because some data is simply missing and must be appended. Your analyst may face the same problem when trying to analyze a specific segment.
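Normalizing a free-text field such as industry into a fixed picklist is what makes that kind of targeting possible. A minimal sketch, assuming a hypothetical mapping from observed variants to standard values; anything blank or unrecognized comes back as `None`, which marks the record either for data appending (value missing) or for extending the mapping (value unmapped).

```python
# Hypothetical mapping from free-text variants to a standard industry picklist.
INDUSTRY_MAP = {
    "healthcare": "Healthcare",
    "health care": "Healthcare",
    "hospital": "Healthcare",
    "fin services": "Financial Services",
    "financial services": "Financial Services",
    "banking": "Financial Services",
}

def normalize_industry(raw):
    """Return the standard value, or None if the field is blank or unrecognized."""
    if not raw or not raw.strip():
        return None
    return INDUSTRY_MAP.get(" ".join(raw.lower().split()))

contacts = [
    {"email": "ann@acme.com", "industry": "Health Care"},
    {"email": "bob@widgets.io", "industry": "  banking "},
    {"email": "cara@example.org", "industry": ""},
]
for c in contacts:
    c["industry_std"] = normalize_industry(c["industry"])

healthcare = [c["email"] for c in contacts if c["industry_std"] == "Healthcare"]
needs_review = [c["email"] for c in contacts if c["industry_std"] is None]
print(healthcare)    # ['ann@acme.com']
print(needs_review)  # ['cara@example.org']
```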
Other Departments complain about data quality – Sales, customer service, operations, finance or other departments may have issues with the quality of data that your department owns. When people start voicing complaints about the data, you are due for a cleanup.
Now that you have established that you need to clean your data, your next step is to identify what data cleaning you need to perform and how to go about it.
Guest Author: Anna Kayfitz is the CEO and founder of StrategicDB Corporation, an analytics and data cleansing company. StrategicDB Corp. helps businesses get more from their data. By analyzing sales and marketing data, you can derive tremendous value for your business. StrategicDB offers data cleansing services because no analysis is possible if you cannot trust your data. Some of our services include segmentation modelling, dashboard building, market basket analysis, lifetime value analysis and much more.