Data is power – but only if it’s correctly collected, processed and managed
Reportedly, 85 per cent of businesses fail to effectively leverage big data to power their digital transformation initiatives. While the causes of failure are diverse – from process issues to people issues – there is an underlying challenge of data quality that is often the root cause of a failed digital transformation or big data initiative. An Experian survey found 68 per cent of businesses experience the impact of poor data quality on their data and transformation initiatives.
All too often, key data quality issues are overlooked until they become a severe bottleneck causing the failure of an initiative. It’s only at this point that businesses realise they have been building their data foundations on sand. In this article I’ll highlight some of the key problems businesses face and how to rectify them.
The what and the why of data quality
Simply put, data quality refers to the “health” of your data and whether it is fit for its intended use.
This means your data must be:
- Clean and free of errors such as typos, abbreviations and punctuation mistakes
- Valid and complete for all critical fields like phone numbers and email addresses
- Unique and free of redundancies or duplication
- Accurate and reliable for insights, reporting and statistical calculations
- Reliable, up-to-date and accessible whenever necessary
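To make the "valid and complete" criteria concrete, here is a minimal sketch of what basic validity checks on two critical fields might look like. The field rules and regular expressions are illustrative assumptions, not taken from any particular system, and are deliberately simpler than production-grade validation.

```python
import re

# Illustrative patterns only – real email and phone validation is far
# more involved than these sketches.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
PHONE_RE = re.compile(r"^\+?\d{7,15}$")  # optional leading +, then digits

def is_valid_email(value: str) -> bool:
    """True if the value looks like a plausible email address."""
    return bool(EMAIL_RE.match(value.strip()))

def is_valid_phone(value: str) -> bool:
    """True if the value is a plausible phone number after stripping
    common punctuation (spaces, hyphens, parentheses, dots)."""
    digits = re.sub(r"[\s\-().]", "", value)
    return bool(PHONE_RE.match(digits))
```

Checks like these are typically run against every record so that incomplete or malformed critical fields are flagged before the data is used downstream.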
For most organisations, the problem with data only comes to light when a migration or digital transformation initiative is halted because the data is not adequately prepared or of sufficient quality.
Companies often struggle most with the consequences of poor data during mergers. When one company’s Customer Relationship Management (CRM) system is in disarray, the entire migration suffers – time and effort that should go into understanding and implementing the new system is instead spent sorting out data.
What exactly constitutes poor data? Well, if your data suffers from:
- Human input error such as spelling mistakes, typos, upper- and lower-case issues, lack of consistency in naming conventions across the data set
- Inconsistent data format across the data set such as phone numbers with and without a country code or numbers with punctuation
- Address data that is invalid or incomplete with missing street names or postcodes
- Fake names, addresses or phone numbers
…then it’s considered to be flawed data.
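As a first pass at the human-input errors listed above, a cleansing step usually normalises whitespace and casing in free-typed fields. The sketch below shows one such rule for a name field; it is an illustrative assumption and will not handle every naming convention (e.g. "McDonald" or "van der Berg") without extra rules.

```python
import re

def normalise_name(raw: str) -> str:
    """Collapse repeated whitespace and fix casing in a free-typed
    name field – a simple first-pass normalisation sketch."""
    collapsed = re.sub(r"\s+", " ", raw).strip()
    return " ".join(word.capitalize() for word in collapsed.split(" "))
```

Applying the same normalisation everywhere means that "jOHN   smith" and "John Smith" stop counting as two different people in later comparisons.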
These are surface issues that are inevitable and universal – as long as humans are formulating and inputting the data, errors will occur.
However, poor data quality goes beyond surface issues. If data is siloed away, is hard to access and is duplicated, you’ve got serious trouble. Indeed, data duplication is a key challenge most organisations find difficult to tackle.
Let’s understand this further.
Data duplication and data mismanagement as the key challenge
On average, enterprises draw on some 400 different data sources. Companies are drowning in data, especially duplicate data!
There are multiple ways duplicate data can be created of which some of the most common are:
- A user entering their data multiple times through different channels. Someone may be signing up using multiple emails causing inflation in the number of users. A company may think they have 10 new signups when they actually only have three.
- A user may go by several names and nicknames. For example, J.D. Smith may also go by Jonathan Smith or Jonathan Davis Smith. Mr Smith may enter his name as J.D. Smith in a web form, but when he becomes a paying customer and billing information is required, his name may be recorded in the company’s CRM in full. In this particular example, J.D. Smith’s record has been duplicated at two different data sources which are also being used by two different departments.
- Technical glitches or processes in databases and data sources that may result in the duplication of data.
- Partial duplicates created by human error – when a sales rep or a customer service rep enters information manually, for example. This causes a mismatch even though the records may contain the same name or phone number. A spelling mistake, the difficulty of recording non-English names and other such slips can all create duplicates. Partial duplicates are the hardest to overcome, especially since they are not caught by a normal deduplication process.
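Partial duplicates like "Jonathan Smith" vs "Jonathon Smith" escape exact-match deduplication, which is why matching tools fall back on fuzzy string comparison. A minimal sketch using Python's standard library (the 0.85 threshold is an illustrative assumption; real tools tune it per field and combine several attributes such as name, phone and email):

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between two strings (0.0-1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def likely_duplicates(names, threshold=0.85):
    """Return pairs of names whose similarity exceeds the threshold."""
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if similarity(a, b) >= threshold:
                pairs.append((a, b))
    return pairs
```

Candidate pairs flagged this way are usually routed to a human or a rules engine for a final merge decision rather than merged automatically.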
Data duplication occurs primarily because of a lack of data governance and general data mismanagement. As organisations grow, they focus simply on gathering data. More leads, more buyers, more sales. Vanity metrics are used to measure success.
If businesses really sorted their data, they would see a drastic difference in what they think they have vs what they actually have.
The consequences of bad data
Consider this example:
An organisation’s employees are often at the receiving end of bad data. Day in, day out, marketers, sales reps, and customer service reps attempt to fix data problems, but despite using a powerful CRM like HubSpot, they are still not able to get clean, reliable data.
When executives demand insight reports, the reps show data at a superficial level. In fact, they only discover they have missing emails or phone numbers when they are running a report. Executives don’t look into the nitty-gritty – managers are satisfied as long as their signups, leads and sales targets are met.
All day, employees whose jobs should be to analyse data and contribute to strategic decision-making are frustrated. They know the data is flawed but management isn’t taking the problem seriously enough to invest in a solution.
As a result, bad data quality causes:
- Operational inefficiency – poor data affects processes, causing organisations to be inefficient and unproductive. Referring back to the example above, employees hardly get time to do the job they were hired for. Morale drops, frustration runs high and employees end up passing their time on redundant tasks.
- Data security and data compliance risks – data quality issues also cause companies to fall foul of data compliance and regulatory standards. Take the infamous example of PayPal being fined $7.7m for allowing illegal payments simply because the company did not screen its database to detect users blacklisted by the US government.
- Hindrances to transformation plans – although 87 per cent of businesses think digital will disrupt their industry, only 44 per cent are prepared for digital disruption. One of the key hindrances to digital transformation plans is messy data stored in legacy systems that are no longer able to keep up with the security and technological demands of today.
- Poor customer experience – in a digital age, customers expect companies to be efficient and considerate. Poor data quality leads to negative customer experiences. For example, a customer gets emails with their name spelled wrong, a buyer receives the wrong parcel, a closed lead still receives emails despite unsubscribing – these are all instances of problems caused by poor data.
- Loss of brand reputation – when your organisation is constantly battling the consequences of poor data, it will result in the loss of brand reputation. In a world where businesses are expected to be ethical, considerate and customer-centric, poor data quality can drag you down.
That sounds alarming, right? Well, luckily, there are positive steps you can take.
Implementing a data quality framework
A data quality framework is essentially a lifecycle that enables companies to fix issues with their data and obtain data they can trust and use.
The framework consists of:
Integration of data sources for real-time or batch cleansing: This allows companies to connect their data sources such as databases, social media platforms, CRMs, emails and any other cloud source to the third-party platform for data profiling and cleansing.
Profiling data to give an overview of problems: This gives you an overview of the quality of your data. You can discover the percentage of data that is missing, invalid, corrupt, or flawed and find out the ‘health’ of your data fields. Data profiling will help you gauge the complexity of problems and the kind of standards you will need to put in place to ensure such problems don’t recur.
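As a sketch of the profiling step, the function below reports what share of records is missing a value in each field. It is deliberately small – the field names are assumptions, and real profiling tools also check formats, value ranges and cross-field consistency.

```python
def profile(records, fields):
    """For each field, report the percentage of records with a
    missing or blank value – a minimal data-profiling sketch."""
    report = {}
    total = len(records)
    for field in fields:
        missing = sum(
            1 for r in records if not str(r.get(field, "") or "").strip()
        )
        report[field] = round(100 * missing / total, 1) if total else 0.0
    return report
```

A report like `{"email": 33.3, "phone": 66.7}` immediately tells you which fields need attention before a migration, rather than discovering the gaps mid-project.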
Cleaning data of errors, typos and format issues: Once you get an idea of the problems plaguing your data in the data profiling stage, you begin with data cleansing to fix those problems. For example, if data profiling shows that the [Phone] field contains letters of the alphabet or punctuation, data cleansing will remove them – automatically and with no manual intervention required.
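The phone-field example above can be sketched as a simple rule: keep the digits and an optional leading "+", and drop letters and punctuation. This is an illustrative assumption about what such a cleansing rule might look like, not a description of any specific tool.

```python
import re

def cleanse_phone(raw: str) -> str:
    """Keep only digits and a single leading '+'; strip letters,
    spaces and punctuation from a phone field."""
    raw = raw.strip()
    plus = "+" if raw.startswith("+") else ""
    return plus + re.sub(r"\D", "", raw)
```

In a cleansing pipeline a rule like this runs over every record automatically, so no manual intervention is required once profiling has flagged the field.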
Removing duplicates and merging data sources with data matching: The most important part of the data quality framework, data matching does a number of things. It helps you:
- Find redundant information and remove it
- Clear duplicates ensuring uniqueness of content
- Match data from one or multiple data sources
- Match and merge data from multiple sources to maintain a single source of truth
- Identify fake, invalid, incomplete or contradictory information
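The match-and-merge step above can be sketched as follows: records from several sources are keyed on a normalised email address, and later sources fill in fields the earlier ones left blank, yielding one more complete record per person. Keying on email alone is a simplifying assumption – real matching combines several attributes, as discussed earlier.

```python
def merge_sources(*sources):
    """Merge customer records from several sources into one record
    per person, keyed on a normalised (lower-cased) email address.
    The first non-empty value seen for each field is kept."""
    merged = {}
    for source in sources:
        for record in source:
            key = record.get("email", "").strip().lower()
            if not key:
                continue  # cannot match a record without the key field
            target = merged.setdefault(key, {})
            for field, value in record.items():
                if value and not target.get(field):
                    target[field] = value
    return merged
```

Run over the J.D. Smith example from earlier, the web-form record and the CRM record collapse into a single entry – one version of the truth instead of two departmental copies.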
Implementing standards with data governance: As you clean, match and remove duplicate data, you’re much better equipped to understand the steps required to prevent such errors from happening. For example, you could employ a mechanism to ensure that all phone numbers start with + (country code) followed by (city code). Furthermore, you could also categorise phone numbers into mobiles, landlines or VOIP numbers.
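The governance rule described above – every number stored as + (country code) followed by the rest – might be enforced at the point of entry with a sketch like this. The default country code (44) and the handling of the "00" international prefix are illustrative assumptions.

```python
import re

def standardise_phone(raw: str, default_country: str = "44") -> str:
    """Normalise a phone number to '+<country code><number>'.
    Assumes numbers without a country code belong to the default
    country – an illustrative governance rule, not a universal one."""
    digits = re.sub(r"\D", "", raw)
    if raw.strip().startswith("+"):
        return "+" + digits
    if digits.startswith("00"):  # international prefix written as 00
        return "+" + digits[2:]
    # Domestic number: drop the leading trunk zero(s), prepend country code
    return "+" + default_country + digits.lstrip("0")
```

With every number normalised to one shape, further categorisation (mobile vs landline vs VOIP) becomes a matter of inspecting well-defined prefixes instead of guessing at free-form input.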
It’s all about the process
Ever since data was first collected, stored and processed, data quality has been a consistent problem in the business world. However, in our current age, companies are so occupied with making grand plans for their data that, ironically, they often miss out on the very basics.
While there are tools that can be used to manage and fix data, they will not be effective if there is no process in place. Getting to grips with data issues and instituting reliable data management processes provides a strong, efficient foundation for digital transformation and the kinds of projects many companies have today.