The Seven Golden Rules of Data Quality

One of the most persistent topics in recent times has been Big Data, usually followed closely by Data Analytics. The impression we all have is that there is some golden nugget of information that can be derived from all of this Big Data stuff that will make our business successful and deliver an unbeatable USP.

On the other hand, we commonly hear about, or experience, campaigns, initiatives or business information being delayed or under-delivering as a result of data issues. Whether the problem lies in sourcing, combining or interpreting the data, the projects fail because the results often don’t make sense.

This occurs because, in reality, many organisations are simply not ready to take the dive and use their data. This is most probably because they don’t trust it (see Dr. Kenneth McKenzie's views on how companies should use Big Data here).

You need to get back to the basics and forget big ideas, concentrating on the base unit of the whole process - your data. I admit it may not sound that exciting, but if you want those great results then take the five minutes to read on and it may help you stop running blind.

The starting point for anybody setting out on the necessary road of using the most important of all assets - your customer data - is to have standards around your data. By following the seven simple rules of Data Quality outlined below, you will complete a long leg of that journey.

What is Data Quality?

So before we hop to the seven key rules, we must first understand what Data Quality is. The following definition is one I think is very useful when trying to grasp the concept of Data Quality:

"Data quality is about producing information that is fit for purpose so that services can be managed, delivered efficiently and effectively and in response to a need."

So what does that really mean? Well, Data Quality is about producing robust data across an organisation that is trusted, relied upon, and used. Simple!

In this blog we are focused on data: the gathering, controlling, verification, and use of it. Everything else will make appearances in future blogs.

Seven Golden Rules of Data Quality

At last, you say, the meat of the blog that we were looking for! And yes, it will be worth the wait.

There are seven areas of Data Quality that need to be considered. These are:

Accuracy
Validity
Reliability
Timeliness
Relevance
Completeness
Cleanliness

So let's take a whirlwind tour of what these terms mean in practice:

1. Accuracy – The data being captured needs to be sufficiently accurate for its intended use in the real world. Along with being accurate, the data should be captured only once, at the first point of activity, and ideally be useable across multiple systems as required. What counts as accurate can be subjective and can change over time, so we need to keep rechecking what we mean by accuracy.

2. Validity – The data should be captured and held in accordance with the relevant organisational requirements, e.g. syntax, format, range. This is required to ensure consistency between data capture periods and between departments or other organisations.

3. Reliability – Data collection processes need to be clearly understood, consistently applied and stable, both across collections and over time. The source data needs to be clearly identified and be useable, whether collected manually, automatically or from other systems and data sources. This creates consistency, so that two or more representations of a thing can be compared, e.g. date of birth and age.

4. Timeliness – Data should be captured as quickly as possible after an event and must be available for use as soon as possible thereafter, as frequently as required. A simple way to check timeliness is by 'time stamping' the data when it is captured. The data must be timely enough to meet the needs of the organisation and to support and influence service and management decisions. So if you are a stockbroker, you need instant, real-time values on stock prices, and you may also need information on a particular company from its annual accounts. However, you may then be assessing a company based on annual accounts which are a year old.

5. Relevance – This is going to sound obvious, but data should be relevant to the purpose for which it is being used. What this means is that you have to keep the data that is being collected under review to make sure it is in line with ongoing and changing organisational needs.

6. Completeness – There needs to be a specific and detailed description of the data required to meet the needs of the organisation. We need to understand the relevant aspects of the problem we are trying to solve, and include the full population, time period, and geographic area.

7. Cleanliness – This means that the data is free of any duplicates and is organised, standardised, structured, and labelled. Most data does not fit into neat data tables, e.g. emails, social media, videos, reports etc., yet it still needs to be documented and structured to be useful. (Our AddressFix service can help with standardising and structuring existing customer lists.)
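Several of the rules above lend themselves to simple automated checks. The Python sketch below is one possible illustration, not a prescribed method: the field names (name, email, dob, age), the loose email pattern and the choice of email as the duplicate key are all assumptions made for the example, and a real organisation would substitute its own standards.

```python
import re
from datetime import date, datetime, timedelta

# Assumed format rule for illustration only; deliberately loose.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Hypothetical list of fields the organisation has decided it needs.
REQUIRED_FIELDS = ("name", "email", "dob", "age")


def is_valid_email(email):
    """Rule 2 (Validity): the value matches the agreed format."""
    return bool(EMAIL_PATTERN.match(email or ""))


def age_matches_dob(dob, age, as_of):
    """Rule 3 (Reliability): two representations of the same fact agree."""
    had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
    expected = as_of.year - dob.year - (0 if had_birthday else 1)
    return expected == age


def is_timely(event_at, captured_at, max_delay):
    """Rule 4 (Timeliness): the capture time stamp is close enough to the event."""
    return timedelta(0) <= captured_at - event_at <= max_delay


def missing_fields(record, required=REQUIRED_FIELDS):
    """Rule 6 (Completeness): list required fields that are absent or empty."""
    return [field for field in required if record.get(field) in (None, "")]


def deduplicate(records, key="email"):
    """Rule 7 (Cleanliness): keep the first record for each normalised key."""
    seen, unique = set(), []
    for record in records:
        norm = (record.get(key) or "").strip().lower()
        if norm and norm in seen:
            continue  # same key already kept, so treat as a duplicate
        seen.add(norm)
        unique.append(record)  # records with no key are kept for manual review
    return unique
```

For example, age_matches_dob(date(1990, 5, 17), 34, date(2024, 6, 1)) compares a stored age against the date of birth as of a chosen reference date, and deduplicate(customers) would treat "Anna@Example.com" and " anna@example.com " as the same customer. Accuracy and relevance are left out on purpose, since both depend on judgement about the intended use rather than on a mechanical test.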

What are the dangers of getting it wrong? Well, the old IT adage keeps coming back to haunt you - ‘Rubbish In, Rubbish Out’. So stop, check, and think - do we have the data management in place to use our data wisely?


Dara Keogh

CEO

Connecting Big Data with Business - Carme Artigas, Synergic Partners

Big Data: HANDLE WITH CARE - Dr. Kenneth McKenzie, Target McConnells

Posted: 12/05/2017 15:22:27
