Big Data · 15th August 2016 · 4 min read

Poor data quality and its consequences. How to avoid it?

By Bart Badzioch

Data is the lifeblood of any organisation. It’s potentially every company’s greatest asset. With the right data, the right business decisions can be made, customers and staff are better supported and every department can run more efficiently.

But as I said last time, data is also disparate, infinite and often sits siloed in systems that aren't designed for the purpose; by the time data is accessed, it's often already out of date. That's if it is ever accessed at all. I also mentioned the company with $25m of spend that could not be tracked or monitored, which raises the question: how could this have been avoided?

Well, for a start, better-quality data could have resolved this issue. But unfortunately, poor data quality isn't an unusual problem – in fact, examples of poor data quality far outnumber examples of good.

For example, company travel managers struggling to report on specific areas of the business are invariably grappling with unreliable information drawn from inaccurate spreadsheets and user-defined fields, where up to 80% of the data has been input manually. Up to 20% of it is often simply wrong due to human error.

Poor data quality is not only frustrating – data that can't be trusted is worse than having no data at all. It can result in under-reporting and delays in decision making, and can impact all areas of a business, from the HR system that doesn't score people properly to the poorly populated CRM that leads to lost sales opportunities.

An example that springs to mind is a 2014 California auditor’s office report on sick leave and vacation credit recording at the State Controller’s Office. The report found instances where eight hours was recorded as 80 – and in one case, 800. In total, the audit uncovered 200,000 questionable hours of leave due to data entry errors, with an overall value of $6 million.
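
Errors of this kind are cheap to catch with a simple range check at the point of entry. The sketch below is purely illustrative (the field names and the 24-hour ceiling are my assumptions, not the Controller's actual schema):

```python
# Minimal sketch: flag leave entries whose recorded hours exceed a
# plausible per-day maximum. Field names and the 24-hour threshold are
# illustrative assumptions, not the State Controller's actual schema.

MAX_HOURS_PER_DAY = 24  # no single day of leave can exceed this

def flag_implausible(entries):
    """Return entries whose 'hours' value fails the range check."""
    return [e for e in entries if not 0 < e["hours"] <= MAX_HOURS_PER_DAY]

records = [
    {"employee": "E100", "date": "2014-03-02", "hours": 8},    # plausible
    {"employee": "E101", "date": "2014-03-02", "hours": 80},   # likely a typo for 8
    {"employee": "E102", "date": "2014-03-02", "hours": 800},  # certainly a typo
]

for bad in flag_implausible(records):
    print(f"Check {bad['employee']} on {bad['date']}: {bad['hours']} hours recorded")
```

A check like this can't tell a genuine 8 from a mistaken entry that stays under the ceiling, but it would have caught the 80- and 800-hour records immediately.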

Unfortunately, however, many companies are reluctant to admit their data inaccuracies, meaning this often remains a hidden issue. That said, things are getting better; not so long ago, huge quantities of data existed only in warehouses and filing cabinets. At the same time, though, our expectations of data are higher: users look for answers to more questions than before and expect their data to provide all of them.

So, as our reliance on data continues to grow, how can businesses improve the quality of their data – especially in circumstances where the system is not necessarily to blame?

Well, lack of training is one issue that should be addressed; users may not have been shown how to input information – or how to use data contained in the system. Another is the constant pressure on data inputters to push information in and out as quickly as possible; under those circumstances, the margin of error is bound to be high.

Systems, however, are also guilty. A good system should minimise human error. This can easily be done by eliminating free-text fields, forcing users to choose from a predefined list of options, or accepting input only in certain formats. Of course, there is a way round everything. One common real-life example is when people try to avoid giving their real email addresses and type a@a.al instead, which the system accepts as a valid email address.
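
To make this concrete, here is a minimal sketch of that kind of format validation. The regex is a deliberately naive, assumed example; note that it happily accepts a@a.al, which is syntactically valid but almost certainly not a real address:

```python
import re

# Naive syntactic check: something@something.tld
# An illustrative pattern, not a production-grade validator.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}$")

def is_valid_email(address: str) -> bool:
    """Return True if the address matches the naive syntactic pattern."""
    return EMAIL_PATTERN.match(address) is not None

print(is_valid_email("jane.doe@example.com"))  # True
print(is_valid_email("not-an-email"))          # False
print(is_valid_email("a@a.al"))                # True -- syntactically valid,
                                               # which is exactly the loophole
                                               # described above
```

The lesson is that format validation narrows the error space but can't guarantee truthfulness; placeholder addresses like a@a.al need additional checks, such as confirmation emails, that pure pattern matching cannot provide.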

The biggest challenge is posed by external systems which can't be replaced and which produce incomplete data. This can be frustrating from a reporting perspective. At Pi, we have already dealt with this issue. Instead of changing the way the external system works, we produced validation forms for our clients where they could fill in any missing information themselves. This helped to provide accurate reports without any interference with the external systems.
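
In outline, the approach amounts to overlaying client-supplied values onto the incomplete external records before reporting. Here is a minimal sketch of that merge step; the field names and dict-based structure are assumptions for illustration, not the forms we actually built:

```python
# Minimal sketch: fill gaps in an external record with values the client
# supplied via a validation form. Field names are illustrative assumptions.

external_record = {"booking_id": "B-1021", "cost_centre": None, "amount": 412.50}
client_corrections = {"B-1021": {"cost_centre": "UK-SALES"}}

def apply_corrections(record, corrections):
    """Overlay client-supplied values onto missing fields only."""
    fixes = corrections.get(record["booking_id"], {})
    merged = dict(record)
    for field, value in fixes.items():
        if merged.get(field) is None:  # fill gaps only; never overwrite
            merged[field] = value
    return merged

print(apply_corrections(external_record, client_corrections))
# {'booking_id': 'B-1021', 'cost_centre': 'UK-SALES', 'amount': 412.5}
```

The key design choice is that corrections only fill gaps and never overwrite values the external system did provide, so the source of record stays untouched.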

Another problem is the complete absence of any system at all. Companies tend to keep information in spreadsheets stored on shared drives, building up thousands of macros, lookups and pivots. The first issue is the limit on the number of records a spreadsheet can hold (an Excel worksheet, for instance, caps out at 1,048,576 rows). The second is the lack of consistency in the formulas used, the links between files and the quality of the data provided. The only way to avoid inconsistencies in the data and to move on from old-fashioned spreadsheets is to use a system that will store this information, be it a simple web form or a very complicated portal.
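
As an illustration of the "simple web form" end of that spectrum, here is a hedged sketch using Flask and SQLite (both are my choices for the example, not a recommendation of any particular stack). The point is simply that a constrained form writing to a single store beats a shared drive full of divergent spreadsheets:

```python
# Minimal sketch of a single-table web form replacing a shared spreadsheet.
# Flask + SQLite are illustrative choices, not a specific recommendation.
import sqlite3
from flask import Flask, request

app = Flask(__name__)
DB = "records.db"

def init_db():
    with sqlite3.connect(DB) as conn:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS expenses ("
            "id INTEGER PRIMARY KEY, "
            "cost_centre TEXT NOT NULL, "            # required, unlike a blank cell
            "amount REAL NOT NULL CHECK (amount > 0))"
        )

@app.route("/expenses", methods=["POST"])
def add_expense():
    # Reject bad input at the door instead of letting it rot in a cell.
    cost_centre = request.form.get("cost_centre", "").strip()
    try:
        amount = float(request.form.get("amount", ""))
    except ValueError:
        return "amount must be a number", 400
    if not cost_centre:
        return "cost_centre is required", 400
    if amount <= 0:
        return "amount must be positive", 400
    with sqlite3.connect(DB) as conn:
        conn.execute("INSERT INTO expenses (cost_centre, amount) VALUES (?, ?)",
                     (cost_centre, amount))
    return "saved", 201

if __name__ == "__main__":
    init_db()
    app.run()
```

Even something this small gives you what a spreadsheet cannot: one schema, one copy of the data, and validation that runs on every entry.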

Of course, in an ideal world, we would all let the system take the strain. And this can be achieved. Over at Pi, we are using machine learning to optimise data integrity, our SLATE travel platform being just one example. Imagine a situation where people make bookings via an online portal and do not provide their correct identifier, because the system simply accepts any numerical value. They do, however, need to provide the details of their expenses once they return from the trip. Using SLATE, we are able to link a booking with its corresponding expense and, by doing so, assign the correct employee ID to the booking even though it was not provided in the first place. Without an automated system in place, performing such operations would take endless hours of mundane work, digging through a multitude of spreadsheets and receipts.
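
SLATE's internal matching logic isn't described here, so the following is only a schematic sketch of the linking idea: match a booking to an expense on shared attributes (traveller name and trip dates, in this assumed example), then copy the employee ID from the expense back onto the booking:

```python
# Schematic sketch of booking-to-expense linking. SLATE's actual matching
# logic is not public; traveller name + date overlap is an assumed heuristic.
from datetime import date

bookings = [
    {"booking_id": "B-7", "traveller": "J. Smith",
     "start": date(2016, 6, 1), "end": date(2016, 6, 3), "employee_id": None},
]
expenses = [
    {"expense_id": "E-42", "traveller": "J. Smith",
     "date": date(2016, 6, 2), "employee_id": "EMP-0193"},
]

def link_employee_ids(bookings, expenses):
    """Copy the employee ID from a matching expense onto each booking."""
    for b in bookings:
        for e in expenses:
            same_person = b["traveller"] == e["traveller"]
            within_trip = b["start"] <= e["date"] <= b["end"]
            if same_person and within_trip and b["employee_id"] is None:
                b["employee_id"] = e["employee_id"]
    return bookings

print(link_employee_ids(bookings, expenses))
# booking B-7 now carries employee_id 'EMP-0193'
```

A real system would of course need fuzzier matching and a way to flag ambiguous links for review, but the structure of the task is exactly this: join two imperfect datasets on what they share, and backfill what one of them is missing.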

Finally, never underestimate the power of your data. It is a crucial commodity that is not only desirable but necessary for managing projects, avoiding fraud, assessing performance, controlling finances and delivering services efficiently.
