18th August 2020 - 5 min read

Top 5 data quality issues and how to overcome them

By Joni Lindes

If you have ever grappled with reporting and building a data strategy from scratch, you will have struggled with data quality. Poor data quality means inaccurate information makes it into the report – throwing off your numbers, leading to incorrect conclusions and undermining your credibility.

At PredictX, many of the clients we work with have experienced data quality challenges in the past. That is why we set up data verification processes during data transfers to catch issues early, and work with the suppliers of source data to fix problems at their origin.

If you are struggling with poor-quality data, here are the most common issues we encounter and how to avoid them:


1. Duplicated data

With multiple siloed systems – as we often have in corporate travel – duplicated data becomes inevitable. The same trip can be booked through an agency and also appear in the credit card feed. Both feeds need to be combined to get a total trip cost – leaving us with a duplicate record.

Ensure you or your data provider has a data verification process complete with deduplication tools that comb through the data and identify duplicate records – even when two records are not identical but merely similar. Each data source supplier writes the same information differently (hotel property names are a common example), so your deduplication tool needs to recognise similar data points and flag them for review.

If you have a third-party data analytics provider like PredictX, you may not need an additional tool. Rather, you can rely on an advanced data verification process in which algorithms automatically remove duplicates and flag potential ones.
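To make the idea concrete, here is a minimal sketch of similarity-based matching using Python's standard library difflib. The hotel names and the 0.85 threshold are invented for illustration; a production deduplication tool would use far more robust matching logic.

```python
from difflib import SequenceMatcher

# Illustrative bookings only: the same property spelled two ways.
bookings = [
    {"id": 1, "hotel": "Grand Hotel Zurich", "total": 420.00},
    {"id": 2, "hotel": "Grand Hotel Zürich", "total": 420.00},
    {"id": 3, "hotel": "Airport Inn Geneva", "total": 180.00},
]

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Flag pairs whose hotel names are near-identical, even though the
# strings are not an exact match.
THRESHOLD = 0.85  # assumed cut-off; tune against real data
flagged = [
    (a["id"], b["id"])
    for i, a in enumerate(bookings)
    for b in bookings[i + 1:]
    if similarity(a["hotel"], b["hotel"]) >= THRESHOLD
]

print(flagged)  # [(1, 2)] -> potential duplicate to review or merge
```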

2. Incomplete fields

When data is entered manually, incomplete fields are a common hazard. Agents in a rush do not always record every detail of a phone booking, just as travellers filing expense reports do not always state who the merchant for an expense was.

Overcome this by using systems that do not allow bookings to be submitted until all required fields have been filled in. Set up data management processes that automatically exclude incomplete entries, setting them aside for later analysis. If you use a third-party data management solution, ensure it has relationships with data suppliers and can chase up missing information.
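As a simple illustration of setting incomplete entries aside, here is a short Python sketch. The field names are hypothetical and stand in for whatever your booking or expense feed actually contains.

```python
# Hypothetical required fields for an expense entry.
REQUIRED_FIELDS = ("traveller", "merchant", "amount", "date")

expenses = [
    {"traveller": "A. Smith", "merchant": "Rail Co", "amount": 55.0, "date": "2020-08-01"},
    {"traveller": "B. Jones", "merchant": None, "amount": 120.0, "date": "2020-08-03"},
]

complete, incomplete = [], []
for entry in expenses:
    if all(entry.get(field) not in (None, "") for field in REQUIRED_FIELDS):
        complete.append(entry)
    else:
        # Set aside rather than silently dropped, so the missing
        # details can be chased up with the supplier later.
        incomplete.append(entry)

print(len(complete), len(incomplete))  # 1 1
```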

3. Inconsistent formats

Differing data formats are a major pain point for any analyst. Something as simple as a date can be written in several ways, and as data flows from one system to another, a wrongly interpreted date can cascade into dramatic errors. More recent standards specify formats explicitly, but legacy protocols do not always conform.
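To see how easily a date goes wrong, consider this small Python sketch: the same raw string yields two different dates depending on which convention the upstream system used, which is why agreeing an unambiguous interchange format such as ISO 8601 matters.

```python
from datetime import datetime

raw = "03/04/2020"  # is this 4 March or 3 April?

us_style = datetime.strptime(raw, "%m/%d/%Y")  # -> 4 March 2020
uk_style = datetime.strptime(raw, "%d/%m/%Y")  # -> 3 April 2020

# Emitting an unambiguous format (ISO 8601) removes the guesswork
# when the data flows on to the next system.
print(us_style.date().isoformat())  # 2020-03-04
print(uk_style.date().isoformat())  # 2020-04-03
```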

IATA’s NDC programme, for example, aimed to solve fragmentation in airline content by developing and driving adoption of an XML-based data transmission standard.

GDSs and TMCs alike had to invest in ensuring their systems could support this standard – a huge exercise that shows just how much variation exists in travel industry data.

When working with data sources and suppliers, make sure you, or your data analytics provider, specify preferred formats. Even then, issues will slip through, so have automated, AI-based data verification processes in place to catch and correct them early.

4. Different languages and measurement units

As with formats, each country has its own way of expressing measurements. If figures in two different units or currencies are added together, the result is simply wrong.

One example of how this can go drastically wrong is NASA's loss of a spacecraft to a unit-conversion mistake. The Mars Climate Orbiter, built at a cost of $125 million, was a robotic space probe designed to study the climate of Mars. The navigation team at the Jet Propulsion Laboratory (JPL) worked in metric units, while Lockheed Martin Astronautics in Denver, Colorado, which designed and built the spacecraft, supplied crucial thruster data in English units.

A NASA review board found that the problem lay in the software controlling the orbiter's thrusters. The ground software calculated the impulse the thrusters needed to exert in pound-force seconds, while a second piece of code that read this data assumed it was in the metric unit, newton-seconds. The mismatch pushed the spacecraft dangerously close to the planet's atmosphere, where it presumably burned up and broke into pieces.

When it comes to languages, special characters like umlauts and accents can wreak havoc if a system isn’t configured to handle them. To a computer, Zurich does not equal Zürich and, perhaps more subtly, Sao Paulo doesn’t equal São Paulo. Duplicates can be missed and the same hotel property can be represented multiple times.
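A minimal sketch of accent-insensitive matching, using Python's standard unicodedata module to decompose accented characters and drop the combining marks before comparing:

```python
import unicodedata

def fold_accents(text: str) -> str:
    """Decompose accented characters and strip the combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print("Zurich" == "Zürich")                                    # False
print(fold_accents("Zurich") == fold_accents("Zürich"))        # True
print(fold_accents("Sao Paulo") == fold_accents("São Paulo"))  # True
```

Folding accents like this before the deduplication step lets Zürich and Zurich resolve to the same property.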

Travel data inevitably spans multiple languages, units and currencies. Make sure your analytics partner's algorithms are built to handle these complications, and convert all amounts into a single currency before making any calculation.
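As a toy example of why conversion must come first, here is a Python sketch. The exchange rates are placeholders, not real market rates; a live system would pull a dated rate from a trusted feed.

```python
# Placeholder rates for illustration only.
RATES_TO_GBP = {"GBP": 1.0, "USD": 0.76, "EUR": 0.90}

trip_costs = [
    {"amount": 200.0, "currency": "USD"},
    {"amount": 150.0, "currency": "EUR"},
    {"amount": 80.0,  "currency": "GBP"},
]

# Summing the raw amounts would give 430.0 "somethings" - a meaningless
# figure. Convert everything to one currency first.
total_gbp = sum(c["amount"] * RATES_TO_GBP[c["currency"]] for c in trip_costs)
print(round(total_gbp, 2))  # 367.0
```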

5. Human error

The biggest obstacle to data quality is, in fact, us. Employees and agents make typos that ripple into errors and incorrect data sets. The only way to limit this is to minimise manual data handling as much as possible.

We live in a world where AI makes automation more feasible every day. Instead of asking employees to fill out expense reports, use virtual wallets that log expense transactions and direct purchases automatically. Likewise, when data is analysed and transferred between systems, AI-based systems and advanced algorithms – rather than teams of offshore analysts – will keep human error to a minimum.

Every analysis will have data quality issues. The right data strategy, however, minimises the opportunity for error from data capture through to processing and analysis, so issues are caught early and remain manageable.

These days, C-level stakeholders are closely involved in travel programme management. The right data can help manage cost, optimise duty of care and improve travel programmes – but reporting inaccurate data to those stakeholders will cost you your credibility.

If you need help with developing a data strategy that can overcome these challenges, book a demo with a member of our team.
