Every day we see TMCs (travel management companies) and other data specialists boast about new, innovative solutions. We hear about digital assistants, predictive analytics and pre-trip data bringing data analysis as close to real time as possible. Yet one of the least-told stories is the data they are working with in the first place: what is data, and what makes it useful?
Data gives us knowledge, and knowledge gives us the confidence to make the right call in every situation. A spend category like travel generates billions of transactions, each tracked as data. Suppliers collect and process that data using multiple formats and names, leaving travel managers to access it second-hand through the back office. Because corporate travel programmes don’t “input” or “capture” the data themselves, its initial format, accuracy and overall quality are very much out of the travel manager’s hands. Quality control has to happen after the initial data capture, which, as most travel managers know, is no easy task.
This is where data quality comes into play. It may not be the most “fun” or interest-provoking topic, but it is the most fundamental. If your data quality is poor, anything you build on that data will be poor as well, whether that is sourcing new contracts, predictive forecasting or changes to travel policy. It is like driving blind without even knowing it.
So what is poor data quality?
Data quality can be defined as the data’s ability to serve its purpose. For travel data to serve its purpose, it must be accurate and give a clear picture of what actually happened. The problem is that travel data is often not as accurate as we would like.
Why is travel data often inaccurate?
Any travel manager who has sat in a budget meeting where their figures don’t match those of other stakeholders is all too aware of how important accuracy is, in an industry that makes accuracy more difficult every day.
But before we look at what is happening now, we need to delve into the history of data in travel. Most of the systems at the heart of the travel data ecosystem were built as air travel started to, excuse the pun, “take off” in the fifties. American Airlines drew on a US Air Force computer system to build the first GDS for processing passenger bookings: the start of what we now know as Sabre. Apollo, developed by United Airlines and now part of Travelport, was the other main early GDS. We know these systems because, well over six decades later, we continue to use them. The entire travel market is, in fact, dependent to some extent on the GDSs for data. Because we depend on them and have no choice but to use their data, the GDSs have very little incentive to concern themselves with the quality or accuracy of the data they process.
The next stop in our flow of data is the TMC feed itself. First, the TMC can only work with what it receives from the GDS. Second, this step is another point at which data can be entered incorrectly.
Because TMCs save nothing by having correct data on volume and demand for each hotel, airline or supplier, they have no financial incentive to keep this data clean. Their chief volume concern is the number of bookings that come from the travellers themselves. To them, it is inconsequential whether “JR MARRI” means Marriott Canary Wharf or Marriott Grosvenor Square. Rather, it is the corporate client, footing the bill and ensuring traveller safety, that has the most at stake in whether this data is accurate and of good quality. As many travel managers know, achieving good quality data is fraught with challenges.
Top 3 data quality challenges and how to overcome them
Challenge Number One
There is no standard way to input each data point
A simple hotel name can appear in multiple variations each time it is entered, making it incredibly difficult to get an accurate picture of volume per hotel chain.
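As a rough illustration of what standardising these variations could involve, here is a minimal sketch that maps raw hotel strings to a canonical chain name, first through a small alias table and then through fuzzy string matching with Python’s standard `difflib` module. The chain list, the `KNOWN_ALIASES` table, the `match_chain` helper and the 0.8 similarity cutoff are all hypothetical assumptions for illustration, not how any particular TMC or GDS actually codes hotels.

```python
import difflib
import re

# Hypothetical master reference of canonical chain names.
MASTER_CHAINS = ["Marriott", "Hilton", "Hyatt", "InterContinental"]

# Hypothetical abbreviations seen in booking feeds, mapped to chains.
KNOWN_ALIASES = {"MARRI": "Marriott", "HILT": "Hilton"}


def normalise(raw: str) -> str:
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", raw.lower())
    return " ".join(cleaned.split())


def match_chain(raw: str, cutoff: float = 0.8):
    """Return the canonical chain for a raw hotel string, or None if unsure."""
    tokens = normalise(raw).split()
    # 1. Known abbreviations win outright.
    for token in tokens:
        if token.upper() in KNOWN_ALIASES:
            return KNOWN_ALIASES[token.upper()]
    # 2. Otherwise fuzzy-match each token against the canonical names.
    lowered = {name.lower(): name for name in MASTER_CHAINS}
    for token in tokens:
        hits = difflib.get_close_matches(token, list(lowered), n=1, cutoff=cutoff)
        if hits:
            return lowered[hits[0]]
    return None  # leave for a human to review


for raw in ["JR MARRI", "Marriot Canary Wharf", "Hiltn Park Lane", "Premier Inn"]:
    print(raw, "->", match_chain(raw))
```

This only rolls variants up to chain level, which is what volume-per-chain reporting needs; telling one property from another (Canary Wharf versus Grosvenor Square) would need further fields such as city or address.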
Challenge Number Two
There is no single version of truth
Many educators have drilled into us the importance of consulting multiple sources for accurate information when writing a research paper, for example. When it comes to data, multiple sources are even more necessary: a single source often misses most of the data.
TMC data, for example, is often thought of as the source from which all data can be pulled. Our internal study analysed around $8.5 billion of travel-related spend and found that the TMC holds, on average, only around 41% of all travel spend. If we are only looking at a portion of what happens in reality, how much value can we gain from it?
Challenge Number Three
Human input is the least controllable variable
Even though each GDS, TMC and credit card company has its own data format, we often forget that the least controllable variable in the whole process is the human being. When we fill out data fields in an expense system, for example, typos are made, wrong information is entered, or placeholder values such as a string of ‘x’s are added when the agent or employee is in a rush.
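Some of this human noise can be caught automatically before it reaches any analysis. Here is a minimal sketch, assuming placeholders look like the patterns below (empty values, runs of ‘x’, “N/A” and similar, or all-zero dummy IDs). The pattern list and the `is_placeholder` and `flag_record` helpers are hypothetical and would need tuning to what your own feeds actually contain.

```python
import re

# Hypothetical placeholder rules; tune these to your own data feeds.
PLACEHOLDER_PATTERNS = [
    re.compile(r"^$"),                                              # empty after stripping
    re.compile(r"^x+$", re.IGNORECASE),                             # 'x', 'xxx', 'XXXX'
    re.compile(r"^(n/?a|tbc|tbd|unknown|none)$", re.IGNORECASE),    # filler words
    re.compile(r"^0+$"),                                            # '0000' as a dummy ID
]


def is_placeholder(value) -> bool:
    """True if a field value looks like filler rather than real data."""
    text = str(value).strip()
    return any(p.match(text) for p in PLACEHOLDER_PATTERNS)


def flag_record(record: dict) -> list:
    """Return the names of fields whose values look like placeholders."""
    return [field for field, value in record.items() if is_placeholder(value)]


expense_line = {"employee_id": "xxxx", "hotel": "Marriott", "city": "", "cost_centre": "N/A"}
print(flag_record(expense_line))  # fields to resolve from other sources
```

The check only flags suspect fields; it does not guess the right value. Flagged fields are the ones worth filling in from another source rather than trusting as entered.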
Any analyst who has worked on cleaning up a dataset will tell you that data entered by humans is often the most inaccurate and jumbled, requiring a huge cleanup.
How can we overcome these challenges?
Although these challenges seem numerous, there is a simple answer to all of them: we need to build a “library” of data complete with a master database. The master database acts as a reference for how the data “should” look.
When we have a library of data, it is much easier to find out what happened on a single trip. We can reconcile our data feeds, which allows us to construct a more complete story that matches reality.
How do we create a library? By fusing together data from different sources to form a multifaceted version of the truth. Say, for example, we have an incorrect employee ID in the expense system. If another source shows the same bookings at the same times with the same airlines and hotels, we can resolve this small inaccuracy very easily. Similarly, if a hotel has multiple addresses for a trip, we can look at when the booking went through and at the booking details in several sources to find the most used address. We then add this address to our master database as a reference.
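The two reconciliation steps described above can be sketched in a few lines. This is a minimal illustration under loudly stated assumptions: the `Booking` record, the `trip_key` rule (same travel date plus same supplier counts as the same booking), the `min_overlap` threshold and all sample IDs, suppliers and addresses are hypothetical. The employee ID is resolved by finding the TMC traveller whose bookings overlap most with those filed under the bad ID, and the hotel address by a simple majority vote across sources.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Booking:
    source: str              # e.g. "tmc", "expense", "card"
    employee_id: str
    travel_date: date
    supplier: str            # airline or hotel name, already normalised
    hotel_address: str = ""  # only filled in for hotel bookings


def trip_key(b: Booking) -> tuple:
    # Assumption: same travel date + same supplier = the same booking.
    return (b.travel_date, b.supplier)


def resolve_employee_id(bad_id, expense, tmc, min_overlap=2):
    """Suggest the TMC employee ID whose trips best overlap those filed under bad_id."""
    claimed = {trip_key(b) for b in expense if b.employee_id == bad_id}
    trips_by_id = defaultdict(set)
    for b in tmc:
        trips_by_id[b.employee_id].add(trip_key(b))
    best_id, best_overlap = None, 0
    for emp_id, trips in trips_by_id.items():
        overlap = len(claimed & trips)
        if overlap > best_overlap:
            best_id, best_overlap = emp_id, overlap
    # Require several matching bookings before trusting the match.
    return best_id if best_overlap >= min_overlap else None


def build_address_master(bookings):
    """Master reference: for each hotel, the address most sources agree on."""
    addresses = defaultdict(list)
    for b in bookings:
        if b.hotel_address:
            addresses[b.supplier].append(b.hotel_address)
    return {hotel: Counter(found).most_common(1)[0][0]
            for hotel, found in addresses.items()}


d = date(2024, 3, 4)
tmc = [
    Booking("tmc", "E1001", d, "Airline X"),
    Booking("tmc", "E1001", d, "Hotel Y", "12 Example Road"),
    Booking("tmc", "E2002", d, "Airline Z"),
]
expense = [  # "E1O01" is a typo: letter O instead of zero
    Booking("expense", "E1O01", d, "Airline X"),
    Booking("expense", "E1O01", d, "Hotel Y", "12 Example Rd"),
]
card = [Booking("card", "E1001", d, "Hotel Y", "12 Example Road")]

print(resolve_employee_id("E1O01", expense, tmc))   # E1001
print(build_address_master(tmc + expense + card))   # {'Hotel Y': '12 Example Road'}
```

Matching on date and supplier alone would confuse colleagues travelling together; a real matching rule would likely also compare booking times, amounts or record locators before writing anything to the master database.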
That is why it is crucial to draw on other sources, such as credit card data, the expense system and even the HR feed (for job function), rather than relying on a single source.
As UBS Global Travel Lead Mark Cuschieri puts it:
“There is no single source of truth. There are multiple data sources in order to get to a single source of truth. We have agency data, Credit Card data, GDS data and multiple sources of data because we know there is leakage out there. You need to have multiple sources to get to the point where you can compare and contrast. THAT is critical.”
How do we clean up this mess?
So we have data from multiple sources; the only problem is that it is EVERYWHERE. It is now less like a library and more like a giant pile of books on the floor in no apparent order. This is where “cleaning” the data comes into play. Just as a library needs the Dewey Decimal system, we need a system that puts the data where it should be so that it is easy to use.
Any travel manager or analyst doing this knows it is easier said than done. In fact, we recently surveyed travel managers about what they found most challenging about their data. It turns out that 43.66% of travel managers said their largest data challenge was matching different data sources, while 21.13% said it was data spread across too many different places. Fusing this data into one clean data set is therefore the number one way to ensure data quality, but also the most difficult process to carry out.
The good news? We can now use computers to do this for us. Simple heuristics can match data between sources, and these heuristics can also learn from past corrections. Once a data point or problem has been identified and the correct data added to the master database, the heuristics can apply the same fix to all other similar data.
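One way to picture heuristics that learn from past corrections is a store of confirmed fixes that is consulted before anything else, so a correction made once is reused for identical and near-identical raw values. This is a minimal sketch, not any vendor’s method: the `MasterReference` class, the raw code “JR MARRI CANARY WHF” and the 0.85 cutoff are hypothetical, and the similarity step is a swapped-in fuzzy match using Python’s standard `difflib`.

```python
import difflib


class MasterReference:
    """Hypothetical master database: raw feed value -> confirmed canonical value."""

    def __init__(self):
        self.corrections = {}

    @staticmethod
    def _key(raw):
        # Upper-case and collapse whitespace so trivial variants share a key.
        return " ".join(raw.upper().split())

    def learn(self, raw, canonical):
        """Record a correction once an analyst has confirmed it."""
        self.corrections[self._key(raw)] = canonical

    def resolve(self, raw, cutoff=0.85):
        """Reuse past corrections: exact key first, then the closest similar key."""
        key = self._key(raw)
        if key in self.corrections:
            return self.corrections[key]
        close = difflib.get_close_matches(key, list(self.corrections), n=1, cutoff=cutoff)
        return self.corrections[close[0]] if close else None  # None = needs human review


ref = MasterReference()
ref.learn("JR MARRI CANARY WHF", "Marriott Canary Wharf")
print(ref.resolve("jr marri  canary whf"))  # exact after normalising
print(ref.resolve("JR MARRI CANARY WHRF"))  # close variant, same correction
print(ref.resolve("HILTON PARK LANE"))      # no similar past correction -> None
```

Anything the store cannot resolve comes back as `None` and goes to a person; once that person confirms the answer with `learn`, every later occurrence of the same or a similar value is handled automatically, which is why the effort is front-loaded.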
Most data feeds, like the TMC feed, will not clean up all your data. Some providers may clean their own feed, but the single-source problem remains. If you want a truly independent, accurate and clean dataset built from multiple sources, you will have to do the work yourself or, better yet, implement a system to do it for you. That means implementing a data analytics solution that has both the travel domain experience and the IT infrastructure to make your data not only accurate and correct but also easily accessible and delivered on time.
Although this takes a lot of work up front, once the initial cleanup is done and your master database is in place, it becomes an easy, automated process going forward.
We can then get inventive and play with forecasting and predicting future scenarios, but data quality is still fundamental.
Data quality is very much like the board an Olympic diver jumps off. The diver can have all the technical skill, grace and ability to do amazing things in the air, but if the board is faulty, the results can be disastrous. In an industry where we often have little control over the data on which we base our programme’s actions, it is more important than ever to dig out the data, lay it on the floor and use technology to “clean up the mess”. The end result? The ultimate “springboard” for any travel programme: a data set you can trust.