Sometimes we see artificial intelligence as little more than a buzzword. We are incessantly told it is set to change the way we work, but it’s often difficult to see how those theoretical capabilities translate into a real-world difference. This is because AI often works behind everyday functions we take for granted. Something we most definitely take for granted? The quality of our data.
It’s not that we don’t think data quality is important. Rather, it’s that we only miss it when it isn’t there. Organisations believe that over 27% of their revenue is wasted due to inaccurate master data. Poor data quality has real consequences, and we need to stay on top of it.
What is data quality and why do we need it?
Data is defined as high quality if it correctly represents the real-world construct to which it refers. If the spend our data records is actually happening in reality, we can call that data high quality. When we use data to make supplier management and policy decisions, it is a problem when reality differs from the data. Whether it is going into a meeting with finance only to find your spend figures are inaccurate, or discovering during supplier negotiations that your demand figures are wrong – having accurate, good-quality data has never been so important.
How do we get high quality data?
Anyone who has attempted this knows it is a complex process. Multiple data sources need to be considered, data needs to be matched and de-duplicated, and one engine or data warehouse needs the infrastructure and scope to run it all. TMCs provide data analytics, but it is not in their interests to safeguard data quality. Travel managers running a programme based on TMC data therefore frequently complain of hotel duplication, incorrect code mapping and poor integration with company hierarchies. We must also remember that TMCs can only capture spend on bookings made through them or their associated booking tool. Since we can’t rely on travellers to use only approved booking channels, using TMC data alone will ultimately result in leakage.
Even when we have combined multiple data sources, we need to remember that each source supplies the data in a different format and, in some cases – the expense system, for example – we have humans entering data in many different ways. Supplier names are spelt differently, and addresses are often entered incorrectly. Even when the data is machine-generated from a GDS or booking tool, the technology provider’s financial incentives centre on booking volume rather than the situational information within the bookings themselves. There is therefore no incentive to ensure the data is correct. Corporate travel programmes and hotel, air and ground transport suppliers are the only parties who benefit from accurate data – and it’s the corporate travel manager who has the least control over how the data is entered.
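To see what those spelling variations mean in practice, here is a minimal sketch of fuzzy name-matching using Python’s standard-library difflib. The master list and hotel names are purely illustrative, and a real system would use far more sophisticated matching – this just shows the basic idea of mapping free-typed variants onto one canonical record.

```python
import difflib

# Hypothetical master list of canonical hotel names (illustrative only).
MASTER_HOTELS = [
    "Hilton London Paddington",
    "Marriott Frankfurt",
    "Ibis Amsterdam Centre",
]

def match_hotel(raw_name, master=MASTER_HOTELS, cutoff=0.75):
    """Map a free-typed hotel name to its canonical master entry, or None.

    Comparison is done on lower-cased text so that casing differences
    don't count against the similarity score."""
    lowered = {name.lower(): name for name in master}
    hits = difflib.get_close_matches(
        raw_name.strip().lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[hits[0]] if hits else None
```

With this sketch, three travellers typing "hilton london paddington", "Hilton London Padington" and "HILTON LONDON PADDINGTON" all resolve to the same master record, while an unrelated name falls below the cutoff and is flagged as unmatched.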
The spend for a single ticket can also flow through multiple channels rather than one – TMCs, cards, the GDS and so on. A booking might be made for a trip via the TMC and later, due to a flight delay, the flight might be changed and paid for directly on the card.
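The multi-channel problem above is essentially a de-duplication task: the same charge surfaces in two feeds and must be collapsed into one record. The sketch below, with invented field names and a simplistic matching key (traveller plus date plus amount), illustrates one naive way to do it – real systems match on much richer keys.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Transaction:
    source: str      # e.g. "TMC", "CARD", "GDS" (hypothetical labels)
    traveller: str
    trip_date: str   # ISO date
    amount: float

def dedupe(transactions, tolerance=0.01):
    """Collapse records that look like the same spend reported via two
    channels: same traveller and date, amounts within `tolerance`."""
    unique = []
    for t in transactions:
        is_dup = any(
            t.traveller == u.traveller
            and t.trip_date == u.trip_date
            and abs(t.amount - u.amount) <= tolerance
            for u in unique
        )
        if not is_dup:
            unique.append(t)
    return unique
```

Here a TMC booking and the matching card charge for the same traveller, date and amount would be kept as one line of spend rather than double-counted.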
Real life does not fit into neat, organised data sets. It is chaotic and, as a result, so is our data. Cleaning it is not an easy process.
What about artificial intelligence?
Fortunately, this is where artificial intelligence (AI) comes in. According to a study by Stonebraker, Bruckner and Ilyas, an AI-enabled data curation system reduces the cost of data cleansing, data transformation and deduplication by 90%. When matching data from disparate sources, we can either have seven people working for a month on Excel spreadsheets, or we can get a machine to do it for us.
Even though AI has recently become pretty much synonymous with machine learning, AI is a broad umbrella term that captures various techniques. In this example, the primary technique used to improve data quality is a relatively simple technique known as heuristics.
Heuristics are used in artificial intelligence, computer science and mathematical optimisation to produce a good-enough solution to a problem in a reasonable amount of time. In AI systems, a heuristic guides the search through a solution space: using a function supplied by the designer, or by adjusting the weight of branches according to how likely each branch is to lead to a goal node, a known-good configuration can be derived and applied to multiple datasets. In plain English, the function matches and consolidates data the same way it has successfully been matched before. With travel data, the system can learn each variation in hotel name, carrier name and so on, and match it to the master database. As new variations appear, the system continues to learn.
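The "matches data the way it has been matched before" idea can be sketched as a matcher that remembers every variation it has already resolved. The class below is a toy illustration, not any vendor’s actual algorithm: it uses standard-library fuzzy matching for first-time variations and then serves repeat variations from memory, so the system gets faster and more consistent as it learns.

```python
import difflib

class LearningMatcher:
    """Toy heuristic matcher: remembers every name variation it has
    already resolved, so previously seen spellings match instantly."""

    def __init__(self, master_names):
        # canonical lookup keyed by lower-cased name
        self._canonical = {name.lower(): name for name in master_names}
        self.learned = {}  # variation (lower-cased) -> canonical name

    def match(self, raw):
        key = raw.strip().lower()
        if key in self.learned:          # seen before: reuse the past match
            return self.learned[key]
        hits = difflib.get_close_matches(
            key, list(self._canonical), n=1, cutoff=0.7
        )
        if hits:
            canon = self._canonical[hits[0]]
            self.learned[key] = canon    # learn this new variation
            return canon
        return None                      # unknown: flag for human review
```

After "British Airwayz" has been resolved once, every later record with the same misspelling is matched from the learned dictionary without re-running the fuzzy search – a crude stand-in for the continual learning described above.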
This is far easier than matching every hotel name and booking from every data set ourselves. The added benefit is that the data can be delivered in time to do something about it. As sourcing becomes increasingly dynamic, our data needs to become more dynamic as well. Implementing traveller policy also requires immediate intelligence and action from departments, rather than a traveller being informed that their trip was out of policy two months after it happened.
This ability of AI to learn automatically will enable us to respond to structural changes in data management more quickly and easily. Instead of waiting for code that manually upgrades the system once a year, the system itself adds new tools, creates new features and alters itself to satisfy user requirements. AI allows you to be ‘data agile’ – responding to changes in requirements quickly and easily, rather than building new apps and systems to handle them.
Organisations will increasingly implement AI-based DevOps workflows for application development, so that engineers can continuously integrate and deliver software updates that leverage the AI’s knowledge and learning. With AI, we are building a process that doesn’t just clean the data we use today, but can easily learn, hold and curate the data of the future.