Almost all the prospects I speak with already have a few AI use cases in mind. In my previous blog, I focused on how to find the right use case to begin your AI journey. In this article, I want to focus on the key ingredient of a successful AI implementation: data.
While it is common knowledge today that AI runs on historical data, most organizations don't have all the data required to build a successful AI model. Even when they do, it is often not in the right format or at the right level of granularity. In many cases there is also real confusion over which variables to include in the model. So, to ensure your AI product is a success, first enumerate all the data inputs you will require to build the model, and find out which variables actually have an impact on the outcome you have in mind.
For example, if you are building a customer churn prediction model, list out all the data points you want to base your predictions on. If you believe the duration of conversations with customer service executives is an important driver of churn, include it as a variable. But if you don't have the cumulative time each customer has spent in conversation with customer service, your model cannot use it as an input, so the first step is to start capturing that data. In practice, it is unlikely that you don't log conversation durations at all; more often, the data is aggregated at the representative level rather than readily available at the customer level. In these cases you need some data engineering or ETL. The point is that merely capturing the data is not enough: it needs to be in the right format and at the right level of granularity.
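The roll-up described above can be sketched in a few lines of pandas. This is a minimal illustration with made-up data; the table name, column names, and the choice of aggregates are all hypothetical, not taken from any real system.

```python
import pandas as pd

# Hypothetical call log as it might be captured operationally:
# one row per call, keyed by representative, not by customer.
call_logs = pd.DataFrame({
    "customer_id": [101, 101, 102, 103, 103, 103],
    "rep_id": ["r1", "r2", "r1", "r3", "r1", "r2"],
    "duration_min": [12.5, 4.0, 7.2, 3.1, 9.9, 6.0],
})

# Roll the call-level rows up to customer level so the result can
# serve as feature columns in a churn model.
customer_talk_time = (
    call_logs.groupby("customer_id", as_index=False)
    .agg(total_duration_min=("duration_min", "sum"),
         call_count=("customer_id", "size"))
)

print(customer_talk_time)
```

In a real pipeline this aggregation would run inside an ETL job against the source system, but the shape of the transformation is the same: call-level facts in, one feature row per customer out.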
Having the data at the right level of granularity is still not sufficient. Often the data is not accurate or clean: missing values and incorrectly captured values are common in real datasets. If these values make it into the training set, your model's accuracy will suffer. So it is very important to ensure that your data is clean and accurate.
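A basic cleaning pass might look like the sketch below. The column names and the median-imputation strategy are illustrative assumptions; the right fix for bad values always depends on the domain.

```python
import numpy as np
import pandas as pd

# Hypothetical customer feature table with typical quality problems:
# a missing tenure value and an impossible negative monthly charge.
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4],
    "tenure_months": [24, None, 8, 36],
    "monthly_charge": [49.9, 30.0, -5.0, 75.5],
})

# Treat implausible values as missing rather than training on them.
df.loc[df["monthly_charge"] < 0, "monthly_charge"] = np.nan

# Impute missing numeric values with the column median -- a simple
# baseline; business knowledge may suggest a better strategy.
for col in ["tenure_months", "monthly_charge"]:
    df[col] = df[col].fillna(df[col].median())

print(df)
```

The key point is that both kinds of defects are handled before the data reaches the model: invalid values are first converted to missing, then all missing values are imputed by one consistent rule.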
Where data inputs are entered manually, the risk of inaccuracy is larger than in scenarios where data is captured automatically and replicated across systems. Moreover, in most large businesses data is spread across departmental silos, which makes maintaining a single version of the truth difficult. It is therefore very important to choose the right source systems to feed data into your machine learning models.
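One common way to get a single version of the truth is to designate a system of record per attribute and fall back to other silos only where it has no value. The sketch below assumes two hypothetical silos, a CRM and a billing system, with billing treated as the system of record for email; all names and data are made up.

```python
import pandas as pd

# Two hypothetical silos holding the same customer attribute.
crm = pd.DataFrame({"customer_id": [1, 2, 3],
                    "email": ["a@x.com", "b@x.com", None]})
billing = pd.DataFrame({"customer_id": [1, 3],
                        "email": ["a@corp.com", "c@x.com"]})

# Join the silos, then prefer the system-of-record value (billing),
# falling back to CRM only where billing has no record.
merged = crm.merge(billing, on="customer_id", how="left",
                   suffixes=("_crm", "_billing"))
merged["email"] = merged["email_billing"].fillna(merged["email_crm"])
golden = merged[["customer_id", "email"]]

print(golden)
```

Making the precedence rule explicit in code, rather than letting whichever extract ran last win, is what keeps the model's inputs consistent.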
Given these challenges, building efficient data engineering pipelines is as important to the success of an AI project as having the right data science and AI/ML expertise. At Konverge.AI we have consciously chosen cloud data engineering as one of our key focus areas. It enables us to fetch the right data from the right systems into the models we build, and it has been a key component of our AI success so far.
Since data is the key ingredient of any AI project, its quality and availability are crucial to AI success. If you are planning your AI roadmap, data availability is one of the most important things to focus on. They say data is the new oil: make sure it is refined, so there is no smoke around your AI vehicle.