Train Only Predictions: a new method to enhance the reliability of your machine learning/artificial intelligence solutions
For various reasons, the performance of a machine learning/artificial intelligence model once deployed can lag far behind its training performance. The reasons range from biases and variations in the data, to unidentified assumptions, to improper hyperparameter usage.
Since such instances are a direct loss on the investment made, and often lead to the solution being shelved, it is important that such situations be addressed.
This paper outlines one method that can be used as a last resort and that can allow organizations to continue using their deployed machine learning/artificial intelligence solutions until the root cause is addressed.
While the long term solution should be to identify the root cause of the deviation in performance, the given method can be used in the interim.
For the sake of the discussion, let us say we are using kmeans clustering in our solution for unsupervised learning.
In an ideal world, the kmeans model would be trained, and then the fully trained, validated model would be deployed to production for future use. During production runs, the model would be loaded and used straight away to predict on the new incoming data.
The proposed solution, which we are calling Train Only Prediction, takes a slightly different approach. Here are the outlined steps:
1) Choose the prediction algorithm you want to use. Let's call this f(x).
2) Train the model using your training data, called d1; test and tune until it's production ready. Retain your hyperparameters. Let's call the hyperparameter set p1. Let's say this yields the prediction set f1'. Our working model now becomes f1' = f(d1, p1)
3) For the new dataset needing prediction in production, let's call it d2, first append d2 to d1. Let's call the new set d3, i.e., d3 = d1 + d2
4) Train the model again in production on the updated dataset, using the same hyperparameter set p1, i.e., f2' = f(d3, p1)
5) Once trained, identify the values predicted during this training phase for all of d2. This is your final result, i.e., f2'[index of d2 in d3]
No further steps are needed.
Why it works
Let us first understand why prediction fails. Most models make predictions in a single pass, i.e., they have one look at the incoming data and guess what the result should look like, based on whatever they have learnt during training. If this guess is wrong, there are no second takes.
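For kmeans, single-pass prediction amounts to assigning each new point to the nearest of the centroids that were fixed during training. The sketch below illustrates this; the function name `predict` and the centroid values are made up for the example.

```python
def predict(centroids, point):
    """Single-pass prediction: pick the nearest frozen centroid, with no refinement."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return min(range(len(centroids)), key=lambda c: sq_dist(point, centroids[c]))

# Centroids learnt during training; they do not move at prediction time.
centroids = [(0.0, 0.0), (10.0, 10.0)]

print(predict(centroids, (1.0, 2.0)))  # 0: clearly closest to the first centroid
print(predict(centroids, (6.0, 6.0)))  # 1: a borderline point still gets exactly one guess
```

The new point never influences the centroids, so if the incoming data has shifted away from what was seen in training, the assignment is not corrected.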
The training phase works differently. In the training phase, models are designed to iterate a few times over the entire set and incrementally self correct their results in each iteration until the final result becomes acceptable. In classical machine learning models, this is achieved through iterative optimization (for kmeans, repeated assignment and centroid update steps) together with cross validation techniques, and in neural networks the combination of epochs and mini batches achieves the result.
Train Only Prediction leverages this behaviour. Putting the small d2 together with the much larger d1 ensures that the range of items similar to those contained in d2 increases manifold. When this enhanced dataset is then run iteratively through the training phase, the probability of finding the right clusters increases as well.
This approach suffers from the following limitations:
1) This increases the prediction time by a large factor, because the model is retrained on d1 + d2 for every batch of predictions
2) This is not suitable for supervised learning or regression problems
3) This method can only be used in cases where time can be sacrificed in return for a reliable prediction system
Using stratified sampling can speed the process up slightly.
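One way to read this suggestion is to shrink d1 before appending d2, drawing the same fraction from every cluster of the original model f1' so that each cluster stays represented. The sketch below assumes that interpretation; the function `stratified_sample`, its parameters and the toy data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(d1, labels, fraction, seed=0):
    """Draw roughly `fraction` of the points from each stratum (here: each cluster label)."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for point, label in zip(d1, labels):
        strata[label].append(point)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        # Keep at least one point per stratum, never more than it contains.
        n = min(len(members), max(1, round(len(members) * fraction)))
        sample.extend(rng.sample(members, n))
    return sample

# Toy d1 with cluster labels assumed to come from the original model f1'.
d1 = [(float(i), 0.0) for i in range(10)] + [(float(i), 100.0) for i in range(4)]
labels = [0] * 10 + [1] * 4

reduced_d1 = stratified_sample(d1, labels, fraction=0.5)
print(len(reduced_d1))  # 7: five points from the first cluster, two from the second
```

The reduced set can then take the place of d1 when building d3, at the cost of giving the retraining step fewer examples to work with.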
If you have any thoughts on this, please share your views and feedback.
Also published on www.itmtb.com