Are we thinking hard enough about AI Models and Business Impact?
To illustrate, let us assume we are running an e-commerce engine that uses AI on user-supplied queries to identify the user-intended category. Say the AI model in production has a latency of 20ms and an accuracy of 90%.
A natural next goal from an AI perspective would be to drive the accuracy higher, to 95% or more. Improving accuracy almost always consumes more resources, so it will likely either push the latency above 20ms or raise the cost by requiring more machines. The modeling and engineering work alone can be a year-long project.
The business goal, however, is always the same. Given a user query, the AI should drive relevance and minimize the time for product discovery. Predicting the intended category is only an intermediate step toward achieving the business goal. We may not see improved relevance in the downstream business by driving the accuracy to 95%.
To improve relevance, we could have another model called AI No. 2 that predicts the intended “price range.” It may also be a good idea to develop AI No. 3 to predict the intended “functionality,” AI No. 4 for intended “attributes,” and AI No. 5 to predict the propensity of a user to buy vs. explore. This would normally be difficult to achieve without hurting the latency or operational cost. But what if we could squeeze all five new models into the same 20ms budget without using additional machines?
Only numbers can decide!
There is no limit to the variations and combinations of AI models that can assist in driving relevance. Unfortunately, to know the impact of these models, we must develop the AIs and measure the actual relevance with some form of A/B testing for business value. Even worse, we must keep re-running these experiments to respond to frequent changes in the data distribution.
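As one illustration of letting the numbers decide, an A/B comparison of two model variants can be scored with a standard two-proportion z-test. The sketch below is a minimal example and not a description of any particular testing infrastructure; the function name `two_proportion_z_test` and the traffic and conversion counts are hypothetical.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the conversion-rate difference B - A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; rates are degenerate")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: variant A is the current model, variant B adds a new one.
z, p_value = two_proportion_z_test(conv_a=1000, n_a=20000, conv_b=1100, n_b=20000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the difference in conversion is unlikely to be noise, but the test has to be repeated whenever the data distribution shifts.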
If we only invest in improving a single AI task, we are likely not realizing AI’s full potential. We live in an AI era where potentially valuable ideas are being generated very rapidly. We must have a broad focus, but handling every new hypothesis by adding headcount is neither scalable nor sustainable.
The shorter the time to a better hypothesis, the quicker the progress.
AutoML: The Good, The Bad, The Ugly.
To validate an AI hypothesis, we need three main components:
- The dataset.
- Production-ready AI models.
- Infrastructure for A/B testing.
The datasets, and likely the testing infrastructure, are specific and domain dependent. The appeal of AutoML is in automating the development of production-ready AI models, thus driving down the time for validations. However, the current ML software technology is not efficient. AutoML trains hundreds of combinations of models with existing ML software, resulting in huge and unpredictable costs.
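To see how quickly the number of training runs grows, consider a rough sketch of an exhaustive search over a small configuration grid. Everything here is an assumption made for illustration: the search space, the hours per run, and the hourly price are hypothetical and do not describe any specific AutoML tool.

```python
import itertools

# Hypothetical search space; real AutoML tools explore different options.
search_space = {
    "model_family": ["linear", "gbdt", "mlp"],
    "learning_rate": [0.001, 0.01, 0.1],
    "depth_or_layers": [2, 4, 8],
    "feature_set": ["base", "base+text", "base+text+time"],
}


def enumerate_configs(space):
    """Return every combination of options as a list of dicts."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in itertools.product(*space.values())]


def estimated_cost(num_configs, hours_per_run, dollars_per_hour):
    """Total cost if every configuration is trained once."""
    return num_configs * hours_per_run * dollars_per_hour


configs = enumerate_configs(search_space)
total = estimated_cost(len(configs), hours_per_run=2.0, dollars_per_hour=3.0)
print(f"{len(configs)} runs, estimated ${total:.2f}")
```

Each extra option multiplies the run count, so even a modest grid reaches dozens or hundreds of training jobs.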
Most AutoML tools are also constrained by the amount and type of information they can leverage, so the resulting models are still likely sub-optimal. To illustrate, let’s imagine a “Netflix” style dataset that consists of timestamped transactions. Each transaction records that user “userid” watched “movieid” at time “timestamp” and rated it “4”. The userid and movieid fields are categorical features with millions of values. In addition, we have textual metadata for each “userid” and “movieid”. This type of dataset occurs naturally in many settings. Many problems in e-commerce, as well as other real-time predictive tasks like forecasting or estimation, can be modeled in a similar format.
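A dataset of this shape can be sketched with plain Python structures. This is only an illustration: the field names follow the description above, while the `Transaction` class, the `history_by_user` helper, the Unix-seconds timestamps, and the sample values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Transaction:
    userid: str     # categorical, potentially millions of distinct values
    movieid: str    # categorical, potentially millions of distinct values
    timestamp: int  # assumed here to be Unix seconds
    rating: int


# Textual metadata keyed by id (hypothetical sample values).
user_metadata = {"u1": "enjoys science fiction and documentaries"}
movie_metadata = {"m1": "a space exploration drama", "m2": "a nature documentary"}


def history_by_user(transactions):
    """Group transactions per userid, ordered by time."""
    grouped = defaultdict(list)
    for t in transactions:
        grouped[t.userid].append(t)
    for events in grouped.values():
        events.sort(key=lambda t: t.timestamp)
    return dict(grouped)


transactions = [
    Transaction("u1", "m2", 1_700_000_500, 5),
    Transaction("u1", "m1", 1_700_000_000, 4),
]
history = history_by_user(transactions)
print([t.movieid for t in history["u1"]])
```

Grouping by userid and ordering by timestamp exposes the per-user sequence that a personalized, time-aware model would consume alongside the text metadata.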
For personalization, the ML framework should be able to handle a large number of categories and build predictive models that are conditional on every userid. Sequential information, which is almost always available in the form of timestamps, provides critical insight for building truly personalized models that are aware of ever-evolving temporal behaviors. In addition, metadata in the form of text will likely require us to leverage some form of NLP. Ultimately, we need ML solutions that combine all of these modalities of information.
Modern deep learning has shown remarkable progress in line with neural scaling laws, which suggest that larger neural networks can model increasingly varied, multi-modal inputs. However, current ML software cannot train large neural models efficiently. Even after the costly training process is over, these models are still likely not production-ready because their inference latency or compute cost is prohibitive. It is not surprising, then, that these multi-modal capabilities are absent in current AutoML frameworks.
ThirdAI’s Efficient AutoML powered by BOLT.
Thanks to ThirdAI’s BOLT engine, we can train models with billions of parameters on any standard CPU with remarkable speed. Our sparsity-accelerated inference takes only a few milliseconds, even with very large model sizes, resulting in production-ready large deep-learning models from the time of inception. This capability makes it easy and cheap to experiment with many large neural models. BOLT provides the technological leap to unlock the next generation of AutoML software that can leverage multi-modal, temporal, personalized, and NLP data, all in one production-ready neural model for accurate prediction with push-button deployment.