Text Classification Demo

Using our BOLT engine, we demonstrate 1 ms inference latency on text classification tasks: 50 times faster and 10% more accurate than the popular RoBERTa model.
What’s more, BOLT attains this speed and accuracy with a giant 2-billion-parameter network (5x bigger than RoBERTa) trained from scratch in just 2 hours on a modest Intel CPU.


Natural Language Processing (NLP) forms the backbone of AI-driven automated response systems. An essential part of understanding natural language is identifying intent and sentiment. Humans immediately read the sentence “I love milk but it makes me lethargic” as negative: the person does not want milk. For computers, however, this is non-trivial. Most commercial recommendation, search, and service systems rely heavily on identifying these subtle textual nuances to streamline their offerings.

Deep Learning and recent advances in NLP have made it possible to understand the intent of a query through giant, carefully engineered models trained on vast corpora of data. State-of-the-art models like RoBERTa keep getting better at this task. However, they need a fleet of expensive GPUs to serve real-time REST APIs at an acceptable latency of 10 ms per query.

The ThirdAI Difference

ThirdAI delivers accurate, performant AI on commodity CPUs. Our technology is based on scientifically proven hash-based processing algorithms, which unlock game-changing accuracy by making it easy to train significantly bigger models. As a result, commodity CPUs are sufficient to capitalize on the scaling laws of neural networks.

ThirdAI’s BOLT engine leverages sparsity to train huge text classification models with billions of parameters on millions of data samples in about 2 hours on a commodity previous-generation 24-core CPU.
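To give intuition for how hash-based sparsity can make such large layers cheap, here is a minimal, illustrative SimHash-style sketch in NumPy. This is not ThirdAI’s implementation; every name, dimension, and design choice below is our own assumption, and a real system would use multiple hash tables and smarter bucketing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes: 128-d inputs, a 4096-neuron output layer, 8 hyperplanes -> 256 buckets.
d_in, d_out, n_planes = 128, 4096, 8

W = rng.standard_normal((d_out, d_in))          # dense layer weights
planes = rng.standard_normal((n_planes, d_in))  # random hyperplanes for SimHash

def simhash(v):
    # Bucket id = sign pattern of the hyperplane projections.
    bits = (planes @ v > 0).astype(int)
    return int("".join(map(str, bits)), 2)

# Offline step: bucket every neuron by hashing its weight vector.
buckets = {}
for j in range(d_out):
    buckets.setdefault(simhash(W[j]), []).append(j)

def sparse_forward(x):
    # Online step: compute only the neurons whose bucket collides with x,
    # i.e. neurons whose weight vectors point in roughly the same direction.
    active = buckets.get(simhash(x), [])
    return active, W[active] @ x

x = rng.standard_normal(d_in)
active, acts = sparse_forward(x)
print(f"computed {len(active)} of {d_out} neurons")
```

Because only colliding neurons are touched, the per-sample cost scales with the bucket size rather than the full layer width, which is the general mechanism that lets sparse training fit very large models into a CPU budget.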

| Method | RoBERTa (fine-tuned for sentiment) | ThirdAI’s BOLT |
| --- | --- | --- |
| Training time | Several days of pre-training on GPUs, plus 40 hrs of fine-tuning | 20 minutes of training from scratch on a laptop |
| Inference latency (per sample) | 46 ms | ~1 ms |
| Parameters (millions) | ~400 | 2,000 |
| Annual cost**, assuming 100 MM inferences per week | $400K | $2.3K |
| Annual cost**, assuming fine-tuning once a week | — | — |

Case Study

Sentiment classification on the Yelp review dataset

** If we anticipate 100 MM queries per week, a p3.16xlarge instance on AWS incurs an annual cost of $400K. With ThirdAI’s BOLT engine, serving the same number of queries at the same latency requires only one r6i.xlarge CPU instance, at an annual cost of just $2.3K. The best REST API offering of RoBERTa takes 46 ms per query on the same CPU; to meet the standard 10 ms latency requirement, it needs a p3.16xlarge GPU box on AWS.
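As a rough sanity check, the per-query economics implied by these figures can be computed directly. The annual totals below are the ones quoted above; nothing here is an AWS price quote.

```python
# Workload from the text: 100 MM queries per week.
queries_per_year = 100e6 * 52

# Annual instance costs quoted in the text.
roberta_annual = 400_000   # p3.16xlarge GPU box
bolt_annual = 2_300        # single r6i.xlarge CPU instance

per_mm = lambda annual: annual / (queries_per_year / 1e6)
print(f"RoBERTa: ${per_mm(roberta_annual):.2f} per million queries")
print(f"BOLT:    ${per_mm(bolt_annual):.3f} per million queries")
print(f"Cost ratio: {roberta_annual / bolt_annual:.0f}x")
```

At roughly 174x, this is consistent with the order-of-magnitude (100x) cost reduction claimed elsewhere in this post.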
The advantages of scaling laws are evident in this case study on sentiment classification. Training a 5x larger model (with a large batch size) yields significant accuracy gains. Model size is never a concern with BOLT: commodity CPUs have substantial memory, and with BOLT’s sparsity the larger model still trains in only 2 hours!

There is no reason not to train your own accurate model with a mere 2 hours of CPU time. But you may not have enough data to train on. We have a solution for that too: our out-of-the-box text classification model provides easy-to-use APIs that predict the sentiment of an input sentence in about 1 ms on a 24-core CPU!

How does it work?

A simple API call:

text = "money is very good only for greedy people"   # renamed from str to avoid shadowing the built-in
prediction = Boltmodel.predict(text)

print(prediction.sentiment)      # Output: Negative Sentiment
print(prediction.confidence)     # Output: Confidence = 84.2%
print(prediction.explanation)    # Output: Most responsible keywords: "greedy", "only"
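To reproduce the ~1 ms number in your own environment, a timing harness like the sketch below works for any predictor. `measure_latency_ms` and the stand-in lambda are our own illustrative code, not part of the BOLT API; substitute `Boltmodel.predict` for the lambda when measuring the real model.

```python
import time

def measure_latency_ms(predict_fn, inputs, warmup=10):
    # Warm up caches and any lazy initialization before timing.
    for s in inputs[:warmup]:
        predict_fn(s)
    start = time.perf_counter()
    for s in inputs:
        predict_fn(s)
    return (time.perf_counter() - start) * 1000 / len(inputs)

# Stand-in predictor for illustration; replace with Boltmodel.predict.
samples = ["I love milk but it makes me lethargic"] * 1000
latency = measure_latency_ms(lambda s: s.lower(), samples)
print(f"{latency:.4f} ms per sample")
```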

Unlock the complete power of AI with ThirdAI’s BOLT


Tier 1

Core Deep Learning Recommendation Engine

Reduce cost

State-of-the-art AI/NLP accuracy

< 1ms inference latency on CPUs

Privacy compliant


Tier 2

Sequential Modeling and Personalization

Captures temporal patterns of user behavior and choice of products

Personalized search and recommendation based on contextual information


Tier 3

Graph Neural Network and Explainability

Ingest relationships between users and products

Identify the most relevant features


Tier 4

Continual Learning

Model accuracy improves over time with usage

Automatically identify and rectify mistakes

ThirdAI’s BOLT unlocks the power of AI and NLP (Natural Language Processing) in any product search engine. Our push-button solution integrates with existing search engines without any effort from engineers or data scientists. BOLT consumes historical data of past customer interactions and automatically builds and deploys an AI model with the highest observed accuracy, at 100x lower AI cost.

BOLT gives you a complete handle on training, retraining, and deployment in any environment or infrastructure of your choice. Our scalable AI can be easily extended to provide personalization. In addition, BOLT can leverage and mine known relationships between entities in the form of a graph. Furthermore, BOLT provides a continual-learning solution: the model evolves automatically with more usage.

In addition to autotuning all the hyperparameters, BOLT also autotunes for the available resources and latency budget. There is therefore no need to spend on expensive data scientists and ML engineers. Overall, BOLT takes care of the complete AI cycle on any infrastructure.