APRIL 6, 2022
Up to 50x faster training and inference with large NLP models on CPU: Text Classification, Question Answering, and more.
Learn how to unlock 50x or greater acceleration on both training and inference with large NLP and recommendation models on commodity hardware. Leverage the power of sparsity to train billion-parameter models in a few hours on the cheapest and most widely available hardware.
During the session, our engineering team will take you through the following solutions:
Product Recommendation: Learn how changing just two lines of code lets you train your own neural network on a CPU significantly faster than on a state-of-the-art GPU. Our BOLT engine can accelerate a wide range of networks, from a model with a few million parameters trained on MNIST to billion-parameter product recommendation models.
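The teaser above credits sparsity for the speedup. One published technique for exploiting sparsity on CPUs is locality-sensitive hashing to select a small set of active neurons per input (the approach popularized by the SLIDE line of work, which this kind of engine builds on), so most of a wide layer is never computed at all. The sketch below is our own minimal single-table SimHash illustration in NumPy; it is not BOLT's implementation or API, and every name in it is a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)

# A wide output layer where the dense matmul dominates training cost.
n_in, n_out = 512, 100_000
W = rng.standard_normal((n_out, n_in)).astype(np.float32)

# SimHash: K random hyperplanes give each vector a K-bit signature;
# vectors with a large inner product tend to land in the same bucket.
K = 16
planes = rng.standard_normal((K, n_in)).astype(np.float32)
powers = (1 << np.arange(K)).astype(np.int64)

def signatures(M):
    """K-bit SimHash signature for each row of M."""
    return ((M @ planes.T) > 0).astype(np.int64) @ powers

# Hash every neuron's weight row into a bucket once, up front.
buckets = {}
for j, sig in enumerate(signatures(W)):
    buckets.setdefault(int(sig), []).append(j)

def sparse_forward(x):
    """Evaluate only the neurons whose weights hash like the input;
    the rest of the layer is skipped entirely."""
    active = buckets.get(int(signatures(x[None, :])[0]), [])
    out = np.zeros(n_out, dtype=np.float32)
    if active:
        out[active] = W[active] @ x
    return out, active

x = rng.standard_normal(n_in).astype(np.float32)
_, active = sparse_forward(x)
print(f"evaluated {len(active)} of {n_out} neurons")
```

With 16 signature bits, a query typically collides with only a tiny fraction of the 100,000 neurons, which is where the savings come from; systems like SLIDE add multiple hash tables and periodic rehashing during training to keep accuracy up.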
Text Classification: Using our BOLT engine, we will showcase 1 ms inference latency on text classification tasks: 50x faster and 10% more accurate than the popular RoBERTa model. What’s more, BOLT attains this speed and accuracy with a giant 2-billion-parameter network (5x bigger than RoBERTa) that was trained from scratch in just 2 hours on a modest Intel CPU.
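Single-query latency figures like the 1 ms number above are straightforward to reproduce once you have a model handle. A generic timing harness is all that is needed; `predict` below is a hypothetical stand-in, since the announcement does not show BOLT's inference API.

```python
import time
import statistics

def measure_latency_ms(predict, inputs, warmup=100, runs=1000):
    """Median single-query latency in milliseconds.

    `predict` is whatever callable serves one classification request;
    this harness is deliberately framework-agnostic."""
    for text in inputs[:warmup]:  # warm caches before timing
        predict(text)
    times = []
    for i in range(runs):
        t0 = time.perf_counter()
        predict(inputs[i % len(inputs)])
        times.append((time.perf_counter() - t0) * 1000.0)
    return statistics.median(times)

# Usage with a hypothetical classifier object:
# latency = measure_latency_ms(model.predict, test_sentences)
# print(f"median latency: {latency:.2f} ms")
```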
Document Search: Finally, we will demonstrate state-of-the-art retrieval accuracy with sub-100 ms latency for document search on a modest CPU, 25x faster than ColBERT inference on the same hardware.