Using our BOLT engine, we demonstrate 1 ms inference latency on text classification tasks: 50 times faster and 10% more accurate than the popular RoBERTa model.
What’s more, BOLT attains this speed and accuracy with a giant 2-billion-parameter network (5x bigger than RoBERTa) that was trained from scratch in just two hours on a modest Intel CPU.