We are pleased to announce the BOLT Engine, which brings large-scale deep learning training to any CPU. BOLT is an algorithmic accelerator for training deep learning models that can match or even surpass GPU-level performance on commodity CPU hardware. BOLT is the first engine to deliver exponential reductions in computation. The BOLT algorithm trains neural networks using 1% or less of the FLOPs of standard training, whereas common tricks such as quantization, pruning, and structured sparsity offer only modest constant-factor improvements. As a result, BOLT does not rely on any specialized instructions, and the speedups carry over naturally to any CPU, be it Intel, AMD, or ARM. Even older commodity CPUs can train billion-parameter models faster than A100 GPUs. To top it all off, BOLT can be invoked with just a few lines of changes to existing Python machine learning pipelines. To learn more about our technology, click here.