Fully connected networks are ubiquitous in recommendation systems and NLP tasks. State-of-the-art systems like Facebook’s DLRM, Microsoft’s DSSM, and Amazon’s DSSTNE are all built on giant fully connected neural networks.
To benchmark BOLT, we trained two networks: one with 200 million parameters and the other with 1.6 billion. The aforementioned industry-scale recommendation models have 200 to 500 million parameters; notably, even BERT-Large is in the same ballpark, with 340 million trainable parameters. As a task representative of several industry-scale applications, we chose the Amazon 670K Kaggle dataset. Of our two BOLT networks, one used a 256-dimensional hidden layer (the 200-million-parameter model) and the other a 2,000-dimensional hidden layer (the 1.6-billion-parameter model).
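To make the architecture and parameter counts concrete, here is a minimal sketch of the two networks as plain dense models. This is an illustration only, not BOLT's API or training code; it assumes the published Amazon-670K dataset statistics of roughly 135,909 input features and 670,091 output labels, and uses PyTorch purely for readability.

```python
# Illustrative sketch: dense equivalents of the two benchmark networks.
# Input/output dimensions are the public Amazon-670K dataset statistics
# (an assumption for this sketch, not taken from the BOLT benchmark code).
import torch.nn as nn

def make_mlp(input_dim=135_909, hidden_dim=256, num_labels=670_091):
    """One-hidden-layer fully connected network for extreme classification."""
    return nn.Sequential(
        nn.Linear(input_dim, hidden_dim),   # input features -> hidden layer
        nn.ReLU(),
        nn.Linear(hidden_dim, num_labels),  # hidden layer -> 670K label scores
    )

small = make_mlp(hidden_dim=256)    # ~207M parameters (the 200M model)
large = make_mlp(hidden_dim=2_000)  # ~1.61B parameters (the 1.6B model)

for name, model in [("256-dim hidden", small), ("2000-dim hidden", large)]:
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.0f}M parameters")
```

With these dimensions, the hidden-to-output weight matrix dominates the parameter count, which is why widening the single hidden layer from 256 to 2,000 units scales the model from roughly 200 million to 1.6 billion parameters.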