
Improving the Productivity of Data Scientists with Universal Deep Transformers (UDT)

Understanding the fundamental prototype-to-production gap inherent in AI development, and how neural scaling laws, combined with algorithmic advances in sparsity-based training and inference, open the door to a very optimistic future.
A recent VentureBeat article suggests that 87% of data science projects never make it into production or deployment. One of the primary reasons for this grim outcome is the lack of collaboration among the different parties involved in data science and engineering. However, there is a bigger, more fundamental problem: the lack of an ML software ecosystem focused on productivity and software maintenance. Existing ML ecosystems focus only on rapid prototyping and ease of hypothesis validation. Unsurprisingly, we get what we build (or optimize) the software ecosystem for: a zoo of great prototypes sitting in scripts, never seeing the light of production.
Prototyping challenges with a rapidly evolving Machine Learning ecosystem: Most data science exploration starts with the aim of building improved prototypes. Data scientists are always on their toes, reaching for the latest and greatest ideas to solve complex problems and improve the accuracy of their AI/ML pipelines. The pace at which machine learning advances requires them to experiment constantly with new tools and ideas. The multitude of open-source packages, almost all designed for fast prototyping, has made this process significantly faster.
Liberal Open Source Use and Code Duplication: Many libraries are available for any given task in the exploding open-source ecosystem. For example, a simple clustering or classification subroutine can be solved equally well, in terms of accuracy, by hundreds of different packages. However, each of these packages likely has a drastically different implementation, interface design, and system performance.
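As a small illustration (the library choices here are ours, picked only to make the point concrete), the same k-means clustering can be written against two popular packages; the accuracy is comparable, but the interfaces, data contracts, and performance characteristics are not:

```python
import numpy as np
from sklearn.cluster import KMeans    # one of many equivalent choices
from scipy.cluster.vq import kmeans2  # a second, interface-incompatible one

X = np.random.default_rng(0).standard_normal((1000, 8))

# scikit-learn: object-oriented estimator API
labels_sklearn = KMeans(n_clusters=3, n_init=10).fit_predict(X)

# SciPy: functional API returning (centroids, labels)
centroids, labels_scipy = kmeans2(X, 3, minit="++")
```

Two teams shipping these two prototypes now own two incompatible code paths for the same functionality, which is precisely the duplication problem described next.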
Even when two ML prototypes look nearly identical in structure, they are entirely different artifacts once engineered for production. In software development jargon, the consequence is a well-known issue called code duplication: the same functionality independently developed in two places. Robert C. Martin explains in his book Clean Code:

“Duplication may be the root of all evil in software.”

The need to maintain a specialized team for a given AI pipeline: It is easy to see that, with the number of choices in the open-source world, taking any promising AI prototype to production requires a dedicated engineering team. Some prototypes may use decision trees instead of neural networks for classification. Some may use a two-tower model from GitHub that was never designed for parallel or distributed environments. The list goes on. Engineering work that optimizes pipelines for one data science team won't help any other group, and collaboration thus goes out the window. We cannot optimize every computation, and a poor library or data-processing choice can make a prototype infeasible under production constraints. This Wayfair blog highlights how strict latency requirements in e-commerce pipelines rule out most standard ML libraries. Many prototypes, therefore, never see the light of production.
Neural scaling laws are the game changer: Fortunately, the ML world is moving toward the unification of different types of models and pipelines. Neural scaling laws informally state that large neural networks are the ideal architecture no matter the task: the size of the network matters more than its specific architecture. More and more recent results validate these claims, including the success of models like GPT-3 across a variety of tasks. As a result, a universal pipeline for most AI problems involves a large neural network trained on enough supervised data. If there is not enough supervised data, self-supervised pre-training followed by fine-tuning on a few examples seems ideal. Both pre-training and fine-tuning are essentially the same supervised training process, just on auxiliary tasks, and most feature-extraction pipelines are pre-trained large neural networks. This is excellent news from an engineering perspective because it implies there is one common infrastructure we should optimize for production: training and inference with large neural networks. With this one optimized infrastructure, we should be able to nail most ML pipelines of the future.
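To make the "pre-train once, fine-tune per task" pattern concrete, here is a minimal, generic sketch using the Hugging Face transformers library (our illustration, not part of UDT; the model and task choices are arbitrary):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Reuse a self-supervised pre-trained body; attach a fresh 2-class head.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

inputs = tokenizer("This prototype ships to production.", return_tensors="pt")
outputs = model(**inputs)  # fine-tuning optimizes this same forward pass
print(outputs.logits.shape)  # torch.Size([1, 2])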

Prohibitive infrastructure/energy cost for training large neural networks: Building AI infrastructure around training and deploying large neural networks is too expensive for many industries. The software available in the open-source community requires an extensive fleet of costly hardware, such as GPUs and TPUs, to tame these massive neural models. In addition, integrating this hardware into an existing software stack is a nightmare that requires specialized and costly engineers. GPUs, TPUs, and other hardware accelerators for deep neural networks also have significantly higher carbon footprints than their commodity counterparts, CPUs. Unfortunately, few industries can afford the investment to build such infrastructure.

Technological Leap: Beyond Tensors, Hash-based Deep Learning on CPUs: The good news is that algorithmic breakthroughs have shown remarkable progress. Recently, hash-based sparsity-inducing algorithms running on CPUs have been shown to train large neural models orders of magnitude faster than state-of-the-art GPUs. These advances will likely change the economics of AI infrastructure for training large neural models.
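The core idea is easy to sketch. The following toy example (our own simplification, not ThirdAI's implementation) uses SimHash-style signed random projections to retrieve only the handful of neurons most likely to activate, so a forward pass touches a tiny fraction of a wide layer:

```python
import numpy as np

rng = np.random.default_rng(0)
n_neurons, dim, n_bits = 10_000, 256, 8

W = rng.standard_normal((n_neurons, dim))  # weights of one wide layer
R = rng.standard_normal((dim, n_bits))     # shared random projections

def signature(v: np.ndarray) -> bytes:
    # SimHash: sign pattern of random projections; similar vectors
    # collide with high probability.
    return ((v @ R) > 0).tobytes()

# Index every neuron by the signature of its weight vector (done once).
buckets: dict[bytes, list[int]] = {}
for i in range(n_neurons):
    buckets.setdefault(signature(W[i]), []).append(i)

x = rng.standard_normal(dim)            # incoming activation
active = buckets.get(signature(x), [])  # candidate high-activation neurons
out = W[active] @ x                     # compute only those dot products
print(f"evaluated {len(active)} of {n_neurons} neurons")
```

Real systems in this line of work (e.g., SLIDE) maintain several hash tables to boost recall and rebuild them periodically during training, but the payoff is the same: per-example work scales with the number of retrieved neurons rather than with layer width, which is what makes commodity CPUs competitive.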
Introducing UDT (Universal Deep Transformers): a CPU-only software solution for all your AI needs. ThirdAI Corp., a pioneering startup built by the inventors of hash-based algorithms for deep learning, has developed a unique software engine, BOLT (Big Ol’ Layer Training), in which training neural networks is orders of magnitude more efficient and economical at scale. The engine can accelerate AI training and inference by several orders of magnitude on any commodity CPU, be it AMD, Intel, or ARM. Powered by the BOLT engine, ThirdAI offers a unified AI infrastructure for modern AI pipelines: the Universal Deep Transformer (UDT) library. UDT is optimized for large neural networks and supports a variety of related tasks, including supervised training, unsupervised pretraining, self-supervised representation learning, transfer learning, multi-task learning, generation, and many more. The system also offers optimized distribution of computations over a cluster of CPU machines.
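For a flavor of the interface, here is a minimal training sketch modeled on ThirdAI's published quick-start examples; treat the exact names (data_types, target, n_target_classes, the CSV layout) as assumptions that may differ across versions:

```python
from thirdai import bolt

# Declare the columns of a CSV dataset and the prediction target;
# UDT selects and tunes the underlying large network itself.
model = bolt.UniversalDeepTransformer(
    data_types={
        "title": bolt.types.text(),
        "category": bolt.types.categorical(),
    },
    target="category",
    n_target_classes=100,
)

model.train("train.csv", epochs=5, learning_rate=0.001)  # CPU-only training
scores = model.predict({"title": "wireless noise-cancelling headphones"})
model.save("model.bolt")  # single artifact for deployment (see below)
```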
UDT is designed with production deployment as the priority. From raw feature processing to final model prediction, everything built in UDT is production-ready from the inception of the pipeline. Unlike other machine learning libraries that may require additional compilation steps or force developers to write their own serialization logic, UDT handles all of these challenges within a single save operation and allows push-button deployment across platforms, clouds, or on-premises environments.
Furthermore, since UDT can perform model inference, even for billion-parameter networks, in as little as 1–2 milliseconds, customers do not need to add model compression steps such as knowledge distillation, quantization, or pruning to their machine learning pipelines. Effectively, the arduous journey of taking a model into production is simplified with ThirdAI. UDT has been tested within several popular cloud ML model-serving frameworks, validating both the robustness of our model serialization design and the inference-time performance of our models. In particular, UDT can be invoked directly from popular industry-standard platforms, including Databricks, Google Cloud Vertex AI, Microsoft Azure Machine Learning, and Amazon SageMaker.
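Continuing the training sketch above (with the same caveat that the load/predict names are assumptions based on published examples), deployment reduces to loading the one saved artifact inside whatever serving framework is at hand:

```python
from thirdai import bolt

# The single artifact written by model.save() carries featurization and
# weights together, so the handler needs no separate preprocessing code.
model = bolt.UniversalDeepTransformer.load("model.bolt")

def handle(request: dict) -> list[float]:
    # request example: {"title": "wireless noise-cancelling headphones"}
    # Single-sample CPU inference is the 1-2 ms path described above.
    return model.predict(request).tolist()
```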

Use UDT to channel all data science efforts toward business objectives and validation: UDT provides unifying push-button AutoML support for a variety of functionalities involving large neural networks, including classification, regression, recommendation, forecasting, generation, multi-modal and multi-task learning, pretraining, fine-tuning, and many more. As a result, data scientists can focus on the essential issues: selecting the task that aligns with business objectives, improving the quality of the dataset for supervised or self-supervised learning, and driving accuracy toward the company's goals. The production readiness of their code is no longer a concern if the pipeline is built with UDT.

Machine learning has matured from a scientific curiosity into an essential lever for driving business outcomes. We therefore need an ML software stack with production readiness at its center, so that valuable data science innovations turn immediately into value for customers. With UDT, we have built such an automated, ready-to-deploy toolkit and have already seen the difference it makes for multiple organizations.