
ThirdAI’s Private and Personalizable Neural Database: Enhancing Retrieval-Augmented Generation (Part 3/3)

This is the final chapter (3/3) in our series on building an AI agent using retrieval-augmented generation. In part 1/3, we discussed the limitations of a disconnected embedding- and vector-based retrieval pipeline. In part 2/3, we introduced Neural Databases, which eliminate the need to store and manipulate heavy, expensive embeddings, replacing them with a single, unified, end-to-end learnable retrieval system. We argued that embedding representations are 3–25 times larger than the text data itself, whereas a neural database needs only a network with a few billion parameters and simple integer hash tables (less than 20GB of overhead), even for hundreds of gigabytes to terabytes of text, resulting in a significant reduction in memory usage.
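The 3–25x overhead claim can be sanity-checked with back-of-envelope arithmetic. The numbers below (1 KB text chunks, 768-dimensional float32 embeddings) are illustrative assumptions of ours, not figures from this series:

```python
# Back-of-envelope comparison of raw text size vs. dense embedding storage.
# Assumed (illustrative) numbers: 1 KB text chunks, 768-dim float32 vectors.
CHUNK_BYTES = 1024   # ~1 KB of raw text per indexed chunk
EMBED_DIM = 768      # a common sentence-embedding dimensionality
FLOAT_BYTES = 4      # float32

embedding_bytes = EMBED_DIM * FLOAT_BYTES   # bytes of embedding per chunk
overhead = embedding_bytes / CHUNK_BYTES    # embedding size relative to text

print(f"{embedding_bytes} bytes per chunk -> {overhead:.1f}x the raw text")
```

With higher-dimensional embeddings, multiple vectors per chunk, or index replicas, the multiple grows quickly toward the upper end of the quoted range.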

We concluded part 2/3 by highlighting ThirdAI’s scientifically proven “dynamic sparsity” as the crucial capability for building and deploying, on CPUs, the LLMs that a Neural Database requires. For Neural Databases to be widely applicable, commodity infrastructure with simple CPUs should suffice for both their training and their deployment.

Why ThirdAI?: Two Major Breakthroughs Making NeuralDB Commercially Viable on Commodity CPUs

The figure below illustrates the components of ThirdAI’s NeuralDB system. NeuralDB is a recent concept, and its implementations are specialized and rare, found mainly inside a few organizations such as Meta. Making NeuralDB commercially available requires a rare combination of expertise: crafting neural networks and integrating them with highly parallelized, hash-table-based retrieval systems. It takes years of experience in making design choices and automating internal processes to make the technology widely accessible.
The ThirdAI team has been at the forefront of these ideas. Our founders and team members pioneered some of the earliest work on end-to-end, efficient learned retrieval systems. The key papers (NIPS 2014 Best Paper, NeurIPS 2019, ICLR 2021, KDD 2022) are cited at the end.
Our NeuralDB requires a large language model (LLM) that maps text into a large space of discrete buckets. The number of buckets can easily reach several million and beyond, whereas GPT models typically deal with an output space of only about 50k tokens. Training, fine-tuning, and running inference with such a large LLM on CPUs would be impossible without ThirdAI’s “dynamic sparse” BOLT engine. This unique software stack, pioneered by ThirdAI, is an integral part of our approach.
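To make the bucket idea concrete, here is a minimal, self-contained sketch of retrieval through a large discrete bucket space: documents are hashed into integer buckets, and queries are answered by looking up matching buckets in an inverted index. This toy (token hashing via `hashlib`, a plain `dict` index) illustrates only the concept, not ThirdAI’s learned model or the BOLT engine:

```python
# Toy sketch: map text into a huge discrete bucket space and retrieve via
# an integer hash table (inverted index). Illustration only, not NeuralDB.
import hashlib
from collections import defaultdict

NUM_BUCKETS = 2_000_000  # bucket spaces can reach the millions

def buckets(text: str) -> set[int]:
    """Map each token of `text` to a bucket id in the discrete space."""
    return {
        int(hashlib.md5(tok.encode()).hexdigest(), 16) % NUM_BUCKETS
        for tok in text.lower().split()
    }

# Inverted index: bucket id -> set of document ids.
index: dict[int, set[int]] = defaultdict(set)
docs = ["neural databases avoid heavy embeddings",
        "vector stores keep dense embeddings"]
for doc_id, doc in enumerate(docs):
    for b in buckets(doc):
        index[b].add(doc_id)

def retrieve(query: str) -> list[int]:
    """Rank documents by the number of buckets shared with the query."""
    scores: dict[int, int] = defaultdict(int)
    for b in buckets(query):
        for doc_id in index[b]:
            scores[doc_id] += 1
    return sorted(scores, key=scores.get, reverse=True)

print(retrieve("heavy embeddings"))  # doc 0 shares more buckets than doc 1
```

In NeuralDB the mapping from text to buckets is learned end-to-end rather than fixed hashing, which is what self-supervised and supervised training refine.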
The ability to run every NeuralDB operation entirely on CPUs is crucial for adoption, particularly for applications like PocketLLM, which is built on NeuralDB. It puts the most advanced neural search systems on ordinary laptops and desktops, serving no-code users with limited computing resources.
Before we delve into ThirdAI’s NeuralDB APIs and their seamless integration with LangChain and ChatGPT, we summarize the differences and advantages of a Neural Database over the existing ecosystem in the table above.

ThirdAI’s Lightweight NeuralDB Python APIs for Any Environment (On-premise or On-Cloud)

We are excited to introduce our NeuralDB APIs, a CPU-only “semantic retrieval” ecosystem. Our NeuralDBs offer advanced semantic search and fine-tuning capabilities through simple, auto-tuned APIs for an easy user experience. These capabilities are also accessible on laptops and desktops (Windows and Mac) through the no-code UI of the PocketLLM app.
  1. Automatic Self-Supervised Pre-Training on the Inserted Text: Insert any raw text into the NeuralDB with a flag requesting additional fine-tuning on the new data. The flag kicks off a pre-training process that lets the NeuralDB specialize in the co-occurrence patterns of the inserted text. This process adapts to a variety of inputs such as logs, code, or even multilingual data. Unlike fixed, pre-trained embedding models, self-supervised pre-training lets NeuralDB specialize to a domain, providing a significant upgrade in end-to-end retrieval.
  2. Supervised Training of NeuralDB: In addition to self-supervised pre-training, NeuralDB can also be trained in a supervised manner. You can leverage text-to-text mappings (weak or strong) to specify textual information that should be close to each other, similar to contrastive training of embedding models. Furthermore, any supervised mapping from text to a known category, such as product search engines mapping user queries to products, can be utilized.
  3. Real-Time Reinforcement Learning with Human Feedback: NeuralDB can be further refined in real time using human feedback. The APIs support two forms of feedback. First, preference information: users give a thumbs-up or upvote to the best of several retrieved options. Second, the model can be guided to associate two different text strings in an online fashion, similar to supervised training. For example, you can align NeuralDB with oil-industry jargon in which “WOW” means “Wait On Weather.”
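The two feedback modes above can be sketched with a toy retriever. The class below, its method names (`insert`, `search`, `upvote`, `associate`), and its scoring logic are hypothetical illustrations of ours, not the real NeuralDB API:

```python
# Toy sketch of the two feedback modes: (1) upvoting a retrieved result and
# (2) associating two text strings (e.g., jargon). Illustration only; the
# real NeuralDB APIs and learning mechanisms differ.
from collections import defaultdict

class ToyRetriever:
    def __init__(self):
        self.docs: list[str] = []
        self.boosts = defaultdict(float)    # (query, doc_id) -> learned boost
        self.synonyms: dict[str, str] = {}  # source phrase -> target phrase

    def insert(self, text: str) -> int:
        self.docs.append(text.lower())
        return len(self.docs) - 1

    def _expand(self, query: str) -> str:
        q = query.lower()
        for src, tgt in self.synonyms.items():
            q = q.replace(src, tgt)
        return q

    def search(self, query: str) -> int:
        q = self._expand(query)
        def score(i: int) -> float:
            overlap = len(set(q.split()) & set(self.docs[i].split()))
            return overlap + self.boosts[(q, i)]
        return max(range(len(self.docs)), key=score)

    def upvote(self, query: str, doc_id: int) -> None:
        """Preference feedback: prefer doc_id for this query next time."""
        self.boosts[(self._expand(query), doc_id)] += 1.0

    def associate(self, source: str, target: str) -> None:
        """Teach the retriever that `source` should mean `target`."""
        self.synonyms[source.lower()] = target.lower()

db = ToyRetriever()
db.insert("the crew had to wait on weather before drilling")
db.insert("quarterly revenue report for the oil division")
db.associate("wow", "wait on weather")
print(db.search("WOW delays"))  # resolves to the wait-on-weather doc
```

Here `associate` handles the “WOW” jargon case, while repeated `upvote` calls shift future rankings for a query, mirroring the preference-feedback loop described above.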
NeuralDB API functionalities offer precise control and personalization of the retrieval ecosystem. You no longer need to depend solely on the open-source community or existing LLM service providers to improve AI models for your specific needs. With NeuralDB, you can take charge and provide the vision and refinements that are optimal for your business requirements. This is the true democratization of AI for everyone.

The AI community has recognized a key lesson from the success of ChatGPT: even the most advanced AI systems require constant human expert feedback. Our NeuralDB is designed with this in mind. Achieving a high-quality AI model is a continuous process that involves ongoing training, fine-tuning, and reinforcement learning.

NeuralDB: A Much-Needed Reduction in the AI Software Stack

The LLM (Large Language Model) stack has become increasingly complex with multiple layers and components, surpassing the complexity of traditional AI stacks. Developers are realizing that each component adds more friction, uncertainties, points of failure, costs, and latency. The heavy GPU infrastructure required for embedding models forces developers to build an inefficient ecosystem with constant data movements between CPUs and GPUs. In short, the more components and data movements involved, the harder it becomes to manage and debug the process.
At ThirdAI, our unique technology allows us to significantly simplify the LLM stack by eliminating the generation and management of intermediate embedding representations. By co-locating with the data and eliminating back-and-forth data movement between CPUs and GPUs, we achieve a simplified stack that prioritizes privacy, stability, and reliability.

Resources, Notebooks, and PubMed Q & A NeuralDB

All our APIs are summarized in this straightforward Python notebook. To use them, you can apply for a free ThirdAI license here. These notebooks run efficiently on laptops, processing thousands of pages in just a few minutes. As an example, we offer a completely free NeuralDB pre-trained on a dataset of 800k PubMed abstracts; it was trained on a single CPU in a few hours. You can download the model and use it directly for question answering with the provided script.

References

  1. BLISS: A Billion-Scale Index Using Iterative Re-partitioning. Gaurav Gupta, Tharun Medini, Anshumali Shrivastava, and Alex Smola. ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD), 2022.
  2. SOLAR: Sparse Orthogonal Learned and Random Embeddings. Tharun Medini, Beidi Chen, and Anshumali Shrivastava. International Conference on Learning Representations (ICLR), 2021.
  3. Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products. Tharun Medini, Qixuan Huang, Yiqiu Wang, Vijai Mohan, and Anshumali Shrivastava. Neural Information Processing Systems (NeurIPS), 2019.
  4. Asymmetric LSH (ALSH) for Sublinear Time Maximum Inner Product Search (MIPS). Anshumali Shrivastava and Ping Li. Neural Information Processing Systems (NIPS), 2014. Best Paper Award.