Generative AI’s Dark Horse: Query Reformulation

The future of e-commerce product search looks promising with language models, but privacy concerns and the high latencies of these models currently prohibit widespread adoption. We need smarter algorithms to change the fundamental economics of large-scale AI.
The remarkable capabilities of ChatGPT and Generative AI have triggered a flurry of discussions around their potential use cases. It is not inconceivable that the future of software development could involve dedicated industries training and fine-tuning such mega-AI models across various verticals. Despite the long list of applications that this new technology could disrupt, all of these large language models still fall short on a lesser-known but widely used application that we interact with daily: query reformulation.
In this post, we introduce the query reformulation problem and discuss why it is an essential component of modern product search engines. We then proceed to describe why massive generative models, such as GPT-3, remain inadequate for reformulation in practical scenarios. Finally, we introduce ThirdAI’s new query reformulation features within our Universal Deep Transformers toolkit that enable organizations of all sizes to benefit from text rewriting at ultra-low latencies on ordinary CPU devices.

What is Query Reformulation?

Search engines routinely transform (or reformulate) a user-typed query into a different text that better aligns the user’s intent with the retrieval system, saving time and improving the overall search experience. A simple example of query reformulation is spell correction in user-issued queries, but spell correction covers only a small fraction of the cases.
As a practical illustration, if you type “twin bee” on Amazon, you will notice all of the results show “twin beds.” The search system is likely reformulating “twin bee” as “twin bed.” Note that this reformulation goes beyond simple spell correction since “twin bee” is a perfectly valid query in English. Read Amazon’s paper, co-authored by Vihan and myself, about how randomized algorithms solve the problem and meet tight latency constraints.
Domain-Specific Fine-tuning: Query reformulation is also very sensitive to the application domain. Type the same “twin bee” query on Google, and you will notice all of the results refer to the Wikipedia page of a popular arcade shooting game, Twinbee. Clearly, for Amazon, the reformulation of “twin bee” should be “twin bed” while, for Google, it should be “twinbee.” In addition to domain-specificity, query reformulation should also be personalized. In the USA, a query “dust bin with lid” should ideally transform to “trash can with lid” to leverage prior cached search data, whereas in the UK the same query is already informative and needs no rewriting.

Need for Query Reformulation: Eliminate Data Sparsity and Drive Relevance

Search engines generate a lot of supervised information associated with “frequent enough” queries. A significant amount of information in the form of purchases, cart additions, and other observed metrics is associated with frequent queries. For rare query texts, we will not have enough information. Since most search engines improve relevance using history and machine learning, it is likely that, in the United States, “trash can with lid” will generate more relevant results than queries with the same intent but less common phrasings such as “dustbin with lid” or “wastebasket with lid.”
A good query reformulation should resolve all of the above texts back to “trash can with lid.” This rewriting will unlock all associated behavioral information and history, even for seemingly rare texts, and drive improved relevance and customer experience.
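To make the data-sparsity argument concrete, here is a minimal sketch of how rewriting rare phrasings to one canonical query pools their behavioral signal. The lookup table, the `reformulate` helper, and the click counts are all made up for illustration; a production system would use a learned model rather than a hand-written dictionary.

```python
from collections import Counter

# Hypothetical rewrite table. In practice this mapping would come from a
# learned reformulation model, not a hand-written dictionary.
REWRITES = {
    "dustbin with lid": "trash can with lid",
    "wastebasket with lid": "trash can with lid",
}


def reformulate(query: str) -> str:
    """Normalize case and whitespace, then map rare phrasings to the canonical query."""
    normalized = " ".join(query.lower().split())
    return REWRITES.get(normalized, normalized)


# Made-up click log: (query as typed by the user, recorded clicks).
click_log = [
    ("trash can with lid", 950),
    ("dustbin with lid", 30),
    ("Wastebasket with lid", 20),
]

raw_clicks = Counter()        # signal keyed by the raw query text
canonical_clicks = Counter()  # signal keyed by the reformulated query

for query, clicks in click_log:
    raw_clicks[query] += clicks
    canonical_clicks[reformulate(query)] += clicks

print("per raw query:      ", dict(raw_clicks))
print("per canonical query:", dict(canonical_clicks))
```

Keyed by raw text, the two rare phrasings carry only a handful of clicks each; keyed by the canonical query, all three share one pooled history.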
Real Impact of Query Reformulation: Table 2 from our recent paper with Amazon shows the gains, as measured by A/B testing, on key e-commerce metrics. A lexical query reformulation system improves the click-through rate by 7.26%, resulting in a statistically significant boost in revenue and purchases compared to the existing system, which already had a good query reformulation in place.

The message is loud and clear: if you don’t have a query reformulation solution, you absolutely need one. Moreover, if you have one, it is worth exploring a better one.

GPT to reformulate a query? The LATENCY and the PRIVACY barrier:

Query reformulation falls under the standard category of text generation. Large language models seem like a slam-dunk technology to solve this problem. However, existing popular solutions for text generation are prohibitively slow and expensive. E-commerce systems operate in real time and work under a tight latency budget, with a typical end-to-end latency limit of less than 100 milliseconds. A well-known study shows that any system taking more than 100 ms is likely to lose sales because of poor user experience.
With all the overheads of the search system, any query reformulation subroutine that takes more than 10 to 20 milliseconds is typically prohibitive. Most popular language models for text generation, even with the best hardware, need at least 40 milliseconds to generate a single token. Assuming we generate, on average, 5–6 tokens one after another, we are looking at a minimum of 200 milliseconds just for the query reformulation, before any system overheads. An important point to note is that batching queries and parallelism do not help: they raise throughput but do not shorten the time any individual query waits. Even if we serve a million queries in parallel and finish all of them in 1 second, each query still sees a response time of 1 second, which is unacceptable in all production search systems.
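The arithmetic behind these numbers can be written out in a few lines. This is only a back-of-the-envelope sketch using the figures quoted above (40 ms per token, 5 to 6 tokens, a 20 ms reformulation budget); the function names are illustrative.

```python
PER_TOKEN_MS = 40.0             # lower bound per generated token, from the text
REFORMULATION_BUDGET_MS = 20.0  # upper end of the 10 to 20 ms budget, from the text


def generation_latency_ms(num_tokens: int, per_token_ms: float) -> float:
    """Sequential decoding: each token has to wait for the previous one."""
    return num_tokens * per_token_ms


def batch_stats(num_queries: int, batch_wall_time_s: float) -> tuple[float, float]:
    """Return (throughput in queries/s, latency seen by each query in ms)."""
    throughput_qps = num_queries / batch_wall_time_s
    per_query_latency_ms = batch_wall_time_s * 1000.0
    return throughput_qps, per_query_latency_ms


for num_tokens in (5, 6):
    latency = generation_latency_ms(num_tokens, PER_TOKEN_MS)
    over = latency / REFORMULATION_BUDGET_MS
    print(f"{num_tokens} tokens: {latency:.0f} ms, {over:.0f}x over budget")

qps, latency_ms = batch_stats(num_queries=1_000_000, batch_wall_time_s=1.0)
print(f"throughput: {qps:,.0f} queries/s, per-query latency: {latency_ms:.0f} ms")
```

Batching changes the first number returned by `batch_stats` but not the second, which is the one the shopper experiences.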
The second and likely more significant issue prohibiting existing LLMs is privacy. Most LLM services are hosted on public clouds and require the transfer of user-typed query text to the hosted device for reformulations. In addition, since query reformulation is very sensitive to the context, LLMs must be fine-tuned on the domain-dependent query text. It is well known that query logs are privacy-critical, and the infamous AOL blunder, where anonymized query logs were released to the public, is still considered one of the poorest decisions in business. Given ever-tightening privacy regulations, transmitting query logs to a third-party service will be a non-starter for many organizations.

ThirdAI’s Universal Deep Transformers (UDT)

Build your own privacy-preserving query reformulation with < 10 ms reformulation latency
ThirdAI’s Universal Deep Transformers (UDT) leverages intelligent algorithms to enable all developers to build a powerful query reformulation engine with single-digit millisecond latency on an ordinary CPU. There is no need to transfer the query logs anywhere.
ThirdAI’s algorithmic efficiency allows us to offer our customers private query reformulation services without an exorbitant hardware bill. ThirdAI also provides services to build large language models with complete privacy locally. The data, as well as the model, never leaves your machine. Our demo YouTube video, also provided above in this post, gives a detailed walk-through of UDT’s interface and illustrates how to unlock powerful query reformulation with a push-button command.
Code and other details: Our code and a demo with all of the details are available here. The demo uses a public sentence-correction dataset, a task similar to real-world query reformulation challenges. Please follow this link to run our demo on a Google Colab notebook. Yes, a free single-CPU machine is enough to try it!
Our query reformulation system is currently tailored for speed and constrained to generate text candidates close to the training distribution. Please contact us with any customization requests. The current system can be used as a candidate generator, which provides a good set of reformulated results in 2–3 milliseconds, allowing for further re-ranking if necessary. UDT can be integrated into any standard ML pipeline and works seamlessly across various platforms, including Databricks, Vertex AI, SageMaker, Azure, and more.
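The candidate-generator-plus-re-ranker pattern mentioned above can be sketched as follows. This is not UDT’s API: the candidate generator here is a stand-in built on Python’s standard `difflib` string matching, and the query frequency table and function names are hypothetical, chosen only to show where a re-ranking stage would plug in.

```python
import difflib

# Made-up table of previously seen queries and how often they were issued.
QUERY_FREQUENCY = {
    "twin bed": 12000,
    "twin beds": 7000,
    "queen bed": 9000,
    "twin bee plush": 15,
}


def generate_candidates(query: str, k: int = 3) -> list[str]:
    """Stand-in candidate generator: known queries closest in string similarity."""
    return difflib.get_close_matches(query, list(QUERY_FREQUENCY), n=k, cutoff=0.6)


def rerank(candidates: list[str]) -> list[str]:
    """Re-rank candidates by historical frequency (a hypothetical relevance signal)."""
    return sorted(candidates, key=lambda c: QUERY_FREQUENCY[c], reverse=True)


def reformulate(query: str) -> str:
    """Return the top re-ranked candidate, or the original query if none qualify."""
    ranked = rerank(generate_candidates(query))
    return ranked[0] if ranked else query


candidates = generate_candidates("twin bee")
best = reformulate("twin bee")
print("candidates:", candidates)
print("reformulated query:", best)
```

Keeping the two stages separate means the fast generator only has to produce a small, plausible set, and any slower or domain-specific scoring logic can live in the re-ranker.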