As we introduced in our previous post, ThirdAI is a startup dedicated to democratizing artificial intelligence by enabling all developers to train and deploy large-scale neural networks on commodity CPU hardware. Through algorithmic and software innovations, we aim to reduce the cost of developing state-of-the-art machine learning solutions by orders of magnitude.
UDT Capabilities
- Universal Interface: UDT handles a broad range of machine learning tasks and data modalities through a single API: natural language processing, search, recommendations, tabular data analytics, time series, text reformulation, and more. This universal API simplifies the workflow, especially as customers apply UDT to multiple business problems.
- Automated Parameter Tuning: ThirdAI’s proprietary algorithms for neural network training and inference are based on years of research in applying hashing and probabilistic data structures to machine learning. These tools, while effective, involve many custom hyper-parameters that require considerable domain expertise to tune correctly for a given dataset and workload. We have eliminated this bottleneck by developing new mathematical techniques that automatically select the hyper-parameters associated with our algorithms. As a customer, you only need to invoke the UDT interface on your dataset. ThirdAI’s technology then selects the hyper-parameters and performs feature engineering automatically, without any additional computational overhead.
- Billion-Scale Training: Thanks to ThirdAI’s software engineering innovations, as well as the larger memory capacity available on CPUs compared to GPUs, UDT can scale to datasets with billions of entries and to extreme classification tasks with hundreds of millions of output labels.
- Sub-Millisecond Inference Latency: By utilizing ThirdAI’s proprietary sparse inference algorithms, UDT can achieve inference latencies under 1 millisecond on standard CPUs regardless of the overall model size.
- Immediately Production-Ready: We designed UDT with production deployment at the forefront. After training a UDT model, customers can immediately save the network in a serialized format that can be loaded in a variety of runtime environments with no additional engineering effort.
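To give a feel for how hashing can make inference sparse, here is a minimal sketch in plain Python. It is not ThirdAI's proprietary algorithm, whose details are not described in this post. It is a generic illustration that assumes signed random projections (SimHash) as the hash function, and every name in it (`make_hyperplanes`, `simhash`, `build_table`, `sparse_forward`) is hypothetical. Each neuron is bucketed by the hash of its weight vector, and at inference time only the neurons in the input's bucket are evaluated.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw `num_bits` random Gaussian hyperplanes in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def simhash(vector, hyperplanes):
    """Signed random projection: one bit per hyperplane (1 if dot product >= 0)."""
    code = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, vector))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def build_table(neuron_weights, hyperplanes):
    """Bucket every neuron by the hash of its weight vector."""
    table = {}
    for neuron_id, weights in enumerate(neuron_weights):
        table.setdefault(simhash(weights, hyperplanes), []).append(neuron_id)
    return table


def sparse_forward(x, neuron_weights, table, hyperplanes):
    """Compute activations only for neurons that share the input's bucket."""
    active = table.get(simhash(x, hyperplanes), [])
    return {i: sum(w * v for w, v in zip(neuron_weights[i], x)) for i in active}


# Toy layer (illustrative values): neuron 2 points the same way as neuron 0,
# neuron 1 points the opposite way.
hyperplanes = make_hyperplanes(num_bits=8, dim=4, seed=1)
weights = [
    [1.0, 2.0, -1.0, 0.5],
    [-1.0, -2.0, 1.0, -0.5],
    [2.0, 4.0, -2.0, 1.0],
]
table = build_table(weights, hyperplanes)
activations = sparse_forward([1.0, 2.0, -1.0, 0.5], weights, table, hyperplanes)
print(activations)
```

In this sketch, the work per query depends on the size of the matching bucket rather than on the total number of neurons, which is the intuition behind latency that does not grow with model size. Weight vectors that point in the same direction as the input always land in its bucket, while vectors pointing the opposite way do not.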
UDT Case Studies
# Example: tabular classification (income prediction on census data)
from thirdai import bolt
from thirdai.demos import download_census_income

# Fetch the demo dataset: train file, test file, and a sample inference batch
train_filename, test_filename, inference_batch = download_census_income()

# Declare the type of every column; numerical columns also declare their value range
model = bolt.UniversalDeepTransformer(
    data_types={
        "age": bolt.types.numerical(range=(17, 90)),
        "workclass": bolt.types.categorical(),
        "fnlwgt": bolt.types.numerical(range=(12285, 1484705)),
        "education": bolt.types.categorical(),
        "education-num": bolt.types.categorical(),
        "marital-status": bolt.types.categorical(),
        "occupation": bolt.types.categorical(),
        "relationship": bolt.types.categorical(),
        "race": bolt.types.categorical(),
        "sex": bolt.types.categorical(),
        "capital-gain": bolt.types.numerical(range=(0, 99999)),
        "capital-loss": bolt.types.numerical(range=(0, 4356)),
        "hours-per-week": bolt.types.numerical(range=(1, 99)),
        "native-country": bolt.types.categorical(),
        "label": bolt.types.categorical(),
    },
    target="label",
    n_target_classes=2,
)

# Training the model
model.train(train_filename, epochs=5, learning_rate=0.01, metrics=["categorical_accuracy"])

# Evaluating the model
model.evaluate(test_filename, metrics=["categorical_accuracy"])

# Saving
model.save("income_prediction.model")

# Loading
model = bolt.UniversalDeepTransformer.load("income_prediction.model")
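Because the census example declares a value range for each numerical column, it can be useful to check a CSV file against those declarations before training. The sketch below does that with the standard library only. It does not call UDT, and how UDT itself treats values outside a declared range is not stated in this post. The `SCHEMA` dictionary format and the `check_rows` helper are hypothetical, introduced only for illustration; the two ranges are copied from the example above.

```python
import csv
import io

# Hypothetical schema format (an assumption, not a UDT structure):
# a (low, high) tuple marks a numerical column, None marks a categorical one.
SCHEMA = {
    "age": (17, 90),
    "workclass": None,
    "hours-per-week": (1, 99),
    "label": None,
}


def check_rows(csv_file, schema):
    """Return a list of (row_number, column, problem) tuples; empty if all is well."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    problems = []

    # Row number 0 is used for problems with the header itself.
    missing = [col for col in schema if col not in header]
    for col in missing:
        problems.append((0, col, "missing column"))

    for row_number, row in enumerate(reader, start=1):
        for col, bounds in schema.items():
            if bounds is None or col in missing:
                continue  # categorical or absent columns are not range-checked
            try:
                value = float(row[col])
            except (TypeError, ValueError):
                problems.append((row_number, col, "not a number"))
                continue
            low, high = bounds
            if not low <= value <= high:
                problems.append((row_number, col, "out of range"))
    return problems


# Small in-memory sample with illustrative values (not taken from the real dataset).
sample = io.StringIO(
    "age,workclass,hours-per-week,label\n"
    "39,Private,40,<=50K\n"
    "150,Private,40,>50K\n"
)
for problem in check_rows(sample, SCHEMA):
    print(problem)
```

To run the same check on a file from disk, open it with `open(path, newline="")` and pass the file object to `check_rows`.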
# Example: text classification (intent classification on the CLINC dataset)
from thirdai import bolt
from thirdai.demos import download_clinc_dataset

# Fetch the demo dataset: train file, test file, and a sample inference batch
train_filename, test_filename, inference_batch = download_clinc_dataset()

# One text input column and one categorical target with 150 classes
model = bolt.UniversalDeepTransformer(
    data_types={
        "text": bolt.types.text(),
        "category": bolt.types.categorical(),
    },
    target="category",
    n_target_classes=150,
)

# Training the model
model.train(train_filename, epochs=5, learning_rate=0.01, metrics=["categorical_accuracy"])

# Evaluating the model
model.evaluate(test_filename, metrics=["categorical_accuracy"])

save_location = "intent_classification.model"

# Saving
model.save(save_location)

# Loading
model = bolt.UniversalDeepTransformer.load(save_location)
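Both classification examples report `categorical_accuracy`. Assuming the metric has its usual meaning (the fraction of samples whose highest-scoring class matches the true label), it can be reproduced by hand as in the sketch below. The function is a stand-alone illustration, not UDT's implementation, and the scores are made up.

```python
def categorical_accuracy(scores, labels):
    """Fraction of rows whose arg-max class index equals the true label index."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score; ties resolve to the lowest index.
        predicted = max(range(len(row)), key=row.__getitem__)
        correct += predicted == label
    return correct / len(labels)


# Three samples over three classes (illustrative numbers only).
scores = [
    [0.1, 0.7, 0.2],  # predicts class 1
    [0.8, 0.1, 0.1],  # predicts class 0
    [0.3, 0.3, 0.4],  # predicts class 2
]
labels = [1, 0, 1]
print(categorical_accuracy(scores, labels))
```

As a point of reference for the 150-class intent task, uniform random guessing would score about 1/150, or roughly 0.7%, if the classes are balanced.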
# Example: query reformulation
from thirdai import bolt
from thirdai.demos import prepare_query_reformulation_data
import pandas

# Prepare the demo dataset: train file, test file, and a sample inference batch
train_filename, test_filename, inference_batch = prepare_query_reformulation_data()

# Map queries in the source column to their reformulations in the target column
model = bolt.UniversalDeepTransformer(
    source_column="source_queries", target_column="target_queries", dataset_size="medium"
)

# Training the model
model.train(filename=train_filename)

# Evaluating the model, requesting the top 5 reformulations per query
query_reformulations = model.evaluate(filename=test_filename, top_k=5)

model_location = "query_reformulation.model"

# Saving
model.save(filename=model_location)

# Loading
model = bolt.UniversalDeepTransformer.load(model_location)
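The query reformulation example asks for the top 5 candidates per query. One common way to score such output is recall at k: the fraction of queries whose correct reformulation appears among the first k candidates. The sketch below assumes the candidates are available as one ranked list of strings per query. The structure actually returned by `model.evaluate` is not shown in this post, so both that input format and the `recall_at_k` helper are assumptions, and the example queries are invented.

```python
def recall_at_k(candidates, targets, k=5):
    """Fraction of queries whose target appears in the first k ranked candidates."""
    if len(candidates) != len(targets):
        raise ValueError("candidates and targets must have the same length")
    if not targets:
        raise ValueError("need at least one query")
    hits = sum(target in ranked[:k] for ranked, target in zip(candidates, targets))
    return hits / len(targets)


# Ranked reformulations for three misspelled queries (invented for illustration).
candidates = [
    ["red running shoes", "red runing shoe", "running shoes red"],
    ["wireless headphones", "wired headphones"],
    ["laptop stand", "laptop sleeve", "laptop bag"],
]
targets = ["red running shoes", "wired headphones", "laptop charger"]

print(recall_at_k(candidates, targets, k=1))
print(recall_at_k(candidates, targets, k=5))
```

Raising k can only keep recall the same or increase it, so the value of k should be reported alongside the score.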
Conclusion
In this post, we introduced ThirdAI’s latest product offering: Universal Deep Transformers (UDT). With UDT, customers have access to a state-of-the-art AutoML interface for machine learning that operates efficiently on CPUs without the need for tedious manual parameter tuning or domain expertise in deep learning. We encourage you to try out UDT for yourself through our demo notebooks and visit our website. To use UDT for your business needs, please reach out to us by requesting a trial license for our software.