CryptoABC.net

Perplexity AI Leverages NVIDIA Inference Stack to Handle 435 Million Monthly Queries

December 6, 2024
in Blockchain


Terrill Dicki
Dec 06, 2024 04:17

Perplexity AI utilizes NVIDIA’s inference stack, including H100 Tensor Core GPUs and Triton Inference Server, to manage over 435 million search queries monthly, optimizing performance and reducing costs.





Perplexity AI, a leading AI-powered search engine, handles more than 435 million search queries each month using NVIDIA’s inference stack. The platform has integrated NVIDIA H100 Tensor Core GPUs, Triton Inference Server, and TensorRT-LLM to deploy large language models (LLMs) efficiently, according to NVIDIA’s official blog.

Serving Multiple AI Models

To meet diverse user demands, Perplexity AI operates over 20 AI models simultaneously, including variations of the open-source Llama 3.1 models. Each user request is matched with the most suitable model using smaller classifier models that determine user intent. These models are deployed across GPU pods, each managed by an NVIDIA Triton Inference Server, ensuring efficiency under strict service-level agreements (SLAs).
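The routing step described above can be sketched roughly as follows. This is a minimal illustration, not Perplexity's actual router: the intent labels, the model names in `ROUTES`, and the keyword heuristic standing in for the small classifier models are all assumptions made for the example.

```python
from typing import Callable, Dict

# Hypothetical routing table; the article does not publish the real one.
ROUTES: Dict[str, str] = {
    "quick_fact": "llama-3.1-8b",
    "reasoning": "llama-3.1-70b",
    "deep_research": "llama-3.1-405b",
}
DEFAULT_MODEL = "llama-3.1-70b"


def classify_intent(query: str) -> str:
    """Stand-in for a small classifier model: a keyword heuristic, for illustration only."""
    text = query.lower()
    if any(phrase in text for phrase in ("compare", "explain why", "step by step")):
        return "reasoning"
    if len(text.split()) > 30:
        return "deep_research"
    return "quick_fact"


def route(query: str, classifier: Callable[[str], str] = classify_intent) -> str:
    """Return the name of the model that should serve this query."""
    intent = classifier(query)
    # Unknown intents fall back to a default model rather than failing.
    return ROUTES.get(intent, DEFAULT_MODEL)
```

In a real deployment the `classifier` argument would call a trained model; passing it as a parameter keeps the routing logic testable on its own.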

The pods are hosted within a Kubernetes cluster, where an in-house front-end scheduler directs traffic to them based on load and usage. This keeps SLA adherence consistent while optimizing performance and resource utilization.
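As a rough sketch of load-based scheduling, the example below sends each request to the least-loaded pod that hosts the requested model. The article does not describe the scheduler's actual policy, so the `Pod` type, the `in_flight` counter, and the least-loaded rule are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Pod:
    name: str
    model: str
    in_flight: int = 0  # requests currently being served by this pod


def pick_pod(pods: List[Pod], model: str) -> Pod:
    """Choose the least-loaded pod that hosts the requested model."""
    candidates = [p for p in pods if p.model == model]
    if not candidates:
        raise LookupError(f"no pod serves {model!r}")
    return min(candidates, key=lambda p: p.in_flight)


def dispatch(pods: List[Pod], model: str) -> str:
    """Route one request and record the extra load on the chosen pod."""
    pod = pick_pod(pods, model)
    pod.in_flight += 1
    return pod.name
```

A production scheduler would also decrement the counter when a request completes and would likely weigh other signals, such as queue depth or recent latency.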

Optimizing Performance and Costs

Perplexity AI uses comprehensive A/B testing to define SLAs for its varied use cases. The aim is to maximize GPU utilization while still meeting target SLAs, which keeps inference serving costs down. For smaller models the focus is on minimizing latency, while larger, user-facing models such as Llama 8B, 70B, and 405B undergo detailed performance analysis to balance cost against user experience.
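One way to picture the trade-off between utilization and SLA is to benchmark several serving configurations and keep the highest-throughput one that still meets the latency target. The sketch below assumes a p99 latency SLA and a batch-size sweep; both are illustrative choices, and the `Measurement` fields are hypothetical rather than taken from the article.

```python
from typing import List, NamedTuple, Optional


class Measurement(NamedTuple):
    batch_size: int
    p99_latency_ms: float
    throughput_rps: float


def best_config(results: List[Measurement], sla_ms: float) -> Optional[Measurement]:
    """Return the highest-throughput measurement whose p99 latency meets the SLA."""
    within_sla = [m for m in results if m.p99_latency_ms <= sla_ms]
    # default=None covers the case where no configuration satisfies the SLA.
    return max(within_sla, key=lambda m: m.throughput_rps, default=None)
```

Larger batches usually raise throughput and latency together, so the SLA effectively caps how far utilization can be pushed.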

Performance is further improved by parallelizing each model deployment across multiple GPUs: increasing tensor parallelism lowers serving costs for latency-sensitive requests. This approach saves Perplexity approximately $1 million annually by hosting models on cloud-based NVIDIA GPUs rather than paying for third-party LLM API services.
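It can seem counterintuitive that using more GPUs per model lowers cost. The arithmetic below shows how that can happen: cost per token falls whenever throughput under the latency SLA grows faster than the GPU count. The function name and any prices or throughput figures plugged into it are hypothetical and are not taken from the article.

```python
def cost_per_million_tokens(num_gpus: int, gpu_hour_usd: float, tokens_per_second: float) -> float:
    """Serving cost in USD per one million generated tokens for a single deployment."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    tokens_per_hour = tokens_per_second * 3600
    return num_gpus * gpu_hour_usd / tokens_per_hour * 1_000_000
```

For example, with made-up numbers, a 4-GPU deployment that sustains five times the throughput of a 1-GPU deployment at the same latency target would cost 20% less per token despite the fourfold hardware bill.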

Innovative Techniques for Enhanced Throughput

Perplexity AI is collaborating with NVIDIA to implement ‘disaggregating serving,’ a method that runs the separate phases of inference (prefill and decode) on different GPUs, significantly boosting throughput while still meeting SLAs. Because each phase can be placed on its own hardware, Perplexity can use different NVIDIA GPU products to optimize performance and cost-efficiency.
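The idea can be pictured as a two-stage pipeline. LLM inference is commonly split into a prefill phase, which processes the whole prompt once, and a decode phase, which emits tokens one at a time. The toy simulation below keeps the two stages separate and joins them with a hand-off queue. It is a conceptual sketch only: the `Request` type, the list standing in for the attention cache, and the placeholder tokens are all assumptions, and no real GPU or Triton API is involved.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class Request:
    prompt: str
    max_new_tokens: int
    kv_cache: List[str] = field(default_factory=list)  # stand-in for real attention state
    output: List[str] = field(default_factory=list)


def prefill(req: Request) -> Request:
    """Prefill phase: process the whole prompt once and build the cache."""
    req.kv_cache = req.prompt.split()
    return req


def decode(req: Request) -> Request:
    """Decode phase: emit tokens one at a time, extending the cache each step."""
    for i in range(req.max_new_tokens):
        token = f"tok{i}"  # placeholder; a real model would sample a token here
        req.output.append(token)
        req.kv_cache.append(token)
    return req


def serve(requests: List[Request]) -> List[Request]:
    """Run prefill and decode as separate stages joined by a hand-off queue."""
    handoff: Deque[Request] = deque()
    for req in requests:  # stage 1: would run on the prefill GPU pool
        handoff.append(prefill(req))
    done: List[Request] = []
    while handoff:  # stage 2: would run on the decode GPU pool
        done.append(decode(handoff.popleft()))
    return done
```

In a real system the hand-off would transfer the cache between GPUs, and the two pools could be sized and provisioned independently.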

Further improvements are anticipated with the upcoming NVIDIA Blackwell platform, which promises substantial performance gains through innovations including a second-generation Transformer Engine and advanced NVLink capabilities.

Perplexity’s strategic use of NVIDIA’s inference stack underscores the potential for AI-powered platforms to manage vast query volumes efficiently, delivering high-quality user experiences while maintaining cost-effectiveness.


