CryptoABC.net

Perplexity AI Leverages NVIDIA Inference Stack to Handle 435 Million Monthly Queries

December 6, 2024
in Blockchain


Terrill Dicki
Dec 06, 2024 04:17

Perplexity AI utilizes NVIDIA’s inference stack, including H100 Tensor Core GPUs and Triton Inference Server, to manage over 435 million search queries monthly, optimizing performance and reducing costs.





Perplexity AI, a leading AI-powered search engine, is successfully managing over 435 million search queries each month, thanks to NVIDIA’s advanced inference stack. The platform has integrated NVIDIA H100 Tensor Core GPUs, Triton Inference Server, and TensorRT-LLM to efficiently deploy large language models (LLMs), according to NVIDIA’s official blog.

Serving Multiple AI Models

To meet diverse user demands, Perplexity AI operates over 20 AI models simultaneously, including variations of the open-source Llama 3.1 models. Each user request is matched with the most suitable model using smaller classifier models that determine user intent. These models are deployed across GPU pods, each managed by an NVIDIA Triton Inference Server, ensuring efficiency under strict service-level agreements (SLAs).
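
The routing step described above can be sketched roughly as follows. This is a minimal illustration, not Perplexity's implementation: the intent labels, the model names, and the keyword heuristic standing in for the small classifier models are all assumptions made for the example.

```python
# Hypothetical intent labels and model names, for illustration only.
ROUTES = {
    "code": "medium-model",
    "quick_fact": "small-model",
    "deep_research": "large-model",
}

CODE_HINTS = ("python", "function", "error", "compile")


def classify_intent(query: str) -> str:
    """Stand-in for a small classifier model: a simple keyword heuristic."""
    text = query.lower()
    if any(hint in text for hint in CODE_HINTS):
        return "code"
    if len(text.split()) > 12:
        return "deep_research"
    return "quick_fact"


def route(query: str) -> str:
    """Return the name of the model that should serve this query."""
    return ROUTES[classify_intent(query)]
```

In a real deployment the heuristic would be replaced by a trained classifier, but the shape stays the same: classify first, then look up the serving model.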

The pods are hosted within a Kubernetes cluster, featuring an in-house front-end scheduler that directs traffic based on load and usage. This ensures consistent SLA adherence, optimizing performance and resource utilization.
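
One simple way to direct traffic based on load is a least-loaded policy: send each request to the pod with the fewest requests in flight. The sketch below assumes that policy for illustration; the article does not say which algorithm Perplexity's scheduler uses, and the `Pod` and `LeastLoadedScheduler` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    in_flight: int = 0  # requests currently being served by this pod


class LeastLoadedScheduler:
    """Toy front-end scheduler: pick the pod with the fewest in-flight requests."""

    def __init__(self, pods):
        self.pods = list(pods)

    def acquire(self) -> Pod:
        # min() returns the first pod among ties, so ordering is deterministic.
        pod = min(self.pods, key=lambda p: p.in_flight)
        pod.in_flight += 1
        return pod

    def release(self, pod: Pod) -> None:
        # Call when the request finishes so the pod's load count drops again.
        pod.in_flight -= 1
```

A caller would wrap each request in `acquire()` and `release()` so that the load counters track what each pod is actually serving.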

Optimizing Performance and Costs

Perplexity AI employs a comprehensive A/B testing strategy to define SLAs for varied use cases. This process aims to maximize GPU utilization while maintaining target SLAs, optimizing inference serving costs. Smaller models focus on minimizing latency, while larger, user-facing models like Llama 8B, 70B, and 405B undergo detailed performance analysis to balance costs and user experience.
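
The trade-off between GPU utilization and SLAs can be framed as a small selection problem: among benchmarked configurations, keep those that meet the latency target and pick the one with the highest throughput. The function below is a sketch under that assumption; the tuple layout and every number in the usage note are hypothetical, not measurements from Perplexity or NVIDIA.

```python
def best_operating_point(measurements, latency_sla_ms):
    """Pick the highest-throughput configuration that still meets the SLA.

    measurements: list of (batch_size, p99_latency_ms, throughput_qps) tuples.
    Returns the chosen tuple, or None if no configuration meets the SLA.
    """
    feasible = [m for m in measurements if m[1] <= latency_sla_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda m: m[2])
```

For example, given hypothetical benchmark rows `[(1, 40, 25), (4, 70, 80), (8, 120, 130), (16, 260, 170)]` and a 150 ms latency target, the batch size of 16 is excluded for exceeding the target, and the batch size of 8 is selected as the highest-throughput option that remains.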

Performance is further improved by parallelizing model deployment across multiple GPUs: increasing tensor parallelism lowers serving costs for latency-sensitive requests. Perplexity estimates that hosting models on cloud-based NVIDIA GPUs saves approximately $1 million annually compared with what the same workload would cost through a third-party LLM API service.
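
It can seem counterintuitive that spreading a model across more GPUs lowers cost. The arithmetic below shows the condition under which it does: cost per request falls whenever the throughput achievable within the SLA grows faster than the GPU count. The function name, the hourly GPU price, and the throughput figures in the usage note are hypothetical.

```python
def cost_per_million_requests(num_gpus, gpu_hourly_usd, requests_per_second):
    """Serving cost in USD per one million requests for a single deployment."""
    hourly_cost = num_gpus * gpu_hourly_usd
    requests_per_hour = requests_per_second * 3600
    return hourly_cost / requests_per_hour * 1_000_000
```

With an assumed price of $3 per GPU-hour, a 4-GPU deployment sustaining 10 requests per second costs about $333 per million requests, while an 8-GPU deployment sustaining 30 requests per second costs about $222. Doubling the GPUs is cheaper here only because the assumed throughput triples.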

Innovative Techniques for Enhanced Throughput

Perplexity AI is collaborating with NVIDIA to implement ‘disaggregated serving,’ a method that separates the phases of inference (prefill and decode) onto different GPUs, significantly boosting throughput while adhering to SLAs. Because each phase can run on the hardware that suits it best, Perplexity can use various NVIDIA GPU products to optimize performance and cost-efficiency.
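
The idea of splitting inference into phases can be illustrated with a toy pipeline: one stage processes the whole prompt once (prefill) and hands its state to a second stage that generates tokens one at a time (decode). This is only a conceptual sketch running in a single process. The function names, the queue-based handoff, and the fake token strings are assumptions; real systems transfer the model's key-value cache between GPU pools.

```python
from queue import Queue


def prefill(prompt: str) -> dict:
    """Prefill phase: process the whole prompt once and build a cache."""
    tokens = prompt.split()
    return {"kv_cache": tokens, "prompt_len": len(tokens)}


def decode(state: dict, max_new_tokens: int) -> list:
    """Decode phase: emit placeholder tokens one at a time, extending the cache."""
    generated = []
    for i in range(max_new_tokens):
        token = f"tok{state['prompt_len'] + i}"
        state["kv_cache"].append(token)
        generated.append(token)
    return generated


def serve(prompts, max_new_tokens=3):
    """Run prefill and decode as separate stages joined by a handoff queue."""
    handoff = Queue()
    for prompt in prompts:          # would run on the prefill GPU pool
        handoff.put(prefill(prompt))
    results = []
    while not handoff.empty():      # would run on the decode GPU pool
        results.append(decode(handoff.get(), max_new_tokens))
    return results
```

Keeping the two stages behind a queue is what would let each pool be sized, and matched to hardware, independently of the other.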

Further improvements are anticipated with the upcoming NVIDIA Blackwell platform, promising substantial performance gains through technological innovations, including a second-generation Transformer Engine and advanced NVLink capabilities.

Perplexity’s strategic use of NVIDIA’s inference stack underscores the potential for AI-powered platforms to manage vast query volumes efficiently, delivering high-quality user experiences while maintaining cost-effectiveness.

Image source: Shutterstock


© 2021 cryptoabc.net - All rights reserved!
