• Live Crypto Prices
  • Crypto News
    • Worldwide
      • Bitcoin
      • Ethereum
      • Altcoin
      • Blockchain
      • Regulation
    • Australian Crypto News
  • Education
    • Cryptocurrency For Beginners
    • Where to Buy Cryptocurrency
    • Where to Store Cryptos
    • Cryptocurrency Tax in Australia 2021
No Result
View All Result
CryptoABC.net
No Result
View All Result

Perplexity AI Leverages NVIDIA Inference Stack to Handle 435 Million Monthly Queries

December 6, 2024
in Blockchain
Reading Time: 2min read
0 0
A A
0
Nvidia Plans to add Innovation in the Metaverse with Software, Marketplace Deals
0
SHARES
5
VIEWS
ShareShareShareShareShare


Terrill Dicki
Dec 06, 2024 04:17

Perplexity AI utilizes NVIDIA’s inference stack, including H100 Tensor Core GPUs and Triton Inference Server, to manage over 435 million search queries monthly, optimizing performance and reducing costs.





Perplexity AI, a leading AI-powered search engine, is successfully managing over 435 million search queries each month, thanks to NVIDIA’s advanced inference stack. The platform has integrated NVIDIA H100 Tensor Core GPUs, Triton Inference Server, and TensorRT-LLM to efficiently deploy large language models (LLMs), according to NVIDIA’s official blog.

Serving Multiple AI Models

To meet diverse user demands, Perplexity AI operates over 20 AI models simultaneously, including variations of the open-source Llama 3.1 models. Each user request is matched with the most suitable model using smaller classifier models that determine user intent. These models are deployed across GPU pods, each managed by an NVIDIA Triton Inference Server, ensuring efficiency under strict service-level agreements (SLAs).

The pods are hosted within a Kubernetes cluster, featuring an in-house front-end scheduler that directs traffic based on load and usage. This ensures consistent SLA adherence, optimizing performance and resource utilization.

Optimizing Performance and Costs

Perplexity AI employs a comprehensive A/B testing strategy to define SLAs for varied use cases. This process aims to maximize GPU utilization while maintaining target SLAs, optimizing inference serving costs. Smaller models focus on minimizing latency, while larger, user-facing models like Llama 8B, 70B, and 405B undergo detailed performance analysis to balance costs and user experience.

Performance is further enhanced by parallelizing model deployment across multiple GPUs, increasing tensor parallelism to achieve lower serving costs for latency-sensitive requests. This strategic approach has enabled Perplexity to save approximately $1 million annually by hosting models on cloud-based NVIDIA GPUs, surpassing third-party LLM API service costs.

Innovative Techniques for Enhanced Throughput

Perplexity AI is collaborating with NVIDIA to implement ‘disaggregating serving,’ a method that separates inference phases onto different GPUs, significantly boosting throughput while adhering to SLAs. This flexibility allows Perplexity to utilize various NVIDIA GPU products to optimize performance and cost-efficiency.

Further improvements are anticipated with the upcoming NVIDIA Blackwell platform, promising substantial performance gains through technological innovations, including a second-generation Transformer Engine and advanced NVLink capabilities.

Perplexity’s strategic use of NVIDIA’s inference stack underscores the potential for AI-powered platforms to manage vast query volumes efficiently, delivering high-quality user experiences while maintaining cost-effectiveness.

Image source: Shutterstock


Credit: Source link

ShareTweetSendPinShare
Previous Post

XRP Price Steadies Above Support: Preparing for the Next Move?

Next Post

DeFi’s TVL Skyrockets with Bitcoin’s BitVM-Powered Push

Next Post
DeFi’s TVL Skyrockets with Bitcoin’s BitVM-Powered Push

DeFi’s TVL Skyrockets with Bitcoin’s BitVM-Powered Push

You might also like

If XRP Price Loses This Current Support, This Is How Low It Will Go

If XRP Price Loses This Current Support, This Is How Low It Will Go

June 4, 2026
Year-end odds on Israel–Indonesia ties shift in Polymarket

Year-end odds on Israel–Indonesia ties shift in Polymarket

June 6, 2026
Bitcoin Hits $0 on Paradex After Starknet Glitch — Mass Liquidations Force Rollback

Bitcoin Slumps Toward $69K as Mt. Gox Moves 10,422 BTC to Unmarked Wallets

June 2, 2026
Bitcoin Holdings in Public Company Treasuries Exceed 200,000 BTC

Legal AI Giant Harvey Opens Singapore Office Amid APAC Expansion

June 2, 2026
Bitcoin Price Prediction: Florida’s Crypto Bill and $198B U.S. Surplus Boost Market Outlook

Why is Crypto Going Down? Iran Just Bombed Kuwait’s Airport and Struck the Strait of Hormuz, Bitcoin Is Crashing Toward Critical Support

June 3, 2026
After a $60M short assault, Aave recommends governance reforms.

AAVE Price Prediction: Oversold Bounce to $80 Within 48 Hours as Whales Load Up

June 4, 2026
CryptoABC.net

This is an Australian online news/education portal that aims to provide the latest crypto news, real-time updates, education and reviews within Australia and around the world. Feel free to get in touch with us!

What's New Here!

Bitcoin Reaches Deep Undervaluation Zone – Time To Get In?

Bitcoin Reaches Deep Undervaluation Zone – Time To Get In?

June 7, 2026
Why Is Crypto Up Today? – October 15, 2025

CPI on June 10 and the FOMC on June 17, Bitcoin’s Next Big Move Will Be Decided in the Next 7 Days

June 7, 2026

Subscribe Now

  • Contact Us
  • Privacy Policy
  • Terms of Use
  • DMCA

© 2021 cryptoabc.net - All rights reserved!

No Result
View All Result
  • Live Crypto Prices
  • Crypto News
    • Worldwide
      • Bitcoin
      • Ethereum
      • Altcoin
      • Blockchain
      • Regulation
    • Australian Crypto News
  • Education
    • Cryptocurrency For Beginners
    • Where to Buy Cryptocurrency
    • Where to Store Cryptos
    • Cryptocurrency Tax in Australia 2021

© 2021 cryptoabc.net - All rights reserved!

Welcome Back!

Login to your account below

Forgotten Password?

Create New Account!

Fill the forms below to register

All fields are required. Log In

Retrieve your password

Please enter your username or email address to reset your password.

Log In
Please enter CoinGecko Free Api Key to get this plugin works.