CryptoABC.net

Ray Serve Upgrade Delivers 88% Lower Latency for AI Inference at Scale

March 24, 2026
in Blockchain


Jessie A Ellis
Mar 24, 2026 16:58

Anyscale announces major Ray Serve optimizations with HAProxy and gRPC, achieving 11.1x throughput gains for LLM inference workloads on enterprise deployments.





Anyscale has shipped substantial performance upgrades to Ray Serve that slash P99 latency by up to 88% and boost throughput by 11.1x for large language model inference workloads. The improvements, available in Ray 2.55+, address scaling bottlenecks that have plagued enterprise AI deployments running latency-sensitive applications.

The upgrades center on two architectural changes: HAProxy integration for ingress traffic and direct gRPC communication between deployment replicas. Both bypass Python-based components that previously created chokepoints under heavy load.

What the Numbers Show

In benchmark testing of a deep learning recommendation model pipeline, the optimized configuration pushed throughput from 490 to 1,573 queries per second (roughly 3.2x) while cutting P99 latency by 75%. At 400 concurrent users, the performance gap widened dramatically: Ray Serve’s default Python proxy saturated while HAProxy continued scaling.

For LLM inference specifically, the results proved even more striking. Running GPT-class models on H100 GPUs at 256 concurrent users per replica, throughput scaled linearly with replica count when using HAProxy, something the default configuration couldn’t achieve because its Python proxy process hit a ceiling.

Streaming workloads saw 8.9x throughput improvements, while unary request patterns hit the full 11.1x gain.
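The percentages and multipliers quoted above mix two conventions, so a small arithmetic sketch may help when comparing them. It uses only the figures reported in this article; the function names are illustrative, and converting a latency reduction into a "times faster" factor is a reading aid rather than a number from the benchmarks.

```python
def speedup(before_qps: float, after_qps: float) -> float:
    """Throughput gain expressed as a multiplier."""
    return after_qps / before_qps


def latency_factor(reduction_pct: float) -> float:
    """Convert 'X% lower latency' into 'old latency / new latency'.

    An X% reduction leaves (1 - X/100) of the original latency.
    """
    return 1.0 / (1.0 - reduction_pct / 100.0)


# Recommendation pipeline: 490 -> 1,573 queries per second.
print(round(speedup(490, 1573), 2))   # 3.21
# A 75% P99 reduction means the new P99 is a quarter of the old one.
print(round(latency_factor(75), 2))   # 4.0
# The headline 88% reduction corresponds to roughly an 8.3x lower P99.
print(round(latency_factor(88), 2))   # 8.33
```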

Technical Architecture Shift

The core problem: Ray Serve’s default proxy runs on Python’s asyncio, which struggles at high concurrency. HAProxy, written in C and battle-tested across production systems globally, handles the same traffic with significantly less overhead.
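A toy illustration of why a single asyncio event loop can become a ceiling: every coroutine scheduled on one loop runs on the same OS thread, so any CPU-bound per-request work is serialized. This is not Ray Serve's proxy code; `handle_request`, `serve_many`, and `count_threads` are hypothetical names used only for the sketch.

```python
import asyncio
import threading


async def handle_request(request_id: int, seen_threads: set) -> int:
    # Record which OS thread is executing this "request".
    seen_threads.add(threading.get_ident())
    # Yield control so other coroutines can interleave.
    await asyncio.sleep(0)
    return request_id


async def serve_many(n_requests: int) -> int:
    seen_threads: set = set()
    await asyncio.gather(
        *(handle_request(i, seen_threads) for i in range(n_requests))
    )
    return len(seen_threads)


def count_threads(n_requests: int) -> int:
    """Run n concurrent requests on one event loop and count threads used."""
    return asyncio.run(serve_many(n_requests))


print(count_threads(400))  # 1
```

Even with 400 concurrent requests in flight, only one thread does the work, which is consistent with the saturation behavior the article describes for the default proxy.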

The second optimization targets inter-deployment communication. Previously, when one deployment called another, Ray Serve routed everything through Ray Core’s actor task system, which is useful for complex orchestration but overkill for simple request-response patterns. The new gRPC option establishes direct channels between replica actors, serializing with protobuf instead of going through Ray’s object store.
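The call pattern this optimization targets, one deployment invoking another, can be sketched with Ray Serve's deployment handles. This is a minimal sketch assuming `ray[serve]` is installed; `FeatureStore`, `Ranker`, and `build_app` are hypothetical names, and the application code itself does not choose the transport. Per the article, that is controlled by configuration.

```python
def build_app():
    """Build a two-stage Serve application.

    Imports are local so this module loads even where Ray is not installed.
    """
    from ray import serve
    from ray.serve.handle import DeploymentHandle

    @serve.deployment
    class FeatureStore:
        def lookup(self, user_id: int) -> dict:
            # Stand-in for a real feature lookup.
            return {"user_id": user_id, "clicks": user_id % 7}

    @serve.deployment
    class Ranker:
        def __init__(self, features: DeploymentHandle):
            self.features = features

        async def __call__(self, user_id: int) -> float:
            # Inter-deployment call: this hop is what the gRPC option speeds up.
            feats = await self.features.lookup.remote(user_id)
            return feats["clicks"] / 7

    return Ranker.bind(FeatureStore.bind())
```

With Ray available, something like `handle = serve.run(build_app())` followed by `handle.remote(42).result()` would exercise the two-hop path.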

Benchmarks show gRPC alone delivers 1.5x throughput improvement for unary calls and 2.4x for streaming at equivalent latency targets.

Enterprise Implications

These aren’t academic improvements. Companies running recommendation systems, real-time fraud detection, or customer-facing LLM applications have consistently hit Ray Serve’s scaling limits. The partnership with Google Kubernetes Engine that drove these optimizations suggests enterprise demand was substantial enough to prioritize the work.

A single environment variable, RAY_SERVE_USE_GRPC_BY_DEFAULT, enables the gRPC transport. HAProxy activation requires cluster-level configuration but integrates with existing Kubernetes deployments.
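One way to set the flag when launching Serve from a script is sketched below. The variable name comes from the article; the value `"1"` and the helper name `grpc_enabled_env` are assumptions, so check the Ray documentation for the accepted values and for where the variable must be set in a multi-node cluster.

```python
import os


def grpc_enabled_env(base_env=None) -> dict:
    """Return a copy of the environment with the gRPC transport flag set."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: "1" is treated as truthy by Ray Serve.
    env["RAY_SERVE_USE_GRPC_BY_DEFAULT"] = "1"
    return env


env = grpc_enabled_env()
# The environment can then be passed to whatever process starts Serve, e.g.
# subprocess.run(["serve", "run", "app:app"], env=env)  # "app:app" is a placeholder
```

Building a copy rather than mutating `os.environ` keeps the change scoped to the launched process.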

Anyscale is working toward making both optimizations the default for all inter-deployment communication, with an RFC currently under discussion. For teams already running Ray Serve in production, the upgrade path is straightforward: update to Ray 2.55+ and flip the appropriate flags.

The benchmark code is publicly available on GitHub for teams wanting to validate performance gains against their specific workloads before deploying.
