CryptoABC.net

Ray Serve Upgrade Delivers 88% Lower Latency for AI Inference at Scale

March 24, 2026
in Blockchain


Jessie A Ellis
Mar 24, 2026 16:58

Anyscale announces major Ray Serve optimizations with HAProxy and gRPC, achieving 11.1x throughput gains for LLM inference workloads on enterprise deployments.





Anyscale has shipped substantial performance upgrades to Ray Serve that slash P99 latency by up to 88% and boost throughput by 11.1x for large language model inference workloads. The improvements, available in Ray 2.55+, address scaling bottlenecks that have plagued enterprise AI deployments running latency-sensitive applications.

The upgrades center on two architectural changes: HAProxy integration for ingress traffic and direct gRPC communication between deployment replicas. Both bypass Python-based components that previously created chokepoints under heavy load.

What the Numbers Show

In benchmark testing of a deep learning recommendation model pipeline, the optimized configuration pushed throughput from 490 to 1,573 queries per second while cutting P99 latency by 75%. At 400 concurrent users, the performance gap widened dramatically as Ray Serve’s default Python proxy saturated while HAProxy continued scaling.
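To put those recommendation-model figures on a common scale, the short sketch below converts them into a speedup factor and a remaining-latency fraction. It uses only the numbers quoted above; the function names are illustrative and do not come from the benchmark code.

```python
def speedup(before_qps: float, after_qps: float) -> float:
    """Throughput ratio of the optimized run over the baseline run."""
    return after_qps / before_qps


def remaining_fraction(percent_cut: float) -> float:
    """Fraction of the baseline latency left after a percentage cut."""
    return 1.0 - percent_cut / 100.0


# Figures quoted in the article for the recommendation model pipeline.
rec_speedup = speedup(490, 1573)
p99_left = remaining_fraction(75)

print(f"{rec_speedup:.2f}x throughput, P99 at {p99_left:.0%} of baseline")
# prints: 3.21x throughput, P99 at 25% of baseline
```

In other words, the recommendation pipeline gained roughly 3.2x in throughput while its tail latency fell to a quarter of the baseline.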

For LLM inference specifically, the results were even more striking. With GPT-class models running on H100 GPUs at 256 concurrent users per replica, throughput scaled linearly with replica count under HAProxy. The default configuration could not match that, because its single Python proxy process hit a throughput ceiling.
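One way to picture why adding replicas stops helping behind a saturated proxy is a toy capacity model: cluster throughput is the smaller of what the replicas can serve and what the proxy can forward. The sketch below is only an illustration of that idea. Every number in it is hypothetical and is not taken from the Anyscale benchmarks.

```python
def cluster_qps(replicas: int, per_replica_qps: float, proxy_cap_qps: float) -> float:
    """Toy model: throughput is limited by the slower of replicas and proxy."""
    return min(replicas * per_replica_qps, proxy_cap_qps)


# Hypothetical values chosen only to make the shape of the curve visible.
PER_REPLICA_QPS = 100.0
LOW_PROXY_CAP = 250.0      # stands in for a proxy that saturates early
HIGH_PROXY_CAP = 10_000.0  # stands in for a proxy with ample headroom

for replicas in (1, 2, 4, 8):
    capped = cluster_qps(replicas, PER_REPLICA_QPS, LOW_PROXY_CAP)
    scaled = cluster_qps(replicas, PER_REPLICA_QPS, HIGH_PROXY_CAP)
    print(f"{replicas} replicas: capped proxy {capped:.0f} QPS, roomy proxy {scaled:.0f} QPS")
```

Under these made-up numbers the capped case flattens at 250 QPS from four replicas onward, while the roomy case keeps growing in proportion to replica count, which is the qualitative pattern the article describes.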

Streaming workloads saw 8.9x throughput improvements, while unary request patterns hit the full 11.1x gain.

Technical Architecture Shift

The core problem: Ray Serve’s default proxy runs on Python’s asyncio, which struggles at high concurrency. HAProxy, written in C and battle-tested across production systems globally, handles the same traffic with significantly less overhead.

The second optimization targets inter-deployment communication. Previously, when one deployment called another, Ray Serve routed everything through Ray Core’s actor task system, which is useful for complex orchestration but overkill for simple request-response patterns. The new gRPC option establishes direct channels between replica actors and serializes with protobuf instead of going through Ray’s object store.

Benchmarks show gRPC alone delivers 1.5x throughput improvement for unary calls and 2.4x for streaming at equivalent latency targets.

Enterprise Implications

These aren’t academic improvements. Companies running recommendation systems, real-time fraud detection, or customer-facing LLM applications have consistently hit Ray Serve’s scaling limits. The partnership with Google Kubernetes Engine that drove these optimizations suggests enterprise demand was substantial enough to prioritize the work.

A single environment variable, RAY_SERVE_USE_GRPC_BY_DEFAULT, enables the gRPC transport. HAProxy activation requires cluster-level configuration but integrates with existing Kubernetes deployments.
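As a rough sketch of how that flag might be supplied, the helper below builds an environment mapping with the variable set, ready to hand to whatever process launches Ray. The variable name comes from the article. The value "1" and the helper name `env_with_grpc` are assumptions, so check the Ray release notes for the accepted values before relying on it.

```python
import os
from typing import Mapping, Optional

GRPC_FLAG = "RAY_SERVE_USE_GRPC_BY_DEFAULT"  # name as given in the article


def env_with_grpc(base: Optional[Mapping[str, str]] = None, value: str = "1") -> dict:
    """Return a copy of an environment mapping with the gRPC flag set.

    The default value "1" is an assumption; the article does not state
    which values the flag accepts.
    """
    env = dict(os.environ if base is None else base)
    env[GRPC_FLAG] = value
    return env


# Build an environment from a minimal base mapping.
env = env_with_grpc({"PATH": "/usr/bin"})
print(env[GRPC_FLAG])
```

The resulting mapping can be passed as the `env` argument of `subprocess.run` for the command that starts the cluster, so the flag is in place before Ray Serve starts.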

Anyscale is working toward making both optimizations the default for all inter-deployment communication, with an RFC currently under discussion. For teams already running Ray Serve in production, the upgrade path is straightforward: update to Ray 2.55+ and flip the appropriate flags.

The benchmark code is publicly available on GitHub for teams wanting to validate performance gains against their specific workloads before deploying.

