Summary of "2025 - M. Łukasik - Monolith to Microserv. at 20M+ requests per second: Latency and Scale Challenges"
Summary of “2025 - M. Łukasik - Monolith to Microserv. at 20M+ requests per second: Latency and Scale Challenges”
Key Technological Concepts and Product Features
Context & Scale
- RTB House operates in the real-time bidding (RTB) advertising ecosystem, handling over 20 million bid requests per second globally.
- RTB auctions require extremely low latency responses (~60 ms round-trip), making performance critical.
- The bidding service is a monolithic Java application responsible for evaluating bid requests and deciding whether and how much to bid.
Motivation for Splitting the Monolith
- Memory constraints due to loading large static resources (e.g., geo IP database, user agent parsing data).
- Difficulty managing deployments and code from multiple teams in a single monolith.
- Inefficiencies in caching caused by many monolith instances each maintaining separate caches.
- Desire to implement sticky sessions and partition traffic more effectively.
Microservices Extracted
- URL Context Parser: Analyzes URL content categories (e.g., cats, sports).
- Geo IP Resolver: Maps user IP to geolocation (country, region, city) using a ~10 GB geo database.
- User Agent Analyzer: Parses user agent strings with heavy caching to meet latency demands.
Migration Design
- Bid request processing split into three phases:
- Parsing the bid request.
- Enrichment phase calling microservices for additional data.
- Evaluation phase running ML models and policies.
- Microservices communicate via gRPC (a pipeline sketch follows this list).
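The talk does not show code for this split, so the following is a hypothetical sketch only; every name here (BidPipeline, EnrichmentContext, enrich, evaluate, ...) is invented for illustration.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch of the three-phase bid pipeline; all names are invented.
final class BidPipeline {

    CompletableFuture<BidResponse> handle(byte[] rawRequest) {
        BidRequest request = parse(rawRequest);            // phase 1: parsing
        return enrich(request)                             // phase 2: enrichment (gRPC fan-out)
                .thenApply(ctx -> evaluate(request, ctx)); // phase 3: ML models and policies
    }

    // Phase 1: decode the incoming bid request payload.
    private BidRequest parse(byte[] raw) {
        return new BidRequest();
    }

    // Phase 2: fan out to the Geo IP, User Agent, and URL Context
    // microservices and join their results into one context object.
    private CompletableFuture<EnrichmentContext> enrich(BidRequest request) {
        return CompletableFuture.completedFuture(new EnrichmentContext());
    }

    // Phase 3: run ML models and bidding policies on the enriched request.
    private BidResponse evaluate(BidRequest request, EnrichmentContext ctx) {
        return new BidResponse();
    }

    static final class BidRequest {}
    static final class EnrichmentContext {}
    static final class BidResponse {}
}
```

Keeping enrichment asynchronous lets the gRPC calls to the three microservices overlap, which matters inside the ~60 ms round-trip budget.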
Performance and Scale Challenges
Load Balancing
- Centralized load balancing (HAProxy) did not scale: a pair of HAProxy instances could handle only 3-4% of the total traffic.
- Client-side load balancing was implemented instead, with a simple round-robin policy and a DNS-based service registry (see the configuration sketch after this list).
- gRPC health checking uses streaming to track backend health but can fail if a server hangs and does not report unhealthy status.
- Mitigations include external health probes and timeout handling.
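As a concrete illustration, client-side round-robin over DNS plus client-side health checking can be enabled in grpc-java roughly as below. The target name is a placeholder; the talk does not show RTB House's exact configuration.

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.Map;

final class EnricherChannel {

    // The DNS resolver acts as the service registry; round_robin spreads
    // calls across every resolved backend address.
    static ManagedChannel create() {
        return ManagedChannelBuilder
                .forTarget("dns:///geo-ip-resolver.internal:443") // placeholder name
                .defaultLoadBalancingPolicy("round_robin")
                .defaultServiceConfig(Map.of(
                        // Client-side health checking: the client keeps a
                        // Health/Watch stream per backend and routes only to
                        // backends reporting SERVING. Note the pitfall above:
                        // a hung server keeps its last reported status, so
                        // deadlines and external probes are still needed.
                        "healthCheckConfig", Map.of("serviceName", "")))
                .build();
    }
}
```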
Request Overhead
- Issuing one unary gRPC call per bid request to each microservice caused huge CPU overhead (~80% of CPU spent on overhead such as GC, thread dispatching, and TCP handling).
- Using gRPC streaming reduced overhead somewhat but was complex to maintain due to manual request-response mapping.
- Batching requests (grouping ~10 requests into one gRPC call) significantly reduced overhead and the number of required microservice instances (about a 3x reduction); see the buffer sketch after this list.
- Trade-off: batching introduces additional latency but improves CPU efficiency.
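The talk describes per-microservice batching buffers with a ticker-based flush (detailed later in this summary). Below is a hypothetical sketch of that idea: the batch size of 10 matches the figure quoted above, while the tick interval and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical batching buffer: collects requests and flushes them as one
// gRPC call either when the batch is full or on the next ticker tick.
final class BatchingBuffer<Req, Res> {
    private final int maxBatch;
    private final Function<List<Req>, CompletableFuture<List<Res>>> sendBatch; // one gRPC call per batch
    private final List<Req> pending = new ArrayList<>();
    private final List<CompletableFuture<Res>> waiters = new ArrayList<>();

    BatchingBuffer(int maxBatch, long tickMicros,
                   Function<List<Req>, CompletableFuture<List<Res>>> sendBatch) {
        this.maxBatch = maxBatch;
        this.sendBatch = sendBatch;
        // Ticker-based flush: a half-full batch never waits longer than one tick.
        ScheduledExecutorService ticker = Executors.newSingleThreadScheduledExecutor();
        ticker.scheduleAtFixedRate(this::flush, tickMicros, tickMicros, TimeUnit.MICROSECONDS);
    }

    synchronized CompletableFuture<Res> submit(Req request) {
        pending.add(request);
        CompletableFuture<Res> result = new CompletableFuture<>();
        waiters.add(result);
        if (pending.size() >= maxBatch) flush(); // size-based flush
        return result;
    }

    private synchronized void flush() {
        if (pending.isEmpty()) return;
        List<Req> batch = List.copyOf(pending);
        List<CompletableFuture<Res>> futures = List.copyOf(waiters);
        pending.clear();
        waiters.clear();
        sendBatch.apply(batch).thenAccept(responses -> {
            // Responses are assumed to be positionally aligned with requests.
            for (int i = 0; i < futures.size(); i++) {
                futures.get(i).complete(responses.get(i));
            }
        }).exceptionally(e -> {
            futures.forEach(f -> f.completeExceptionally(e));
            return null;
        });
    }
}
```

The tick interval directly expresses the trade-off mentioned above: a longer tick amortizes more overhead per call, but adds up to one tick of latency to every request in the batch.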
Latency and Garbage Collection
- Strict latency constraints (~7 ms added latency allowed).
- Initial use of G1 GC caused high tail latency (P99 ~21 ms).
- Switching to ZGC (Z Garbage Collector) reduced tail latency significantly, though average latency slightly increased and memory consumption was higher.
- ZGC is better suited to low-latency requirements but trades away some throughput and memory efficiency (see the flag sketch below).
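For reference, the switch boils down to JVM flags like the following; the pause target, heap size, and jar name are illustrative, not the talk's exact tuning.

```sh
# Illustrative flags only; the talk does not disclose its exact tuning.

# Before: G1, throughput-oriented; the pause-time target is best-effort,
# so P99 latency still suffered under load.
java -XX:+UseG1GC -XX:MaxGCPauseMillis=10 -jar microservice.jar

# After: ZGC, a concurrent collector with sub-millisecond pauses,
# at the cost of some throughput and a larger heap footprint.
java -XX:+UseZGC -Xmx24g -jar microservice.jar
```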
System Resiliency
- Implemented a cool-down period before shutting down microservice instances: an instance is first marked unhealthy, then waits for that status to propagate to clients before terminating (see the shutdown sketch after this list).
- Used gRPC hedging (sending a duplicate request shortly after the first attempt, without waiting for the timeout) to reduce tail latency and handle transient errors (see the service-config sketch after this list).
- Hedging includes throttling mechanisms (token bucket algorithm) to prevent overload during system stress.
- Servers can push back (via response metadata) to disallow further hedging when overloaded.
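A minimal sketch of such a cool-down using grpc-java's standard health service follows; the 10-second wait is an assumed value, not one stated in the talk.

```java
import io.grpc.Server;
import io.grpc.health.v1.HealthCheckResponse.ServingStatus;
import io.grpc.protobuf.services.HealthStatusManager;
import java.util.concurrent.TimeUnit;

final class GracefulShutdown {

    static void shutDown(Server server, HealthStatusManager health) throws InterruptedException {
        // 1. Report NOT_SERVING so health-checking clients stop routing
        //    new requests to this instance.
        health.setStatus("", ServingStatus.NOT_SERVING);
        // 2. Cool down: wait for the status change to propagate to all clients.
        TimeUnit.SECONDS.sleep(10);
        // 3. Stop accepting new calls, then drain in-flight ones.
        server.shutdown();
        server.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```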
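Hedging with throttling is configurable through the gRPC service config, as supported by grpc-java. The sketch below shows the general shape; the service name and all numeric values are placeholders, not the talk's settings.

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import java.util.List;
import java.util.Map;

final class HedgedChannel {

    static ManagedChannel create() {
        return ManagedChannelBuilder
                .forTarget("dns:///useragent-analyzer.internal:443") // placeholder name
                .defaultServiceConfig(Map.of(
                        "methodConfig", List.of(Map.of(
                                "name", List.of(Map.of("service", "ua.UserAgentAnalyzer")),
                                "hedgingPolicy", Map.of(
                                        // Original attempt plus up to two hedges.
                                        "maxAttempts", 3.0,
                                        // Send a hedge after 2 ms instead of
                                        // waiting for the full deadline.
                                        "hedgingDelay", "0.002s",
                                        "nonFatalStatusCodes", List.of("UNAVAILABLE")))),
                        // Token bucket shared by retries and hedges: when too many
                        // recent calls fail, hedging is throttled so a stressed
                        // system is not flooded with duplicate requests.
                        "retryThrottling", Map.of(
                                "maxTokens", 10.0,
                                "tokenRatio", 0.1)))
                .enableRetry() // hedging rides on the retry machinery
                .build();
    }
}
```

An overloaded server can additionally return the grpc-retry-pushback-ms metadata entry to tell clients to delay or stop further attempts, which is the push-back mechanism mentioned above.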
Final Architecture and Benefits
- Bid processing split into parsing, enrichment (with batching buffers per microservice), and evaluation.
- Microservices run with ZGC and use unary gRPC calls with batched requests.
- Achieved:
- ~15 GB of RAM freed in the bidding monolith.
- Improved cache efficiency due to fewer, larger caches in microservices.
- Lower CPU overhead and better resource utilization.
- Established a framework for further microservice extraction in the future.
Reviews, Guides, and Tutorials Provided
Performance Testing & Profiling
- Used production profiling with CPU flame graphs to identify overhead.
- Conducted A/B tests with artificial latency injection to find the maximum acceptable latency increase (~7 ms).
- Load tests to measure load balancer capacity and microservice instance requirements.
gRPC Usage
- Explanation of unary calls vs streaming RPC.
- Client-side load balancing with DNS-based service registry.
- gRPC health checking via streaming and its pitfalls.
- Hedging for latency and error handling with throttling.
Batching Strategy
- Detailed rationale for batching requests to microservices to reduce overhead.
- Trade-offs between batch size, latency, and throughput.
- Implementation of batching buffers with ticker-based flush mechanism.
Garbage Collector Selection
- Comparison between G1 and ZGC for low latency systems.
- Practical tuning considerations and trade-offs.
System Resiliency Approaches
- Cool-down periods for graceful shutdown.
- Hedging to mitigate latency spikes and transient errors.
- Throttling and push-back metadata to avoid cascading failures.
Main Speaker / Source
- Mateusz (Matthew) Łukasik
- Senior Software Engineer and Tech Lead at RTB House.
- Specializes in building reliable, scalable Java systems.
- Shares deep insights from hands-on experience migrating a high-scale bidding monolith to microservices.
This presentation provides a thorough technical case study on migrating a high-throughput, low-latency Java monolith to microservices, focusing on practical performance challenges and solutions including load balancing, batching, garbage collection tuning, and resiliency techniques.