Tag: continuous batching

Batched Generation in LLM Serving: How Request Scheduling Impacts Performance

Explore how batched generation and request scheduling optimize LLM serving. Learn the difference between static and continuous batching and how PagedAttention boosts GPU efficiency.

Transformer Efficiency Tricks: KV Caching and Continuous Batching in LLM Serving

KV caching and continuous batching are essential for efficient LLM serving: they cut redundant compute by up to 90% and boost throughput by 3.8x, making long-context responses feasible. Without them, deploying LLMs at scale is prohibitively expensive.