Tag: LLM inference

Transformer Efficiency Tricks: KV Caching and Continuous Batching in LLM Serving


KV caching and continuous batching are essential for efficient LLM serving: KV caching avoids recomputing attention keys and values for tokens already generated, and continuous batching keeps the GPU saturated by slotting new requests into the running batch as others finish. Together they cut compute by 90% and boost throughput 3.8x, making long-context responses feasible. Without them, deploying LLMs at scale is prohibitively expensive.
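
The compute saving comes from reusing the key/value projections of earlier tokens instead of recomputing them at every decoding step. Below is a minimal single-head sketch of that idea in NumPy; it is not code from the post, and the dimensions, toy weights, and random stand-in "token embeddings" are assumptions for illustration only.

```python
# Minimal KV-caching sketch (illustrative; d_model, weights, and inputs are toy assumptions).
import numpy as np

rng = np.random.default_rng(0)
d_model = 64
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) * 0.02 for _ in range(3))

def attend(q, K, V):
    # Scaled dot-product attention for a single new query vector against cached keys/values.
    scores = K @ q / np.sqrt(d_model)        # shape: (seq_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V                       # shape: (d_model,)

# Autoregressive decoding loop: only the newest token is projected each step;
# keys/values for earlier tokens come from the cache instead of being recomputed.
K_cache = np.empty((0, d_model))
V_cache = np.empty((0, d_model))
for step in range(8):
    x_new = rng.standard_normal(d_model)              # stand-in for the latest token's embedding
    q = W_q.T @ x_new
    K_cache = np.vstack([K_cache, (W_k.T @ x_new)[None, :]])
    V_cache = np.vstack([V_cache, (W_v.T @ x_new)[None, :]])
    # With the cache, each step does O(seq_len) attention work; without it, the whole
    # prefix would be re-projected and re-attended, roughly O(seq_len^2) per step.
    out = attend(q, K_cache, V_cache)
    print(f"step {step}: context length {len(K_cache)}")
```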

Confidential Computing for Privacy-Preserving LLM Inference: How Secure AI Works Today


Confidential computing enables secure LLM inference by protecting data and model weights inside hardware-secured enclaves. Learn how AWS, Azure, and Google implement it, what the real-world trade-offs are, and why regulated industries are adopting it now.