User Guide

Task-oriented guides for using BLIS effectively. Each guide covers a specific feature with practical CLI examples and expected output.

Guides

Guide	When to Use
Routing Policies	Choosing and configuring how requests are distributed across instances
Admission Control	Rate-limiting and traffic shaping at the cluster gateway
Scheduling & Priority	Controlling request processing order within each instance
Latency Models	Choosing among roofline (default, analytical), blackbox (data-driven), and cross-model (physics-informed) step-time estimation
KV Cache & Memory	Tuning GPU/CPU memory allocation, prefix caching, and chunked prefill
Workload Specifications	Defining multi-client traffic patterns with YAML
Cluster Simulation	Running multi-instance simulations with the full pipeline
Metrics & Results	Understanding JSON output, metrics, anomaly counters, and fitness scores
Observe / Replay / Calibrate	Validating simulator accuracy against real inference servers
Hypothesis Experimentation	Running rigorous, reproducible experiments with the `/hypothesis-experiment` skill
Skills & Plugins	Claude Code skills, plugin marketplaces, and workflow tooling

Capacity planning: Quick Start → Tutorial → Cluster Simulation → Metrics & Results

Routing optimization: Routing Policies → Cluster Simulation → Metrics & Results

Memory tuning: KV Cache & Memory → Metrics & Results

New model evaluation: Latency Models → Workload Specifications → Metrics & Results

Calibration: Latency Models → Workload Specifications → Observe / Replay / Calibrate → Metrics & Results

Research: Hypothesis Experimentation → Metrics & Results