User Guide
Task-oriented guides for using BLIS effectively. Each guide covers a specific feature with practical CLI examples and expected output.
Guides
| Guide | When to Use |
|---|---|
| Routing Policies | Choosing and configuring how requests are distributed across instances |
| Admission Control | Rate-limiting and traffic shaping at the cluster gateway |
| Scheduling & Priority | Controlling request processing order within each instance |
| Latency Models | Choosing between roofline (default, analytical), blackbox (data-driven), and cross-model (physics-informed) step-time estimation |
| KV Cache & Memory | Tuning GPU/CPU memory allocation, prefix caching, and chunked prefill |
| Workload Specifications | Defining multi-client traffic patterns with YAML |
| Cluster Simulation | Running multi-instance simulations with the full pipeline |
| Metrics & Results | Understanding JSON output, metrics, anomaly counters, and fitness scores |
| Observe / Replay / Calibrate | Validating simulator accuracy against real inference servers |
| Hypothesis Experimentation | Running rigorous, reproducible experiments with the /hypothesis-experiment skill |
| Skills & Plugins | Claude Code skills, plugin marketplaces, and workflow tooling |
Reading Paths
Capacity planning: Quick Start → Tutorial → Cluster Simulation → Metrics & Results
Routing optimization: Routing Policies → Cluster Simulation → Metrics & Results
Memory tuning: KV Cache & Memory → Metrics & Results
New model evaluation: Latency Models → Workload Specifications → Metrics & Results
Calibration: Latency Models → Workload Specifications → Observe / Replay / Calibrate → Metrics & Results
Research: Hypothesis Experimentation → Metrics & Results