User Guide

Task-oriented guides for using BLIS effectively. Each guide covers a specific feature with practical CLI examples and expected output.

Guides

Guide: When to Use
Routing Policies: Choosing and configuring how requests are distributed across instances
Admission Control: Rate-limiting and traffic shaping at the cluster gateway
Scheduling & Priority: Controlling request processing order within each instance
Latency Models: Choosing between roofline (default, analytical), blackbox (data-driven), and cross-model (physics-informed) step time estimation
KV Cache & Memory: Tuning GPU/CPU memory allocation, prefix caching, and chunked prefill
Workload Specifications: Defining multi-client traffic patterns with YAML
Cluster Simulation: Running multi-instance simulations with the full pipeline
Metrics & Results: Understanding JSON output, metrics, anomaly counters, and fitness scores
Observe / Replay / Calibrate: Validating simulator accuracy against real inference servers
Hypothesis Experimentation: Running rigorous, reproducible experiments with the /hypothesis-experiment skill
Skills & Plugins: Claude Code skills, plugin marketplaces, and workflow tooling

Reading Paths

Capacity planning: Quick Start → Tutorial → Cluster Simulation → Metrics & Results

Routing optimization: Routing Policies → Cluster Simulation → Metrics & Results

Memory tuning: KV Cache & Memory → Metrics & Results

New model evaluation: Latency Models → Workload Specifications → Metrics & Results

Calibration: Latency Models → Workload Specifications → Observe / Replay / Calibrate → Metrics & Results

Research: Hypothesis Experimentation → Metrics & Results