Concepts
These pages explain BLIS's architecture and core mechanisms at the level needed to understand the system without reading source code. For task-oriented how-to guides, see the User Guide.
Concept Pages
| Page | Description |
|---|---|
| Glossary | Definitions of BLIS-specific terminology |
| Cluster Architecture | Multi-instance simulation: admission, routing, scorer composition, snapshot freshness, shared-clock event loop |
| Core Engine | Single-instance discrete-event simulation (DES) engine: event queue, Step() phases, request lifecycle, batch formation, KV cache, latency models |
| Roofline Estimation | Analytical GPU step time estimation without training data |
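As a rough orientation for the Core Engine page, the following is a minimal sketch of a discrete-event loop built on a min-heap event queue. It is not BLIS's code: the function `run_events`, the event kinds `"arrive"` and `"finish"`, and the single-server FIFO model with a fixed `service_time` are all assumptions chosen to keep the example small.

```python
import heapq
from itertools import count


def run_events(arrivals, service_time):
    """Simulate one FIFO server; return {request_id: finish_time}.

    arrivals: list of (arrival_time, request_id) pairs (hypothetical input shape).
    service_time: fixed processing time per request (an assumption of this sketch).
    """
    seq = count()  # tie-breaker so events at equal times pop in insertion order
    queue = [(t, next(seq), "arrive", rid) for t, rid in arrivals]
    heapq.heapify(queue)  # min-heap ordered by event time

    clock = 0.0
    busy_until = 0.0  # time at which the server becomes free
    finished = {}

    while queue:
        # The clock jumps directly to the next event's timestamp.
        clock, _, kind, rid = heapq.heappop(queue)
        if kind == "arrive":
            start = max(clock, busy_until)  # wait if the server is busy
            busy_until = start + service_time
            heapq.heappush(queue, (busy_until, next(seq), "finish", rid))
        else:  # "finish"
            finished[rid] = clock
    return finished


if __name__ == "__main__":
    print(run_events([(0.0, "a"), (1.0, "b"), (5.0, "c")], service_time=2.0))
```

The point of the sketch is that simulated time advances only when an event is popped, so idle periods cost nothing to simulate.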
Diagrams
The concept pages include the following inline Mermaid diagrams:
- Cluster Data Flow — End-to-end cluster pipeline: request arrival through metrics output
- Request Lifecycle — Request state machine: states, transitions, and metric recording points
- Event Processing Loop — DES event loop: min-heap queue, clock advancement, Step() decomposition
- Scoring Pipeline — Weighted scorer composition: per-scorer normalization, weight multiplication, argmax selection
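The scoring pipeline named in the last item (per-scorer normalization, weight multiplication, argmax selection) can be illustrated with a short sketch. This is a hedged example, not the BLIS implementation: the function `pick_instance`, the scorer names `free_kv` and `short_queue`, and the choice of min-max normalization are assumptions made for illustration. The Cluster Architecture page describes the actual scorers and normalization.

```python
def pick_instance(raw_scores, weights):
    """Combine per-scorer scores and return (best_instance, totals).

    raw_scores: {scorer_name: {instance_id: raw_score}}, higher is better.
    weights:    {scorer_name: weight}.
    Both shapes are hypothetical, chosen for this sketch only.
    """
    totals = {}
    for name, per_instance in raw_scores.items():
        lo = min(per_instance.values())
        hi = max(per_instance.values())
        span = hi - lo
        for inst, raw in per_instance.items():
            # Min-max normalization to [0, 1] (an assumed normalization scheme).
            # If all instances tie, this scorer contributes nothing.
            norm = (raw - lo) / span if span else 0.0
            totals[inst] = totals.get(inst, 0.0) + weights[name] * norm
    # Argmax over the weighted sums.
    best = max(totals, key=totals.get)
    return best, totals


if __name__ == "__main__":
    raw = {
        "free_kv": {"i0": 10, "i1": 30, "i2": 20},
        "short_queue": {"i0": 5, "i1": 1, "i2": 3},
    }
    print(pick_instance(raw, {"free_kv": 2.0, "short_queue": 1.0}))
```

Normalizing each scorer before weighting keeps scorers with large raw ranges from dominating the sum, so the weights alone express relative importance.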
Reading Order
Newcomers to BLIS should read the pages in this order:
- Start with Glossary to learn BLIS-specific terminology
- Read Core Engine to understand the DES architecture and single-instance simulation
- Read Cluster Architecture to understand multi-instance orchestration
- Consult Configuration Reference when running experiments
- See Extension Recipes when adding new policies or features