Concepts

These pages explain BLIS's architecture and core mechanisms at the level needed to understand the system without reading source code. For task-oriented how-to guides, see the User Guide.

Concept Pages

Page | Description
Glossary | Definitions of BLIS-specific terminology
Cluster Architecture | Multi-instance simulation: admission, routing, scorer composition, snapshot freshness, shared-clock event loop
Core Engine | Single-instance DES engine: event queue, Step() phases, request lifecycle, batch formation, KV cache, latency models
Roofline Estimation | Analytical GPU step time estimation without training data
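
For readers new to discrete-event simulation (DES), the event-queue idea behind the Core Engine can be illustrated with a minimal, generic sketch. This is not BLIS code: the `Event` and `Simulator` classes and the `arrival` and `completion` handlers are hypothetical names chosen for illustration, and the sketch only shows the general pattern of a min-heap event queue with clock advancement.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass(order=True)
class Event:
    time: float                      # simulated timestamp; primary sort key
    seq: int                         # tie-breaker so equal-time events stay FIFO
    action: Callable[["Simulator"], None] = field(compare=False)


class Simulator:
    """Hypothetical single-instance DES loop (illustrative only, not the BLIS API)."""

    def __init__(self) -> None:
        self.clock = 0.0
        self.log: List[Tuple[str, float]] = []
        self._queue: List[Event] = []
        self._seq = 0

    def schedule(self, delay: float, action: Callable[["Simulator"], None]) -> None:
        # Push an event that fires `delay` time units after the current clock.
        heapq.heappush(self._queue, Event(self.clock + delay, self._seq, action))
        self._seq += 1

    def run(self) -> None:
        # Pop the earliest event, jump the clock to its timestamp, then handle it.
        while self._queue:
            event = heapq.heappop(self._queue)
            self.clock = event.time
            event.action(self)


def arrival(sim: Simulator) -> None:
    sim.log.append(("arrival", sim.clock))
    sim.schedule(2.0, completion)    # assumed fixed service time of 2.0 units


def completion(sim: Simulator) -> None:
    sim.log.append(("completion", sim.clock))


def run_demo() -> List[Tuple[str, float]]:
    sim = Simulator()
    sim.schedule(1.0, arrival)
    sim.schedule(0.5, arrival)       # scheduled later, but fires first
    sim.run()
    return sim.log
```

The clock never ticks in fixed increments; it jumps directly to the timestamp of the next event, which is what makes the min-heap the natural queue structure.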

Diagrams

The concept pages include the following inline Mermaid diagrams:

  • Cluster Data Flow — End-to-end cluster pipeline: request arrival through metrics output
  • Request Lifecycle — Request state machine: states, transitions, and metric recording points
  • Event Processing Loop — DES event loop: min-heap queue, clock advancement, Step() decomposition
  • Scoring Pipeline — Weighted scorer composition: per-scorer normalization, weight multiplication, argmax selection
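
The three stages named for the Scoring Pipeline (per-scorer normalization, weight multiplication, argmax selection) can be sketched as follows. This is an illustration under stated assumptions, not the BLIS implementation: min-max normalization is assumed because the page does not say which normalization is used, higher raw scores are assumed to be better, and the function and scorer names are hypothetical.

```python
from typing import Dict, List


def min_max_normalize(raw: List[float]) -> List[float]:
    """Scale one scorer's raw values to [0, 1] (assumed normalization scheme)."""
    lo, hi = min(raw), max(raw)
    if hi == lo:
        # All instances look identical to this scorer, so it contributes nothing.
        return [0.0 for _ in raw]
    return [(x - lo) / (hi - lo) for x in raw]


def pick_instance(raw_scores: Dict[str, List[float]],
                  weights: Dict[str, float]) -> int:
    """Return the index of the instance with the highest weighted score.

    raw_scores maps a scorer name to one raw score per instance.
    weights maps the same scorer names to their weights.
    """
    n = len(next(iter(raw_scores.values())))
    totals = [0.0] * n
    for name, raw in raw_scores.items():
        normalized = min_max_normalize(raw)          # per-scorer normalization
        for i, score in enumerate(normalized):
            totals[i] += weights[name] * score       # weight multiplication
    # argmax selection; ties resolve to the lowest index
    return max(range(n), key=totals.__getitem__)


# Hypothetical scorers over three instances.
scores = {
    "free_queue_slots": [2.0, 8.0, 5.0],
    "cache_hits": [9.0, 1.0, 5.0],
}
cache_heavy = pick_instance(scores, {"free_queue_slots": 1.0, "cache_hits": 2.0})
queue_heavy = pick_instance(scores, {"free_queue_slots": 3.0, "cache_hits": 1.0})
```

Normalizing each scorer before weighting keeps scorers with large raw ranges from dominating the sum, so the weights alone express relative importance.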

Reading Order

For newcomers to BLIS:

  1. Start with Glossary to learn BLIS-specific terminology
  2. Read Core Engine to understand the DES architecture and single-instance simulation
  3. Read Cluster Architecture to understand multi-instance orchestration
  4. Consult Configuration Reference when running experiments
  5. See Extension Recipes when adding new policies or features