Installation¶
Prerequisites¶
- Go 1.21+ — [Download Go](https://go.dev/dl/)
- Git — for cloning the repository
Build from Source¶
git clone https://github.com/inference-sim/inference-sim.git
cd inference-sim
go build -o blis main.go
Environment Setup¶
BLIS uses roofline mode by default, which auto-fetches model architecture configs from HuggingFace. Set HF_TOKEN to access gated models (e.g., Llama-2) and avoid rate limits:
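For example (the token value below is a placeholder; substitute your own HuggingFace access token):

```shell
# Placeholder value -- replace with your actual HuggingFace access token.
export HF_TOKEN="hf_your_token_here"
```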
Public models (e.g., Qwen3) work without a token. See HuggingFace access tokens to create a token.
Air-gapped / offline environments
The default roofline mode requires network access to HuggingFace on first run (configs are cached in model_configs/ after that). For environments without internet access:
- Use blackbox mode: `./blis run --model <name> --latency-model blackbox` (uses pre-trained coefficients from `defaults.yaml`, no network needed)
- Or pre-populate `model_configs/<model>/config.json` and use `--model-config-folder`
For CI pipelines, set HF_TOKEN in your environment secrets to avoid rate limits on gated models.
Verify the Build¶
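As a quick smoke test, you can reuse the blackbox invocation from the offline section above, which needs no network access (`<name>` is a placeholder for a model name, e.g. a Qwen3 variant):

```shell
# Run a simulation in blackbox mode; no HuggingFace access required.
./blis run --model <name> --latency-model blackbox
```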
You should see JSON output on stdout containing fields like ttft_mean_ms, e2e_mean_ms, and responses_per_sec. This confirms BLIS is working correctly.
Optional: Local Documentation¶
To preview the documentation site locally:
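A common setup, assuming the site is built with MkDocs (whose development server defaults to port 8000; check the repository for a `mkdocs.yml` to confirm):

```shell
# Install MkDocs and serve the docs locally (run from the repository root).
pip install mkdocs
mkdocs serve
```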
Then open http://localhost:8000.
Optional: Linter¶
For contributors, install the linter used in CI:
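For example, if CI uses golangci-lint (an assumption; check the repository's CI workflow for the exact tool and pinned version):

```shell
# Install golangci-lint (assumed CI linter; confirm the version against CI config).
go install github.com/golangci/golangci-lint/cmd/golangci-lint@v1.61.0
golangci-lint run
```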
What's Next¶
- Quick Start — Run your first simulation and understand the output
- Tutorial: Capacity Planning — Complete walkthrough of a capacity planning exercise