Installation¶
Prerequisites¶
- Go 1.21+ — [Download Go](https://go.dev/dl/)
- Git — for cloning the repository
Build from Source¶
git clone https://github.com/inference-sim/inference-sim.git
cd inference-sim
go build -o blis main.go
Environment Setup¶
BLIS uses roofline mode by default, which auto-fetches model architecture configs from HuggingFace. Set HF_TOKEN to access gated models (e.g., Llama-2) and avoid rate limits:
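For example (the token value below is a placeholder; substitute your own HuggingFace access token):

```shell
# Placeholder value -- replace with your actual HuggingFace access token.
export HF_TOKEN="hf_your_token_here"
```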
Public models (e.g., Qwen3) work without a token. See HuggingFace access tokens to create a token.
Air-gapped / offline environments
The default roofline mode requires network access to HuggingFace on first run (configs are cached in model_configs/ after that). For environments without internet access:
- Use blackbox mode: `./blis run --model <name> --latency-model blackbox` (uses pre-trained coefficients from `defaults.yaml`, no network needed)
- Or pre-populate `model_configs/<model>/config.json` and use `--model-config-folder`
For CI pipelines, set HF_TOKEN in your environment secrets to avoid rate limits on gated models.
Verify the Build¶
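As a quick smoke test, you can reuse the blackbox invocation from the offline section above, which needs no network access (`<name>` is a placeholder for a model name, e.g. a Qwen3 variant):

```shell
# Run a simulation in blackbox mode; no HuggingFace access required.
./blis run --model <name> --latency-model blackbox
```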
You should see JSON output on stdout containing fields like ttft_mean_ms, e2e_mean_ms, and responses_per_sec. This confirms BLIS is working correctly.
Optional: Local Documentation¶
To preview the documentation site locally:
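A common setup, assuming the site is built with MkDocs (whose development server defaults to port 8000; check the repository for a `mkdocs.yml` to confirm):

```shell
# Install MkDocs and serve the docs locally (run from the repository root).
pip install mkdocs
mkdocs serve
```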
Then open http://localhost:8000.
Optional: Linter¶
For contributors, install the linter used in CI:
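For example, if CI uses golangci-lint (an assumption; check the repository's CI workflow for the exact tool and pinned version):

```shell
# Install golangci-lint (assumed CI linter; confirm the version against CI config).
go install github.com/golangci/golangci-lint/cmd/golangci-lint@v1.61.0
golangci-lint run
```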
What's Next¶
- Quick Start — Run your first simulation and understand the output
- Tutorial: Capacity Planning — Complete walkthrough of a capacity planning exercise