Workload Spec Schema

Complete YAML schema reference for BLIS workload specifications (--workload-spec). For a guide-level introduction, see Workload Specifications.

Top-Level Fields

Field	Type	Required	Description
`version`	string	No	Schema version (`"2"` recommended; `"1"` auto-upgraded)
`seed`	int64	No	RNG seed (overridden by CLI `--seed` if set)
`category`	string	No	`language`, `multimodal`, `reasoning`, or empty
`aggregate_rate`	float64	Yes	Total arrival rate in requests/second. `0` selects absolute rate mode (see Lifecycle Normalization)
`num_requests`	int64	No	Total requests to generate (0 = unlimited, use horizon)
`horizon`	int64	No	Simulation time limit in ticks (overridden by CLI `--horizon` if set)
`clients`	list	Yes*	Client specifications (see below)
`cohorts`	list	No	Cohort specifications with population dynamics (diurnal, spike, drain patterns)
`servegen_data`	object	No	Native ServeGen data file loading
`inference_perf`	object	No	inference-perf format compatibility

*At least one client, cohort, or servegen_data is required.

Client Specification

Each entry in the clients list defines a traffic source:

Field	Type	Required	Description
`id`	string	No	Client identifier (for metrics grouping)
`tenant_id`	string	No	Tenant identifier
`slo_class`	string	No	SLO tier: `critical`, `standard`, `sheddable`, `batch`, `background`, or empty
`model`	string	No	Model name override (for multi-model workloads)
`rate_fraction`	float64	Yes	Fraction of `aggregate_rate` for this client (must be positive). When lifecycle windows are present, fractions are normalized per-phase (see Lifecycle Normalization)
`arrival`	object	Yes	Arrival process configuration
`input_distribution`	object	Yes	Input token length distribution
`output_distribution`	object	Yes	Output token length distribution
`prefix_group`	string	No	Prefix group name (requests in same group share prefixes)
`prefix_length`	int	No	Shared prefix token count (additive to input_distribution)
`streaming`	bool	No	Whether to simulate streaming output
`network`	object	No	Client-side network characteristics
`lifecycle`	object	No	Activity window configuration
`multimodal`	object	No	Multimodal token generation
`reasoning`	object	No	Reasoning multi-turn behavior
`timeout`	int64	No	Per-request timeout in µs. Omitted (nil) = default (300s for sessions). 0 = no timeout
`slo_target_us`	int64	No	Per-request SLO TTFT target in µs. nil/0 = no target. Used by `--dispatch-order slo-deadline`
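
As an illustration of how the optional client fields fit together, the fragment below sketches a single entry of the clients list. All identifiers and numeric values are hypothetical and chosen only to show units; the field names come from the table above and from the Network Specification section.

```yaml
clients:
  - id: "rag-critical"                # hypothetical identifier
    tenant_id: "tenant-a"             # hypothetical tenant
    slo_class: "critical"
    rate_fraction: 1.0
    streaming: true
    arrival:
      process: poisson
    input_distribution:
      type: constant
      params:
        value: 256
    output_distribution:
      type: constant
      params:
        value: 64
    prefix_group: "system-prompt-1"   # hypothetical group name
    prefix_length: 512                # additive: 512 shared + 256 sampled input tokens
    timeout: 60000000                 # 60 s expressed in µs
    slo_target_us: 200000             # 200 ms TTFT target expressed in µs
    network:
      rtt_ms: 20
      bandwidth_mbps: 100
```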

Arrival Process

Field	Type	Values	Description
`process`	string	`poisson`, `gamma`, `weibull`, `constant`	Inter-arrival time distribution
`cv`	*float64	Required for `gamma` and `weibull`	Coefficient of variation (burstiness). CV > 1 = bursty, CV < 1 = regular
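
A minimal sketch of two arrival blocks, shown as separate YAML documents: one that needs no cv and one that does. The cv values are illustrative only.

```yaml
# Poisson arrivals: cv is not required.
arrival:
  process: poisson
---
# Gamma arrivals: cv is required; 2.5 > 1 gives bursty traffic.
arrival:
  process: gamma
  cv: 2.5
```

A weibull block takes the same shape as the gamma one, with cv restricted to the range listed under Validation.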

Distribution Specification

Used for input_distribution and output_distribution:

Field	Type	Description
`type`	string	`gaussian`, `exponential`, `pareto_lognormal`, `lognormal`, `constant`, `empirical`
`params`	map	Type-specific parameters (see below)
`file`	string	Reserved for future use (file-based loading not yet implemented). Use inline `params` instead.

Distribution Parameters

Type	Parameters
`gaussian`	`mean`, `std_dev`, `min`, `max`
`exponential`	`mean`
`pareto_lognormal`	`alpha`, `xm`, `mu`, `sigma`, `mix_weight`
`lognormal`	`mu`, `sigma` (mean and standard deviation of log-transformed values; fitted via method of moments)
`constant`	`value`
`empirical`	inline `params` map (key=token count, value=probability)
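
The fragment below sketches how the parameter names map onto params for three of the types. Values are illustrative. Quoting the empirical keys as strings is an assumption (the table only states that keys are token counts and values are probabilities), and the probabilities here are chosen to sum to 1.

```yaml
# Log-space parameters: exp(mu) = exp(5.0) ≈ 148 tokens is the median.
input_distribution:
  type: lognormal
  params:
    mu: 5.0
    sigma: 0.8

# Inline empirical distribution: token count -> probability.
output_distribution:
  type: empirical
  params:
    "64": 0.5
    "128": 0.3
    "256": 0.2
---
# Bounded gaussian, as used in the Complete Example.
input_distribution:
  type: gaussian
  params:
    mean: 128
    std_dev: 30
    min: 32
    max: 512
```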

Network Specification

Field	Type	Description
`rtt_ms`	float64	Round-trip time in milliseconds
`bandwidth_mbps`	float64	Bandwidth in Mbps

Reasoning Specification

Field	Type	Description
`reason_ratio_distribution`	DistSpec	Distribution of reasoning-to-output ratio
`multi_turn`	object	Multi-turn conversation configuration
`multi_turn.max_rounds`	int	Maximum conversation rounds
`multi_turn.think_time_us`	int64	User think time between rounds (microseconds)
`multi_turn.context_growth`	string	`accumulate` (prepend prior context) or empty (fixed-length)
`multi_turn.single_session`	bool	If true, each client creates exactly one session instead of spawning new sessions per arrival. Used by inference-perf multi-turn expansion. Default: false

Cohort Specification

Each entry in the cohorts list defines a population with lifecycle dynamics. Cohorts expand into individual clients with lifecycle windows derived from diurnal, spike, or drain patterns.

Field	Type	Required	Description
`id`	string	No	Cohort identifier
`population`	int	Yes	Number of clients in this cohort (max 100,000)
`tenant_id`	string	No	Tenant identifier
`slo_class`	string	No	SLO tier: `critical`, `standard`, `sheddable`, `batch`, `background`
`model`	string	No	Model name override
`arrival`	object	Yes	Arrival process configuration (same as Client)
`input_distribution`	object	Yes	Input token length distribution
`output_distribution`	object	Yes	Output token length distribution
`prefix_group`	string	No	Prefix group name
`streaming`	bool	No	Whether to simulate streaming output
`rate_fraction`	float64	Yes	Fraction of `aggregate_rate` for each client in this cohort
`diurnal`	object	No	Sinusoidal rate modulation (see below)
`spike`	object	No	Traffic spike configuration (see below)
`drain`	object	No	Linear ramp-down to zero (see below)
`timeout`	int64	No	Per-request timeout in µs (same as Client)
`slo_target_us`	int64	No	Per-request SLO TTFT target in µs (same as Client)

Diurnal Pattern

Field	Type	Description
`peak_hour`	int	Hour of peak traffic (0-23)
`peak_to_trough_ratio`	float64	Ratio of peak to trough rate (≥ 1.0)

Spike Pattern

Field	Type	Description
`start_time_us`	int64	Spike start time in microseconds
`duration_us`	int64	Spike duration in microseconds
`trace_rate`	float64	Cohort-level arrival rate in req/s (required when `aggregate_rate: 0`); divided evenly across population members

Drain Pattern

Field	Type	Description
`start_time_us`	int64	Drain start time in microseconds
`ramp_duration_us`	int64	Ramp-down duration in microseconds
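
To show the three patterns side by side, here is a sketch of a cohorts list with one cohort per pattern. Identifiers, populations, rates, and times are hypothetical. Whether several patterns can be combined on one cohort is not covered here, so each cohort uses exactly one.

```yaml
cohorts:
  - id: "daytime-users"              # hypothetical
    population: 50
    slo_class: "standard"
    rate_fraction: 0.01
    arrival: {process: poisson}
    input_distribution: {type: exponential, params: {mean: 200}}
    output_distribution: {type: exponential, params: {mean: 100}}
    diurnal:
      peak_hour: 14                  # peak at 14:00
      peak_to_trough_ratio: 3.0      # must be >= 1.0

  - id: "launch-spike"               # hypothetical
    population: 20
    slo_class: "sheddable"
    rate_fraction: 0.01
    arrival: {process: gamma, cv: 2.0}
    input_distribution: {type: constant, params: {value: 128}}
    output_distribution: {type: constant, params: {value: 64}}
    spike:
      start_time_us: 120000000       # starts at 120 s
      duration_us: 30000000          # lasts 30 s

  - id: "batch-drain"                # hypothetical
    population: 10
    slo_class: "batch"
    rate_fraction: 0.01
    arrival: {process: constant}
    input_distribution: {type: constant, params: {value: 1024}}
    output_distribution: {type: constant, params: {value: 256}}
    drain:
      start_time_us: 600000000       # ramp-down begins at 10 min
      ramp_duration_us: 300000000    # reaches zero 5 min later
```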

Lifecycle Specification

Activity window configuration for clients (used in the lifecycle field of Client Specification). Cohort patterns (diurnal, spike, drain) are converted into lifecycle windows internally.

Field	Type	Description
`windows`	list	List of active time windows

Active Window

Field	Type	Description
`start_us`	int64	Window start time in microseconds
`end_us`	int64	Window end time in microseconds
`trace_rate`	float64	Per-window rate override. In absolute rate mode (`aggregate_rate: 0`), this is the absolute arrival rate (req/s). In proportional mode, this is a weight for rate allocation.
`arrival`	ArrivalSpec	Per-window arrival process override (overrides client-level `arrival`)
`input_distribution`	DistSpec	Per-window input token distribution override (overrides client-level `input_distribution`)
`output_distribution`	DistSpec	Per-window output token distribution override (overrides client-level `output_distribution`)
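
A sketch of a client with two consecutive windows in absolute rate mode, where the second window also overrides the arrival process. The identifier, times, and rates are hypothetical. rate_fraction is kept because the client table lists it as required; its effect in this mode is not described here.

```yaml
aggregate_rate: 0                    # absolute rate mode
clients:
  - id: "trace-replay"               # hypothetical
    rate_fraction: 1.0
    arrival: {process: poisson}
    input_distribution: {type: constant, params: {value: 256}}
    output_distribution: {type: constant, params: {value: 64}}
    lifecycle:
      windows:
        - start_us: 0
          end_us: 60000000           # 0 s to 60 s
          trace_rate: 5.0            # 5 req/s, used directly
        - start_us: 60000000
          end_us: 120000000          # 60 s to 120 s
          trace_rate: 20.0           # 20 req/s, used directly
          arrival:                   # overrides the client-level poisson process
            process: gamma
            cv: 2.0
```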

Lifecycle Normalization

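Proportional mode (aggregate_rate > 0): When clients have lifecycle windows, rate_fraction values are normalized per-phase rather than globally. For each client, the simulator sums the rate_fraction of all co-active clients (the client itself plus those whose lifecycle windows overlap with its own), divides the client's rate_fraction by that sum, and multiplies the result by aggregate_rate. This ensures aggregate_rate is achieved during every active phase, subject to the limitation on always-on clients noted at the end of this section.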

Absolute rate mode (aggregate_rate = 0): Each window's trace_rate is used directly as the arrival rate (requests/second) for that window, without scaling or normalization. This mode preserves time-varying aggregate load patterns from traces (e.g., ServeGen workloads) where the aggregate rate varies over time. Validation requires all rate-based clients to have explicit trace_rate on every window.

Clients without lifecycle windows are "always-on" and are counted as co-active with every phase.

Example: a two-phase workload with aggregate_rate set to 40:

Phase 1 (0–50s): clients A (rate_fraction: 0.7) and B (rate_fraction: 0.3)
Phase 2 (50–100s): client C (rate_fraction: 1.0)

Each phase's fractions are normalized independently: A gets 40 × 0.7/1.0 = 28 req/s, B gets 40 × 0.3/1.0 = 12 req/s, C gets 40 × 1.0/1.0 = 40 req/s. Both phases produce the full 40 req/s.

Without per-phase normalization, the global sum would be 2.0, and every client's rate would be halved.
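
The two-phase example above could be written roughly as follows. Only the rate fractions and phase boundaries come from the example; the arrival processes and token distributions are placeholders added to satisfy the required client fields.

```yaml
aggregate_rate: 40
clients:
  - id: "A"
    rate_fraction: 0.7               # 40 × 0.7 / (0.7 + 0.3) = 28 req/s
    arrival: {process: poisson}
    input_distribution: {type: constant, params: {value: 128}}
    output_distribution: {type: constant, params: {value: 64}}
    lifecycle:
      windows:
        - start_us: 0
          end_us: 50000000           # phase 1: 0 s to 50 s
  - id: "B"
    rate_fraction: 0.3               # 40 × 0.3 / (0.7 + 0.3) = 12 req/s
    arrival: {process: poisson}
    input_distribution: {type: constant, params: {value: 128}}
    output_distribution: {type: constant, params: {value: 64}}
    lifecycle:
      windows:
        - start_us: 0
          end_us: 50000000           # phase 1: 0 s to 50 s
  - id: "C"
    rate_fraction: 1.0               # 40 × 1.0 / 1.0 = 40 req/s
    arrival: {process: poisson}
    input_distribution: {type: constant, params: {value: 128}}
    output_distribution: {type: constant, params: {value: 64}}
    lifecycle:
      windows:
        - start_us: 50000000
          end_us: 100000000          # phase 2: 50 s to 100 s
```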

Limitation: Always-on clients compute a single rate using co-active sums across all phases they overlap with. When an always-on client coexists with multiple non-overlapping phased clients, per-phase totals may be less than aggregate_rate. For predictable results, use either all-phased or all-always-on clients.

Multimodal Specification

Configures multimodal request generation (used in the multimodal field of Client Specification). Each distribution follows the same Distribution Specification format.

Field	Type	Description
`text_distribution`	DistSpec	Text token distribution
`image_distribution`	DistSpec	Image token distribution
`image_count_distribution`	DistSpec	Number of images per request
`audio_distribution`	DistSpec	Audio token distribution
`audio_count_distribution`	DistSpec	Number of audio segments per request
`video_distribution`	DistSpec	Video token distribution
`video_count_distribution`	DistSpec	Number of video segments per request
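
A minimal sketch of a multimodal block for a text-plus-image client. It nests under a client entry, and the token counts are illustrative only. How these distributions interact with the client's input_distribution is not described here, so only the multimodal block itself is shown.

```yaml
multimodal:
  text_distribution:
    type: gaussian
    params:
      mean: 128
      std_dev: 32
      min: 16
      max: 512
  image_distribution:                # tokens per image
    type: constant
    params:
      value: 576
  image_count_distribution:          # images per request
    type: constant
    params:
      value: 2
```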

ServeGen Data Specification

Native ServeGen data file loading (used in the servegen_data top-level field):

Field	Type	Required	Description
`path`	string	Yes	Path to ServeGen data directory (containing `chunk-*-trace.csv` and `dataset.json`)
`time_window`	string	No	Temporal snapshot extraction: `midnight` (0:00-0:30), `morning` (8:00-8:30), or `afternoon` (14:00-14:30). Filters chunks to the specified 30-minute window.
`span_start`	int64	No	Trace span start filter (microseconds)
`span_end`	int64	No	Trace span end filter (microseconds)
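
Two alternative servegen_data blocks, shown as separate YAML documents: one using a named time_window and one using explicit span filters. The directory path is hypothetical. Whether time_window and the span filters can be combined is not stated here, so they are shown separately.

```yaml
# Named 30-minute snapshot (8:00-8:30).
servegen_data:
  path: "./servegen-data"            # hypothetical directory
  time_window: morning
---
# Explicit span filter: the first 30 minutes of the trace, in µs.
servegen_data:
  path: "./servegen-data"            # hypothetical directory
  span_start: 0
  span_end: 1800000000
```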

InferencePerf Specification

inference-perf format compatibility (used in the inference_perf top-level field):

Field	Type	Required	Description
`stages`	list	Yes	Rate/duration stages for load patterns
`shared_prefix`	object	Yes	Shared prefix expansion configuration

Stage

Field	Type	Description
`rate`	float64	Requests per second for this stage
`duration`	int64	Stage duration in seconds (note: unlike other time fields which use microseconds, this field uses seconds)

Shared Prefix

Field	Type	Description
`num_unique_system_prompts`	int	Number of unique system prompts
`num_users_per_system_prompt`	int	Users per system prompt
`system_prompt_len`	int	System prompt length in tokens
`question_len`	int	Question length in tokens
`output_len`	int	Output length in tokens
`enable_multi_turn_chat`	bool	When true, maps to BLIS reasoning.multi_turn with SingleSession mode and fixed-length inputs (no context accumulation). Computes MaxRounds and ThinkTimeUs from stage parameters. See #514.
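
Putting the Stage and Shared Prefix fields together, an inference_perf block might look like the sketch below. All values are illustrative. Note that duration is in seconds.

```yaml
inference_perf:
  stages:
    - rate: 5.0                      # 5 req/s
      duration: 60                   # for 60 seconds
    - rate: 20.0                     # 20 req/s
      duration: 120                  # for 120 seconds
  shared_prefix:
    num_unique_system_prompts: 4
    num_users_per_system_prompt: 8
    system_prompt_len: 512
    question_len: 128
    output_len: 64
    enable_multi_turn_chat: false
```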

Complete Example

version: "2"
seed: 42
category: reasoning
aggregate_rate: 500.0
num_requests: 500

clients:
  - id: "multi-turn-chat"
    tenant_id: "chat-users"
    slo_class: "standard"
    rate_fraction: 1.0
    streaming: true
    arrival:
      process: poisson
    input_distribution:
      type: gaussian
      params:
        mean: 128
        std_dev: 30
        min: 32
        max: 512
    output_distribution:
      type: gaussian
      params:
        mean: 64
        std_dev: 20
        min: 16
        max: 256
    reasoning:
      reason_ratio_distribution:
        type: gaussian
        params:
          mean: 0
          std_dev: 0
          min: 0
          max: 0
      multi_turn:
        max_rounds: 5
        think_time_us: 500000
        context_growth: accumulate

Validation

BLIS validates workload specs with strict YAML parsing (KnownFields(true)), so typos in field names cause errors. Additional validation:

aggregate_rate must be positive, or 0 to select absolute rate mode (see Lifecycle Normalization)
Each client's rate_fraction must be positive
arrival.process must be one of the valid processes
cv for gamma/weibull must be finite and positive
Weibull cv must be in [0.01, 10.4]
Distribution types must be recognized
All numeric params must be finite (no NaN or Inf)
At least one client, cohort, or servegen_data is required
Cohort population must be positive and ≤ 100,000
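
For illustration, the deliberately invalid spec below breaks several of the rules above. Each comment names the rule it is expected to violate. The exact error messages are not shown because they are not documented here, and the identifiers are hypothetical.

```yaml
version: "2"
aggregate_rate: 10
clients:
  - id: "bad-client"                 # hypothetical
    rate_fracton: 1.0                # typo: unknown field under strict parsing
    arrival:
      process: weibull
      cv: 12.0                       # outside [0.01, 10.4] for weibull
    input_distribution:
      type: uniform                  # not a recognized distribution type
      params: {}
    output_distribution:
      type: exponential
      params:
        mean: .inf                   # numeric params must be finite
cohorts:
  - id: "too-big"                    # hypothetical
    population: 200000               # exceeds the 100,000 limit
    rate_fraction: 0.01
    arrival: {process: poisson}
    input_distribution: {type: constant, params: {value: 128}}
    output_distribution: {type: constant, params: {value: 64}}
```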