Workload Spec Schema
Complete YAML schema reference for BLIS workload specifications (--workload-spec). For a guide-level introduction, see Workload Specifications.
Top-Level Fields
| Field | Type | Required | Description |
|---|---|---|---|
| version | string | No | Schema version ("2" recommended; "1" auto-upgraded) |
| seed | int64 | No | RNG seed (overridden by CLI --seed if set) |
| category | string | No | language, multimodal, reasoning, or empty |
| aggregate_rate | float64 | Yes | Total arrival rate in requests/second |
| num_requests | int64 | No | Total requests to generate (0 = unlimited, use horizon) |
| horizon | int64 | No | Simulation time limit in ticks (overridden by CLI --horizon if set) |
| clients | list | Yes* | Client specifications (see below) |
| cohorts | list | No | Cohort specifications with population dynamics (diurnal, spike, drain patterns) |
| servegen_data | object | No | Native ServeGen data file loading |
| inference_perf | object | No | inference-perf format compatibility |
*At least one client, cohort, or servegen_data is required.
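As a minimal sketch of how the top-level fields fit together, the spec below uses only fields documented on this page. The values (seed, rate, request count, token lengths) and the client id are illustrative assumptions, not recommended settings.

```yaml
# Minimal single-client spec (illustrative values only).
version: "2"            # recommended schema version
seed: 7                 # overridden by CLI --seed if set
aggregate_rate: 10.0    # total requests/second across all clients
num_requests: 1000      # stop after this many generated requests
clients:
  - id: "baseline"      # hypothetical client id
    rate_fraction: 1.0  # this client receives the whole aggregate rate
    arrival:
      process: poisson
    input_distribution:
      type: constant
      params:
        value: 256
    output_distribution:
      type: constant
      params:
        value: 64
```

A file like this would be passed with --workload-spec; all optional top-level fields are simply left out.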
Client Specification
Each entry in the clients list defines a traffic source:
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | No | Client identifier (for metrics grouping) |
| tenant_id | string | No | Tenant identifier |
| slo_class | string | No | SLO tier: critical, standard, sheddable, batch, background, or empty |
| model | string | No | Model name override (for multi-model workloads) |
| rate_fraction | float64 | Yes | Fraction of aggregate_rate for this client (must be positive) |
| arrival | object | Yes | Arrival process configuration |
| input_distribution | object | Yes | Input token length distribution |
| output_distribution | object | Yes | Output token length distribution |
| prefix_group | string | No | Prefix group name (requests in same group share prefixes) |
| prefix_length | int | No | Shared prefix token count (additive to input_distribution) |
| streaming | bool | No | Whether to simulate streaming output |
| network | object | No | Client-side network characteristics |
| lifecycle | object | No | Activity window configuration |
| multimodal | object | No | Multimodal token generation |
| reasoning | object | No | Reasoning multi-turn behavior |
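To make rate_fraction and prefix_length concrete, here is a hedged two-client sketch. The ids, tenant names, prefix group name, and all numbers are hypothetical. The fractions are chosen to sum to 1.0 so that the split of aggregate_rate is unambiguous.

```yaml
# Two clients sharing one aggregate rate (illustrative values only).
version: "2"
aggregate_rate: 100.0
clients:
  - id: "interactive"            # hypothetical id
    tenant_id: "tenant-a"        # hypothetical tenant
    slo_class: "critical"
    rate_fraction: 0.7           # 0.7 * 100 = 70 requests/second
    prefix_group: "support-bot"  # hypothetical group; members share a prefix
    prefix_length: 512           # added on top of input_distribution
    arrival:
      process: poisson
    input_distribution:
      type: constant
      params:
        value: 128               # 512 prefix + 128 sampled = 640 input tokens
    output_distribution:
      type: constant
      params:
        value: 64
  - id: "offline"                # hypothetical id
    tenant_id: "tenant-b"
    slo_class: "batch"
    rate_fraction: 0.3           # 0.3 * 100 = 30 requests/second
    arrival:
      process: constant
    input_distribution:
      type: exponential
      params:
        mean: 1024
    output_distribution:
      type: exponential
      params:
        mean: 256
```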
Arrival Process
| Field | Type | Values | Description |
|---|---|---|---|
| process | string | poisson, gamma, weibull, constant | Inter-arrival time distribution |
| cv | *float64 | Required for gamma and weibull | Coefficient of variation (burstiness). CV > 1 = bursty, CV < 1 = regular |
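The fragments below sketch how an arrival block might look for different processes. Each YAML document (separated by ---) is a standalone arrival object as it would appear under a client or cohort; the cv values are illustrative assumptions.

```yaml
# Poisson arrivals: no cv needed.
process: poisson
---
# Bursty gamma arrivals: cv is required, and cv > 1 means bursty.
process: gamma
cv: 2.0
---
# More regular weibull arrivals: cv is required, and cv < 1 means regular.
process: weibull
cv: 0.5
```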
Distribution Specification
Used for input_distribution and output_distribution:
| Field | Type | Description |
|---|---|---|
| type | string | gaussian, exponential, pareto_lognormal, constant, empirical |
| params | map | Type-specific parameters (see below) |
| file | string | Reserved for future use (file-based loading not yet implemented). Use inline params instead. |
Distribution Parameters
| Type | Parameters |
|---|---|
| gaussian | mean, std_dev, min, max |
| exponential | mean |
| pareto_lognormal | alpha, xm, mu, sigma, mix_weight |
| constant | value |
| empirical | inline params map (key=token count, value=probability) |
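As a rough illustration of the parameter names above, each YAML document below (separated by ---) is one distribution object of the kind used for input_distribution or output_distribution. All numeric values are made up for illustration. Quoting the empirical keys and making the probabilities sum to 1.0 are assumptions of this sketch, not documented requirements.

```yaml
# Gaussian with lower and upper bounds.
type: gaussian
params:
  mean: 512
  std_dev: 128
  min: 16
  max: 2048
---
# Exponential with a single mean parameter.
type: exponential
params:
  mean: 256
---
# Pareto/lognormal mixture; parameter values are purely illustrative.
type: pareto_lognormal
params:
  alpha: 1.5
  xm: 64
  mu: 5.0
  sigma: 1.0
  mix_weight: 0.3
---
# Fixed length.
type: constant
params:
  value: 128
---
# Empirical: key = token count, value = probability.
type: empirical
params:
  "64": 0.2
  "128": 0.5
  "256": 0.3
```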
Network Specification
| Field | Type | Description |
|---|---|---|
| rtt_ms | float64 | Round-trip time in milliseconds |
| bandwidth_mbps | float64 | Bandwidth in Mbps |
Reasoning Specification
| Field | Type | Description |
|---|---|---|
| reason_ratio_distribution | DistSpec | Distribution of reasoning-to-output ratio |
| multi_turn | object | Multi-turn conversation configuration |
| multi_turn.max_rounds | int | Maximum conversation rounds |
| multi_turn.think_time_us | int64 | User think time between rounds (microseconds) |
| multi_turn.context_growth | string | accumulate (prepend prior context) or empty (fixed-length) |
| multi_turn.single_session | bool | If true, each client creates exactly one session instead of spawning new sessions per arrival. Used by inference-perf multi-turn expansion. Default: false |
Cohort Specification
Each entry in the cohorts list defines a population with lifecycle dynamics. Cohorts expand into individual clients with lifecycle windows derived from diurnal, spike, or drain patterns.
| Field | Type | Required | Description |
|---|---|---|---|
| id | string | No | Cohort identifier |
| population | int | Yes | Number of clients in this cohort (max 100,000) |
| tenant_id | string | No | Tenant identifier |
| slo_class | string | No | SLO tier: critical, standard, sheddable, batch, background |
| model | string | No | Model name override |
| arrival | object | Yes | Arrival process configuration (same as Client) |
| input_distribution | object | Yes | Input token length distribution |
| output_distribution | object | Yes | Output token length distribution |
| prefix_group | string | No | Prefix group name |
| streaming | bool | No | Whether to simulate streaming output |
| rate_fraction | float64 | Yes | Fraction of aggregate_rate for each client in this cohort |
| diurnal | object | No | Sinusoidal rate modulation (see below) |
| spike | object | No | Traffic spike configuration (see below) |
| drain | object | No | Linear ramp-down to zero (see below) |
Diurnal Pattern
| Field | Type | Description |
|---|---|---|
| peak_hour | int | Hour of peak traffic (0-23) |
| peak_to_trough_ratio | float64 | Ratio of peak to trough rate (≥ 1.0) |
Spike Pattern
| Field | Type | Description |
|---|---|---|
| start_time_us | int64 | Spike start time in microseconds |
| duration_us | int64 | Spike duration in microseconds |
Drain Pattern
| Field | Type | Description |
|---|---|---|
| start_time_us | int64 | Drain start time in microseconds |
| ramp_duration_us | int64 | Ramp-down duration in microseconds |
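The following sketch shows how cohorts with a diurnal pattern and a drain pattern might be written. Cohort ids, populations, rates, and timings are hypothetical. Each cohort carries a single pattern here because this page does not say whether patterns can be combined.

```yaml
# Cohort-only spec (illustrative values only).
version: "2"
aggregate_rate: 200.0
cohorts:
  - id: "daytime-users"          # hypothetical id
    population: 50               # expands into 50 clients
    slo_class: "standard"
    rate_fraction: 0.01          # per-client fraction of aggregate_rate
    arrival:
      process: poisson
    input_distribution:
      type: exponential
      params:
        mean: 256
    output_distribution:
      type: exponential
      params:
        mean: 128
    diurnal:
      peak_hour: 14              # traffic peaks at hour 14 (range 0-23)
      peak_to_trough_ratio: 3.0  # peak rate is 3x the trough rate
  - id: "draining-batch"         # hypothetical id
    population: 10
    slo_class: "batch"
    rate_fraction: 0.05
    arrival:
      process: constant
    input_distribution:
      type: constant
      params:
        value: 1024
    output_distribution:
      type: constant
      params:
        value: 256
    drain:
      start_time_us: 60000000    # ramp-down begins at 60 s
      ramp_duration_us: 30000000 # rate reaches zero 30 s later
```

A spike block is written the same way, using start_time_us and duration_us.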
Lifecycle Specification
Activity window configuration for clients (used in the lifecycle field of Client Specification). Cohort patterns (diurnal, spike, drain) are converted into lifecycle windows internally.
| Field | Type | Description |
|---|---|---|
| windows | list | List of active time windows |
Active Window
| Field | Type | Description |
|---|---|---|
| start_us | int64 | Window start time in microseconds |
| end_us | int64 | Window end time in microseconds |
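A hedged sketch of a client that sets both network and lifecycle follows. The id and all values are hypothetical, and the comments assume window times are measured from the start of the simulation, which this page does not state explicitly.

```yaml
# Client with network characteristics and two activity windows (illustrative).
version: "2"
aggregate_rate: 20.0
clients:
  - id: "windowed-client"      # hypothetical id
    rate_fraction: 1.0
    arrival:
      process: poisson
    input_distribution:
      type: constant
      params:
        value: 256
    output_distribution:
      type: constant
      params:
        value: 64
    network:
      rtt_ms: 40.0             # round-trip time in milliseconds
      bandwidth_mbps: 100.0    # bandwidth in Mbps
    lifecycle:
      windows:
        - start_us: 0          # active from 0 s ...
          end_us: 30000000     # ... to 30 s
        - start_us: 90000000   # active again from 90 s ...
          end_us: 120000000    # ... to 120 s
```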
Multimodal Specification
Configures multimodal request generation (used in the multimodal field of Client Specification). Each distribution follows the same Distribution Specification format.
| Field | Type | Description |
|---|---|---|
| text_distribution | DistSpec | Text token distribution |
| image_distribution | DistSpec | Image token distribution |
| image_count_distribution | DistSpec | Number of images per request |
| audio_distribution | DistSpec | Audio token distribution |
| audio_count_distribution | DistSpec | Number of audio segments per request |
| video_distribution | DistSpec | Video token distribution |
| video_count_distribution | DistSpec | Number of video segments per request |
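As an illustration only, a text-plus-image client might be written as below. The id and token counts are assumptions. The sketch also assumes that unused modalities (audio, video) can be omitted, and it keeps input_distribution because the Client Specification marks it as required.

```yaml
# Multimodal client sketch (illustrative values only).
version: "2"
category: multimodal
aggregate_rate: 5.0
clients:
  - id: "vision-chat"          # hypothetical id
    rate_fraction: 1.0
    arrival:
      process: poisson
    input_distribution:
      type: constant
      params:
        value: 128
    output_distribution:
      type: constant
      params:
        value: 64
    multimodal:
      text_distribution:
        type: gaussian
        params:
          mean: 128
          std_dev: 32
          min: 16
          max: 512
      image_distribution:      # tokens per image
        type: constant
        params:
          value: 576
      image_count_distribution: # images per request
        type: constant
        params:
          value: 2
```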
ServeGen Data Specification
Native ServeGen data file loading (used in the servegen_data top-level field):
| Field | Type | Required | Description |
|---|---|---|---|
| path | string | Yes | Path to ServeGen data directory (containing chunk-*-trace.csv and dataset.json) |
| span_start | int64 | No | Trace span start filter (microseconds) |
| span_end | int64 | No | Trace span end filter (microseconds) |
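A sketch of the servegen_data block is shown below as a fragment of a larger spec. The directory path is hypothetical, and the span values are arbitrary examples.

```yaml
# Fragment: top-level servegen_data block (illustrative values only).
servegen_data:
  path: "data/servegen-trace"  # hypothetical directory with chunk-*-trace.csv and dataset.json
  span_start: 0                # keep trace entries from 0 us ...
  span_end: 600000000          # ... up to 600 s (600,000,000 us)
```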
InferencePerf Specification
inference-perf format compatibility (used in the inference_perf top-level field):
| Field | Type | Required | Description |
|---|---|---|---|
| stages | list | Yes | Rate/duration stages for load patterns |
| shared_prefix | object | Yes | Shared prefix expansion configuration |
Stage
| Field | Type | Description |
|---|---|---|
| rate | float64 | Requests per second for this stage |
| duration | int64 | Stage duration in seconds (note: unlike other time fields which use microseconds, this field uses seconds) |
Shared Prefix
| Field | Type | Description |
|---|---|---|
| num_unique_system_prompts | int | Number of unique system prompts |
| num_users_per_system_prompt | int | Users per system prompt |
| system_prompt_len | int | System prompt length in tokens |
| question_len | int | Question length in tokens |
| output_len | int | Output length in tokens |
| enable_multi_turn_chat | bool | When true, maps to BLIS reasoning.multi_turn with SingleSession mode and fixed-length inputs (no context accumulation). Computes MaxRounds and ThinkTimeUs from stage parameters. See #514. |
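For orientation, an inference_perf block combining stages and shared_prefix could look like the fragment below. All values are illustrative, and the rest of the spec is omitted.

```yaml
# Fragment: top-level inference_perf block (illustrative values only).
inference_perf:
  stages:
    - rate: 5.0        # requests/second during the first stage
      duration: 60     # seconds, not microseconds
    - rate: 20.0
      duration: 120
  shared_prefix:
    num_unique_system_prompts: 4
    num_users_per_system_prompt: 8
    system_prompt_len: 512       # tokens
    question_len: 128            # tokens
    output_len: 64               # tokens
    enable_multi_turn_chat: false
```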
Complete Example
```yaml
version: "2"
seed: 42
category: reasoning
aggregate_rate: 500.0
num_requests: 500
clients:
  - id: "multi-turn-chat"
    tenant_id: "chat-users"
    slo_class: "standard"
    rate_fraction: 1.0
    streaming: true
    arrival:
      process: poisson
    input_distribution:
      type: gaussian
      params:
        mean: 128
        std_dev: 30
        min: 32
        max: 512
    output_distribution:
      type: gaussian
      params:
        mean: 64
        std_dev: 20
        min: 16
        max: 256
    reasoning:
      reason_ratio_distribution:
        type: gaussian
        params:
          mean: 0
          std_dev: 0
          min: 0
          max: 0
      multi_turn:
        max_rounds: 5
        think_time_us: 500000
        context_growth: accumulate
```
Validation
BLIS validates workload specs with strict YAML parsing (KnownFields(true)), so misspelled or unknown field names cause errors. Additional validation:
- aggregate_rate must be positive
- Each client's rate_fraction must be positive
- arrival.process must be one of the valid processes
- cv for gamma/weibull must be finite and positive
- Weibull cv must be in [0.01, 10.4]
- Distribution types must be recognized
- All numeric params must be finite (no NaN or Inf)
- At least one client, cohort, or servegen_data is required
- Cohort population must be positive and ≤ 100,000
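To illustrate the rules above, the spec below is intentionally invalid. Each comment names the rule that the line is expected to break. The exact error messages and the order in which BLIS reports them are not specified on this page, so none are shown.

```yaml
# Intentionally INVALID spec, for illustration only.
version: "2"
aggregate_rate: 0            # breaks: aggregate_rate must be positive
clients:
  - id: "broken"             # hypothetical id
    rate_fractoin: 0.5       # misspelled key: rejected by strict parsing
    arrival:
      process: weibull
      cv: 20.0               # breaks: Weibull cv must be in [0.01, 10.4]
    input_distribution:
      type: uniform          # not in the documented list of distribution types
      params:
        min: 1
        max: 10
    output_distribution:
      type: constant
      params:
        value: 64
```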