Experiment Setup & Methodology
What make a good experiment?
Possible Request Load Patterns
- Uniform (baseline)
- Bursty (sudden spike than quiet)
- Ramp up (Gradual increaese)
Possible Token Load Patterns
- Low token (100)
- Medium size token (1k)
- High load token (10k)
Calibration note: Token sizing for long prompts uses the approximation 1 token ≈ 4.45 characters, measured empirically via vLLM
/tokenizeonmistralai/Mistral-7B-v0.1(6108 chars / 1373 tokens). Long prompts are sized to ~60% of the target token count so that input + output stays within 4096 tokens.