About
A quick compatibility check for local large language models.
Pick a device — or plug in custom specs — and see which models fit in memory, with token-per-second estimates grounded in the hardware that actually matters: memory bandwidth.
Why
Running LLMs locally is mostly a bandwidth problem, not a compute one. Most hardware comparisons obscure that. This site tries to make it plain. The short version lives at Bandwidth, not FLOPS.
What’s here
- Preset devices — Apple silicon and NVIDIA GPUs with published memory and bandwidth figures.
- Custom rigs — plug in any memory and bandwidth pair and get the same analysis.
- Speed estimates — theoretical max tokens/sec, scaled by a ~0.6 efficiency factor. Real numbers shift with runtime, context length, and batch size.
- Install commands — copy the exact `ollama run` command for any compatible model.
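The fit check and speed estimate above boil down to a couple of divisions. A minimal sketch, assuming generation is memory-bandwidth bound (each token requires streaming the full weight set from memory once) and using the ~0.6 efficiency factor mentioned above; the function name and example figures are illustrative, not taken from the site's presets:

```python
def estimate(model_gb: float, mem_gb: float, bandwidth_gbs: float,
             efficiency: float = 0.6) -> tuple[bool, float]:
    """First-pass filter: does the model fit, and roughly how fast might it run?

    Assumes token generation is bandwidth-bound: tok/s ~ bandwidth / model size,
    scaled by an empirical efficiency factor. Ignores KV cache and runtime overhead.
    """
    fits = model_gb <= mem_gb
    tok_s = (bandwidth_gbs / model_gb) * efficiency if fits else 0.0
    return fits, round(tok_s, 1)

# Hypothetical 8 GB quantized model on a device with 24 GB of unified memory
# and 273 GB/s of bandwidth (illustrative numbers only):
print(estimate(8, 24, 273))
```

This is exactly why bandwidth, not FLOPS, dominates the result: the model size and the memory bandwidth are the only two inputs that move the estimate.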
Caveats
Estimates are not benchmarks. They’re a first-pass filter. If a specific workload matters (long context, structured output, multiple users), run the model and measure.
Who
Built by Pithos Labs. Source at github.com/pithoslabs/canirunthis. Feedback and pull requests welcome.