LLM Model Size Calculator

Pick a model (or enter any parameter count), choose a quantization precision, and instantly see its disk size — plus which GPUs can actually run it.

Configuration

Choose a known model or "Custom".
Total parameter count. For mixture-of-experts (MoE) models, enter the total, not the active count, because every expert's weights must be stored.
FP16 / BF16 — the default for training and high-quality inference.

Result

Model size (GB)

Will it fit on your GPU?

"Fits" means the model weights alone fit in the GPU's memory. Budget roughly 20-40% on top of that for the KV cache and activations at inference time.
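As a rough illustration of the note above, the sketch below checks a model against a GPU twice: once on weights alone, and once with an overhead margin added. The function name `fit_report`, the default 30% overhead (the midpoint of the 20-40% range), and the example memory figures are assumptions made for illustration; they are not taken from the calculator itself.

```python
def fit_report(weights_gb: float, vram_gb: float, overhead: float = 0.3) -> dict:
    """Compare model weights against GPU memory, with and without overhead.

    overhead is a fraction of the weight size reserved for KV cache and
    activations (0.2 to 0.4 matches the 20-40% rule of thumb).
    """
    if weights_gb < 0 or vram_gb < 0 or overhead < 0:
        raise ValueError("sizes and overhead must be non-negative")
    needed_gb = weights_gb * (1.0 + overhead)
    return {
        "weights_only": weights_gb <= vram_gb,   # what "Fits" reports
        "with_overhead": needed_gb <= vram_gb,   # more realistic at inference
        "needed_gb": needed_gb,
    }


# Hypothetical example: 35 GB of weights on a GPU with 40 GB of memory.
report = fit_report(35, 40)
print(report)
```

In this example the weights fit on their own, but not once the assumed 30% margin is added. That gap is the case the note warns about.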

How model size is calculated

An LLM's disk size is its parameter count multiplied by the number of bytes used to store each parameter. Quantization shrinks the model by storing each parameter in fewer bytes.

Model size (GB) = parameters_in_billions × bytes_per_parameter
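The formula translates directly into code. The following is a minimal sketch, assuming decimal gigabytes (1 GB = 10^9 bytes), so that billions of parameters times bytes per parameter gives GB with no further conversion. The function name `model_size_gb` is made up for this example.

```python
def model_size_gb(params_billions: float, bytes_per_param: float) -> float:
    """Weights-only model size in GB (assuming 1 GB = 10^9 bytes)."""
    if params_billions < 0:
        raise ValueError("parameter count must be non-negative")
    if bytes_per_param <= 0:
        raise ValueError("bytes per parameter must be positive")
    # billions of parameters x bytes each = billions of bytes = GB
    return float(params_billions * bytes_per_param)


# A 70B-parameter model stored at 2 bytes per parameter (FP16 / BF16).
print(model_size_gb(70, 2))
```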

The four common precisions

FP32 (4 bytes) — full 32-bit float. Used in research and training math, rarely for storing a whole model (2× the size of FP16 for no inference benefit).

FP16 / BF16 (2 bytes) — 16-bit float. The default for training and high-quality inference. A 70B model is ~140 GB.

INT8 (1 byte) — 8-bit integer quantization. Halves the size vs FP16 with minimal quality loss. A 70B model is ~70 GB.

INT4 (0.5 bytes) — 4-bit quantization. Quarters the size vs FP16. Small quality loss that's usually acceptable for inference. A 70B model is ~35 GB — this is what lets large models run on consumer hardware.
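The four precisions can be put side by side with a small lookup table. This is a sketch only: the names `BYTES_PER_PARAM` and `sizes_by_precision` are illustrative, and the byte widths are the nominal ones listed above. Real quantized files are usually slightly larger, because they also store scaling metadata alongside the weights.

```python
# Nominal storage width per parameter, as listed above.
BYTES_PER_PARAM = {
    "FP32": 4.0,
    "FP16/BF16": 2.0,
    "INT8": 1.0,
    "INT4": 0.5,
}


def sizes_by_precision(params_billions: float) -> dict:
    """Weights-only size in GB at each precision (1 GB = 10^9 bytes)."""
    return {
        name: params_billions * width
        for name, width in BYTES_PER_PARAM.items()
    }


# Sizes for a 70B-parameter model at each precision.
for name, size_gb in sizes_by_precision(70).items():
    print(f"{name:>10}: {size_gb:6.1f} GB")
```

For a 70B model this reproduces the figures quoted above: about 140 GB at FP16, 70 GB at INT8, and 35 GB at INT4.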

Preset models

Model | Params (B) | FP16 (GB) | INT8 (GB) | INT4 (GB)