Pick a model (or enter any parameter count), choose a quantization precision, and instantly see its disk size — plus which GPUs can actually run it.
An LLM's disk size is approximately its parameter count times the number of bytes used to store each parameter. Quantization reduces that per-parameter byte count by storing the weights at lower precision.
Model size (GB) = parameters_in_billions × bytes_per_parameter
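As a minimal sketch of the formula above, the Python function below spells out the unit conversion. It assumes decimal gigabytes (1 GB = 10^9 bytes) and counts weights only; the function name `model_size_gb` is illustrative and not part of any library.

```python
def model_size_gb(params_billions: float, bytes_per_param: float) -> float:
    """Approximate weight size in decimal gigabytes (assumes 1 GB = 10**9 bytes)."""
    total_params = params_billions * 1e9          # billions -> raw parameter count
    total_bytes = total_params * bytes_per_param  # bytes needed to store the weights
    return total_bytes / 1e9                      # bytes -> decimal GB


# 70B parameters stored at 2 bytes each (FP16/BF16)
print(model_size_gb(70, 2))
# 7B parameters stored at 0.5 bytes each (4-bit)
print(model_size_gb(7, 0.5))
```

The factor of 10^9 from "billions" cancels the 10^9 bytes per GB, which is why the formula reduces to a single multiplication.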
FP32 (4 bytes) — full 32-bit float. Used in research and training math, rarely for storing a whole model (2× the size of FP16 for no inference benefit).
FP16 / BF16 (2 bytes) — 16-bit float. The default for training and high-quality inference. A 70B model is ~140 GB.
INT8 (1 byte) — 8-bit integer quantization. Halves the size vs FP16 with minimal quality loss. A 70B model is ~70 GB.
INT4 (0.5 bytes) — 4-bit quantization. Quarters the size vs FP16. Small quality loss that's usually acceptable for inference. A 70B model is ~35 GB — this is what lets large models run on consumer hardware.
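The four precisions above can be tabulated for any parameter count with a short Python sketch. The names `BYTES_PER_PARAM`, `sizes_by_precision`, and `precisions_that_fit` are hypothetical, and the 48 GB memory budget in the usage lines is an arbitrary illustrative figure. The fit check compares weight size only, so it ignores activation memory, the KV cache, and any per-block metadata a real quantization format may add.

```python
# Bytes per parameter for each precision listed above.
BYTES_PER_PARAM = {
    "FP32": 4.0,
    "FP16/BF16": 2.0,
    "INT8": 1.0,
    "INT4": 0.5,
}


def sizes_by_precision(params_billions: float) -> dict:
    """Return the approximate weight size in GB for every precision."""
    return {name: params_billions * b for name, b in BYTES_PER_PARAM.items()}


def precisions_that_fit(params_billions: float, memory_gb: float) -> list:
    """List precisions whose weights alone fit in memory_gb (weights-only assumption)."""
    sizes = sizes_by_precision(params_billions)
    return [name for name, size in sizes.items() if size <= memory_gb]


# Print the size of a 70B model at each precision.
for name, size in sizes_by_precision(70).items():
    print(f"{name:>9}: {size:6.1f} GB")

# Which precisions fit a hypothetical 48 GB memory budget?
print(precisions_that_fit(70, 48))
```

Treat a passing fit check as a lower bound on the memory needed: actual inference requires headroom beyond the weights themselves.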
| Model | Params (B) | FP16 | INT8 | INT4 |
|---|---|---|---|---|