LLM GPU Inference Benchmark
Performance comparison of models across different GPUs and inference engines
| Model | GPU | Inference engine | Seq without SO | Seq with SO | Par without SO | Par with SO |
|---|---|---|---|---|---|---|
| unsloth/Qwen3-4B-Instruct-2507-GGUF:F16 | ASUS GeForce RTX 5070 Ti PRIME OC Edition 16Gb ×2 | Llama.cpp 7973 | 89.72 | 89.11 | 48.66 | 49.5 |
| unsloth/Qwen3-4B-Instruct-2507-GGUF:F16 | ASUS GeForce RTX 5070 Ti PRIME OC Edition 16Gb ×1 | Llama.cpp 7973 | 89.52 | 88.91 | 49.69 | 39.69 |
| ggml-org/gpt-oss-120b-GGUF | ASUS GeForce RTX 5070 Ti PRIME OC Edition 16Gb ×2 | Llama.cpp 7717 | 26.55 | 23.96 | 3.31 | 2.74 |
| openai/gpt-oss-20b | ASUS GeForce RTX 5070 Ti PRIME OC Edition 16Gb ×2 | vLLM 0.15.1 | 222.44 | 213.25 | 123.81 | 125.18 |
| Qwen/Qwen3-4B-Instruct-2507 | ASUS GeForce RTX 5070 Ti PRIME OC Edition 16Gb ×2 | vLLM 0.15.1 | 131.7 | 124.37 | 109.78 | 99.28 |
| Qwen/Qwen3-4B-Instruct-2507 | ASUS GeForce RTX 5070 Ti PRIME OC Edition 16Gb ×1 | vLLM 0.15.1 | 90.39 | 87.83 | 81.92 | 81.25 |
| ggml-org/gpt-oss-120b-GGUF | ASUS GeForce RTX 5060 Ti DUAL OC 16Gb ×2 | Llama.cpp 7717 | 24 | 18.78 | 2.94 | 3.25 |
| openai/gpt-oss-20b | ASUS GeForce RTX 5060 Ti DUAL OC 16Gb ×2 | vLLM 0.15.1 | 146.11 | 140.74 | 79.11 | 77.45 |
| Qwen/Qwen3-4B-Instruct-2507 | ASUS GeForce RTX 5060 Ti DUAL OC 16Gb ×2 | vLLM 0.15.1 | 78.07 | 76.22 | 67.1 | 66.34 |
| Qwen/Qwen3-4B-Instruct-2507 | ASUS GeForce RTX 5060 Ti DUAL OC 16Gb ×1 | vLLM 0.15.1 | 47.74 | 47.04 | 43.39 | 42.99 |
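
One way to read the table is to express each SO figure as a relative change from the matching figure without SO. The sketch below does this for a few of the sequential vLLM rows. The numbers are copied from the table. The row labels, `SEQ_RESULTS`, `relative_change`, and `so_change_table` are illustrative names introduced here and are not part of the benchmark. The table does not state units, so the result is only a unitless percentage.

```python
# Sequential-mode figures copied from the table: (without SO, with SO).
# The labels are shorthand chosen for this sketch, not names used by the benchmark.
SEQ_RESULTS = {
    "gpt-oss-20b, RTX 5070 Ti x2, vLLM": (222.44, 213.25),
    "gpt-oss-20b, RTX 5060 Ti x2, vLLM": (146.11, 140.74),
    "Qwen3-4B, RTX 5070 Ti x2, vLLM": (131.7, 124.37),
    "Qwen3-4B, RTX 5070 Ti x1, vLLM": (90.39, 87.83),
}


def relative_change(baseline: float, value: float) -> float:
    """Return the change from baseline to value as a percentage of baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (value - baseline) / baseline * 100.0


def so_change_table(results: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Map each row label to the percentage change of the SO figure vs. the non-SO figure."""
    return {
        label: relative_change(without_so, with_so)
        for label, (without_so, with_so) in results.items()
    }


if __name__ == "__main__":
    for label, pct in so_change_table(SEQ_RESULTS).items():
        print(f"{label}: {pct:+.2f}%")
```

The same helper can be applied to the parallel columns, or to a ×1 versus ×2 pair of rows, by passing a different pair of figures.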