LLM GPU Inference Benchmark
Performance comparison of models across different GPUs and inference engines
| Model | Inference engine | GPU | Seq without SO | Seq with SO | Par without SO | Par with SO | Throughput without SO | Throughput with SO |
|---|---|---|---|---|---|---|---|---|
| Qwen/Qwen3-4B-Instruct-2507 | vLLM 0.14.0 | GeForce RTX 3090 GAMING X TRIO 24GB ×1 | 83.89 | 81.05 | 68.9 | 67.4 | 419.35 | 420.09 |
| Qwen/Qwen3-4B-Instruct-2507 | vLLM 0.14.0 | Tesla V100 32GB SXM2 ×1 | 67.37 | 65.81 | 45.12 | 44.14 | 284.19 | 283.68 |
| Qwen/Qwen3-4B-Instruct-2507 | vLLM 0.14.0 | GPU 1: Tesla V100 32GB SXM2 ×1; GPU 2: GeForce RTX 3090 GAMING X TRIO 24GB ×1 | 100.88 | 98.14 | 60.5 | 58.84 | 380.61 | 401.99 |
| unsloth/Qwen3-4B-Instruct-2507-GGUF:F16 | llama.cpp 7717 | GPU 1: Tesla V100 32GB SXM2 ×1; GPU 2: GeForce RTX 3090 GAMING X TRIO 24GB ×1 | 84.58 | 82.86 | 48.5 | 44.97 | 250.41 | 259.17 |
| unsloth/Qwen3-4B-Instruct-2507-GGUF:F16 | llama.cpp 7717 | Tesla V100 32GB SXM2 ×1 | 81.79 | 80.36 | 38.96 | 40.21 | 190.19 | 213.47 |
| unsloth/Qwen3-4B-Instruct-2507-GGUF:F16 | llama.cpp 7717 | GeForce RTX 3090 GAMING X TRIO 24GB ×1 | 90.56 | 88.34 | 46.99 | 46.79 | 262.74 | 257.08 |
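
The paired "without SO" / "with SO" columns are easier to compare as a relative change. The following is a minimal sketch, not part of the original benchmark tooling: it copies the throughput columns from the table and prints the percentage difference for each configuration. The function name `rel_change_pct` and the short row labels are hypothetical, and because the table does not state units, the values are treated as plain numbers.

```python
# Illustrative only: relative change between the "without SO" and "with SO"
# throughput columns. Values are copied from the table; the labels and the
# helper name are assumptions made for this sketch.

def rel_change_pct(without_so: float, with_so: float) -> float:
    """Return the change from `without_so` to `with_so` as a percentage."""
    return 100.0 * (with_so - without_so) / without_so

# (label, throughput without SO, throughput with SO)
rows = [
    ("vLLM, RTX 3090", 419.35, 420.09),
    ("vLLM, V100", 284.19, 283.68),
    ("vLLM, V100 + RTX 3090", 380.61, 401.99),
    ("llama.cpp, V100 + RTX 3090", 250.41, 259.17),
    ("llama.cpp, V100", 190.19, 213.47),
    ("llama.cpp, RTX 3090", 262.74, 257.08),
]

# Map each configuration to its relative throughput change.
results = {label: rel_change_pct(base, so) for label, base, so in rows}

for label, change in results.items():
    print(f"{label}: {change:+.2f}%")
```

The same helper can be applied to the Seq and Par columns by swapping in the corresponding value pairs.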