LLM GPU Inference Benchmark

Performance comparison of models across different GPUs and inference engines

| Model | GPU | Inference engine | Seq without SO | Seq with SO | Par without SO | Par with SO |
|---|---|---|---|---|---|---|
| unsloth/Qwen3-4B-Instruct-2507-GGUF:F16 | ASUS GeForce RTX 5070 Ti PRIME OC Edition 16GB ×2 | Llama.cpp 7973 | 89.72 | 89.11 | 48.66 | 49.5 |
| unsloth/Qwen3-4B-Instruct-2507-GGUF:F16 | ASUS GeForce RTX 5070 Ti PRIME OC Edition 16GB ×1 | Llama.cpp 7973 | 89.52 | 88.91 | 49.69 | 39.69 |
| ggml-org/gpt-oss-120b-GGUF | ASUS GeForce RTX 5070 Ti PRIME OC Edition 16GB ×2 | Llama.cpp 7717 | 26.55 | 23.96 | 3.31 | 2.74 |
| openai/gpt-oss-20b | ASUS GeForce RTX 5070 Ti PRIME OC Edition 16GB ×2 | vLLM 0.15.1 | 222.44 | 213.25 | 123.81 | 125.18 |
| Qwen/Qwen3-4B-Instruct-2507 | ASUS GeForce RTX 5070 Ti PRIME OC Edition 16GB ×2 | vLLM 0.15.1 | 131.7 | 124.37 | 109.78 | 99.28 |
| Qwen/Qwen3-4B-Instruct-2507 | ASUS GeForce RTX 5070 Ti PRIME OC Edition 16GB ×1 | vLLM 0.15.1 | 90.39 | 87.83 | 81.92 | 81.25 |
| ggml-org/gpt-oss-120b-GGUF | ASUS GeForce RTX 5060 Ti DUAL OC 16GB ×2 | Llama.cpp 7717 | 24 | 18.78 | 2.94 | 3.25 |
| openai/gpt-oss-20b | ASUS GeForce RTX 5060 Ti DUAL OC 16GB ×2 | vLLM 0.15.1 | 146.11 | 140.74 | 79.11 | 77.45 |
| Qwen/Qwen3-4B-Instruct-2507 | ASUS GeForce RTX 5060 Ti DUAL OC 16GB ×2 | vLLM 0.15.1 | 78.07 | 76.22 | 67.1 | 66.34 |
| Qwen/Qwen3-4B-Instruct-2507 | ASUS GeForce RTX 5060 Ti DUAL OC 16GB ×1 | vLLM 0.15.1 | 47.74 | 47.04 | 43.39 | 42.99 |
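The table is easiest to read as ratios between configurations. The Python sketch below copies the four vLLM 0.15.1 rows for Qwen/Qwen3-4B-Instruct-2507 and computes two things: how each column changes when going from one card (×1) to two cards (×2) of the same model, and the relative change when SO is enabled in each mode. The `Row` type and the `scaling` and `so_change` helpers are hypothetical names introduced for illustration. The page does not state units for the measurements, so the sketch computes only unitless ratios.

```python
# Hypothetical helpers for comparing rows of the benchmark table.
# Values are copied from the vLLM 0.15.1 rows for Qwen/Qwen3-4B-Instruct-2507.
# Units are not stated on the page, so only unitless ratios are computed.
from dataclasses import dataclass

MODEL = "Qwen/Qwen3-4B-Instruct-2507"


@dataclass(frozen=True)
class Row:
    model: str
    gpu: str
    gpus: int       # number of cards (the "×N" suffix in the table)
    seq: float      # Seq without SO
    seq_so: float   # Seq with SO
    par: float      # Par without SO
    par_so: float   # Par with SO


ROWS = [
    Row(MODEL, "RTX 5070 Ti", 2, 131.7, 124.37, 109.78, 99.28),
    Row(MODEL, "RTX 5070 Ti", 1, 90.39, 87.83, 81.92, 81.25),
    Row(MODEL, "RTX 5060 Ti", 2, 78.07, 76.22, 67.1, 66.34),
    Row(MODEL, "RTX 5060 Ti", 1, 47.74, 47.04, 43.39, 42.99),
]


def find(rows, model, gpu, gpus):
    """Return the single row matching model, GPU name and card count."""
    return next(r for r in rows if r.model == model and r.gpu == gpu and r.gpus == gpus)


def scaling(rows, model, gpu):
    """Ratio of the ×2 result to the ×1 result for each column."""
    one = find(rows, model, gpu, 1)
    two = find(rows, model, gpu, 2)
    return {
        "seq": two.seq / one.seq,
        "seq_so": two.seq_so / one.seq_so,
        "par": two.par / one.par,
        "par_so": two.par_so / one.par_so,
    }


def so_change(row):
    """Relative change in percent when SO is enabled, per mode (Seq, Par)."""
    return {
        "seq": 100 * (row.seq_so - row.seq) / row.seq,
        "par": 100 * (row.par_so - row.par) / row.par,
    }


if __name__ == "__main__":
    for gpu in ("RTX 5070 Ti", "RTX 5060 Ti"):
        ratios = scaling(ROWS, MODEL, gpu)
        print(gpu, "x2/x1:", {k: round(v, 2) for k, v in ratios.items()})
    for r in ROWS:
        print(r.gpu, f"x{r.gpus}", "SO change %:", {k: round(v, 1) for k, v in so_change(r).items()})
```

With these rows, the Seq without SO column grows by a factor of about 1.46 on the RTX 5070 Ti and about 1.64 on the RTX 5060 Ti when a second card is added, i.e. less than a 2× increase in both cases. The same helpers can be pointed at the other rows once they are copied into `ROWS`.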