News

For those who enjoy rooting for the underdog, the latest MLPerf benchmark results will disappoint: Nvidia’s GPUs have ...
MLCommons' AI training tests show that the more chips you have, the more critical the network between them becomes.
LMEval aims to help AI researchers and developers compare the performance of different large language models. Designed to be ...
MangoBoost, a provider of cutting-edge system solutions for maximizing compute efficiency and scalability, has validated the ...
AI companies claim their tools couldn't exist without training on copyrighted material. It turns out they could; it's just ...
The Allen Institute for AI updated its reward-model evaluation benchmark, RewardBench, to better reflect real-life enterprise scenarios.
Qwen 2.5 Coder/Max is currently the top open-source model for coding, with the highest HumanEval (~70–72%), LiveCodeBench (70 ...
The Bengaluru startup noted that Sarvam-M sets a new benchmark for models of its size in Indian languages, as well as in math ...
Sarvam-M is a 24-billion-parameter hybrid language model boasting strong performance in math, programming, and Indian languages.
Sarvam AI claims that the advanced Sarvam-M model outperforms Meta's LLaMA-4 Scout on most benchmarks and is comparable to ...