Tag

#measurement

2 insights with this tag.

AI · arxiv/cs.AI · 8 min

Benchmark Rubrics Shift LLM Scores in Financial NLP Tasks

How wording changes in evaluation criteria and the choice of metric alter model rankings on financial text benchmarks, and why gold-label assumptions need governance.

May 2, 2026

AI · arxiv/cs.AI · 6 min

Measuring Where Chatbots Beat Humans on Tests

Researchers apply psychometric methods to identify test items where LLMs systematically outperform human learners, revealing assessment vulnerabilities.

April 17, 2026

astrobobo
