Contribute to the Agentic Benchmark Checklist
Upholding the validity of agentic benchmarks requires effort from the broader scientific community. If you’re passionate about reliable evaluation in AI, we’d love your help.
Here are some ways to get involved: