Judge Arena: Benchmarking LLMs as Evaluators

Vote to help the community find the best LLM-as-a-judge to use!

๐Ÿ‘ฉโ€โš–๏ธ Judge A

Model: Hidden

๐Ÿง‘โ€โš–๏ธ Judge B

Model: Hidden




By default, we use the Prometheus absolute grading prompt template - see here.
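To give a rough idea of what absolute grading involves (scoring a single response against a rubric, rather than comparing two responses), here is a minimal Python sketch. The template text, the 1 to 5 scale, the `[RESULT]` marker, and the helper names `build_prompt` and `parse_judgement` are illustrative assumptions, not the verbatim Prometheus template or the code this arena runs.

```python
import re

# Hypothetical template, loosely modeled on absolute (single-response) grading.
# This is an assumption for illustration, NOT the verbatim Prometheus template.
ABSOLUTE_GRADING_TEMPLATE = """\
###Task Description:
You are given an instruction, a response to evaluate, and a score rubric.
Write feedback that assesses the response strictly against the rubric,
then give an integer score from 1 to 5.
Format your output as: "Feedback: <feedback> [RESULT] <score>"

###Instruction:
{instruction}

###Response to evaluate:
{response}

###Score rubric:
{rubric}

###Feedback:"""


def build_prompt(instruction: str, response: str, rubric: str) -> str:
    """Fill the (hypothetical) template with one sample to be judged."""
    return ABSOLUTE_GRADING_TEMPLATE.format(
        instruction=instruction, response=response, rubric=rubric
    )


def parse_judgement(output: str):
    """Split a judge's raw output into (feedback, score).

    Returns (text, None) when no valid 1-5 score follows a [RESULT] marker.
    """
    match = re.search(r"\[RESULT\]\s*([1-5])\b", output)
    if match is None:
        return output.strip(), None
    feedback = output[: match.start()].strip()
    return feedback, int(match.group(1))
```

A judge model would receive the string from `build_prompt`, and its reply would go through `parse_judgement`. Returning `None` for a missing or out-of-range score lets the caller decide whether to retry or discard the judgement instead of silently recording a wrong score.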