English leaderboard, v2
Last updated: 2024-09-19 21:21:32
# | Model name | Length norm score | Avg score | Refusal ratio | Stay in character score | Language fluency score | Entertain score | Num cases | Avg length |
---|---|---|---|---|---|---|---|---|---|
1 | claude_3_5_sonnet | 4.65 | 4.65 | 0.28 | 4.74 | 4.93 | 4.29 | 64 | 418 |
1 | llama_31_405b_it | 4.65 | 4.65 | 0.06 | 4.68 | 4.93 | 4.35 | 64 | 548 |
1 | llama_31_70b_it | 4.65 | 4.66 | 0.00 | 4.71 | 4.93 | 4.33 | 64 | 562 |
2 | gpt_4o_mini | 4.56 | 4.56 | 0.00 | 4.60 | 4.94 | 4.13 | 64 | 457 |
2 | claude_3_opus | 4.56 | 4.71 | 0.22 | 4.75 | 4.92 | 4.46 | 64 | 1032 |
2 | gemma2_ataraxy_9b | 4.52 | 4.52 | 0.00 | 4.60 | 4.79 | 4.17 | 64 | 358 |
2 | gemma2_27b_it | 4.51 | 4.51 | 0.00 | 4.56 | 4.92 | 4.06 | 64 | 291 |
3 | mistral_nemo_gutenberg_12b_v2 | 4.51 | 4.57 | 0.00 | 4.65 | 4.80 | 4.25 | 64 | 664 |
3 | gpt_4o | 4.50 | 4.50 | 0.00 | 4.56 | 4.94 | 4.02 | 64 | 484 |
3 | command_r_plus_104b_0824 | 4.50 | 4.50 | 0.00 | 4.58 | 4.90 | 4.04 | 64 | 553 |
3 | llama_31_8b_it | 4.50 | 4.51 | 0.02 | 4.50 | 4.83 | 4.20 | 64 | 568 |
3 | magnum_v2_123b | 4.50 | 4.59 | 0.00 | 4.54 | 4.94 | 4.28 | 64 | 768 |
3 | gemini_pro_1_5 | 4.50 | 4.50 | 0.02 | 4.54 | 4.88 | 4.07 | 64 | 265 |
3 | qwen2_72b_it | 4.49 | 4.49 | 0.00 | 4.49 | 4.93 | 4.06 | 64 | 510 |
3 | llama_31_euryale_70b_v2_2 | 4.48 | 4.48 | 0.02 | 4.48 | 4.88 | 4.08 | 64 | 384 |
3 | llama_3_lunaris_8b | 4.48 | 4.54 | 0.00 | 4.53 | 4.89 | 4.20 | 64 | 673 |
3 | nous_hermes_3_405b | 4.47 | 4.47 | 0.00 | 4.41 | 4.90 | 4.10 | 64 | 471 |
4 | mistral_large_123b_2407 | 4.45 | 4.45 | 0.02 | 4.55 | 4.86 | 3.94 | 64 | 325 |
4 | wizardlm_2_8x22b | 4.41 | 4.57 | 0.00 | 4.62 | 4.92 | 4.18 | 64 | 1143 |
5 | llama_31_8b_stheno_v3_4 | 4.37 | 4.45 | 0.00 | 4.44 | 4.77 | 4.14 | 64 | 736 |
5 | deepseek_chat_v2_0628 | 4.35 | 4.35 | 0.00 | 4.34 | 4.94 | 3.77 | 64 | 399 |
5 | claude_3_haiku | 4.35 | 4.43 | 0.03 | 4.36 | 4.89 | 4.04 | 64 | 750 |
5 | solar_pro | 4.33 | 4.33 | 0.00 | 4.30 | 4.92 | 3.77 | 63 | 300 |
6 | star_command_r_32b_v1 | 4.32 | 4.40 | 0.00 | 4.37 | 4.81 | 4.03 | 64 | 748 |
7 | llama_31_70b_arliai_rpmax_v1_1 | 4.20 | 4.22 | 0.00 | 4.07 | 4.79 | 3.81 | 63 | 587 |
7 | arliai_rpmax_12b_v1_1 | 4.17 | 4.25 | 0.02 | 4.27 | 4.55 | 3.93 | 64 | 743 |
8 | lyra4_gutenberg_12b | 4.14 | 4.30 | 0.00 | 4.38 | 4.71 | 3.81 | 64 | 1133 |
8 | mistral_nemo_starcannon_12b | 4.14 | 4.27 | 0.02 | 4.20 | 4.76 | 3.84 | 64 | 940 |
8 | jamba_1_5_large | 4.14 | 4.14 | 0.00 | 4.06 | 4.79 | 3.56 | 64 | 345 |
8 | mistral_nemo_12b | 4.13 | 4.13 | 0.00 | 4.22 | 4.80 | 3.38 | 64 | 224 |
8 | qwen2_7b_it | 4.11 | 4.11 | 0.02 | 4.01 | 4.79 | 3.53 | 64 | 354 |
9 | mythomax_13b | 4.01 | 4.01 | 0.00 | 3.83 | 4.80 | 3.39 | 64 | 388 |
10 | phi_3_5_mini_4b_it | 3.96 | 4.04 | 0.00 | 3.81 | 4.81 | 3.49 | 64 | 768 |