Russian leaderboard, v2
Last updated: 2024-09-19 21:21:32
# | Model name | Length-normalized score | Avg score | Refusal ratio | Stay-in-character score | Language fluency score | Entertainment score | Num cases | Avg length |
---|---|---|---|---|---|---|---|---|---|
1 | claude_3_5_sonnet | 4.63 | 4.68 | 0.30 | 4.80 | 4.80 | 4.44 | 64 | 388 |
2 | gemini_pro_1_5 | 4.49 | 4.49 | 0.02 | 4.60 | 4.75 | 4.13 | 64 | 213 |
2 | gpt_4o_mini | 4.49 | 4.49 | 0.00 | 4.62 | 4.82 | 4.04 | 64 | 329 |
2 | gpt_4o | 4.47 | 4.47 | 0.02 | 4.61 | 4.82 | 3.99 | 64 | 301 |
2 | qwen25_72b_it | 4.46 | 4.46 | 0.02 | 4.55 | 4.80 | 4.02 | 64 | 326 |
2 | gemma2_ataraxy_9b | 4.45 | 4.45 | 0.00 | 4.61 | 4.53 | 4.21 | 64 | 302 |
3 | nous_hermes_3_405b | 4.44 | 4.44 | 0.00 | 4.53 | 4.74 | 4.05 | 62 | 286 |
3 | claude_3_opus | 4.44 | 4.62 | 0.05 | 4.72 | 4.67 | 4.48 | 64 | 753 |
3 | gemma2_ifable_9b | 4.43 | 4.43 | 0.00 | 4.60 | 4.46 | 4.24 | 64 | 314 |
3 | qwen25_32b_it | 4.42 | 4.42 | 0.00 | 4.54 | 4.71 | 4.01 | 64 | 267 |
3 | qwen2_72b_it | 4.41 | 4.41 | 0.00 | 4.43 | 4.85 | 3.96 | 64 | 242 |
3 | llama_31_405b_it | 4.41 | 4.54 | 0.00 | 4.66 | 4.69 | 4.26 | 64 | 536 |
3 | gemma2_27b_it | 4.41 | 4.41 | 0.00 | 4.63 | 4.73 | 3.88 | 64 | 210 |
4 | command_r_plus_104b_0824 | 4.37 | 4.47 | 0.00 | 4.52 | 4.73 | 4.16 | 64 | 470 |
4 | mistral_nemo_gutenberg_12b_v2 | 4.36 | 4.52 | 0.00 | 4.53 | 4.73 | 4.30 | 64 | 661 |
4 | llama_31_70b_it | 4.33 | 4.44 | 0.00 | 4.61 | 4.38 | 4.31 | 64 | 499 |
4 | gemma2_9b_it_sppo_iter3 | 4.32 | 4.32 | 0.00 | 4.54 | 4.38 | 4.05 | 64 | 226 |
5 | claude_3_haiku | 4.32 | 4.46 | 0.00 | 4.45 | 4.79 | 4.13 | 64 | 589 |
5 | magnum_v2_123b | 4.28 | 4.39 | 0.00 | 4.39 | 4.66 | 4.11 | 64 | 506 |
5 | qwen25_14b_it | 4.27 | 4.27 | 0.00 | 4.35 | 4.58 | 3.89 | 64 | 278 |
6 | command_r_35b_0824 | 4.20 | 4.20 | 0.00 | 4.15 | 4.79 | 3.67 | 64 | 209 |
6 | gemma2_9b_it_simpo | 4.20 | 4.20 | 0.00 | 4.45 | 4.10 | 4.05 | 64 | 322 |
6 | command_r_plus_104b_0424 | 4.20 | 4.34 | 0.00 | 4.33 | 4.63 | 4.07 | 64 | 615 |
6 | deepseek_chat_v2_0628 | 4.18 | 4.19 | 0.00 | 4.21 | 4.66 | 3.69 | 64 | 337 |
7 | wizardlm_2_8x22b | 4.12 | 4.31 | 0.00 | 4.29 | 4.49 | 4.15 | 64 | 832 |
7 | llama_31_8b_it | 4.09 | 4.09 | 0.00 | 4.30 | 4.17 | 3.80 | 64 | 325 |
8 | gemma2_9b_it | 4.03 | 4.03 | 0.00 | 4.34 | 3.93 | 3.81 | 64 | 224 |
8 | gemma2_9b_it_abl | 4.03 | 4.03 | 0.00 | 4.19 | 4.18 | 3.71 | 64 | 162 |
8 | jamba_1_5_large | 3.99 | 3.99 | 0.00 | 4.07 | 4.50 | 3.38 | 64 | 203 |
9 | mini_magnum_12b_v1_1 | 3.96 | 4.08 | 0.00 | 4.02 | 4.50 | 3.72 | 64 | 575 |
9 | saiga_llama3_8b | 3.94 | 3.94 | 0.00 | 3.93 | 4.57 | 3.32 | 64 | 207 |
9 | qwen2_7b_it | 3.94 | 3.94 | 0.00 | 3.77 | 4.61 | 3.42 | 64 | 276 |
9 | ruadapt_llama3_kto_abl | 3.93 | 3.96 | 0.00 | 4.02 | 4.26 | 3.58 | 64 | 357 |
10 | llama_31_euryale_70b_v2_2 | 3.49 | 3.56 | 0.00 | 3.85 | 3.25 | 3.57 | 63 | 439 |
11 | vikhr_gemma_2b_it | 2.81 | 2.90 | 0.00 | 3.00 | 2.60 | 3.09 | 63 | 576 |
11 | phi_35_mini_4b_it | 2.81 | 2.85 | 0.00 | 2.94 | 2.62 | 2.99 | 64 | 417 |