Model name | Final score | Length-normalized score | Refusal ratio | Stay-in-character score | Language fluency score | Entertainment score | Number of situations | Average length |
---|---|---|---|---|---|---|---|---|
claude_3_5_sonnet | 8.12 | 7.84 | 0.28 | 8.46 | 7.98 | 7.91 | 64 | 387 |
claude_3_haiku | 7.91 | 7.33 | 0.02 | 8.15 | 7.99 | 7.58 | 64 | 520 |
llama31_405b_it | 7.83 | 7.49 | 0.02 | 8.19 | 7.76 | 7.55 | 64 | 415 |
llama31_70b_it | 7.70 | 7.37 | 0.00 | 8.13 | 7.51 | 7.47 | 64 | 410 |
wizardlm_2_8x22b | 7.64 | 6.70 | 0.00 | 7.78 | 7.84 | 7.32 | 64 | 699 |
gpt_4o_mini | 7.52 | 7.51 | 0.00 | 7.84 | 7.60 | 7.11 | 64 | 275 |
gpt_4o | 7.46 | 7.46 | 0.02 | 7.77 | 7.63 | 6.96 | 64 | 237 |
gemma2_27b_it | 7.45 | 7.45 | 0.00 | 7.94 | 7.44 | 6.97 | 64 | 249 |
llama31_8b_it | 7.40 | 7.40 | 0.02 | 7.71 | 7.39 | 7.12 | 64 | 267 |
saiga_gemma2_9b | 7.32 | 7.32 | 0.00 | 7.56 | 7.40 | 7.00 | 64 | 266 |
mini_magnum_12b_v1_1 | 7.29 | 6.97 | 0.00 | 7.56 | 7.54 | 6.76 | 64 | 405 |
magnum_72b | 7.24 | 7.20 | 0.00 | 7.51 | 7.47 | 6.75 | 64 | 289 |
saiga_tlite_8b | 7.11 | 7.11 | 0.00 | 7.34 | 7.49 | 6.51 | 64 | 171 |
gemma2_9b_it | 7.10 | 7.10 | 0.00 | 7.48 | 7.06 | 6.78 | 64 | 174 |
saiga_llama3_8b_v7 | 6.89 | 6.89 | 0.00 | 7.17 | 7.28 | 6.23 | 64 | 203 |
gemma2_2b_it_abl | 6.17 | 6.17 | 0.00 | 6.44 | 6.51 | 5.55 | 64 | 140 |