dataset,prompt,metric,value
xcopa_zh,C1 or C2? premise_zhht,accuracy,0.65
xcopa_zh,best_option_zhht,accuracy,0.77
xcopa_zh,cause_effect_zhht,accuracy,0.76
xcopa_zh,i_am_hesitating_zhht,accuracy,0.73
xcopa_zh,plausible_alternatives_zhht,accuracy,0.78
xcopa_zh,median,accuracy,0.76
xstory_cloze_zh,Answer Given options_zhht,accuracy,0.7299801455989411
xstory_cloze_zh,Choose Story Ending_zhht,accuracy,0.8537392455327598
xstory_cloze_zh,Generate Ending_zhht,accuracy,0.6082064857710126
xstory_cloze_zh,Novel Correct Ending_zhht,accuracy,0.8246194573130378
xstory_cloze_zh,Story Continuation and Options_zhht,accuracy,0.8166776968894772
xstory_cloze_zh,median,accuracy,0.8166776968894772
xwinograd_zh,Replace_zhht,accuracy,0.5972222222222222
xwinograd_zh,True or False_zhht,accuracy,0.5218253968253969
xwinograd_zh,does underscore refer to_zhht,accuracy,0.5059523809523809
xwinograd_zh,stand for_zhht,accuracy,0.5059523809523809
xwinograd_zh,underscore refer to_zhht,accuracy,0.5099206349206349
xwinograd_zh,median,accuracy,0.5099206349206349
multiple,average,multiple,0.6955327772700374