陈俊杰 committed on
Commit
3d4d37c
1 Parent(s): d943032
Files changed (1)
  1. app.py +2 -0
app.py CHANGED
@@ -135,7 +135,9 @@ st.markdown("""
 if page == "Introduction":
     st.header("Introduction")
     st.markdown("""
+    <div style='font-size: 48px;line-height: 1.8;'>
     The Automatic Evaluation of LLMs (AEOLLM) task is a new core task in [NTCIR-18](http://research.nii.ac.jp/ntcir/ntcir-18) to support in-depth research on large language models (LLMs) evaluation. As LLMs grow popular in both fields of academia and industry, how to effectively evaluate the capacity of LLMs becomes an increasingly critical but still challenging issue. Existing methods can be divided into two types: manual evaluation, which is expensive, and automatic evaluation, which faces many limitations including the task format (the majority belong to multiple-choice questions) and evaluation criteria (occupied by reference-based metrics). To advance the innovation of automatic evaluation, we proposed the Automatic Evaluation of LLMs (AEOLLM) task which focuses on generative tasks and encourages reference-free methods. Besides, we set up diverse subtasks such as summary generation, non-factoid question answering, text expansion, and dialogue generation to comprehensively test different methods. We believe that the AEOLLM task will facilitate the development of the LLMs community.
+    </div>
     """)
 
 elif page == "Methodology":
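Note on the added `<div>`: to my knowledge, Streamlit's `st.markdown` escapes raw HTML unless `unsafe_allow_html=True` is passed, and the closing `""")` in this hunk shows no such argument, so the wrapper may appear as literal text instead of enlarging the paragraph. Below is a minimal sketch of one way to apply the same inline style. The helper names `wrap_in_styled_div` and `render` are hypothetical and do not come from `app.py`; the style values are copied from the commit.

```python
def wrap_in_styled_div(text: str, font_size_px: int = 48, line_height: float = 1.8) -> str:
    """Wrap text in a <div> carrying the inline style used in this commit."""
    # Hypothetical helper: builds the same style string the commit hard-codes.
    style = f"font-size: {font_size_px}px;line-height: {line_height};"
    return f"<div style='{style}'>\n{text}\n</div>"


def render(text: str) -> None:
    """Render the styled text in a Streamlit page (hypothetical helper)."""
    # Imported lazily so wrap_in_styled_div works without Streamlit installed.
    import streamlit as st

    # Assumption: without unsafe_allow_html=True the <div> tags are escaped
    # and displayed as plain text rather than interpreted as HTML.
    st.markdown(wrap_in_styled_div(text), unsafe_allow_html=True)
```

Calling `render("...")` inside the `if page == "Introduction":` branch would replace the bare `st.markdown("""...""")` call. Whether Markdown syntax such as the `[NTCIR-18](...)` link still renders inside the `<div>` depends on the Markdown renderer and is worth checking in the running app.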