Chenxi Whitehouse committed
Commit 147d3e2
1 Parent(s): 90d6532

update file name
README.md CHANGED
@@ -87,13 +87,13 @@ bash script/scraper.sh <split> <start_idx> <end_idx>
 
 ### 2. Rank the sentences in the knowledge store with BM25
 Then, we rank the scraped sentences for each claim using BM25 (based on the similarity to the claim), keeping the top 100 sentences per claim.
-See [bm25_sentences.py](https://huggingface.co/chenxwh/AVeriTeC/blob/main/src/reranking/bm25_sentences.py) for more argument options. We provide the output file for this step on the dev set [here]().
+See [bm25_sentences.py](https://huggingface.co/chenxwh/AVeriTeC/blob/main/src/reranking/bm25_sentences.py) for more argument options. We provide the output file for this step on the dev set [here](https://huggingface.co/chenxwh/AVeriTeC/blob/main/data_store/dev_top_k_sentences.json).
 ```bash
 python -m src.reranking.bm25_sentences
 ```
 
 ### 3. Generate question-answer pairs for the top sentences
-We use [BLOOM](https://huggingface.co/bigscience/bloom-7b1) to generate QA pairs for each of the top 100 sentences, providing the 10 closest claim-QA-pairs from the training set as in-context examples. See [question_generation_top_sentences.py](https://huggingface.co/chenxwh/AVeriTeC/blob/main/src/reranking/question_generation_top_sentences.py) for more argument options. We provide the output file for this step on the dev set [here]().
+We use [BLOOM](https://huggingface.co/bigscience/bloom-7b1) to generate QA pairs for each of the top 100 sentences, providing the 10 closest claim-QA-pairs from the training set as in-context examples. See [question_generation_top_sentences.py](https://huggingface.co/chenxwh/AVeriTeC/blob/main/src/reranking/question_generation_top_sentences.py) for more argument options. We provide the output file for this step on the dev set [here](https://huggingface.co/chenxwh/AVeriTeC/blob/main/data_store/dev_top_k_qa.json).
 ```bash
 python -m src.reranking.question_generation_top_sentences
 ```
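The BM25 reranking described in step 2 can be sketched in plain Python. This is an illustrative stand-alone implementation of Okapi BM25 over whitespace-tokenized sentences, not the repository's `bm25_sentences.py`; the function name and parameters are hypothetical, and the defaults `k1=1.5`, `b=0.75` are conventional choices rather than the values the repo uses.

```python
import math
from collections import Counter


def bm25_rank(claim, sentences, top_k=100, k1=1.5, b=0.75):
    """Rank sentences by Okapi BM25 similarity to the claim; keep the top_k."""
    docs = [s.lower().split() for s in sentences]
    query = claim.lower().split()
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs

    # Document frequency of each query term across the sentence pool.
    df = {t: sum(1 for d in docs if t in d) for t in set(query)}

    def score(doc):
        tf = Counter(doc)
        total = 0.0
        for t in query:
            if tf[t] == 0:
                continue
            # Smoothed IDF, as in Okapi BM25.
            idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1)
            # Term-frequency saturation with document-length normalization.
            norm = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
            total += idf * tf[t] * (k1 + 1) / norm
        return total

    ranked = sorted(range(n_docs), key=lambda i: score(docs[i]), reverse=True)
    return [sentences[i] for i in ranked[:top_k]]
```

In the pipeline above, `claim` would be the claim text and `sentences` the sentences scraped into the knowledge store for that claim, with `top_k=100`.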
src/reranking/question_generation_top_sentences.py CHANGED
@@ -58,13 +58,13 @@ if __name__ == "__main__":
     parser.add_argument(
         "-i",
         "--top_k_target_knowledge",
-        default="data_store/dev_top_k.json",
+        default="data_store/dev_top_k_sentences.json",
         help="Directory where the sentences for the scraped data is saved.",
     )
     parser.add_argument(
         "-o",
         "--output_questions",
-        default="data_store/dev_bm25_questions.json",
+        default="data_store/dev_top_k_qa.json",
         help="Directory where the sentences for the scraped data is saved.",
     )
     parser.add_argument(
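The renamed defaults chain step 2 into step 3: BM25 writes `dev_top_k_sentences.json`, which this script reads before prompting BLOOM with the 10 closest claim-QA-pairs as in-context examples. The few-shot prompt assembly can be sketched as below; the function name and the exact prompt template are hypothetical, not copied from the repository.

```python
def build_qa_prompt(claim, sentence, examples):
    """Assemble a few-shot prompt for generating a question-answer pair.

    `examples` is a list of (claim, question, answer) triples -- in the
    pipeline's setup, drawn from the training claims closest to the target
    claim. The model is expected to continue the text after the final
    "Question:" marker.
    """
    parts = []
    for ex_claim, question, answer in examples:
        # One completed in-context example per training claim-QA-pair.
        parts.append(f"Claim: {ex_claim}\nQuestion: {question}\nAnswer: {answer}\n")
    # The target claim and evidence sentence, left open for the model.
    parts.append(f"Claim: {claim}\nEvidence: {sentence}\nQuestion:")
    return "\n".join(parts)
```

In practice this prompt would be built once per top-ranked sentence and sent to the generator, with the output parsed back into a question-answer pair.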