---
language: fr
datasets:
- FrenchMedMCQA
license: apache-2.0
model-index:
- name: qanastek/FrenchMedMCQA-BART-base-Wikipedia-BM25
  results:
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: FrenchMedMCQA
      type: FrenchMedMCQA
      config: FrenchMedMCQA
      split: validation
    metrics:
    - name: Exact Match
      type: exact_match
      value: 18.64
      verified: true
    - name: Hamming Score
      type: hamming score
      value: 38.72
      verified: true
widget:
- text: "Parmi les bactéries suivantes, laquelle est un agent habituel de méningite néonatale? \n (A) Clostridium tetani (B) Salmonella sérovar Typhimurium (C) Streptococcus agalactiae . (D) Haemophilus influenzae (E) Vibrio cholerae\nLes premiers cas d’infection néonatale à streptocoques du groupe B ont été décrits par Eickhoff en 1964.Cette bactérie est aussi responsable d'infection chez les personnes âgées."
---

# FrenchMedMCQA: Multiple-choice question answering on pharmacology exams using BART-base, Wikipedia external knowledge and a BM25 retriever

- Corpora: [FrenchMedMCQA](https://github.com/qanastek/FrenchMedMCQA)
- Model: [BART Base](https://huggingface.co/facebook/bart-base)
- Number of Epochs: 30

**People Involved**

* [Yanis LABRAK](https://www.linkedin.com/in/yanis-labrak-8a7412145/) (1)
* [Adrien BAZOGE](https://fr.linkedin.com/in/adrien-bazoge-6b511b145) (2)
* [Richard DUFOUR](https://cv.archives-ouvertes.fr/richard-dufour) (2)
* [Béatrice DAILLE](https://scholar.google.com/citations?user=-damXYEAAAAJ&hl=fr) (2)
* [Pierre-Antoine GOURRAUD](https://fr.linkedin.com/in/pierre-antoine-gourraud-35779b6) (3)
* [Emmanuel MORIN](https://scholar.google.fr/citations?user=tvTEtM0AAAAJ&hl=fr) (2)
* [Mickael ROUVIER](https://scholar.google.fr/citations?user=0fmu-VsAAAAJ&hl=fr) (1)

**Affiliations**

1. [LIA, NLP team](https://lia.univ-avignon.fr/), Avignon University, Avignon, France.
2. [LS2N, TALN team](https://www.ls2n.fr/equipe/taln/), Nantes University, Nantes, France.
3. [CHU Nantes](https://www.chu-nantes.fr/), Nantes University, Nantes, France.

## Demo: How to use in HuggingFace Transformers

Requires [Transformers](https://pypi.org/project/transformers/) and [Datasets](https://pypi.org/project/datasets/): ```pip install transformers datasets```

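```python
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

path_model = "qanastek/FrenchMedMCQA-BART-base-Wikipedia-BM25"

tokenizer = AutoTokenizer.from_pretrained(path_model)
# The summarization pipeline needs a sequence-to-sequence head, not a classification head
model = AutoModelForSeq2SeqLM.from_pretrained(path_model)

# Use a distinct name so the imported `pipeline` factory is not shadowed
generator = pipeline(task="summarization", model=model, tokenizer=tokenizer) # CPU

dataset  = load_dataset("qanastek/FrenchMedMCQA")["test"]

for e in dataset:
    # Assumption: the fields `question` and `answer_a` .. `answer_e` match the dataset schema;
    # adjust the names if they differ. The retrieved Wikipedia context shown in the
    # widget example is not appended here.
    options = " ".join(f"({l.upper()}) {e['answer_' + l]}" for l in "abcde")
    source = f"{e['question']} \n {options}"
    prediction = generator(source, truncation=True, max_length=900)[0]["summary_text"]
    print(prediction)
```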

Output:

![Preview Output](preview.PNG)

## Training data

The questions and their associated candidate answer(s) were collected from real French pharmacy exams on the remede website. Questions and answers were manually created by medical experts and used during examinations. The dataset is composed of 2,025 questions with multiple answers and 1,080 with a single one, for a total of 3,105 questions. Each instance of the dataset contains an identifier, a question, five options (labeled from A to E) and the correct answer(s). The average question length is 14.17 tokens and the average answer length is 6.44 tokens. The vocabulary contains about 13k words, of which 3.8k are estimated to be medical domain-specific words (i.e. words related to the medical field). We find an average of 2.49 medical domain-specific words in each question (17% of the words) and 2 in each answer (36% of the words). On average, a medical domain-specific word is present in 2 questions and in 8 answers.

| # Answers | Training | Validation | Test | Total |
|:---------:|:--------:|:----------:|:----:|:-----:|
|     1     |    595   |     164    |  321 | 1,080 |
|     2     |    528   |     45     |  97  |  670  |
|     3     |    718   |     71     |  141 |  930  |
|     4     |    296   |     30     |  56  |  382  |
|     5     |    34    |      2     |   7  |   43  |
|   Total   |   2,171  |     312    |  622 | 3,105 |
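
The instance layout described above (identifier, question, five options labeled A to E, correct answers) can be sketched as a small data class. This is only an illustration: `MCQAInstance`, `format_input` and the identifier value are hypothetical names that are not part of the dataset, and the rendered string simply follows the layout of the widget example in this card (question, options, then an optional retrieved context).

```python
from dataclasses import dataclass


@dataclass
class MCQAInstance:
    """Hypothetical container mirroring the fields described in this section."""
    identifier: str
    question: str
    options: dict[str, str]      # keys "A" to "E"
    correct_answers: list[str]   # between one and five option labels


def format_input(instance: MCQAInstance, context: str = "") -> str:
    """Render the question, its options and an optional context as one string."""
    options = " ".join(
        f"({label}) {text}" for label, text in sorted(instance.options.items())
    )
    prompt = f"{instance.question} \n {options}"
    return f"{prompt}\n{context}" if context else prompt


# Question and options taken from the widget example; the identifier is made up.
example = MCQAInstance(
    identifier="example-0",
    question="Parmi les bactéries suivantes, laquelle est un agent habituel de méningite néonatale?",
    options={
        "A": "Clostridium tetani",
        "B": "Salmonella sérovar Typhimurium",
        "C": "Streptococcus agalactiae",
        "D": "Haemophilus influenzae",
        "E": "Vibrio cholerae",
    },
    correct_answers=["C"],
)

print(format_input(example))
```

Keeping the options in a dictionary keyed by label makes it straightforward to compare predicted labels against `correct_answers` for questions with several correct options.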

## Evaluation results

The test corpus used for this evaluation is available on [GitHub](https://github.com/qanastek/FrenchMedMCQA).

|   Architecture   | Hamming |  EMR  | Hamming |  EMR  | Hamming |  EMR  | Hamming |  EMR  | Hamming |  EMR  |
|:----------------:|:-------:|:-----:|:-------:|:-----:|:-------:|:-----:|:-------:|:-----:|:-------:|:-----:|
|   BioBERT V1.1   |  36.19  | 15.43 |  **38.72**  | 16.72 |  33.33  | 14.14 |  35.13  | 16.23 |  34.27  | 13.98 |
|    PubMedBERT    |  33.98  | 14.14 |  34.00  | 13.98 |  35.66  | 15.59 |  33.87  | 14.79 |  35.44  | 14.79 |
|  CamemBERT-base  |  36.24  | 16.55 |  34.19  | 14.46 |  34.78  | 15.43 |  34.66  | 14.79 |  34.61  | 14.95 |
| XLM-RoBERTa-base |  37.92  | 17.20 |  31.26  | 11.89 |  35.84  | 16.07 |  32.47  | 14.63 |  33.00  | 14.95 |
|     BART-base    |  31.93  | 15.91 |  34.98  | **18.64** |  33.80  | 17.68 |  29.65  | 12.86 |  34.65  | 18.32 |
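
For readers unfamiliar with the two metrics in the table, the sketch below shows one common way to compute them for multi-answer questions. It assumes the usual multi-label definitions, which this card does not spell out: the Hamming score is the size of the intersection of predicted and reference answers divided by the size of their union, averaged over questions, and the exact match ratio (EMR) is the share of questions whose predicted answer set equals the reference set. The function names and toy data are illustrative only.

```python
def hamming_score(references, predictions):
    """Mean of |ref ∩ pred| / |ref ∪ pred| over all questions (assumed definition)."""
    total = 0.0
    for ref, pred in zip(references, predictions):
        ref, pred = set(ref), set(pred)
        union = ref | pred
        # Two empty sets are treated as a perfect match
        total += len(ref & pred) / len(union) if union else 1.0
    return total / len(references)


def exact_match_ratio(references, predictions):
    """Share of questions whose predicted answer set equals the reference set."""
    matches = sum(set(ref) == set(pred) for ref, pred in zip(references, predictions))
    return matches / len(references)


# Toy data: three questions with their reference and predicted answer labels
refs = [["a"], ["b", "c"], ["a", "d", "e"]]
preds = [["a"], ["b"], ["c"]]

print(hamming_score(refs, preds))      # (1 + 0.5 + 0) / 3 = 0.5
print(exact_match_ratio(refs, preds))  # 1 of 3 questions matches exactly
```

The values in the table are percentages, so the outputs of these functions would be multiplied by 100 to be comparable.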

## BibTeX Citations

Please cite the following paper when using this model.

FrenchMedMCQA corpus and linked tools:

```latex
@unpublished{labrak:hal-03824241,
  TITLE = {{FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain}},
  AUTHOR = {Labrak, Yanis and Bazoge, Adrien and Dufour, Richard and Daille, B{\'e}atrice and Gourraud, Pierre-Antoine and Morin, Emmanuel and Rouvier, Mickael},
  URL = {https://hal.archives-ouvertes.fr/hal-03824241},
  NOTE = {working paper or preprint},
  YEAR = {2022},
  MONTH = Oct,
  PDF = {https://hal.archives-ouvertes.fr/hal-03824241/file/LOUHI_2022___QA-3.pdf},
  HAL_ID = {hal-03824241},
  HAL_VERSION = {v1},
}
```

HuggingFace's Transformers:

```latex
@misc{https://doi.org/10.48550/arxiv.1910.03771,
    doi = {10.48550/ARXIV.1910.03771},
    url = {https://arxiv.org/abs/1910.03771},
    author = {Wolf, Thomas and Debut, Lysandre and Sanh, Victor and Chaumond, Julien and Delangue, Clement and Moi, Anthony and Cistac, Pierric and Rault, Tim and Louf, Rémi and Funtowicz, Morgan and Davison, Joe and Shleifer, Sam and von Platen, Patrick and Ma, Clara and Jernite, Yacine and Plu, Julien and Xu, Canwen and Scao, Teven Le and Gugger, Sylvain and Drame, Mariama and Lhoest, Quentin and Rush, Alexander M.},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences, FOS: Computer and information sciences},
    title = {HuggingFace's Transformers: State-of-the-art Natural Language Processing},
    publisher = {arXiv},
    year = {2019}, 
    copyright = {arXiv.org perpetual, non-exclusive license}
}
```

## Acknowledgment

This work was financially supported by [Zenidoc](https://zenidoc.fr/), the [DIETS](https://anr-diets.univ-avignon.fr/) project financed by the Agence Nationale de la Recherche (ANR) under contract ANR-20-CE23-0005 and the ANR [AIBy4](https://aiby4.ls2n.fr/) (ANR-20-THIA-0011).