cornelius committed on
Commit
9c9e091
1 Parent(s): 7745d5f

Update README.md

Files changed (1)
  1. README.md +125 -19
README.md CHANGED
@@ -1,47 +1,153 @@
  ---
  tags:
- - generated_from_keras_callback
- model-index:
- - name: partypress-monolingual-germany
- results: []
  ---

- <!-- This model card has been generated automatically according to the information Keras had access to. You should
- probably proofread and complete it, then remove this comment. -->

- # partypress-monolingual-germany

- This model is a fine-tuned version of [cornelius/partypress-monolingual-germany](https://huggingface.co/cornelius/partypress-monolingual-germany) on an unknown dataset.
- It achieves the following results on the evaluation set:

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

- ## Training and evaluation data

- More information needed

  ## Training procedure

- ### Training hyperparameters

- The following hyperparameters were used during training:
- - optimizer: None
- - training_precision: float32

- ### Training results

- ### Framework versions

  - Transformers 4.28.0
  - TensorFlow 2.12.0
  - Datasets 2.12.0
  - Tokenizers 0.13.3

  ---
+ license: cc-by-sa-4.0
+ language:
+ - de
+ metrics:
+ - accuracy
+ pipeline_tag: text-classification
  tags:
+ - partypress
+ - political science
+ - parties
+ - press releases
+ widget:
+ - text: 'Zur Forderung des DGB-Chefs Hoffmann nach einer Debatte über Soziale Marktwirtschaft, erklärt der Sozialpolitische Sprecher der AfD-Bundestagsfraktion, Uwe Witt: „Die Soziale Marktwirtschaft steht vor der größten Herausforderung seit Bestehen der Bundesrepublik. Eine Beschäftigung damit, wie es in Zukunft weitergehen soll, ist dringend geboten. Wir haben in Deutschland noch immer den besten Sozialstaat Europas. Wenn wir diesen erhalten wollen, müssen wir aufhören, ihm die ökonomische Grundlage zu entziehen. Soziale Marktwirtschaft braucht zwingend einen funktionierenden und konkurrenzfähigen Mittelstand.'
  ---

+ # PARTYPRESS monolingual Germany
+
+ Fine-tuned model based on [dbmdz/bert-base-german-cased](https://huggingface.co/dbmdz/bert-base-german-cased), used in Erfort et al. (2023) and building on the PARTYPRESS database. For the downstream task of classifying press releases from political parties into 23 unique policy areas, we achieve performance comparable to that of expert human coders.
+

  ## Model description

+ The PARTYPRESS monolingual model builds on [dbmdz/bert-base-german-cased](https://huggingface.co/dbmdz/bert-base-german-cased) but adds a supervised component: it was fine-tuned on texts labeled by humans. The labels indicate 23 different political issue categories derived from the Comparative Agendas Project (CAP):
+
+ | Code | Issue |
+ |------|-------|
+ | 1 | Macroeconomics |
+ | 2 | Civil Rights |
+ | 3 | Health |
+ | 4 | Agriculture |
+ | 5 | Labor |
+ | 6 | Education |
+ | 7 | Environment |
+ | 8 | Energy |
+ | 9 | Immigration |
+ | 10 | Transportation |
+ | 12 | Law and Crime |
+ | 13 | Social Welfare |
+ | 14 | Housing |
+ | 15 | Domestic Commerce |
+ | 16 | Defense |
+ | 17 | Technology |
+ | 18 | Foreign Trade |
+ | 19.1 | International Affairs |
+ | 19.2 | European Union |
+ | 20 | Government Operations |
+ | 23 | Culture |
+ | 98 | Non-thematic |
+ | 99 | Other |
+
+ ## Model variations
+
+ There are several monolingual models for different countries, as well as a multilingual model. The multilingual model can easily be extended to other languages, country contexts, or time periods by fine-tuning it on a small number of additional labeled texts.

  ## Intended uses & limitations

+ The main use of the model is classifying press releases from political parties by policy issue. It may also be useful for other political texts.
+
+ The resulting classifications can then be used to measure which issues parties discuss in their communication.
+
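As a rough illustration of that measurement step, the sketch below turns one predicted issue per press release into each party's share of releases per issue. It assumes you already have (party, issue) pairs, for example from the pipeline output mapped to the CAP categories in the table above; the party names and labels in the example are hypothetical, and this simple share is not necessarily the aggregation used by Erfort et al. (2023).

```python
from collections import Counter, defaultdict


def issue_shares(predictions):
    """Compute each party's share of press releases per predicted issue.

    predictions: iterable of (party, issue) pairs, one per press release.
    Returns {party: {issue: share}}; the shares for each party sum to 1.
    """
    counts = defaultdict(Counter)
    for party, issue in predictions:
        counts[party][issue] += 1
    shares = {}
    for party, issue_counts in counts.items():
        total = sum(issue_counts.values())
        shares[party] = {issue: n / total for issue, n in issue_counts.items()}
    return shares


# Hypothetical predictions for five press releases from two parties.
preds = [
    ("Party A", "Health"),
    ("Party A", "Health"),
    ("Party A", "Immigration"),
    ("Party B", "Environment"),
    ("Party B", "Energy"),
]
print(issue_shares(preds))
```

Counting releases rather than summing classifier scores keeps the sketch simple; averaging the predicted probabilities per issue would be an alternative that retains uncertainty.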
+ ### How to use
+
+ This model can be used directly with a pipeline for text classification:
+
+ ```python
+ >>> from transformers import pipeline
+ >>> # Pad and truncate inputs to BERT's 512-token limit
+ >>> tokenizer_kwargs = {'padding': True, 'truncation': True, 'max_length': 512}
+ >>> partypress = pipeline("text-classification", model="cornelius/partypress-monolingual-germany", tokenizer="cornelius/partypress-monolingual-germany", **tokenizer_kwargs)
+ >>> partypress("Your text here.")
+ ```
+
+ ### Limitations and bias
+
+ The model was trained on data from parties in Germany. It can be fine-tuned further for use in other countries; without such fine-tuning, its performance there may be lower.

+ The model may make biased predictions. We discuss some biases by country, party, and over time in the release paper for the PARTYPRESS database. For example, across countries, performance is highest for press releases from Ireland (75%) and lowest for Poland (55%).

+ ## Training data
+
+ The PARTYPRESS monolingual model was fine-tuned on about 3,000 press releases from parties in Germany, labeled by two expert human coders.
+
+ For the training data of the underlying model, please refer to [dbmdz/bert-base-german-cased](https://huggingface.co/dbmdz/bert-base-german-cased).

  ## Training procedure

+ ### Preprocessing
+
+ For the preprocessing, please refer to [dbmdz/bert-base-german-cased](https://huggingface.co/dbmdz/bert-base-german-cased).
+
+ ### Pretraining

+ For the pretraining, please refer to [dbmdz/bert-base-german-cased](https://huggingface.co/dbmdz/bert-base-german-cased).

+ ### Fine-tuning

+ We fine-tuned the model using about 3,000 labeled press releases from political parties in Germany.

+ #### Training Hyperparameters

+ We trained for four epochs with a batch size of 12 for training and 2 for testing. All other hyperparameters were left at the defaults of the transformers library.
+
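For a sense of scale, the sketch below works out how many optimization steps these settings imply. It assumes, for illustration only, that one training run sees all of the roughly 3,000 labeled releases and that a final partial batch counts as one step; in five-fold cross-validation each training split would hold only about four fifths of the data, so the second call uses a hypothetical 2,400 releases.

```python
import math


def optimizer_steps(n_examples: int, batch_size: int, epochs: int) -> tuple[int, int]:
    """Return (steps per epoch, total steps), counting a final partial batch as one step."""
    steps_per_epoch = math.ceil(n_examples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# Assumed full-data run: about 3,000 releases, batch size 12, four epochs.
print(optimizer_steps(3000, 12, 4))  # (250, 1000)

# Assumed single cross-validation split with about 4/5 of the data.
print(optimizer_steps(2400, 12, 4))  # (200, 800)
```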
+ #### Framework versions

  - Transformers 4.28.0
  - TensorFlow 2.12.0
  - Datasets 2.12.0
  - Tokenizers 0.13.3
+
+ ## Evaluation results
+
+ Fine-tuned on our downstream task, this model achieves results in five-fold cross-validation that are comparable to the performance of our expert human coders. For the detailed results, please refer to Erfort et al. (2023).
+
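The five-fold cross-validation procedure can be sketched as follows: the labeled releases are split into five folds, each fold serves once as the test set while a model is trained on the other four, and accuracy (the metric listed in the card metadata) is averaged over folds. This is a minimal sketch under stated assumptions: the interleaved fold assignment is a simplification, the actual splitting scheme in Erfort et al. (2023) may differ (for example, stratification), and `train_and_predict` is a hypothetical placeholder for fine-tuning and prediction. The majority-label baseline only stands in for the model so the example runs.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k folds and return one (train, test) pair per fold."""
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved fold assignment
    splits = []
    for i, test in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Share of examples whose predicted label matches the human label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_validate(texts: Sequence[str], labels: Sequence[str],
                   train_and_predict: Callable, k: int = 5) -> float:
    """Mean test accuracy over k folds.

    train_and_predict(train_texts, train_labels, test_texts) is a hypothetical
    callable that fine-tunes a model and returns one predicted label per test text.
    """
    scores = []
    for train, test in k_fold_indices(len(texts), k):
        preds = train_and_predict([texts[i] for i in train],
                                  [labels[i] for i in train],
                                  [texts[i] for i in test])
        scores.append(accuracy([labels[i] for i in test], preds))
    return sum(scores) / len(scores)


# Hypothetical stand-in for fine-tuning: always predict the most frequent training label.
def majority_baseline(train_texts, train_labels, test_texts):
    top = Counter(train_labels).most_common(1)[0][0]
    return [top] * len(test_texts)


texts = [f"press release {i}" for i in range(10)]
labels = ["Health"] * 6 + ["Energy"] * 4
print(cross_validate(texts, labels, majority_baseline))
```

Comparing the model's cross-validated accuracy against such a trivial baseline is one way to check that it is learning more than the label distribution.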
+ ### BibTeX entry and citation info
+
+ ```bibtex
+ @article{erfort_partypress_2023,
+ author = {Cornelius Erfort and
+ Lukas F. Stoetzer and
+ Heike Klüver},
+ title = {The PARTYPRESS Database: A New Comparative Database of Parties’ Press Releases},
+ journal = {Research and Politics},
+ volume = {forthcoming},
+ year = {2023},
+ }
+ ```
+
+ ### Further resources
+
+ GitHub: [cornelius-erfort/partypress](https://github.com/cornelius-erfort/partypress)
+
+ Research and Politics Dataverse: [Replication Data for: The PARTYPRESS Database: A New Comparative Database of Parties’ Press Releases](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi%3A10.7910%2FDVN%2FOINX7Q)
+
+ ## Acknowledgements
+
+ Research for this contribution is part of the Cluster of Excellence "Contestations of the Liberal Script" (EXC 2055, Project-ID: 390715649), funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy. Cornelius Erfort is also grateful for generous funding provided by the DFG through the Research Training Group DYNAMICS (GRK 2458/1).
+
+ ## Contact
+
+ Cornelius Erfort
+
+ Humboldt-Universität zu Berlin
+
+ [corneliuserfort.de](https://corneliuserfort.de)