partypress/partypress-monolingual-ireland

@@ -1,150 +1,48 @@
 ---
 license: cc-by-sa-4.0
-language:
-- en
-metrics:
-- accuracy
-pipeline_tag: text-classification
 tags:
-- partypress
-- political science
-- parties
-- press releases
-widget:
- - text: "The Labour Party will seek to re-open negotiations on sugar production on the possibility of regaining sugar quota when the matter comes up for review at EU level.The commitment follows last week's hearing of the Oireachtas Agricultural Committee at which it was stated that Ireland could not have access to sugar quota at this time.Sean Sherlock said:  What this means in real terms is that Ireland is not allowed to produce sugar for the EU domestic market nor for export to countries outside the EU. However, the review of the current EU sugar regime scheduled to take place in 2014 must be seized as an opportunity to secure an allocation of tonnage or quota to enable Ireland produce sugar again. While this is an issue dominated by countries such as Germany and France the Labour Party will adopt a strong position and we will be seeking a feasibility study involving all stakeholders and a renegotiation at European level through intense dialogue with our European partners."
 ---
-# PARTYPRESS monolingual Ireland
-Fine-tuned model, based on [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english). Used in Erfort et al. (2023), building on the PARTYPRESS database. For the downstream task of classifying press releases from political parties into 23 unique policy areas, the model achieves performance comparable to expert human coders.
 ## Model description
-The PARTYPRESS monolingual model builds on [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english) but has a supervised component. This means, it was fine-tuned using texts labeled by humans. The labels indicate 23 different political issue categories derived from the Comparative Agendas Project (CAP):
-| Code | Issue |
-|--|-------|
-| 1 | Macroeconomics |
-| 2 | Civil Rights |
-| 3 | Health |
-| 4 | Agriculture |
-| 5 | Labor |
-| 6 | Education |
-| 7 | Environment |
-| 8 | Energy |
-| 9 | Immigration |
-| 10 | Transportation |
-| 12 | Law and Crime |
-| 13 | Social Welfare |
-| 14 | Housing |
-| 15 | Domestic Commerce |
-| 16 | Defense |
-| 17 | Technology |
-| 18 | Foreign Trade |
-| 19.1 | International Affairs |
-| 19.2 | European Union |
-| 20 | Government Operations |
-| 23 | Culture |
-| 98 | Non-thematic |
-| 99 | Other |
-## Model variations
-There are several monolingual models for different countries, and a multilingual model. The multilingual model can be easily extended to other languages, country contexts, or time periods by fine-tuning it with minimal additional labeled texts.
 ## Intended uses & limitations
-The main use of the model is for text classification of press releases from political parties. It may also be useful for other political texts.
-The classification can then be used to measure which issues parties are discussing in their communication.
-### How to use
-This model can be used directly with a pipeline for text classification:
-```python
->>> from transformers import pipeline
->>> # Pad and truncate every input to at most 512 tokens.
->>> tokenizer_kwargs = {"padding": True, "truncation": True, "max_length": 512}
->>> partypress = pipeline("text-classification", model="cornelius/partypress-monolingual-ireland", tokenizer="cornelius/partypress-monolingual-ireland", **tokenizer_kwargs)
->>> partypress("Your text here.")
-```
-### Limitations and bias
-The model was trained with data from parties in Ireland. For use in other countries, the model may be further fine-tuned. Without further fine-tuning, the performance of the model may be lower.
-The model may have biased predictions. We discuss some biases by country, party, and over time in the release paper for the PARTYPRESS database. For example, the performance is highest for press releases from Ireland (75%) and lowest for Poland (55%).
-## Training data
-The PARTYPRESS monolingual model was fine-tuned with about 3,000 press releases from parties in Ireland. The press releases were labeled by two expert human coders.
-For the training data of the underlying model, please refer to [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
 ## Training procedure
-### Preprocessing
-For the preprocessing, please refer to [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
-### Pretraining
-For the pretraining, please refer to [distilbert-base-uncased-finetuned-sst-2-english](https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english)
-### Fine-tuning
-We fine-tuned the model using about 3,000 labeled press releases from political parties in Ireland.
-#### Training Hyperparameters
-The batch size for training was 12, for testing 2, with four epochs. All other hyperparameters were the standard from the transformers library.
-#### Framework versions
 - Transformers 4.28.0
 - TensorFlow 2.12.0
 - Datasets 2.12.0
 - Tokenizers 0.13.3
-## Evaluation results
-Fine-tuned on our downstream task, this model achieves results in a five-fold cross-validation that are comparable to the performance of our expert human coders. For the detailed results, please refer to Erfort et al. (2023).
-### BibTeX entry and citation info
-```bibtex
-@article{erfort_partypress_2023,
-  author    = {Cornelius Erfort and
-               Lukas F. Stoetzer and
-               Heike Klüver},
-  title     = {The PARTYPRESS Database: A New Comparative Database of Parties’ Press Releases},
-  journal   = {Research and Politics},
-  volume    = {forthcoming},
-  year      = {2023},
-}
-```
-### Further resources
-Github: [cornelius-erfort/partypress](https://github.com/cornelius-erfort/partypress)
-Research and Politics Dataverse: [Replication Data for: The PARTYPRESS Database: A New Comparative Database of Parties’ Press Releases](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi%3A10.7910%2FDVN%2FOINX7Q)
-## Acknowledgements
-Research for this contribution is part of the Cluster of Excellence "Contestations of the Liberal Script" (EXC 2055, Project-ID: 390715649), funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy. Cornelius Erfort is moreover grateful for generous funding provided by the DFG through the Research Training Group DYNAMICS (GRK 2458/1).
-## Contact
-Cornelius Erfort
-Humboldt-Universität zu Berlin
-[corneliuserfort.de](corneliuserfort.de)

 ---
 license: cc-by-sa-4.0
 tags:
+- generated_from_keras_callback
+model-index:
+- name: partypress-monolingual-ireland
+  results: []
 ---
+<!-- This model card has been generated automatically according to the information Keras had access to. You should
+probably proofread and complete it, then remove this comment. -->
+# partypress-monolingual-ireland
+This model is a fine-tuned version of [cornelius/partypress-monolingual-ireland](https://huggingface.co/cornelius/partypress-monolingual-ireland) on an unknown dataset.
+It achieves the following results on the evaluation set:
 ## Model description
+More information needed
 ## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
 ## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- optimizer: None
+- training_precision: float32
+### Training results
+### Framework versions
 - Transformers 4.28.0
 - TensorFlow 2.12.0
 - Datasets 2.12.0
 - Tokenizers 0.13.3
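
The removed README lists 23 CAP-derived issue codes and notes that the classification can be used to measure which issues parties discuss. A minimal sketch of that aggregation step follows. It assumes the predictions are already available as CAP code strings; the label strings the model actually emits are not shown in this diff, and `CAP_ISSUES` and `issue_shares` are hypothetical names introduced only for illustration.

```python
from collections import Counter

# Issue codes copied from the table in the removed README (CAP-derived).
CAP_ISSUES = {
    "1": "Macroeconomics", "2": "Civil Rights", "3": "Health",
    "4": "Agriculture", "5": "Labor", "6": "Education",
    "7": "Environment", "8": "Energy", "9": "Immigration",
    "10": "Transportation", "12": "Law and Crime", "13": "Social Welfare",
    "14": "Housing", "15": "Domestic Commerce", "16": "Defense",
    "17": "Technology", "18": "Foreign Trade",
    "19.1": "International Affairs", "19.2": "European Union",
    "20": "Government Operations", "23": "Culture",
    "98": "Non-thematic", "99": "Other",
}


def issue_shares(predicted_codes):
    """Return the share of texts per issue name, most frequent first."""
    # Codes missing from the table are grouped under "Unknown" (an assumption).
    counts = Counter(CAP_ISSUES.get(code, "Unknown") for code in predicted_codes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {issue: n / total for issue, n in counts.most_common()}


# Hypothetical predictions for four press releases.
example_shares = issue_shares(["4", "4", "19.2", "3"])
```

Computing shares rather than raw counts keeps parties with different numbers of press releases comparable.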

config.json CHANGED

@@ -1,7 +1,7 @@
 {
   "_name_or_path": "cornelius/partypress-monolingual-ireland",
   "architectures": [
-    "BertForSequenceClassification"
   ],
   "attention_probs_dropout_prob": 0.1,
   "bos_token_id": 0,
@@ -64,7 +64,7 @@
   },
   "layer_norm_eps": 1e-05,
   "max_position_embeddings": 514,
-  "model_type": "bert",
   "num_attention_heads": 12,
   "num_hidden_layers": 12,
   "pad_token_id": 1,

 {
   "_name_or_path": "cornelius/partypress-monolingual-ireland",
   "architectures": [
+    "RobertaForSequenceClassification"
   ],
   "attention_probs_dropout_prob": 0.1,
   "bos_token_id": 0,
   },
   "layer_norm_eps": 1e-05,
   "max_position_embeddings": 514,
+  "model_type": "roberta",
   "num_attention_heads": 12,
   "num_hidden_layers": 12,
   "pad_token_id": 1,
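
The hunk above changes `architectures` and `model_type` together, from BERT to RoBERTa. A minimal sketch of a sanity check that the two fields agree, using only fields visible in this diff. The naming heuristic (the class name starts with the model type) is an assumption that holds for `bert` and `roberta` but not for every model family, and `config_is_consistent` is a hypothetical helper, not part of the transformers library.

```python
import json


def config_is_consistent(config):
    """Heuristic: every architecture class name starts with the model type."""
    model_type = config.get("model_type", "").lower()
    architectures = config.get("architectures", [])
    if not model_type or not architectures:
        return False
    return all(name.lower().startswith(model_type) for name in architectures)


# Fields copied from the new side of the diff; all other keys are omitted.
new_config = json.loads("""
{
  "architectures": ["RobertaForSequenceClassification"],
  "model_type": "roberta",
  "max_position_embeddings": 514,
  "pad_token_id": 1
}
""")

# A deliberately mismatched pair: RoBERTa class with a BERT model type.
mixed_config = dict(new_config, model_type="bert")

new_ok = config_is_consistent(new_config)
mixed_ok = config_is_consistent(mixed_config)
```

A check like this would flag a commit that updated only one of the two fields.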

tf_model.h5 CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:bf9fc2dab165f12db2bed75d873be844c233a9413dcb52d1a395b1b1567d1a19
-size 498941292

 version https://git-lfs.github.com/spec/v1
+oid sha256:75c528cf65de88a65572556e6f76b36e887bf2ccde4932a400b3b21c12e8782d
+size 498942784
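
The `tf_model.h5` change above is a Git LFS pointer update: the repository stores only the object id and byte size, not the weights themselves. A minimal sketch of reading the two pointers from this diff and comparing them; `parse_lfs_pointer` is a hypothetical helper written for illustration and assumes the simple `key value` line layout shown above.

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    fields["size"] = int(fields["size"])  # size is given in bytes
    return fields


# Pointer contents copied from the old and new sides of the diff.
old_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:bf9fc2dab165f12db2bed75d873be844c233a9413dcb52d1a395b1b1567d1a19
size 498941292
""")
new_pointer = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:75c528cf65de88a65572556e6f76b36e887bf2ccde4932a400b3b21c12e8782d
size 498942784
""")

weights_changed = old_pointer["oid"] != new_pointer["oid"]
size_delta = new_pointer["size"] - old_pointer["size"]
```

Here the object id differs and the file grows by `size_delta` bytes, so the weights file was replaced rather than left untouched.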