README.md · stormsidali2001/IMRAD_introduction_moves

metadata

library_name: tf-keras
pipeline_tag: text-classification
widget:
  - text: >-
      Electronic cigarettes (also known as vapes, vaporizers, or vape pens) were
      introduced into the US market in 2007.
    output:
      - label: Establishing a Research Territory
        score: 0.9
      - label: Establishing a Niche
        score: 0.05
      - label: Occupying the Niche
        score: 0.05
license: mit
datasets:
  - stormsidali2001/IMRAD-introduction-sentences-moves-sub-moves-dataset
language:
  - en
metrics:
  - f1
  - accuracy
base_model: google/bert-base-cased

IMRaD Introduction Move Classifier

This model is a fine-tuned BERT model designed to classify sentences from the introductions of scientific research papers into one of three IMRaD moves:

Establishing a Research Territory: Setting the context and background information for the research.
Establishing a Niche: Identifying a gap or problem in existing research.
Occupying the Niche: Proposing a solution or approach to address the identified gap.

Intended Uses & Limitations

Intended Uses:

Scientific Writing Assistance: Help researchers and students analyze and improve the structure of their introductions by identifying the IMRaD moves present in each sentence.
Literature Review Analysis: Assist in quickly understanding the rhetorical structure of introductions in a set of research papers.
Educational Tool: Illustrate IMRaD concepts and their practical application in scientific writing.

Limitations:

Domain Specificity: The model was trained on a dataset of scientific research papers and might not perform as well on other types of text.
Accuracy: While the model achieves good accuracy, it's not perfect. Predictions should be reviewed carefully, especially in complex or ambiguous sentences.
Sentence-Level Classification: The model classifies individual sentences. It does not provide an overall analysis of the entire introduction.

Training and Evaluation Data

The model was trained and evaluated on the "IMRAD Introduction Sentences Moves & Sub-moves Dataset" available on Hugging Face: https://huggingface.co/datasets/stormsidali2001/IMRAD-introduction-sentences-moves-sub-moves-dataset

The dataset consists of sentences extracted from scientific research paper introductions, manually labeled with their corresponding IMRaD moves.

Training Details:

The bert-base-cased model from Google was used as the base model.
Fine-tuning was performed using a TensorFlow/Keras implementation.
Evaluation metrics include F1 score and accuracy.

How to Use

You can use this model with the pipeline function from the transformers library:

from transformers import pipeline

classifier = pipeline("text-classification", model="your-username/your-model-name")

sentence = "Electronic cigarettes were introduced into the US market in 2007."
result = classifier(sentence)

print(result)