---
language:
  - en
tags:
  - financial-sentiment-analysis
  - Transformers
  - sentiment-analysis
  - TensorFlow
  - PyTorch
  - Text Classification
  - bert
  - Inference Endpoints
---

Model Card for the Fine-Tuned FinBERT Model

Our fine-tuned FinBERT model is a sentiment-analysis tool tailored specifically to Indian stock market news. Building on FinBERT, a BERT model pre-trained on extensive financial communication text (https://huggingface.co/yiyanghkust/finbert-tone), it focuses on enhancing sentiment analysis within the context of the Indian financial landscape.

Model Details

Model Description

  • Developed by: Khushi Dave
  • Model type: BERT (Bidirectional Encoder Representations from Transformers)
  • Language: English
  • Finetuned from model: yiyanghkust/finbert-tone

Uses

The Fine-Tuned FinBERT model is designed for sentiment analysis of Indian stock market news. It is intended for researchers, financial analysts, and developers who want to improve their sentiment assessments, as well as for investors and the academic community. Responsible usage and acknowledgment of the original FinBERT model are encouraged.

In essence, it's a valuable tool for understanding market sentiment in the Indian context, catering to professionals and individuals engaged in financial analysis and research.

Direct Use

from transformers import BertTokenizer, BertForSequenceClassification
from transformers import pipeline

# Load the fine-tuned FinBERT model and tokenizer
finbert = BertForSequenceClassification.from_pretrained('kdave/FineTuned_Finbert', num_labels=3)
tokenizer = BertTokenizer.from_pretrained('kdave/FineTuned_Finbert')

# Create a sentiment-analysis pipeline
nlp_pipeline = pipeline("sentiment-analysis", model=finbert, tokenizer=tokenizer)

# Example sentences related to Indian stock market news
sentences = [
    "The Indian stock market experienced a surge in trading activity.",
    "Investors are optimistic about the future of Indian financial markets.",
    "Concerns about economic uncertainties are affecting stock prices in India.",
    "Earnings reports from Indian companies show a positive trend."
]

# Perform sentiment analysis using the fine-tuned FinBERT model for Indian stock market news
results = nlp_pipeline(sentences)
print(results)
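
The pipeline returns one dictionary per input sentence containing a predicted label and a confidence score. A minimal follow-up for inspecting those predictions (the exact label names come from the model's configuration; three sentiment classes are assumed here, matching num_labels=3):

# Print each sentence alongside its predicted label and confidence score
for sentence, result in zip(sentences, results):
    print(f"{result['label']} ({result['score']:.3f}): {sentence}")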

Out-of-Scope Use

  1. Misuse:

Deliberate Misinformation: The model may be misused if fed with intentionally crafted misinformation to manipulate sentiment analysis results. Users should ensure the input data is authentic and unbiased.

  2. Malicious Use:

Market Manipulation Attempts: Any attempt to use the model to propagate false sentiment for the purpose of market manipulation is strictly unethical and against the intended use of the model.

  3. Limitations:

Non-Financial Texts: The model is fine-tuned specifically for Indian stock market news. It may not perform optimally when applied to non-financial texts or unrelated domains.

Extreme Outliers: Unusual or extreme cases in sentiment expression might pose challenges. The model's performance might be less reliable for exceptionally rare or unconventional sentiment expressions.

Non-Standard Language: The model's training data primarily comprises standard financial language. It may not perform as well when faced with non-standard language, colloquialisms, or slang.

Bias, Risks, and Limitations

Technical Limitations:

  1. Domain Specificity:

    • The model is fine-tuned for Indian stock market news, limiting its effectiveness when applied to texts outside this domain.
  2. Data Representativeness:

    • The model's performance is contingent on the representativeness of the training data. It may not capture nuances in sentiment expressions not well-represented in the training corpus.
  3. Language Complexity:

    • Non-standard language, colloquialisms, or slang may pose challenges, as the model is primarily trained on standard financial language.

Sociotechnical Considerations:

  1. Bias in Training Data:

    • The model inherits biases present in the training data. Efforts have been made to curate diverse data, but biases, if present, may affect the model's outputs.
  2. Ethical Usage:

    • Users are urged to employ the model ethically, avoiding misuse or malicious applications that may impact market sentiment or manipulate results.

Risks:

  1. Decisions Based Solely on Model Output:

    • Relying solely on the model for decision-making is discouraged. Users should supplement model insights with additional research and expert judgment.
  2. Market Dynamics:

    • The model might not account for sudden and unprecedented market events, and decisions should consider real-time market dynamics.

Responsible Model Usage:

Understanding these limitations, users are advised to interpret model outputs judiciously, considering the context and potential biases. Transparent communication and awareness of both technical and sociotechnical constraints are essential for responsible model usage. While the model is a valuable tool, it is not infallible, and decision-makers should exercise prudence and diligence.
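
One lightweight way to put this guidance into practice is to abstain on low-confidence predictions rather than accept every label. The sketch below reuses the nlp_pipeline and sentences objects from the Direct Use example above; the 0.8 threshold is an arbitrary illustration, not a validated value.

# Treat low-confidence predictions as "uncertain" instead of trusting them outright.
CONFIDENCE_THRESHOLD = 0.8  # illustrative value only; tune and validate for your own use case

def cautious_label(prediction, threshold=CONFIDENCE_THRESHOLD):
    """Return the predicted label only when the pipeline score clears the threshold."""
    return prediction["label"] if prediction["score"] >= threshold else "uncertain"

labels = [cautious_label(p) for p in nlp_pipeline(sentences)]
print(labels)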

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed to provide further recommendations.

How to Get Started with the Model

Use the code below to get started with the model.

Step 1: Install Required Libraries

Ensure you have the necessary libraries installed by running:

pip install transformers
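
The model weights also need a deep-learning backend to load. If you plan to run the model with PyTorch (one of the frameworks listed in the model tags), install it as well:

pip install torch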

Step 2: Load the Fine-Tuned Model

Use the following Python code to load the model and tokenizer:

from transformers import BertTokenizer, BertForSequenceClassification
from transformers import pipeline

# Load the fine-tuned FinBERT model and tokenizer
finbert = BertForSequenceClassification.from_pretrained('kdave/FineTuned_Finbert', num_labels=3)
tokenizer = BertTokenizer.from_pretrained('kdave/FineTuned_Finbert')

# Create a sentiment-analysis pipeline
nlp_pipeline = pipeline("sentiment-analysis", model=finbert, tokenizer=tokenizer)

Step 3: Perform Sentiment Analysis

Now, you're ready to analyze sentiment! Provide the model with sentences related to Indian stock market news:

# Example sentences related to Indian stock market news
sentences = [
    "The Indian stock market experienced a surge in trading activity.",
    "Investors are optimistic about the future of Indian financial markets.",
    "Concerns about economic uncertainties are affecting stock prices in India.",
    "Earnings reports from Indian companies show a positive trend."
]

# Perform sentiment analysis using the fine-tuned FinBERT model
results = nlp_pipeline(sentences)
print(results)

Run the code, and voilà! You'll receive sentiment insights for each sentence.

Step 4: Incorporate into Your Workflow

Integrate this model seamlessly into your financial NLP research or analysis workflows to elevate the accuracy and depth of sentiment assessments related to the Indian stock market.
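
As a minimal sketch of that kind of integration, the example below scores a pandas DataFrame of headlines in bulk using the nlp_pipeline created in Step 2; the DataFrame, its headline column, and the sample rows are placeholders rather than part of this model's tooling.

import pandas as pd

# Hypothetical table of headlines; replace with your own data source.
news = pd.DataFrame({
    "headline": [
        "Banking stocks rally after the central bank holds rates steady.",
        "IT major reports weaker-than-expected quarterly earnings.",
    ]
})

# Score every headline in one pipeline call and attach the predictions as new columns.
predictions = nlp_pipeline(news["headline"].tolist())
news["sentiment"] = [p["label"] for p in predictions]
news["confidence"] = [p["score"] for p in predictions]
print(news)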

Now, you're all set to harness the power of the Fine-Tuned FinBERT model. Happy analyzing! 📈🚀

Training Details

Dataset Information:

The Fine-Tuned FinBERT model was trained on a carefully curated dataset consisting of Indian financial news articles with summaries. Here's a brief overview of the dataset and its preparation:

  1. Data Source:

    • The dataset encompasses a wide array of Indian financial news articles, ensuring a diverse and representative sample of content related to the stock market.
  2. Text Summarization:

    • The T5-base model from Hugging Face was employed for text summarization. This step aimed to distill the essential information from each article, providing concise summaries for training the model (an illustrative snippet follows this list).
  3. Sentiment Labeling:

    • Sentiment labels for the curated dataset were derived through the GPT add-on for Google Sheets. This process involved annotating the articles with positive, negative, or neutral sentiments, enhancing the model's ability to discern nuanced expressions.
  4. Contextual Richness:

    • The dataset was designed to be contextually rich, exposing the model to a spectrum of sentiment expressions within the Indian stock market landscape. This diversity ensures the model's adaptability to varied scenarios.
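
As an illustration of the summarization step described in item 2 above, the snippet below shows how the t5-base checkpoint can be used through the transformers summarization pipeline. The sample text and generation parameters are assumptions for demonstration, not the exact settings used to build this dataset.

from transformers import pipeline

# Illustrative only: condense a financial news article with the t5-base checkpoint.
summarizer = pipeline("summarization", model="t5-base")

article = (
    "Shares of the company rose after it reported a sharp increase in quarterly profit, "
    "driven by strong domestic demand. Analysts expect the momentum to continue next quarter."
)  # placeholder text; the dataset was built from full news articles

summary = summarizer(article, max_length=60, min_length=15, do_sample=False)
print(summary[0]["summary_text"])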

Dataset Card: For more detailed information on the dataset, including statistics, features, and documentation related to data pre-processing, please refer to the associated Dataset Card.

This careful curation and the diversity of the data contribute to the model's ability to capture nuanced sentiment expressions relevant to the Indian stock market.