BGE-Small Fine-Tuned on USCode-QueryPairs
This is a fine-tuned version of the BGE Small embedding model, trained on the USCode-QueryPairs dataset, a subset of the USLawQA corpus. The model is optimized for generating embeddings for legal text, achieving 75% accuracy on the test set.
Overview
- Base Model: BGE Small
- Dataset: USCode-QueryPairs
- Training Details:
  - Hardware: Google Colab (T4 GPU)
  - Training Time: ~2 hours
- Accuracy: 75% on the USLawQA test set
Applications
This model is ideal for:
- Legal Text Retrieval: Efficient semantic search across legal documents.
- Question Answering: Answering legal queries based on context from the US Code.
- Embeddings Generation: Generating high-quality embeddings for downstream legal NLP tasks.
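For the retrieval use case, the typical pattern is to embed the query and all documents, then rank documents by cosine similarity. A minimal sketch with toy vectors standing in for model output (the `corpus` and `query` arrays are illustrative, not real embeddings):

```python
import numpy as np

# Toy embeddings standing in for model output (one row per document).
corpus = np.array([
    [0.9, 0.1, 0.0],   # doc 0
    [0.0, 1.0, 0.0],   # doc 1
    [0.7, 0.7, 0.1],   # doc 2
])
query = np.array([1.0, 0.0, 0.0])

# Cosine similarity = dot product of L2-normalized vectors.
corpus_n = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
query_n = query / np.linalg.norm(query)
scores = corpus_n @ query_n

# Rank documents by similarity, best match first.
ranking = np.argsort(-scores)
print(ranking.tolist())  # doc 0 is closest to the query
```

With real embeddings from this model, the same ranking step applies unchanged; only the vectors come from the encoder instead of being hand-written.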
Usage
The model can be used via model.encode (sentence-transformers) or loaded directly with the Hugging Face transformers library to generate embeddings. Below is an example snippet using transformers:
```python
# Load model directly
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("ArchitRastogi/BGE-Small-LegalEmbeddings-USCode")
model = AutoModel.from_pretrained("ArchitRastogi/BGE-Small-LegalEmbeddings-USCode")

text = "Duties of the president"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# BGE models use the [CLS] token embedding, L2-normalized
embeddings = torch.nn.functional.normalize(outputs.last_hidden_state[:, 0], p=2, dim=1)
print(embeddings)
```
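To compare two such embeddings, cosine similarity reduces to a dot product once the vectors are normalized. A sketch with toy tensors standing in for the model's output:

```python
import torch
import torch.nn.functional as F

# Toy stand-ins for two [CLS] embeddings (not real model output).
e1 = F.normalize(torch.tensor([[3.0, 4.0]]), p=2, dim=1)   # -> [0.6, 0.8]
e2 = F.normalize(torch.tensor([[4.0, 3.0]]), p=2, dim=1)   # -> [0.8, 0.6]

# Dot product of unit vectors = cosine similarity.
score = (e1 @ e2.T).item()   # 0.6*0.8 + 0.8*0.6 = 0.96
print(round(score, 2))
```

Higher scores indicate closer semantic meaning; this is the core operation behind both retrieval and the evaluation below.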
Evaluation
The model was evaluated on the test set of USLawQA and achieved the following metrics:
- Accuracy: 75%
- Task: Semantic similarity and legal question answering.
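The card does not spell out the accuracy protocol; a common definition for retrieval-style QA is top-1 accuracy, the fraction of queries whose gold passage ranks first. A sketch under that assumption (the function name and toy data are illustrative):

```python
def top1_accuracy(rankings, gold):
    """rankings[i] is the ranked list of doc ids for query i;
    gold[i] is the correct doc id for query i."""
    hits = sum(1 for ranked, g in zip(rankings, gold) if ranked[0] == g)
    return hits / len(gold)

# Toy example: 3 of 4 queries rank the gold document first.
rankings = [[0, 2, 1], [1, 0, 2], [2, 1, 0], [0, 1, 2]]
gold = [0, 1, 0, 0]
print(top1_accuracy(rankings, gold))  # 0.75
```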
Contact
For any inquiries, suggestions, or feedback, feel free to reach out:
Archit Rastogi
architrastogi20@gmail.com
License
This model is distributed under the Apache 2.0 License. Please ensure compliance with applicable copyright laws when using it.
Model Details
- Base model: BAAI/bge-small-en-v1.5
- Training dataset: USCode-QAPairs-Finetuning

Evaluation Results (USLawQA dataset)
- Accuracy: 0.720
- Recall: 0.750