---
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
widget: []
---

# SentenceTransformer

This is a finetuned version of [bge-m3](https://huggingface.co/BAAI/bge-m3) for SQL table retrieval and ranking.

## Model Details

This model can be used to identify the SQL tables relevant to a natural-language query, for example as the table-selection step in text-to-SQL translation. It was finetuned on a curated dataset of SQL table definitions paired with corresponding natural-language queries.

Finetuning script: [FlagEmbedding](https://github.com/FlagOpen/FlagEmbedding/blob/master/examples/unified_finetune/unified_finetune_bge-m3_exmaple.sh)

### Model Description

- **Model Type:** Sentence Transformer
- **Maximum Sequence Length:** 8192 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
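Before running the model, it may help to see how its scores should be read. In the architecture above, `Pooling` with `pooling_mode_cls_token: True` keeps only the first (CLS) token's hidden state, and `Normalize()` rescales that vector to unit length, so the cosine similarity reported by `model.similarity` is equivalent to a plain dot product. The NumPy sketch below illustrates this and a simple top-k table ranking. It does not load the model: the hidden states are random stand-ins rather than real bge-m3 outputs, and `cls_pool`, `l2_normalize`, `cosine_similarity` and `rank_tables` are hypothetical helpers written for this illustration, not Sentence Transformers APIs.

```python
import numpy as np

# Random stand-in for the transformer's last hidden states, shape
# (batch, seq_len, hidden) = (3 texts, 4 tokens, 1024). NOT real bge-m3 outputs.
rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(3, 4, 1024))


def cls_pool(states: np.ndarray) -> np.ndarray:
    """CLS pooling: keep the hidden state of the first token of each text."""
    return states[:, 0, :]


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale each row to unit Euclidean length, as the Normalize() module does."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    norms = np.linalg.norm(a, axis=1)[:, None] * np.linalg.norm(b, axis=1)[None, :]
    return (a @ b.T) / norms


def rank_tables(query_emb: np.ndarray, table_embs: np.ndarray, top_k: int = 1):
    """Hypothetical helper: indices of the top_k tables by score, best first."""
    scores = table_embs @ query_emb  # dot product == cosine for unit vectors
    order = np.argsort(-scores)[:top_k]
    return order, scores


pooled = cls_pool(hidden_states)   # (3, 1024)
embeddings = l2_normalize(pooled)  # (3, 1024), unit-length rows

# Cosine on the raw pooled vectors equals the dot product on the normalized ones.
assert np.allclose(cosine_similarity(pooled, pooled), embeddings @ embeddings.T)

# Treat row 0 as the query and rows 1-2 as candidate table definitions.
order, scores = rank_tables(embeddings[0], embeddings[1:], top_k=2)
print("ranked table indices:", order, "scores:", scores)
```

With the real model, the query and table vectors would instead come from `model.encode(...)`, which returns already-normalized embeddings here, so ranking by dot product gives the same order as ranking by cosine similarity. The example that follows runs the actual model.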
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub (replace the placeholder with this model's repository ID)
model = SentenceTransformer("sentence_transformers_model_id")

# Run inference: one natural-language query followed by two candidate table definitions
sentences = [
    'What is the total biomass of fish in farms with a water temperature above 25 degrees Celsius?',
    'CREATE TABLE Farm (FarmID INT, FarmName VARCHAR(50), WaterTemperature DECIMAL, Biomass DECIMAL)',
    'CREATE TABLE Locations (id INT PRIMARY KEY, name VARCHAR(50), region VARCHAR(50), depth INT)',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
# (row 0 holds the query's scores against itself and the two tables)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```

## Training Details

### Framework Versions

- Python: 3.10.14
- Sentence Transformers: 3.0.1
- Transformers: 4.42.3
- PyTorch: 2.3.1+cu121
- Accelerate: 0.31.0
- Datasets: 2.20.0
- Tokenizers: 0.19.1

## Citation

### BibTeX