---
license: apache-2.0
library_name: peft
tags:
- alignment-handbook
- trl
- sft
- generated_from_trainer
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: Cimphony-Mistral-Law-7B
  results:
  - task:
      type: text-generation
    dataset:
      type: cais/mmlu
      name: MMLU
    metrics:
    - name: International Law
      type: accuracy
      value: 0.802
      verified: false
  - task:
      type: text-generation
    dataset:
      type: cais/mmlu
      name: MMLU
    metrics:
    - name: Jurisprudence
      type: accuracy
      value: 0.704
      verified: false
  - task:
      type: text-generation
    dataset:
      type: cais/mmlu
      name: MMLU
    metrics:
    - name: Professional Law
      type: accuracy
      value: 0.416
      verified: false
  - task:
      type: text-generation
    dataset:
      type: coastalcph/lex_glue
      name: LexGLUE
    metrics:
    - name: ECtHR A
      type: balanced accuracy
      value: 0.631
      verified: false
  - task:
      type: text-generation
    dataset:
      type: coastalcph/lex_glue
      name: LexGLUE
    metrics:
    - name: LEDGAR
      type: balanced accuracy
      value: 0.741
      verified: false
  - task:
      type: text-generation
    dataset:
      type: coastalcph/lex_glue
      name: LexGLUE
    metrics:
    - name: CaseHOLD
      type: accuracy
      value: 0.776
      verified: false
  - task:
      type: text-generation
    dataset:
      type: coastalcph/lex_glue
      name: LexGLUE
    metrics:
    - name: Unfair-ToS
      type: balanced accuracy
      value: 0.809
      verified: false
pipeline_tag: text-generation
---

# Cimphony-Mistral-Law-7B

We introduce Cimphony-Mistral-Law-7B, a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1).

Cimphony’s LLMs present state-of-the-art performance on legal benchmarks, surpassing models trained on a much larger corpus with significantly more resources, including GPT-4, OpenAI’s flagship model.

Check out and register at [https://cimphony.ai](https://app.cimphony.ai/signup?callbackUrl=https://app.cimphony.ai/)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/657d36d3647c0211e7746ed9/Yjx96bC58SPgNwmDxx_yx.png)

## Model description

The model was trained on 600M tokens. We use novel methods to expose the model to this corpus during training, blending a variety of legal reading-comprehension tasks with general language data.

## Legal Evaluation Results

We evaluate on the legal splits of the MMLU benchmark, as well as on LexGLUE. While both are multiple-choice benchmarks, prompts were adapted so that the models output a single answer. In some cases, additional post-processing was required.

Benchmarks whose labels are A-E multiple-choice options use an accuracy metric, while benchmarks with a closed list of options (e.g. Unfair-ToS) use a balanced-accuracy metric, as the classes may not be balanced.

| Model / Benchmark        | International Law (MMLU) | Jurisprudence (MMLU) | Professional Law (MMLU) | ECtHR A (LexGLUE) | LEDGAR (LexGLUE) | CaseHOLD (LexGLUE) | Unfair-ToS (LexGLUE) |
|:-------------------------|:-------------------------|:---------------------|:------------------------|:------------------|:-----------------|:-------------------|:---------------------|
| Mistral-7B-Instruct-v0.2 | 73.6%                    | 69.4%                | 41.2%                   | 67.5%             | 50.6%            | 56.3%              | 36.6%                |
| AdaptLLM                 | 57.0%                    | 52.8%                | 36.1%                   | 51.9%             | 46.3%            | 50.0%              | 51.3%                |
| Saul-7B                  | 69.4%                    | 63.0%                | **43.2%**               | **71.2%**         | 55.9%            | 65.8%              | 80.3%                |
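
As an illustration of the two scoring rules described above, here is a minimal sketch assuming `gold` and `pred` are lists of label strings already extracted from the model outputs after post-processing (this is not the original evaluation harness):

```python
# Minimal sketch of the two metrics, assuming `gold` and `pred` hold label
# strings already recovered from the model outputs after post-processing.
from sklearn.metrics import accuracy_score, balanced_accuracy_score

gold = ["A", "C", "B", "A"]  # illustrative gold labels for an A-E benchmark
pred = ["A", "C", "D", "A"]  # illustrative model predictions

# A-E multiple-choice benchmarks (MMLU splits, CaseHOLD): plain accuracy.
print("accuracy:", accuracy_score(gold, pred))

# Closed-label benchmarks (e.g. Unfair-ToS): balanced accuracy, i.e. the mean
# of per-class recall, which compensates for class imbalance.
print("balanced accuracy:", balanced_accuracy_score(gold, pred))
```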
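
## How to use

The model is a PEFT adapter on top of `mistralai/Mistral-7B-v0.1`. Below is a minimal loading sketch, assuming the adapter weights and tokenizer are hosted together in one Hugging Face repository; the repository id shown is a placeholder and should be replaced with the actual one:

```python
# Minimal loading sketch. Assumption: ADAPTER_REPO is a placeholder for the
# Hugging Face repo that hosts this PEFT adapter and its tokenizer.
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

ADAPTER_REPO = "Cimphony-Mistral-Law-7B"  # placeholder repo id, adjust as needed

# AutoPeftModelForCausalLM reads the adapter config, loads the base model it
# names (mistralai/Mistral-7B-v0.1), and attaches the adapter on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    ADAPTER_REPO,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(ADAPTER_REPO)

prompt = "Explain the purpose of a force majeure clause in a commercial contract."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

If the tokenizer is not stored alongside the adapter, load it from the base model repository instead.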