Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge
Abstract
Rare diseases present unique challenges in healthcare, often suffering from delayed diagnosis and fragmented information landscapes. The scarcity of reliable knowledge in these conditions poses a distinct challenge for Large Language Models (LLMs) in supporting clinical management and delivering precise patient information underscoring the need for focused training on these 'zebra' cases. We present Zebra-Llama, a specialized context-aware language model with high precision Retrieval Augmented Generation (RAG) capability, focusing on Ehlers-Danlos Syndrome (EDS) as our case study. EDS, affecting 1 in 5,000 individuals, exemplifies the complexities of rare diseases with its diverse symptoms, multiple subtypes, and evolving diagnostic criteria. By implementing a novel context-aware fine-tuning methodology trained on questions derived from medical literature, patient experiences, and clinical resources, along with expertly curated responses, Zebra-Llama demonstrates unprecedented capabilities in handling EDS-related queries. On a test set of real-world questions collected from EDS patients and clinicians, medical experts evaluated the responses generated by both models, revealing Zebra-Llama's substantial improvements over base model (Llama 3.1-8B-Instruct) in thoroughness (77.5% vs. 70.1%), accuracy (83.0% vs. 78.8%), clarity (74.7% vs. 72.0%) and citation reliability (70.6% vs. 52.3%). Released as an open-source resource, Zebra-Llama not only provides more accessible and reliable EDS information but also establishes a framework for developing specialized AI solutions for other rare conditions. This work represents a crucial step towards democratizing expert-level knowledge in rare disease management, potentially transforming how healthcare providers and patients navigate the complex landscape of rare diseases.
Community
Title:
Zebra-Llama: A Context-Aware Large Language Model for Democratizing Rare Disease Knowledge
TL;DR:
We present Zebra-Llama, an open-source specialized LLM with enhanced context-aware RAG capabilities for rare disease Ehlers-Danlos Syndrome (EDS), demonstrating significant improvements in accuracy, thoroughness, and citation reliability on real-world patient and clinician queries.
Key Points:
- Novel context-aware fine-tuning methodology optimized for rare disease knowledge management
- Evaluated by medical experts on real-world EDS questions with superior performance metrics
- Open-source model with high-precision RAG capabilities and robust citation accuracy
- Potential framework for developing specialized AI solutions for other rare diseases
Resources:
- Paper: https://arxiv.org/abs/2411.02657
- Model Weights: https://huggingface.co/zebraLLAMA/zebra-Llama-v0.2
- RAG-API: https://zebra-llama-rag.onrender.com
- Demo: https://github.com/karthiksoman/zebra-Llama/blob/main/code/notebook/zebra_llama_v0.2_demo.ipynb
- Code: https://github.com/karthiksoman/zebra-llama
Models citing this paper 1
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper