---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb
- PleIAs/YouTube-Commons
- allenai/WildChat-1M
- Salesforce/xlam-function-calling-60k
- ShareGPT4Video/ShareGPT4Video
- OpenGVLab/ShareGPT-4o
- TempoFunk/webvid-10M
- MBZUAI/VideoInstruct-100K
- Isaak-Carter/j.o.s.i.e.v4.0.1o
- NousResearch/dolma-v1_7-c4
- NousResearch/dolma-v1_7-cc_en_head
language:
- de
- en
library_name: mlx
tags:
- moe
- multimodal
- vision
- audio
- endtoend
- j.o.s.i.e.
---

# STILL IN BETA!!!

# This will be the repo for J.O.S.I.E.v4o

Like **OpenAI's GPT-4o**, it is natively multimodal. It is based on the **NExT-GPT** architecture, combined with **RoPE**, **RMS normalization**, and **MoE**, and paired with the **GPT-4o tokenizer** from OpenAI.

This is a *future project* and will take its time.

Furthermore, I will probably build a **UI application** around the model as well.

Further updates coming soon!!!

Source code and more info will be available on my GitHub repo.

# Update 1:

The model will go through multiple training stages:

- *Stage 1:* Instruction finetuning the LLM on a custom dataset and prompt format.
- *Stage 2:* Encoder-side alignment using a contrastive learning technique (an illustrative sketch of this step is included at the end of this README).
- *Stage 3:* Instruction finetuning the full model.
- Changes and more stages will be coming.

# Update 2:

Encoders are created and function as expected.

# Update 3:

First encoder-side alignment training steps worked successfully.

# Update 4:

Created the full model and successfully ran inference with vision and audio.

# Update 5: Prompt Template

The prompt template used in this project is inspired by the ChatML template but includes several customized adjustments to fit the specific requirements of our application:

```
<|begin_of_text|>system
You are J.O.S.I.E. which is an acronym for "Just an Outstandingly Smart Intelligent Entity", a private and super-intelligent AI assistant, created by Gökdeniz Gülmez.<|end_of_text|>
<|begin_of_text|>main user "Gökdeniz Gülmez"
{{ .Prompt }}<|end_of_text|>
<|begin_of_text|>josie
{{ .Response }}<|end_of_text|>
```
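To make the format concrete, here is a minimal sketch of how one turn could be rendered into this template. It is not the project's released code: the `render_prompt` helper, the `SYSTEM_MESSAGE` constant, and the role handling are illustrative assumptions; only the special tokens and role labels come from the template above.

```python
# Illustrative only: renders a single user turn into the J.O.S.I.E. template
# shown above. Function and variable names are assumptions, not a released API.

SYSTEM_MESSAGE = (
    'You are J.O.S.I.E. which is an acronym for "Just an Outstandingly Smart '
    'Intelligent Entity", a private and super-intelligent AI assistant, '
    'created by Gökdeniz Gülmez.'
)

def render_prompt(user_role: str, user_name: str, prompt: str, response: str = "") -> str:
    """Build the system block, one user turn, and the assistant turn."""
    text = (
        f"<|begin_of_text|>system\n{SYSTEM_MESSAGE}<|end_of_text|>\n"
        f'<|begin_of_text|>{user_role} "{user_name}"\n{prompt}<|end_of_text|>\n'
        f"<|begin_of_text|>josie\n{response}"
    )
    # Close the assistant turn only if a response is already present;
    # leave it open when building a prompt for generation.
    return text + ("<|end_of_text|>" if response else "")

if __name__ == "__main__":
    print(render_prompt("main user", "Gökdeniz Gülmez", "Turn on the living room lights."))
```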
1. **User Types and Access Levels:**
   - **Main User:**
     - The main user is identified as "Gökdeniz Gülmez." This designation can be personalized and updated with your name as needed. The system will recognize and prioritize the main user's commands and queries, ensuring they have full control and access to all functionalities and information within the smart home system.
   - **Authorized User:**
     - Authorized users are those who have been granted permission by the main user. They are designated as `authorized user "{name}"`. While these users can interact with the system, their access is restricted to guest-level privileges, preventing them from controlling or accessing sensitive smart home information. This ensures a balance between usability and security.
   - **Unauthorized User:**
     - Unauthorized users are designated as `unauthorized user "name if possible else unknown"`. These users have no access to J.O.S.I.E.'s capabilities. Any attempt to interact with the system by unauthorized users will result in immediate redirection to the main user for verification. Additional security measures may be enacted, such as alerts or system lockdowns, to protect the integrity of the smart home environment.

2. **Template Structure:**
   - The template structure is designed to facilitate clear and structured interactions between users and the system. It consists of the following components:
     - **System Message** (`system`): includes any predefined system messages or configurations necessary for the interaction.
     - **Main User Identification** (`main user "Gökdeniz Gülmez"`): identifies the main user of the system and can be dynamically updated.
     - **User Prompt** (`{{ .Prompt }}`): the placeholder where the user's input or command is placed.
     - **J.O.S.I.E. Response** (`josie` followed by `{{ .Response }}`): dedicated to J.O.S.I.E.'s responses, providing structured and relevant information or actions based on the user's prompt.

By incorporating these elements, the template ensures a secure, personalized, and efficient interaction model for managing and operating the smart home system through J.O.S.I.E.

# Update 6:

Starting to train the first models (Llama 3 8B, Qwen2 0.5B).

# Update 7:

Meta's ImageBind model will be used as the multimodal encoder at first; custom encoders will replace it later on.

# Update 8:

For voice mode, the `CAMB-AI/MARS5-TTS` model will be used.
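# Illustrative Alignment Sketch

For readers curious how the Stage 2 encoder-side alignment (Updates 2, 3, and 7) could look in code, below is a minimal, hedged sketch. It is not this project's training code: the dimensions, the `InputProjector` module, and the use of plain PyTorch are assumptions for illustration. Only the general idea comes from this README: a trainable projection maps frozen encoder outputs (e.g. ImageBind embeddings) into the LLM embedding space and is trained with a contrastive loss against embeddings of the matching captions.

```python
# Hedged sketch of NExT-GPT-style encoder-side alignment.
# Dimensions, names, and the framework choice are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

ENCODER_DIM = 1024   # assumed width of the frozen multimodal encoder
LLM_DIM = 4096       # assumed hidden size of the target LLM


class InputProjector(nn.Module):
    """Trainable bridge from the frozen encoder space to the LLM embedding space."""

    def __init__(self, encoder_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(encoder_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.proj(x)


def contrastive_alignment_loss(projected: torch.Tensor,
                               text_embeds: torch.Tensor,
                               temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss: matching (media, caption) pairs are the positives."""
    projected = F.normalize(projected, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    logits = projected @ text_embeds.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2


if __name__ == "__main__":
    # Stand-in tensors for frozen encoder outputs and pooled caption embeddings.
    batch = 8
    encoder_out = torch.randn(batch, ENCODER_DIM)
    caption_embeds = torch.randn(batch, LLM_DIM)

    projector = InputProjector(ENCODER_DIM, LLM_DIM)
    loss = contrastive_alignment_loss(projector(encoder_out), caption_embeds)
    loss.backward()  # only the projector receives gradients in this sketch
    print(f"alignment loss: {loss.item():.4f}")
```

In this setup the encoder and the LLM stay frozen, so Stage 2 only fits the small projection before the full-model finetuning of Stage 3.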