---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb
- PleIAs/YouTube-Commons
- allenai/WildChat-1M
- Salesforce/xlam-function-calling-60k
- ShareGPT4Video/ShareGPT4Video
- OpenGVLab/ShareGPT-4o
- TempoFunk/webvid-10M
- MBZUAI/VideoInstruct-100K
- Isaak-Carter/j.o.s.i.e.v4.0.1o
- NousResearch/dolma-v1_7-c4
- NousResearch/dolma-v1_7-cc_en_head
- nyu-visionx/Cambrian-10M
- LargeWorldModel/ultrachat_qa_mix_1M
- LargeWorldModel/ultrachat_qa_mix_512K
- LargeWorldModel/ultrachat_qa_mix_256K
- LargeWorldModel/ultrachat_qa_mix_128K
language:
- de
- en
library_name: mlx
tags:
- moe
- multimodal
- vision
- audio
- endtoend
- j.o.s.i.e.
---

# STILL IN BETA!!!

# The newest text-to-text version is Beta 2.3.1, `Isaak-Carter/JOSIEv4o-8b-stage1-beta2.3.1`; the Q4_K_S quantized version is `Isaak-Carter/JOSIEv4o-8b-stage1-beta2.3.1-Q4_K_S-GGUF`

# This will be the repo for J.O.S.I.E.v4o

Like **OpenAI's GPT-4o**, it is natively multimodal. It is based on **NExT-GPT** combined with **RoPE**, **RMS Normalisation**, and **MoE**, paired with the **GPT-4o Tokenizer** from OpenAI.
This is a *future project* and will take its time.
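
As a rough illustration of one of the building blocks named above, here is a minimal sketch of RMS normalisation in plain Python. It is not taken from the J.O.S.I.E. source code; the function name `rms_norm` and the `eps` default are assumptions chosen for the example.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale a vector by the inverse of its root mean square.

    Hypothetical helper for illustration: `weight` is the learned
    per-dimension gain, `eps` guards against division by zero.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale * w for v, w in zip(x, weight)]


if __name__ == "__main__":
    # After normalisation with unit gains, the mean square is close to 1.
    out = rms_norm([3.0, 4.0], [1.0, 1.0])
    print(sum(v * v for v in out) / len(out))
```

Unlike layer normalisation, this variant does not subtract the mean, so it needs only one statistic per vector.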

Furthermore, I will probably build a **UI application** around the model too.

Further updates coming soon!!!

Source code and more info will be available on my <a href="https://github.com/Goekdeniz-Guelmez/J.O.S.I.E.v4-o.git">GitHub Repo</a>

# Update 1:
The model will go through multiple training stages:

- *Stage 1:* Instruction finetuning the LLM on a custom dataset and prompt format.
- *Stage 2:* Encoder-side alignment using contrastive learning.
- *Stage 3:* Instruction finetuning the full model.
- Changes and more stages will be coming.
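
The stage list does not say which contrastive loss is used for the encoder-side alignment. As a sketch of the general idea only, the following plain-Python example implements one common choice, an InfoNCE-style loss, where each encoder embedding should be most similar to its paired text embedding. The function names and the `temperature` default are assumptions made for this example.

```python
import math


def l2_normalize(v):
    """Return v scaled to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce_loss(encoder_embs, text_embs, temperature=0.07):
    """Mean cross-entropy of matching encoder_embs[i] to text_embs[i].

    Hypothetical illustration: both inputs are lists of equal-length
    vectors, and row i of one list is the positive pair of row i of
    the other. All other rows in the batch act as negatives.
    """
    a = [l2_normalize(v) for v in encoder_embs]
    b = [l2_normalize(v) for v in text_embs]
    total = 0.0
    for i, row in enumerate(a):
        # Cosine similarity to every text embedding, sharpened by temperature.
        logits = [sum(x * y for x, y in zip(row, col)) / temperature for col in b]
        # Numerically stable log-sum-exp.
        m = max(logits)
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(a)


if __name__ == "__main__":
    aligned = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
    shuffled = info_nce_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
    print(aligned < shuffled)
```

Minimising this loss pulls paired embeddings together and pushes unpaired ones apart, which is the alignment effect Stage 2 is after.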

# Update 2:
Encoders are created and function as expected.

# Update 3:
The first encoder-side alignment training steps ran successfully.

# Update 4:
The full model has been created and successfully runs inference with vision and audio.


# Update 5: Prompt Template
The prompt template used in this project is inspired by the ChatML template but includes several customized adjustments to fit the specific requirements of our application:

```
<|begin_of_text|>system
You are J.O.S.I.E. which is an acronym for "Just an Outstandingly Smart Intelligent Entity", a private and super-intelligent AI assistant, created by Gökdeniz Gülmez.<|end_of_text|>
<|begin_of_text|>main user "Gökdeniz Gülmez"
{{ .Prompt }}<|end_of_text|>
<|begin_of_text|>josie
{{ .Response }}<|end_of_text|>
```

1. **User Types and Access Levels:**
   - **Main User:**
     - The main user is identified as "Gökdeniz Gülmez." This designation can be personalized and updated with your name as needed. The system will recognize and prioritize the main user's commands and queries, ensuring they have full control and access to all functionalities and information within the smart home system.
   - **Authorized User:**
     - Authorized users are those who have been granted permission by the main user. They are designated as `authorized user "{name}"`. While these users can interact with the system, their access will be restricted to guest-level privileges, preventing them from controlling or accessing sensitive smart home information. This ensures a balance between usability and security.
   - **Unauthorized User:**
     - Unauthorized users are designated as `unauthorized user "name if possible else unknown"`. These users will have no access to J.O.S.I.E.'s capabilities. Any attempt to interact with the system by unauthorized users will result in immediate redirection to the main user for verification. Additional security measures may be enacted, such as alerts or system lockdowns, to protect the integrity of the smart home environment.

2. **Template Structure:**
   - The template structure is designed to facilitate clear and structured interactions between users and the system. It consists of the following components:
     - **System Message:**  
       `system`  
       This section includes any predefined system messages or configurations necessary for the interaction.
     - **Main User Identification:**  
       `main user "Gökdeniz Gülmez"`  
       This line identifies the main user of the system, which can be dynamically updated.
     - **User Prompt:**  
       `{{ .Prompt }}`  
       This placeholder is where the user’s input or command is placed.
     - **J.O.S.I.E. Response:**  
       `josie`  
       `{{ .Response }}`  
       This section is dedicated to J.O.S.I.E.'s responses, providing structured and relevant information or actions based on the user’s prompt.

By incorporating these elements, the template ensures a secure, personalized, and efficient interaction model for managing and operating the smart home system through J.O.S.I.E.
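
The template and the three user types described above can be assembled programmatically. The following is a minimal sketch in Python, not the project's own code: the helper names `role_header` and `build_prompt` are hypothetical, and ending the prompt at the `josie` line so the model generates the response is an assumption about how the template is used at inference time.

```python
from typing import Optional

BOS = "<|begin_of_text|>"
EOS = "<|end_of_text|>"

SYSTEM_PROMPT = (
    'You are J.O.S.I.E. which is an acronym for "Just an Outstandingly Smart '
    'Intelligent Entity", a private and super-intelligent AI assistant, '
    "created by Gökdeniz Gülmez."
)

ROLES = {"main user", "authorized user", "unauthorized user"}


def role_header(role: str, name: Optional[str] = None) -> str:
    """Build the speaker line, e.g. 'authorized user "Alex"' (hypothetical helper)."""
    if role not in ROLES:
        raise ValueError(f"unknown role: {role!r}")
    if name is None:
        if role != "unauthorized user":
            raise ValueError(f"{role} requires a name")
        name = "unknown"  # the template falls back to "unknown" for unnamed users
    return f'{role} "{name}"'


def build_prompt(
    prompt: str,
    role: str = "main user",
    name: Optional[str] = "Gökdeniz Gülmez",
    system: str = SYSTEM_PROMPT,
) -> str:
    """Fill the template up to the 'josie' turn, leaving the response to the model."""
    return (
        f"{BOS}system\n{system}{EOS}\n"
        f"{BOS}{role_header(role, name)}\n{prompt}{EOS}\n"
        f"{BOS}josie\n"
    )


if __name__ == "__main__":
    print(build_prompt("Turn on the living room lights."))
    print(build_prompt("Who are you?", role="unauthorized user", name=None))
```

Keeping the role in the speaker line, rather than in the system message, lets the same system prompt be reused while the access level changes per turn.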

# Update 6:
Starting to train the first models (Llama3 8B, Qwen2 0.5B).

# Update 7:
The model will use Meta's ImageBind at first and will switch to its own encoders later on.

# Update 8:
For voice mode, the `CAMB-AI/MARS5-TTS` model will be used.