---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb
- PleIAs/YouTube-Commons
- allenai/WildChat-1M
language:
- de
- en
- ja
- fr
library_name: mlx
tags:
- moe
- multimodal
- j.o.s.i.e.
---

# This will be the repo for J.O.S.I.E.v4o

Like **OpenAI's GPT-4o**, it is natively multimodal. It is based on **NExT-GPT**, combined with **RoPE**, **RMS normalisation**, and **MoE**, paired with OpenAI's **GPT-4o tokenizer**.
This is a *future project* and will take its time.
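
For reference, RMS normalisation rescales activations by their root mean square instead of centring and scaling them as LayerNorm does. Below is a minimal MLX sketch of the idea; note that MLX already ships this as `nn.RMSNorm`, and the `eps` value here is an assumed default, not a project-specific choice:

```python
import mlx.core as mx
import mlx.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm, as used in LLaMA-style decoders."""

    def __init__(self, dims: int, eps: float = 1e-5):
        super().__init__()
        self.weight = mx.ones((dims,))  # learned per-channel scale
        self.eps = eps

    def __call__(self, x: mx.array) -> mx.array:
        # Divide by the RMS of the activations, then apply the learned scale.
        rms = mx.sqrt(mx.mean(x * x, axis=-1, keepdims=True) + self.eps)
        return self.weight * (x / rms)
```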

Furthermore, I will probably build a **UI application** for this model too.

Further updates coming soon!


First architecture overview:

The first beta will utilize the already pretrained **ImageBind** model. A linear input projection is needed because the outputs of the ImageBind model do not match the decoder's hidden dimensions; a sketch of this follows below.
Later on, the input projection will be removed.
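
As a rough illustration, such an input projection is just a single linear layer mapping ImageBind's 1024-dimensional embeddings into the decoder's hidden space. This is a minimal MLX sketch, not the actual implementation; the decoder width of 4096 is a placeholder assumption:

```python
import mlx.core as mx
import mlx.nn as nn

IMAGEBIND_DIM = 1024  # ImageBind-Huge embedding size
MODEL_DIM = 4096      # assumed decoder hidden size (placeholder)

class InputProjection(nn.Module):
    """Maps frozen ImageBind embeddings into the decoder's hidden space."""

    def __init__(self, in_dims: int = IMAGEBIND_DIM, out_dims: int = MODEL_DIM):
        super().__init__()
        self.proj = nn.Linear(in_dims, out_dims)

    def __call__(self, embeddings: mx.array) -> mx.array:
        return self.proj(embeddings)

# Usage: a batch of 4 ImageBind embeddings projected to decoder width.
x = mx.random.normal((4, IMAGEBIND_DIM))
tokens = InputProjection()(x)
print(tokens.shape)  # (4, 4096)
```

Keeping the projection to a single linear layer makes it cheap and, per the plan above, easy to drop once the dimensions are aligned.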

<img src="Architecture_overview_beta3.png" width="100%" height="auto"/>


Source code and more information will be available in my <a href="https://github.com/Goekdeniz-Guelmez/J.O.S.I.E.v4-o.git">GitHub repo</a>.