---
license: apache-2.0
datasets:
- HuggingFaceFW/fineweb
- PleIAs/YouTube-Commons
- allenai/WildChat-1M
language:
- de
- en
- ja
- fr
library_name: mlx
tags:
- moe
- multimodal
- j.o.s.i.e.
---

# This will be the repo for J.O.S.I.E.v4o

Like **OpenAI's GPT-4o**, it is natively multimodal. It is based on **NExT-GPT**, combined with **RoPE**, **RMS normalisation**, and **MoE**, paired with OpenAI's **GPT-4o tokenizer**.
This is a *future project* and will take its time.
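
For reference, RMS normalisation rescales activations by their root mean square instead of centring and scaling them as LayerNorm does. Below is a minimal MLX sketch of the idea; note that MLX already ships this as `nn.RMSNorm`, and the `eps` value here is an assumed default, not a project-specific choice:

```python
import mlx.core as mx
import mlx.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square layer norm, as used in LLaMA-style decoders."""

    def __init__(self, dims: int, eps: float = 1e-5):
        super().__init__()
        self.weight = mx.ones((dims,))  # learned per-channel scale
        self.eps = eps

    def __call__(self, x: mx.array) -> mx.array:
        # Divide by the RMS of the activations, then apply the learned scale.
        rms = mx.sqrt(mx.mean(x * x, axis=-1, keepdims=True) + self.eps)
        return self.weight * (x / rms)
```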

Furthermore, I will probably build a **UI application** for this model too.

Further updates coming soon!


First architecture overview:

The first beta will utilize the already pretrained **ImageBind** model. A linear input projection is needed because the outputs of the ImageBind model do not match the decoder's hidden dimensions; a sketch of this follows below.
Later on, the input projection will be removed.
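
As a rough illustration, such an input projection is just a single linear layer mapping ImageBind's 1024-dimensional embeddings into the decoder's hidden space. This is a minimal MLX sketch, not the actual implementation; the decoder width of 4096 is a placeholder assumption:

```python
import mlx.core as mx
import mlx.nn as nn

IMAGEBIND_DIM = 1024  # ImageBind-Huge embedding size
MODEL_DIM = 4096      # assumed decoder hidden size (placeholder)

class InputProjection(nn.Module):
    """Maps frozen ImageBind embeddings into the decoder's hidden space."""

    def __init__(self, in_dims: int = IMAGEBIND_DIM, out_dims: int = MODEL_DIM):
        super().__init__()
        self.proj = nn.Linear(in_dims, out_dims)

    def __call__(self, embeddings: mx.array) -> mx.array:
        return self.proj(embeddings)

# Usage: a batch of 4 ImageBind embeddings projected to decoder width.
x = mx.random.normal((4, IMAGEBIND_DIM))
tokens = InputProjection()(x)
print(tokens.shape)  # (4, 4096)
```

Keeping the projection to a single linear layer makes it cheap and, per the plan above, easy to drop once the dimensions are aligned.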

<img src="Architecture_overview_beta3.png" width="100%" height="auto"/>


Source code and more information will be available in my <a href="https://github.com/Goekdeniz-Guelmez/J.O.S.I.E.v4-o.git">GitHub repo</a>.