Flova/omr_transformer

Tags: vision-encoder-decoder, image-text-to-text


Optical Music Recognition Transformer

An image-to-text model for optical music recognition (OMR). The model is trained to predict sequences of simple notes in the LilyPond format from an input image. The training data consists of artificially generated, handwritten, and whiteboard images. The model is based on Donut.
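Since the model is based on Donut, it can presumably be loaded with the standard Donut classes from the transformers library. The following is a minimal sketch under that assumption, not the authors' own inference code: the function name recognize_notes and the task prompt "<s>" are illustrative assumptions, and the code assumes the model repository ships a compatible processor configuration. The linked repository is the authoritative reference for the actual pipeline.

```python
# Assumption: the decoder is prompted with the plain start token. The linked
# repository may use a different prompt.
TASK_PROMPT = "<s>"


def recognize_notes(image_path, model_id="Flova/omr_transformer"):
    """Return the predicted LilyPond note string for one image (sketch)."""
    # Imported lazily so that defining this function does not require
    # transformers or Pillow to be installed.
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    # Assumption: the model repo provides processor and tokenizer files.
    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    model.eval()

    # Donut expects RGB input; the processor handles resizing and normalization.
    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values

    # Encode the prompt without adding extra special tokens around it.
    decoder_input_ids = processor.tokenizer(
        TASK_PROMPT, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    output_ids = model.generate(
        pixel_values,
        decoder_input_ids=decoder_input_ids,
        max_length=model.decoder.config.max_position_embeddings,
        pad_token_id=processor.tokenizer.pad_token_id,
        eos_token_id=processor.tokenizer.eos_token_id,
        use_cache=True,
    )

    # Drop special tokens so only the note sequence remains.
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0].strip()
```

Calling recognize_notes("sheet.png"), where sheet.png is a placeholder file name, should return a string of the same form as the predictions in the demo, provided the assumptions above hold.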

Demo

Prediction: c'2 a''8 c''8 r4 c'1 e'8 c'8 c'8 a''8 f'4 a'8 c'8

Prediction: d'8 g'8 c''8 a'8 d'2 c'8 f''8 d'4 c''4 e'8 r8 g'8 b'8 e'8 g'8 d'2

Prediction: g'4 c'4 r8 f''8 e'8 d'8 r8 c'4 c'2 a'2 b'4 r4 a'8 r8 r4
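
The predictions above use LilyPond's absolute pitch notation: a note name, octave marks (' raises and , lowers by one octave, with c' being middle C), and a duration denominator, where r denotes a rest. As an illustration of how such output could be consumed downstream, here is a small sketch of a parser. The names Note and parse_prediction are hypothetical and not part of the model or its repository, and the parser only covers the token forms seen in the demo (no accidentals, dots, or ties).

```python
import re
from fractions import Fraction
from typing import List, NamedTuple, Optional

# Note name or rest, optional octave marks, duration denominator.
_TOKEN = re.compile(r"([a-g]|r)('*|,*)(\d+)")

# Semitone offset of each natural note above C.
_SEMITONES = {"c": 0, "d": 2, "e": 4, "f": 5, "g": 7, "a": 9, "b": 11}


class Note(NamedTuple):
    name: str             # "a".."g", or "r" for a rest
    midi: Optional[int]   # MIDI pitch number, None for a rest
    duration: Fraction    # length as a fraction of a whole note


def parse_prediction(text: str) -> List[Note]:
    """Parse a space-separated string of simple LilyPond notes."""
    notes = []
    for token in text.split():
        match = _TOKEN.fullmatch(token)
        if match is None:
            raise ValueError(f"unsupported token: {token!r}")
        name, marks, denominator = match.groups()
        if name == "r":
            midi = None
        else:
            # Unmarked c is the C below middle C (MIDI 48); each ' adds an
            # octave and each , subtracts one.
            octaves = len(marks) if "'" in marks else -len(marks)
            midi = 48 + _SEMITONES[name] + 12 * octaves
        notes.append(Note(name, midi, Fraction(1, int(denominator))))
    return notes


if __name__ == "__main__":
    for note in parse_prediction("c'2 a''8 c''8 r4"):
        print(note)
```

Representing durations as fractions keeps sums exact, which is convenient when checking whether a predicted sequence fills a whole number of bars.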

Repo: https://github.com/UHHRobotics22-23/robot_project/tree/main/marimbabot_vision
