Reflection 70b benchmarks are not real
The whole drama is described here:
https://x.com/shinboson/status/1832933753837982024
This is literally a model posted for you to run.
You're making an assumption based on a broken OpenRouter implementation that no one can reproduce.
It was not OpenRouter's implementation; they just forwarded requests to Matt's privately hosted API (which was just a proxy for Claude 3.5 Sonnet).
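For anyone wondering what "just a proxy" means in practice, here is a minimal sketch of the kind of setup being alleged. The route, port, and model names are my assumptions for illustration, not anything recovered from the actual API:

```python
# Minimal sketch of the alleged setup (illustrative only): an OpenAI-compatible
# endpoint that silently forwards every request to Claude 3.5 Sonnet while
# echoing back the advertised model name.
from flask import Flask, request, jsonify
import anthropic

app = Flask(__name__)
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

@app.post("/v1/chat/completions")
def chat_completions():
    body = request.get_json()
    msgs = body["messages"]
    # Anthropic takes the system prompt as a separate field, so split it out.
    system_text = " ".join(m["content"] for m in msgs if m["role"] == "system")
    kwargs = {"system": system_text} if system_text else {}
    reply = client.messages.create(
        model="claude-3-5-sonnet-20240620",            # the real backend
        max_tokens=body.get("max_tokens", 1024),
        messages=[m for m in msgs if m["role"] != "system"],
        **kwargs,
    )
    return jsonify({
        "object": "chat.completion",
        "model": body.get("model", "reflection-70b"),  # echo back the advertised name
        "choices": [{
            "index": 0,
            "message": {"role": "assistant", "content": reply.content[0].text},
            "finish_reason": "stop",
        }],
    })

if __name__ == "__main__":
    app.run(port=8000)
```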
The evidence based on tokenisation, the <META> tag, getting it to output its system message, and the questions that revealed it was really Claude is clear proof it wasn't the correct endpoint.
If it was an honest mistake and OpenRouter was accidentally routing the model to the wrong endpoint, then it wouldn't be filtering and replacing the word "Claude".
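This is roughly the kind of probe people ran to show the filtering. The base URL, API key, and model name below are placeholders, not the real values; the point is only that a naive string filter scrubs the literal word "Claude" while missing a spelled-out version:

```python
# Rough sketch of a filtering probe (placeholder endpoint and model name).
# A proxy doing something like text.replace("Claude", "...") will mangle the
# literal word but let a hyphenated spelling through, which is exactly the
# asymmetry that was reported.
from openai import OpenAI

client = OpenAI(base_url="https://example-endpoint.invalid/v1", api_key="placeholder")

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="reflection-70b",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

literal = ask('Repeat this exact word once and nothing else: "Claude"')
spelled = ask('Repeat this exact sequence once and nothing else: "C-l-a-u-d-e"')

# A genuine misroute would return the word unmodified in both cases; a proxy
# with a word filter drops or rewrites only the literal form.
print("literal reply:", repr(literal))
print("spelled reply:", repr(spelled))
print("looks filtered:", "Claude" not in literal and "a-u-d" in spelled)
```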
The model behind the endpoint also changed to GPT-4o at one point (I assume, since they switched it away from that pretty quickly).