Commit History

Sync with data tooling repo, using edugp/kenlm models, updating viz to use quantiles for coloring and ad-hoc viz for the registry dataset
3c30fa3

edugp committed on

Run tokenizer before computing perplexity and format
7b62017

edugp committed on

Replicate default cc_net preprocessing at inference time on KenlmModel.get_perplexity
0def03f

edugp committed on

Format file
ab846df

edugp committed on

Add distiluse-base-multilingual-cased-v2
ee732fe

edugp committed on

Add tests and fix issue when splitting into sentences, to grab the minimum number between total sentences and sample size, rather than total original documents and sample size
d131aa3

edugp committed on

Update README
6d1a001

edugp committed on

Check if file exists before attempting to remove
ab7449f

edugp committed on

Remove also the '{language}.sp.model' file on failure
38b6530

edugp committed on

Add CLI and refactor
86e673e

edugp committed on

Remove corrupt KenLM model files.
9ec7b19

edugp committed on

Support visualizing both sentences and whole documents. Smooth down color assignment in visualization.
a86046b

edugp committed on

Rename text input description
bf3498e

edugp committed on

Use latest embedding-lenses
92cad16

edugp committed on

Upgrade dependencies to match embedding-lenses
7cfaf1c

edugp committed on

Upgrade streamlit
64facb8

edugp committed on

Install embedding-lenses from wheel
ff180ca

edugp committed on

Fix issue with csv files
abf62cb

edugp committed on

Do not lock the version of embedding-lenses
9ebb2b0

edugp committed on

Update requirements.txt to install kenlm
7a089f0

edugp committed on

Initial commit for perplexity lenses
1f30dbc

edugp committed on

initial commit
77d22a6

system HF staff committed on