Commit History

Merging from rollback
ec99b37

meg HF staff committed on

merging dataset statistics file
c24f881

meg HF staff committed on

Merging back dataset statistics
e8ac901

meg-huggingface committed on

Be gone, you merge conflicting file
git rm data_measurements/dataset_statistics.py
2981bb2

meg-huggingface committed on

Updated cache from today
2947ee2

meg-huggingface committed on

Pushing back cache after rollback
97591e6

meg-huggingface committed on

Cache from rollback
5429ae3

meg-huggingface committed on

Updating from rollback
0b7eeeb

meg-huggingface committed on

Update from rollback
f9936fb

meg-huggingface committed on

Adding dependencies for images
deefca3

meg-huggingface committed on

Merge branch 'main' of https://huggingface.co/spaces/huggingface/data-measurements-tool-2 into main
14ce207

meg HF staff committed on

dup counts cache
91c90ea

meg HF staff committed on

Script to run through cache creation
ccfd542

meg HF staff committed on

Change to npmi display ordering
5546565

meg-huggingface committed on

Merge branch 'main' of https://huggingface.co/spaces/huggingface/data-measurements-tool-2 into main
c4e990f

meg-huggingface committed on

c4 realnewslike train text
ba51326

meg-huggingface committed on

Loading per-widget. Various changes to streamlit interactions for efficiency.
d3c28ec

meg-huggingface committed on

More cache; this time adding length_df.feather
e3f7160

meg-huggingface committed on

More cache
80f1b62

meg-huggingface committed on

Merge branch 'main' of https://huggingface.co/spaces/huggingface/data-measurements-tool-2 into main
5d4982b

meg-huggingface committed on

One more flag passing needed for setting live deployment
e122a90

meg-huggingface committed on

Adds flag for live deployment so that things will not be all recalculated when live.
7c5239c

meg-huggingface committed on

Trying cache push without the largest files, as they throttle our pushes
58471d2

meg-huggingface committed on

wiki general stats
ff8aca1

meg-huggingface committed on

Finishing c4 en train text cache
63ef066

meg-huggingface committed on

c4 en train text cache
b1e4418

meg-huggingface committed on

A variety of cache
11c0439

meg-huggingface committed on

text dset cache
1652bd6

meg-huggingface committed on

glue cola train sentence cache
edf068c

meg-huggingface committed on

removing extraneous backup
724d1b1

meg-huggingface committed on

Merge branch 'main' of https://huggingface.co/spaces/huggingface/data-measurements-tool-2 into main
b28e93b

meg-huggingface committed on

Hate speech offensive
e530aff

meg-huggingface committed on

General stats cache
adb962b

meg HF staff committed on

dset peek caches
a400d60

meg HF staff committed on

c4 en noblocklist cache
fd75df7

meg HF staff committed on

c4 ennoblocklist cache
13856fd

meg HF staff committed on

Starting imdb cache
090cc42

meg-huggingface committed on

Changing cache naming scheme to make consistent.
cb64f9a

meg-huggingface committed on

Hate speech cache
201f0a7

meg HF staff committed on

More flexibility in specifying cache directory.
101aa18

meg-huggingface committed on

Scripts to generate cache
db74ba9

meg-huggingface committed on

Standardizing filenaming a bit.
0803ab3

meg-huggingface committed on

More modularizing; npmi and labels
a2ae370

meg-huggingface committed on

Some additional modularizing and caching of the text lengths widget
335424f

meg-huggingface committed on

Modularization and caching of text length widget
85cf91c

meg-huggingface committed on

Removes extraneous debugging print statements
6a9c993

meg-huggingface committed on

Missing a dependency; adding to requirements.txt
6557527

meg-huggingface committed on

Begins modularizing so that each widget can be independently loaded without having a requirement on the ordering of load_or_preparing in app.py. This means that each function corresponding to a widget will check if the variables it depends on have been calculated yet. If not, it will call back to calculate them. Because of the messiness this causes with passing the use_cache variable around, I've now set use_cache as a global variable, set when the DatasetStatisticsCacheClass is initialized, and removed the use_cache arguments appearing in nearly every function.
4b53042

meg-huggingface committed on
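
The modularization described in commit 4b53042 can be illustrated with a minimal sketch. It is not the tool's actual code: the class `WidgetStats`, its methods, and the `calls` list are hypothetical names chosen for illustration, and `use_cache` is kept as an attribute set once at initialization rather than a true module-level global. Each widget function checks whether the value it depends on exists and computes it on demand, so the widgets can be loaded in any order.

```python
class WidgetStats:
    """Hypothetical stand-in for the cache class named in the commit."""

    def __init__(self, texts, use_cache=True):
        self.texts = texts
        # Set once at initialization instead of being passed to every method.
        self.use_cache = use_cache
        self.tokenized = None
        self.lengths = None
        self.calls = []  # records which preparation steps actually ran

    def load_or_prepare_tokenized(self):
        # Recompute only if nothing is stored yet or caching is disabled.
        if self.tokenized is None or not self.use_cache:
            self.calls.append("tokenized")
            self.tokenized = [text.split() for text in self.texts]
        return self.tokenized

    def load_or_prepare_lengths(self):
        # The lengths widget depends on tokenization; it calls back to
        # prepare the tokens itself rather than relying on call order.
        if self.lengths is None or not self.use_cache:
            tokens = self.load_or_prepare_tokenized()
            self.calls.append("lengths")
            self.lengths = [len(tok) for tok in tokens]
        return self.lengths
```

Calling `load_or_prepare_lengths()` first still works, because the method triggers tokenization on its own; a second call reuses the stored values when `use_cache` is true.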

Removing need to keep around base dset for the header widget; now just saving what is shown -- the first n lines of the base dataset -- as a json, and loading if it's cached.
66693d5

meg-huggingface committed on
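
The approach in commit 66693d5, saving only the first n lines of the base dataset as JSON and loading that file when it is cached, might look roughly like the sketch below. The function name `load_or_prepare_peek`, its parameters, and the default `n` are assumptions made for illustration, not the repository's actual interface.

```python
import json
from pathlib import Path


def load_or_prepare_peek(rows, cache_path, n=100, use_cache=True):
    """Return the first n rows, reading them from a JSON cache when possible."""
    cache_path = Path(cache_path)
    if use_cache and cache_path.exists():
        # Cached peek found: the base dataset is not needed at all.
        with cache_path.open() as f:
            return json.load(f)
    # No usable cache: take the first n rows and store only those.
    peek = list(rows[:n])
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with cache_path.open("w") as f:
        json.dump(peek, f)
    return peek
```

Once the JSON file exists, the header widget can be rendered without keeping the full dataset in memory.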

Removing any need for a dataframe in expander_general_stats; instead making sure to cache and load the small amount of details needed for this widget. Note I also moved around a couple functions -- same content, just moved -- so that it was easier for me to navigate through the code. I also pulled out a couple of sub-functions from larger functions, again to make the code easier to work with/understand, as well as helping to further modularize so we can limit what needs to be cached.
e1f2cc3

meg-huggingface committed on