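# Streamlit demo: estimating the performance of a binary classification model
# without ground truth, using NannyML's Confidence-based Performance Estimation (CBPE).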
import streamlit as st
import nannyml as nml
from sklearn.metrics import f1_score

st.title('Is your model degrading?')
st.caption('### :violet[_Estimate_] the performance of an ML model. :violet[_Without ground truth_].')
st.markdown(""" | |
If you have been previously exposed to concepts like [covariate shift or concept drift]('https://www.nannyml.com/blog/types-of-data-shift'), | |
you may be aware that changes in the distribution of | |
the production data can affect the model's performance. | |
""") | |
st.markdown("""Recently a paper from MIT, Harvard and other institutions showed how [91% of their ML models | |
experiments degradated]('https://www.nannyml.com/blog/91-of-ml-perfomance-degrade-in-time') in time.""") | |
st.markdown("""Typically, to know if a model is degrading we need access ground truth. But most of the times | |
getting new labeled data is either expensive, takes lots of time or imposible. So we end up blindless without | |
knowing how the model is performing in production. | |
""") | |
st.markdown(""" | |
To overcome this issue, we at NannyML created two methods to :violet[_estimate_] the performance of ML models without needing access to | |
new labeled data. In this demo, we show the **Confidence-based Performance Estimation (CBPE)** method, specially designed to estimate | |
the performance of **classification** models. | |
""") | |
reference_df, analysis_df, analysis_target_df = nml.load_synthetic_car_loan_dataset()
test_f1_score = f1_score(reference_df['repaid'], reference_df['y_pred'])
st.markdown("#### The prediction task") | |
st.markdown(""" | |
A model was trained to predict whether or not a person will repay their car loan. The model used features like: | |
car_value, salary_range, loan_lenght, etc. | |
""") | |
st.dataframe(analysis_df.head(3)) | |
st.markdown(""" | |
We know that the model had a **Test F1-Score of: 0.943**. But, what guarantees us that the F1-Score | |
will continue to be good on production data? | |
""") | |
st.markdown("#### Estimating the Model Performance") | |
st.markdown(""" | |
Instead of waiting for ground truth we can use NannyML's | |
[CBPE]("https://nannyml.readthedocs.io/en/stable/tutorials/performance_estimation/binary_performance_estimation/standard_metric_estimation.html") | |
method to estimate the performance of an ML model. | |
CBPE's trick is to use the confidence scores of the ML model. It calibrates the scores to turn them into actual probabilities. | |
Once the probabilities are calibrate it can estimate any performance metric that can be computed from the confusion matrix elements. | |
""") | |
chunk_size = st.slider('Chunk/Sample Size', 2500, 7500, 5000, 500)
metric = st.selectbox(
    'Performance Metric',
    ('f1', 'roc_auc', 'precision', 'recall', 'specificity', 'accuracy'))
plot_realized_performance = st.checkbox('Compare NannyML estimation with actual outcomes')
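# Fit CBPE on the reference data (where ground truth is available), then
# estimate the chosen metric on the analysis data, where no targets exist.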
if st.button('**_Estimate_ Performance**'):
    with st.spinner('Running...'):
        estimator = nml.CBPE(
            y_pred_proba='y_pred_proba',
            y_pred='y_pred',
            y_true='repaid',
            timestamp_column_name='timestamp',
            metrics=[metric],
            chunk_size=chunk_size,
            problem_type='classification_binary'
        )
        estimator.fit(reference_df)
        estimated_performance = estimator.estimate(analysis_df)
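        # Optionally compare the estimate against realized performance: merge the
        # held-out targets back into the analysis data and compute the actual metric.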
        if plot_realized_performance:
            analysis_with_targets = analysis_df.merge(analysis_target_df, left_index=True, right_index=True)
            calculator = nml.PerformanceCalculator(
                y_pred_proba='y_pred_proba',
                y_pred='y_pred',
                y_true='repaid',
                timestamp_column_name='timestamp',
                metrics=[metric],
                chunk_size=chunk_size,
                problem_type='classification_binary'
            )
            calculator.fit(reference_df)
            realized_performance = calculator.calculate(analysis_with_targets)
            st.plotly_chart(estimated_performance.compare(realized_performance).plot(), use_container_width=False)
        else:
            st.plotly_chart(estimated_performance.plot(), use_container_width=False)
st.divider()
st.markdown("""Created by [santiviquez](https://twitter.com/santiviquez) from NannyML""")
st.markdown(""" | |
NannyML is an open-source library for post-deployment data science. Leave us a π on [GitHub]("https://github.com/NannyML/nannyml") | |
or [check our docs]('https://nannyml.readthedocs.io/en/stable/landing_page.html') to learn more. | |
""") |