import streamlit as st
import nannyml as nml

@st.cache_data
def get_data():
    # Load NannyML's synthetic car-loan dataset: a reference set,
    # an analysis (production) set, and the delayed analysis targets.
    reference_df, analysis_df, analysis_target_df = nml.load_synthetic_car_loan_dataset()
    return reference_df, analysis_df, analysis_target_df
st.title('Is your model degrading?')
st.caption('### :violet[_Estimate_] the performance of an ML model. :violet[_Without ground truth_].')
st.markdown("""
If you have been previously exposed to concepts like [covariate shift or concept drift](https://www.nannyml.com/blog/types-of-data-shift),
you may be aware that changes in the distribution of
the production data can affect the model's performance.
""")
st.markdown("""A recent paper from MIT, Harvard, and other institutions showed that [91% of their ML model
experiments degraded](https://www.nannyml.com/blog/91-of-ml-perfomance-degrade-in-time) over time.""")
st.markdown("""Typically, we need access to ground truth to know if a model is degrading.
But most of the time, getting new labeled data is expensive, time-consuming, or impossible.
So we end up flying blind, with no idea how the model performs in production.
""")
st.markdown("""
To overcome this issue, we at NannyML created two methods to :violet[_estimate_] the performance of ML models without needing access to
new labeled data. In this demo, we show the **Confidence-based Performance Estimation (CBPE)** method, specifically designed to estimate
the performance of **classification** models.
""")
reference_df, analysis_df, analysis_target_df = get_data()

st.markdown("#### The prediction task")
st.markdown("""
A model was trained to predict whether or not a person will repay their car loan. The model used features like:
car_value, salary_range, loan_length, etc.
""")
st.dataframe(analysis_df.head(3))
st.markdown("""
We know that the model had a **test F1-score of 0.943**. But what guarantees that the F1-score
will stay that good on production data?
""")
st.markdown("#### Estimating the Model Performance")
st.markdown("""
Instead of waiting for ground truth, we can use NannyML's
[CBPE](https://nannyml.readthedocs.io/en/stable/tutorials/performance_estimation/binary_performance_estimation/standard_metric_estimation.html)
method to estimate the performance of an ML model.
CBPE's trick is to use the model's own confidence scores: it calibrates them so they become actual probabilities.
Once the scores are calibrated, it can estimate any performance metric that can be computed from the elements of the confusion matrix.
""")
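# To see why calibrated probabilities are enough, here is a rough sketch of
# the idea behind CBPE (an illustration of the math only, not NannyML's
# actual implementation; the helper name `_estimated_f1_sketch` is ours):
# each prediction contributes fractional mass to an *expected* confusion
# matrix — a positive prediction with calibrated probability 0.8 counts as
# 0.8 expected true positives and 0.2 expected false positives — and any
# confusion-matrix metric, such as F1, follows without a single label.
def _estimated_f1_sketch(calibrated_proba, y_pred):
    # Expected confusion-matrix elements from calibrated probabilities.
    tp = sum(p for p, yp in zip(calibrated_proba, y_pred) if yp == 1)
    fp = sum(1 - p for p, yp in zip(calibrated_proba, y_pred) if yp == 1)
    fn = sum(p for p, yp in zip(calibrated_proba, y_pred) if yp == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)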
chunk_size = st.slider('Chunk/Sample Size', 2500, 7500, 5000, 500)
metric = st.selectbox(
    'Performance Metric',
    ('f1', 'roc_auc', 'precision', 'recall', 'specificity', 'accuracy'))
plot_realized_performance = st.checkbox('Compare NannyML estimation with actual outcomes')
if st.button('**_Estimate_ Performance**'):
    with st.spinner('Running...'):
        estimator = nml.CBPE(
            y_pred_proba='y_pred_proba',
            y_pred='y_pred',
            y_true='repaid',
            timestamp_column_name='timestamp',
            metrics=[metric],
            chunk_size=chunk_size,
            problem_type='classification_binary'
        )
        # Fit on the reference data (where performance is known), then
        # estimate performance on the unlabeled analysis data.
        estimator.fit(reference_df)
        estimated_performance = estimator.estimate(analysis_df)

        if plot_realized_performance:
            # Join the delayed ground truth back in so the estimate can be
            # compared against the realized (actual) performance.
            analysis_with_targets = analysis_df.merge(analysis_target_df, left_index=True, right_index=True)

            calculator = nml.PerformanceCalculator(
                y_pred_proba='y_pred_proba',
                y_pred='y_pred',
                y_true='repaid',
                timestamp_column_name='timestamp',
                metrics=[metric],
                chunk_size=chunk_size,
                problem_type='classification_binary'
            )
            calculator.fit(reference_df)
            realized_performance = calculator.calculate(analysis_with_targets)

            st.plotly_chart(estimated_performance.compare(realized_performance).plot(), use_container_width=False)
        else:
            st.plotly_chart(estimated_performance.plot(), use_container_width=False)
st.divider()
st.markdown("""Created by [santiviquez](https://twitter.com/santiviquez) from NannyML.""")
st.markdown("""
NannyML is an open-source library for post-deployment data science. Leave us a ⭐ on [GitHub](https://github.com/NannyML/nannyml)
or [check our docs](https://nannyml.readthedocs.io/en/stable/landing_page.html) to learn more.
""")