santiviquez committed
Commit 882c546 • 1 Parent(s): 1f1c5c3
app
app.py ADDED
import streamlit as st
import nannyml as nml
from sklearn.metrics import f1_score

st.title('Is your model degrading?')
st.caption('### :violet[_Estimate_] the performance of an ML model. :violet[_Without ground truth_].')

st.markdown("""
If you have been exposed to concepts like [covariate shift or concept drift](https://www.nannyml.com/blog/types-of-data-shift),
you may be aware that changes in the distribution of the production data can affect a model's performance.
""")

st.markdown("""A recent paper from MIT, Harvard and other institutions showed that [91% of their ML model
experiments degraded](https://www.nannyml.com/blog/91-of-ml-perfomance-degrade-in-time) over time.""")

st.markdown("""Typically, to know whether a model is degrading we need access to ground truth. But most of the time,
getting new labeled data is expensive, slow or outright impossible. So we end up blind, not
knowing how the model is performing in production.
""")

st.markdown("""
To overcome this issue, we at NannyML created two methods to :violet[_estimate_] the performance of ML models without access to
new labeled data. This demo shows the **Confidence-based Performance Estimation (CBPE)** method, specifically designed to estimate
the performance of **classification** models.
""")

# reference: a period with known targets (e.g. the test set); analysis: production data
# without targets; analysis_target: the withheld targets, used later for comparison
reference_df, analysis_df, analysis_target_df = nml.load_synthetic_car_loan_dataset()
test_f1_score = f1_score(reference_df['repaid'], reference_df['y_pred'])

st.markdown("#### The prediction task")

st.markdown("""
A model was trained to predict whether or not a person will repay their car loan. The model uses features like
car_value, salary_range, loan_length, etc.
""")

st.dataframe(analysis_df.head(3))

st.markdown(f"""
We know that the model had a **test F1-score of {test_f1_score:.3f}**. But what guarantees that the F1-score
will stay that high on production data?
""")

st.markdown("#### Estimating the Model Performance")
st.markdown("""
Instead of waiting for ground truth, we can use NannyML's
[CBPE](https://nannyml.readthedocs.io/en/stable/tutorials/performance_estimation/binary_performance_estimation/standard_metric_estimation.html)
method to estimate the performance of an ML model.

CBPE's trick is to use the confidence scores of the ML model. It calibrates the scores to turn them into actual probabilities.
Once the probabilities are calibrated, it can estimate any performance metric that can be computed from the confusion matrix elements.
""")
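# A back-of-the-envelope sketch of that idea (illustration only: the real method
# first calibrates the scores on reference data and works chunk by chunk): with
# calibrated probabilities p and hard predictions y_hat, the expected confusion
# matrix can be filled in without labels, and any metric derived from it follows.
_p, _y_hat = analysis_df['y_pred_proba'], analysis_df['y_pred']
_exp_tp = _p[_y_hat == 1].sum()        # expected true positives among predicted positives
_exp_fp = (1 - _p[_y_hat == 1]).sum()  # expected false positives among predicted positives
_exp_fn = _p[_y_hat == 0].sum()        # expected false negatives among predicted negatives
_sketch_f1 = 2 * _exp_tp / (2 * _exp_tp + _exp_fp + _exp_fn)  # F1 estimated with no labels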
chunk_size = st.slider('Chunk/Sample Size', 2500, 7500, 5000, 500)
metric = st.selectbox(
    'Performance Metric',
    ('f1', 'roc_auc', 'precision', 'recall', 'specificity', 'accuracy'))
plot_realized_performance = st.checkbox('Compare NannyML estimation with actual outcomes')

if st.button('**_Estimate_ Performance**'):
    with st.spinner('Running...'):
        # fit CBPE on the reference period, then estimate performance
        # on the analysis (production) period, chunk by chunk
        estimator = nml.CBPE(
            y_pred_proba='y_pred_proba',
            y_pred='y_pred',
            y_true='repaid',
            timestamp_column_name='timestamp',
            metrics=[metric],
            chunk_size=chunk_size,
            problem_type='classification_binary'
        )

        estimator.fit(reference_df)
        estimated_performance = estimator.estimate(analysis_df)

        if plot_realized_performance:
            # bring back the withheld targets to compute the realized performance
            analysis_with_targets = analysis_df.merge(analysis_target_df, left_index=True, right_index=True)
            calculator = nml.PerformanceCalculator(
                y_pred_proba='y_pred_proba',
                y_pred='y_pred',
                y_true='repaid',
                timestamp_column_name='timestamp',
                metrics=[metric],
                chunk_size=chunk_size,
                problem_type='classification_binary'
            )

            calculator.fit(reference_df)
            realized_performance = calculator.calculate(analysis_with_targets)

            st.plotly_chart(estimated_performance.compare(realized_performance).plot(), use_container_width=False)
        else:
            st.plotly_chart(estimated_performance.plot(), use_container_width=False)


st.divider()


st.markdown("""Created by [santiviquez](https://twitter.com/santiviquez) from NannyML""")

st.markdown("""
NannyML is an open-source library for post-deployment data science. Leave us a 🌟 on [GitHub](https://github.com/NannyML/nannyml)
or [check our docs](https://nannyml.readthedocs.io/en/stable/landing_page.html) to learn more.
""")
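# To try the app locally, the usual Streamlit workflow should work, e.g.:
#     pip install streamlit nannyml
#     streamlit run app.py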