import streamlit as st
import nannyml as nml

@st.cache_data
def get_data():
    # Load NannyML's synthetic car-loan dataset: a reference set,
    # an analysis (production) set, and the delayed analysis targets.
    reference_df, analysis_df, analysis_target_df = nml.load_synthetic_car_loan_dataset()
    return reference_df, analysis_df, analysis_target_df
st.title('Is your model degrading?')
st.caption('### :violet[_Estimate_] the performance of an ML model. :violet[_Without ground truth_].')
st.markdown("""
If you have been previously exposed to concepts like [covariate shift or concept drift](https://www.nannyml.com/blog/types-of-data-shift),
you may be aware that changes in the distribution of
the production data can affect the model's performance.
""")
st.markdown("""A recent paper from MIT, Harvard, and other institutions showed that [91% of their ML model
experiments degraded](https://www.nannyml.com/blog/91-of-ml-perfomance-degrade-in-time) over time.""")
st.markdown("""Typically, we need access to ground truth to know if a model is degrading.
But most of the time, getting new labeled data is expensive, time-consuming, or impossible.
So we end up flying blind, with no idea how the model performs in production.
""")
st.markdown("""
To overcome this issue, we at NannyML created two methods to :violet[_estimate_] the performance of ML models without needing access to
new labeled data. In this demo, we show the **Confidence-based Performance Estimation (CBPE)** method, specifically designed to estimate
the performance of **classification** models.
""")
reference_df, analysis_df, analysis_target_df = get_data()

st.markdown("#### The prediction task")
st.markdown("""
A model was trained to predict whether or not a person will repay their car loan. The model used features like:
car_value, salary_range, loan_length, etc.
""")
st.dataframe(analysis_df.head(3))
st.markdown("""
We know that the model had a **test F1-score of 0.943**. But what guarantees that the F1-score
will stay that good on production data?
""")
st.markdown("#### Estimating the Model Performance")
st.markdown("""
Instead of waiting for ground truth, we can use NannyML's
[CBPE](https://nannyml.readthedocs.io/en/stable/tutorials/performance_estimation/binary_performance_estimation/standard_metric_estimation.html)
method to estimate the performance of an ML model.
CBPE's trick is to use the model's own confidence scores: it calibrates them so they become actual probabilities.
Once the scores are calibrated, it can estimate any performance metric that can be computed from the elements of the confusion matrix.
""")
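# To see why calibrated probabilities are enough, here is a rough sketch of
# the idea behind CBPE (an illustration of the math only, not NannyML's
# actual implementation; the helper name `_estimated_f1_sketch` is ours):
# each prediction contributes fractional mass to an *expected* confusion
# matrix — a positive prediction with calibrated probability 0.8 counts as
# 0.8 expected true positives and 0.2 expected false positives — and any
# confusion-matrix metric, such as F1, follows without a single label.
def _estimated_f1_sketch(calibrated_proba, y_pred):
    # Expected confusion-matrix elements from calibrated probabilities.
    tp = sum(p for p, yp in zip(calibrated_proba, y_pred) if yp == 1)
    fp = sum(1 - p for p, yp in zip(calibrated_proba, y_pred) if yp == 1)
    fn = sum(p for p, yp in zip(calibrated_proba, y_pred) if yp == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)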
chunk_size = st.slider('Chunk/Sample Size', 2500, 7500, 5000, 500)
metric = st.selectbox(
    'Performance Metric',
    ('f1', 'roc_auc', 'precision', 'recall', 'specificity', 'accuracy'))
plot_realized_performance = st.checkbox('Compare NannyML estimation with actual outcomes')
if st.button('**_Estimate_ Performance**'):
    with st.spinner('Running...'):
        estimator = nml.CBPE(
            y_pred_proba='y_pred_proba',
            y_pred='y_pred',
            y_true='repaid',
            timestamp_column_name='timestamp',
            metrics=[metric],
            chunk_size=chunk_size,
            problem_type='classification_binary'
        )
        # Fit on the reference data (where performance is known), then
        # estimate performance on the unlabeled analysis data.
        estimator.fit(reference_df)
        estimated_performance = estimator.estimate(analysis_df)

        if plot_realized_performance:
            # Join the delayed ground truth back in so the estimate can be
            # compared against the realized (actual) performance.
            analysis_with_targets = analysis_df.merge(analysis_target_df, left_index=True, right_index=True)

            calculator = nml.PerformanceCalculator(
                y_pred_proba='y_pred_proba',
                y_pred='y_pred',
                y_true='repaid',
                timestamp_column_name='timestamp',
                metrics=[metric],
                chunk_size=chunk_size,
                problem_type='classification_binary'
            )
            calculator.fit(reference_df)
            realized_performance = calculator.calculate(analysis_with_targets)

            st.plotly_chart(estimated_performance.compare(realized_performance).plot(), use_container_width=False)
        else:
            st.plotly_chart(estimated_performance.plot(), use_container_width=False)
st.divider()
st.markdown("""Created by [santiviquez](https://twitter.com/santiviquez) from NannyML.""")
st.markdown("""
NannyML is an open-source library for post-deployment data science. Leave us a ⭐ on [GitHub](https://github.com/NannyML/nannyml)
or [check our docs](https://nannyml.readthedocs.io/en/stable/landing_page.html) to learn more.
""")