ZeroCommand/test-giskard-evaluator

👉Robustness issues (1)

When feature “Name” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 6.67% of the cases. We expected the predictions not to be affected by this transformation.

Level	Metric	Transformation	Deviation
medium 🟡	Fail rate = 0.067	Transform to title case	1/15 tested samples (6.67%) changed prediction after perturbation

Taxonomy

avid-effect:performance:P0201

🔍✨Examples

	Name	Transform to title case(Name)	Original prediction	Prediction after perturbation
505	Penasco y Castellana, Mr. Victor de Satode	Penasco Y Castellana, Mr. Victor De Satode	yes (p = 0.50)	no (p = 0.51)
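The robustness check amounts to two steps: perturb one feature, then count how many predictions flip. The sketch below is a minimal illustration of that logic. Python's built-in `str.title()` stands in for Giskard's "Transform to title case", and the library's own transformation may treat edge cases differently. `to_title_case` and `fail_rate` are hypothetical helper names.

```python
# Sketch only: str.title() stands in for Giskard's "Transform to title case";
# the library's own transformation may handle edge cases differently.


def to_title_case(text: str) -> str:
    """Upper-case the first letter of every word and lower-case the rest."""
    return text.title()


def fail_rate(original_preds, perturbed_preds):
    """Fraction of tested samples whose predicted label changed after perturbation."""
    if len(original_preds) != len(perturbed_preds):
        raise ValueError("prediction lists must have the same length")
    if not original_preds:
        return 0.0
    changed = sum(o != p for o, p in zip(original_preds, perturbed_preds))
    return changed / len(original_preds)


name = "Penasco y Castellana, Mr. Victor de Satode"
print(to_title_case(name))  # Penasco Y Castellana, Mr. Victor De Satode

# Illustrative labels: only the first of 15 predictions flips, as for row 505.
original = ["yes"] + ["no"] * 14
perturbed = ["no"] + ["no"] * 14
print(f"{fail_rate(original, perturbed):.3f}")  # 0.067
```

Row 505 sits right at the decision boundary (p = 0.50 before, 0.51 after). A flip there suggests the model is sensitive to this change, though not dramatically so.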

👉Overconfidence issues (6)

For records in the dataset where Name contains "mr", we found a significantly higher number of overconfident wrong predictions (31 samples, corresponding to 62.00% of the wrong predictions in the data slice).

Level	Data slice	Metric	Deviation
major 🔴	`Name` contains "mr"	Overconfidence rate = 0.620	+59.19% relative to global

Taxonomy

avid-effect:performance:P0204

🔍✨Examples

	Name	Survived	Predicted `Survived`
838	Chip, Mr. Chang	yes	no (p = 0.96), yes (p = 0.04)
744	Stranden, Mr. Juho	yes	no (p = 0.96), yes (p = 0.04)
429	Pickard, Mr. Berk (Berk Trembisky)	yes	no (p = 0.96), yes (p = 0.04)
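The report does not spell out how a prediction is judged "overconfident". The sketch below assumes one plausible reading: a wrong prediction counts as overconfident when the probability of the predicted class exceeds that of the true class by more than a threshold. The rate is then the share of wrong predictions that meet this condition. The threshold of 0.5 and all helper names are hypothetical, not Giskard's.

```python
# Sketch under assumptions: a wrong prediction is "overconfident" when
# p(predicted class) - p(true class) exceeds a threshold. The 0.5 threshold
# is a hypothetical choice, not a value taken from Giskard.


def predicted_label(probs: dict) -> str:
    """Label with the highest predicted probability."""
    return max(probs, key=probs.get)


def is_overconfident(probs: dict, true_label: str, threshold: float) -> bool:
    pred = predicted_label(probs)
    if pred == true_label:
        return False  # correct predictions are never counted
    return probs[pred] - probs[true_label] > threshold


def overconfidence_rate(samples, threshold: float = 0.5) -> float:
    """Share of wrong predictions that are overconfident.

    samples: list of (probs, true_label) pairs, probs mapping label -> probability.
    """
    wrong = [(p, y) for p, y in samples if predicted_label(p) != y]
    if not wrong:
        return 0.0
    return sum(is_overconfident(p, y, threshold) for p, y in wrong) / len(wrong)


samples = [
    ({"no": 0.96, "yes": 0.04}, "yes"),  # rows 838, 744 and 429 in the table above
    ({"no": 0.96, "yes": 0.04}, "yes"),
    ({"no": 0.96, "yes": 0.04}, "yes"),
    ({"no": 0.55, "yes": 0.45}, "yes"),  # hypothetical borderline error
    ({"no": 0.20, "yes": 0.80}, "yes"),  # hypothetical correct prediction, ignored
]
print(overconfidence_rate(samples))  # 0.75
```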

For records in the dataset where text_length(Name) < 28.500, we found a significantly higher number of overconfident wrong predictions (33 samples, corresponding to 55.93% of the wrong predictions in the data slice).

Level	Data slice	Metric	Deviation
major 🔴	`text_length(Name)` < 28.500	Overconfidence rate = 0.559	+43.61% relative to global

Taxonomy

avid-effect:performance:P0204

🔍✨Examples

	Name	text_length(Name)	Survived	Predicted `Survived`
838	Chip, Mr. Chang	15	yes	no (p = 0.96), yes (p = 0.04)
744	Stranden, Mr. Juho	18	yes	no (p = 0.96), yes (p = 0.04)
643	Foo, Mr. Choong	15	yes	no (p = 0.95), yes (p = 0.05)
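This slice is defined on a derived feature, not on a raw column. The sketch below assumes `text_length(Name)` is simply the number of characters in the name. That assumption reproduces the lengths shown for rows 838, 744 and 643 (15, 18 and 15). Because lengths are integers, the cut-off 28.500 selects names of 28 characters or fewer. The helper `in_short_name_slice` is hypothetical.

```python
# Assumption: text_length(Name) counts characters. This reproduces the
# values 15, 18 and 15 shown for rows 838, 744 and 643.


def text_length(text: str) -> int:
    return len(text)


def in_short_name_slice(name: str, cutoff: float = 28.5) -> bool:
    """Hypothetical membership test for the slice text_length(Name) < 28.500."""
    return text_length(name) < cutoff


names = [
    "Chip, Mr. Chang",
    "Stranden, Mr. Juho",
    "Foo, Mr. Choong",
    "Penasco y Castellana, Mr. Victor de Satode",
]
for n in names:
    print(text_length(n), in_short_name_slice(n), n)
```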

For records in the dataset where Fare < 14.850, we found a significantly higher number of overconfident wrong predictions (22 samples, corresponding to 53.66% of the wrong predictions in the data slice).

Level	Data slice	Metric	Deviation
major 🔴	`Fare` < 14.850	Overconfidence rate = 0.537	+37.77% relative to global

Taxonomy

avid-effect:performance:P0204

🔍✨Examples

	Fare	Survived	Predicted `Survived`
744	7.925	yes	no (p = 0.96), yes (p = 0.04)
429	8.05	yes	no (p = 0.96), yes (p = 0.04)
338	8.05	yes	no (p = 0.95), yes (p = 0.05)

For records in the dataset where Sex == "male", we found a significantly higher number of overconfident wrong predictions (32 samples, corresponding to 53.33% of the wrong predictions in the data slice).

Level	Data slice	Metric	Deviation
major 🔴	`Sex` == "male"	Overconfidence rate = 0.533	+36.94% relative to global

Taxonomy

avid-effect:performance:P0204

🔍✨Examples

	Sex	Survived	Predicted `Survived`
838	male	yes	no (p = 0.96), yes (p = 0.04)
744	male	yes	no (p = 0.96), yes (p = 0.04)
429	male	yes	no (p = 0.96), yes (p = 0.04)

For records in the dataset where Parch == 0, we found a significantly higher number of overconfident wrong predictions (31 samples, corresponding to 47.69% of the wrong predictions in the data slice).

Level	Data slice	Metric	Deviation
major 🔴	`Parch` == 0	Overconfidence rate = 0.477	+22.45% relative to global

Taxonomy

avid-effect:performance:P0204

🔍✨Examples

	Parch	Survived	Predicted `Survived`
838	0	yes	no (p = 0.96), yes (p = 0.04)
744	0	yes	no (p = 0.96), yes (p = 0.04)
429	0	yes	no (p = 0.96), yes (p = 0.04)

For records in the dataset where SibSp == 0, we found a significantly higher number of overconfident wrong predictions (27 samples, corresponding to 44.26% of the wrong predictions in the data slice).

Level	Data slice	Metric	Deviation
medium 🟡	`SibSp` == 0	Overconfidence rate = 0.443	+13.65% relative to global

Taxonomy

avid-effect:performance:P0204

🔍✨Examples

	SibSp	Survived	Predicted `Survived`
838	0	yes	no (p = 0.96), yes (p = 0.04)
744	0	yes	no (p = 0.96), yes (p = 0.04)
429	0	yes	no (p = 0.96), yes (p = 0.04)
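The report does not define the Deviation column. If it is the relative difference (slice rate − global rate) / global rate, every row in this section should point back to the same global overconfidence rate. The sketch below tests that reading. The number of wrong predictions per slice is back-computed from the quoted counts and percentages (for example, 31 / 0.62 = 50). The function names are hypothetical.

```python
# Assumption: Deviation = (slice_rate - global_rate) / global_rate.
# Tuples are (overconfident wrong predictions, wrong predictions, deviation);
# the middle value is back-computed from the report's counts and percentages.
SLICES = {
    'Name contains "mr"': (31, 50, 0.5919),
    "text_length(Name) < 28.5": (33, 59, 0.4361),
    "Fare < 14.85": (22, 41, 0.3777),
    'Sex == "male"': (32, 60, 0.3694),
    "Parch == 0": (31, 65, 0.2245),
    "SibSp == 0": (27, 61, 0.1365),
}


def relative_deviation(slice_value: float, global_value: float) -> float:
    return (slice_value - global_value) / global_value


def implied_global(slice_value: float, deviation: float) -> float:
    """Invert relative_deviation to recover the global value."""
    return slice_value / (1 + deviation)


for name, (overconfident, wrong, dev) in SLICES.items():
    rate = overconfident / wrong
    print(f"{name:26} rate={rate:.3f} implied global={implied_global(rate, dev):.3f}")
```

Every row yields a global rate of about 0.389. This is consistent with the relative-difference reading, but it does not prove it.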

👉Spurious Correlation issues (3)

Data slice Sex == "female" seems to be highly associated with the prediction Survived = yes (92.67% of predictions in the data slice).

Level	Data slice	Metric	Deviation
minor 🟡	`Sex` == "female"	Nominal association (Theil's U) = 0.697	Prediction Survived = `yes` for 92.67% of samples in the slice

Taxonomy

avid-effect:performance:P0103

🔍✨Examples

	Sex	Survived	Predicted `Survived`
123	female	yes	yes (p = 0.72)
412	female	yes	yes (p = 0.91)
849	female	yes	yes (p = 0.92)
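Theil's U (the uncertainty coefficient) measures how much knowing one variable reduces the entropy of another. Its value runs from 0 (no association) to 1 (full association). The sketch below computes U(slice membership | prediction). Which direction the report actually uses is an assumption, and the toy data is hypothetical.

Relabeling a binary variable does not change its entropy. So if Sex takes only the values "female" and "male", the two complementary slices get the same score, which matches the identical 0.697 reported for both.

```python
import math
from collections import Counter


def entropy(xs) -> float:
    """Shannon entropy H(X) in nats."""
    n = len(xs)
    return -sum((c / n) * math.log(c / n) for c in Counter(xs).values())


def conditional_entropy(xs, ys) -> float:
    """H(X | Y) = -sum over (x, y) of p(x, y) * log p(x | y)."""
    n = len(xs)
    y_counts = Counter(ys)
    joint = Counter(zip(xs, ys))
    return -sum((c / n) * math.log(c / y_counts[y]) for (_, y), c in joint.items())


def theils_u(xs, ys) -> float:
    """Uncertainty coefficient U(X | Y): share of H(X) explained by knowing Y."""
    h_x = entropy(xs)
    if h_x == 0:
        return 1.0  # X is constant; conventionally treated as fully explained
    return (h_x - conditional_entropy(xs, ys)) / h_x


# Hypothetical toy data: Sex and the model's predicted Survived label.
sex = ["female", "female", "female", "male", "male", "male", "male", "male"]
pred = ["yes", "yes", "no", "no", "no", "no", "no", "yes"]
is_female = [s == "female" for s in sex]
is_male = [s == "male" for s in sex]

print(round(theils_u(is_female, pred), 3))
print(round(theils_u(is_male, pred), 3))  # same value: the slices are complements
```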

Data slice Sex == "male" seems to be highly associated with the prediction Survived = no (96.28% of predictions in the data slice).

Level	Data slice	Metric	Deviation
minor 🟡	`Sex` == "male"	Nominal association (Theil's U) = 0.697	Prediction Survived = `no` for 96.28% of samples in the slice

Taxonomy

avid-effect:performance:P0103

🔍✨Examples

	Sex	Survived	Predicted `Survived`
714	male	no	no (p = 0.94)
81	male	yes	no (p = 0.95)
555	male	no	no (p = 0.80)

Data slice Name contains "mr" seems to be highly associated with the prediction Survived = no (98.48% of predictions in the data slice).

Level	Data slice	Metric	Deviation
minor 🟡	`Name` contains "mr"	Nominal association (Theil's U) = 0.609	Prediction Survived = `no` for 98.48% of samples in the slice

Taxonomy

avid-effect:performance:P0103

🔍✨Examples

	Name	Survived	Predicted `Survived`
714	Greenberg, Mr. Samuel	no	no (p = 0.94)
81	Sheerlinck, Mr. Jan Baptist	yes	no (p = 0.95)
555	Wright, Mr. George	no	no (p = 0.80)

👉Performance issues (10)

For records in the dataset where Name contains "mr", the Recall is 96.85% lower than the global Recall.

Level	Data slice	Metric	Deviation
major 🔴	`Name` contains "mr"	Recall = 0.021	-96.85% relative to global

Taxonomy

avid-effect:performance:P0204

🔍✨Examples

	Name	Survived	Predicted `Survived`
81	Sheerlinck, Mr. Jan Baptist	yes	no (p = 0.95)
543	Beane, Mr. Edward	yes	no (p = 0.87)
390	Carter, Mr. William Ernest	yes	no (p = 0.77)
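Each performance issue compares a metric on a data slice with the same metric on the whole dataset. Recall is TP / (TP + FN), precision is TP / (TP + FP), and accuracy is the share of correct labels. A very low recall in the "mr" slice means the model rarely predicts survival for passengers in that slice who did survive.

The sketch below assumes "yes" is the positive class and reads Deviation as the relative difference to the global metric. The helper names and toy data are hypothetical.

```python
# Sketch: per-slice metrics with "yes" (Survived) assumed to be the positive
# class; Deviation is read as the relative difference to the global metric.
import math


def counts(y_true, y_pred, positive="yes"):
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def recall(y_true, y_pred):
    tp, _, fn, _ = counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else math.nan


def precision(y_true, y_pred):
    tp, fp, _, _ = counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else math.nan


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def slice_deviation(metric, y_true, y_pred, mask):
    """Return (metric on the slice, relative difference to the global metric)."""
    global_value = metric(y_true, y_pred)
    s_true = [t for t, m in zip(y_true, mask) if m]
    s_pred = [p for p, m in zip(y_pred, mask) if m]
    slice_value = metric(s_true, s_pred)
    return slice_value, (slice_value - global_value) / global_value


# Hypothetical toy data.
y_true = ["yes", "yes", "yes", "no", "yes", "no"]
y_pred = ["no", "yes", "yes", "no", "no", "yes"]
is_male = [True, False, False, True, True, False]

print(slice_deviation(recall, y_true, y_pred, is_male))  # (0.0, -1.0)
```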

For records in the dataset where Sex == "male", the Recall is 83.19% lower than the global Recall.

Level	Data slice	Metric	Deviation
major 🔴	`Sex` == "male"	Recall = 0.111	-83.19% relative to global

Taxonomy

avid-effect:performance:P0204

🔍✨Examples

	Sex	Survived	Predicted `Survived`
81	male	yes	no (p = 0.95)
125	male	yes	no (p = 0.76)
543	male	yes	no (p = 0.87)

For records in the dataset where Pclass == 3, the Precision is 36.89% lower than the global Precision.

Level	Data slice	Metric	Deviation
major 🔴	`Pclass` == 3	Precision = 0.475	-36.89% relative to global

Taxonomy

avid-effect:performance:P0204

🔍✨Examples

	Pclass	Survived	Predicted `Survived`
81	3	yes	no (p = 0.95)
125	3	yes	no (p = 0.76)
483	3	yes	no (p = 0.64)

For records in the dataset where Name contains "master", the Accuracy is 10.0% lower than the global Accuracy.

Level	Data slice	Metric	Deviation
medium 🟡	`Name` contains "master"	Accuracy = 0.708	-10.00% relative to global

Taxonomy

avid-effect:performance:P0204

🔍✨Examples

	Name	Survived	Predicted `Survived`
125	Nicola-Yarred, Master. Elias	yes	no (p = 0.76)
348	Coutts, Master. William Loch "William"	yes	no (p = 0.61)
869	Johnson, Master. Harold Theodor	yes	no (p = 0.56)

For records in the dataset where Parch == 0, the Recall is 9.2% lower than the global Recall.

Level	Data slice	Metric	Deviation
medium 🟡	`Parch` == 0	Recall = 0.600	-9.20% relative to global

Taxonomy

avid-effect:performance:P0204

🔍✨Examples

	Survived	Predicted `Survived`
81	yes	no (p = 0.95)
125	yes	no (p = 0.76)
543	yes	no (p = 0.87)

For records in the dataset where Parch == 2, the Precision is 8.1% lower than the global Precision.

Level	Data slice	Metric	Deviation
medium 🟡	`Parch` == 2	Precision = 0.692	-8.10% relative to global

Taxonomy

avid-effect:performance:P0204

🔍✨Examples

	Parch	Survived	Predicted `Survived`
436	2	no	yes (p = 0.58)
390	2	yes	no (p = 0.77)
593	2	no	yes (p = 0.60)

For records in the dataset where Embarked == "S", the Recall is 7.52% lower than the global Recall.

Level	Data slice	Metric	Deviation
medium 🟡	`Embarked` == "S"	Recall = 0.611	-7.52% relative to global

Taxonomy

avid-effect:performance:P0204

🔍✨Examples

	Embarked	Survived	Predicted `Survived`
81	S	yes	no (p = 0.95)
543	S	yes	no (p = 0.87)
483	S	yes	no (p = 0.64)

For records in the dataset where Pclass == 1, the Accuracy is 6.82% lower than the global Accuracy.

Level	Data slice	Metric	Deviation
medium 🟡	`Pclass` == 1	Accuracy = 0.733	-6.82% relative to global

Taxonomy

avid-effect:performance:P0204

🔍✨Examples

	Pclass	Survived	Predicted `Survived`
390	1	yes	no (p = 0.77)
740	1	yes	no (p = 0.63)
701	1	yes	no (p = 0.70)

For records in the dataset where Name contains "miss", the Accuracy is 6.31% lower than the global Accuracy.

Level	Data slice	Metric	Deviation
medium 🟡	`Name` contains "miss"	Accuracy = 0.737	-6.31% relative to global

Taxonomy

avid-effect:performance:P0204

🔍✨Examples

	Name	Survived	Predicted `Survived`
882	Dahlberg, Miss. Gerda Ulrika	no	yes (p = 0.55)
436	Ford, Miss. Doolina Margaret "Daisy"	no	yes (p = 0.58)
205	Strom, Miss. Telma Matilda	no	yes (p = 0.83)

For records in the dataset where Embarked == "Q", the Precision is 5.18% lower than the global Precision.

Level	Data slice	Metric	Deviation
medium 🟡	`Embarked` == "Q"	Precision = 0.714	-5.18% relative to global

Taxonomy

avid-effect:performance:P0204

🔍✨Examples

	Embarked	Survived	Predicted `Survived`
593	Q	no	yes (p = 0.60)
657	Q	no	yes (p = 0.75)
885	Q	no	yes (p = 0.62)

test new report
