test new report

#13
by ZeroCommand - opened
👉Robustness issues (1)

When feature “Name” is perturbed with the transformation “Transform to title case”, the model changes its prediction in 6.67% of the cases. We expected the predictions not to be affected by this transformation.

| Level | Metric | Transformation | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Fail rate = 0.067 | Transform to title case | 1/15 tested samples (6.67%) changed prediction after perturbation |

Taxonomy

avid-effect:performance:P0201
🔍✨Examples
| Index | Name | Transform to title case(Name) | Original prediction | Prediction after perturbation |
| --- | --- | --- | --- | --- |
| 505 | Penasco y Castellana, Mr. Victor de Satode | Penasco Y Castellana, Mr. Victor De Satode | yes (p = 0.50) | no (p = 0.51) |
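
The fail rate above is the share of tested samples whose predicted label flips after the perturbation. A minimal sketch of that calculation follows. It assumes the transformation behaves like Python's `str.title()`, which does reproduce the perturbed name in the example row. `toy_predict` is a hypothetical stand-in classifier, not the scanned model, and the 14 filler names are invented so that the sample count matches the 15 tested samples.

```python
from typing import Callable, Sequence


def title_case_fail_rate(names: Sequence[str], predict: Callable[[str], str]) -> float:
    """Fraction of samples whose predicted label changes after title-casing."""
    if not names:
        raise ValueError("need at least one sample")
    changed = sum(predict(name) != predict(name.title()) for name in names)
    return changed / len(names)


def toy_predict(name: str) -> str:
    # Hypothetical classifier that is (wrongly) sensitive to a lowercase " y ".
    return "yes" if " y " in name else "no"


# One name from the report plus 14 invented names that title-casing leaves unchanged.
names = ["Penasco y Castellana, Mr. Victor de Satode"]
names += [f"Passenger {i}, Mr. Test" for i in range(14)]

rate = title_case_fail_rate(names, toy_predict)
print(f"Fail rate = {rate:.3f}")  # 1 changed prediction out of 15 samples
```

With a real model, `predict` would wrap the model's inference call on the full feature row rather than on the name alone.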
👉Overconfidence issues (6)

For records in the dataset where Name contains "mr", we found a significantly higher number of overconfident wrong predictions (31 samples, corresponding to 62.00% of the wrong predictions in the data slice).

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Name contains "mr" | Overconfidence rate = 0.620 | +59.19% vs. global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | Name | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 838 | Chip, Mr. Chang | yes | no (p = 0.96); yes (p = 0.04) |
| 744 | Stranden, Mr. Juho | yes | no (p = 0.96); yes (p = 0.04) |
| 429 | Pickard, Mr. Berk (Berk Trembisky) | yes | no (p = 0.96); yes (p = 0.04) |
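
To make the overconfidence figures easier to check, here is a rough sketch of how such a rate could be computed. The report does not state the exact rule, so the definition used here is an assumption: a wrong prediction counts as overconfident when the probability of the predicted label exceeds the probability of the true label by more than a `threshold`. The threshold value of 0.5 and the last two records are hypothetical. The sketch also assumes that the "Deviation" column is the relative difference `(slice - global) / global`.

```python
def overconfidence_rate(records, threshold):
    """Share of wrong predictions whose probability gap exceeds `threshold`.

    `records` is an iterable of (true_label, {label: probability}) pairs.
    """
    wrong = 0
    overconfident = 0
    for true_label, probs in records:
        predicted = max(probs, key=probs.get)
        if predicted == true_label:
            continue  # correct predictions are ignored
        wrong += 1
        if probs[predicted] - probs[true_label] > threshold:
            overconfident += 1
    return overconfident / wrong if wrong else 0.0


def relative_deviation(slice_value, global_value):
    """Relative difference of a slice metric with respect to the global metric."""
    return (slice_value - global_value) / global_value


records = [
    ("yes", {"no": 0.96, "yes": 0.04}),  # rows 838, 744, 429 from the report
    ("yes", {"no": 0.96, "yes": 0.04}),
    ("yes", {"no": 0.96, "yes": 0.04}),
    ("yes", {"no": 0.55, "yes": 0.45}),  # hypothetical: wrong but not overconfident
    ("no", {"no": 0.90, "yes": 0.10}),   # hypothetical: correct prediction
]
toy_rate = overconfidence_rate(records, threshold=0.5)

# 31 samples are 62.00% of the wrong predictions, so the slice has 50 of them.
slice_rate = 31 / 50
# Global rate implied by the reported +59.19% deviation (under the assumption above).
implied_global = slice_rate / (1 + 0.5919)
print(f"{toy_rate:.2f} {slice_rate:.3f} {implied_global:.3f}")
```

Under the same assumption, the other overconfidence slices in this report point to a very similar global rate, which suggests the relative-difference reading of the deviation is the right one.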

For records in the dataset where text_length(Name) < 28.500, we found a significantly higher number of overconfident wrong predictions (33 samples, corresponding to 55.93% of the wrong predictions in the data slice).

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | text_length(Name) < 28.500 | Overconfidence rate = 0.559 | +43.61% vs. global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | Name | text_length(Name) | Survived | Predicted Survived |
| --- | --- | --- | --- | --- |
| 838 | Chip, Mr. Chang | 15 | yes | no (p = 0.96); yes (p = 0.04) |
| 744 | Stranden, Mr. Juho | 18 | yes | no (p = 0.96); yes (p = 0.04) |
| 643 | Foo, Mr. Choong | 15 | yes | no (p = 0.95); yes (p = 0.05) |

For records in the dataset where Fare < 14.850, we found a significantly higher number of overconfident wrong predictions (22 samples, corresponding to 53.66% of the wrong predictions in the data slice).

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Fare < 14.850 | Overconfidence rate = 0.537 | +37.77% vs. global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | Fare | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 744 | 7.925 | yes | no (p = 0.96); yes (p = 0.04) |
| 429 | 8.05 | yes | no (p = 0.96); yes (p = 0.04) |
| 338 | 8.05 | yes | no (p = 0.95); yes (p = 0.05) |

For records in the dataset where Sex == "male", we found a significantly higher number of overconfident wrong predictions (32 samples, corresponding to 53.33% of the wrong predictions in the data slice).

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Sex == "male" | Overconfidence rate = 0.533 | +36.94% vs. global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | Sex | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 838 | male | yes | no (p = 0.96); yes (p = 0.04) |
| 744 | male | yes | no (p = 0.96); yes (p = 0.04) |
| 429 | male | yes | no (p = 0.96); yes (p = 0.04) |

For records in the dataset where Parch == 0, we found a significantly higher number of overconfident wrong predictions (31 samples, corresponding to 47.69% of the wrong predictions in the data slice).

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Parch == 0 | Overconfidence rate = 0.477 | +22.45% vs. global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | Parch | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 838 | 0 | yes | no (p = 0.96); yes (p = 0.04) |
| 744 | 0 | yes | no (p = 0.96); yes (p = 0.04) |
| 429 | 0 | yes | no (p = 0.96); yes (p = 0.04) |

For records in the dataset where SibSp == 0, we found a significantly higher number of overconfident wrong predictions (27 samples, corresponding to 44.26% of the wrong predictions in the data slice).

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | SibSp == 0 | Overconfidence rate = 0.443 | +13.65% vs. global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | SibSp | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 838 | 0 | yes | no (p = 0.96); yes (p = 0.04) |
| 744 | 0 | yes | no (p = 0.96); yes (p = 0.04) |
| 429 | 0 | yes | no (p = 0.96); yes (p = 0.04) |
👉Spurious Correlation issues (3)

The data slice Sex == "female" appears to be highly associated with the prediction Survived = yes (92.67% of predictions in the data slice).

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| minor 🟡 | Sex == "female" | Nominal association (Theil's U) = 0.697 | Prediction Survived = yes for 92.67% of samples in the slice |

Taxonomy

avid-effect:performance:P0103
🔍✨Examples
| Index | Sex | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 123 | female | yes | yes (p = 0.72) |
| 412 | female | yes | yes (p = 0.91) |
| 849 | female | yes | yes (p = 0.92) |
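
The association metric reported here, Theil's U (the uncertainty coefficient), measures how much knowing one categorical variable reduces the entropy of another: U(X|Y) = (H(X) - H(X|Y)) / H(X), ranging from 0 (no information) to 1 (fully determined). The sketch below implements that textbook definition. Which variable the scan treats as X and which as Y is not stated in the report, so taking the prediction as X and the feature as Y is an assumption, and the toy data is invented and does not reproduce the 0.697 value.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy H(X) in nats."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """Conditional entropy H(X | Y) in nats."""
    n = len(x)
    y_counts = Counter(y)
    xy_counts = Counter(zip(x, y))
    return -sum(
        (c / n) * math.log(c / y_counts[y_value])
        for (_, y_value), c in xy_counts.items()
    )


def theils_u(x, y):
    """Uncertainty coefficient U(X | Y): share of H(X) explained by Y."""
    h_x = entropy(x)
    if h_x == 0:
        return 1.0  # X is constant, so it is trivially determined
    return (h_x - conditional_entropy(x, y)) / h_x


sex = ["female", "female", "male", "male"]
fully_tied = ["yes", "yes", "no", "no"]    # prediction determined by sex
unrelated = ["yes", "no", "yes", "no"]     # prediction independent of sex

u_tied = theils_u(fully_tied, sex)
u_unrelated = theils_u(unrelated, sex)
print(u_tied, u_unrelated)
```

Theil's U is asymmetric, so `theils_u(x, y)` and `theils_u(y, x)` generally differ. That is worth keeping in mind when comparing values across tools.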

The data slice Sex == "male" appears to be highly associated with the prediction Survived = no (96.28% of predictions in the data slice).

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| minor 🟡 | Sex == "male" | Nominal association (Theil's U) = 0.697 | Prediction Survived = no for 96.28% of samples in the slice |

Taxonomy

avid-effect:performance:P0103
🔍✨Examples
| Index | Sex | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 714 | male | no | no (p = 0.94) |
| 81 | male | yes | no (p = 0.95) |
| 555 | male | no | no (p = 0.80) |

The data slice Name contains "mr" appears to be highly associated with the prediction Survived = no (98.48% of predictions in the data slice).

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| minor 🟡 | Name contains "mr" | Nominal association (Theil's U) = 0.609 | Prediction Survived = no for 98.48% of samples in the slice |

Taxonomy

avid-effect:performance:P0103
🔍✨Examples
| Index | Name | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 714 | Greenberg, Mr. Samuel | no | no (p = 0.94) |
| 81 | Sheerlinck, Mr. Jan Baptist | yes | no (p = 0.95) |
| 555 | Wright, Mr. George | no | no (p = 0.80) |
👉Performance issues (10)

For records in the dataset where Name contains "mr", the Recall is 96.85% lower than the global Recall.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Name contains "mr" | Recall = 0.021 | -96.85% vs. global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | Name | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 81 | Sheerlinck, Mr. Jan Baptist | yes | no (p = 0.95) |
| 543 | Beane, Mr. Edward | yes | no (p = 0.87) |
| 390 | Carter, Mr. William Ernest | yes | no (p = 0.77) |
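
The slice-level Recall figures in this section compare recall on a filtered subset against recall on the whole dataset. A small sketch of that comparison is shown below, using a few rows from this report plus one invented row. How the scan matches `Name contains "mr"` is not documented here. The sketch assumes a case-insensitive whole-word match, so that "Mrs." is not counted as "mr". The helper names are hypothetical.

```python
import re


def recall(rows, positive="yes"):
    """Recall = TP / (TP + FN) over (name, actual, predicted) rows."""
    tp = sum(1 for _, actual, pred in rows if actual == positive and pred == positive)
    fn = sum(1 for _, actual, pred in rows if actual == positive and pred != positive)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def name_contains(rows, token):
    """Keep rows whose name contains `token` as a whole lowercase word (assumed rule)."""
    return [row for row in rows if token in re.findall(r"[a-z]+", row[0].lower())]


rows = [
    ("Sheerlinck, Mr. Jan Baptist", "yes", "no"),  # row 81
    ("Beane, Mr. Edward", "yes", "no"),            # row 543
    ("Carter, Mr. William Ernest", "yes", "no"),   # row 390
    ("Greenberg, Mr. Samuel", "no", "no"),         # row 714
    ("Example, Mrs. Hypothetical", "yes", "yes"),  # invented row, not from the report
]

mr_slice = name_contains(rows, "mr")
slice_recall = recall(mr_slice)
global_recall = recall(rows)
print(len(mr_slice), slice_recall, global_recall)
```

Because the report's example rows are all failure cases, the numbers from this toy sample are far from the reported 0.021. The point is only the mechanics: filter the slice, then compute the same metric on both sets.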

For records in the dataset where Sex == "male", the Recall is 83.19% lower than the global Recall.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Sex == "male" | Recall = 0.111 | -83.19% vs. global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | Sex | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 81 | male | yes | no (p = 0.95) |
| 125 | male | yes | no (p = 0.76) |
| 543 | male | yes | no (p = 0.87) |

For records in the dataset where Pclass == 3, the Precision is 36.89% lower than the global Precision.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| major 🔴 | Pclass == 3 | Precision = 0.475 | -36.89% vs. global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | Pclass | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 81 | 3 | yes | no (p = 0.95) |
| 125 | 3 | yes | no (p = 0.76) |
| 483 | 3 | yes | no (p = 0.64) |

For records in the dataset where Name contains "master", the Accuracy is 10.0% lower than the global Accuracy.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Name contains "master" | Accuracy = 0.708 | -10.00% vs. global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | Name | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 125 | Nicola-Yarred, Master. Elias | yes | no (p = 0.76) |
| 348 | Coutts, Master. William Loch "William" | yes | no (p = 0.61) |
| 869 | Johnson, Master. Harold Theodor | yes | no (p = 0.56) |

For records in the dataset where Parch == 0, the Recall is 9.2% lower than the global Recall.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Parch == 0 | Recall = 0.600 | -9.20% vs. global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | Parch | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 81 | 0 | yes | no (p = 0.95) |
| 125 | 0 | yes | no (p = 0.76) |
| 543 | 0 | yes | no (p = 0.87) |

For records in the dataset where Parch == 2, the Precision is 8.1% lower than the global Precision.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Parch == 2 | Precision = 0.692 | -8.10% vs. global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | Parch | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 436 | 2 | no | yes (p = 0.58) |
| 390 | 2 | yes | no (p = 0.77) |
| 593 | 2 | no | yes (p = 0.60) |

For records in the dataset where Embarked == "S", the Recall is 7.52% lower than the global Recall.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Embarked == "S" | Recall = 0.611 | -7.52% vs. global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | Embarked | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 81 | S | yes | no (p = 0.95) |
| 543 | S | yes | no (p = 0.87) |
| 483 | S | yes | no (p = 0.64) |

For records in the dataset where Pclass == 1, the Accuracy is 6.82% lower than the global Accuracy.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Pclass == 1 | Accuracy = 0.733 | -6.82% vs. global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | Pclass | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 390 | 1 | yes | no (p = 0.77) |
| 740 | 1 | yes | no (p = 0.63) |
| 701 | 1 | yes | no (p = 0.70) |

For records in the dataset where Name contains "miss", the Accuracy is 6.31% lower than the global Accuracy.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Name contains "miss" | Accuracy = 0.737 | -6.31% vs. global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | Name | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 882 | Dahlberg, Miss. Gerda Ulrika | no | yes (p = 0.55) |
| 436 | Ford, Miss. Doolina Margaret "Daisy" | no | yes (p = 0.58) |
| 205 | Strom, Miss. Telma Matilda | no | yes (p = 0.83) |

For records in the dataset where Embarked == "Q", the Precision is 5.18% lower than the global Precision.

| Level | Data slice | Metric | Deviation |
| --- | --- | --- | --- |
| medium 🟡 | Embarked == "Q" | Precision = 0.714 | -5.18% vs. global |

Taxonomy

avid-effect:performance:P0204
🔍✨Examples
| Index | Embarked | Survived | Predicted Survived |
| --- | --- | --- | --- |
| 593 | Q | no | yes (p = 0.60) |
| 657 | Q | no | yes (p = 0.75) |
| 885 | Q | no | yes (p = 0.62) |
