# EQUALIZED ROBUSTNESS: TOWARDS SUSTAINABLE FAIRNESS UNDER DISTRIBUTIONAL SHIFTS

**Anonymous authors** Paper under double-blind review

ABSTRACT

Increasing concerns have been raised about deep learning fairness in recent years. Existing fairness metrics and algorithms mainly focus on the discrimination of model performance across different groups on in-distribution data. It remains unclear whether the fairness achieved on in-distribution data can be generalized to data with unseen distribution shifts, which are commonly encountered in real-world applications. In this paper, we first propose a new fairness goal, termed Equalized Robustness (ER), to impose fair model robustness against unseen distribution shifts across majority and minority groups. ER measures robustness disparity by the maximum mean discrepancy (MMD) distance between the loss-curvature distributions of the two groups of data. We show that previous fairness learning algorithms designed for in-distribution fairness fail to meet this new robust fairness goal. We further propose a novel fairness learning algorithm, termed Curvature Matching (CUMA), to simultaneously achieve both traditional in-distribution fairness and our new robust fairness. CUMA debiases model robustness by minimizing the MMD distance between the loss-curvature distributions of the two groups. Experiments on three popular datasets show that CUMA achieves superior fairness in robustness against distribution shifts, without additional sacrifice in either overall accuracy or in-distribution fairness.

1 INTRODUCTION

With the wide deployment of deep learning in modern business applications concerning individual lives and privacy, there naturally emerge concerns about machine learning fairness (Podesta et al., 2014; Muñoz et al., 2016; Smuha, 2019). Research efforts on various fairness evaluation metrics and corresponding enforcement methods have been carried out (Edwards & Storkey, 2016; Hardt et al., 2016; Du et al., 2020). Specifically, many such metrics require some form of "equalized model performance" across different groups on in-distribution data. Examples include Demographic Parity (DP) (Edwards & Storkey, 2016), Equalized Opportunity (EOpp), and Equalized Odds (EO) (Hardt et al., 2016).

Unfortunately, when deployed in real-world applications, deep models commonly encounter data with unforeseeable distribution shifts (Hendrycks & Dietterich, 2019; Hendrycks et al., 2020; 2021). It has been shown that deep learning models can have drastically degraded performance (Hendrycks & Dietterich, 2019; Hendrycks et al., 2020; 2021; Taori et al., 2020) and show unreliable behaviors (Qiu et al., 2019; Yan et al., 2021) under unseen distribution shifts. Intuitively, previous fairness learning algorithms aim to optimize the model towards a local minimum where data from the majority and minority groups have similar average loss values (and thus similar in-distribution performance). However, those algorithms do not take into consideration the stability or "robustness" of the fairness-aware minima they find. Take object detection in a self-driving car as an example: the model might be calibrated on high-quality, clear images to be "fair" across different skin colors; however, such fairness may severely break down when the model is applied to data collected in adverse visual conditions, such as inclement weather, poor lighting, or other digital artifacts.
Our experiments also find that previous state-of-the-art fairness algorithms can be jeopardized when distributional shifts are present in the test data, as illustrated in Figure 1(b). The above findings beg the following question: _How can we achieve practically sustainable fairness, e.g., fairness that holds even under unseen distribution shifts?_

Figure 1: Illustration of the fairness achieved by normal training, traditional fair training, and our proposed robust fair training: (a) Normal training (unfair); (b) Traditional fair training (in-distribution fairness); (c) Robust fair training (in-distribution & robust fairness). Horizontal and vertical axes represent the input x and the corresponding loss value L(x), respectively. Solid blue curves show the loss landscapes. Circles denote majority data points (x_a and x′_a), while triangles denote minority data points (x_i and x′_i). Green points (x_a and x_i) are in-distribution data, while red ones (x′_a and x′_i) are sampled from test sets with distribution shifts. (a) Normal training results in unfair models: the minority group has worse performance (i.e., larger loss values). (b) Traditional fair training algorithms can achieve in-distribution fairness, but not in a robust way: a small distribution shift can break the fairness due to loss-curvature biases across groups. In fact, such learned fair models can have almost the same large bias as normally trained models when facing distribution shifts. (c) Our robust fair training algorithm simultaneously achieves fairness on in-distribution data and under distribution shifts, by matching both loss values and loss curvatures across groups.

To answer this question, we first propose a new fairness objective, termed Equalized Robustness (ER), which imposes equalized robustness against unseen distribution shifts across the majority and minority groups, so that the learned fairness can sustain even when the test data are perturbed. ER explicitly considers a new dimension of fairness that is practically significant yet so far largely overlooked. In other words, ER assesses "out-of-distribution" fairness. It therefore works as a complement to, rather than a replacement for, previous fairness metrics, which assess "in-distribution" fairness. Previous research has shown that model robustness against input perturbations is highly correlated with loss-curvature smoothness (Bartlett et al., 2017; Moosavi-Dezfooli et al., 2019; Weng et al., 2018). Our experiments also observe that the local loss curvature of the minority group is often larger than that of the majority group, leading to a robustness discrepancy between the two groups under distribution shifts. To this end, we propose to empirically quantify the robustness discrepancy as the maximum mean discrepancy (MMD) (Gretton et al., 2012) distance between the local model-smoothness distributions of data samples from the majority and minority groups. We experimentally demonstrate that our new metric aligns well with model performance under real-world distribution shifts. On top of that, we further propose a new fair learning algorithm, termed Curvature Matching (CUMA), to simultaneously achieve both traditional in-distribution fairness and ER. CUMA matches the local curvature distributions of data points from the two groups, as illustrated in Figure 1(c), by adding a curvature-matching regularizer that can be efficiently computed via a one-shot power iteration method.
Our code will be released upon acceptance. Our contributions can be summarized as follows:

- We propose Equalized Robustness (ER), a new fairness objective for machine learning models, which imposes equalized model robustness against unforeseeable distribution shifts across majority and minority groups.
- We further propose a new fairness learning algorithm dubbed Curvature Matching (CUMA), which enforces ER during training by utilizing a one-shot power iteration method.
- Experiments show that CUMA achieves much more robust fairness against distribution shifts, without additional sacrifice in either overall accuracy or in-distribution fairness, compared with traditional in-distribution fair learning methods.

2 PRELIMINARIES

2.1 MACHINE LEARNING FAIRNESS

**Problem Setting and Metrics** Machine learning fairness can be broadly categorized into individual fairness and group fairness (Du et al., 2020). Individual fairness requires similar inputs to have similar predictions (Dwork et al., 2012). Compared with individual fairness, group fairness is a more popular setting and is thus the focus of this paper. Given input data X ∈ ℝⁿ with sensitive attributes A ∈ {0, 1} and corresponding ground-truth labels Y ∈ {0, 1}, group fairness requires a learned binary classifier f(·; θ): ℝⁿ → {0, 1}, parameterized by θ, to give equally accurate predictions (denoted Ŷ := f(X)) on the two groups with A = 0 and A = 1. Multiple fairness criteria have been defined in this context. Demographic parity (DP) (Edwards & Storkey, 2016) requires an identical ratio of positive predictions between the two groups: P(Ŷ = 1 | A = 0) = P(Ŷ = 1 | A = 1). Equalized Odds (EO) (Hardt et al., 2016) requires identical false positive rates (FPRs) and false negative rates (FNRs) between the two groups: P(Ŷ ≠ Y | A = 0, Y = y) = P(Ŷ ≠ Y | A = 1, Y = y), ∀y ∈ {0, 1}. Equalized Opportunity (EOpp) (Hardt et al., 2016) requires only equal FNRs between the groups: P(Ŷ ≠ Y | A = 0, Y = 0) = P(Ŷ ≠ Y | A = 1, Y = 0). Based on these fairness criteria, quantitative metrics are defined to measure fairness. Specifically, the DP, EO and EOpp distances (Madras et al., 2018) are defined as follows:

$$\Delta_{DP} := \left| P(\hat{Y}=1 \mid A=0) - P(\hat{Y}=1 \mid A=1) \right| \quad (1)$$

$$\Delta_{EO} := \sum_{y\in\{0,1\}} \left| P(\hat{Y}\neq Y \mid A=0, Y=y) - P(\hat{Y}\neq Y \mid A=1, Y=y) \right| \quad (2)$$

$$\Delta_{EOpp} := \left| P(\hat{Y}\neq Y \mid A=0, Y=0) - P(\hat{Y}\neq Y \mid A=1, Y=0) \right| \quad (3)$$

MMD has previously been used to define fairness metrics: Quadrianto & Sharmanska (2017) define a more general fairness metric using the MMD distance and show ∆DP, ∆EO and ∆EOpp to be special cases of their unified metric. All these metrics consider in-distribution fairness, while our Equalized Robustness is the first fairness metric explicitly aware of robustness under unseen distribution shifts.

**Bias Mitigation Methods** Many methods have been proposed to mitigate model bias. Data pre-processing methods such as re-weighting (Kamiran & Calders, 2012) and data transformation (Calmon et al., 2017) have been used to reduce discrimination before model training. In contrast, Hardt et al. (2016) and Zhao et al. (2017) propose post-processing methods that calibrate model predictions towards a desired fair distribution after model training. Instead of pre- or post-processing, researchers have also explored enforcing fairness during training. For example, Madras et al.
(2018) use an adversarial training technique and show that the learned fair representations can transfer to unseen target tasks. The key technique, adversarial training (Edwards & Storkey, 2016), was designed for feature disentanglement on hidden representations, such that sensitive (Edwards & Storkey, 2016) or domain-specific information (Ganin et al., 2016) is removed while other information useful for the target task is kept. The hidden representations are typically the outputs of intermediate layers of neural networks (Ganin et al., 2016; Edwards & Storkey, 2016; Madras et al., 2018). In contrast, methods like adversarial debiasing (Zhang et al., 2018) and its simplified version (Wadsworth et al., 2018) directly apply the adversary to the output layer of the classifier, which also promotes model fairness. Observing that unfairness can arise from ignoring the worst-case learning risk of specific samples, Hashimoto et al. (2018) propose distributionally robust optimization, which provably bounds the worst-case risk over groups. Creager et al. (2019) propose a flexible fair representation learning framework based on the VAE (Kingma & Welling, 2013) that can be easily adapted to different sensitive-attribute settings at run time. Sarhan et al. (2020) use orthogonality constraints as a proxy for independence to disentangle utility and sensitive representations. Martinez et al. (2020) formulate group fairness with multiple sensitive attributes as a multi-objective learning problem and propose a simple optimization algorithm to find Pareto-optimal solutions. Another line of research focuses on learning unbiased representations from biased ones (Bahng et al., 2020; Nam et al., 2020). Bahng et al. (2020) propose a framework that learns unbiased representations by explicitly enforcing them to be different from a set of pre-defined biased representations. Nam et al. (2020) observe that data bias can be either benign or malicious, and that removing malicious bias alone can achieve fairness. Li & Vasconcelos (2019) jointly learn the network parameters and a data re-sampling weight distribution that penalizes easy samples.

**Applications in Computer Vision** While many fairness metrics and debiasing algorithms are designed for general learning problems, as discussed above, there is also a line of research and applications focusing on fairness in computer vision tasks. For instance, Buolamwini & Gebru (2018) show that commercial gender-recognition systems have substantial accuracy disparities across groups with different genders and skin colors. Wilson et al. (2019) observe that state-of-the-art segmentation models achieve better performance on pedestrians with lighter skin colors. In (Shankar et al., 2017; de Vries et al., 2019), it is found that the common geographical bias in public image databases can lead to strong performance disparities among images from locales with different income levels. Nagpal et al. (2019) reveal that the focus region of face-classification models depends on people's ages or races, which may explain the source of age and race biases in classifiers. Building on this awareness, many efforts have been devoted to mitigating such biases in computer vision tasks. Wang et al. (2019) show the effectiveness of the adversarial debiasing technique (Zhang et al., 2018) in fair image classification and activity recognition tasks.
Beyond supervised learning, FairFaceGAN (Hwang et al., 2020) is proposed to prevent undesired sensitive-feature translation during image editing. Similar ideas have also been successfully applied to visual question answering (Park et al., 2020).

2.2 MODEL ROBUSTNESS AND SMOOTHNESS

Model generalization ability and robustness have been shown to be highly correlated with model smoothness (Moosavi-Dezfooli et al., 2019; Weng et al., 2018). Weng et al. (2018) and Guo et al. (2018) use the local Lipschitz constant to estimate model robustness against small input perturbations within a hyper-ball. Moosavi-Dezfooli et al. (2019) improve model robustness by adding a curvature constraint that encourages model smoothness. Miyato et al. (2018) approximate local model smoothness by the spectral norm of the Hessian matrix, and improve robustness against adversarial attacks by regularizing model smoothness.

3 EQUALIZED ROBUSTNESS: A NEW METRIC FOR FAIR GENERALIZATION AND ROBUSTNESS

Consider a binary classifier f(·; θ) trained on two groups of data, X1 and X2. Our goal is to define a metric that measures the gap in model robustness between the two groups. Formulating such a metric is highly non-trivial, with difficulties mainly from two aspects.

The first challenge is that we need to ensure fair generalization against the many unseen distribution shifts that may be encountered in real-world applications. A trivial solution would be to select a set of predefined distribution shifts and measure the average performance gap (e.g., ∆EO) against them. However, this approach requires engineering overhead in handcrafting the predefined distribution shifts, and the predefined shifts may not be representative enough to cover all unseen cases. Previous research (Miyato et al., 2018; Moosavi-Dezfooli et al., 2019; Guo et al., 2018; Weng et al., 2018) has shown, both theoretically and empirically, that a deep model's robustness scales with its smoothness. Following (Miyato et al., 2018; Moosavi-Dezfooli et al., 2019), we use the spectral norm of the Hessian matrix to approximate local smoothness as an indicator of model robustness. Specifically, given an input x, the Hessian matrix H(x) is defined as the second-order gradient of L(x) with respect to the input x: H(x) = ∇²_x L(x). The approximated local curvature C(x) at point x is then defined as:

$$C(x) = \sigma(H(x)), \quad (4)$$

where σ(H) is the spectral norm of H: $\sigma(H) = \sup_{v:\|v\|_2=1} \|Hv\|_2$. Intuitively, C(x) measures the maximal directional curvature or rate of change at x. Thus, a smaller C(x) indicates better local smoothness around x (Miyato et al., 2018; Moosavi-Dezfooli et al., 2019).

For the second difficulty, unlike previous fairness metrics where the target random variable[1] follows a Bernoulli distribution, the local curvature used in ER is a continuous random variable without a simple underlying distribution. The unknown distribution form makes it difficult to directly measure the difference between the curvature distributions with a parametric statistical test or divergence (e.g., a t-test or the KL divergence). To tackle this problem, we utilize the maximum mean discrepancy (MMD) (Gretton et al., 2012) to perform a two-sample test on C(X1) and C(X2). MMD is a distribution-distance measure that is agnostic to the exact form of the distributions and is based only on the difference of their (kernel) mean embeddings.

[1] Such as Y = 1 in DP and Y ≠ Ŷ in EO and EOpp. (See Section 2.1.)
Formally, our new fairness metric for Equalized Robustness is defined as follows:

**Our new fairness metric ∆ER.** Consider a machine learning model f trained on two groups of data, X1 and X2. Suppose C(X1) ∼ P1 and C(X2) ∼ P2; then the model's ∆ER is defined as the squared maximum mean discrepancy (MMD) distance between C(X1) and C(X2):

$$\Delta_{ER} = \mathrm{MMD}^2(\mathcal{P}_1, \mathcal{P}_2). \quad (5)$$

MMD is widely used to measure the distance between two high-dimensional distributions in deep learning (Li et al., 2015; 2017; Bińkowski et al., 2018). The MMD distance between two distributions P and Q is defined as

$$\mathrm{MMD}^2(\mathcal{P}, \mathcal{Q}) = \|\mu_{\mathcal{P}} - \mu_{\mathcal{Q}}\|^2_{\mathcal{H}} = \mathbb{E}_{\mathcal{P}}[k(X, X')] - 2\,\mathbb{E}_{\mathcal{P},\mathcal{Q}}[k(X, Y)] + \mathbb{E}_{\mathcal{Q}}[k(Y, Y')], \quad (6)$$

where X, X′ ∼ P, Y, Y′ ∼ Q, and k(·, ·) is the kernel function. In practice, we use finite samples from P and Q to statistically estimate their MMD distance:

$$\mathrm{MMD}^2(\mathcal{P}, \mathcal{Q}) = \frac{1}{M^2}\sum_{i=1}^{M}\sum_{i'=1}^{M} k(x_i, x_{i'}) - \frac{2}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N} k(x_i, y_j) + \frac{1}{N^2}\sum_{j=1}^{N}\sum_{j'=1}^{N} k(y_j, y_{j'}), \quad (7)$$

where $\{x_i \sim \mathcal{P}\}_{i=1}^{M}$, $\{y_j \sim \mathcal{Q}\}_{j=1}^{N}$, and we use the mixed RBF kernel function $k(x, y) = \sum_{\sigma \in S} e^{-\|x-y\|^2 / (2\sigma^2)}$ with hyperparameter S = {1, 2, 4, 8, 16}. Ablation studies on S values are conducted in Section 5.3.

4 CURVATURE MATCHING: FAIR MACHINE LEARNING TOWARDS EQUALIZED ROBUSTNESS

4.1 PRACTICAL CURVATURE APPROXIMATION

To achieve equalized robustness, one intuitive solution is to add ∆ER (Eq. (5)) as a regularization term to the loss function during training. However, it is impractical to precisely calculate the spectral norm (which equals the absolute value of the dominant eigenvalue) of the Hessian matrix in ∆ER. To solve this problem, we use a one-shot power iteration method (PIM) to approximate C(x) during training. First we rewrite C(x) in the following form: C(x) = σ(H(x)) = ∥H(x)v∥, where v is the dominant eigenvector (associated with the maximal eigenvalue), which can be computed by the power iteration method. In practice, to increase training efficiency, we use a one-shot power iteration. Specifically, we estimate the dominant eigenvector v by the sign of the gradient direction: ṽ := sign(g)/∥sign(g)∥ ≈ v, where g = ∇_x L(x). This is because previous works have observed a large similarity between the dominant eigenvector and the gradient direction (Miyato et al., 2018; Moosavi-Dezfooli et al., 2019). We further approximate the Hessian-vector product by a finite difference of gradients: H(x)v ≈ (∇_x L(x + hv) − ∇_x L(x))/h, where h is a small constant. As a result, the final approximation of the local curvature is

$$C(x) \approx \tilde{C}(x) := \frac{\|\nabla_x L(x + h\tilde{v}) - \nabla_x L(x)\|}{|h|}. \quad (8)$$

4.2 CURVATURE MATCHING

With the practical curvature approximation, we can now match the curvature distributions of the two groups by minimizing their MMD distance. Suppose C̃(X1) ∼ Q1 and C̃(X2) ∼ Q2; we define the curvature-matching loss as:

$$L_{cm} = \mathrm{MMD}^2(\mathcal{Q}_1, \mathcal{Q}_2). \quad (9)$$

We add L_cm as a regularizer to the traditional adversarially fair training loss (Ganin et al., 2016; Madras et al., 2018), in order to attain both in-distribution fairness and fair robustness.
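To make these quantities concrete, the following is a minimal PyTorch sketch of the mixed RBF MMD estimator (Eq. (7)) and the one-shot power-iteration curvature approximation (Eq. (8)), combined into the curvature-matching loss of Eq. (9). It is an illustrative sketch rather than the released implementation: `model`, `loss_fn`, and the per-group batches are placeholders, the MMD estimate is the simple biased V-statistic, and `create_graph=True` is assumed so that the regularizer can be backpropagated into the model parameters during training.

```python
import torch


def mixed_rbf_kernel(x, y, scales=(1.0, 2.0, 4.0, 8.0, 16.0)):
    """Mixed RBF kernel k(x, y) = sum_{sigma in S} exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = torch.cdist(x, y).pow(2)  # (M, N) pairwise squared distances
    return sum(torch.exp(-d2 / (2.0 * s ** 2)) for s in scales)


def mmd2(x, y, scales=(1.0, 2.0, 4.0, 8.0, 16.0)):
    """Biased (V-statistic) estimate of the squared MMD in Eq. (7)."""
    kxx = mixed_rbf_kernel(x, x, scales).mean()
    kyy = mixed_rbf_kernel(y, y, scales).mean()
    kxy = mixed_rbf_kernel(x, y, scales).mean()
    return kxx - 2.0 * kxy + kyy


def approx_curvature(model, loss_fn, x, y, h=1.0):
    """One-shot power-iteration approximation of C(x) = sigma(H(x)), Eq. (8).

    `loss_fn` is assumed to reduce to a scalar (e.g., mean cross-entropy).
    """
    x = x.detach().requires_grad_(True)
    g = torch.autograd.grad(loss_fn(model(x), y), x, create_graph=True)[0]

    # v ~ sign(g) / ||sign(g)||, one perturbation direction per sample.
    v = g.sign()
    v = v / v.flatten(1).norm(dim=1).clamp_min(1e-12).view(-1, *([1] * (x.dim() - 1)))

    x_shift = (x + h * v).detach().requires_grad_(True)
    g_shift = torch.autograd.grad(loss_fn(model(x_shift), y), x_shift, create_graph=True)[0]

    # ||grad_x L(x + h v) - grad_x L(x)|| / |h|, one curvature value per sample.
    return (g_shift - g).flatten(1).norm(dim=1) / abs(h)


def curvature_matching_loss(model, loss_fn, x_major, y_major, x_minor, y_minor, h=1.0):
    """L_cm = MMD^2 between the two groups' curvature samples, Eq. (9)."""
    c1 = approx_curvature(model, loss_fn, x_major, y_major, h).unsqueeze(1)
    c2 = approx_curvature(model, loss_fn, x_minor, y_minor, h).unsqueeze(1)
    return mmd2(c1, c2)
```

In a training step, `curvature_matching_loss` would be weighted by γ and added to the classification and adversarial terms, as in the overall objective of Eq. (10) below.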
Figure 2: The overall framework of CUMA. x is the input sample; f_s(·; θ_s) is the shared backbone; h_t(·; θ_t) is the utility head for the target task, so that f(·; θ) = h_t(f_s(·; θ_s); θ_t); and h_a(·; θ_a) is the adversarial head that predicts sensitive attributes. C(·) is the curvature estimation function defined in Eq. (4). Q1 and Q2 are the local curvature distributions of the majority and minority groups, respectively. L_cm, L_clf and L_adv are the three loss terms defined in Eqs. (9) and (11).

As illustrated in Figure 2, our model follows the same "two-head" structure as traditional adversarial learning frameworks (Ganin et al., 2016; Madras et al., 2018), where h_t is the utility head for the target task, h_a is the adversarial head that predicts sensitive attributes, and f_s is the shared backbone.[2] Suppose that for each sample x_i the sensitive attribute is a_i and the corresponding target label is y_i; then our overall optimization problem can be written as:

$$\min_{\theta_s, \theta_t} \max_{\theta_a} L = \min_{\theta_s, \theta_t} \max_{\theta_a} \left( L_{clf} - \alpha L_{adv} + \gamma L_{cm} \right), \quad (10)$$

where

$$L_{clf} = \frac{1}{N}\sum_{i=1}^{N} \ell\big(h_t(f_s(x_i; \theta_s); \theta_t), y_i\big), \qquad L_{adv} = \frac{1}{N}\sum_{i=1}^{N} \ell\big(h_a(f_s(x_i; \theta_s); \theta_a), a_i\big), \quad (11)$$

ℓ(·, ·) is the cross-entropy loss function, α and γ are trade-off hyperparameters, and N is the number of training samples.

5 EXPERIMENTS

5.1 EXPERIMENTAL SETUP

**Datasets and pre-processing** Experiments are conducted on three datasets widely used to evaluate machine learning fairness: Communities and Crime (C&C) (Redmond & Baveja, 2002), Adult (Kohavi, 1996), and CelebA (Liu et al., 2015).[3] The C&C dataset has 1,994 samples with neighborhood population statistics, of which 1,500 are used for training and the rest for evaluation. The target task is to predict violent crime per capita, and we use "RacePctBlack" (percentage of black population in the neighborhood) and "FemalePctDiv" (divorce ratio of females in the neighborhood) as sensitive attributes. All features in the C&C dataset are continuous values in [0, 1]. To fit the fairness problem setting, we binarize the target and sensitive attributes using the top-30% largest value as the threshold.[4] We also apply data whitening on C&C. The Adult dataset has 48,842 samples with basic personal information such as education and occupation, of which 30,000 are used for training and the rest for evaluation. The target task is to predict a person's annual income, and we use "gender" (male or female) as the sensitive attribute. The features in the Adult dataset are either continuous (e.g., age) or categorical (e.g., sex). We use one-hot encoding on the categorical features and then concatenate them with the continuous ones. We apply data whitening on the concatenated features. CelebA has over 200,000 images of celebrity faces with 40 attribute annotations. The target task is to predict the "attractiveness" attribute, and the sensitive attributes to protect are "chubby" and "eyeglasses". We randomly select 45,000 training samples and 5,000 testing samples. All images are center-cropped and resized to 128 × 128, and pixel values are scaled to [0, 1].

[2] Thus the binary classifier f(·; θ) = h_t(f_s(·; θ_s); θ_t), with θ = θ_t ∪ θ_s.
[3] Traditional image classification datasets (e.g., ImageNet) are not directly applicable since they lack fairness attribute labels.
[4] As a result, P[A = 0] = 30% and P[Y = 0] = 30%.
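As a concrete illustration of the tabular pre-processing described above, the sketch below binarizes a target or sensitive attribute at the top-30% threshold and builds whitened feature vectors. The column names are placeholders, per-feature standardization is used as a simple stand-in for the whitening step, and labeling the top fraction as 0 follows footnote 4; the actual pipeline may differ.

```python
import numpy as np
import pandas as pd


def binarize_top_fraction(values, top=0.30):
    # Threshold at the top-`top` fraction of largest values; the top fraction is
    # labeled 0, so that P[label = 0] = top (cf. footnote 4).
    threshold = np.quantile(values, 1.0 - top)
    return (np.asarray(values) < threshold).astype(int)


def preprocess_tabular(df, continuous_cols, categorical_cols):
    # One-hot encode categorical columns, concatenate with continuous ones, and
    # standardize each feature (a simple stand-in for the whitening step).
    onehot = pd.get_dummies(df[categorical_cols].astype(str))
    feats = pd.concat([df[continuous_cols], onehot], axis=1).to_numpy(dtype=np.float32)
    mean, std = feats.mean(axis=0), feats.std(axis=0) + 1e-8
    return (feats - mean) / std
```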
Table 1: Results on the C&C dataset with "RacePctBlack" as the sensitive attribute. ∆EOpp and ∆EO on the original test set measure in-distribution fairness; ∆ER and the metrics under noise measure robust fairness under distribution shifts. The best result in each column is shown in bold.

| Method | Accuracy (↑) | ∆EOpp (↓), in-dist. | ∆EO (↓), in-dist. | ∆ER (↓) | ∆EOpp (↓), Gaussian | ∆EO (↓), Gaussian | ∆EOpp (↓), uniform | ∆EO (↓), uniform |
|---|---|---|---|---|---|---|---|---|
| Normal | **89.05** | 38.52 | 63.22 | 46.16 | 35.43 | 60.13 | 39.51 | 64.21 |
| AdvDebias | 84.79 | 26.68 | 39.84 | 21.77 | 26.68 | 39.84 | 23.65 | 36.81 |
| LAFTR | 85.80 | 13.32 | 28.83 | 16.98 | 13.53 | 29.04 | 16.69 | 32.20 |
| CUMA | 85.20±1.70 | **12.71±1.47** | **28.17±1.70** | **7.59±0.19** | **10.17±0.89** | **28.69±1.92** | **12.85±2.98** | **27.11±0.82** |

Table 2: Results on the C&C dataset with "FemalePctDiv" as the sensitive attribute. The best result in each column is shown in bold.

| Method | Accuracy (↑) | ∆EOpp (↓), in-dist. | ∆EO (↓), in-dist. | ∆ER (↓) | ∆EOpp (↓), Gaussian | ∆EO (↓), Gaussian | ∆EOpp (↓), uniform | ∆EO (↓), uniform |
|---|---|---|---|---|---|---|---|---|
| Normal | **85.60** | 17.28 | 54.74 | 67.69 | 17.63 | 56.41 | 18.77 | 54.60 |
| AdvDebias | 83.57 | 12.80 | 38.73 | 37.17 | 12.80 | 38.73 | 11.38 | 37.15 |
| LAFTR | 83.16 | 11.73 | 27.83 | 28.15 | 11.73 | 29.30 | 11.38 | 30.11 |
| CUMA | 83.39±1.01 | **8.65±0.59** | **27.57±0.74** | **27.70±1.04** | **8.71±0.88** | **27.70±1.04** | **9.63±1.37** | **28.35±1.73** |

**Models** For the C&C and Adult datasets, we use two-layer MLPs for f_s, h_t and h_a. For the CelebA dataset, we use ResNet18 as the backbone, where the first three stages are used as f_s and the last stage (together with the fully connected classification layer) is used as h_t. The auxiliary adversarial head h_a has the same structure as h_t. Detailed model structures are described in Appx. A.

**Baseline Methods** We compare CUMA with the following state-of-the-art in-distribution fairness algorithms. Adversarial debiasing (AdvDebias) (Zhang et al., 2018) is one of the most popular fair training algorithms based on adversarial training (Ganin et al., 2016). Madras et al. (2018) propose a similar framework termed Learned Adversarially Fair and Transferable Representations (LAFTR), which replaces the cross-entropy loss used in (Zhang et al., 2018) with a group-normalized ℓ1 loss and is shown to work better on highly unbalanced datasets. We also include normal (fairness-ignorant) training as a baseline.

**Evaluation Metric** We use three different groups of evaluation metrics: overall accuracy, in-distribution fairness metrics, and robust fairness metrics. We report the overall accuracy on all test samples of the original test sets. To measure in-distribution fairness, we use ∆EOpp and ∆EO on the original test sets. To measure robust fairness under distribution shifts, we use our newly proposed ∆ER on the original test sets, as well as ∆EOpp and ∆EO on a set of pre-defined real-world distribution shifts. We intend to show that ∆ER calculated on the original test sets aligns well with robust fairness under real-world distribution shifts. See the "Distributional shifts" paragraph below for the details of constructing the distribution shifts.
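For reference, the in-distribution fairness metrics of Eqs. (1)–(3) can be estimated from model predictions as in the following NumPy sketch; ∆ER is instead computed from per-sample curvatures with the MMD estimator sketched in Section 4. The function below is illustrative and assumes binary labels, predictions, and group indicators.

```python
import numpy as np


def fairness_gaps(y_true, y_pred, group):
    """Empirical Delta_DP, Delta_EOpp and Delta_EO (Eqs. (1)-(3)) from binary
    labels, binary predictions and a binary sensitive-group indicator."""
    y_true, y_pred, group = (np.asarray(a) for a in (y_true, y_pred, group))

    def pos_rate(a):
        # P(Y_hat = 1 | A = a)
        return y_pred[group == a].mean()

    def err_rate(a, y):
        # P(Y_hat != Y | A = a, Y = y)
        m = (group == a) & (y_true == y)
        return (y_pred[m] != y_true[m]).mean()

    dp = abs(pos_rate(0) - pos_rate(1))
    eopp = abs(err_rate(0, 0) - err_rate(1, 0))
    eo = sum(abs(err_rate(0, y) - err_rate(1, y)) for y in (0, 1))
    return {"DP": dp, "EOpp": eopp, "EO": eo}
```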
**Distributional shifts** On the Adult and C&C datasets, we construct two distribution shifts by adding random Gaussian and uniform noise, respectively, to the test data. Specifically, following (Madras et al., 2018; Zhang et al., 2018), the categorical features in the Adult and C&C datasets are first one-hot encoded and then whitened into real-valued vectors, to which the noise is added. Both types of noise have mean µ = 0 and standard deviation σ = 0.03. On the CelebA dataset, following (Hendrycks & Dietterich, 2019), we construct two distribution shifts by adding random Gaussian noise (with mean µ = 0 and standard deviation σ = 0.08) and impulse noise (with ratio p = 0.03), respectively. We report the fairness in robustness against other settings of distribution shifts in Appx. C.

**Implementation Details** Unless otherwise specified, we set the loss trade-off parameter α to 1 in all experiments by default. We use the Adam optimizer (Kingma & Ba, 2014) with initial learning rate 10⁻³ and weight decay 10⁻⁵. The learning rate is gradually decreased to 0 by a cosine annealing learning rate scheduler (Loshchilov & Hutter, 2016). On both the Adult and C&C datasets, we train for 50 epochs from scratch for all methods. On the CelebA dataset, we first normally train a model for 100 epochs and then finetune it for 20 epochs using CUMA. For a fair comparison, we train for 120 epochs on CelebA for all baseline methods. The constant h in Eq. (8) is set to 1 by default. For more implementation details, please see Appx. A.
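A minimal sketch of how such corrupted test copies could be generated is given below. Only the mean, standard deviation, and corruption ratio are specified above, so the exact parameterization here (mapping the stated standard deviation to a uniform range, and implementing impulse noise as salt-and-pepper corruption of pixels in [0, 1]) is an assumption.

```python
import numpy as np


def gaussian_shift(x, sigma=0.03, rng=None):
    # Additive zero-mean Gaussian noise with standard deviation sigma.
    rng = np.random.default_rng(0) if rng is None else rng
    return x + rng.normal(0.0, sigma, size=x.shape)


def uniform_shift(x, sigma=0.03, rng=None):
    # Additive zero-mean uniform noise; U(-a, a) has standard deviation a / sqrt(3),
    # so a = sigma * sqrt(3) matches the requested standard deviation.
    rng = np.random.default_rng(0) if rng is None else rng
    a = sigma * np.sqrt(3.0)
    return x + rng.uniform(-a, a, size=x.shape)


def impulse_shift(img, p=0.03, rng=None):
    # Salt-and-pepper corruption of an image with pixel values in [0, 1]:
    # each pixel is independently set to 0 or 1 with probability p.
    rng = np.random.default_rng(0) if rng is None else rng
    out = np.array(img, copy=True)
    mask = rng.random(out.shape) < p
    out[mask] = rng.integers(0, 2, size=int(mask.sum())).astype(out.dtype)
    return out
```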
Table 3: Results on the Adult dataset with "Sex" as the sensitive attribute. The best result in each column is shown in bold.

| Method | Accuracy (↑) | ∆EOpp (↓), in-dist. | ∆EO (↓), in-dist. | ∆ER (↓) | ∆EOpp (↓), Gaussian | ∆EO (↓), Gaussian | ∆EOpp (↓), uniform | ∆EO (↓), uniform |
|---|---|---|---|---|---|---|---|---|
| Normal | **86.11** | 6.65 | 15.45 | 34.25 | 6.66 | 15.01 | 6.87 | 15.72 |
| AdvDebias | 85.17 | 5.12 | 5.92 | 16.78 | 5.10 | 5.95 | 5.77 | 7.29 |
| LAFTR | 85.97 | 6.28 | 11.96 | 25.38 | 6.22 | 12.08 | 6.45 | 12.06 |
| CUMA | 85.30±0.73 | **4.83±0.24** | **4.77±0.34** | **5.59±0.28** | **4.74±0.32** | **4.81±0.51** | **5.43±0.19** | **6.87±0.31** |

Table 4: Results on the CelebA dataset with "Chubby" as the sensitive attribute. The best result in each column is shown in bold.

| Method | Accuracy (↑) | ∆EOpp (↓), in-dist. | ∆EO (↓), in-dist. | ∆ER (↓) | ∆EOpp (↓), Gaussian | ∆EO (↓), Gaussian | ∆EOpp (↓), impulse | ∆EO (↓), impulse |
|---|---|---|---|---|---|---|---|---|
| Normal | **91.25** | 38.45 | 42.56 | 59.34 | 39.16 | 43.90 | 39.76 | 44.51 |
| AdvDebias | 90.48 | **26.41** | 29.73 | 42.65 | 28.95 | 35.46 | 29.73 | 36.48 |
| LAFTR | 89.92 | 26.54 | **29.10** | 39.16 | 27.94 | 34.60 | 28.96 | 35.12 |
| CUMA | 89.97±0.38 | 27.19±0.75 | 30.26±0.95 | **23.23±0.39** | **27.62±0.85** | **31.49±1.28** | **27.97±0.48** | **31.74±1.14** |

5.2 MAIN RESULTS

Experimental results on the three datasets with different sensitive attributes are shown in Tables 1–5, where we compare CUMA with the baseline methods on the three groups of metrics discussed in Section 5.1. "Normal" means standard training without any fairness regularization. All numbers are shown as percentages. Several findings can be drawn from the results.

First, we see that previous state-of-the-art fairness learning algorithms are jeopardized when distributional shifts are present in the test data. For example, on the C&C dataset with "RacePctBlack" as the sensitive attribute (Table 1), LAFTR achieves ∆EO = 28.83% on the in-distribution test set, while that number increases to 32.20% on the test set perturbed with uniform random noise. Similarly, AdvDebias achieves ∆EO = 29.73% on the original CelebA test set with "chubby" as the sensitive attribute (Table 4), while that number increases to 35.46% and 36.48% on the test sets perturbed with Gaussian and impulse noise, respectively.

Second, we see that CUMA achieves the best robust fairness under distribution shifts on all three benchmark datasets with different sensitive-attribute settings, while maintaining similar in-distribution fairness and overall accuracy. For example, on the C&C dataset with "RacePctBlack" as the sensitive attribute (Table 1), CUMA achieves 2.73% and 4.82% lower ∆EO than the second-best performer (LAFTR) under distribution shifts from additive Gaussian and uniform noise, respectively. Moreover, in the same setting, although CUMA and LAFTR achieve almost identical in-distribution fairness (the difference between their ∆EO on the original test set is within 0.5%), CUMA keeps (and even improves) its fairness under distribution shifts (e.g., 1.33% smaller ∆EO under uniform noise), while the fairness achieved by LAFTR is jeopardized under both types of distribution shifts (e.g., 3.37% larger ∆EO under uniform noise). Similarly, on the CelebA dataset with "Chubby" as the sensitive attribute, LAFTR has even slightly better in-distribution fairness than CUMA. However, when the test sets have distribution shifts, the fairness achieved by LAFTR is jeopardized (with 5.50% and 6.02% more ∆EO under Gaussian and impulse noise, respectively), while CUMA keeps its fairness and achieves better fairness under distribution shifts (e.g., 2.50% and 3.17% less ∆EO compared with LAFTR).
Third, for all three datasets, the ∆ER calculated on the original test set correlates strongly with the traditional fairness metrics (e.g., ∆EOpp, ∆EO) calculated on the perturbed test sets: the smaller the ∆ER on the in-distribution test set, the smaller the ∆EO on the perturbed test sets. This shows that our new metric ∆ER aligns well with robust fairness under real-world distribution shifts, and validates using it as an indicator of the disparity in model robustness. More experimental results are shown in Appx. B (trade-off curves between fairness and accuracy) and Appx. C (results on other settings of distributional shifts).

Table 5: Results on the CelebA dataset with "Eyeglasses" as the sensitive attribute. The best result in each column is shown in bold.

| Method | Accuracy (↑) | ∆EOpp (↓), in-dist. | ∆EO (↓), in-dist. | ∆ER (↓) | ∆EOpp (↓), Gaussian | ∆EO (↓), Gaussian | ∆EOpp (↓), impulse | ∆EO (↓), impulse |
|---|---|---|---|---|---|---|---|---|
| Normal | **90.52** | 36.40 | 43.96 | 54.38 | 35.62 | 42.91 | 37.92 | 45.63 |
| AdvDebias | 88.65 | **23.15** | **32.56** | 41.06 | **25.70** | 36.41 | 23.92 | 33.46 |
| LAFTR | 89.72 | 24.90 | 35.48 | 42.93 | 26.12 | 37.94 | 24.52 | 34.10 |
| CUMA | 89.10±0.13 | 24.16±0.40 | 33.39±0.22 | **32.56±0.41** | 25.76±0.50 | **34.77±0.47** | **22.61±0.06** | **31.68±0.15** |

5.3 ABLATION STUDY

**Ablation Study on ∆ER** In this section, we study how well ∆ER predicts robust fairness and how sensitive ∆ER is with respect to S (the sampling set for σ in the mixed RBF kernel function, as described in Section 3). A small σ makes ∆ER more sensitive to differences between the two sample sets, which may be caused either by a true discrepancy between the distributions or by the noise introduced by sampling. In contrast, a large σ may under-estimate the discrepancy. Thus, a proper S should include a wide range of σ values to avoid the domination of either large or small values. In this paper, we choose a geometric sequence with base 2, i.e., S = {1, 2, 4, 8, 16}. Furthermore, we compare ∆ER values under three different sets: S1 = {0.25, 0.5, 1, 2, 4}, S2 = {1, 2, 4, 8, 16} (the default S as defined in Section 3), and S3 = {4, 8, 16, 32, 64}. Results are shown in Table 6. As in Section 5.2, we empirically evaluate robust fairness by ∆EO on the test set corrupted by uniform noise. From the results, we observe that under all three S settings, ∆ER aligns well with the model's fairness under distribution shifts (∆EO under uniform noise).

Table 6: ∆ER values with different mixed RBF kernel scale parameter sets S. ∆ER is computed on the original test set; ∆EO is computed on the test set with uniform noise. Results are reported on the C&C dataset with "RacePctBlack" as the sensitive attribute. Models are trained by CUMA with different γ values.

| | ∆ER, S = S1 | ∆ER, S = S2 | ∆ER, S = S3 | ∆EO, uniform noise |
|---|---|---|---|---|
| γ = 0.1 | 12.72 | 13.52 | 11.06 | 31.09 |
| γ = 1 | 8.56 | 7.61 | 4.22 | 27.02 |
| γ = 10 | 8.40 | 7.24 | 4.02 | 26.98 |
**Ablation Study on CUMA** In this section, we check the sensitivity of CUMA with respect to its hyperparameters: α and γ in Eq. (10), and h in Eq. (8). Results are reported in Table 7. The trade-off between overall accuracy and robust fairness peaks at around α = 1, so we use it as the default α value. When fixing α = 1, the best trade-off between overall accuracy and robust fairness is achieved at around γ = 1, which we use as the default γ. Varying the value of h hardly affects the performance of CUMA.

Table 7: Ablation study results on the loss trade-off parameters α and γ (and the finite-difference constant h) in the CUMA algorithm. Results are reported on the C&C dataset with "RacePctBlack" as the sensitive attribute.

| | α = 0.1 | α = 1 | α = 10 | γ = 0.1 | γ = 1 | γ = 10 | h = 0.1 | h = 1 |
|---|---|---|---|---|---|---|---|---|
| Accuracy | 86.94 | 85.40 | 83.75 | 85.19 | 85.40 | 84.79 | 85.32 | 85.40 |
| ∆EO | 59.74 | 28.35 | 32.68 | 38.85 | 28.35 | 27.99 | 29.15 | 28.35 |
| ∆ER | 42.50 | 7.61 | 18.56 | 13.52 | 7.61 | 7.24 | 7.53 | 7.61 |

6 CONCLUSION

In this paper, we first propose a new fairness goal, termed Equalized Robustness (ER), to impose fair model robustness against unseen distribution shifts across different data groups. We further propose a novel fairness learning algorithm, termed Curvature Matching (CUMA), to simultaneously achieve both traditional in-distribution fairness and our new robust fairness. Experiments show that CUMA achieves superior fairness in robustness against distribution shifts, without additional sacrifice in either overall accuracy or in-distribution fairness compared with traditional in-distribution fair learning methods. As pioneering work, the new concept of ER proposed in this paper aims to measure a dimension of fairness that is practically significant yet so far largely overlooked: ER assesses "out-of-distribution" fairness, while previous metrics focus on "in-distribution" fairness. Therefore, ER works as a complement to, instead of a replacement for, previous fairness metrics. We hope our work can open up more discussions on how to evaluate model fairness across a more complete spectrum.

REFERENCES

Hyojin Bahng, Sanghyuk Chun, Sangdoo Yun, Jaegul Choo, and Seong Joon Oh. Learning de-biased representations with biased representations. In International Conference on Machine Learning, pp. 528–539, 2020.

Peter Bartlett, Dylan J Foster, and Matus Telgarsky. Spectrally-normalized margin bounds for neural networks. In Advances in Neural Information Processing Systems, 2017.

Mikołaj Bińkowski, Dougal J Sutherland, Michael Arbel, and Arthur Gretton. Demystifying MMD GANs. In International Conference on Learning Representations, 2018.

Joy Buolamwini and Timnit Gebru. Gender shades: Intersectional accuracy disparities in commercial gender classification. In Conference on Fairness, Accountability and Transparency, pp. 77–91, 2018.

Flavio P Calmon, Dennis Wei, Bhanukiran Vinzamuri, Karthikeyan Natesan Ramamurthy, and Kush R Varshney. Optimized pre-processing for discrimination prevention. In International Conference on Neural Information Processing Systems, pp. 3995–4004, 2017.

Elliot Creager, David Madras, Jörn-Henrik Jacobsen, Marissa Weis, Kevin Swersky, Toniann Pitassi, and Richard Zemel. Flexibly fair representation learning by disentanglement. In International Conference on Machine Learning, pp. 1436–1445, 2019.

Terrance de Vries, Ishan Misra, Changhan Wang, and Laurens van der Maaten. Does object recognition work for everyone? In IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 52–59, 2019.

Mengnan Du, Fan Yang, Na Zou, and Xia Hu. Fairness in deep learning: A computational perspective. IEEE Intelligent Systems, 2020.

Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. Fairness through awareness. In Innovations in Theoretical Computer Science Conference, pp. 214–226, 2012.

Harrison Edwards and Amos Storkey. Censoring representations with an adversary.
In International Conference on Learning Representations, 2016.

Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pascal Germain, Hugo Larochelle, François Laviolette, Mario Marchand, and Victor Lempitsky. Domain-adversarial training of neural networks. Journal of Machine Learning Research, 17(1):2096–2030, 2016.

Arthur Gretton, Karsten M Borgwardt, Malte J Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13(1):723–773, 2012.

Yiwen Guo, Chao Zhang, Changshui Zhang, and Yurong Chen. Sparse DNNs with improved adversarial robustness. In Advances in Neural Information Processing Systems, 2018.

Moritz Hardt, Eric Price, and Nathan Srebro. Equality of opportunity in supervised learning. In Advances in Neural Information Processing Systems, 2016.

Tatsunori Hashimoto, Megha Srivastava, Hongseok Namkoong, and Percy Liang. Fairness without demographics in repeated loss minimization. In International Conference on Machine Learning, pp. 1929–1938, 2018.

Dan Hendrycks and Thomas Dietterich. Benchmarking neural network robustness to common corruptions and perturbations. In International Conference on Learning Representations, 2019.

Dan Hendrycks, Steven Basart, Norman Mu, Saurav Kadavath, Frank Wang, Evan Dorundo, Rahul Desai, Tyler Zhu, Samyak Parajuli, Mike Guo, Dawn Song, Jacob Steinhardt, and Justin Gilmer. The many faces of robustness: A critical analysis of out-of-distribution generalization. arXiv preprint arXiv:2006.16241, 2020.

Dan Hendrycks, Kevin Zhao, Steven Basart, Jacob Steinhardt, and Dawn Song. Natural adversarial examples. In IEEE Conference on Computer Vision and Pattern Recognition, 2021.

Sunhee Hwang, Sungho Park, Dohyung Kim, Mirae Do, and Hyeran Byun. FairFaceGAN: Fairness-aware facial image-to-image translation. In British Machine Vision Conference, 2020.

Faisal Kamiran and Toon Calders. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1):1–33, 2012.

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

Diederik P Kingma and Max Welling. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.

Ron Kohavi. Scaling up the accuracy of naive-Bayes classifiers: A decision-tree hybrid. In International Conference on Knowledge Discovery and Data Mining, pp. 202–207, 1996.

Chun-Liang Li, Wei-Cheng Chang, Yu Cheng, Yiming Yang, and Barnabás Póczos. MMD GAN: Towards deeper understanding of moment matching network. In International Conference on Machine Learning, 2017.

Yi Li and Nuno Vasconcelos. REPAIR: Removing representation bias by dataset resampling. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 9572–9581, 2019.

Yujia Li, Kevin Swersky, and Rich Zemel. Generative moment matching networks. In International Conference on Machine Learning, pp. 1718–1727, 2015.

Ziwei Liu, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Deep learning face attributes in the wild. In IEEE International Conference on Computer Vision, 2015.

Ilya Loshchilov and Frank Hutter. SGDR: Stochastic gradient descent with warm restarts. arXiv preprint arXiv:1608.03983, 2016.

David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. Learning adversarially fair and transferable representations. In International Conference on Machine Learning, pp. 3384–3393, 2018.

Natalia Martinez, Martin Bertran, and Guillermo Sapiro. Minimax Pareto fairness: A multi objective perspective.
In International Conference on Machine Learning, pp. 6755–6764, 2020.

Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8):1979–1993, 2018.

Seyed-Mohsen Moosavi-Dezfooli, Alhussein Fawzi, Jonathan Uesato, and Pascal Frossard. Robustness via curvature regularization, and vice versa. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 9078–9086, 2019.

Cecilia Muñoz, Megan Smith, and DJ Patil. Big data: A report on algorithmic systems, opportunity, and civil rights. United States Executive Office of the President, 2016.

Shruti Nagpal, Maneet Singh, Richa Singh, and Mayank Vatsa. Deep learning for face recognition: Pride or prejudiced? arXiv preprint arXiv:1904.01219, 2019.

Junhyun Nam, Hyuntak Cha, Sung-Soo Ahn, Jaeho Lee, and Jinwoo Shin. Learning from failure: De-biasing classifier from biased classifier. In Advances in Neural Information Processing Systems, 2020.

Sungho Park, Sunhee Hwang, Jongkwang Hong, and Hyeran Byun. Fair-VQA: Fairness-aware visual question answering through sensitive attribute prediction. IEEE Access, 8:215091–215099, 2020.

John Podesta, Penny Pritzker, Ernest J. Moniz, John Holdren, and Jeffery Zients. Big data: Seizing opportunities and preserving values. United States Executive Office of the President, 2014.

Yuxian Qiu, Jingwen Leng, Cong Guo, Quan Chen, Chao Li, Minyi Guo, and Yuhao Zhu. Adversarial defense through network profiling based path extraction. In IEEE Conference on Computer Vision and Pattern Recognition, pp. 4777–4786, 2019.

Novi Quadrianto and Viktoriia Sharmanska. Recycling privileged learning and distribution matching for fairness. In Advances in Neural Information Processing Systems, 2017.

Michael Redmond and Alok Baveja. A data-driven software tool for enabling cooperative information sharing among police departments. European Journal of Operational Research, 141(3):660–678, 2002.

Mhd Hasan Sarhan, Nassir Navab, Abouzar Eslami, and Shadi Albarqouni. Fairness by learning orthogonal disentangled representations. In European Conference on Computer Vision, pp. 746–761, 2020.

Shreya Shankar, Yoni Halpern, Eric Breck, James Atwood, Jimbo Wilson, and D Sculley. No classification without representation: Assessing geodiversity issues in open data sets for the developing world. In Advances in Neural Information Processing Systems Workshop, 2017.

Nathalie A Smuha. The EU approach to ethics guidelines for trustworthy artificial intelligence. Computer Law Review International, 20(4):97–106, 2019.

Rohan Taori, Achal Dave, Vaishaal Shankar, Nicholas Carlini, Benjamin Recht, and Ludwig Schmidt. Measuring robustness to natural distribution shifts in image classification. In Advances in Neural Information Processing Systems, 2020.

Christina Wadsworth, Francesca Vera, and Chris Piech. Achieving fairness through adversarial learning: an application to recidivism prediction. arXiv preprint arXiv:1807.00199, 2018.

Tianlu Wang, Jieyu Zhao, Mark Yatskar, Kai-Wei Chang, and Vicente Ordonez. Balanced datasets are not enough: Estimating and mitigating gender bias in deep image representations. In IEEE International Conference on Computer Vision, pp. 5310–5319, 2019.

Tsui-Wei Weng, Huan Zhang, Pin-Yu Chen, Jinfeng Yi, Dong Su, Yupeng Gao, Cho-Jui Hsieh, and Luca Daniel. Evaluating the robustness of neural networks: An extreme value theory approach.
In International Conference on Learning Representations, 2018.

Benjamin Wilson, Judy Hoffman, and Jamie Morgenstern. Predictive inequity in object detection. arXiv preprint arXiv:1902.11097, 2019.

Hanshu Yan, Jingfeng Zhang, Gang Niu, Jiashi Feng, Vincent YF Tan, and Masashi Sugiyama. CIFS: Improving adversarial robustness of CNNs via channel-wise importance-based feature selection. arXiv preprint arXiv:2102.05311, 2021.

Brian Hu Zhang, Blake Lemoine, and Margaret Mitchell. Mitigating unwanted biases with adversarial learning. In AAAI Conference on AI, Ethics, and Society, pp. 335–340, 2018.

Jieyu Zhao, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. Men also like shopping: Reducing gender bias amplification using corpus-level constraints. arXiv preprint arXiv:1707.09457, 2017.

A MORE IMPLEMENTATION DETAILS

On the C&C and Adult datasets, let d denote the input feature dimension; the dimensions of the hidden layers in f_s and h_t are d → 100 → 64 and 64 → 32 → 2, respectively. h_a has an identical structure to h_t. For all three sub-networks, a ReLU activation and a dropout layer with a 0.25 dropout ratio are applied between the two fully connected layers. On the CelebA dataset, we use ResNet18 as the backbone. The input feature size of h_t and h_a is 8 × 8 × 256 (with channel-last layout).
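A minimal PyTorch sketch of the two-layer MLP sub-networks described above is given below for reference. It reflects the stated layer dimensions and the ReLU/dropout placement; whether any additional non-linearity is applied to the backbone output before the heads is not specified and is omitted here.

```python
import torch.nn as nn


def two_layer_mlp(d_in, d_hidden, d_out):
    # Two fully connected layers with ReLU and dropout (p = 0.25) in between,
    # as used for f_s, h_t and h_a on the C&C and Adult datasets.
    return nn.Sequential(
        nn.Linear(d_in, d_hidden),
        nn.ReLU(),
        nn.Dropout(p=0.25),
        nn.Linear(d_hidden, d_out),
    )


class TwoHeadMLP(nn.Module):
    """f_s: d -> 100 -> 64; h_t and h_a: 64 -> 32 -> 2."""

    def __init__(self, d):
        super().__init__()
        self.f_s = two_layer_mlp(d, 100, 64)   # shared backbone
        self.h_t = two_layer_mlp(64, 32, 2)    # utility head (target task)
        self.h_a = two_layer_mlp(64, 32, 2)    # adversarial head (sensitive attribute)

    def forward(self, x):
        z = self.f_s(x)
        return self.h_t(z), self.h_a(z)
```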
B TRADE-OFF CURVES BETWEEN FAIRNESS AND ACCURACY

For CUMA and both baseline methods, we can obtain different trade-offs between fairness and accuracy by setting the loss weights (e.g., α and γ) to different values: for example, the larger α is, the better the fairness and the worse the accuracy. The trade-off curves between fairness and accuracy of the different methods are shown in Figure 3. The closer a curve is to the top-left corner (i.e., larger accuracy and smaller ∆EO), the better the Pareto frontier it achieves. As we can see, our method achieves the best Pareto frontiers for both in-distribution fairness (left panel) and robust fairness under distribution shifts (middle and right panels).

Figure 3: Trade-off curves between fairness and accuracy of different methods. Results are reported on the C&C dataset with "RacePctBlack" as the sensitive attribute.

C RESULTS ON OTHER SETTINGS OF DISTRIBUTIONAL SHIFTS

In this section, we show that the conclusions drawn in Section 5.2 hold under different settings of distributional shifts. Specifically, we consider a new noise setting with mean µ = 0 and standard deviation σ = 0.06 (instead of the mean µ = 0 and standard deviation σ = 0.03 evaluated in the main text) for both the random Gaussian and uniform noise. The results under these new distributional shifts on the C&C dataset are shown in Tables 8 and 9.

Table 8: Results on the C&C dataset with "RacePctBlack" as the sensitive attribute. The best result in each column is shown in bold.

| Method | Accuracy (↑) | ∆EOpp (↓), in-dist. | ∆EO (↓), in-dist. | ∆ER (↓) | ∆EOpp (↓), Gaussian | ∆EO (↓), Gaussian | ∆EOpp (↓), uniform | ∆EO (↓), uniform |
|---|---|---|---|---|---|---|---|---|
| Normal | **89.05** | 38.52 | 63.22 | 46.16 | 36.71 | 61.54 | 40.22 | 63.17 |
| AdvDebias | 84.79 | 26.68 | 39.84 | 21.77 | 28.61 | 37.02 | 22.84 | 37.41 |
| LAFTR | 85.80 | 13.32 | 28.83 | 16.98 | 13.96 | 31.25 | 16.58 | 33.42 |
| CUMA | 85.40 | **12.52** | **28.35** | **7.61** | **11.76** | **27.15** | **12.80** | **27.41** |

Table 9: Results on the C&C dataset with "FemalePctDiv" as the sensitive attribute. The best result in each column is shown in bold.

| Method | Accuracy (↑) | ∆EOpp (↓), in-dist. | ∆EO (↓), in-dist. | ∆ER (↓) | ∆EOpp (↓), Gaussian | ∆EO (↓), Gaussian | ∆EOpp (↓), uniform | ∆EO (↓), uniform |
|---|---|---|---|---|---|---|---|---|
| Normal | **85.60** | 17.28 | 54.74 | 67.69 | 18.52 | 57.64 | 20.25 | 55.52 |
| AdvDebias | 83.57 | 12.80 | 38.73 | 37.17 | 14.90 | 39.60 | 12.58 | 35.26 |
| LAFTR | 83.16 | 11.73 | 27.83 | 28.15 | 13.12 | 30.21 | 12.41 | 31.52 |
| CUMA | 83.37 | **8.90** | **27.79** | **23.13** | **9.12** | **28.74** | **9.96** | **29.23** |