|
# THE MAGNITUDE VECTOR OF IMAGES |
|
|
|
**Anonymous authors** |
|
Paper under double-blind review |
|
|
|
ABSTRACT |
|
|
|
The magnitude of a finite metric space is a recently-introduced invariant quantity. |
|
Despite beneficial theoretical and practical properties, such as a general utility for |
|
outlier detection, and a close connection to Laplace radial basis kernels, magnitude |
|
has received little attention by the machine learning community so far. In this |
|
work, we investigate the properties of magnitude on individual images, with each |
|
image forming its own metric space. We show that the known properties of outlier |
|
detection translate to edge detection in images and we give supporting theoretical |
|
justifications. In addition, we provide a proof of concept of its utility by using a |
|
novel magnitude layer to defend against adversarial attacks. Since naive magnitude |
|
calculations may be computationally prohibitive, we introduce an algorithm that |
|
leverages the regular structure of images to dramatically reduce the computational |
|
cost. |
|
|
|
1 INTRODUCTION |
|
|
|
The topology community has recently invested much effort in studying a newly introduced quantity |
|
called magnitude (Leinster, 2010). While it originates from category theory, where it can be seen |
|
as a generalization of Euler characteristics to metric spaces, the magnitude of a metric space is |
|
most intuitively understood as an attempt to measure the effective size of a metric space. As a |
|
descriptive scalar, this quantity extends the set of other well known descriptors such as the rank, |
|
diameter or dimension. However, unlike those descriptors, the properties of magnitude are not yet |
|
well understood. |
|
|
|
Even though the metric space structure of dataset is of the utmost importance to understand e.g. |
|
the regularisation behaviour of classifiers, magnitude has received little attention by the machine |
|
learning community so far. However, it turns out that one can instead investigate the magnitude of |
|
each data point separately, considering each data instance as its own metric space. Following this line |
|
of thought, magnitude vectors were introduced as a way to characterise the contribution of each data |
|
sample to the overall magnitude, such that the sum of the elements of the magnitude vector amounts |
|
to the magnitude. As shown in previous works, the magnitude vectors can detect boundaries of a |
|
metric space, with boundary points having a larger contribution to magnitude (Bunch et al., 2021). |
|
|
|
In this work, we seek to advance the research about magnitude in machine learning by addressing |
|
these challenges. Since the metric space structure of an entire dataset may not be the most useful |
|
application in machine learning, we instead therefore consider the magnitude of each individual data |
|
point by endowing each of them with a metric space structure and explore its properties. In particular, |
|
because of their ubiquity and the large dimensionality, we focus our analysis on image datasets, where |
|
each individual image is seen as a metric space. We then extend previous results by investigating |
|
the theoretical properties associated with magnitude vector in such a context. In addition, we study |
|
the properties of the magnitude vector for improving the adversarial robustness of existing neural |
|
network architectures for image classification. We propose a novel, fully differentiable, magnitude |
|
_layer, which can serve as an input layer for any deep learning architecture. We show that it results in_ |
|
more robustness for several types of adversarial attacks, with an acceptable reduction in classification |
|
performance, paving the way for a new exciting direction in the creation of robust neural architectures. |
|
Moreover, since naive magnitude calculations may be computationally prohibitive for large images, |
|
we introduce a new algorithm that dramatically speeds up the computation of the magnitude vector. |
|
Leveraging the regular structure of images, this allows to conveniently approximate magnitude vectors |
|
for large images with minimal error. Intractable computational runtime often stymies the applicability |
|
of magnitude in machine learning and therefore hinders further the research of it; our algorithm opens |
|
|
|
|
|
----- |
|
|
|
the door to using magnitude efficiently in machine learning research. Equipped with the speed up |
|
algorithm, we showcase possible use cases of magnitude vectors in machine learning in the realm of |
|
in edge detection and adversarial robustness. |
|
|
|
Our contributions are summarized as follows: |
|
|
|
- We formalize the notion of magnitude vectors for images and investigate the impact of the |
|
choice of different metrics. |
|
|
|
- We introduce an algorithm that dramatically speeds up the computation of magnitude with |
|
little loss, which removes a critical roadblock in using magnitude vectors in machine learning |
|
applications. |
|
|
|
- We provide a theoretical framework to understand the edge detection capability of magnitude |
|
vectors and report empirical supporting evidence. |
|
|
|
- We demonstrate the capabilities of a novel, fully differentiable, magnitude layer for improving adversarial robustness of computer vision architectures. |
|
|
|
2 THEORY |
|
|
|
In this section, we introduce essential notions of the theory of magnitude and magnitude vectors and |
|
develop the theoretical background that will help interpreting the empirical results. |
|
|
|
2.1 MATHEMATICAL BACKGROUND |
|
|
|
We start by formally introducing the notion of a finite metric space. |
|
**Definition 1. A metric space is an ordered pair (B, d), where B is a finite set and d is a metric on** |
|
_B. We denote the cardinality of B by |B|._ |
|
|
|
In our application the set B is a set of vectors B ⊂ R[n] and the metric considered will be the ℓp norm. |
|
In order to define the magnitude of such a space we first define the similarity matrix. |
|
**Definition 2. Given a finite metric space M = (B, d), its similarity matrix is ζM with entries** |
|
_ζM_ (i, j) = e[−][d][(][B][i][,B][j] [)] _for Bi, Bj_ _B._ |
|
_∈_ |
|
|
|
The similarity matrix can be seen as the weights arising from an exponential kernel. We are now in a |
|
position to define the magnitude vector and the magnitude of a finite vector space. |
|
**Definition 3. Given a finite metric space M = (B, d) with |B| = n and similarity matrix ζM with** |
|
_inverse ζM[−][1][, the magnitude vector of element][ B][i][ is given by][ w][i][ =][ P]j[n]=1_ _[ζ]M[−][1][(][i, j][)][. Moreover, the]_ |
|
_magnitude of M_ _, magM is_ _i,j=1_ _[ζ]M[−][1][(][i, j][) =][ P]i[n]=1_ _[w][i][.]_ |
|
|
|
Not every finite metric space has a magnitude. In particular, the magnitude is not defined when the |
|
|
|
[P][n] |
|
|
|
similarity matrix is not invertible; the magnitude therefore characterizes the structure of a metric |
|
space to some extent. It should be also noted that the definition of the magnitude vector is reminiscent |
|
of optimising a support vector machine. This connection has been pointed out for the Euclidean norm |
|
by Bunch et al. (2021). |
|
|
|
2.2 THEORETICAL RESULTS |
|
|
|
2.2.1 THE MAGNITUDE OF AN IMAGE |
|
|
|
This work focuses on the analysis of magnitude on images, by considering each individual image as |
|
its own metric space. We then first define how we build such a metric space from an image and then |
|
prove the existence of a magnitude on images. |
|
|
|
We refer to the metric spaces on images as image metric spaces and define them as follows. |
|
**Definition 4. Let I ∈** R[c][×][n][×][m] _be an image with c channels, height n and width m. Let {V :_ |
|
**_vij, i_** 1, . . ., m; j 1, . . ., n _be the set of c-dimensional vectors of pixel values in the image._ |
|
_∈_ _∈_ _}_ |
|
_Then the image metric space M_ (B, d) is given by a set of vectors B ⊂ R[c][+2] _of the form B =_ |
|
_{(i, j, vij[(1)][, . . ., v]ij[(][c][)][)][T][ :][ ∀][v][ij][ ∈]_ _[V][ }][ together with a metric][ d][ on][ R][c][+2][.]_ |
|
|
|
|
|
----- |
|
|
|
Informally, we put all vectors corresponding to pixel values on a grid, concatenating the position |
|
vector with the pixel value vector and use the resulting vectors as the ground set B for our finite |
|
metric space. Let us note that |B| = m × n and therefore the number of points in the metric space |
|
can be quite large even for moderately-sized images, a potential limitation that we address in Section |
|
2.3. |
|
|
|
We now turn to investigating when an image metric space has magnitude and, by extension, a |
|
magnitude vector. This is a priori not clear since, as we saw, the existence of magnitude depends on |
|
the properties of the metric space. |
|
|
|
From the definition of magnitude (Definition 3), it is readily seen that an image metric space M |
|
has magnitude if and only if its similarity matrix ξM is invertible. As generic n × n matrices are |
|
invertible (i.e. subjecting any non-invertible matrix to a random perturbation will almost certainly |
|
result in an invertible matrix), we conclude that generic image metric spaces have magnitude. In |
|
fact, since the vectors b ∈ _B of the image metric space by a factor t > 0 can be rescaled, we can_ |
|
define scaled metrics d(·, ·) 7→ _td(·, ·). This rescaled metric space has magnitude except for finitely_ |
|
many t > 0 (Leinster et al., 2017, Proposition 2.8). In other words, we can always find a scaling |
|
which gives rise to a magnitude vector. The univariate function defined by this scaling is called the |
|
_magnitude function._ |
|
|
|
In general, we are interested in computing the magnitude of all images in a whole dataset, and not |
|
only of a single image, such that we can compare their magnitude (vectors). Therefore, we would like |
|
theoretical guarantees that there exists a scaling such that every image in a dataset has a magnitude |
|
vector. |
|
|
|
**Proposition 5. Let M** (B, d) be an image metric space. If d is an ℓp norm with 0 < p < 2, then |
|
_every image metric space M has magnitude._ |
|
|
|
_Proof. This is a special case of (Meckes, 2013, Theorem 3.6)._ |
|
|
|
The above proposition theoretically only applies to ℓp norm with 0 < p < 2. However, in practice, |
|
we find that on the various datasets we considered in this work, an ℓp norm with p = 4, 10 and the |
|
Hamming distance also lead to invertible similarity matrices, and thus to the existence of magnitude. |
|
|
|
The image metric space exhibits substantial structure; in particular, there is an underlying regular |
|
subspace (the grid). To quantify this further, we use the notion of a product space. |
|
|
|
**Lemma 6. Let M1(B1, d1) and M2(B2, d2) be two finite metric spaces with d1, d2 being ℓp norms.** |
|
_Its product metric space M = M1×M2 is a metric space with a metric ρp : (B1×B2)×(B1×B2) →_ |
|
R such that |
|
|
|
|
|
(d1(x1, x2)[p] + d2(y1, y2)[p], 1 _p_ _._ |
|
_≤_ _≤∞_ |
|
|
|
|
|
_ρp((x1_ _y1), (x2_ _y2)) =_ |
|
_×_ _×_ |
|
|
|
|
|
_Proof. This is a special case of (Dovgoshey & Martio, 2009)._ |
|
|
|
Note that the product space formulation gives a lot of freedom, since every positively weighted |
|
combination of d1 and d2 is also a metric on the product space. |
|
|
|
**Magnitude vector on images based on harmonic analysis** We now present our main result of |
|
this subsection which is an interpretation of the magnitude vector for images based on harmonic |
|
analysis. For this we consider grey scale images, however, our reasoning generalises to colour images. |
|
First, notice that a grey scale image can be seen as discretisation of a surface in R[3], i.e. z = f (x, y), |
|
where x, y are the pixel positions and z is the brightness. In general, it is not clear what an “outlier” on |
|
a continuous surface or curve is. In this paper we define an outlier as a point on the surface where the |
|
gradient is large, i.e. neighbouring points are further away w.r.t. some distance measure. In the image |
|
case, outliers are points where the brightness value is substantially different between neighbouring |
|
pixels. We can use this reasoning to define a filtration on the vectors of B, bij = (i, j, vij[(1)][)][T][,] |
|
_F_ |
|
|
|
namely _B[(1)]_ _B[(2)]_ _B[(][K][)]_ = B such that vij[(1)] _< δ[(][k][)]_ for k = 1, . . ., K, where the |
|
_∅⊆_ _⊆_ _⊆· · · ⊆_ |
|
_δ[(][k][)]_ are different brightness thresholds. Due to symmetry in the problem we also define F _[′]_ with |
|
criterion vij[(1)] |
|
|
|
_[≥]_ _[δ][(][k][)][. Further, we define the projection onto][ R][2][,][ p][ :][ R][3][ →]_ [R][2][, p][(][b][ij][)][ 7→] [(][i, j][)][. By] |
|
|
|
|
|
----- |
|
|
|
1.0 |
|
|
|
0.8 |
|
|
|
0.6 |
|
|
|
0.4 |
|
|
|
0.2 |
|
|
|
0.0 |
|
|
|
25 |
|
|
|
20 |
|
|
|
0 5 10 15 20 25 0 5 10 15 |
|
|
|
|
|
(a) (b) (c) (d) |
|
|
|
Figure 1: A schematic example of the filtration. In (a) we introduce three thresholds, 0, 0.5, 1. In |
|
_F all points below the grey planes are projected to the unit square and in F_ _[′]_ all points above are |
|
projected. (b) is an illustration of the 0.5 level of the filtration F and (c) is is the same for F _[′]. Point_ |
|
size and colour indicate the magnitude vector values. In (d) we reconstruct the magnitude vector. |
|
|
|
considering the projections of each subset of F (F _[′]) we break the problem down into successive_ |
|
boundary detection of compact subsets of R[2]. To extend this reasoning to colour images we can |
|
either consider each channel as a grey scale image or define multi-dimensional filtrations. A visual |
|
description of our reasoning is found in Figure 1. |
|
|
|
To investigate the effect of different metrics on boundary detection, we consider the behaviour of the |
|
weighting vector on the 2-dimensional grid. We closely follow the argument of Bunch et al. (2021), |
|
extending their results to the product space metric in the case when di are ℓp norms with p = 1, 2 and |
|
|
|
_ρ((x1_ _z1), (x2_ _z2)) = α1d1(x1, x2) + α2d2(z1, z2)_ (1) |
|
_×_ _×_ |
|
|
|
for some positive weights α1, α2. For image we can consider the xi as two-dimensional vectors |
|
indicating the position on the unit square and zi as the corresponding brightness value. Consider a |
|
regular 2d grid and the equation defining a weighting on the grid points ζM **_v = (1, . . ., 1)[T]_** . Using a |
|
continuous analogue, this can be written as a convolution (Bunch et al., 2021) |
|
|
|
|
|
(2) |
|
R[2][ e][−][αd][(][x][,][y][)][v][(][y][) =][ I][[0][,][1]][2] [(][x][)][,] |
|
|
|
|
|
_f ⋆v(x) =_ |
|
|
|
|
|
R[2][ f] [(][x][ −] **_[y][)][v][(][y][) =]_** |
|
|
|
|
|
where I[0,1]2 is the indicator function and d is any translation invariant metric. Using the Fourier |
|
transform, |
|
|
|
F(f )(ξ) = (3) |
|
|
|
R[2][ e][−][i][2][π][x][·][ξ][f] [(][x][)][d][x][,] |
|
|
|
Z |
|
|
|
its well-known properties Folland (1999) and the convolution theorem, we can derive an intuitive |
|
understanding of the magnitude vector and the effects of specific metrics. In the case of d(·, ·) being |
|
the Euclidean (ℓ2) norm, it has been shown in (Bunch et al., 2021) that |
|
|
|
|
|
Γ( [3]2 [)] |
|
|
|
3 (α[2] |
|
_απ_ 2 _−_ |
|
|
|
|
|
3 |
|
_∂i[2][)]_ 2 I[0,1]2 = v, (4) |
|
_i=1_ |
|
|
|
X |
|
|
|
|
|
where we generalised the result of Bunch et al. (2021) to the unit square. We can interpret equation 4 |
|
as the weighting v being constant in the interior of the unit square (∂i[2] [is zero in interior) and different] |
|
to this constant on the boundary. In fact, the weighting will also be constant on the edges and corners. |
|
|
|
In the Euclidean case it has been shown (Bunch et al., 2021) that the intuitive understanding translates |
|
rigorously to compact subsets of R[n] with n odd using distribution theory. Guided by this, we continue |
|
our argument using the Fourier transform. |
|
|
|
If d(·, ·) is the ℓ1 norm, i.e. the Manhattan or Cityblock distance, we obtain a similar result to |
|
equation 4, |
|
2 |
|
"i=1(α[2] _−_ _∂i[2][)]#_ I[0,1]2 = v, (5) |
|
Y |
|
|
|
|
|
----- |
|
|
|
in other words, the ℓ1 norm admits the same interpretation as ℓ2, however, this time the differential |
|
operators acts multiplicatively on each dimension. |
|
|
|
For the product space metric ρ(·, ·) = α1d1(·, ·) + α2d2(·, ·) and di are ℓp with p = 1, 2 it follows |
|
that the Fourier transform is a product of the single-metric results. Equipped with this insight, we can |
|
now explain the edge detection capabilities of the magnitude vector. |
|
|
|
Consider a filtration F (and F _[′]) on a grey scale image and choose a subset of vectors B[(][i][)]_ from the |
|
filtration. Apply the projection map p to every vector in this subset. The transformed set is a grid with |
|
potentially “missing” grid points on the domain [0, n] × [0, m]. From the results on the unit square |
|
and the discretising of the ∂ operator, we expect a constant weighting vector except on the boundaries |
|
of the grid, i.e. on points adjacent to the missing grid points and on the boundaries of the domain. |
|
This procedure can be performed on every B[(][i][)] _∈F and the final magnitude vector can be seen as a_ |
|
a combination of the weightings of each step (see Figure 1). We find that our expectations from the |
|
theoretical result agree well with empirical results (see Appendix D). Moreover, we empirically find |
|
that Hamming distance and ℓp norms for p ̸= 1, 2 have similar properties. |
|
|
|
2.3 COMPUTATIONAL CONSIDERATIONS AND SPEED UP |
|
|
|
As can be seen from Definition 3, the computation of the magnitude vector is achieved via a matrix |
|
inversion. This is a fundamentally expensive computation since for an m × n image, we need to |
|
invert an N × N square matrix with N = mn. In other words, even for moderately sized images |
|
there is a considerable computational challenge to overcome. |
|
|
|
The authors of Bunch et al. (2021) propose an iterative algorithm based on the a small subspace of |
|
the finite metric space M _[′]_ _⊂_ _M_ . Suppose the magnitude vector of M _[′]_ is known, then, when adding |
|
a single extra point to M _[′], the computation of the updated magnitude vector reduces to a simple_ |
|
matrix-vector multiplication. Empirically, we find that even for MNIST images this computation is |
|
considerably slower than the naive implementation due to the large number of steps in the algorithm. |
|
Furthermore, the full distance matrix of the image needs to be stored, which can quickly exceed |
|
memory. |
|
|
|
When considering the metric space of images, we also have one fundamental advantage over methods, |
|
namely a grid structure. While every pixel in an image has a weighting, every weighting of a pixel |
|
can also be associated to a position on a two-dimensional grid. Any algorithm which speeds up the |
|
computation of weighting vectors should also implicitly consider this additional structure. Therefore, |
|
we propose a divide-and-conquer procedure to be able to handle large images efficiently. |
|
|
|
According to the theoretical considerations of Subsection 2.2, the magnitude vector is constant in |
|
the interior of the unit square and also constant along its edges and corners. From this analysis we |
|
gain one key insight, namely creating patches from the image would result in different magnitude |
|
vector values along the edges, but all other magnitude vector elements would remain the same. To |
|
counter the resulting “edge effects” of the patches, we can choose overlapping patches, calculate the |
|
magnitude vector of the patches, and crop the transformed sub-images. Finally, piecing the patches |
|
together, we obtain an approximation of the full magnitude vector. As shown in Subsection 3.1, this |
|
results in considerable speed up with manageable error. Furthermore, it allows us to even calculate |
|
the magnitude vector of high-resolution images. |
|
|
|
If even faster inference of the magnitude vector is required, we can view the calculation as a special |
|
type of image-to-image translation problem and use current techniques available. |
|
|
|
3 EXPERIMENTS |
|
|
|
Given a theoretically-based algorithm to efficiently calculate the magnitude vector on large images, |
|
we can investigate its effect empirically. We turn now to our experiments which begin first with an |
|
analysis of the runtime using the speed up algorithm. Armed with a fast calculation, we are then |
|
free to explore potential applications of the magnitude vector in machine learning applications. We |
|
investigate the utility of magnitude vectors in two domains which magnitude was not originally |
|
designed for, namely in edge detection and adversarial robustness, and find that it can prove useful. |
|
|
|
|
|
----- |
|
|
|
1750 0.30 0.07 |
|
|
|
1500 0.06 |
|
|
|
0.25 |
|
|
|
1250 0.05 |
|
|
|
0.20 |
|
|
|
1000 0.04 |
|
|
|
0.15 |
|
|
|
750 0.03 |
|
|
|
500 Maximum error0.10 Frobenius error0.02 |
|
|
|
Computation time [s] |
|
|
|
250 0.05 0.01 |
|
|
|
0 0.00 0.00 |
|
|
|
1 2 4 8 16 32 64 128 256 1 2 4 8 16 32 64 128 256 1 2 4 8 16 32 64 128 256 |
|
|
|
Number of patches per dimension Number of patches per dimension Number of patches per dimension |
|
|
|
|
|
(a) |
|
|
|
|
|
(b) |
|
|
|
|
|
(c) |
|
|
|
|
|
Figure 2: (a) The computation time in seconds of the magnitude vector calculation for each of the |
|
ten images. Even a small number of patches the computational time deceases drastically. (b) and |
|
(c) show box plots of the approximation errors. Although the maximum error can be quite large |
|
compared to the maximum difference of 1, the Fröbenius error remains small, indicating a good |
|
overall approximation. |
|
|
|
3.1 FAST MAGNITUDE CALCULATION AND HIGH-RESOLUTION IMAGES |
|
|
|
To illustrate the benefits of the patching method, we consider the first ten images of the NIH Chest |
|
X-ray dataset (Wang et al., 2017). The dataset consists of 100,000 de-identified X-ray (grey scale) |
|
images with a resolution of 1024 × 1024. Hence, a full-scale magnitude calculation would involve |
|
inverting a roughly 10[6] _× 10[6]_ matrix which is far beyond current computational power. |
|
|
|
To estimate the error induced by the patching algorithm, we rescale the images to a 256 × 256 |
|
resolution and calculate the magnitude via matrix inversion. Then, we divide the rescaled image into |
|
patches and repeat the calculation. We used the a ℓ1 norm and weights of 1 for both the grid points |
|
and intensity values. To ensure comparability we min-max scaled the magnitude vectors. The error is |
|
measured in maximum absolute deviation on a pixel level and also in terms of Fröbenius distance |
|
between the two magnitude vectors |
|
|
|
|
|
_i[(][magnitude vector]exact_ patches[)][2] |
|
|
|
_[−]_ [magnitude vector] _._ (6) |
|
|
|
_i[(][magnitude vector]exact[)][2]_ |
|
|
|
P |
|
|
|
|
|
error = |
|
|
|
|
|
The results can be found in Figure 2. We observe a significant reduction (several order of magnitudes) |
|
when using our fast magnitude calculation method, with limited reconstruction error. |
|
|
|
To illustrate the effect of the magnitude vector on full-scale images we used the patch-algorithm on |
|
the first image of the dataset (see Supplementary Figure 7). We also investigated effect of increasing |
|
the weight α2 of the pixel values in the product space metric and the results can be found in the |
|
Supplementary Material. |
|
|
|
3.2 MAGNITUDE VECTORS AND EDGE DETECTION |
|
|
|
Now that we have a computationally efficient method to calculate magnitude, we turn towards the |
|
first of two novel applications of magnitude: developing a proof-of-concept of the capabilities of |
|
magnitude for edge detection. Indeed, as shown in Section 2, the magnitude vector is expected to |
|
be higher on edges present in the images. This leads naturally to an edge detection algorithm where |
|
pixels whose magnitude vector values are larger than a threshold are classified as edges. |
|
|
|
To assess the performance of a magnitude-based approach on this task, we compare it with the famous |
|
Canny detector (Canny, 1986), which has been widely used for edge detection in images. Because |
|
the definition of an edge is a subjective concept, we use the edges given by the Canny detector as a |
|
ground truth and tune the hyper-parameters of our magnitude-based edge detector such as to match |
|
the Canny edges as close as possible. We use the Sørensen-Dice coefficient, D, to assess the similarity |
|
between both types of edges. Both algorithms can be seen as classifier on a pixel level operating |
|
on two classes ({ edge, no-edge}), thereby outputting a mask with same dimensions of the original |
|
image. The Sørensen-Dice coefficient between one predicted edge mask (Edgehat) and the reference |
|
edge mask (Edgeref ) then writes: |
|
|
|
|
|
----- |
|
|
|
Table 1: Sørensen-Dice coefficient on held-out test of the magnitude-based method with respect to |
|
the Canny method. |
|
|
|
FASHIONMNIST CIFAR X-ray |
|
|
|
0.59±0.12 0.61±0.07 0.33±0.07 |
|
|
|
2TP |
|
(Edgehat, Edgeref ) = |
|
_D_ 2TP + FP + FN |
|
|
|
where TP, FP and FN stands for the number True Positive, False Positive and False Negative pixels |
|
predictions respectively. |
|
|
|
We then use Bayesian optimization of the Sørensen-Dice coefficient (Sorensen, 1948; Dice, 1945) on |
|
a training set to find the best hyperparameters of the magnitude-based edge detector. In particular, |
|
we tune the threshold at which magnitude vectors predict for an edge and the α2 from Equation |
|
1. We then report the metric on an hold-out test set, and the results are shown in Table 1. We ran |
|
this experiment on three different datasets : FashionMNIST, CIFAR and an X-ray image dataset. |
|
For the X-ray dataset, we use the patched version as described in Section 2.3. On Figure 3, we |
|
show examples from the held-out test set on the Fashion MNIST dataset. In Appendix A, we show |
|
examples from the two other datasets. A visual inspection shows that edges can be recovered quite |
|
accurately. However, the edges masks obtained through magnitude appear more noisy. |
|
|
|
Original Canny Detector Magnitude |
|
|
|
Figure 3: Examples of edge detection on the FashionMNIST dataset. The left-most column show |
|
the initial images, the center shows the edges mask obtained with the Canny detector, the right-most |
|
column shows the edges mask obtained with the magnitude-based approach. |
|
|
|
3.3 MAGNITUDE VECTORS AND ADVERSARIAL ROBUSTNESS |
|
|
|
As we saw in Subsection 3.2 and Supplementary Figure 6, the magnitude calculation transforms an |
|
original image into a representation of its edges. We now want to explore another facet of its utility |
|
in machine learning by investigating its potential use for improving adversarial robustness. In the |
|
following, the datasets we use are MNIST, KMNIST and FashionMNIST. |
|
|
|
The magnitude layer is a zero-parameter feature transformation consisting, sequentially, in |
|
|
|
|
|
----- |
|
|
|
1. the computation of the distance matrix, |
|
|
|
2. the exponentiation of the negative distance matrix element-wise, |
|
|
|
3. the inversion the resulting matrix, |
|
|
|
4. the summing of the resulting matrix over its rows. |
|
|
|
Crucially, this layer can be used as an input layer to any image-classification network. While the |
|
transformation is still, in principle, differentiable, we are presented with numerous opportunities to |
|
introduce a step function into the calculation. One possibility is inspired by feature squeezing (Xu |
|
et al., 2017), which quantizes the pixel values. In the magnitude calculation we can differentiably |
|
quantize the values of the inverse similarity matrix just before the summation in step 4. This step |
|
function ensures that the gradients with respect to the input data are zero and, therefore, any white-box |
|
attacks which rely on this gradient (such as FGSM, PGD, Carlini-Wagner) must fail. |
|
|
|
While the introduction of the step function guarantees trivial “robustness”, we also investigate the |
|
efficacy of transferred adversarial examples. To this end, we train two LeNet models (LeCun et al., |
|
1989) for 30 epochs each on our datasets, one with a magnitude input layer, one without (the base |
|
model). We then generate adversarial examples w.r.t. the base model and test them on the model |
|
with magnitude. Our results for an FGSM attack and various quantization levels are summarised |
|
in Table 2. We note that, although our model shows the best performance in terms of adversarial |
|
evasion, we emphasise that these results are not directly comparable. While simple FGSM attacks |
|
fail altogether on the squeezed magnitude layer due to the step function, we calculated robustness via |
|
a transfer of adversarials from a conventional LeNet model. The feature squeezing setup is similar |
|
to the original setup from Xu et al. (2017), however, we used our own quantization layer which |
|
divides the features into equally spaced levels and the models were trained on the squeezed images. |
|
The base model is a simple undefended LeNet. We also investigate the robustness properties of a |
|
non-quantized magnitude input layer which ℓ1 norm, which shows substantially increased robustness |
|
over the baselines of the undefended model and for large ϵ feature squeezing. |
|
|
|
The number of levels in the feature squeezing was chosen to be small due to the fact that there are |
|
large brightness in the datasets we have mainly white objects on black background and, therefore, |
|
a large compression would be expected to give the largest robustness. The levels of the magnitude |
|
vector were chosen to be larger as the models performed worse with smaller levels. |
|
|
|
|Col1|ϵ|Levels|Col4|Col5|Base|Squeeze Levels|Col8|Col9|Standalone ℓ1| |
|
|---|---|---|---|---|---|---|---|---|---| |
|
|||50|100|1000||2|3|5|| |
|
|MNIST|0 0.05 0.1 0.2 0.3 0.4 0.5 0.6 0.7|97.79 71.90 54.50 54.95 54.84 46.14 28.00 25.66 23.56|98.05 72.88 65.41 67.24 68.38 61.56 43.87 41.77 39.12|97.63 68.23 62.02 60.65 60.39 54.57 39.34 38.19 35.93|99.01 94.78 81.57 42.10 21.76 14.37 11.95 11.72 12.95|98.57 100.0 100.0 100.0 100.0 100.0 18.43 6.26 6.26|98.68 100.0 100.0 100.0 3.47 3.47 3.47 3.47 3.47|98.73 100.0 100.0 32.20 32.20 2.53 2.53 2.53 2.24|98.04 90.78 81.36 65.68 51.55 39.77 33.65 28.82 21.36| |
|
|KMNIST|0 0.05 0.1 0.2 0.3 0.4 0.5 0.6 0.7|81.23 53.91 32.52 27.05 27.94 24.60 16.61 16.52 16.24|83.75 64.27 42.15 34.93 36.94 31.7 22.27 21.74 20.94|80.26 53.25 30.29 26.14 28.20 24.77 16.43 16.51 16.39|88.06 78.34 55.54 18.69 6.56 3.97 3.68 3.85 4.03|87.60 100.0 100.0 100.0 100.0 100.0 19.42 5.31 5.31|87.84 100.0 100.0 100.0 3.89 3.89 3.89 3.89 3.89|88.25 100.0 100.0 19.88 19.88 2.31 2.31 2.31 1.90|83.26 70.77 55.85 41.64 35.44 31.22 28.74 25.26 20.85| |
|
|FashionMNIST|0 0.05 0.1 0.2 0.3 0.4 0.5 0.6 0.7|81.45 77.42 56.85 48.86 44.13 38.15 27.30 28.39 24.78|82.49 75.53 61.48 53.89 48.83 42.66 33.13 31.28 29.53|80.46 74.67 66.17 60.34 55.95 51.77 43.29 42.00 39.88|89.0 24.79 10.70 5.10 1.82 1.21 1.42 1.55 1.72|82.82 100.0 100.0 100.0 100.0 100.0 23.26 4.47 4.47|82.88 100.0 100.0 100.0 3.37 3.37 3.37 3.37 3.37|82.82 100.0 100.0 23.68 23.68 4.76 4.76 4.76 2.54|86.02 55.43 43.39 35.24 30.05 24.81 20.45 16.22 13.54| |
|
|
|
|
|
|
|
Table 2: The accuracies of the attacked networks. From left to right: quantized magnitude layer with |
|
different quantization levels, the undefended model, the model with feature squeezing and various |
|
levels, and the magnitude layer as an input layer. The zero epsilon values are the base accuries. For |
|
the non-zero epsilon, the accuracy is the ratio of non-adversarial examples to base accuracies. |
|
|
|
|
|
----- |
|
|
|
4 RELATED WORK |
|
|
|
An early version of magnitude has been introduced by Solow & Polasky (1994) where it was defined |
|
as an “effective number of species”, however with little mathematical development. The subject |
|
has been picked up formally by Leinster (2010) where connection to theory of enriched categories |
|
has been made. From there, the magnitude of many mathematical objects has been studied, such as |
|
spheres (Willerton, 2014), odd balls (Willerton, 2018; 2017), compact sets (Meckes, 2015; Leinster |
|
& Willerton, 2013), convex bodies (Meckes, 2020), and graphs (Leinster, 2019). |
|
|
|
As stated in the introduction, magnitude and magnitude vectors have not been studied from a machine |
|
learning perspective except for the work of Bunch et al. (2021) that analysed the capability of |
|
magnitude vectors for boundary detection. In contrast, our work extends the results from this study to |
|
images and provide the necessary theoretical grounds. To the best of our knowledge, this represents |
|
the first work exploring magnitude vectors for images. In particular, using magnitude vectors for |
|
adversarial robustness has never been explored before. |
|
|
|
5 DISCUSSION |
|
|
|
This paper investigated the magnitude vector of images. In the first part, we stated several theoretical |
|
properties of the magnitude vector on images and showed how the outlier detection property translates |
|
to edge detection in images. Although the theoretical part is based on analogies, rather than rigorous |
|
mathematical proofs, we can still distil the essence of the behaviour of the magnitude vector. In |
|
particular, these theoretical insights are confirmed by practical experiments. We also propose an |
|
algorithm for an approximate calculation of the magnitude vector leading to a significant speed up of |
|
computations. While this works well in practice, as shown in Subsection 3.1, the exact computation |
|
of magnitude vector on a metric space with a large number of points still remains an open problem. |
|
|
|
Our algorithm to speed up the computation allows magnitude to be used more generally as the |
|
computational bottleneck of its naive computation can be bypassed. Investigating the capabilities of |
|
magnitude vectors in machine learning applications, we studied two application areas in more depth, |
|
namely edge detection and adversarial robustness, showing promising and overall favourable results |
|
in both cases. |
|
|
|
As for edge detection, we compared a magnitude-based approached with the well known Canny edge |
|
detector. We show that the overlap between the edges found by the Canny detector and the magnitude |
|
vector is noteworthy, though the magnitude vector can appear a bit noisier. In particular, for low |
|
resolution colour images (e.g. CIFAR-10), the edges are very noisy and a thorough investigation |
|
for the magnitude vector on colour images should be conducted. This is nonetheless impressive, |
|
since the magnitude vector is a general-purpose image transformation, unlike the Canny detector, and |
|
therefore shows an exciting potential for improving exisiting edge detection tools. |
|
|
|
We also considered a potential application of the magnitude vector as a simple mean of defence |
|
against adversarial attacks. We first noted that a step function such as a quantization function can |
|
be introduced into the magnitude calculation, providing a trivial but effective defence against any |
|
white box attack. We went further by investigating the property of transferred adversarial examples. |
|
Our model shows reasonable robustness and performs better than the baseline method of feature |
|
squeezing. With these encouraging results we aim to investigate the use of magnitude in adversarial |
|
evasion and detection in future work. |
|
|
|
REPRODUCIBILITY STATEMENT |
|
|
|
We provide all code to create figures and results in this paper in the supplementary material. All |
|
datasets are publicly available and downloadable via the code provided. |
|
|
|
REFERENCES |
|
|
|
Eric Bunch, Jeffery Kline, Daniel Dickinson, Suhaas Bhat, and Glenn Fung. Weighting vectors for |
|
machine learning: numerical harmonic analysis applied to boundary detection. arXiv preprint |
|
_arXiv:2106.00827, 2021._ |
|
|
|
|
|
----- |
|
|
|
John Canny. A computational approach to edge detection. IEEE Transactions on pattern analysis |
|
_and machine intelligence, (6):679–698, 1986._ |
|
|
|
Lee R Dice. Measures of the amount of ecologic association between species. Ecology, 26(3): |
|
297–302, 1945. |
|
|
|
Oleksiy Dovgoshey and Olli Martio. Products of metric spaces, covering numbers, packing numbers |
|
and characterizations of ultrametric spaces. arXiv preprint arXiv:0903.1526, 2009. |
|
|
|
Gerald B Folland. Real analysis: modern techniques and their applications, volume 40. John Wiley |
|
& Sons, 1999. |
|
|
|
Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne |
|
Hubbard, and Lawrence D Jackel. Backpropagation applied to handwritten zip code recognition. |
|
_Neural computation, 1(4):541–551, 1989._ |
|
|
|
Tom Leinster. The magnitude of metric spaces. arXiv preprint arXiv:1012.5857, 2010. |
|
|
|
Tom Leinster. The magnitude of a graph. In Mathematical Proceedings of the Cambridge Philosoph_ical Society, volume 166, pp. 247–264. Cambridge University Press, 2019._ |
|
|
|
Tom Leinster and Simon Willerton. On the asymptotic magnitude of subsets of euclidean space. |
|
_Geometriae Dedicata, 164(1):287–310, 2013._ |
|
|
|
Tom Leinster, Mark W Meckes, and Nicola Gigli. The magnitude of a metric space: from category |
|
theory to geometric measure theory. pp. 156–193, 2017. |
|
|
|
Mark W Meckes. Positive definite metric spaces. Positivity, 17(3):733–757, 2013. |
|
|
|
Mark W Meckes. Magnitude, diversity, capacities, and dimensions of metric spaces. Potential |
|
_Analysis, 42(2):549–572, 2015._ |
|
|
|
Mark W Meckes. On the magnitude and intrinsic volumes of a convex body in euclidean space. |
|
_Mathematika, 66(2):343–355, 2020._ |
|
|
|
Andrew R Solow and Stephen Polasky. Measuring biological diversity. Environmental and Ecological |
|
_Statistics, 1(2):95–103, 1994._ |
|
|
|
Th A Sorensen. A method of establishing groups of equal amplitude in plant sociology based on |
|
similarity of species content and its application to analyses of the vegetation on danish commons. |
|
_Biol. Skar., 5:1–34, 1948._ |
|
|
|
Xiaosong Wang, Yifan Peng, Le Lu, Zhiyong Lu, Mohammadhadi Bagheri, and Ronald M Summers. |
|
Chestx-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. In Proceedings of the IEEE conference on |
|
_computer vision and pattern recognition, pp. 2097–2106, 2017._ |
|
|
|
Simon Willerton. On the magnitude of spheres, surfaces and other homogeneous spaces. Geometriae |
|
_Dedicata, 168(1):291–310, 2014._ |
|
|
|
Simon Willerton. The magnitude of odd balls via hankel determinants of reverse bessel polynomials. |
|
_arXiv preprint arXiv:1708.03227, 2017._ |
|
|
|
Simon Willerton. On the magnitude of odd balls via potential functions. _arXiv preprint_ |
|
_arXiv:1804.02174, 2018._ |
|
|
|
Weilin Xu, David Evans, and Yanjun Qi. Feature squeezing: Detecting adversarial examples in deep |
|
neural networks. arXiv preprint arXiv:1704.01155, 2017. |
|
|
|
A SUPPLEMENTARY ILLUSTRATIONS OF THE EDGE DETECTION CAPABILITIES |
|
OF THE MAGNITUDE VECTORS. |
|
|
|
On Figures 4 and 5, we show examples of edge detection from the test sets of the CIFAR and X-ray |
|
datasets respectively. |
|
|
|
|
|
----- |
|
|
|
Original Canny Detector Magnitude |
|
|
|
Figure 4: Three randomly selected examples of edge detection on the CIFAR dataset. Left left-most |
|
column show the initial images, the center shows the edges mask obtained with the Canny detector, |
|
the right-most column shows the edges mask obtained with the magnitude-based approach. |
|
|
|
Original Canny Detector Magnitude |
|
|
|
Figure 5: Two randomly selected examples of edge detection on the X-ray dataset. Left left-most |
|
column show the initial images, the center shows the edges mask obtained with the Canny detector, |
|
the right-most column shows the edges mask obtained with the magnitude-based approach. |
|
|
|
B MAGNITUDE VECTORS WITH DIFFERENT METRICS |
|
|
|
|
|
----- |
|
|
|
_p = 1, p[′]_ = 1 _p = 2, p[′]_ = 1 |
|
|
|
|
|
Original Hamming _p = 1_ _p = 2_ Hamming, |
|
_p[′]_ = 1 |
|
|
|
|
|
Figure 6: A comparison of the effect of different metrics in the magnitude calculation on KMNIST. |
|
The first column is the original image and the following three columns are the magnitude vectors |
|
with different ℓp metrics. The last three columns use a product space metric with α1 = α = 2 = 1, |
|
grid metric ℓp and brightness metric ℓp′ . |
|
|
|
C THE MAGNITUDE OF FULL SCALE IMAGES |
|
|
|
|
|
0 1.0 0 1.0 0 1.0 0 1.0 0 1.0 |
|
|
|
200 0.8 200 0.8 200 0.8 200 0.8 200 0.8 |
|
|
|
400 0.6 400 0.6 400 0.6 400 0.6 400 0.6 |
|
|
|
600 0.4 600 0.4 600 0.4 600 0.4 600 0.4 |
|
|
|
800 0.2 800 0.2 800 0.2 800 0.2 800 0.2 |
|
|
|
1000 0 200 400 600 800 1000 0.0 1000 0 200 400 600 800 1000 0.0 1000 0 200 400 600 800 1000 0.0 1000 0 200 400 600 800 1000 0.0 1000 0 200 400 600 800 1000 0.0 |
|
|
|
|
|
Original |
|
|
|
|
|
_α2 = 1_ |
|
|
|
|
|
_α2 = 100_ |
|
|
|
|
|
_α2 = 200_ |
|
|
|
|
|
_α2 = 300_ |
|
|
|
|
|
Figure 7: An illustration of the effect of α2 on the magnitude calculation of full-scale images. |
|
|
|
COMPARISON OF THE FILTRATION AND THE FULL MAGNITUDE |
|
|
|
|
|
----- |
|
|
|
Original Full magnitude Thresholds: {0, 1} Thresholds: |
|
_{0, 0.5, 1}_ |
|
|
|
|
|
Thresholds: |
|
_{0, 0.1, . . ., 1}_ |
|
|
|
|
|
Figure 8: An illustration of the magnitude based on filtrations with metric ℓ1. |
|
|
|
|
|
----- |
|
|
|
|