Image Cluster Compression using Partitioned Iterated Function Systems and Efficient Inter-Image Similarity Features
Matthias Kramm
Technical University of Munich, Institute for Computer Science, Boltzmannstr. 3, D-85748 Garching. Email: kramm@in.tum.de
Abstract—When dealing with large-scale image archive systems, efficient data compression is crucial for the economic storage of data. Currently, most image compression algorithms only work on a per-picture basis; however, most image databases (both private and commercial) contain high redundancies between images, especially when many images of the same objects, persons, or locations, or images made with the same camera, exist. In order to exploit those correlations, it is desirable to apply image compression not only to individual images, but also to groups of images, in order to gain better compression rates by exploiting inter-image redundancies. This paper proposes to employ a multi-image fractal Partitioned Iterated Function System (PIFS) for compressing image groups and exploiting correlations between images. In order to partition an image database into optimal groups to be compressed with this algorithm, a number of metrics are derived based on the normalized compression distance (NCD) of the PIFS algorithm. We compare a number of relational and hierarchical clustering algorithms based on said metric. In particular, we show how a reasonably good approximation of optimal image clusters can be obtained by an approximation of the NCD and nCut clustering. While the results in this paper are primarily derived from PIFS, they can also be leveraged against other compression algorithms for image groups.
I. INTRODUCTION
Extending image compression to multiple images has not attracted much research so far. The only exceptions are the areas of hyperspectral compression [1]–[3] and, of course, video compression [4], which both handle the special case of compressing highly correlated images of exactly the same size.

Concerning generalized image group compression, we recently researched an algorithm which works by building a special eigenimage library for extracting principal-component-based similarities between images. While the algorithm presented in [5] is quite fast, and manages to merge low-scale redundancy from multiple images, it fails to detect redundancies at more global scales (in particular, similar image parts which are both translated and scaled), and also has the problem of becoming "saturated" quite fast (i.e., the more images in a group, the worse the additional compression rate of the individual images), which limits the size of possible image groups.

In this paper, we present a novel algorithm for image groups, which is based on PIFS compression [6], and thus manages to exploit several high-level redundancies, in particular scaled image parts.

Compression of image sequences using PIFS was done previously (in the context of video compression) in [7], [8]. However, in these papers, both the frames/images contributing to one compression group and the order of those images are predetermined by the video sequence. Furthermore, images need to be of the same size, which can't be assumed for most real-world image databases. Here, we specify a multi-image PIFS algorithm which works on images of arbitrary sizes, and also allows clustering image databases into groups so that the compression of each group is optimized.

The rest of this paper is organized as follows: We first derive the multi-image PIFS algorithm by generalizing the single-image PIFS algorithm. We also describe a way to optimize said algorithm using DFT lookup tables. Afterwards, we take on the problem of combining the "right" images into groups, by first describing efficient ways to compute a distance function between two images, and then, in the next section, comparing a number of clustering algorithms working on such a distance. The final algorithm is evaluated by compression runs over a photo database consisting of 3928 images.

II. THE COMPRESSION ALGORITHM
PIFS algorithms work by adaptively splitting an image $I$ into a number of non-overlapping rectangular "range" blocks $R_1 \ldots R_n$ (using a quadtree algorithm with an error threshold $\epsilon_{max}$), and then mapping each range block $R$ onto a "domain" block $D$ (with $D$ being selected from a number of rectangular overlapping domain blocks $D_1, \ldots, D_m$ from the same image), which is scaled to the dimensions of $R$ by an affine transform, resulting in a block $\hat{D}$, and is henceforth processed by a contrast scaling $c$ and a luminance shift $l$:

$$R_{xy} = c \hat{D}_{xy} + l \qquad (1)$$
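As an illustration of equation (1) together with the spatial contraction, here is a minimal sketch (assuming, as in our implementation, domain blocks twice the range block size, and assuming 2x2 averaging as the downscaling filter; the helper name is ours, not from the original algorithm):

```python
import numpy as np

def map_domain_block(D, c, l):
    """Scale a (2h x 2w) domain block D down to (h x w) by 2x2 averaging,
    yielding D_hat, then apply contrast scaling c and luminance shift l
    as in equation (1): R_xy = c * D_hat_xy + l."""
    h, w = D.shape[0] // 2, D.shape[1] // 2
    # reshape so axes 1 and 3 enumerate each 2x2 cell, then average them
    D_hat = D.reshape(h, 2, w, 2).mean(axis=(1, 3))
    return c * D_hat + l
```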
The contrast and luminance parameters can either be derived using a search operation over a number of discrete values $c_1, \ldots, c_N$ and $l_1, \ldots, l_M$:

$$c_{R,D}, l_{R,D} = \operatorname*{argmin}_{c_i, l_j} \sum_{x,y \in \dim(R)} (c_i \hat{D}_{xy} + l_j - R_{xy})^2$$
They can also be calculated directly by linear regression:

$$c_{R,D} = \frac{|R| \sum \hat{D}_{xy} R_{xy} - \sum \hat{D}_{xy} \sum R_{xy}}{|R| \sum \hat{D}_{xy}^2 - \left(\sum \hat{D}_{xy}\right)^2} \qquad (2)$$

$$l_{R,D} = \frac{1}{|R|} \sum \left(R_{xy} - c_{R,D} \hat{D}_{xy}\right) \qquad (3)$$

with $\sum = \sum_{x,y \in \dim(R)}$.
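Equations (2) and (3) are the standard least-squares solution for one block; a small sketch (the function name is ours), with a guard for flat domain blocks, where the denominator of (2) vanishes:

```python
import numpy as np

def contrast_luminance(R, D_hat):
    """Least-squares fit of R ~ c * D_hat + l over one block,
    following equations (2) and (3). R and D_hat are equally sized
    2-D arrays (the domain block already scaled to R's dimensions)."""
    n = R.size                      # |R|, the number of pixels in the block
    sum_d = D_hat.sum()
    sum_r = R.sum()
    sum_dr = (D_hat * R).sum()
    sum_d2 = (D_hat ** 2).sum()
    denom = n * sum_d2 - sum_d ** 2
    if denom == 0:
        c = 0.0                     # flat domain block: pure luminance shift
    else:
        c = (n * sum_dr - sum_d * sum_r) / denom
    l = (sum_r - c * sum_d) / n     # equation (3)
    return c, l
```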
The quadratic error between a range block and its domain block mapping is, for both cases:

$$\epsilon = \sum \left(c_{R,D} \hat{D}_{xy} + l_{R,D} - R_{xy}\right)^2$$

which can also be written as

$$\epsilon = c_{R,D}^2 \sum \hat{D}_{xy}^2 + |R|\, l_{R,D}^2 + \sum R_{xy}^2 - 2 l_{R,D} \sum R_{xy} + 2 c_{R,D} l_{R,D} \sum \hat{D}_{xy} - 2 c_{R,D} \sum \hat{D}_{xy} R_{xy}$$
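The expanded form matters because every sum in it can be precalculated per block. A quick numerical check that both forms of $\epsilon$ agree (block contents and parameter values are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
R = rng.random((8, 8))        # range block (hypothetical data)
D = rng.random((8, 8))        # scaled domain block D_hat (hypothetical data)
c, l = 0.7, 0.1               # arbitrary quantized parameters

# Direct form: sum over (c*D_hat + l - R)^2
err_direct = ((c * D + l - R) ** 2).sum()

# Expanded form, using only block-wise sums that can be precalculated
n = R.size
err_sums = (c**2 * (D**2).sum() + n * l**2 + (R**2).sum()
            - 2 * l * R.sum() + 2 * c * l * D.sum()
            - 2 * c * (D * R).sum())

assert np.isclose(err_direct, err_sums)
```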
(In [9], [10], the idea was brought forth to use the transform $R_{xy} = c(\hat{D}_{xy} - \bar{\hat{D}}) + \bar{R}$, with $\bar{\hat{D}}$ and $\bar{R}$ the block means, so that only the contrast parameter $c$ needs to be derived, which provides slightly better image quality if quantization is taken into account for the calculation of $c$. In our method, we however use the linear regression model, for simplicity.)

The domain block $D$ to be mapped to the range block $R$ needs to be searched among all available domain blocks $\mathcal{D}_I$ from the image $I$, by minimizing:

$$D = \operatorname*{argmin}_{D \in \mathcal{D}_I} \sum \left(c_{R,D} \hat{D}_{xy} + l_{R,D} - R_{xy}\right)^2 \qquad (4)$$

In the proposed multi-image compression method, equation (4) is now extended to a group of images $\mathcal{I}$:

$$D = \operatorname*{argmin}_{D \in \mathcal{D}_I,\, I \in \mathcal{I}} \sum \left(c_{R,D} \hat{D}_{xy} + l_{R,D} - R_{xy}\right)^2 \qquad (5)$$

Hence, domain blocks are collected from a number of images, and images are also allowed to cross-reference each other (see Fig. 1). Decompression works by crosswise recursion, in which all images are decompressed simultaneously (see Fig. 2). In our implementation, we assume that domain blocks are always twice the size of range blocks, similar to the algorithm described in [11].
Fig. 1. Cross-references between PIFS-compressed images of an image group.
Fig. 2. Crosswise recursive decompression of two images.
Notice that, e.g., in [12], methods were devised which don't require any searching of domain blocks. They are, however, based on the assumption that a given domain block is always most similar to its immediate surroundings, a fact not extensible to multi-image compression.

The search for domain blocks in our algorithm is performed by preprocessing a number of relevant parameters, in particular $\sum \hat{D}_{xy}$, $\sum \hat{D}_{xy}^2$, $\sum R_{xy}$ and $\sum R_{xy}^2$, so that only

$$\sum R_{xy} \hat{D}_{xy} \qquad (6)$$

needs to be calculated for each combination of range and domain block.

The calculation of (6), as well as the preprocessing of $\sum \hat{D}_{xy}$, $\sum \hat{D}_{xy}^2$, $\sum R_{xy}$ and $\sum R_{xy}^2$, can be done very efficiently using Fast Fourier Transforms, analogous to the covariance method used in [13], which takes advantage of overlapping domain blocks:
$\sum R_{xy} D_{xy}$ can be calculated for all domain blocks $D_1, \ldots, D_m$ simultaneously for a given $R$, by preprocessing $\mathcal{F}(I_j)^C$ for all $I_j \in \{I_1, I_2, \ldots, I_n\}$ subsampled¹ by factor 2, and then filtering all those images using the range block ($A \cdot B$ denotes element-wise multiplication):

$$M = \operatorname{Cov}(I_j, R) = \mathcal{F}^{-1}\left(\mathcal{F}(I_j)^C \cdot \mathcal{F}(R)\right) \qquad (7)$$

In the array $M$, $M(u,v)$ will then contain $\sum R_{xy} \hat{D}_{xy}$ for the domain block $\hat{D}$ at the position $2u, 2v$ in the image to be compressed, so that the matching of a domain block against a range block can be done in 10 flops, analogous to the single-image variant presented in [13].

The algorithm hence uses one preprocessing loop and two nested loops over all images:
Algorithm 1 MULTIIMAGEPIFS($I_1, I_2, \ldots, I_n$)
1: for $I_j \in \{I_1, I_2, \ldots, I_n\}$ do
2:   Scale $I_j$ down by 2, store result in $S$
3:   Precalculate:
4:     $F_j = \mathcal{F}(S)^C$
5:     $r_{j,u,v} = \sum_{x<u, y<v} I_{j,x,y}$ for all $u, v$
6:     $r^2_{j,u,v} = \sum_{x<u, y<v} I^2_{j,x,y}$ for all $u, v$
7:     $d_{j,u,v} = \sum_{x<u, y<v} S_{x,y}$ for all $u, v$
8:     $d^2_{j,u,v} = \sum_{x<u, y<v} S^2_{x,y}$ for all $u, v$
9: end for
10: for $I_j \in \{I_1, I_2, \ldots, I_n\}$ do
11:   for all range blocks $R \in I_j$ do
12:     Calculate $K = \mathcal{F}(R)$
13:     for $I_k \in \{I_1, I_2, \ldots, I_n\}$ do
14:       Calculate $M = \mathcal{F}^{-1}(F_k \cdot K)$
15:       for all domain blocks $D_{k,m} \in I_k$ do
16:         Calculate $c, l$ from $M, r_j, r^2_j, d_k, d^2_k$
17:         Quantize $c, l$
18:         Calculate $\epsilon_{k,m}$ from $c, l, M, r_j, r^2_j, d_k, d^2_k$
19:       end for
20:     end for
21:     Find smallest $\epsilon$ from all $\epsilon_{k,m}$
22:     if $\epsilon > \epsilon_{max}$ then
23:       Split range block $R$
24:     else
25:       Write out $k, m, c, l$
26:     end if
27:   end for
28: end for
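The two building blocks of Algorithm 1 can be sketched in a few lines of numpy: the cumulative tables of lines 5–8 (summed-area tables, so any block sum costs four lookups), and the FFT-based correlation surface of line 14 / equation (7). All names here are illustrative, not from the paper; the index convention of the correlation surface is an assumption of this sketch.

```python
import numpy as np

def integral_table(img):
    """Summed-area table T with T[u, v] = sum of img[x, y] for x < u, y < v,
    matching the r_{j,u,v} / d_{j,u,v} precalculations in Algorithm 1."""
    T = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    T[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return T

def block_sum(T, x, y, h, w):
    """Sum over img[x:x+h, y:y+w], via four table lookups."""
    return T[x + h, y + w] - T[x, y + w] - T[x + h, y] + T[x, y]

def correlation_surface(image, R, size):
    """M = F^{-1}(F(image)^C * F(R)) as in equation (7); `size` is the
    common zero-extended transform size. With this convention, the sum
    R_xy * D_xy for the block at offset (u, v) ends up at the mirrored
    index M[(-u) % size[0], (-v) % size[1]]."""
    F_img = np.conj(np.fft.rfft2(image, size))   # precalculable per image
    return np.fft.irfft2(np.fft.rfft2(R, size) * F_img, size)

rng = np.random.default_rng(0)
img = rng.random((32, 32))   # hypothetical (already subsampled) image
R = rng.random((4, 4))       # hypothetical range block

# Block sums from the table agree with direct summation:
T = integral_table(img)
assert np.isclose(block_sum(T, 4, 6, 8, 8), img[4:12, 6:14].sum())

# The correlation surface agrees with a direct dot product at (u, v) = (5, 9):
M = correlation_surface(img, R, (64, 64))
assert np.isclose(M[(-5) % 64, (-9) % 64], (R * img[5:9, 9:13]).sum())
```

Note that with this surface, one candidate position is matched against a range block using only the precalculated sums plus one lookup in `M`, which is where the constant per-block cost of the inner loop comes from.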
III. IMAGE SIMILARITY

Image databases typically consist of tens of thousands of images. The algorithm needs to compress all images in a group as a whole², and, more importantly, also needs to decompress the whole group in order to retrieve a single image.
¹We assume a fixed size for the Fourier transform, big enough to encompass all images $I_1, \ldots, I_n$. When $\mathcal{F}$ is applied to an image or a block smaller than this size, it is zero-extended.
²Images can be added to a compressed file by allowing the new image to reference the already existing images, but not vice versa. However, adding an image to an existing compressed file provides worse compression results than adding the image to the initial set. It is also slower, as all domain block information needs to be calculated again.
Hence, it is desirable to split the input data into manageable clusters. Here, the opportunity presents itself to organize the clusters in a way that optimizes compression, i.e., that takes into account which images benefit most from each other when placed into the same cluster.

In order to partition the database in such a way, a metric specifying a kind of compression distance between two images needs to be devised (so that the clustering algorithm will know which images are "similar" and should be placed in the same group). Using the normalized compression distance (NCD) from [14], this can be expressed as
$$NCD_{I_1,I_2} = \frac{C_{I_1,I_2} - \min(C_{I_1}, C_{I_2})}{\max(C_{I_1}, C_{I_2})} \qquad (8)$$

with $C_{I_1,\ldots,I_n}$ the compressed filesize of compressing the images $I_1, \ldots, I_n$ together, and $C_{I_k}$ the filesize of a compression run on just a single image.

This metric can be interpreted both as the similarity between $I_1$ and $I_2$ and as the quality of a cluster formed by $I_1, I_2$. A lower value of $NCD_{I_1,I_2}$ denotes that $I_1$ and $I_2$ resemble each other more closely.

It is important to notice that, in our case, the NCD is not necessarily a "true" metric (the PIFS compressor is not a "normal" compressor [14]). In particular, $NCD_{I,I} \neq 0$ if the PIFS algorithm considers only domain blocks larger than region blocks (as in our case³). This is due to the fact that the "second" image doesn't contain any additional information which improves the compression of the first image (its domain blocks don't differ from those already available from the first image). This abnormal behaviour of the metric function disappears, however, once the images are at least slightly dissimilar (see Fig. 3), so it doesn't present a problem in practical applications.

We found that, at least for some clustering algorithms, it is sometimes more efficient and produces better results to work on a slightly simpler function, the number of preserved bytes:
$$b^+_{I_1,I_2,\ldots,I_n} = C_{I_1} + C_{I_2} + \ldots + C_{I_n} - C_{I_1,I_2,\ldots,I_n} \qquad (9)$$

The function $b^+$ can also be applied to image groups of more than two images; it describes the number of bytes that were saved by combining the images $I_1, I_2, \ldots, I_n$ into a common cluster, which is also a more intuitive way of defining a similarity function. The higher the value of $b^+_{I_1,I_2,\ldots,I_n}$, the more resemblance between $I_1, I_2, \ldots, I_n$.

Since a huge number of images need to be "compared" during clustering, it is advisable to find faster approximations to
³It is possible for region blocks and domain blocks to be of the same size with the algorithm still converging, as long as the mappings between images are never circular. This can be accomplished by disallowing mappings from any image $I_j$ to the images $I_j, I_{j+1}, \ldots, I_n$. Algorithms constructed using this kind of model bear a close resemblance to the motion compensation technique used in video compression.
Fig. 3. NCD image similarity based on PIFS: an image is not similar to itself under the fractal NCD metric if domain blocks are always larger than region blocks.
the metrics (8) and (9). An obvious approach is to count the number of mappings spanning between images (i.e., where the domain block is in a different image than the range block), as opposed to mappings which stay within the image (domain block and range block are in the same image); also see Fig. 5.

It is also possible to, instead of counting the number of references, calculate the sum of $\epsilon_{I_j} - \epsilon_{I_1,\ldots,I_{j-1},I_{j+1},\ldots,I_n}$ over all inter-image references (with $\epsilon_{I_j}$ being the smallest error for domain blocks out of image $I_j$, and $\epsilon_{I_1,\ldots,I_{j-1},I_{j+1},\ldots,I_n}$ the smallest error for domain blocks out of all other images), a value which grows larger the more the mapping error is reduced by introducing more images. This type of metric has the advantage that we don't necessarily need to evaluate all range blocks, but can randomly pick a sample and derive the metric only for that sample, thereby optimizing the speed of the image comparison.

However, it is also a somewhat artificial approach, and doesn't relate too well to the PIFS compression properties: it doesn't take into account the extra bits needed to store (more) image-to-image references, and it is also hard to model the error threshold behaviour, where a new quadtree node is introduced every time a range block can't be sufficiently encoded at a specific scale, causing the output stream size to increase non-linearly.

Another option is to introduce an approximation to the NCD by subsampling the images in question by a factor of two, or even four, and then calculating the NCD on the subsampled images according to equation (8). This metric is a closer approximation to the NCD than the other two metrics (see Fig. 4), and is also the metric which we chose for subsequent experiments.
Fig. 4. NCD compared with an NCD derived from images which were subsampled by a factor of two in both dimensions before compression. Each dot corresponds to an image pair, randomly selected from 128 example images.
While the subsampling and subsequent domain block searches can be performed quite efficiently (especially when using the per-image precalculations introduced in the last section), we still have to do a compression run on each image pair, which, especially for large databases, may take some time.

We hence also tried a different approach, and tested approximations of the "fractal similarity" using classical visual image similarity features, like tf/idf ratios of local Gabor filter and luminance histograms. These are features which can be precalculated for each individual image, and furthermore can