www.pnas.org/cgi/doi/10.1073/pnas.1800083115
Supplementary Information for
History of art paintings through the lens of entropy and complexity
Higor Y. D. Sigaki, Matjaž Perc, and Haroldo V. Ribeiro
To whom correspondence should be addressed. E-mail: [email protected] or [email protected]
This PDF file includes:
Supplementary text
Figs. S1 to S11
Higor Y. D. Sigaki, Matjaž Perc, and Haroldo V. Ribeiro 1 of 12
Supporting Information Text
[Figure S1: two panels of complexity, C, versus entropy, H (H axis from 0.80 to 0.96; C axis from 0.08 to 0.14). Period labels: 1031-1570, 1570-1760, 1760-1836, 1836-1869, 1869-1880, 1895-1902, 1902-1909, 1939-1952, 1952-1962, 1962-1970, 1970-1980, 1980-1994, 1994-2016.]
Fig. S1. Robustness of the evolution trends against sampling. Each gray curve corresponds to the average values of H and C obtained by randomly sampling 30% (left panel) and 10% (right panel) of the images in the dataset. A total of 100 different realizations of the sampling procedure were made. The black curves depict the average trend obtained with the full dataset. We observe that the historical trends displayed by the average values of H and C are robust against sampling, even when using only 10% of the images.
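The sampling procedure described in this caption can be sketched as follows. This is only an illustration under stated assumptions: the record layout `(period, H, C)`, the function names, and the toy values are all hypothetical and do not come from the authors' code or data.

```python
import random
from collections import defaultdict


def period_means(records):
    """Average H and C per period; records are (period, H, C) tuples."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for period, h, c in records:
        acc = sums[period]
        acc[0] += h
        acc[1] += c
        acc[2] += 1
    return {p: (s[0] / s[2], s[1] / s[2]) for p, s in sums.items()}


def subsampled_trends(records, fraction, realizations=100, seed=0):
    """Return one {period: (mean H, mean C)} dict per random subsample."""
    rng = random.Random(seed)
    size = max(1, round(fraction * len(records)))
    # Each realization draws `size` images without replacement.
    return [period_means(rng.sample(records, size)) for _ in range(realizations)]


# Hypothetical toy data: three made-up periods with made-up H and C values.
toy_rng = random.Random(1)
records = [
    (p, toy_rng.gauss(mu_h, 0.02), toy_rng.gauss(mu_c, 0.01))
    for p, mu_h, mu_c in [("A", 0.85, 0.13), ("B", 0.90, 0.11), ("C", 0.94, 0.08)]
    for _ in range(500)
]
full = period_means(records)                      # analogue of the black curve
curves = subsampled_trends(records, fraction=0.10)  # analogue of the gray curves
```

Comparing each entry of `curves` with `full` gives a rough numerical counterpart of the visual comparison between gray and black curves.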
Fig. S2. The relationship between the values of H and C, calculated by means of the average RGB channels and by means of the gray-scale luminance transformation. Each dot in the scatter plots shows the values of H and C for each image, as obtained through the average values of the three color shades of each pixel, and through the gray-scale luminance transformation. We observe that both transformations yield strongly correlated values of H and C.
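The two gray-scale conversions compared in this caption can be illustrated with a short sketch. The luminance weights below (0.299, 0.587, 0.114) are an assumption, since the text does not state which weights were used; the function names and the toy image are likewise hypothetical.

```python
def gray_average(pixel):
    """Gray level as the plain average of the three color channels."""
    r, g, b = pixel
    return (r + g + b) / 3.0


def gray_luminance(pixel, weights=(0.299, 0.587, 0.114)):
    """Gray level as a weighted sum; the default weights are an assumption."""
    r, g, b = pixel
    wr, wg, wb = weights
    return wr * r + wg * g + wb * b


def to_gray(image, transform):
    """Apply a per-pixel transform to a nested list of (R, G, B) tuples."""
    return [[transform(px) for px in row] for row in image]


# Hypothetical 2x2 toy image.
image = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (128, 128, 128)]]
avg = to_gray(image, gray_average)
lum = to_gray(image, gray_luminance)
```

The two transformations agree on neutral gray pixels and differ on saturated colors, which is why a per-image comparison of the resulting H and C values is informative.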
[Figure S3: probability distribution (logarithmic axis) versus image dimensions (0 to 4000) for image width and image height. Annotations: width = 895 [313, 2491] (95% interval); height = 913 [323, 2702] (95% interval).]
Fig. S3. Probability distribution of image dimensions. The red and blue curves show the probability distributions of the widths and heights of all images in our dataset on a log-linear plot. It can be observed that the width and height have a similar distribution and practically the same average value (895 pixels for width and 913 pixels for height). The shaded regions represent the intervals of width and height containing 95% of all images.
Fig. S4. Complexity measures H and C are uncorrelated with image dimensions. The scatter plots depict the values of H (left panels) and C (right panels) versus the image length defined as the square root of the image area (that is, $\sqrt{n_x n_y}$, where $n_x$ is the image width and $n_y$ is the image height). The first row shows the relationship on a linear scale, the second on a linear-log scale, and the third row on a log-log scale. Each dot represents an image in our dataset. We observe no correlations between the complexity measures and image length. In particular, the Pearson linear correlation is ≈ 0.05 for the relationship between the image length and H, and ≈ 0.01 for C. Also, no significant correlation is detected by the maximal information coefficient (MIC), whose values are ≈ 0.07 for both relationships. This analysis indicates that our results obtained with embedding dimensions $d_x = d_y = 2$ are not biased by image dimensions.
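As a minimal sketch of the check described in this caption, the image length and the Pearson linear correlation can be computed as below. The function names and the sample values are hypothetical, and the maximal information coefficient is not reproduced here.

```python
import math


def image_length(width, height):
    """Image length defined as the square root of the image area."""
    return math.sqrt(width * height)


def pearson(xs, ys):
    """Pearson linear correlation coefficient between two sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical image sizes (width, height) and made-up entropy values.
sizes = [(895, 913), (313, 323), (2491, 2702), (1200, 800)]
entropies = [0.91, 0.88, 0.90, 0.93]
lengths = [image_length(w, h) for w, h in sizes]
r = pearson(lengths, entropies)
```

A value of `r` close to zero over the whole dataset would correspond to the absence of linear correlation reported in the caption.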
[Figure S5: bar plot of the total number of images by style (logarithmic axis). Styles as listed in the plot: Impressionism, Realism, Romanticism, Expressionism, Post-Impressionism, Art Nouveau (Modern), Surrealism, Baroque, Symbolism, Abstract Expressionism, Naïve Art (Primitivism), Neoclassicism, Cubism, Rococo, Northern Renaissance, Minimalism, Art Informel, Abstract Art, Color Field Painting, Pop Art, Ukiyo-e, Mannerism (Late Renaissance), Early Renaissance, High Renaissance, Magic Realism, Conceptual Art, Academicism, Neo-Expressionism, Op Art, Lyrical Abstraction, Art Deco, Contemporary Realism, Concretism, Fauvism, Nouveau Réalisme (New Realism), Neo-Romanticism, Hard Edge Painting, Post-Minimalism, Tachisme, Ink and wash painting, Pointillism, Sōsaku hanga, Social Realism, Naturalism, Constructivism, Shin-hanga, Luminism, Dada, Orientalism, Divisionism, Regionalism, Neo-Dada, Fantastic Realism, Art Brut, Precisionism, Futurism, American Realism, Proto Renaissance, Light and Space, Socialist Realism, Post-Painterly Abstraction, Feminist Art, Orphism, Neo-Minimalism, Classicism, Kinetic Art, Neo-Pop Art, Street art, Tenebrism, Pictorialism, International Gothic, Photorealism, Tonalism, Suprematism, Metaphysical art, New European Painting, Cloisonnism, Cubo-Futurism, Neoplasticism, Kitsch, Purism, Muralism, Spatialism, Neo-baroque, Biedermeier, Zen, Neo-Geo, P&D (Pattern and Decoration), Intimism, Action painting, Byzantine, Neo-Rococo.]
Fig. S5. Image distribution among different artistic styles in our dataset. The bar plot shows the number of images for all the 92 different styles that have at least 100 images each.
[Figure S6: complexity, C, versus entropy, H, for the average values of each style, shown in a main panel and three insets. Style labels recovered for each panel:
Main panel (H from 0.60 to 1.00): Art Deco, Color Field Painting, Conceptual Art, Concretism, Constructivism, Hard Edge Painting, Kinetic Art, Light and Space, Minimalism, Naturalism, Neo-Dada, Neo-Geo, Neo-Minimalism, Neo-Pop Art, Neoplasticism, Op Art, Post-Minimalism, Post-Painterly Abstraction, Spatialism, Tenebrism.
Inset (H from 0.84 to 0.90): Abstract Art, Abstract Expressionism, Art Brut, Art Informel, Art Nouveau (Modern), Classicism, Contemporary Realism, Dada, Feminist Art, Ink and wash painting, Kitsch, Lyrical Abstraction, Neo-Expressionism, Neo-Rococo, New European Painting, Orphism, Pop Art, Precisionism, Purism, Regionalism, Rococo, Suprematism, Sōsaku hanga, Zen.
Inset (H from 0.900 to 0.930): Academicism, Action painting, American Realism, Baroque, Cubism, Early Renaissance, Expressionism, Fantastic Realism, Futurism, High Renaissance, International Gothic, Magic Realism, Mannerism (Late Renaissance), Metaphysical art, Naïve Art (Primitivism), Northern Renaissance, Nouveau Réalisme (New Realism), Orientalism, Photorealism, Pictorialism, Proto Renaissance, Realism, Romanticism, Social Realism, Socialist Realism, Street art, Surrealism, Tachisme, Tonalism.
Inset (H from 0.930 to 0.950): Biedermeier, Byzantine, Cloisonnism, Cubo-Futurism, Divisionism, Fauvism, Impressionism, Intimism, Luminism, Muralism, Neo-Romanticism, Neo-baroque, Neoclassicism, P&D (Pattern and Decoration), Pointillism, Post-Impressionism, Shin-hanga, Symbolism, Ukiyo-e.]
Fig. S6. Distinguishing among different artistic styles with the complexity-entropy plane. The colored dots represent the average values of H and C for every style in our dataset. Error bars represent the standard error of the mean. The insets highlight three different regions of the plane for better visualization. All 92 styles having at least 100 images are shown in this plot.
Fig. S7. The average values of H and C are statistically significantly different among most styles. The matrix plot shows the outcome of the bootstrap two-sample t-test that compares the differences between the average values of H and C among all possible pairs of styles. We have also considered the Bonferroni correction in order to account for the multiple hypothesis testing. The yellow cells indicate pairwise comparisons where the null hypothesis is rejected at 95% confidence (that is, there is a significant difference between the values of H and/or C between the two styles), while the purple cells indicate pairwise comparisons where the null hypothesis cannot be rejected (that is, no significant difference between the values of H and/or C is observed between the two styles). We note that the null hypothesis is rejected in 91.7% of pairwise comparisons.
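One common way to carry out a bootstrap two-sample t-test with a Bonferroni correction is sketched below. The exact variant used for this figure is not specified in the text, so the choices here (Welch statistic, resampling after shifting both samples to a common mean, 2000 resamples) are assumptions, and all names and toy values are hypothetical.

```python
import math
import random


def welch_t(a, b):
    """Two-sample t statistic with unequal variances (Welch form)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    return (ma - mb) / math.sqrt(va / na + vb / nb)


def bootstrap_p_value(a, b, n_boot=2000, seed=0):
    """Two-sided bootstrap p-value for equal means (assumed procedure)."""
    rng = random.Random(seed)
    t_obs = abs(welch_t(a, b))
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    pooled = (sum(a) + sum(b)) / (len(a) + len(b))
    # Shift both samples to the pooled mean so the null hypothesis holds.
    a0 = [x - ma + pooled for x in a]
    b0 = [x - mb + pooled for x in b]
    hits = 0
    for _ in range(n_boot):
        ra = rng.choices(a0, k=len(a0))  # resample with replacement
        rb = rng.choices(b0, k=len(b0))
        if abs(welch_t(ra, rb)) >= t_obs:
            hits += 1
    return (hits + 1) / (n_boot + 1)


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance level under the Bonferroni correction."""
    return alpha / n_comparisons


# Hypothetical entropy values for two made-up styles.
toy = random.Random(42)
style_x = [toy.gauss(0.90, 0.02) for _ in range(100)]
style_y = [toy.gauss(0.93, 0.02) for _ in range(100)]
alpha = bonferroni_alpha(0.05, n_comparisons=3)  # e.g. 3 pairs among 3 styles
p_xy = bootstrap_p_value(style_x, style_y)
reject = p_xy < alpha
```

With 92 styles there are 92 × 91 / 2 = 4186 pairs, so the corrected level becomes very small and the number of bootstrap resamples would need to be far larger than in this toy example for a rejection to be possible.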
[Figure S8: silhouette coefficient versus distance threshold, with the maximum annotated at (0.03, 0.57).]
Fig. S8. Silhouette coefficient of clusters obtained by cutting the dendrogram of Figure 3B at different distance thresholds. This coefficient quantifies the quality of the clustering analysis. Its value lies between −1 and +1, and the higher the value, the better the match among styles within a cluster in comparison to the neighboring clusters. Thus, by finding the distance threshold that maximizes the silhouette coefficient, we are maximizing the quality of the clustering obtained from the dendrogram. It can be observed that the silhouette coefficient has a maximum value (0.57) at the distance threshold of 0.03. We have thus used this value to cut the dendrogram and define the number of clusters in Figure 3B.
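The threshold scan described in this caption can be sketched in plain Python. Two assumptions are made loudly here: the dendrogram is taken to be single-linkage (the linkage method behind Figure 3B is not stated in this text), and the toy (H, C) points are invented. Cutting a single-linkage dendrogram at a threshold is equivalent to taking connected components of the graph that links points closer than that threshold.

```python
import math


def single_linkage_cut(points, threshold):
    """Cluster labels from cutting a single-linkage dendrogram at `threshold`."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


def silhouette(points, labels):
    """Mean silhouette coefficient, or None when it is undefined."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2 or len(clusters) == len(points):
        return None
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # singleton clusters contribute a silhouette of 0
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(points[i], points[j]) for j in other) / len(other)
            for key, other in clusters.items() if key != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical (H, C) averages forming three tight groups.
points = [(0.85, 0.13), (0.86, 0.13), (0.85, 0.14),
          (0.90, 0.10), (0.91, 0.10), (0.90, 0.11),
          (0.94, 0.07), (0.95, 0.07), (0.94, 0.08)]
scores = {}
for t in [0.005, 0.012, 0.02, 0.04, 0.06]:
    s = silhouette(points, single_linkage_cut(points, t))
    if s is not None:
        scores[t] = s
best_threshold = max(scores, key=scores.get)
best_labels = single_linkage_cut(points, best_threshold)
```

The threshold with the highest score plays the role of the value 0.03 reported in the caption; thresholds that yield a single cluster or only singletons are skipped because the coefficient is undefined there.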
[Figure S9: “distance” matrix (color scale from 0.2 to 1.0) with dendrogram. Rows and columns list the 92 styles in the following order: Neo-Expressionism, Neo-Romanticism, Neo-Dada, Nouveau Réalisme (New Realism), Pop Art, Kitsch, Photorealism, Pictorialism, Light and Space, Op Art, Concretism, Neoplasticism, Abstract Art, Cubism, Orphism, Divisionism, Pointillism, Fauvism, Impressionism, Post-Impressionism, Conceptual Art, Minimalism, Post-Minimalism, P&D (Pattern and Decoration), Feminist Art, Kinetic Art, Byzantine, International Gothic, High Renaissance, Proto Renaissance, Baroque, Neo-baroque, Neoclassicism, Early Renaissance, Northern Renaissance, Classicism, Naturalism, Realism, Tenebrism, Luminism, Cloisonnism, Tonalism, Neo-Minimalism, Neo-Pop Art, Biedermeier, Purism, Neo-Rococo, Rococo, Art Deco, Art Nouveau (Modern), Regionalism, Muralism, Street art, Precisionism, Hard Edge Painting, Lyrical Abstraction, Post-Painterly Abstraction, Abstract Expressionism, Color Field Painting, Action painting, Art Informel, Tachisme, Social Realism, Socialist Realism, Fantastic Realism, Contemporary Realism, American Realism, Magic Realism, Cubo-Futurism, Suprematism, Futurism, Spatialism, New European Painting, Dada, Surrealism, Constructivism, Expressionism, Intimism, Romanticism, Symbolism, Art Brut, Neo-Geo, Mannerism (Late Renaissance), Academicism, Metaphysical art, Naïve Art (Primitivism), Ukiyo-e, Shin-hanga, Sōsaku hanga, Orientalism, Ink and wash painting, Zen.]
Fig. S9. Hierarchical organization of styles according to the keywords extracted from the corresponding Wikipedia pages. For each of the 92 different styles having at least 100 images, we have obtained the textual content from its Wikipedia page. These texts were processed using the term frequency-inverse document frequency (TF-IDF) approach, and the top-100 keywords were obtained for each style. We thus define the “distance” between two styles as the inverse of 1 plus the number of shared keywords between the two styles, that is, $1/(1 + n)$ for $n$ shared keywords. Thus, styles having no common keywords are at the maximum “distance” of 1, while styles sharing several keywords are at a closer “distance”. The matrix plot shows this “distance” matrix as well as the dendrogram. The colored labels indicate the clusters obtained by cutting the dendrogram at the threshold distance maximizing the silhouette coefficient. This approach yields 24 different style clusters, which is considerably more than the 14 clusters obtained from the complexity-entropy plane (Figure 3). However, both groups of clusters share similarities, which can be quantified by the homogeneity h, completeness c, and the v-measure metrics. Perfect homogeneity (h = 1) implies that all clusters obtained from the Wikipedia texts contain only styles belonging to the same clusters obtained from the complexity-entropy plane. On the other hand, perfect completeness (c = 1) implies that all styles belonging to the same cluster obtained from the complexity-entropy plane are grouped in the same clusters obtained from the Wikipedia texts. The v-measure is the harmonic mean between h and c, that is, $v = 2hc/(h + c)$. Our results yield h = 0.49, c = 0.40, and v = 0.44. These values are significantly larger than those obtained from a null model where the number of shared keywords is randomly chosen from a uniform distribution between 0 and 100 ($h_{\mathrm{rand}} = 0.42 \pm 0.02$, $c_{\mathrm{rand}} = 0.35 \pm 0.01$, and $v_{\mathrm{rand}} = 0.38 \pm 0.01$; average values over 100 independent realizations).
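The two formulas stated in this caption, the keyword “distance” and the v-measure, are simple enough to check numerically. The sketch below uses hypothetical function names and made-up keyword sets; only the formulas and the reported values of h and c come from the caption.

```python
def keyword_distance(keywords_a, keywords_b):
    """Inverse of 1 plus the number of keywords shared by two styles."""
    shared = len(set(keywords_a) & set(keywords_b))
    return 1.0 / (1.0 + shared)


def v_measure(h, c):
    """Harmonic mean of homogeneity h and completeness c."""
    return 2.0 * h * c / (h + c) if (h + c) > 0 else 0.0


# Made-up keyword sets for two hypothetical styles.
style_a = {"light", "color", "brush", "landscape"}
style_b = {"color", "brush", "geometry", "landscape"}
d_ab = keyword_distance(style_a, style_b)           # three shared keywords
d_none = keyword_distance({"ink"}, {"marble"})      # no shared keywords

v_reported = v_measure(0.49, 0.40)  # h and c as reported in the caption
v_null = v_measure(0.42, 0.35)      # mean h and c of the null model
```

Rounded to two decimals, `v_reported` and `v_null` reproduce the values 0.44 and 0.38 quoted in the caption.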
[Figure S10: ten panels (A to J) showing training and cross-validation scores for Nearest Neighbors, Random Forest, Support Vector Machine (RBF), and Neural Network. Horizontal axes: number of neighbors, number of trees, Max. depth, SVC γ, SVC C, neural network α, and training size. Vertical axes: scores.]
Fig. S10. Training and cross-validation scores obtained with the four different machine learning algorithms used for predicting styles as a function of their main parameters and the training size. Panels (A) and (B) show the results for the nearest neighbors algorithm. Panels (C), (D) and (E) show the scores for the random forest algorithm. The main parameters, in this case, are the number of trees in the forest and the maximum depth of the trees (Max. depth). Panels (F), (G) and (H) show the scores for the support vector classification (SVC) with the radial basis function kernel (RBF). The parameter γ is associated with the width of the RBF, and C′ is the penalty parameter. Panels (I) and (J) show the results for a neural network algorithm. The parameter α is the so-called L2 penalty, and the number of neurons is equal to 100. The average accuracies reported in Figure 4 were obtained for number of neighbors = 400; $\gamma = 10^{4}$ and $C' = 0.1$; number of trees = 400 and Max. depth = 5; and $\alpha = 10^{-4}$. All algorithms are implemented in the scikit-learn library, and the learning curves are estimated with the best tuning parameters of each learning model.
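Since the caption states that the algorithms come from scikit-learn, curves of this kind can be produced with that library's `validation_curve` and `learning_curve` helpers. The sketch below covers only the nearest neighbors case and runs on synthetic two-feature data standing in for (H, C); the dataset, the parameter grid, and the five-fold setting are assumptions rather than the authors' configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import learning_curve, validation_curve
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the (H, C) features and style labels (assumption).
X, y = make_classification(
    n_samples=600, n_features=2, n_informative=2, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

# Scores as a function of the main parameter (number of neighbors).
neighbors = np.array([1, 5, 15, 50])
train_scores, cv_scores = validation_curve(
    KNeighborsClassifier(), X, y,
    param_name="n_neighbors", param_range=neighbors, cv=5,
)
best_k = int(neighbors[cv_scores.mean(axis=1).argmax()])

# Scores as a function of the training size, at the selected parameter.
sizes, lc_train, lc_cv = learning_curve(
    KNeighborsClassifier(n_neighbors=best_k), X, y,
    train_sizes=np.linspace(0.2, 0.8, 4), cv=5,
)
```

Averaging `train_scores` and `cv_scores` over the fold axis gives one training curve and one cross-validation curve per panel; the same pattern would apply to the other three classifiers with their own parameter names.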
[Figure S11: bar plot of accuracy for Nearest Neighbors, Random Forest, Support Vector Machine (RBF), Neural Network, Dummy (Stratified), and Dummy (Uniform).]
Fig. S11. Accuracies of learning methods when considering the 92 different artistic styles with more than 100 images each. Depicted is the comparison between the four different statistical learning algorithms (nearest neighbors, random forest, support vector machine, and neural network), and the null accuracies obtained from two “dummy” classifiers. Error bars represent the standard error of the mean. The four classifiers have similar accuracies (≈13%) and significantly outperform the “dummy” classifiers. The parameters of the learning methods are the same as those used in Figure 4.
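The expected accuracies of the two “dummy” baselines can be worked out directly from the class frequencies. The sketch below assumes that test labels follow the same frequencies as training labels; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def uniform_baseline(labels):
    """Expected accuracy when every class is guessed with equal probability."""
    return 1.0 / len(set(labels))


def stratified_baseline(labels):
    """Expected accuracy when guesses follow the class frequencies."""
    counts = Counter(labels)
    n = len(labels)
    # A guess is right when the guessed and true class coincide: sum of p_k^2.
    return sum((k / n) ** 2 for k in counts.values())


# Made-up labels for three hypothetical styles.
labels = ["a"] * 6 + ["b"] * 3 + ["c"] * 1
acc_uniform = uniform_baseline(labels)        # 1/3
acc_stratified = stratified_baseline(labels)  # 0.36 + 0.09 + 0.01

# With 92 equally likely guesses, the uniform baseline is about 1.1%.
acc_uniform_92 = 1.0 / 92
```

Against a uniform baseline near 1% for 92 styles, the ≈13% accuracy quoted in the caption is roughly an order of magnitude higher.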