
www.pnas.org/cgi/doi/10.1073/pnas.1800083115

Supplementary Information for

History of art paintings through the lens of entropy and complexity

Higor Y. D. Sigaki, Matjaž Perc, and Haroldo V. Ribeiro

To whom correspondence should be addressed. E-mail: [email protected] or [email protected]

This PDF file includes:

Supplementary text
Figs. S1 to S11

Supporting Information Text

[Figure S1: two panels plotting Complexity, C against Entropy, H, with points labeled by period, from 1031-1570 to 1994-2016.]

Fig. S1. Robustness of the evolution trends against sampling. Each gray curve corresponds to the average values of H and C obtained by randomly sampling 30% (left panel) and 10% (right panel) of the images in the dataset. A total of 100 different realizations of the sampling procedure were made. The black curves depict the average trend obtained with the full dataset. We observe that the historical trends displayed by the average values of H and C are robust against sampling, even when using only 10% of the images.
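The sampling procedure described for Fig. S1 can be illustrated with a minimal sketch. The data below are synthetic stand-ins for per-image entropy values, and the helper name `subsample_means` is hypothetical; only the 30% and 10% fractions and the 100 realizations come from the caption.

```python
import random
import statistics


def subsample_means(values, fraction, n_realizations=100, seed=0):
    """Mean of `values` over repeated random subsamples drawn without replacement."""
    rng = random.Random(seed)
    k = max(1, round(fraction * len(values)))
    return [statistics.fmean(rng.sample(values, k)) for _ in range(n_realizations)]


# Synthetic stand-in for the entropy H of the images in one time period
# (an assumption for illustration, not the dataset used in the paper).
data_rng = random.Random(42)
entropy_values = [data_rng.gauss(0.90, 0.05) for _ in range(2000)]

full_mean = statistics.fmean(entropy_values)
for fraction in (0.30, 0.10):
    means = subsample_means(entropy_values, fraction)
    spread = statistics.pstdev(means)
    print(f"fraction={fraction:.2f}  full mean={full_mean:.4f}  "
          f"spread of subsample means={spread:.4f}")
```

If the subsample means stay close to the full-data mean for every period, the trend is robust in the sense of the caption.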


Fig. S2. The relationship between the values of H and C, calculated by means of the average RGB channels and by means of the gray-scale luminance transformation. Each dot in the scatter plots shows the values of H and C for each image, as obtained through the average values of the three color shades of each pixel, and through the gray-scale luminance transformation. We observe that both transformations yield strongly correlated values of H and C.
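The two pixel transformations compared in Fig. S2 can be sketched as follows. The caption does not state the luminance weights, so the ITU-R BT.601 weights used here are an assumption, and both function names are hypothetical.

```python
def gray_average(r, g, b):
    """Gray level as the plain average of the three color channels."""
    return (r + g + b) / 3.0


def gray_luminance(r, g, b):
    """Gray level as a weighted channel sum.

    The weights are the ITU-R BT.601 luma coefficients, assumed here
    because the caption does not specify which weights were used.
    """
    return 0.299 * r + 0.587 * g + 0.114 * b


pixels = [(255, 0, 0), (0, 255, 0), (0, 0, 255), (128, 128, 128)]
for rgb in pixels:
    print(rgb, round(gray_average(*rgb), 1), round(gray_luminance(*rgb), 1))
```

The two transformations agree on neutral grays and differ on saturated colors, which is why the comparison in the figure is informative.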


[Figure S3: probability distribution (logarithmic axis) versus image dimensions (0 to 4000 pixels), with one curve for image width and one for image height. Annotations: width = 895 [313, 2491] 95%; height = 913 [323, 2702] 95%.]

Fig. S3. Probability distribution of image dimensions. The red and blue curves show the probability distributions of the widths and heights of all images in our dataset on a log-linear plot. It can be observed that the width and height have a similar distribution and practically the same average value (895 pixels for width and 913 pixels for height). The shaded regions represent the intervals of width and height containing 95% of all images.


Fig. S4. Complexity measures H and C are uncorrelated with image dimensions. The scatter plots depict the values of H (left panels) and C (right panels) versus the image length, defined as the square root of the image area (that is, $\sqrt{n_x n_y}$, where $n_x$ is the image width and $n_y$ is the image height). The first row shows the relationship on a linear scale, the second row on a linear-log scale, and the third row on a log-log scale. Each dot represents an image in our dataset. We observe no correlations between the complexity measures and image length. In particular, the Pearson linear correlation is ≈ 0.05 for the relationship between the image length and H, and ≈ 0.01 for C. Also, no significant correlation is detected by the maximal information coefficient (MIC), whose values are ≈ 0.07 for both relationships. This analysis indicates that our results obtained with embedding dimensions $d_x = d_y = 2$ are not biased by image dimensions.
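As a rough illustration of the check described for Fig. S4, the sketch below computes the image length and a Pearson correlation on synthetic data. The image sizes and entropy values are generated independently of each other by construction, so they only mimic the "no correlation" situation; the MIC is not computed here, and the function names are hypothetical.

```python
import math
import random


def image_length(width, height):
    """Square root of the image area, in pixels."""
    return math.sqrt(width * height)


def pearson(xs, ys):
    """Pearson linear correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Synthetic data: sizes drawn inside the 95% intervals quoted for Fig. S3,
# entropies drawn independently of the sizes (illustrative assumption).
rng = random.Random(1)
sizes = [(rng.randint(313, 2491), rng.randint(323, 2702)) for _ in range(5000)]
lengths = [image_length(w, h) for w, h in sizes]
entropies = [rng.gauss(0.90, 0.05) for _ in sizes]

print(f"Pearson correlation between image length and H: {pearson(lengths, entropies):.3f}")
```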


[Figure S5: bar plot of the total number of images by style, on a logarithmic axis with ticks at 10^2, 10^3, and 10^4. Styles listed along the axis: Impressionism, Realism, Romanticism, Expressionism, Post-Impressionism, Art Nouveau (Modern), Surrealism, Baroque, Symbolism, Abstract Expressionism, Naïve Art (Primitivism), Neoclassicism, Cubism, Rococo, Northern Renaissance, Minimalism, Art Informel, Abstract Art, Color Field Painting, Pop Art, Ukiyo-e, Mannerism (Late Renaissance), Early Renaissance, High Renaissance, Magic Realism, Conceptual Art, Academicism, Neo-Expressionism, Op Art, Lyrical Abstraction, Art Deco, Contemporary Realism, Concretism, Fauvism, Nouveau Réalisme (New Realism), Neo-Romanticism, Hard Edge Painting, Post-Minimalism, Tachisme, Ink and wash painting, Pointillism, Sōsaku hanga, Social Realism, Naturalism, Constructivism, Shin-hanga, Luminism, Dada, Orientalism, Divisionism, Regionalism, Neo-Dada, Fantastic Realism, Art Brut, Precisionism, Futurism, American Realism, Proto Renaissance, Light and Space, Socialist Realism, Post-Painterly Abstraction, Feminist Art, Orphism, Neo-Minimalism, Classicism, Kinetic Art, Neo-Pop Art, Street art, Tenebrism, Pictorialism, International Gothic, Photorealism, Tonalism, Suprematism, Metaphysical art, New European Painting, Cloisonnism, Cubo-Futurism, Neoplasticism, Kitsch, Purism, Muralism, Spatialism, Neo-baroque, Biedermeier, Zen, Neo-Geo, P&D (Pattern and Decoration), Intimism, Action painting, Byzantine, Neo-Rococo.]

Fig. S5. Image distribution among different artistic styles in our dataset. The bar plot shows the number of images for all the 92 different styles that have at least 100 images each.


[Figure S6: a main panel and three insets plotting Complexity, C against Entropy, H, with one labeled point per style.]

Fig. S6. Distinguishing among different artistic styles with the complexity-entropy plane. The colored dots represent the average values of H and C for every style in our dataset. Error bars represent the standard error of the mean. The insets highlight three different regions of the plane for better visualization. All 92 styles having at least 100 images are shown in this plot.


Fig. S7. The average values of H and C are statistically significantly different among most styles. The matrix plot shows the outcome of the bootstrap two-sample t-test that compares the differences between the average values of H and C among all possible pairs of styles. We have also considered the Bonferroni correction in order to account for the multiple hypothesis testing. The yellow cells indicate pairwise comparisons where the null hypothesis is rejected at 95% confidence (that is, there is a significant difference between the values of H and/or C between the two styles), while the purple cells indicate pairwise comparisons where the null hypothesis cannot be rejected (that is, no significant difference between the values of H and/or C is observed between the two styles). We note that the null hypothesis is rejected in 91.7% of pairwise comparisons.
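The testing procedure named in Fig. S7 can be sketched as below. This is one common form of the bootstrap two-sample t-test (both samples are shifted to a common mean before resampling), applied to a single quantity; whether it matches the exact variant used for the figure is an assumption. The data are synthetic and all function names are hypothetical.

```python
import math
import random
from itertools import combinations


def mean_var(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = sum(xs) / n
    return m, sum((x - m) ** 2 for x in xs) / (n - 1)


def t_statistic(a, b):
    """Two-sample t statistic without assuming equal variances."""
    mean_a, var_a = mean_var(a)
    mean_b, var_b = mean_var(b)
    return (mean_a - mean_b) / math.sqrt(var_a / len(a) + var_b / len(b))


def bootstrap_p_value(a, b, n_boot=2000, seed=0):
    """Two-sided bootstrap p-value for the null hypothesis of equal means."""
    rng = random.Random(seed)
    t_obs = abs(t_statistic(a, b))
    pooled_mean = (sum(a) + sum(b)) / (len(a) + len(b))
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    # Shift both samples to the pooled mean so that the null hypothesis holds.
    a0 = [x - mean_a + pooled_mean for x in a]
    b0 = [x - mean_b + pooled_mean for x in b]
    hits = sum(
        abs(t_statistic(rng.choices(a0, k=len(a0)), rng.choices(b0, k=len(b0)))) >= t_obs
        for _ in range(n_boot)
    )
    return (hits + 1) / (n_boot + 1)


def bonferroni_alpha(n_groups, alpha=0.05):
    """Per-comparison significance level for all pairwise comparisons."""
    n_pairs = n_groups * (n_groups - 1) // 2
    return alpha / n_pairs


# Three synthetic "styles" (illustrative values only).
rng = random.Random(7)
styles = {
    "style_a": [rng.gauss(0.90, 0.03) for _ in range(150)],
    "style_b": [rng.gauss(0.93, 0.03) for _ in range(150)],
    "style_c": [rng.gauss(0.90, 0.03) for _ in range(150)],
}
threshold = bonferroni_alpha(len(styles))
results = {}
for (name1, x1), (name2, x2) in combinations(styles.items(), 2):
    results[(name1, name2)] = bootstrap_p_value(x1, x2)
    verdict = "reject" if results[(name1, name2)] < threshold else "cannot reject"
    print(name1, name2, f"p={results[(name1, name2)]:.4f}", verdict)
```

With 92 styles there are 4186 pairwise comparisons, so the corrected level is 0.05/4186; resolving p-values that small would require far more bootstrap resamples than the 2000 used in this toy example.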


[Figure S8: silhouette coefficient versus distance threshold, with the maximum marked at (0.03, 0.57).]

Fig. S8. Silhouette coefficient of clusters obtained by cutting the dendrogram of Figure 3B at different distance thresholds. This coefficient quantifies the quality of the clustering analysis. Its value lies between −1 and +1, and the higher the value, the better the match among styles within a cluster in comparison to the neighboring clusters. Thus, by finding the distance threshold that maximizes the silhouette coefficient, we are maximizing the quality of the clustering obtained from the dendrogram. It can be observed that the silhouette coefficient has a maximum value (0.57) at the distance threshold of 0.03. We have thus used this value to cut the dendrogram and define the number of clusters in Figure 3B.
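The threshold selection described for Fig. S8 can be sketched in plain Python. The points are made-up (H, C) pairs, and cutting a single-linkage dendrogram is assumed here only because it is short to implement; the linkage method behind Figure 3B is not stated in this caption. Both function names are hypothetical.

```python
import math


def single_linkage_labels(points, threshold):
    """Labels obtained by cutting a single-linkage dendrogram at `threshold`.

    For single linkage this equals the connected components of the graph
    that joins every pair of points closer than or equal to the threshold.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= threshold:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]


def silhouette_score(points, labels):
    """Mean silhouette coefficient with Euclidean distance."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    total = 0.0
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton contributes 0
        a = sum(math.dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        total += (b - a) / max(a, b)
    return total / len(points)


# Made-up (H, C) averages forming three tight groups.
points = [(0.80, 0.14), (0.81, 0.14), (0.80, 0.15),
          (0.90, 0.10), (0.91, 0.10), (0.90, 0.11),
          (0.95, 0.06), (0.96, 0.06), (0.95, 0.07)]

scores = {}
for threshold in [0.005, 0.02, 0.06, 0.12]:
    labels = single_linkage_labels(points, threshold)
    if len(set(labels)) < 2:
        continue  # the silhouette is undefined for a single cluster
    scores[threshold] = silhouette_score(points, labels)

best_threshold = max(scores, key=scores.get)
best_labels = single_linkage_labels(points, best_threshold)
print(best_threshold, len(set(best_labels)), round(scores[best_threshold], 3))
```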


[Figure S9: matrix plot of the pairwise "distance" between styles (scale from 0.2 to 1.0) with the associated dendrogram; rows and columns are labeled with the 92 style names.]

Fig. S9. Hierarchical organization of styles according to the keywords extracted from the corresponding Wikipedia pages. For each of the 92 different styles having at least 100 images, we have obtained the textual content from its Wikipedia page. These texts were processed using the term frequency-inverse document frequency (TF-IDF) approach, and the top-100 keywords were obtained for each style. We then define the “distance” between two styles as the inverse of 1 plus the number of shared keywords between the two styles. Thus, styles having no common keywords are at the maximum “distance” of 1, while styles sharing several keywords are at a closer “distance”. The matrix plot shows this “distance” matrix as well as the dendrogram. The colored labels indicate the clusters obtained by cutting the dendrogram at the threshold distance maximizing the silhouette coefficient. This approach yields 24 different style clusters, which is considerably more than the 14 clusters obtained from the complexity-entropy plane (Figure 3). However, both groups of clusters share similarities, which can be quantified by the homogeneity h, the completeness c, and the v-measure metrics. Perfect homogeneity (h = 1) implies that all clusters obtained from the Wikipedia texts contain only styles belonging to the same clusters obtained from the complexity-entropy plane. On the other hand, perfect completeness (c = 1) implies that all styles belonging to the same cluster obtained from the complexity-entropy plane are grouped in the same clusters obtained from the Wikipedia texts. The v-measure is the harmonic mean of h and c, that is, $v = 2hc/(h + c)$. Our results yield h = 0.49, c = 0.40, and v = 0.44. These values are significantly larger than those obtained from a null model where the number of shared keywords is randomly chosen from a uniform distribution between 0 and 100 ($h_{\mathrm{rand}} = 0.42 \pm 0.02$, $c_{\mathrm{rand}} = 0.35 \pm 0.01$, and $v_{\mathrm{rand}} = 0.38 \pm 0.01$; average values over 100 independent realizations).
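The keyword "distance" and the three comparison metrics from Fig. S9 can be written down in a few lines. This sketch assumes the standard entropy-based definitions of homogeneity and completeness, does not perform the TF-IDF keyword extraction, and uses toy labels; the function names are hypothetical.

```python
import math
from collections import Counter


def keyword_distance(keywords_a, keywords_b):
    """Inverse of 1 plus the number of keywords shared by two styles."""
    shared = len(set(keywords_a) & set(keywords_b))
    return 1.0 / (1.0 + shared)


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two aligned label lists."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items())


def homogeneity_completeness_v(reference, predicted):
    """Homogeneity, completeness, and v-measure of `predicted` against `reference`."""
    h_ref, h_pred = entropy(reference), entropy(predicted)
    h = 1.0 if h_ref == 0 else 1.0 - conditional_entropy(reference, predicted) / h_ref
    c = 1.0 if h_pred == 0 else 1.0 - conditional_entropy(predicted, reference) / h_pred
    v = 0.0 if h + c == 0 else 2 * h * c / (h + c)
    return h, c, v


# Toy labels: `reference` plays the role of the complexity-entropy clusters,
# `predicted` the role of the clusters built from the Wikipedia keywords.
reference = [0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 1, 1, 1]
h, c, v = homogeneity_completeness_v(reference, predicted)
print(f"h={h:.3f}  c={c:.3f}  v={v:.3f}")

# Consistency check of the reported values: v = 2hc/(h + c).
v_reported = 2 * 0.49 * 0.40 / (0.49 + 0.40)
print(round(v_reported, 2))
```

In the toy example every reference cluster falls inside a single predicted cluster, so completeness is perfect while homogeneity is not.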


[Figure S10: training and cross-validation scores in ten panels. Nearest Neighbors: panels A and B; Random Forest: panels C, D, and E; Support Vector Machine (RBF): panels F, G, and H; Neural Network: panels I and J. Horizontal axes include the number of neighbors, the number of trees, Max. depth, the SVC parameters, the neural network penalty, and the training size; vertical axes show scores.]

Fig. S10. Training and cross-validation scores obtained with the four different machine learning algorithms used for predicting styles as a function of their main parameters and the training size. Panels (A) and (B) show the results for the nearest neighbors algorithm. Panels (C), (D), and (E) show the scores for the random forest algorithm. The main parameters, in this case, are the number of trees in the forest and the maximum depth of the trees (Max. depth). Panels (F), (G), and (H) show the scores for the support vector classification (SVC) with the radial basis function (RBF) kernel. The parameter γ is associated with the width of the RBF, and C′ is the penalty parameter. Panels (I) and (J) show the results for a neural network algorithm. The parameter α is the so-called L2 penalty, and the number of neurons is equal to 100. The average accuracies reported in Figure 4 were obtained for number of neighbors = 400; $\gamma = 10^{4}$ and $C' = 0.1$; number of trees = 400 and Max. depth = 5; and $\alpha = 10^{-4}$. All algorithms are implemented in the scikit-learn library, and the learning curves are estimated with the best tuning parameters of each learning model.
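Curves of the kind shown in Fig. S10 can be produced with scikit-learn's `validation_curve` and `learning_curve`. The sketch below does this for the nearest neighbors classifier only, on synthetic two-feature data standing in for the (H, C) values; the data, the parameter grid, and the five-fold cross-validation are assumptions, not the settings used for the figure.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.model_selection import learning_curve, validation_curve
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the two features (H, C) and the style labels.
X, y = make_blobs(n_samples=600, centers=4, n_features=2,
                  cluster_std=2.0, random_state=0)

# Scores as a function of the main parameter (number of neighbors).
neighbors_range = [1, 5, 15, 45]
train_scores, cv_scores = validation_curve(
    KNeighborsClassifier(), X, y,
    param_name="n_neighbors", param_range=neighbors_range, cv=5,
)
for k, train, cv in zip(neighbors_range,
                        train_scores.mean(axis=1),
                        cv_scores.mean(axis=1)):
    print(f"n_neighbors={k:3d}  training={train:.3f}  cross-validation={cv:.3f}")

# Scores as a function of the training size, using the best parameter value.
best_k = neighbors_range[int(np.argmax(cv_scores.mean(axis=1)))]
sizes, lc_train, lc_cv = learning_curve(
    KNeighborsClassifier(n_neighbors=best_k), X, y,
    train_sizes=[0.2, 0.4, 0.6, 0.8], cv=5,
)
for n, train, cv in zip(sizes, lc_train.mean(axis=1), lc_cv.mean(axis=1)):
    print(f"training size={n:4d}  training={train:.3f}  cross-validation={cv:.3f}")
```

A large gap between training and cross-validation scores signals overfitting, which is the usual reason for inspecting such curves before fixing the parameters.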


[Figure S11: bar plot of accuracy for Nearest Neighbors, Random Forest, Support Vector Machine (RBF), Neural Network, Dummy (Stratified), and Dummy (Uniform).]

Fig. S11. Accuracies of learning methods when considering the 92 different artistic styles with more than 100 images each. Depicted is the comparison between the four different statistical learning algorithms (nearest neighbors, random forest, support vector machine, and neural network), and the null accuracies obtained from two “dummy” classifiers. Error bars represent the standard error of the mean. The four classifiers have similar accuracies (≈13%) and significantly outperform the “dummy” classifiers. The parameters of the learning methods are the same as those used in Figure 4.
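The null accuracies of the two "dummy" baselines in Fig. S11 can be understood with a small sketch. It assumes the usual meaning of the two strategies: "uniform" guesses every class with equal probability, while "stratified" guesses classes with their observed frequencies. The class counts are invented and the function names are hypothetical.

```python
import random
from collections import Counter


def expected_uniform_accuracy(labels):
    """Expected accuracy when each class is guessed with equal probability."""
    return 1.0 / len(set(labels))


def expected_stratified_accuracy(labels):
    """Expected accuracy when classes are guessed with their own frequencies."""
    n = len(labels)
    return sum((count / n) ** 2 for count in Counter(labels).values())


def simulate_stratified(labels, n_trials=200, seed=0):
    """Monte Carlo estimate of the stratified baseline accuracy."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_trials):
        # Sampling labels with replacement follows the class frequencies.
        guesses = rng.choices(labels, k=len(labels))
        hits += sum(guess == truth for guess, truth in zip(guesses, labels))
    return hits / (n_trials * len(labels))


# Invented, imbalanced class counts (not the style counts of the dataset).
labels = ["A"] * 600 + ["B"] * 300 + ["C"] * 100

uniform = expected_uniform_accuracy(labels)
stratified = expected_stratified_accuracy(labels)
simulated = simulate_stratified(labels)
print(f"uniform={uniform:.3f}  stratified={stratified:.3f}  simulated={simulated:.3f}")

# With 92 classes, the uniform baseline is 1/92, that is, about 1.1%.
print(f"uniform baseline for 92 classes: {1 / 92:.4f}")
```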

