AVA: A Large-Scale Database for Aesthetic Visual Analysis
Calvin Deutschbein
An Aside: Aesthetics
● Per Wikipedia, aesthetics is the study of beauty and taste.
● It is highly subjective, especially at current knowledge levels
● As with many cases in image recognition, establishment of ground truth is difficult
An Aside: Aesthetics
Top 4 Google Results for “beautiful” and “ugly”
Increasing subtlety increases difficulty
An Aside: Aesthetics
● Which of these is more beautiful?
● The right image has higher page rank, but...
Paper Goals
● Develop a consistent database for a clear ground truth in aesthetics testing
  – As with ImageNet and Caltech 101, this spurs and focuses research
● Through experimentation, demonstrate the merits of an improved database
  – Consider both scale and quality
Database Creation
● The artistic nature of the task heavily complicates database creation
  – For rating art, humans often use:
    ● Vested critics, e.g. journalists, content creators
    ● Established evaluation frameworks, e.g. star ratings
    ● Criticism aggregation, e.g. Rotten Tomatoes, the Oscars
● How to do this on the scale of an ML dataset?
  – Many methods (e.g. Mechanical Turk) fail to provide vested critics
DPChallenge.com
● Photography challenge site
  – Participants both create and evaluate: “vested critics”
  – Challenges are named, so the names can be used to create semantic tags
● Submissions for contests are ranked
  – This determines “ground truth” aesthetics
● As this work is already complete, building the database is only a matter of data mining
An Example: “Fireworks”
Place 1/157: 7.4/10
Place 157/157: 4.2/10
Semantic tags are also associated with some images, but these images were associated with none
Anyway, this is novel
Another Nicety: Elegant Spreads
● Distributions of scores are approx. normal!
● Standard deviation is a function of mean score!
● High variance occurs on unusual images!
● This is as good as one could hope for...
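The mean and standard deviation discussed on this slide can be computed directly from a per-image vote histogram. A minimal sketch, assuming each image comes with a list of vote counts for scores 1 through 10; the `vote_counts` layout, the function name, and the example histogram are illustrative assumptions, not taken from the paper:

```python
import math

def score_stats(vote_counts):
    """Return (mean, std) of the scores described by a vote histogram.

    Assumed layout: vote_counts[i] is the number of votes for score i + 1.
    """
    total = sum(vote_counts)
    if total == 0:
        raise ValueError("image has no votes")
    # Weighted mean of the scores 1..len(vote_counts)
    mean = sum((i + 1) * c for i, c in enumerate(vote_counts)) / total
    # Population variance around that mean
    var = sum(c * ((i + 1) - mean) ** 2 for i, c in enumerate(vote_counts)) / total
    return mean, math.sqrt(var)

# Hypothetical histogram with most votes near the middle of the scale
mean, std = score_stats([0, 1, 4, 20, 60, 70, 30, 10, 4, 1])
```

Plotting `std` against `mean` over many images is one way to check whether the spread depends on the mean score.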
Examples of Niceness
Legend gives %age of results in colored cluster
Semantics & Score
Something interesting happens at extremes...
Exercising AVA
● To demonstrate the usefulness of AVA, it was used in three ways:
  – Generic aesthetic quality categorization
  – Content-based aesthetic classification
  – “Style” categorization
Rating Aesthetic Quality
● Perhaps the most interesting test (linear SVM):
  – Binary classification a la social media
  – Produced better results when trained on more data
    ● Diminishing returns (not surprising)
  – Including middling images improves results
    ● Including only extreme examples led to model confusion when confronted with ambiguous images
Content Classification
● This leveraged the semantic tags
● Class-specific SVMs performed better than generic SVMs that didn't utilize classification
  – Only true for content-based models
  – In other cases (color, SIFT), generic is favored
“Style” Categorization
● Different photographic styles were used and recognized by DPChallenge voters
  – Examples include silhouette and vanishing point
● This is novel to this paper
  – It leverages the domain expertise of voters heavily
● The SVMs could now say “why” something is beautiful instead of just whether or not it is