Data Science view of the KDD 2014

Data Science of the KDD ‘14 Review Process

Jure Leskovec (Stanford) and Wei Wang (UCLA)

Joint work with Jason Hirshman and David Zeng (Stanford)

KDD 2014 Research Track Statistics

KDD 2014 Program

Largest KDD program ever:
• 151 research papers (20% growth over KDD’13)
• 43 industry & govt. papers (30% growth)
• 26 workshops (75% growth)
• 11 tutorials (83% growth)

Program highlights:
• Paper spotlights early morning (8:15am)
• Oral presentations (Mon-Wed)
• Posters at the reception (Tue night)

KDD 2014 Research Track

• 1036 submissions from 2600 authors
– 42% increase over KDD ’13
• 151 papers:
– Acceptance rate: 14.6%
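The acceptance rate follows directly from the two counts on this slide. A minimal check, using only the numbers stated above (the variable names are illustrative):

```python
# Counts taken from the slide: 1036 submissions, 151 accepted papers.
submissions = 1036
accepted = 151

acceptance_rate = accepted / submissions
print(f"Acceptance rate: {acceptance_rate:.1%}")  # prints "Acceptance rate: 14.6%"
```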

[Figure: number of submissions (y-axis, 0 to 1200) by KDD year (x-axis, 2000 to 2016)]

KDD Reviewing Process

46 Senior PC members + 340 PC members
• 2971 reviews in total

(Rough) Acceptance rule:
• Raw review score AND Standardized review score AND Raw meta-review AND Standardized meta-review score ≥ Weak Accept
• 110 papers matched (immediate accepts)
• Remaining papers were discussed with meta-reviewers and final decisions were made
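The rough rule above reads as a conjunction of four thresholds. The sketch below shows one way to express it, assuming the four per-paper scores have already been computed and share a common numeric scale. The encoding `WEAK_ACCEPT = 1.0`, the `PaperScores` class, and the function name are hypothetical; the slides do not say how scores were encoded or standardized.

```python
from dataclasses import dataclass

# Hypothetical numeric value for "Weak Accept"; the slides give no encoding.
WEAK_ACCEPT = 1.0

@dataclass
class PaperScores:
    raw_review: float   # raw review score for the paper
    std_review: float   # standardized review score
    raw_meta: float     # raw meta-review score
    std_meta: float     # standardized meta-review score

def is_immediate_accept(p: PaperScores, threshold: float = WEAK_ACCEPT) -> bool:
    """Return True only when all four scores reach the threshold."""
    scores = (p.raw_review, p.std_review, p.raw_meta, p.std_meta)
    return all(s >= threshold for s in scores)

# Illustrative (made-up) papers: the second fails on one score only.
clear_case = PaperScores(1.5, 1.2, 2.0, 1.0)
borderline = PaperScores(1.5, 0.8, 2.0, 1.0)
print(is_immediate_accept(clear_case))  # True
print(is_immediate_accept(borderline))  # False
```

Papers that fail the check are not rejected by this rule; per the slide, they go to discussion with the meta-reviewers.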

Submissions per Country

Acceptance Rate per Country

Acceptance by Subject Area

Predicting Paper Acceptance

Features Used                                         Accuracy
Random Guessing                                       0.50
Paper Abstract                                        0.57
Author Status (Past paper counts)                     0.64
Author Status (DBLP graph connectivity)               0.61
Author Status (Counts + Graph)                        0.65
Reviewer (Similarity, Graph distance to authors)      0.60
All (Abstract, Author Status, and Reviewer)           0.65
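A random-guessing baseline of 0.50 suggests the accuracies were measured on a balanced set of accepted and rejected papers. This is an inference, not something the slides state. The sketch below illustrates why it matters: on the full pool (151 accepted of 1036), always predicting "reject" already scores about 0.85, while on a balanced subsample it scores 0.50. The helper names are illustrative.

```python
import random

def accuracy(predictions, labels):
    """Fraction of predictions that match the labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

def balanced_sample(labels, rng):
    """Indices with equal numbers of accepted (1) and rejected (0) papers."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    k = min(len(pos), len(neg))
    return rng.sample(pos, k) + rng.sample(neg, k)

# Label counts from the slides: 151 accepted, 1036 - 151 rejected.
labels = [1] * 151 + [0] * (1036 - 151)

# Trivial "always reject" predictor on the full, imbalanced pool.
always_reject_full = accuracy([0] * len(labels), labels)

# Same predictor on a balanced subsample (assumed evaluation setup).
rng = random.Random(0)
idx = balanced_sample(labels, rng)
balanced_labels = [labels[i] for i in idx]
always_reject_balanced = accuracy([0] * len(balanced_labels), balanced_labels)

print(f"{always_reject_full:.3f}")      # 0.854
print(f"{always_reject_balanced:.3f}")  # 0.500
```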

Predicting Paper Acceptance from the Review Text

Features Used                                         Paper: Accepted?   Review: Score > 0?
Random Guessing                                       0.50               0.50
Review Text                                           0.68               0.72
Review Text + Numeric Score (Novelty, Presentation)   0.77               0.77
Human Reading of Review Text                          0.88               0.73

I’m submitting a paper: What correlates with acceptance?

Academia + Industry Papers do Better

Submissions per Author: 5 is best!

No benefit in submitting >5 papers!

Having more authors (seems to) help

It is the most experienced author that matters!

What insights can we gain on the review process?

Most reviews are Weak Rejects

More granularity is needed at the Weak Reject / Weak Accept level

[Figure: y-axis "Review agrees with the final outcome"]

Review length is a good determinant of a review’s influence/quality

[Figure: y-axis "Review agrees with the final outcome"]

Shorter reviews are used for clear accepts and rejects

Never review co-author’s papers

The Curse of the Review Submission Deadline

Over 50% of reviews submitted in the last 5 days
Over 20% of reviews submitted in the last 24 hours
10% of reviews submitted late
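Shares like these can be computed from review submission timestamps and the deadline. The sketch below is a minimal illustration on made-up timestamps; the deadline value, the sample data, and the function name `timing_shares` are all hypothetical and do not come from the KDD data.

```python
from datetime import datetime, timedelta

def timing_shares(submitted, deadline):
    """Share of reviews in the last 5 days, the last 24 hours, and after the deadline."""
    n = len(submitted)
    in_last_5_days = sum(deadline - timedelta(days=5) <= t <= deadline for t in submitted)
    in_last_24_hours = sum(deadline - timedelta(hours=24) <= t <= deadline for t in submitted)
    late = sum(t > deadline for t in submitted)
    return {
        "last_5_days": in_last_5_days / n,
        "last_24_hours": in_last_24_hours / n,
        "late": late / n,
    }

# Hypothetical deadline and ten made-up submission times.
deadline = datetime(2014, 1, 1, 23, 59)
submitted = (
    [deadline - timedelta(days=d) for d in (20, 12, 6, 4, 3, 2)]
    + [deadline - timedelta(hours=h) for h in (20, 5)]
    + [deadline + timedelta(hours=h) for h in (6, 30)]
)

shares = timing_shares(submitted, deadline)
print(shares)
```

Note that in this sketch the 5-day window includes the final 24 hours, and late reviews are counted separately from both windows.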

Ratings increase near the deadline

Weak Rejects increase while Rejects decrease

Reviews submitted late are less likely to agree with final outcome

Late reviews are shorter

Review quality drops: Accuracy of predicting score from review text

Conclusions

• To get your papers accepted to KDD:
– Collaborate in multidisciplinary teams
– Have a senior author on board
– Do not submit more than 5 papers

• To improve KDD community standards:
– Avoid Weak Reject/Weak Accept scores
– Write longer and clearer reviews
– Submit reviews early!
