
  • Web-Scale Features for Full-Scale Parsing

    Mohit Bansal and Dan Klein

    UC Berkeley


  • Example

    They considered running the ad during the Super Bowl.

    Does "during" attach to the head "running" or "considered"?  Web counts (head → arg):

    running * during → 11K        running it during → 239
    considered * during → 7K      considered it during → 112

    [Nakov and Hearst 2005b]
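As a rough illustration of how such counts can be used, here is a minimal sketch (not the paper's model: the hard-max decision rule and the hard-coded counts are assumptions for illustration) that picks the attachment whose head co-occurs most often with the preposition:

```python
# Minimal sketch: choose the attachment site whose head co-occurs most often
# with the preposition in web-scale n-gram counts. The counts below are the
# approximate figures from the slide; the paper uses such counts as soft
# features inside a parser rather than as a hard decision rule.
def pick_head(counts, candidate_heads, prep):
    """Return the candidate head h maximizing c(h * prep)."""
    return max(candidate_heads, key=lambda h: counts.get((h, prep), 0))

counts = {
    ("running", "during"): 11_000,      # c("running * during")
    ("considered", "during"): 7_000,    # c("considered * during")
}
print(pick_head(counts, ["running", "considered"], "during"))  # -> "running"
```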

  • Canonical Ambiguity Type 1

    Prepositional phrase (PP) attachment ambiguities (isolated)

    [Tree diagrams: "spent time with family", with the PP "with family" attached to the verb "spent" (VP attachment) vs. attached to the noun "time" (NP attachment)]

    spent … with   vs.   time … with

    [Volk 2001; Nakov & Hearst 2005b]

  • Canonical Ambiguity Type 2

    NP coordination ambiguities

    [Tree diagrams: "car and truck production", with "car" coordinated with the NP "truck production" vs. the coordinated NP "car and truck" modifying "production"]

    [Nakov & Hearst 2005b; Bergsma et al. 2011]

  • Canonical Ambiguity Type 3

    Noun compound bracketing ambiguities

    [Tree diagrams: "liver cell antibody", bracketed as [[liver cell] antibody] vs. [liver [cell antibody]]]

    liver cell   vs.   liver antibody

    [Lapata & Keller 2004; Nakov & Hearst 2005a; Pitler et al. 2010]

  • Parsing Errors

    Berkeley parser errors, cast as incorrect dependency attachments:

    Child Tag            NNP    IN    NN    RB   NNS   CC    JJ   VBD    TO    CD    DT
    % of Total Errors   21.2  19.6  10.0   7.4   5.3   4.5   4.2   3.5   3.3   3.1   3.1

    This work: a single system that addresses these various kinds of ambiguities

  • WSJ Errors

    … ordered full pages in the Monday editions of half a dozen newspapers .

    [Tree diagram: flat NP over "half/PDT a/DT dozen/NN newspapers/NNS", with no QP bracket]

  • WSJ Errors

    … ordered full pages in the Monday editions of half a dozen newspapers .

    [Tree diagram: NP with a QP bracket over "half/PDT a/DT dozen/NN", followed by "newspapers/NNS"]

  • WSJ Errors

    … a familiar message : Keep on investing , the market 's just fine .

    [Tree diagram: VP over "'s/VBZ" with separate ADVP ("just/RB") and ADJP ("fine/JJ")]

  • WSJ Errors

    … a familiar message : Keep on investing , the market 's just fine .

    [Tree diagram: VP over "'s/VBZ" with a single ADJP spanning "just/RB fine/JJ"]

  • Using Web-Scale Features

    Idea: edge-factored features ɸ(head → arg) that encode web counts

    raising $ 30 million from debt

    ɸ(raising → from)        ɸ($ → from)

  • Web-Scale Statistics

    Prepositional phrase (PP) disambiguation (Volk, 2001)

    (v, n1, p, n2)  →  verb or noun attachment?

    Only 2 competing attachments!

  • Dependency Features

    Discriminative dependency parsing (McDonald et al., 2005; inter alia)

    ɸ(h → a):

    1[h, a]

    1[cluster(h), cluster(a)]    (Koo et al., 2008; Finkel et al., 2008)
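For concreteness, a minimal sketch of these baseline edge-factored templates (the string template names and the `clusters` lookup are illustrative assumptions, not the exact feature names used in the paper):

```python
# Sketch of baseline edge-factored features phi(h -> a) for discriminative
# dependency parsing: lexical-pair and word-cluster-pair indicators
# (McDonald et al. 2005; Koo et al. 2008). Names and cluster strings are
# illustrative placeholders.
def baseline_edge_features(h_word, a_word, clusters):
    h_cl = clusters.get(h_word, "UNK")
    a_cl = clusters.get(a_word, "UNK")
    return [
        f"PAIR={h_word}_{a_word}",          # 1[h, a]
        f"CLUSTER_PAIR={h_cl}_{a_cl}",      # 1[cluster(h), cluster(a)]
    ]

clusters = {"raising": "0110", "from": "1011"}   # hypothetical cluster bit strings
print(baseline_edge_features("raising", "from", clusters))
```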

  • Web-Scale Features

    Affinity-based web features

    ɸ(raising +3 from):  query raising * from  →  web count c(raising * from) = 20K  →  binning  →  20

  • Web-Scale Features

    Affinity-based web features

    ɸ(raising +3 from)  →  unlexicalized and lexicalized indicators, e.g. 1[VBG ― +3 ― IN, 20]

  • Web-Scale Features

    Affinity-based web features

    General template:  ɸ(h [distance] a)  →  1[POS(h) ― d ― POS(a), webcount]
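A minimal sketch of how such an affinity feature could be instantiated; the log-scale binning below is an assumed scheme, not necessarily the one used in the paper:

```python
import math

# Sketch of an affinity web feature for an edge (h -> a) at signed surface
# distance d: the raw web count of "h * a" is mapped to a coarse bin and
# combined with the POS pair, yielding unlexicalized indicators in the spirit
# of 1[POS(h) - d - POS(a), webcount]. The binning scheme here (floor of the
# base-2 log) is an illustrative assumption.
def affinity_feature(h_pos, a_pos, distance, web_count):
    bin_id = int(math.log2(web_count)) if web_count > 0 else -1
    return f"AFFINITY={h_pos}_{distance:+d}_{a_pos}_BIN{bin_id}"

# e.g. c("raising * from") = 20K for the edge raising -> from at distance +3
print(affinity_feature("VBG", "IN", 3, 20_000))   # -> AFFINITY=VBG_+3_IN_BIN14
```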

  • Web-Scale Features

    Paraphrase (context) based web features

    ɸ(raising → from):  query raising * from  →  top trigrams:

    raising money from, raising funds from, raising him from, raising it from, raising capital from, …

    [Nakov and Hearst 2005b]

  • Web-Scale Features

    Paraphrase (context) based web features

    ɸ(raising → from)  →  context-word indicator, e.g. 1[VBG ― it ― IN]

  • Web-Scale Features

    Paraphrase (context) based web features

    General template:  ɸ(h → a)  →  1[POS(h) ― context ― POS(a)]

  • Web-Scale Features

    Collecting top context words for ɸ(h → a):  the most frequent context words are collected separately for the before, middle, and after positions around the pair (h, a)
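A minimal sketch of the middle-context case (the `ngram_counts` table and feature names are illustrative; before and after contexts would be collected analogously from n-grams starting or ending with the pair):

```python
from collections import Counter

# Sketch of paraphrase (context) features for a head-argument pair (h, a):
# find the most frequent words appearing between h and a in the n-gram data,
# then fire POS-based indicators such as 1[VBG - it - IN]. The trigram counts
# below are hypothetical stand-ins for the Google n-grams.
def middle_context_features(h, a, h_pos, a_pos, ngram_counts, top_k=3):
    middle = Counter()
    for (w1, w2, w3), count in ngram_counts.items():
        if w1 == h and w3 == a:
            middle[w2] += count
    return [f"PARA_MID={h_pos}_{word}_{a_pos}" for word, _ in middle.most_common(top_k)]

ngram_counts = {
    ("raising", "money", "from"): 9_000,
    ("raising", "it", "from"): 2_500,
    ("raising", "him", "from"): 1_200,
}
print(middle_context_features("raising", "from", "VBG", "IN", ngram_counts))
```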

  • Computing Web Statistics Efficiently

    Search engines inefficient – use the Google n-grams corpus (n = 1 to 5; ~4 billion n-grams #(w1 … wn))

    Batch – collect all queries beforehand, then scan all n-grams

    A query q = q1 q2 is matched against each n-gram w1 … wn by checking q1 = w1 and q2 = wn, i.e. the patterns q1 q2, q1 * q2, q1 ** q2, q1 *** q2

    Queries are indexed in a query-count trie (with hashes on q1 and q2) and counts are accumulated during the scan

    ~4.5 million queries over 4 billion n-grams in < 2 hours
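A minimal sketch of the batch strategy described above (a single in-memory pass over (n-gram, count) pairs; the real setup streams the corpus and uses a trie/hash over query terms, and the data below is hypothetical):

```python
from collections import defaultdict

# Sketch of batch web-count collection: gather all word-pair queries first,
# then make one pass over the n-gram corpus. An n-gram w1 ... wn is credited
# to query (q1, q2) when q1 = w1 and q2 = wn, i.e. it instantiates one of the
# patterns "q1 q2", "q1 * q2", "q1 ** q2", "q1 *** q2".
def batch_counts(queries, ngram_stream):
    query_set = set(queries)                 # e.g. {("raising", "from"), ...}
    counts = defaultdict(int)                # (q1, q2, gap_length) -> total count
    for ngram, count in ngram_stream:        # ngram is a tuple of 1 to 5 words
        key = (ngram[0], ngram[-1])
        if len(ngram) >= 2 and key in query_set:
            counts[(ngram[0], ngram[-1], len(ngram) - 2)] += count
    return counts

ngram_stream = [(("raising", "money", "from"), 9_000), (("raising", "from"), 11_000)]
print(batch_counts({("raising", "from")}, ngram_stream))
```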

  • Parsing Results

    Dependency parsing: web features integrated into the underlying dynamic program

    UAS:   This work  92.0        McDonald & Pereira 2006  91.4

    Relative error reduction of 7.0% over the second-order feature baseline

  • Parsing Results

    Constituent parsing: get k-best parses and rerank them discriminatively

    Parsing F1:

    Petrov et al. 2006 (baseline)                            90.2
    Web features                                             91.1
    Web + configurational (Collins+Charniak) features        91.4

    Relative error reduction of 9.2% and 12.2%
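A minimal sketch of discriminative k-best reranking as described above; the feature extractor, weights, and the linear combination with the base score are illustrative placeholders (weight training is not shown):

```python
# Sketch of discriminative k-best reranking: the base parser proposes k
# candidate parses with model scores; each candidate is rescored with a
# weighted combination of the base score and additional features (e.g. the
# web-count features), and the highest-scoring candidate is returned.
def rerank(kbest, weights, extract_features):
    """kbest: list of (parse, base_score); returns the best parse after reranking."""
    def score(candidate):
        parse, base_score = candidate
        total = weights.get("BASE_SCORE", 1.0) * base_score
        for feat in extract_features(parse):
            total += weights.get(feat, 0.0)
        return total
    return max(kbest, key=score)[0]
```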

  • Error Analysis

    Errors are reduced for a variety of child (argument) types

    [Bar chart: % error reduction by child tag – NN, NNP, IN, DT, NNS, JJ, CD, VBD, RB, CC, VB (in decreasing order of total # of attachments)]

  • Error Analysis

    Error reduction for each type of parent attachment for a given child

    Child NN – % error reduction by parent tag:   IN 18,  NN 23,  VB 30,  NNP 20,  VBN 33

    Child IN – % error reduction by parent tag:   NN 11,  VBD 11,  NNS 20,  VB 18,  VBG 23

    Child JJ – % error reduction by parent tag:   NN 9,  NNS 22,  VBD 17,  JJ 9,  CD 17

  • High-Weight Features

    Affinity features:

    their bridge back into the big-time          (RB, IN)

    an Oct. 19 review of “The Misanthrope”       (NN, IN)

    The new rate will be payable Feb. 15         (DT, NN)

  • High-Weight Features

    Paraphrase features:

    the guile learned from his years in        →  “learned this from”       (VBN ― this ― IN)

    sow a row of male-fertile plants           →  “The row of”              (NN ― The ― IN)

    about stock purchases and sales by         →  “purchases and sales”     (NNS ― and ― NNS)

  • Conclusion

    Web features are powerful disambiguators

    Uniform treatment of all attachment error types

    Incorporation into end-to-end, full-scale parsing

    7–12% relative error reduction in state-of-the-art parsers

    Intuitive features surface in the learning setup

  • Thank you!


    Questions?