LP&IIS 2013, Springer LNCS Vol. 7912, pp. 57–68 Aaron L.-F. Han, Derek F. Wong, and Lidia S. Chao [email protected], {derekfw, lidiasc}@umac.mo June 17 th -18 th , 2013, Warsaw, Poland Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory Department of Computer and Information Science University of Macau

LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

DESCRIPTION

LP&IIS 2013 Presentation PPT. Authors: Aaron Li-Feng Han, Derek Fai Wong and Lidia Sam Chao In Proceeding of International Conference of Language Processing and Intelligent Information Systems. M.A. Klopotek et al. (Eds.): IIS 2013, LNCS Vol. 7912, pp. 57–68, 17 - 18 June 2013, Warsaw, Poland. Springer-Verlag Berlin Heidelberg 2013

Page 1: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

LP&IIS 2013, Springer LNCS Vol. 7912, pp. 57–68

Aaron L.-F. Han, Derek F. Wong, and Lidia S. Chao

[email protected], {derekfw, lidiasc}@umac.mo

June 17th-18th, 2013, Warsaw, Poland

Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory

Department of Computer and Information Science

University of Macau

Page 2: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

Motivation and related work in NER (CNER)

Problem analysis and the aim of this work

A study of Chinese characteristics (in PER, LOC, and ORG)

The designed and optimized feature set

Employed CRF model

Experiments

Comparison with related work

Different performance of sub features

Formal definitions of the problems in CNER

Conclusion

Reference

Page 3: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Research areas that benefit from named entity recognition (NER):

Information extraction

text mining

machine translation

knowledge management

information retrieval, etc.

• Rapid development of NLP also promotes the NER research

• Development of computer technology allows the analysis on big data

storage capacity

computational power

Page 4: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Ratinov and Roth [1] perform NER on English

using unlabeled text and Wikipedia gazetteers.

• Sang and Meulder [2] conduct NER research on German

• Special applications of NER:

geological text processing [3]

biomedical named entity detection [4]

• Chinese NER (CNER), more difficult. Why?

no word boundaries in Chinese sentences

Page 5: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• International CNER shared tasks under the SIGHAN (special interest group for Chinese) and CIPS (Chinese information processing society)

before 2008 [5][6]

• Chinese personal name disambiguation

after 2008 by SIGHAN [7][8]

• Explored methods on CNER:

Maximum Entropy [9][10][16]

Hidden Markov Model [11]

Support Vector Machine [12]

Conditional Random Field [13][15]

• Combination with other research tasks:

Word segmentation, sentence chunking, word detection [14]

Page 6: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Problems in the employed methods:

Maximum Entropy: local optimal solution, label bias

Hidden Markov Model: strong independence assumption

Support Vector Machine: low performance

Conditional Random Field: challenges in feature selection

• Problems in the research work:

More discussion of the algorithms, less of the issues in CNER

Different features, with little or no explanation or background

Less analysis of Chinese characteristics

Page 7: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• The aim of this work:

• An introduction of Chinese characteristics

• Feature optimization based on linguistic analysis

PER, LOC, ORG

• Comparisons of the performances by different algorithms

• Issues analysis and problem formalization in CNER

Page 8: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Chinese personal names (PER):

clear format: Surname Given-name (we use x+y)

• Chinese surnames: 11,939 recorded by the Chinese Academy of Sciences [19][20]:

5313 of which consist of one character

4311 of two characters

1615 of three characters

571 of four characters, etc.

• Chinese Given-name:

usually contains one or two characters as shown in Table 1.

Page 9: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

Pl: place; Bud: building; Org: organization; Suf: suffix; Abbr: abbreviation

Page 10: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Chinese location names (LOC):

• Commonly used suffixes:

路(road), 區(district), 縣(county),

市(city), 省(province), 洲(continent), etc.

• Some standard formats, as in Table 1:

use building names

place + building

place + organization

Mix + suffix

abbreviations
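The suffix observation above can be turned into a simple check. The following is only an illustrative sketch: the suffix list is the one shown on the slide (not an exhaustive resource), the function name is hypothetical, and nothing here is claimed to be part of the authors' closed-track feature set.

```python
# Suffixes copied from the slide; the list is illustrative, not exhaustive.
LOC_SUFFIXES = ("路", "區", "縣", "市", "省", "洲")

def ends_with_loc_suffix(chunk: str) -> bool:
    """Return True if the chunk ends with one of the listed location suffixes."""
    # str.endswith accepts a tuple of candidate suffixes.
    return chunk.endswith(LOC_SUFFIXES)

print(ends_with_loc_suffix("廣東省"))  # True
print(ends_with_loc_suffix("笑開花"))  # False
```

A check like this only covers the "Mix + suffix" format; building names and abbreviations carry no such marker, which is one reason context features are still needed.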

Page 11: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Chinese organization names (ORG):

• Some ORG entities contain suffixes

but the suffixes take various forms and are not standardized

• Others do not have apparent suffixes:

named by the owners of the organization

e.g. 笑開花(XiaoKaiHua, a small art association)

• Table 2 lists some kinds of ORG entities:

including administrative unit, company, arts, public service, association, education and cultural, etc.

• This suggests that ORG may be one of the more difficult categories

Page 12: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics
Page 13: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics
Page 14: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• X: the variable representing sequence

• Y: corresponding label sequence

• P(Y|X): the conditional probability model

• G=(V, E): a graph G, V of vertices or nodes, E of edges or lines

• $Y = \{Y_v \mid v \in V\}$: Y is indexed by the vertices of G

• (X, Y) is a conditional random field model [24]:

• $P_\theta(y \mid x) \propto \exp\Big( \sum_{e \in E,\,k} \lambda_k f_k(e, y|_e, x) + \sum_{v \in V,\,k} \mu_k g_k(v, y|_v, x) \Big)$

$f_k$ and $g_k$ are the feature functions,

$\lambda_k$ and $\mu_k$ are the parameters to be trained
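To make the formula concrete, here is a minimal sketch for the linear-chain special case, where the edges connect neighbouring positions. The toy feature functions, weights and label set are assumptions chosen for illustration, and the brute-force normalisation is only practical for very short sequences; real toolkits use dynamic programming instead.

```python
import math
from itertools import product

def unnormalized_score(x, y, edge_feats, node_feats, lam, mu):
    """exp(sum of weighted edge features + sum of weighted vertex features)."""
    total = 0.0
    for i in range(len(x)):
        # Vertex features g_k(v, y|_v, x), weighted by mu_k.
        for k, g in enumerate(node_feats):
            total += mu[k] * g(i, y[i], x)
        # Edge features f_k(e, y|_e, x) on the edge (i-1, i), weighted by lambda_k.
        if i > 0:
            for k, f in enumerate(edge_feats):
                total += lam[k] * f(i, y[i - 1], y[i], x)
    return math.exp(total)

def conditional_probability(x, y, labels, edge_feats, node_feats, lam, mu):
    """P(y | x), normalised by enumerating every possible label sequence."""
    z = sum(
        unnormalized_score(x, list(cand), edge_feats, node_feats, lam, mu)
        for cand in product(labels, repeat=len(x))
    )
    return unnormalized_score(x, y, edge_feats, node_feats, lam, mu) / z

# Hypothetical toy features and weights (not taken from the paper).
NODE_FEATS = [lambda i, y_i, x: 1.0 if x[i] == "市" and y_i == "I-LOC" else 0.0]
EDGE_FEATS = [lambda i, y_prev, y_i, x: 1.0 if (y_prev, y_i) == ("B-LOC", "I-LOC") else 0.0]
LAM = [1.0]
MU = [1.0]
LABELS = ["B-LOC", "I-LOC", "N"]

x = list("北京市")
print(conditional_probability(x, ["B-LOC", "I-LOC", "I-LOC"], LABELS,
                              EDGE_FEATS, NODE_FEATS, LAM, MU))
```

Training then amounts to choosing the weights so that the labelled sequences in the training data receive high conditional probability.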

Page 15: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Training methods for CRFs include:

Iterative scaling algorithms [24]

Non-preconditioned conjugate-gradient [25]

Voted perceptron training [26]

Quasi-Newton algorithm [27], used in this work

online tool: http://crfpp.googlecode.com/svn/trunk/doc/index.html

Page 16: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Data intro:

• To cover a broad range of named entity types

• Using the SIGHAN Bakeoff-4 corpora [6]

• Containing PER, LOC, and ORG three kinds of entities

• CityU (traditional Chinese) and MSRA (simplified Chinese)

• Perform on closed track (without using external resources)

• Detailed information on the training and test data is given in Tables 4 and 5.

Page 17: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• NE means the total of three kinds of named entities

Page 18: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• OOV means the entities of the test data that do not exist in the training data, and Roov means the OOV rate
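The OOV rate can be sketched as follows. This assumes Roov is counted over entity occurrences in the test data (the slide does not say whether occurrences or distinct entities are counted), and the function name and sample entities are hypothetical.

```python
def oov_rate(train_entities, test_entities):
    """Fraction of test entity occurrences that never appear in the training data."""
    train_set = set(train_entities)
    if not test_entities:
        return 0.0
    oov_count = sum(1 for entity in test_entities if entity not in train_set)
    return oov_count / len(test_entities)

# Made-up example: "杭州" and "西湖" are unseen in training.
print(oov_rate(["北京", "大山"], ["北京", "杭州", "西湖", "北京"]))  # 0.5
```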

Page 19: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Samples of the training corpus are shown in Table 6.

• In the test data, there is only one column of Chinese characters
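As a rough illustration of such a character-per-line layout, the sketch below converts a sentence with known entity spans into (character, label) rows. The exact column layout of Table 6 is not reproduced here; the B-/I- prefixes and the "N" tag for non-entity characters follow the label strings quoted elsewhere in these slides, and the span format `(start, end, type)` is an assumption.

```python
def to_rows(sentence, entities):
    """Return one (character, label) row per character.

    entities: list of (start, end, type) with end exclusive, e.g. (0, 2, "PER").
    """
    labels = ["N"] * len(sentence)
    for start, end, etype in entities:
        labels[start] = "B-" + etype
        for i in range(start + 1, end):
            labels[i] = "I-" + etype
    return list(zip(sentence, labels))

for char, label in to_rows("大山去杭州", [(0, 2, "PER"), (3, 5, "LOC")]):
    print(char, label)
```

For test data only the first column would be written, and the tagger fills in the label column.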

Page 20: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Recognition results:

Page 21: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Evaluation metrics:

• $\mathrm{Precision} = \dfrac{\text{number of correct output}}{\text{number of total output}}$

• $\mathrm{Recall} = \dfrac{\text{number of correct output}}{\text{number of truth}}$

• $F\text{-score} = \mathrm{Harmonic}(\mathrm{Precision}, \mathrm{Recall}) = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$

• Evaluation is performed on NE level (not token-per-token).

– E.g., if a token is supposed to be B-LOC but it is labeled I-LOC instead, then this will not be considered as a correct labeling
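A minimal sketch of entity-level scoring under these definitions follows. It assumes an entity counts as correct only when its boundaries and type both match the gold standard, and that an I- tag without a preceding B- of the same type does not start an entity; the official bakeoff scorer may treat edge cases differently.

```python
def extract_entities(labels):
    """Return the set of (start, end, type) spans encoded by B-/I-/N labels."""
    spans = set()
    start, etype = None, None
    for i, lab in enumerate(list(labels) + ["N"]):  # sentinel closes the last span
        if lab.startswith("I-") and start is not None and lab[2:] == etype:
            continue  # still inside the current entity
        if start is not None:
            spans.add((start, i, etype))
            start, etype = None, None
        if lab.startswith("B-"):
            start, etype = i, lab[2:]
    return spans

def precision_recall_f(gold_labels, pred_labels):
    """Entity-level precision, recall and F-score."""
    gold = extract_entities(gold_labels)
    pred = extract_entities(pred_labels)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

gold = ["B-PER", "I-PER", "N", "B-LOC", "I-LOC"]
pred = ["B-PER", "I-PER", "N", "I-LOC", "I-LOC"]  # B-LOC mislabeled as I-LOC
print(precision_recall_f(gold, pred))
```

In the example the mislabeled B-LOC means the whole location is missed, so recall drops to 0.5 even though four of the five tokens carry the right label.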

Page 22: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Evaluation scores:

Page 23: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Main conclusions:

• 1. The experimental results corroborate our analysis of the Chinese characteristics: PER and LOC have simpler structures and expressions, which makes them easier to recognize than ORG

– the Roov (in Table 5) of LOC is the lowest (0.1857 and 0.0861 for CityU and MSRA respectively), and LOC recognition performs very well (F-scores of 0.8599 and 0.8988 respectively).

– in the MSRA corpus, the Roov of ORG (0.3533) is higher than that of PER (0.3026), and the corresponding F-score of ORG is lower

– however, in the CityU corpus, the Roov of ORG (0.4884) is much lower than that of PER (0.7850), yet ORG is still recognized worse (F-scores of 0.6646 for ORG and 0.8036 for PER)

Page 24: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• 2. The recognition of OOV entities is the principal challenge for automatic systems

– the overall OOV rate of CityU (0.4882) is higher than that of MSRA (0.2142), and the corresponding overall F-score of CityU (0.7955) is lower than that of MSRA (0.8833)

Page 25: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Comparison with baselines in Table 9:

• The baselines are produced by a left-to-right maximum match algorithm applied on the testing data with the named entity lists generated from the training data.
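A left-to-right maximum match baseline of this kind can be sketched as below. The dictionary format (entity string mapped to its type) and the tie-breaking details are assumptions; the bakeoff baseline script itself is not shown in the slides.

```python
def max_match_tag(sentence, entity_dict):
    """Tag a sentence by greedily matching the longest known entity at each position.

    entity_dict maps entity strings seen in training to their type, e.g. {"杭州": "LOC"}.
    """
    max_len = max((len(e) for e in entity_dict), default=0)
    labels = ["N"] * len(sentence)
    i = 0
    while i < len(sentence):
        matched = False
        # Try the longest candidate first, then shorter ones.
        for length in range(min(max_len, len(sentence) - i), 0, -1):
            etype = entity_dict.get(sentence[i:i + length])
            if etype is not None:
                labels[i] = "B-" + etype
                for j in range(i + 1, i + length):
                    labels[j] = "I-" + etype
                i += length
                matched = True
                break
        if not matched:
            i += 1
    return labels

# Hypothetical entity list, standing in for one generated from training data.
entities = {"杭州": "LOC", "杭州西湖": "LOC", "大山": "PER"}
print(max_match_tag("大山去杭州西湖", entities))
```

Because the baseline can only find entities it has already seen, it cannot recognize any OOV entity, which is consistent with its weaker scores.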

Page 26: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• The experiments yield much higher F-scores than the baselines

• The baseline scores vary widely across entity types, giving overall F-scores of 0.5955 and 0.6105 for the CityU and MSRA corpora respectively.

• In contrast, our results are consistently high across the three entity types, without large fluctuations.

• This suggests that the approach employed in this research is sound and effective.

• The improvements on ORG and PER are especially large on both corpora, leading to overall relative F-score increases of 33.6% and 44.7% respectively.
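The quoted increases match a relative-gain calculation over the overall F-scores reported in these slides (baselines 0.5955 and 0.6105, system scores 0.7955 and 0.8833). Reading them as relative gains is an inference from the numbers, checked in this small sketch:

```python
def relative_gain(baseline, system):
    """Relative improvement of the system score over the baseline score."""
    return (system - baseline) / baseline

print(round(100 * relative_gain(0.5955, 0.7955), 1))  # CityU: 33.6
print(round(100 * relative_gain(0.6105, 0.8833), 1))  # MSRA: 44.7
```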

Page 27: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Comparison with related work:

• Related work differs in features (various window sizes)

• algorithms (CRF, ME, SVM, etc.)

• external resources (external vocabularies, POS tools, name lists, etc.)

• The comparison is made on MSRA; some related work is summarized in Table 10.

– most researchers report test results only on the MSRA corpus

• A number n denotes a character position relative to the current token

– the |n|th previous character when n<0

– the nth following character when n>0

– the current token when n=0

– E.g., B(-10, 01, 12) denotes three bigram features (previous and current characters, current and next characters, next two characters).
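The offset notation maps naturally onto feature templates. The sketch below extracts character n-gram features for one position and also renders the same offsets as CRF++-style template lines using the `%x[row,col]` macro. The offsets used in the example are the ones from the B(-10, 01, 12) illustration plus a hypothetical unigram window; they are not the paper's optimized feature set, and the padding symbol and function names are assumptions. Note that in CRF++ a character bigram is written as a `U` template with two macros, whereas a template starting with `B` refers to label bigrams.

```python
def char_features(chars, i, unigram_offsets, bigram_offsets, pad="_"):
    """Character unigram and bigram features around position i."""
    def at(n):
        j = i + n
        return chars[j] if 0 <= j < len(chars) else pad  # pad outside the sentence
    feats = [f"U[{n}]={at(n)}" for n in unigram_offsets]
    feats += [f"B[{a},{b}]={at(a)}{at(b)}" for a, b in bigram_offsets]
    return feats

def crfpp_template(unigram_offsets, bigram_offsets):
    """Render the same offsets as CRF++-style template lines."""
    lines = []
    for n in unigram_offsets:
        lines.append(f"U{len(lines):02d}:%x[{n},0]")
    for a, b in bigram_offsets:
        lines.append(f"U{len(lines):02d}:%x[{a},0]/%x[{b},0]")
    return lines

UNIGRAMS = [-1, 0, 1]                # hypothetical window
BIGRAMS = [(-1, 0), (0, 1), (1, 2)]  # B(-10, 01, 12)

print(char_features(list("杭州西湖"), 1, UNIGRAMS, BIGRAMS))
print("\n".join(crfpp_template(UNIGRAMS, BIGRAMS)))
```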

Page 28: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics
Page 29: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• From Table 10:

• when the feature window is smaller, performance is worse.

• too large a window does not ensure good results

– it also introduces noise and increases running time.

• external materials do not necessarily ensure better performance

– combining segmentation and POS offers more information about the test set; however, segmentation and POS accuracy also influence system quality.

• the experiment in this paper yields promising results with an optimized feature set and a concise model.

Page 30: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Performance of the different sub-feature sets in our experiments

• The corresponding results are listed in Table 11

Page 31: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Table 11 shows:

• Generally speaking, more features lead to longer training time, and when the feature set is small the same holds for the number of iterations.

• However, this does not hold when the feature set gets larger

– e.g. on the MSRA corpus, feature set (FS) FS4 needs 314 iterations, fewer than the 318 needed by FS2, although FS4 is the larger feature set.

– This may be because the feature set FS2 needs more iterations to converge to a fixed point.

• With the CRF algorithm, FS4 is chosen as the optimized feature set

– if we continue to expand the features, the recognition accuracy decreases, as shown in Table 11

Page 32: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Because of the variable and complicated characteristics of Chinese

• there are some special character combinations that can be labeled in different ways, with every result reasonable in practice.

• This causes confusion for researchers.

• How do we deal with these problems?

• To facilitate further research, we introduce and provide some formal definitions of the existing issues in CNER

Page 33: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• First, the Function-overload problem:

– (also called metonymy in some work)

• One word bears two or more meanings in the same text.

– E.g., the word “大山” (DaShan) is an organization name in the chunk “大山國際銀行” (DaShan International Bank), where the whole chunk denotes a company

– while “大山” (DaShan) is a person name in the sequence “大山悄悄地走了” (DaShan quietly went away), where the whole sequence describes a person's action

• It is difficult for a computer to distinguish these meanings and assign the corresponding labels (ORG or PER)

– they must be recognized through analysis of context and semantics.

Page 34: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Furthermore, the Multi-segmentation problem in CNER:

• one sequence can be treated as a whole or split into several fragments according to different meanings, and the labeling differs accordingly.

– For example, the sequence “中興實業” (ZhongXing Corporation) can be labeled as a whole chunk, "B-ORG I-ORG I-ORG I-ORG", which means it is an organization name

– It can also be divided as “中興 / 實業” and labeled as “B-ORG I-ORG / N N”, meaning that “中興” (ZhongXing) represents the organization entity and “實業” (Corporation) is a common Chinese word; this usage is widespread in Chinese documents.

Page 35: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• Another example of the Multi-segmentation problem in CNER:

– the sequence “杭州西湖” (Hang Zhou Xi Hu) can be labeled as "B-LOC I-LOC I-LOC I-LOC", a single place name

– but it can also be labeled as "B-LOC I-LOC B-LOC I-LOC", because “西湖” (XiHu) is indeed a place that belongs to the city “杭州” (HangZhou).

• Which label sequence should we select? Both are reasonable. This is a difficult problem for human annotators, let alone for computers.

• The problems discussed above are only some of those existing in CNER. If they can be handled well, performance will improve in the future.
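The practical consequence of the multi-segmentation problem is easy to see by decoding the two label sequences for “杭州西湖” into entities. This is only an illustrative sketch with a hypothetical helper; it assumes the B-/I- label scheme quoted on the slide.

```python
def decode(chars, labels):
    """Turn per-character B-/I- labels into a list of (entity text, type)."""
    entities = []
    current = None  # [text, type] of the entity being built
    for ch, lab in zip(chars, labels):
        if lab.startswith("B-"):
            if current:
                entities.append(tuple(current))
            current = [ch, lab[2:]]
        elif lab.startswith("I-") and current and current[1] == lab[2:]:
            current[0] += ch
        else:
            if current:
                entities.append(tuple(current))
            current = None
    if current:
        entities.append(tuple(current))
    return entities

chars = list("杭州西湖")
print(decode(chars, ["B-LOC", "I-LOC", "I-LOC", "I-LOC"]))  # one entity
print(decode(chars, ["B-LOC", "I-LOC", "B-LOC", "I-LOC"]))  # two entities
```

Under entity-level evaluation, a system that outputs the second labeling when the gold standard uses the first scores zero correct entities here, even though both readings are linguistically reasonable.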

Page 36: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• This paper studies CNER, a difficult problem in NLP.

• The characteristics of Chinese named entities are introduced for personal names, location names and organization names in turn.

• With the CRF algorithm, the optimized features show promising performance compared with related work that uses different feature sets and algorithms.

• Furthermore, to facilitate further research, this paper discusses problems existing in CNER and puts forward some formal definitions combined with instructive solutions.

• Performance can be further improved in the open test by employing other high-quality resources and tools

– e.g. externally generated word-frequency counts, common Chinese surnames and internet dictionaries

Page 37: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• 1. Ratinov, L., Roth, D.: Design Challenges and Misconceptions in Named Entity Recognition. In: Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009), pp. 147–155. Association for Computational Linguistics Press, Stroudsburg (2009)

• 2. Sang, E.F.T.K., Meulder, F.D.: Introduction to the CoNLL-2003 Shared Task: Language-Independent Named Entity Recognition. In: HLT-NAACL, pp. 142–147. ACL Press, USA (2003)

• 3. Sobhana, N., Mitra, P., Ghosh, S.: Conditional Random Field Based Named Entity Recognition in Geological text. J. IJCA 1(3), 143–147 (2010)

• 4. Settles, B.: Biomedical named entity recognition using conditional random fields and rich feature sets. In: Collier, N., Ruch, P., Nazarenko, A. (eds.) International Joint Workshop on Natural Language Processing in Biomedicine and its Applications, pp. 104–107. ACL Press, Stroudsburg (2004)

• 5. Levow, G.A.: The third international CLP bakeoff: Word segmentation and named entity recognition. In: Proceedings of the Fifth SIGHAN Workshop on CLP, pp. 122–131. ACL Press, Sydney (2006)

Page 38: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• 6. Jin, G., Chen, X.: The fourth international CLP bakeoff: Chinese word segmentation, named entity recognition and Chinese pos tagging. In: Sixth SIGHAN Workshop on CLP, pp. 83–95. ACL Press, Hyderabad (2008)

• 7. Chen, Y., Jin, P., Li, W., Huang, C.-R.: The Chinese Persons Name Disambiguation Evaluation: Exploration of Personal Name Disambiguation in Chinese News. In: CIPS-SIGHAN Joint Conference on Chinese Language Processing, pp. 346–352. ACL Press, BeiJing (2010)

• 8. Sun, L., Zhang, Z., Dong, Q.: Overview of the Chinese Word Sense Induction Task at CLP2010. In: CIPS-SIGHAN Joint Conference on CLP (CLP2010), pp. 403–409. ACL Press, BeiJing (2010)

• 9. Jaynes, E.: The relation of Bayesian and maximum entropy methods. J. Maximum-Entropy and Bayesian Methods in Science and Engineering 1, 25–29 (1988)

• 10. Wong, F., Chao, S., Hao, C.C., Leong, K.S.: A Maximum Entropy (ME) Based Translation Model for Chinese Characters Conversion. J. Advances in Computational Linguistics, Research in Computer Science 41, 267–276 (2009)

Page 39: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• 11. Ekbal, A., Bandyopadhyay, S.: A hidden Markov model based named entity recognition system: Bengali and Hindi as case studies. In: Ghosh, A., De, R.K., Pal, S.K. (eds.) PReMI 2007. LNCS, vol. 4815, pp. 545–552. Springer, Heidelberg (2007)

• 12. Mansouri, A., Affendey, L., Mamat, A.: Named entity recognition using a new fuzzy support vector machine. J. IJCSNS 8(2), 320 (2008)

• 13. Putthividhya, D.P., Hu, J.: Bootstrapped named entity recognition for product attribute extraction. In: EMNLP 2011, pp. 1557–1567. ACL Press, Stroudsburg (2011)

• 14. Peng, F., Feng, F., McCallum, A.: Chinese segmentation and new word detection using conditional random fields. In: Proceedings of the 20th international conference on Computational Linguistics (COLING 2004), Article 562. Computational Linguistics Press, Stroudsburg (2004)

• 15. Chen, W., Zhang, Y., Isahara, H.: Chinese named entity recognition with conditional random fields. In: Fifth SIGHAN Workshop on Chinese Language Processing, pp. 118–121. ACL Press, Sydney (2006)

Page 40: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• 16. Zhu, F., Liu, Z., Yang, J., Zhu, P.: Chinese event place phrase recognition of emergency event using Maximum Entropy. In: Cloud Computing and Intelligence Systems (CCIS), pp. 614–618. IEEE, ShangHai (2011)

• 17. Qin, Y., Yuan, C., Sun, J., Wang, X.: BUPT Systems in the SIGHAN Bakeoff 2007. In: Sixth SIGHAN Workshop on CLP, pp. 94–97. ACL Press, Hyderabad (2008)

• 18. Feng, Y., Huang, R., Sun, L.: Two Step Chinese Named Entity Recognition Based on Conditional Random Fields Models. In: Sixth SIGHAN Workshop on CLP, pp. 120–123. ACL Press, Hyderabad (2008)

• 19. Yuan, Yida, Zhong, W.: Contemporary Surnames. Jiangxi people’s publishing house, China (2006)

• 20. Yuan, Yida, Qiu, J., Zhang, R.: 300 most common surname in Chinese surnames-population genetic and population distribution. East China Normal University Publishing House, China (2007)

• 21. Huang, D., Sun, X., Jiao, S., Li, L., Ding, Z., Wan, R.: HMM and CRF based hybrid model for chinese lexical analysis. In: Sixth SIGHAN Workshop on CLP, pp. 133–137. ACL Press, Hyderabad (2008)

Page 41: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• 22. Sun, G.-L., Sun, C.-J., Sun, K., Wang, X.-L.: A Study of Chinese Lexical Analysis Based on Discriminative Models. In: Sixth SIGHAN Workshop on CLP, pp. 147–150. ACL Press, Hyderabad (2008)

• 23. Yang, F., Zhao, J., Zou, B.: CRFs-Based Named Entity Recognition Incorporated with Heuristic Entity List Searching. In: Sixth SIGHAN Workshop on CLP, pp. 171–174. ACL Press, Hyderabad (2008)

• 24. Lafferty, J., McCallum, A., Pereira, F.C.N.: Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In: Proceeding of 18th International Conference on Machine Learning, pp. 282–289. DBLP, Massachusetts (2001)

• 25. Shewchuk, J.R.: An introduction to the conjugate gradient method without the agonizing pain. Technical Report CMUCS-TR-94-125, Carnegie Mellon University (1994)

• 26. Collins, M., Duffy, N.: New ranking algorithms for parsing and tagging: kernels over discrete structures, and the voted perceptron. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics (ACL 2002), pp. 263–270. Association for Computational Linguistics Press, Stroudsburg (2002)

Page 42: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

• 27. The Numerical Algorithms Group. E04 - Minimizing or Maximizing a Function, NAG Library Manual, Mark 23 (retrieved 2012)

• 28. Zhao, H., Liu, Q.: The CIPS-SIGHAN CLP2010 Chinese Word Segmentation Bakeoff. In: CIPS-SIGHAN Joint Conference on CLP, pp. 199–209. ACL Press, BeiJing (2010)

• 29. Zhou, Q., Zhu, J.: Chinese Syntactic Parsing Evaluation. In: CIPS-SIGHAN Joint Conference on CLP (CLP 2010), pp. 286–295. ACL Press, BeiJing (2010)

• 30. Xu, Z., Qian, X., Zhang, Y., Zhou, Y.: CRF-based Hybrid Model for Word Segmentation, NER and even POS Tagging. In: Sixth SIGHAN Workshop on CLP, pp. 167–170. ACL Press, India (2008)

Page 43: LP&IIS2013 PPT. Chinese Named Entity Recognition with Conditional Random Fields in the Light of Chinese Characteristics

Aaron L.-F. Han, Derek F. Wong, and Lidia S. Chao

[email protected], {derekfw, lidiasc}@umac.mo

Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory

Department of Computer and Information Science

University of Macau