Defining Lexical Semantic Relationships for Terms of Precalculus Study

VELISLAVA STOYKOVA
Institute for Bulgarian Language - IBL
Bulgarian Academy of Sciences - BAS

52, Shipchensky proh. str. bl. 17, 1113

MAYA MITKOVA
Gulf University for Science and Technology

P.O. Box 7207
Hawally 32093


Abstract: In this work∗ we present the results of collaborative interdisciplinary research on defining the basic domain concepts for teaching precalculus, using semantically oriented statistical search of contrasting web-based educational text corpora built from e-course materials. The research applies a combination of different statistical search techniques to define the semantic relationships of some basic mathematical concepts of precalculus study. We present several example techniques using the appropriate software application.

Key–Words: Extracting Mathematical Terms from Course Texts, Research and Education, Science and New Technologies in Education, Computers and Internet in Education.

1 Introduction

Mathematics is a compulsory basic subject in engineering education. The content and proportion of the different mathematical subjects in university mathematics courses may vary depending on the national educational standards for secondary and university education. However, mastering basic mathematical concepts is a key teaching activity for improving students' mathematical ability and developing engineering creativity. The problem of designing courses that include precalculus is often considered as a problem of how the different basic mathematical concepts, and their order, are transposed and introduced to students in the course outline.

The problem can also be addressed as a process of creating a conceptual hierarchy of mathematical terminological knowledge, which defines the organization and structure of the precalculus course material. Below, we present and discuss the results of a corpus-based study and analysis of three web-based open-source course materials for precalculus, based on computer-supported lexical semantic extraction and terminological database design techniques.

∗The research is performed within the framework of the project "Semantic Networks Conceptual Formal Representation" of IBL-BAS.

2 Computer-based approaches to lexical semantics and terminology extraction

The corpus-based approach has been the most successful technique developed in recent years for creating highly structured, semantically oriented lexical reference sources such as dictionaries and thesauri. The approach allows the application of rule-based search techniques (mostly encoding grammatical relations such as inflection [11, 13], syntax, etc.) and statistically based search techniques (mostly used for extracting lexical semantic relationships [10, 7]) to investigate word behaviour in large-scale electronic text collections.

Some hybrid systems allow both techniques to be applied in combination, depending on the specific research task. Below, we present a combination of different statistical search techniques, based on a corpus-based approach, for defining the semantics of the basic concepts of precalculus study in three web-based e-courses on the subject.

2.1 Defining the basic concepts by using word frequency

Acquiring mathematical knowledge is a complicated process which depends on various requirements, such as appropriate curriculum design and a related syllabus, the use of highly successful teaching methods, the students' prior background knowledge, the students' gender [6], etc. In general, however, it is influenced mostly by the definition and mastering of the basic concepts on which new knowledge is structured.

Recent Researches in Educational Technologies

ISBN: 978-1-61804-021-3 240

The general approach we use for defining the basic domain concepts is similar to that of the EU Lifelong Learning project KELLY (KEywords for Language Learning for Young and adults), where the basic concepts are defined by searching a database of contrasting multilingual electronic text corpora, including the British National Corpus (BNC), to extract keyword frequency lists [4]. Psychologists interpret word frequency lists as indicating the basic concepts for knowledge acquisition and understanding. Educationalists hold the view that word frequency lists play an important role in learning to read and in similar tasks. Word frequency lists are widely used for various educational tasks such as curriculum development, syllabus writing, and preparing tests for quality evaluation.

3 Examining contrasting corpora for keywords by statistical search

Mathematics itself can be regarded as a special kind of symbolic language; nevertheless, it uses natural language both for definitions and for explanations [12]. Mathematical texts, and the related teaching courses, include numbers, formulas, tables, pictures, and graphics, by means of which their semantic content can be fully interpreted. In our research, we have used the texts as they appear in the teaching courses, but we have analysed and interpreted only content words, since they stand for the concepts.
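A word frequency list restricted to content words can be sketched in a few lines. This is a minimal illustration, assuming a naive regular-expression tokenizer and a small, hypothetical function-word list (FUNCTION_WORDS) in place of the part-of-speech filtering that a corpus tool such as SE would normally provide.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list; a real study would use
# a part-of-speech tagger or a full stopword list instead.
FUNCTION_WORDS = {
    "a", "an", "the", "of", "to", "in", "is", "are", "and", "or",
    "for", "by", "with", "as", "be", "that", "this", "it", "on",
}

def content_word_frequencies(text):
    """Return a Counter of lower-cased alphabetic tokens, minus function words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in FUNCTION_WORDS)

if __name__ == "__main__":
    sample = ("A polynomial function is a function defined by a polynomial. "
              "The graph of a polynomial function is continuous.")
    for word, count in content_word_frequencies(sample).most_common():
        print(word, count)
```

In the sample, polynomial and function each occur three times, while articles, prepositions and the copula are filtered out before counting.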

3.1 Text corpora and keyword definitions

We have created three electronic text corpora, MathPre (consisting of the precalculus e-course given at [14]), MathWeb (consisting of the precalculus e-course given at [9]) and MathWiki (consisting of the encyclopedic web description of precalculus [8]), totalling almost 200 000 words. The BNC is also used as a reference standard for comparing and interpreting the results.

There are various statistical approaches to defining keywords. In general, most of them treat keyword extraction as the retrieval and clustering of statistically similar words [5], and they differ with respect to the statistics used. We use the statistics incorporated in the Sketch Engine (SE) [2, 3] corpus-processing software, which allows the use of combined statistical approaches and the comparison of results across several corpora.

Figure 1: Keywords for MathPre and MathWeb.

After applying the standard SE statistical options to extract keywords from our three corpora, we obtained the results given in Fig. 1 (for MathPre and MathWeb) and Fig. 2 (for MathWiki). The results represent basic precalculus concepts such as function(s), numbers, polynomials, graphs, equations, etc. They differ with respect to the e-course material used and with respect to the text type: the encyclopedic knowledge presentation of the MathWiki corpus needs further elaboration before it can be used in teaching courses. In general, our keyword frequency lists give the basic mathematical concepts on which precalculus teaching is structured; however, their proportion and order have to be clarified by further statistical research that defines their semantic relationships. Below, we define the semantic relationships for the basic concept function(s).
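Keyword extraction of this kind compares how often a word occurs in a focus corpus (e.g. MathPre) with how often it occurs in a reference corpus (e.g. the BNC). As a minimal sketch, assuming SE's keyword option behaves like the "simple maths" score, i.e. a ratio of frequencies per million with an add-n smoothing constant (the exact score and default n in a given SE version are assumptions here), the ranking could look as follows. The counts are toy numbers, not data from MathPre, MathWeb or MathWiki.

```python
from collections import Counter

def per_million(count, corpus_size):
    """Normalise a raw frequency to occurrences per million tokens."""
    return count * 1_000_000 / corpus_size

def simple_maths_keywords(focus, reference, n=1.0, top=10):
    """Rank focus-corpus words by (fpm_focus + n) / (fpm_reference + n).

    focus, reference: Counter objects mapping word -> raw frequency.
    n: add-n smoothing constant; a larger n favours more frequent words.
    Words absent from the reference corpus get fpm_reference = 0.
    """
    focus_size = sum(focus.values())
    ref_size = sum(reference.values())
    scores = {}
    for word, count in focus.items():
        fpm_focus = per_million(count, focus_size)
        fpm_ref = per_million(reference.get(word, 0), ref_size)
        scores[word] = (fpm_focus + n) / (fpm_ref + n)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]

if __name__ == "__main__":
    # Toy counts, invented for illustration only.
    math_course = Counter({"the": 100, "function": 50, "polynomial": 20})
    general = Counter({"the": 60_000, "house": 39_990, "function": 10})
    for word, score in simple_maths_keywords(math_course, general):
        print(f"{word:12s} {score:12.2f}")
```

A word missing from the reference corpus (here polynomial) receives the highest score, which is also why a domain term that is rare or absent in the BNC stands out as a keyword.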

4 Semantic similarity based on statistical measurement

There are different approaches to defining the semantic context. The context can be defined both in logical and in linguistic terms; however, its definition depends heavily on the particular logical or linguistic theory. At the same time, so-called context-free grammars have been evaluated as inappropriate tools for natural language processing. Currently, statistical corpus approaches based on measuring word similarity and on generating word concordances are widely used in lexicography for word sense definition. The related corpus query systems allow great flexibility in statistically based search for co-occurrences and collocations using different statistical techniques, where the context is defined in statistical, quantitative terms.

Figure 2: Keywords for MathWiki corpus.

4.1 The basic lexical semantic relationships

In general, lexical semantic conceptual relations are regarded as being of two types: horizontal and vertical. The horizontal (linear) semantic relationships are those of synonymy and antonymy, i.e. they show semantic similarity [10] and semantic distance. The vertical semantic relationships express ordering or hierarchy; they are those of hyperonymy and hyponymy, and of meronymy. All types of semantic relationships can be defined by examining the related contexts through the generation of word concordances based on different statistical corpus-based approaches [7]. A concordance gives all occurrences of a word together with its context and can be generated by statistical search [10]. Example concordances for the basic concept function obtained from the MathPre and MathWeb corpora and from the MathWiki corpus are given in Fig. 3 and Fig. 4, respectively. Concordances define the context in quantitative terms, and further work is needed to define the semantic relationships by searching for co-occurrences and collocations of the keyword.

Figure 3: Concordances of the word function from MathPre and MathWeb corpora.
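A key-word-in-context (KWIC) concordance of the kind shown in the concordance figures can be sketched in a few lines. This is a minimal sketch, not SE's implementation: the function name concordance, the fixed character window, and the naive plural handling (an optional trailing "s", to cover function(s)) are all assumptions.

```python
import re

def concordance(text, keyword, width=30):
    """Return KWIC lines for every occurrence of `keyword` in `text`.

    Matching is case-insensitive, on whole words, and naively accepts an
    optional plural "s" (e.g. function/functions). `width` is the number
    of characters of left and right context kept (a hypothetical default).
    """
    text = " ".join(text.split())  # collapse newlines and repeated spaces
    pattern = r"\b" + re.escape(keyword) + r"s?\b"
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right}")
    return lines

if __name__ == "__main__":
    sample = ("A linear function is a polynomial function of degree one. "
              "Inverse functions undo each other.")
    for line in concordance(sample, "function"):
        print(line)
```

Aligning the keyword in a fixed column is what makes recurring left neighbours (polynomial, linear, inverse) easy to spot by eye before any statistics are applied.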

Figure 4: Concordances of the word function from MathWiki corpus.



Figure 5: Collocations of the keyword function for MathPre and MathWeb corpora.

4.2 Using statistics to define co-occurrences and collocations

Co-occurrences and collocations are words which are most likely to be found together with the related keyword. They define the semantic relation, of similarity or of distance, between the keyword and a particular collocated word. The statistical approaches we use to search for co-occurring and collocated words are based on estimating the probability of their co-occurrence and collocation. We have used the T-score, MI-score [1] and MI3-score [7] techniques incorporated in SE for processing and searching our three corpora. All of them use the following terms: $N$, the corpus size; $f_A$, the number of occurrences of the keyword in the whole corpus (the size of the concordance); $f_B$, the number of occurrences of the collocate in the whole corpus; and $f_{AB}$, the number of occurrences of the collocate in the concordance (the number of co-occurrences). The related formulas for the T-score, MI-score and MI3-score are given in Fig. 7.
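The three scores can be computed directly from the four counts N, f_A, f_B and f_AB. The sketch below uses the standard forms of MI, T-score and MI3 with base-2 logarithms; we assume, rather than confirm, that SE's implementation matches these forms exactly. The counts in the usage lines are invented.

```python
import math

def mi_score(f_ab, f_a, f_b, n):
    """Mutual information: log2(f_AB * N / (f_A * f_B))."""
    return math.log2(f_ab * n / (f_a * f_b))

def t_score(f_ab, f_a, f_b, n):
    """T-score: (f_AB - f_A * f_B / N) / sqrt(f_AB)."""
    return (f_ab - f_a * f_b / n) / math.sqrt(f_ab)

def mi3_score(f_ab, f_a, f_b, n):
    """MI3: log2(f_AB**3 * N / (f_A * f_B)); cubing f_AB favours frequent pairs."""
    return math.log2(f_ab ** 3 * n / (f_a * f_b))

if __name__ == "__main__":
    # Invented counts: corpus of 200 000 tokens, keyword "function" seen
    # 1 000 times, collocate "polynomial" 100 times, together 40 times.
    args = dict(f_ab=40, f_a=1_000, f_b=100, n=200_000)
    print("MI  =", round(mi_score(**args), 3))
    print("T   =", round(t_score(**args), 3))
    print("MI3 =", round(mi3_score(**args), 3))
```

With these counts MI = log2 80 ≈ 6.32, T ≈ 6.25 and MI3 = log2 128000 ≈ 16.97. MI tends to reward rare but exclusive collocates, whereas the T-score and MI3 give more weight to frequent ones, which is one reason to compare all three.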

The most likely collocation candidates (the most frequent collocates) for the keyword function are given in Fig. 5 (for MathPre and MathWeb) and Fig. 6 (for MathWiki); they are polynomial, exponential, rational, logarithmic, trigonometric, etc. They express the hierarchical conceptual semantic relationships of the keyword. In contrast, relatively infrequent collocates such as periodic, continuous, inverse, increasing, decreasing, real-valued, multi-valued, positive, negative, etc. represent the attributive semantic relationships of the keyword function.

Figure 6: Collocations of the keyword function for MathWiki corpus.

MI-Score:
$$\mathrm{MI} = \log_2 \frac{f_{AB}\,N}{f_A\,f_B}$$

T-Score:
$$T = \frac{f_{AB} - \dfrac{f_A\,f_B}{N}}{\sqrt{f_{AB}}}$$

MI3-Score:
$$\mathrm{MI3} = \log_2 \frac{f_{AB}^{3}\,N}{f_A\,f_B}$$

Figure 7: The statistics used for T-score, MI-score and MI3-score measurement.
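The counts f_A, f_B and f_AB themselves can be gathered from a token window around each occurrence of the keyword. The sketch below assumes a symmetric window of ±window tokens, a naive lower-case regex tokenizer and a minimum co-occurrence threshold; the function name and all parameter defaults are hypothetical and are not SE settings. It ranks collocates by MI3, one of the three scores used here.

```python
import math
import re
from collections import Counter

def collocates(text, keyword, window=3, min_freq=2):
    """Rank words co-occurring with `keyword` within +/- `window` tokens by MI3.

    Returns (word, f_AB, MI3) tuples, highest MI3 first. Ties keep the order
    in which collocates were first seen. N is the number of tokens in `text`.
    """
    keyword = keyword.lower()
    tokens = re.findall(r"[a-z]+", text.lower())
    n = len(tokens)
    freq = Counter(tokens)          # f_A and f_B: whole-corpus frequencies
    f_a = freq[keyword]
    co = Counter()                  # f_AB: co-occurrences inside the window
    for i, tok in enumerate(tokens):
        if tok != keyword:
            continue
        for j in range(max(0, i - window), min(n, i + window + 1)):
            if j != i:
                co[tokens[j]] += 1
    ranked = []
    for word, f_ab in co.items():
        if f_ab >= min_freq and word != keyword:
            mi3 = math.log2(f_ab ** 3 * n / (f_a * freq[word]))
            ranked.append((word, f_ab, mi3))
    return sorted(ranked, key=lambda r: r[2], reverse=True)

if __name__ == "__main__":
    sample = ("polynomial function is defined. exponential function is defined. "
              "polynomial function has roots.")
    for word, f_ab, mi3 in collocates(sample, "function", window=1):
        print(f"{word:12s} f_AB={f_ab}  MI3={mi3:.2f}")
```

On raw text, function words such as is also surface as collocates; restricting the counts to content words, as described in Section 3, removes them from the ranking.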

4.3 Conceptual semantic hierarchy

Even though the most frequent collocations represent a relationship of similarity, they do not necessarily express the semantic relationship of synonymy. Mostly, they can be interpreted in terms of hierarchical relationships. We adopt this interpretation in our research: we regard the polynomial function and the exponential function as the most important concepts to be mastered in teaching precalculus, and the rational function as the basic concept to start with. The logarithmic function can be presented as the inverse of the exponential function, and the trigonometric function can be presented as subdivided into the sine, cosine, tangent, and cotangent functions.
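One simple way to record such a conceptual hierarchy is a mapping from each concept to its narrower concepts (hyponyms), with non-hierarchical links such as inverse-of kept in a separate table. The names HIERARCHY, INVERSE_OF and hyponyms are hypothetical, and the order of the children (rational function first) only mirrors the teaching order suggested above.

```python
# Hypothetical encoding of the hierarchy described in the text:
# each key is a concept, each value lists its narrower concepts (hyponyms).
HIERARCHY = {
    "function": [
        "rational function",
        "polynomial function",
        "exponential function",
        "logarithmic function",
        "trigonometric function",
    ],
    "trigonometric function": [
        "sine function",
        "cosine function",
        "tangent function",
        "cotangent function",
    ],
}

# Non-hierarchical relation from the text: logarithmic is the inverse of exponential.
INVERSE_OF = {"logarithmic function": "exponential function"}

def hyponyms(concept):
    """Return all concepts below `concept`, in depth-first order."""
    result = []
    for child in HIERARCHY.get(concept, []):
        result.append(child)
        result.extend(hyponyms(child))
    return result

if __name__ == "__main__":
    print(hyponyms("function"))
```

Keeping the inverse-of link outside the tree avoids mixing the vertical (hierarchical) relations with other relations between sibling concepts.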



5 Conclusion and future work

In our research, we have used three related web-based electronic corpora consisting of open-source mathematical texts about precalculus. The final results confirm that the statistically based tools incorporated in SE can be used to search for keywords that define the basic mathematical concepts of teaching precalculus, and to refine their conceptual hierarchy by searching for collocations and co-occurrences. The comparison with the BNC shows a very low frequency of the basic mathematical concepts for teaching precalculus. Surprisingly, we did not find the term precalculus there at all.

We plan to continue our research by extending it towards building a thesaurus-like conceptual hierarchy and by comparing the results for other languages.

References:

[1] Church, K. and Hanks, P. (1990). Word Association Norms, Mutual Information and Lexicography. Computational Linguistics 16(1), 22-29.

[2] Kilgarriff, A. and Rundell, M. (2002). Lexical Profiling Software and its Lexicographic Applications: a Case Study. In Proceedings from EURALEX 2002. Copenhagen. 807-811.

[3] Kilgarriff, A., Rychly, P., Smrz, P., and Tugwell, D. (2004). The Sketch Engine. In Proceedings from EURALEX 2004. Lorient, France, 105–116.

[4] Kilgarriff, A., Reddy, S., Pomikalek, J. and Avinesh, P. (2010). A Corpus Factory for Many Languages. In N. Calzolari ed. Proceedings of the LREC 2010. Malta. 904-910.

[5] Lin, Dekang. (1998). Automatic Retrieval and Clustering of Similar Words. In Proceedings of the COLING-ACL. Montreal. 768-774.

[6] Mitkova, M. (2009). Achievement in Basic Math Courses Reflected Through the Gender Difference. In Proceedings of the 4th International Conference on Research and Education in Mathematics (ICREM4), 76-81.

[7] Oakes, M. (1998). Statistics for Corpus Linguistics. Edinburgh University Press.

[8] Precalculus. (2011). http://en.wikipedia.org/wiki/Precalculus

[9] Precalculus Tutorial. (2011). http://jwbales.home.mindspring.com/precal/

[10] Sparck Jones, K. (1986). Synonymy and Semantic Classification. Edinburgh University Press.

[11] Stoykova, V. (2002). Bulgarian noun – definite article in DATR. In D. Scott, ed. Artificial Intelligence: Methodology, Systems, and Applications. Lecture Notes in Artificial Intelligence 2443, Springer-Verlag, 152–161.

[12] Stoykova, V. (2009). Language-dependent or Language-independent e-Learning Mathematics. In Proceedings, Part II, Tempus JEP 41110-2006, TEMIT, 71–83.

[13] Stoykova, V. (2010). Representing Lexical Knowledge for Bulgarian Inflectional Morphology in DATR. In N. Mastorakis, V. Mladenov, Z. Bojkovic, and D. Simian, eds. Latest Trends on Computers, vol. 2, 612–616.

[14] Topics in Precalculus. (2011). http://www.themathpage.com

