HASSAN MOAVIA 12080702-018 UNIVERSITY OF GUJRAT

Hassan presentation of corpus


TOPIC:

USE OF CORPUS TO INVESTIGATE AND DEVELOP LEXICAL KNOWLEDGE

CONTINUE…

• Use of corpus to investigate and develop lexical knowledge through phrase, phraseology, collocation, colligation, polysemy, word formation, lexical sets, etc.

WHAT IS LEXIS

• According to the Oxford Dictionary, lexis is the level of language consisting of vocabulary, as opposed to grammar or syntax, as in ‘the distinction between grammar and lexis’.

• According to the Collins English Dictionary, lexis is the totality of vocabulary items in a language, including all forms having lexical meaning and grammatical function. The term comes from the Greek lexis, ‘word’.

CONTINUE….

• Linguistics is the scientific study of language.

• According to Sinclair, ‘one does not study all botany by making artificial flowers’.

• Trudgill is even harsher in his criticism of those who do not base their descriptions on actual language use: ‘in the final analysis, if linguistics is not about language as it is actually spoken or written by human beings, then it is about nothing at all’.

WHAT IS CORPUS?

• A corpus is not just any collection of texts: it is a collection of naturally occurring language text chosen to characterize a state or variety of a language. In other words, a corpus is designed and compiled according to corpus design principles. Another feature fundamental to corpus linguistics is that a corpus is ‘machine readable’.

LIMITATIONS OF CORPUS

• Corpora can tell us whether something is frequent, or not, but they are not able to tell us if something is possible in a language.

• Corpora can only show us what they contain.

• Corpora can give us evidence but the user must then interpret this information.

• Corpora contain examples of language divorced from their original ‘visual and social context’.

CONTINUE….

• It is ‘a healthy, vibrant discipline’.

• The key to its success remains the same basic method.

• Large quantities of ‘raw’ text are processed directly in order to present the researcher with objective evidence.

CORPUS-DRIVEN AND CORPUS-BASED APPROACHES

• The corpus-based approach is ‘deductive’: reasoning works from the more general to the more specific, which makes it a ‘top-down’ approach. The researcher begins with a theory about a topic of interest and then narrows it down into more specific hypotheses that can be tested using a corpus.

• The corpus-driven approach is ‘inductive’: reasoning works from specific observations to broader generalizations and theories, which makes it a ‘bottom-up’ approach. The researcher begins with specific observations and measures in order to identify patterns and regularities.

PHRASE

• In everyday speech, a phrase may refer to any group of words. In linguistics, a phrase is a group of words that form a constituent and so function as a single unit in the syntax of a sentence.

• Examples

• in the big hall

• at the market day

TYPES OF PHRASE

Phrase                 Example               Hits
Adverb phrase          Too slowly            6
Adjective phrase       Very happy            66
Noun phrase            A group of students   7
Prepositional phrase   At lunch              12
Verb phrase            Watch TV              18

PHRASEOLOGY

• 1. The way in which words and phrases are used in speech or writing; style.

• 2. A set of expressions used by a particular person or group.

• In linguistics, phraseology is the study of set or fixed expressions, such as idioms and other types of multi-word lexical units.

EXAMPLES

Phrases             Hits
Because of          2404
Although            1743
In addition         690
On the other hand   683
As a result of      471
In spite of         308
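Hit counts like the ones above can be reproduced in outline with a few lines of code. The sketch below is only an illustration: the function name `count_phrase` and the three-sentence `sample_corpus` are made up for this example, and the concordance tool actually used for the slides may tokenise and match differently.

```python
import re

def count_phrase(text: str, phrase: str) -> int:
    """Count case-insensitive, whole-word occurrences of a phrase in text."""
    # Escape each word and allow any run of whitespace between words,
    # so the phrase is still found when it wraps across a line break.
    words = [re.escape(w) for w in phrase.split()]
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

# Hypothetical toy corpus, not the corpus behind the slide figures.
sample_corpus = (
    "Because of the rain, the match was delayed. In addition, the pitch "
    "was wet. On the other hand, the crowd stayed because of the final."
)

phrases = ["because of", "in addition", "on the other hand", "in spite of"]
hits = {p: count_phrase(sample_corpus, p) for p in phrases}

for phrase, n in hits.items():
    print(f"{phrase:20} {n}")
```

A phrase with zero hits, such as "in spite of" here, only shows that the sample does not contain it, which is the limitation noted earlier: corpora can only show us what they contain.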

FIXED EXPRESSION

• A fixed expression is a little like a secret code that allows access to a club that not everyone can enter. It’s a phrase that has a very specific meaning that can’t be expressed any other way and also can’t be deduced just by considering the sum of its parts.

EXAMPLES

Fixed expressions    Hits
At least             2212
For the first time   861
Of course            735
Just in case         6
To top it all off    2

IDIOM

• An idiom is a phrase where the words together have a meaning that is different from the dictionary definitions of the individual words, which can make idioms hard for ESL students and learners to understand.

EXAMPLES

Idioms             Hits
Take root          18
Bring home         7
Not for nothing    5
Leading edge       4
Brush with death   3

COMPOUND WORDS

• A compound word is a combination of two or more words which functions as a single word (Richards et al., 1985).

• Compound words are formed when two or more words are put together to form a new word with a new meaning.

• Three types of compound words

• Hyphenated compound words

• Examples

• One-half, well-being, fast-food, full-time etc.

• Open compound words

• Examples

• Ice cream, post office, middle class etc.

• Closed compound words

• Examples

• Bedroom, motorcycle, software, everything, football etc.

COLLOCATION AND COLLIGATION

COLLOCATION

• A collocation is an expression consisting of two or more words that corresponds to some conventional way of saying things.

• A collocation is a sequence of two or more consecutive words.

• The words together can mean more than the sum of their parts.

• For example:

red bull, dark age, The Times of India

Numbers   Words          Concordance hits
1         Red handed     20
2         Red rose       4
3         Red crescent   4
4         Red mosque     56
5         Red bull       9
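A table of collocates for a node word such as "red" can be built by counting the word that immediately follows it. The following is a minimal sketch under stated assumptions: `right_collocates` and the sample sentences are invented for illustration, and only the single word to the right of the node is considered, whereas concordancers usually allow a wider window.

```python
import re
from collections import Counter

def right_collocates(text: str, node: str) -> Counter:
    """Count the words that occur immediately after the node word."""
    # Very simple tokenisation: lower-case alphabetic strings only.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Pair every token with its successor and keep pairs starting with the node.
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == node)

# Hypothetical sample text, not the corpus used for the table above.
sample = (
    "He was caught red handed. The Red Crescent sent aid. "
    "She was caught red handed too, holding a red rose."
)

collocates = right_collocates(sample, "red")
for word, n in collocates.most_common():
    print(f"red {word:10} {n}")
```

Raw counts of this kind show which combinations are frequent; whether a pair is a true collocation still has to be judged by the user.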

COLLIGATION

• Colligation is a grammatical category. The idea was put forward by Hoey.

• It is the grammatical company that a word or phrase is associated with.

• For example:

he eats, they go

EXAMPLE OF “THEY GO”

COMPOUND WORDS IN CORPUS

Kind               Compound word     Type         Hits
Noun + Noun        bedroom           closed       41
                   football          closed       546
                   motorcycle        closed       304
                   water tank        open         16
Noun + Verb        rainfall          closed       71
                   haircut           closed       5
Noun + Adverb      passer-by         hyphenated   32
Verb + Noun        washing machine   open         7
                   swimming pool     open         26
Verb + Adverb      lookout           closed       10
                   take-off          hyphenated   16
                   drawback          closed       19
Adjective + Noun   greenhouse        closed       127
                   software          closed       457
Adjective + Verb   dry-cleaning      hyphenated   1
                   public speaking   open         9
Adverb + Verb      out put           open         616
                   over through      open         62
                   input             closed       102
Adverb + Noun      onlooker          closed       3
                   by-stand          hyphenated   0

COINAGE

• One of the least common processes of word formation in English is coinage, that is, the invention of totally new terms.

• Coinage is the act of creating a new word or phrase that other people begin to use.

• Examples:

blog, google, aspirin, nylon, zipper.

COINAGE WORDS IN CORPUS

Words            Coinage words   Hits
Web log          blog            44
Googol           google          144
robotics         robot           29
salicylic acid   aspirin         5
heroisch         heroin          95

BORROWING

• A borrowing is a word adopted from one language for use in another language.

• English has borrowed extensively from other languages, especially French, Latin and Greek, but also many others.

BORROWED WORDS IN CORPUS

Language   Borrowed word   Hits
French     murder          951
           torture         382
           unique          339
           trophy          709
           menu            52
Arabic     average         1005
           coffee          205
           sahib           304
           safari          39
           zero            344
Urdu       balti           5
           purdah          5
Hindi      samosa          48
           jungle          93
           basmati         135
           sari            20
           shampoo         9

LEXICAL SETS

• In general, a group of words that share a specific form or meaning is called a lexical set.

• A lexical set is a group of words with the same topic, function or form, e.g. cat, dog, tortoise and goldfish are part of a typical lexical set.

• Lexical sets are not always easily identified from corpus data.

TYPES OF LEXICAL SETS

Synonyms:

A synonym is a word having the same or nearly the same meaning as another word.

It can also be a word or an expression that serves as a figurative or symbolic substitute for another.

An example of synonyms is the pair begin and commence.

Synonyms can be any part of speech (such as verb, adverb, adjective, preposition) as long as both words belong to the same part of speech.

Examples:

Verb (buy and purchase)

Adjective (big and large)

Adverb (quickly and speedily)

Preposition (on and upon)

Numbers   Words      Hits
1         Buy        823
          Purchase   437
2         Big        2330
          Large      4582
3         Quickly    505
          Speedly    0
          Fastly     0
4         On         100350
          Upon       1684
5         Old        4085
          Ancient    226
          Aged       250

ANTONYMS

• An antonym is a word having a meaning opposite to that of another word, e.g. fast is an antonym of slow.

• Antonyms can be explored with corpus data, in particular to establish whether they share the same ranges of reference and phraseological patterns.

• Some examples of antonyms from the corpus are given below.

Numbers   Words     Frequency
1         Early     2531
          Late      2354
2         Empty     245
          Full      2261
3         Wife      995
          Husband   687
4         Dead      1555
          Alive     378
5         Day       10575
          Night     1998

POLYSEMY

Definition:

A polyseme is a word or phrase with multiple different but related senses.

English has many words which are polysemous.

Examples:

• Word: Man

• List of polysemes:

1. The human species (i.e., man vs. animal)

2. Males of the human species (i.e., man vs. woman)

3. Adult males of the human species (i.e., man vs. boy)

This example shows the specific polysemy where the same word is used at different levels of a taxonomy.

• Word: Bank

• List of polysemes:

1. A financial institution (3340)

2. The bank of a river (beach) (2767)

• Word: Book

• List of polysemes:

1. A bound collection of pages (1284)

2. A text reproduced and distributed (thus, someone who has read the same text on a computer has read the same book as someone who had the actual paper volume) (112)

• Word: Wood

• List of polysemes:

1. A piece of a tree (121)

2. A geographical area with many trees (10)

• Word: Crane

• List of polysemes:

1. A bird (2)

2. A type of construction equipment (6)

Word    Hits   Freq   Rank
Man     3684   3320   351
Bank    6007   3340   322
Wood    131    110    7626
Book    1396   1284   1029
Crane   8      6      37347
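The frequency and rank columns of a table like this come from a word frequency list. As a rough sketch only, the code below counts every word form in a text and assigns rank 1 to the most frequent; the function name `frequency_ranks`, the sample sentence, and the alphabetical tie-breaking are assumptions made for illustration, and the tool behind the slide figures may rank differently.

```python
import re
from collections import Counter

def frequency_ranks(text: str) -> dict[str, tuple[int, int]]:
    """Map each word form to a (frequency, rank) pair; rank 1 is most frequent."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    # Sort by descending frequency; ties are broken alphabetically (an assumption).
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {word: (freq, rank) for rank, (word, freq) in enumerate(ranked, start=1)}

# Hypothetical sample sentence, not the corpus behind the slide figures.
sample = "the bank of the river and the bank in the city lent the man a book"
table = frequency_ranks(sample)

for word in ("the", "bank", "book"):
    freq, rank = table[word]
    print(f"{word:6} freq={freq} rank={rank}")
```

A count of the form "bank" does not separate the financial sense from the river sense. Assigning hits to individual senses, as in the sense lists above, still requires reading the concordance lines.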

BLENDING

• Definition: A blend word, or blend, is a word formed from parts of two or more other words.

• Formation: Most blends are formed by one of the following methods.

1. The beginning of one word is joined to the end of the other word, e.g. brunch is a blend of breakfast and lunch.

• smoke + fog → smog

• spoon + fork → spork

• smart + sassy → smassy

2. The beginnings of two words are combined. For example, cyborg is a blend of cybernetic and organism.

3. Two words are blended around a common sequence of sounds. For example, the word motel is a blend of motor and hotel.

Word     Hits   Freq   Rank
Brunch   1      1      76218
Smog     10     9      31957
Motel    6      3      53737
Cyborg   0      0      0

DERIVATION

• Definition: Derivation is the process of forming a new word on the basis of an existing word, e.g. happiness and unhappy from happy.

• Derivational morphology often involves the addition of a derivational suffix or other affix.

• Examples of English derivational patterns and their suffixes:

• Adjective-to-noun: -ness (slow → slowness)

• Adjective-to-verb: -ise (modern → modernise)

• Adjective-to-adverb: -ly (personal → personally)

• Noun-to-verb: -fy (glory → glorify)

• Verb-to-adjective: -able (drink → drinkable)

• Verb-to-noun (abstract): -ance (deliver → deliverance)

• Verb-to-noun (agent): -er (write → writer)

Pattern                Affix   Word                    Concordance hits   Rank
Adj-to-verb            -ise    modern → modernise      41                 14029
Adj-to-adverb          -ly     personal → personally   261                4419
Verb-to-adj            -able   drink → drinkable       1                  81023
Verb-to-noun (agent)   -er     write → writer          842                1571
Noun-to-verb           -fy     glory → glorify         3                  51907
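Derived forms of a base word can be looked up in a corpus by attaching each suffix and checking which results actually occur. The sketch below is deliberately simple and rests on assumptions: the names `SUFFIXES` and `derived_forms` and the sample text are invented for illustration, and the suffix is attached by plain concatenation, so spelling changes such as write → writer or glory → glorify are not handled.

```python
import re
from collections import Counter

# Suffixes taken from the derivational patterns listed above.
SUFFIXES = ("ness", "ise", "ly", "fy", "able", "ance", "er")

def derived_forms(text: str, base: str) -> dict[str, int]:
    """Return base+suffix forms attested in the text, with their counts."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    # Plain concatenation only: forms needing a spelling change are missed.
    candidates = (base + suffix for suffix in SUFFIXES)
    return {form: counts[form] for form in candidates if counts[form] > 0}

# Hypothetical sample text, not the corpus behind the table above.
sample = (
    "Personally, I think the plan to modernise the plant is modern. "
    "The slowness of the work was a personal matter."
)

print(derived_forms(sample, "personal"))
print(derived_forms(sample, "modern"))
print(derived_forms(sample, "slow"))
```

A corpus-driven variant would work the other way round: list every word ending in a given suffix and then inspect which bases occur.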

ACRONYMS

1. Acronyms are new words formed from the initial letters of a set of other words. In some cases the pronunciation consists of saying each separate letter. More typically, acronyms are pronounced as new single words, as in NATO, NASA or UNESCO. These examples have kept their capital letters.

2. Many acronyms simply become everyday terms such as laser (“light amplification by stimulated emission of radiation”), radar (“radio detecting and ranging”) and zip (“zone improvement plan”) code.

3. Examples:

• PEMRA

• OGRA

• NATO

• NADRA

• First at 203 that is PPP

Numbers   Words   Hits
1         NATO    505
2         PEMRA   77
3         OGRA    65
4         NADRA   111

CLIPPING

• The element of reduction that is noticeable in blending is even more apparent in the process described as clipping. This occurs when a word of more than one syllable (facsimile) is reduced to a shorter form (fax).

• Other common examples are ad (advertisement), sub (submarine), auto (automobile), exam (examination), fan (fanatic), deli (delicatessen), memo (memorandum), ref (referee), champ (champion), bike (bicycle), burger (hamburger), grad (graduate), teen (teenager), math (mathematics), dorm (dormitory), copter (helicopter), phone (telephone), plane (airplane) and stats (statistics).

Numbers   Words                Hits
1         Ad (advertisement)   182
2         Fan (fanatic)        133
3         Math (mathematics)   23
4         Stat (statistics)    6

BACK FORMATION

• A very specialized type of reduction process is known as backformation. Typically, a word of one type (usually a noun) is reduced to form a word of another type (usually a verb). A good example of backformation is the process whereby the noun television first came into use and then the verb televise was created from it. Other examples of words created by this process are donate (from “donation”) and emote (from “emotion”).

Numbers   Words                               Frequency
1         Donate (verb) from donation (noun)  34
2         Emote (verb) from emotion (noun)    1
3         Work (verb) from worker (noun)      4936

CONVERSION

• A change in the function of a word, as for example when a noun comes to be used as a verb (without any reduction), is generally known as conversion.

• NOUN TO VERB

Numbers   Words               Frequency
1         Chair (as a verb)   142
2         Chair (as a noun)   166
3         Chair               308

• VERB TO NOUN

• Examples: guess, must, spy

Numbers   Words               Frequency
1         Guess (as a verb)   86
2         Guess (as a noun)   80
3         Guess               166
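In both tables the total is the sum of the verb and noun counts (142 + 166 = 308 for chair, 86 + 80 = 166 for guess). Splitting a word's hits by part of speech requires a tagged corpus. The sketch below assumes the text has already been tagged; the `tagged` list, its tag labels, and the function `pos_profile` are hypothetical and hand-made for illustration, not output from any particular tagger.

```python
from collections import Counter

def pos_profile(tagged_tokens: list[tuple[str, str]], word: str) -> Counter:
    """Count how often a word form occurs with each part-of-speech tag."""
    return Counter(tag for token, tag in tagged_tokens if token.lower() == word)

# Hypothetical hand-tagged sample: (token, part-of-speech tag) pairs.
tagged = [
    ("She", "PRON"), ("will", "AUX"), ("chair", "VERB"), ("the", "DET"),
    ("meeting", "NOUN"), ("from", "ADP"), ("a", "DET"), ("chair", "NOUN"),
    ("near", "ADP"), ("the", "DET"), ("door", "NOUN"), ("while", "SCONJ"),
    ("another", "DET"), ("chair", "NOUN"), ("stays", "VERB"), ("empty", "ADJ"),
]

profile = pos_profile(tagged, "chair")
total = sum(profile.values())

print(f"chair (as a verb) {profile['VERB']}")
print(f"chair (as a noun) {profile['NOUN']}")
print(f"chair             {total}")
```

The same profile can be computed for guess, must or spy once the corpus carries part-of-speech tags.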

• PHRASAL VERB TO NOUN

• Examples are as follows.

• To print out → a printout

• To take over → a takeover

Numbers   Words          Frequency
1         To take over   Mostly as a verb
2         A takeover     Sometimes as a noun
3         Take over      196

• VERB TO ADJECTIVE

• Examples are as follows:

• See through → see-through material

• Stand up → stand-up comedian

• ADJECTIVE TO VERB

• Dirty floor → to dirty

• Empty room → to empty