Transcript
Page 1: XWindows apps:  emacs, xkwic

1

XWindows apps: emacs, xkwic

LING 5200Computational Corpus LinguisticsMartha PalmerFebruary 9, 2006

Page 2: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

52002

Emacs

emacs –nw

Control x, control c – exit (C-x,C-c) Control x, control s – save (C-x, C-s) Control x, control v – visit (C-x, C-v) Appropos

Page 3: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

52003

Emacs – Hour 12 in book

emacs –nw

Control x, b – switch to a new buffer, Control x, Control b – show all buffers, Control x, 1 – just show one window, Control g – ignore the last command, Control h – help (works on verbs?)

Page 4: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

52004

Preparing to run xkwic: modifying your .cshrc

Don't forget to make a back-up copy of your .cshrc file before editing it.

Page 5: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

52005

Preparing to run xkwic: modifying your .cshrc

ls –a Create an alias for cp to make it

prompt you before blowing away a file

alias cp 'cp –i'

Page 6: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

52006

Echo command

Don't forget to make a back-up copy of your .cshrc file before editing it.

You can check the value of an environment variable by using the echo command. Try it now:

Enter echo $TGREP_CORPUS. What do you see? You shouldn't see anything, because you haven't defined the TGREP_CORPUS variable. If you do see something, ask for help.

Page 7: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

52007

Add to .cshrc – See Lab 4

# xkwic stuff setenv CWBHOME /corpora2/imscorpus setenv CORPUS_REGISTRY $CWBHOME/registry setenv MANPATH $CWBHOME/man:$MANPATH setenv UIDPATH "/usr/local/ims-cwb/lib/ X11/uid/

%N/%U"

# tgrep stuff #setenv TGREP_CORPUS /corpora/treebank2/tbl_075/tgrepabl/brwn_cmb.crp setenv TGREP_CORPUS /corpora/treebank2/tgrepabl/wsj_mrg.crp

Page 8: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

52008

The PATH variable

One very important environment variable is the PATH variable. You can view the current value of your path variable by typing echo $PATH. As you can see, you already have a value defined. We're going to change it.

Open your .cshrc file with a text editor (emacs .cshrc or pico -w .cshrc. Find a line that looks something like this:

Page 9: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

52009

The PATH variable (cont.)

set path=($HOME/bin /usr/local/bin /usr/local/etc /usr/local/lang/bin /usr/ucb /bin /usr/bin /usr/sbin /usr/local/ssh/bin /usr/local/TeX/bin /usr/local/mh/bin /usr/local/elm/bin /usr/local/metamail/bin /usr/local/gnu/bin /usr/ucb /usr/openwin/bin /usr/local/X11/bin /usr/ccs/bin /etc . )

Page 10: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520010

Adding PATH’s

Now you'll define some new environment variables in your .cshrc file. There are two ways to do it. One would be to copy the following lines into your .cshrc file, either by hand or by copying and pasting off of this web page.

The other would be by tailing my .cshrc (/home/mpalmer/.cshrc), and appending the output to your .cshrc (hint: >>). Don't forget to make a back-up copy of it first, and don't forget to source .cshrc afterwards!

Page 11: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520011

A PATH for xkwic

Now enter the string /usr/local/ims-cwb/bin before the period that precedes the closing parenthesis, so that it looks something like this:

set path=($HOME/bin /usr/local/bin /usr/local/etc /usr/local/lang/bin /usr/ucb /bin /usr/bin /usr/sbin /usr/local/ssh/bin /usr/local/TeX/bin /usr/local/mh/bin /usr/local/elm/bin /usr/local/metamail/bin /usr/local/gnu/bin /usr/ucb /usr/openwin/bin /usr/local/X11/bin /usr/ccs/bin /etc /usr/local/ims-cwb/bin . )

Page 12: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520012

Running xkwic

Save your file, source it, and check the value of your path variable again. You should see /usr/local/ims-cwb/bin in it now (in addition to the rest of the stuff that was there before).

You're now ready to run xkwic! Start it by entering xkwic at the command line.

Page 13: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520013

Fire it up

To start xkwic: $babel> xkwic & First step: select a corpus

Page 14: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520014

Select a corpus

Page 15: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520015

Select a corpus

BNC is lemmatized… …Brown and WSJ aren't

Page 16: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520016

Select a corpus and a search pattern1. Select the BNC corpus by clicking on the

question-mark next to the Search Space text field.

2. Search for the word research with the query [word = "research"]. How many results do you get?

Page 17: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520017

Word attribute

Page 18: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520018

Output of a search: KWIC

Page 19: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520019

Select a corpus and a search pattern1. Select the BNC corpus by clicking on the

question-mark next to the Search Space text field.

2. Search for the word research with the query [word = "research"]. How many results do you get?

3. Search for the lemma research with the query [lemma = "research"]. How many results do you get? Why the difference?

Page 20: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520020

Lemma attribute outputInflected forms

Case differences

Page 21: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520021

Regular expressions in attributes of a position

Page 22: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520022

Searching with POS tags

1. Search for tokens of research that are not verbs with the query [lemma = "research" & pos != "V.*"]. How many results do you get?

2. Modify the display so that you can see the POS of all words: File -> Display Attributes -> Concordance -> Positional Attributes; highlight "word" and "pos", click "update" and "Dismiss". What are two non-verb POS tags that research occurs with?

Page 23: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520023

POS attribute

Page 24: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520024

Page 25: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520025

I am SOOO frustrated…

Page 26: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520026

Basic unit of xkwic:the position Attributes of a position:

Word POS Lemma (BNC)

Searching for "positions" by attribute…

Page 27: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520027

Multiple attributes of a position

[word = "research" & pos = "NN1"]

Page 28: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520028

Multiple attributes of a position

[word = "research" & pos = "NN1"]

Ampersand to connect the two attributes

Page 29: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520029

Multiple attributes of a position

[word = "research" & pos = "NN1"]

Single pair of square brackets around all attributes of the single position

Page 30: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520030

Negation

[word = "research" & pos != "NN1"]

= means "is" or "does match"

!= means "isn't" or "doesn't match"

Page 31: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520031

Regular expressions in attributes of a position

Wildcard: . Character classes: [word = "[Tt]he"] Grouping Alternation: | Quantifiers: Kleene star, Kleene plus

Page 32: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520032

Sequences of positions

[lemma = "research"] [word = "the"]

Each position gets its own set of square brackets

Page 33: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520033

Sequences of positions

[lemma = "research"] [word = "the"]

A space between the positions

Page 34: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520034

Regular expressions over positions

Wildcard: [] Any single position

Quantifier: * [lemma = "research"] []* [word = "funding"]

Page 35: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520035

Resources – Laura is bugging me to make a CU Corpora page… Like this

http://www.stanford.edu/dept/linguistics/corpora/cas-home.html

TGREP http://www.stanford.edu/dept/linguistics/corpora/cas-tut-tgrep.html

Page 36: XWindows apps:  emacs, xkwic

LING 5200, 2006 BASED on Kevin Cohen’s LING

520036

Xkwic resources

CQP home page: http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/

CQP User's Manual: http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQPUserManual/HTML/ (html version)


Recommended