CHAT and CLAN Fraibet Aveledo ESRC Centre for Research on Bilingualism in Theory and Practice


CHAT and CLAN

Fraibet Aveledo

ESRC Centre for Research on Bilingualism in Theory and Practice

Page 2

The corpora (brief summary about the importance of corpora)

– The computerization of the data.

– Spontaneous speech that represents a community

– The size of the corpus

– Homogeneity

– Transcriptions and notations

– Analysis of the data

Page 3

CHILDES and Talkbank

The CHILDES Project: Child Language Data Exchange System

The goal of TalkBank is to foster fundamental research in the study of human and animal communication.

– It will construct sample databases within each of the subfields studying communication.

– It will use these databases to advance the development of standards and tools for creating, sharing, searching, and commenting upon primary materials via networked computers.

Page 4

CHAT Codes for the Human Analysis of Transcripts

• Standardized format for computerized transcripts of face-to-face conversational interactions.

• CHAT allows you

– to transcribe basic conversations

– to code more specialized information, which makes it possible

» to analyze syntactic, phonological, and morphological phenomena.

Page 5

CHAT Codes for the Human Analysis of Transcripts

When transcribing

• Be careful not to transcribe spoken language as written language.

• Some issues have to be discussed, depending on the characteristics of the corpus.

• There is a tendency to use punctuation as in written language.

Page 6

Transcription in CHAT

• Transcription is done in the CLAN programme.

• The sound file can be accessed from the same file in which the transcription is taking place.

• The CHAT format has three main components:

– Headers

– Main tiers

– Dependent tiers
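The three components can be told apart by the first character of each line. The sketch below is only an illustration, not part of CLAN: the function name `classify_line` and the "continuation" label are assumptions made for this example, and the `%` prefix for dependent tiers is taken from the `%com:` example later in these slides.

```python
def classify_line(line):
    """Return which CHAT component a transcript line belongs to."""
    if line.startswith("@"):
        return "header"
    if line.startswith("*"):
        return "main tier"
    if line.startswith("%"):
        return "dependent tier"
    # Assumption: anything else is treated as the continuation of the previous line.
    return "continuation"


sample = [
    "@Begin",
    "*JUL:\tmamá, quiero agua [c] y quiero chocolate [c]!",
    "%com:\tthe child does not master the liquids.",
    "@End",
]
kinds = [classify_line(line) for line in sample]
for line, kind in zip(sample, kinds):
    print(kind, "|", line)
```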

Page 7

Page 8

Headers

– Headers hold information about the transcription: the participants, the date of recording, the date of transcription, ages, etc.

– There are hidden, initial, constant and changeable headers.

– Hidden: @Font: and @UTF8

» They do not appear in CLAN but are necessary for running the programme.

– Headers must start with the symbol @

– Then comes the name of the header, followed by a colon “:” and a tab

Page 9

@Date: 25-JAN-1983

• IMPORTANT: headers never end with punctuation.

Between the “:” and the number 2 there is a TAB.
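The header rules (the @ symbol, the header name, a colon, a tab, and no final punctuation) can be sketched as a small helper. This is a minimal illustration rather than anything CLAN provides; the function name `format_header` and the set of characters treated as punctuation are assumptions.

```python
def format_header(name, value=""):
    """Build a header line: '@', the header name, then ':' and a tab before the value."""
    if not value:
        # Headers such as @Begin carry no value and are not followed by a colon.
        return "@" + name
    if value[-1] in ".,;:!?":
        # Assumption: these characters count as the punctuation a header may not end in.
        raise ValueError("headers never end with punctuation")
    return "@" + name + ":\t" + value


date_line = format_header("Date", "25-JAN-1983")
begin_line = format_header("Begin")
print(repr(date_line))
print(repr(begin_line))
```

Printing with `repr` makes the tab visible as `\t`, which is useful because a tab and a run of spaces look the same on screen.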

Page 10

There are three initial headers, and they are obligatory. Without them, CLAN does not work.

- @Begin

- @Language:

- @Participants:

- @Options:

- @ID: (STATFREQ and OUTPUT TO EXCEL)

- @Media:

- @End

@Begin is placed at the beginning of the transcription.

This header is not followed by a colon.

@Language: tells the programme which language is used in the dialogue.

The CHAT manual has a table with the abbreviation for each language.

Page 11

@Participants: must be placed on the second line of the transcription.

The ID, the names, and roles are placed here.

@Participants: SAR Sue Target_Child, CAR Carol Mother

Participants are identified by a three-letter code, usually based on a pseudonym. The code must be written in capital letters. When transcribing conversations with children, the role of each participant is also given.
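As a rough illustration of how the @Participants line is organised, the sketch below splits it into code, name and role. The function name `parse_participants` and the returned dictionary keys are assumptions for this example; it also assumes that the first field of each entry is the code and the last is the role.

```python
def parse_participants(header):
    """Split an @Participants header into one dict per participant."""
    _, _, body = header.partition(":")
    participants = []
    for entry in body.split(","):
        fields = entry.split()
        if len(fields) < 2:
            continue  # skip empty or incomplete entries
        code, role = fields[0], fields[-1]
        if len(code) != 3 or not code.isupper():
            raise ValueError("participant code must be three capital letters: " + code)
        # Anything between the code and the role is taken to be the name.
        name = " ".join(fields[1:-1])
        participants.append({"code": code, "name": name, "role": role})
    return participants


people = parse_participants("@Participants:\tSAR Sue Target_Child, CAR Carol Mother")
for person in people:
    print(person["code"], person["name"], person["role"])
```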

Page 12

@Options: not obligatory

@ID: not obligatory

Page 13

@Media:

Page 14

Participant-specific headers

There is another set of headers that are optional. They give important information about the participants:

- For example, where the child Julio has the code JUL:

@Birth of JUL:

@Birth place of JUL:

@L1 of JUL:

Page 15

Constant headers (optional)

@Exceptions:

@Interaction type:

@Location:

@Number:

Page 16

@Recording Quality:

@Room Layout:

@Time Duration:

Page 17

Other headers

@Time Start:

@Transcriber:

@Transcription:

@Warning:

Page 18

Changeable headers

They can go in any part of the transcription

– @Activities:

– @Bck: background material

– @Bg and @Bg: for GEM

– @Comment:

– @Date: date of the interaction

– @Eg and @Eg: for GEM

– @New episode

– @New Language

– @Page: only for written texts

– @Situation:

– @Tape location

Page 19

Main tiers

Main tiers contain the utterances produced by the speakers. Each tier must start with an asterisk, the speaker's three-letter code, and a colon:

*JUL: mamá, quiero agua [c] y quiero chocolate [c]!

*MAM: ya te los traigo [c].

Transcribers decide what each tier should contain.

Each tier must end with one of: . ! ?

• Utterances begin with lowercase letters; the exceptions are the first-person pronoun « I » and proper names.
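The two formal requirements on a main tier (how it starts and how it ends) can be checked mechanically. The following is a minimal sketch, not CLAN's own checker: the name `check_main_tier` and the messages are assumptions, and it accepts any whitespace after the colon, whereas CLAN files normally use a tab there.

```python
import re

# '*', three capital letters, ':', whitespace, then the utterance itself.
MAIN_TIER = re.compile(r"^\*([A-Z]{3}):\s+(.+)$")
TERMINATORS = (".", "!", "?")


def check_main_tier(line):
    """Return None if the line looks like a valid main tier, else a short message."""
    match = MAIN_TIER.match(line)
    if match is None:
        return "must start with *, a three-letter speaker code and a colon"
    if not match.group(2).rstrip().endswith(TERMINATORS):
        return "must end with . ! or ?"
    return None


for line in ["*MAM:\tya te los traigo [c].", "*JUL:\tmamá, quiero agua", "JUL: hola ."]:
    print(repr(line), "->", check_main_tier(line) or "ok")
```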

Page 20

Transcription markers

In the main tiers, in our transcriptions we mark the language of the word:

– *KAY: but@2 it's@2 not@2 so@2 loud@2 (be)cause@2 the@2 range@2 is@2 all@2 the@2 way@2 over@2 there@2 .

– Language markers: @2 = English, @3 = Spanish, @0 = Undetermined, @23 = word with first morpheme(s) English, second morpheme(s) Spanish, @32 = word with first morpheme(s) Spanish, second morpheme(s) English, @02 = word with first morpheme(s) undetermined, second morpheme(s) English.

– There is constant discussion about cases in which it is difficult to determine which language a word belongs to.
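Because every word carries its language marker, the proportion of each language in an utterance can be counted directly. The sketch below is one possible way to do this in Python, not a CLAN command; the names `count_language_markers` and `describe_marker` are assumptions, and the reading of two-digit codes as "first part + second part" follows the list of markers above.

```python
import re
from collections import Counter

LANGUAGE_NAMES = {"0": "Undetermined", "2": "English", "3": "Spanish"}
# A word followed by '@' and one or more digits; the digits are captured.
MARKED_WORD = re.compile(r"\S+@(\d+)")


def count_language_markers(utterance):
    """Count how many words carry each language marker."""
    return Counter(MARKED_WORD.findall(utterance))


def describe_marker(code):
    """Spell out a marker, e.g. '23' becomes 'English + Spanish'."""
    return " + ".join(LANGUAGE_NAMES[digit] for digit in code)


utterance = ("but@2 it's@2 not@2 so@2 loud@2 (be)cause@2 the@2 range@2 "
             "is@2 all@2 the@2 way@2 over@2 there@2 .")
counts = count_language_markers(utterance)
for code, total in counts.items():
    print(describe_marker(code), total)
```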

Page 21

Transcription markers

Trailing off: +...

– *TOD: I think that I +...

Interruption: +/.

– *TOD: it’s your +/.

– *LEO: do you have a lion ?

Lazy overlap: +<

– *TOD: it’s your +/.

– *LEO: +< do you have a lion ?

Self-interruption: +//.

– *TOD: I don’t think +//.

– *TOD: let’s play Go Fish.

Self-completion: +,

– *TOD: I don’t think that I +...

– *SUS: what ?

– *TOD: +, that I know how to play .
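The markers on this slide appear either at the start or at the end of an utterance, so they can be recognised by position. The sketch below is illustrative only; the function name `describe_markers` and the two lookup tables are assumptions built from the five markers listed above.

```python
# Markers that close an utterance and markers that open one.
ENDINGS = {"+...": "trailing off", "+//.": "self-interruption", "+/.": "interruption"}
BEGINNINGS = {"+<": "lazy overlap", "+,": "self-completion"}


def describe_markers(utterance):
    """List the special markers found at the start and end of an utterance."""
    text = utterance.strip()
    found = []
    for marker, label in BEGINNINGS.items():
        if text.startswith(marker):
            found.append(label)
    for marker, label in ENDINGS.items():
        if text.endswith(marker):
            found.append(label)
            break
    return found


for text in ["I think that I +...", "it's your +/.", "+< do you have a lion ?",
             "I don't think +//.", "+, that I know how to play ."]:
    print(text, "->", describe_markers(text))
```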

Page 22

Other symbols

Repetition: [/]

*TOD: what [/] what did you say ?

If the repetition applies to more than one word, use angle brackets < >

Repetition with self-repair: [//]

*TOD: <what do> [//] what did you say ?

Retracing with reformulation: [///]

*TOD: what did [///] when are you coming ?
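The three retracing symbols differ only in the number of slashes, which makes them easy to count. The following sketch is an illustration in Python and does not reproduce any CLAN programme; the name `count_retracings` is an assumption.

```python
import re
from collections import Counter

RETRACE_KINDS = {
    "[/]": "repetition",
    "[//]": "repetition with self-repair",
    "[///]": "retracing with reformulation",
}
# One to three slashes inside square brackets.
RETRACE_MARKER = re.compile(r"\[/{1,3}\]")


def count_retracings(utterance):
    """Count each kind of retracing symbol in an utterance."""
    return Counter(RETRACE_KINDS[m] for m in RETRACE_MARKER.findall(utterance))


for text in ["what [/] what did you say ?",
             "<what do> [//] what did you say ?",
             "what did [///] when are you coming ?"]:
    print(text, "->", dict(count_retracings(text)))
```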

Page 23

Other symbols

Quotations

– *TOD: he said +"/.

– *TOD: +" do you have a lion ?

Pauses:

– #

– ## long

– ### very long

Not understood, or transcriber’s best guess: [?]

*SIM: pairs [?] I want to play Candyland .

Page 24

Page 25

Page 26

Page 27

Simple events

Page 28

Dependent tiers

Comments on the transcription, and coding, should go in the dependent tiers.

*JUL: mamá, quie(r)o XXX [c] y quie(r)o choco(l)ate [c]!

%com: the child does not master the liquids.

*MAM: ya te los traigo [c].
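A dependent tier always belongs to the main tier that precedes it. The sketch below shows one way to keep the two together when reading a transcript; it is an assumption-laden illustration (the function name `group_tiers` and the dictionary layout are invented for this example), not how CLAN stores the data.

```python
def group_tiers(lines):
    """Attach each dependent tier (%...) to the main tier (*...) it follows."""
    groups = []
    for line in lines:
        if line.startswith("*"):
            groups.append({"main": line, "dependent": []})
        elif line.startswith("%"):
            if not groups:
                raise ValueError("dependent tier without a preceding main tier")
            groups[-1]["dependent"].append(line)
    return groups


transcript = [
    "*JUL:\tmamá, quie(r)o XXX [c] y quie(r)o choco(l)ate [c]!",
    "%com:\tthe child does not master the liquids.",
    "*MAM:\tya te los traigo [c].",
]
groups = group_tiers(transcript)
for group in groups:
    print(group["main"], "| dependent tiers:", len(group["dependent"]))
```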

Page 29

Page 30

Transcription process

Before starting the transcription, the header tiers must be ready.

Transcription is done in CLAN.

Sound mode: the sound file can be accessed from the same file in which the transcription is taking place.

– Sound playing from the waveform

– Waveform demarcation

– Linking the transcription to the sound

• Bullet system: lets you store in the transcription the bit of conversation transcribed on each tier (e.g. SASTRE 9)

– Changing the waveform window: +H, -H (time displayed in the window); +V, -V (wave amplitude).

– Channels R and L.

Page 31

OPTIONS ◄

Page 32

Page 33

CLAN Programmes

CLAN: Computerized Language Analysis

Instructions:

– Open CLAN

– Open Commands

– Set the Working and Lib directories

Page 34