Transcript

CHAT and CLAN

Fraibet Aveledo

ESRC Centre for Research on Bilingualism in Theory and Practice

Page 2

The corpora (brief summary about the importance of corpora)

– The computerization of the data.

– Spontaneous speech that represents a community

– The size of the corpus

– Homogeneity

– Transcriptions and notations

– Analysis of the data

Page 3

CHILDES and Talkbank

The CHILDES Project: Child Language Data Exchange System

The goal of TalkBank is to foster fundamental research in the study of human and animal communication.

– It will construct sample databases within each of the subfields studying communication.

– It will use these databases to advance the development of standards and tools for creating, sharing, searching, and commenting upon primary materials via networked computers.

Page 4

CHAT Codes for the Human Analysis of Transcripts

• Standardized format for computerized transcripts of face-to-face conversational interactions.

• CHAT allows you

– to transcribe basic conversations

– to code more specialized information, which makes it possible

» to analyze syntactic, phonological, and morphological phenomena.

Page 5

CHAT Codes for the Human Analysis of Transcripts

When transcribing

• Be careful not to transcribe spoken language as written language.

• Some issues have to be discussed, depending on the characteristics of the corpus.

• There is a tendency to use punctuation as in written language.

Page 6

Transcription in CHAT

• Transcription is done in the CLAN programme.

• The sound can be accessed on the same page where the transcription is taking place.

• The CHAT format has three main components:

– Headers

– Main tiers

– Dependent tiers
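As a rough illustration of the three components, the sketch below sorts the lines of a transcript by their first character. It assumes the conventions shown elsewhere in these slides (headers start with @, main tiers with *, dependent tiers with %). The function name classify_line is made up for this example and is not part of CLAN.

```python
def classify_line(line):
    """Guess which CHAT component a line belongs to from its first character."""
    if line.startswith("@"):
        return "header"
    if line.startswith("*"):
        return "main tier"
    if line.startswith("%"):
        return "dependent tier"
    # Anything else (e.g. a continuation line) is left unclassified here.
    return "other"


sample = [
    "@Begin",
    "*MAM:\tya te los traigo [c].",
    "%com:\tthe child does not master the liquids.",
    "@End",
]
for line in sample:
    print(classify_line(line), "<-", repr(line))
```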

Page 7

Page 8

Headers

– Component for recording information about the transcription: participants, date of recording, date of transcription, ages, etc.

– There are hidden, initial, constant and changeable headers.

– Hidden: @Font: and @UTF8

» They do not appear in CLAN but are necessary for running the programme.

– Headers must start with the symbol @

– Then comes the name of the header, followed by “:” and a tab.

Page 9

@Date: 25-JAN-1983

•IMPORTANT: headers never end with punctuation.

Between the “:” and the number 2 there is a TAB.
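The header layout described above (the @ symbol, the header name, a colon, then a TAB) can be sketched in a few lines of Python. This is only an illustration under that assumption. The helper names format_header and parse_header are hypothetical, and the pattern does not check for trailing punctuation.

```python
import re

# Assumed layout: "@", a header name, then optionally ":" + TAB + value.
HEADER_RE = re.compile(r"^@(?P<name>[^:\t]+)(?::\t(?P<value>.*))?$")


def format_header(name, value=None):
    """Build a header line; a header such as @Begin takes no colon or value."""
    if value is None:
        return f"@{name}"
    return f"@{name}:\t{value}"


def parse_header(line):
    """Return (name, value) for a line matching the assumed layout, else None."""
    match = HEADER_RE.match(line)
    if match is None:
        return None
    return match.group("name"), match.group("value")


line = format_header("Date", "25-JAN-1983")
print(repr(line))                          # the TAB shows up as \t
print(parse_header(line))
print(parse_header("@Date: 25-JAN-1983"))  # a space instead of a TAB is rejected
```

Printing the line with repr makes the TAB visible, which is useful because a space and a TAB look the same on screen.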

Page 10

There are three obligatory initial headers. Without them, CLAN does not work.

- @Begin

- @Language:

- @Participants:

- @Options:

- @ID: (STATFREQ and OUTPUT TO EXCEL)

- @Media:

- @End

@Begin is placed at the beginning of the transcription.

This header is not followed by a colon.

@Language: tells the programme which language is used in the dialogues.

The CHAT manual has a table with the abbreviation for each language.

Page 11

@Participants: has to be placed on the second line of the transcription.

The ID, the names, and roles are placed here.

@Participants: SAR Sue Target_Child, CAR Carol Mother

Participants are identified by three letters, usually taken from a pseudonym. These letters must be capitals. When transcribing children's conversations, the role of each participant is also written.
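A minimal sketch of how the @Participants: line above could be read by a script. It assumes entries are separated by commas and that each entry has the form code, name, role, as in the SAR/CAR example. The function parse_participants is hypothetical, and the three-capital-letters check simply mirrors the rule on this slide.

```python
def parse_participants(line):
    """Split a @Participants: header into code, name and role for each speaker."""
    prefix = "@Participants:\t"
    if not line.startswith(prefix):
        raise ValueError("not a @Participants header")
    participants = []
    for entry in line[len(prefix):].split(","):
        fields = entry.split()
        if not fields:
            continue
        code = fields[0]
        # Rule from the slide: three letters, all capitals.
        if not (len(code) == 3 and code.isalpha() and code.isupper()):
            raise ValueError(f"bad participant code: {code!r}")
        role = fields[-1] if len(fields) > 1 else None
        name = " ".join(fields[1:-1]) or None
        participants.append({"code": code, "name": name, "role": role})
    return participants


header = "@Participants:\tSAR Sue Target_Child, CAR Carol Mother"
for person in parse_participants(header):
    print(person)
```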

Page 12

@Options: Not obligatory.

@ID: Not obligatory.

Page 13

@Media:

Page 14

There is another set of headers that are optional. They offer important information about the participants:

- In a case where the child, Julio, is coded as JUL:

@Birth of JUL:

@Birth place of JUL:

@L1 of JUL:

Participant-specific headers

Page 15

@Exceptions:

@Interaction Type:

@Location:

@Number

Constant headers are optional.

Page 16

@Recording Quality

@Room Layout

@Time Duration

Page 17

Other headers

@Time Start:

@Transcriber:

@Transcription:

@Warning:

Page 18

Changeable headers

They can appear anywhere in the transcription.

– @Activities:

– @Bck: background material

– @Bg and @Bg: for GEM

– @Comment:

– @Date: date of the interaction

– @Eg and @Eg: for GEM

– @New episode

– @New Language

– @Page: only written text

– @Situation:

– @Tape location

Page 19

Main tiers

Main tiers contain the utterances produced by speakers. Each tier must start with an asterisk (*), followed by the speaker's three-letter code, a colon, and a tab:

*JUL: mamá, quiero agua [c] y quiero chocolate [c]!

*MAM: ya te los traigo [c].

Transcribers decide what each tier should contain.

Each tier must end with one of the following: . ! ?

•Utterances begin with lowercase letters; the exceptions are the first-person pronoun « I » and proper names.
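The main-tier rules on this slide can be turned into a rough checker. The sketch assumes a main tier looks like *XXX: followed by a TAB and the utterance, and that the utterance must end in . ! or ? as stated above. The function check_main_tier is made up for illustration, it is not how CLAN's own checking works, and it does not test the lowercase rule because proper names cannot be recognised automatically.

```python
import re

# Assumed shape: "*", three capital letters, ":", a TAB, then the utterance.
MAIN_TIER_RE = re.compile(r"^\*(?P<speaker>[A-Z]{3}):\t(?P<utterance>.+)$")
TERMINATORS = (".", "!", "?")


def check_main_tier(line):
    """Return a list of problems found in one main-tier line (empty if none)."""
    match = MAIN_TIER_RE.match(line)
    if match is None:
        return ["line does not start with *XXX: followed by a TAB"]
    problems = []
    utterance = match.group("utterance").rstrip()
    if not utterance.endswith(TERMINATORS):
        problems.append("utterance does not end in . ! or ?")
    return problems


print(check_main_tier("*MAM:\tya te los traigo [c]."))
print(check_main_tier("*MAM:\tya te los traigo [c]"))
print(check_main_tier("MAM: ya te los traigo [c]."))
```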

Page 20

Transcription markers

In the main tiers of our transcriptions, we mark the language of each word:

– *KAY: but@2 it´s@2 not@2 so@2 loud@2 (be)cause@2 the@2 range@2 is@2 all@2 the@2 way@2 over@2 there@2 .

– Language markers: @2 = English, @3 = Spanish, @0 = Undetermined, @23 = word with first morpheme(s) English, second morpheme(s) Spanish, @32 = word with first morpheme(s) Spanish, second morpheme(s) English, @02 = word with first morpheme(s) undetermined, second morpheme(s) English.

– There is constant discussion about cases in which it is difficult to determine which language a word belongs to.
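Once every word carries a language marker, the words per language can be counted with a short script. The sketch below assumes that words are separated by spaces and that the marker is whatever follows the last @ in a word. The labels come from the list above, while the function count_languages and the sample utterance are invented for illustration.

```python
from collections import Counter

# Marker meanings as listed on the slide.
LANGUAGE_LABELS = {
    "2": "English",
    "3": "Spanish",
    "0": "Undetermined",
    "23": "English + Spanish",
    "32": "Spanish + English",
    "02": "Undetermined + English",
}


def count_languages(utterance):
    """Count the words of an utterance by their @ language marker."""
    counts = Counter()
    for token in utterance.split():
        _word, separator, marker = token.rpartition("@")
        # Tokens without a known marker (e.g. the final ".") are skipped.
        if separator and marker in LANGUAGE_LABELS:
            counts[LANGUAGE_LABELS[marker]] += 1
    return counts


# Invented example, not taken from the corpus.
print(count_languages("quiero@3 the@2 ball@2 ."))
```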

Page 21

Transcription markers

Trailing off: +...

– *TOD: I think that I +...

Interruption: +/.

– *TOD: it’s your +/.

– *LEO: do you have a lion ?

Lazy overlap: +<

– *TOD: it’s your +/.

– *LEO: +<do you have a lion ?

Self-interruption: +//.

– *TOD: I don’t think +//.

– *TOD: let’s play Go Fish.

Self-completion: +,

– *TOD: I don’t think that I +...

– *SUS: what ?

– *TOD: +, that I know how to play .

Page 22

Other symbols

Repetition: [/]

*TOD: what [/] what did you say ?

If the repetition applies to more than one word, use angle brackets < >

Repetition with self-repair: [//]

*TOD: <what do> [//] what did you say ?

Retracing with reformulation: [///]

*TOD: what did [///] when are you coming ?
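The three retracing symbols differ only in the number of slashes, so a script can tell them apart easily. The sketch below counts each type in an utterance. It assumes the symbols appear exactly as [/], [//] and [///], and the function count_retracings is hypothetical.

```python
import re
from collections import Counter

# Labels follow the names used on the slide.
RETRACE_LABELS = {
    "[/]": "repetition",
    "[//]": "repetition with self-repair",
    "[///]": "retracing with reformulation",
}
RETRACE_RE = re.compile(r"\[/{1,3}\]")


def count_retracings(utterance):
    """Count the retracing symbols in one utterance, by type."""
    return Counter(RETRACE_LABELS[symbol] for symbol in RETRACE_RE.findall(utterance))


for text in (
    "what [/] what did you say ?",
    "<what do> [//] what did you say ?",
    "what did [///] when are you coming ?",
):
    print(dict(count_retracings(text)), "<-", text)
```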

Page 23

Other symbols

Quotations

– *TOD: he said +"/.

– *TOD: +" do you have a lion ?

Pauses:

– #

– ## long

– ### very long

Not understood, or transcriber’s best guess: [?]

*SIM: pairs [?] I want to play Candyland .

Page 24

Page 25

Page 26

Page 27

Simple events

Page 28

Comments on the transcription, and coding, should be placed in the dependent tiers.

*JUL: mamá, quie(r)o XXX [c] y quie(r)o choco(l)ate [c]!

%com: the child does not master the liquids.

*MAM: ya te los traigo [c].

Dependent tiers
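Because a dependent tier belongs to the main tier just before it, a script can attach each % line to the preceding * line. The following is a minimal sketch under that assumption. The function group_tiers and the dictionary layout are made up for this example, and the sample lines are the ones shown above.

```python
def group_tiers(lines):
    """Attach each dependent tier (%...) to the main tier (*...) before it."""
    utterances = []
    for line in lines:
        if line.startswith("*"):
            speaker, _, text = line[1:].partition(":")
            utterances.append(
                {"speaker": speaker, "text": text.strip(), "dependent": {}}
            )
        elif line.startswith("%") and utterances:
            name, _, text = line[1:].partition(":")
            utterances[-1]["dependent"][name] = text.strip()
    return utterances


sample = [
    "*JUL:\tmamá, quie(r)o XXX [c] y quie(r)o choco(l)ate [c]!",
    "%com:\tthe child does not master the liquids.",
    "*MAM:\tya te los traigo [c].",
]
for utterance in group_tiers(sample):
    print(utterance["speaker"], utterance["dependent"])
```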

Page 29

Page 30

Transcription process

Before starting the transcription, the header tiers must be ready.

Transcription is done in CLAN.

Sound mode: the sound file can be accessed in the same file where the transcription is taking place.

– Sound playing from the waveform

– Waveform demarcation

– Linking the transcription to the sound

• Bullet system: allows you to save in the transcription each bit of conversation transcribed in each tier (e.g. SASTRE 9)

– Changing the waveform window: +H, -H (time displayed in the window); +V, -V (wave amplitude).

– Channels R and L.

Page 31

OPTIONS ◄

Page 32

Page 33

CLAN Programmes

CLAN: Computerized Language Analysis

Instructions:

– Open CLAN

– Open Commands

– Set the Working and Lib directories

Page 34