Gathering and Organizing System for PErsonal Language Skills (G.O.S.PE.L.S.)
Student: Enrico Zanardo
Supervisor: Prof. Vittore Casarosa
Free University of Bolzano-Bozen
8th October 2010


Goal

Provide appropriate documents to users based on their language skills in English, Italian and German as determined in accordance with guidelines provided by the European Language Portfolio.

[Figure: the three supported languages, DE, EN and IT]

Outline

● Problems;
● Proposed Solution;
● Prototype & Results;
● Conclusion;

Objective

[Figure: items labelled by language and ELP level: DE-A2, EN-B1, IT-C2, EN-C1, IT-A2, DE-B2, EN-B2, IT-B1]

Problems

1. Classify documents according to the “GOSPELS rating system” and map each rating to a level of the European Language Portfolio (A1, A2, ..., C1, C2).

2. Know the user's language skills in each of the three languages supported by the system (English, Italian and German).

3. Provide results in the three languages, matched to the user's skill level in each language.
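Problem 3 can be pictured as a filter over rated documents. The sketch below is only an illustration: the profile format, the document fields and the rule "show a document if its level is at or below the user's level in that language" are assumptions, not taken from the slides.

```python
# Hypothetical sketch: filter rated documents against a user's language profile.
# The ELP levels are ordered from easiest (A1) to hardest (C2).
LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def readable(doc, profile):
    """Return True if the user is assumed able to read the document.

    Assumption: a document is suitable when its level does not exceed
    the user's level in the document's language.
    """
    user_level = profile.get(doc["lang"])
    if user_level is None:  # language not in the user's profile
        return False
    return LEVELS.index(doc["level"]) <= LEVELS.index(user_level)


# Made-up profile and documents, for illustration only.
profile = {"EN": "B1", "IT": "C2", "DE": "A2"}
docs = [
    {"id": 1, "lang": "EN", "level": "C1"},
    {"id": 2, "lang": "IT", "level": "B1"},
    {"id": 3, "lang": "DE", "level": "A2"},
    {"id": 4, "lang": "DE", "level": "B2"},
]

results = [d["id"] for d in docs if readable(d, profile)]
```

With this made-up data the English C1 and German B2 documents are filtered out, while the Italian B1 and German A2 documents are kept.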

Solution to step 1 (Classify documents)

[Diagram labels: Docs; Algorithm; Frequency of most common words; Part of Speech of the word; Level of complexity of the document]
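The slides name the inputs of the classification step (word frequency and part of speech) but not the formula. As a rough sketch under that gap, the code below rates a document by the share of its content words that are missing from a list of common words. The formula, the tag names and the tiny word list are all assumptions for illustration; in the prototype the tagging is done by TreeTagger.

```python
# Hypothetical sketch of a complexity rating; NOT the actual GOSPELS formula.
# Assumed inputs: tokens already tagged with a part of speech, and a set of
# the most common words of the language (here a toy list).
COMMON_WORDS = {"the", "a", "is", "cat", "dog", "house", "eat", "big"}
CONTENT_TAGS = {"NOUN", "VERB", "ADJ", "ADV"}  # assumed tag names


def complexity_rating(tagged_tokens, common_words=COMMON_WORDS):
    """Percentage of content words that are not in the common-word list."""
    content = [w.lower() for w, tag in tagged_tokens if tag in CONTENT_TAGS]
    if not content:
        return 0.0
    unknown = sum(1 for w in content if w not in common_words)
    return 100.0 * unknown / len(content)


# Made-up tagged sentence: "The cat is ubiquitous"
doc = [("The", "DET"), ("cat", "NOUN"), ("is", "VERB"), ("ubiquitous", "ADJ")]
rating = complexity_rating(doc)  # one of three content words is uncommon
```

Restricting the count to content words is one possible use of the part-of-speech information: function words are frequent at every level and would otherwise dilute the rating.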

Solution to step 2 (user's language skills)

[Diagram labels: Docs; Template Documents; Algorithm; Frequency of most common words; Part of Speech of the word; Level of complexity of the document; Range of Language Levels; Match between Gospels Algorithm & ELP]
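One way to read the match between the Gospels rating and the ELP is as a set of rating ranges, one per level, calibrated on template documents of known level. The sketch below assumes that reading: boundaries are placed midway between the mean ratings of adjacent levels. The calibration rule and all numbers are hypothetical, not taken from the slides.

```python
# Hypothetical sketch: derive rating ranges from template documents and map
# a rating to an ELP level. Calibration rule and data are assumptions.
from bisect import bisect_right

ELP_LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]


def boundaries_from_templates(template_ratings):
    """Place each boundary midway between the mean ratings of adjacent levels."""
    means = [
        sum(template_ratings[lvl]) / len(template_ratings[lvl])
        for lvl in ELP_LEVELS
    ]
    return [(a + b) / 2 for a, b in zip(means, means[1:])]


def elp_level(rating, boundaries):
    """Return the ELP level whose range contains the rating."""
    return ELP_LEVELS[bisect_right(boundaries, rating)]


# Made-up ratings of template documents whose level is known in advance.
templates = {
    "A1": [10.0, 14.0], "A2": [20.0, 24.0], "B1": [26.0, 28.0],
    "B2": [30.0, 32.0], "C1": [33.0, 35.0], "C2": [36.0, 38.0],
}
bounds = boundaries_from_templates(templates)
level = elp_level(30.0, bounds)
```

The same ranges could in principle be used in both directions: to label crawled documents and to estimate a user's level from documents the user reads comfortably.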

Example Results

[Chart: Gospels Algorithm, Italian. Categories: ELP levels A1, A2, B1, B2, C1, C2. Series: Rating, Known words, Words. Axes: 0 to 4500 and 0.00 to 40.00. Data labels, in order: 12.66, 23.94, 25.51, 31.88, 34.09, 35.72]

Solution to step 3 (three language results)

[Architecture diagram labels: Web GUI (J2EE); User Profile; DB (PostgreSQL 8.4.4); Apache Solr 1.4; Apache Lucene; Apache Nutch 1.1; LanguageLevel plug-in; TreeTagger; Wiktionary; Google Translator API; Crawler; Indexer; Searcher; Apache Tomcat 6.0; Arch Linux 2010.05; Internet (“unibz.org”)]

Prototype

Conclusions and possible extensions

● The prototype is stable and seems to work well.
● Further testing is required to improve and tune the algorithm.
● Further testing is required to improve the matching with the ELP.
● The architecture can easily support other languages.
  ● It needs the frequency of words in the new language.
  ● It needs the PoS tagger for the new language.
● The prototype can be easily modified to become an additional function of an existing digital library.
  ● It has to be embedded in the indexer.
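The two requirements for a new language (a word-frequency list and a PoS tagger) can be pictured as a small per-language registry. The sketch below is hypothetical: the registry, the function names and the placeholder tagger are illustrative assumptions and do not describe the prototype's actual code.

```python
# Hypothetical sketch: per-language resources needed to support a new language.
from dataclasses import dataclass
from typing import Callable, Dict, List, Set, Tuple

# A tagger maps raw text to (word, part-of-speech) pairs.
Tagger = Callable[[str], List[Tuple[str, str]]]


@dataclass(frozen=True)
class LanguageResources:
    common_words: Set[str]  # most frequent words of the language
    tagger: Tagger          # PoS tagger for the language


REGISTRY: Dict[str, LanguageResources] = {}


def register_language(code, common_words, tagger):
    """Register a language; both resources are required."""
    if not common_words:
        raise ValueError("a word-frequency list is required")
    if not callable(tagger):
        raise ValueError("a PoS tagger is required")
    REGISTRY[code] = LanguageResources(set(common_words), tagger)


def naive_tagger(text):
    """Placeholder tagger that labels every token NOUN (illustration only)."""
    return [(tok, "NOUN") for tok in text.split()]


# Made-up example: adding French with a toy word list.
register_language("FR", {"le", "la", "et"}, naive_tagger)
```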

demo?

QUESTIONS?

Danke, Grazie, Thank you