Accessing an Information System by Chatting
Bayan Abu Shawar and Eric Atwell
[email protected], [email protected]
School of Computing
University of Leeds
Presentation Outline
Introduction.
Chatbot and corpus definitions.
ALICE chatbot system.
What has been done so far.
System architecture of the Qur’an chatbot.
Results and Evaluation.
Introduction
Methods of accessing an information system:
Information Retrieval (IR): retrieves a relevant subset of documents from a large set.
Information Extraction (IE): extracts specific pieces of data from documents to fill a list of slots in predefined templates.
We present another way to access an information system: using a chatbot tool.
Definitions
A Chatbot is a computer program, which is designed to simulate human conversation.
The user chats with the bot using textual or spoken natural language.
The chatbot must have access to knowledge (e.g., set of input/output rules), to accept input and match it against the rules to generate replies in the conversation.
We developed a machine learning approach that automatically generates chatting rules from machine-readable texts (corpora) and converts them to the ALICE chatbot format.
ALICE System
ALICE: the Artificial Linguistic Internet Computer Entity;
a software robot that you can chat with using natural language.
ALICE is composed of two parts:
1. Chatbot Engine
2. The language model
ALICE language model is stored in AIML files.
AIML: The Artificial Intelligence Mark up Language.
The AIML Format
<aiml version="1.0">
<topic name="the topic">
<category>
<pattern>PATTERN</pattern>
<template>Template</template>
</category>
..
</topic>
</aiml>
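As a minimal sketch of the format above, the following Python snippet builds a one-category AIML document with the standard library. The helper name `build_aiml`, the topic name, and the sample pattern and template are illustrative assumptions; they are not part of ALICE or of the authors' program.

```python
import xml.etree.ElementTree as ET

def build_aiml(topic, categories):
    """Return an AIML 1.0 string holding (pattern, template) pairs under one topic."""
    root = ET.Element("aiml", version="1.0")
    topic_el = ET.SubElement(root, "topic", name=topic)
    for pattern, template in categories:
        category = ET.SubElement(topic_el, "category")
        # Patterns are written in upper case, as in the skeleton above.
        ET.SubElement(category, "pattern").text = pattern.upper()
        ET.SubElement(category, "template").text = template
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample data, for illustration only.
aiml_text = build_aiml("greetings", [("hello", "Hi there!")])
print(aiml_text)
```

Using an XML library rather than string concatenation means special characters in the template text are escaped automatically.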
Implementing a Java Program
The primary goal of chatbots is to mimic real human conversations. We developed a Java program to read from ‘real’ human dialogues and generate conversational rules for the ALICE chatbot.
The program reads a dialogue corpus
Converts the dialogue transcript to AIML format.
The output AIML is used to retrain ALICE.
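The conversion step can be sketched as follows. The authors' program was written in Java and its details are not given here; this Python sketch assumes the simplest possible rule, namely that each utterance becomes a pattern and the utterance that follows it becomes the template. The function names and the sample turns are hypothetical.

```python
import re

def normalise(utterance):
    """Strip punctuation, collapse whitespace, and upper-case the text for use as a pattern."""
    cleaned = re.sub(r"[^\w\s]", "", utterance)
    return " ".join(cleaned.split()).upper()

def dialogue_to_categories(turns):
    """Pair every turn with the turn that follows it: (pattern, template)."""
    return [(normalise(prompt), reply.strip())
            for prompt, reply in zip(turns, turns[1:])]

# Hypothetical transcript, for illustration only.
turns = ["How are you?", "Fine, thanks.", "Good to hear."]
pairs = dialogue_to_categories(turns)
for pattern, template in pairs:
    print(pattern, "->", template)
```

Each resulting pair would then be written out as one AIML category.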
The Aim of the Automatic Process
Saving time and effort in encoding the knowledge manually.
Generating different versions of the chatbots that are not restricted to specific language and/or domain.
Creating versions that simulate ‘real’ human conversation.
Machine Learning Approach
Using the most significant word approach: based on the observation that people usually respond according to the most significant word in an utterance.
A frequency list was obtained from each corpus and then used to find the least frequent word in an utterance, which is treated as its most significant word.
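A minimal sketch of the least-frequent-word idea, assuming a simple lower-case tokeniser and a toy corpus. The function names and the data are hypothetical, and this is not the authors' Java implementation.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lower-case word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_frequency_list(utterances):
    """Count how often each word occurs across the whole corpus."""
    counts = Counter()
    for utterance in utterances:
        counts.update(tokenize(utterance))
    return counts

def most_significant_word(utterance, counts):
    """Return the word of the utterance that is least frequent in the corpus."""
    words = tokenize(utterance)
    if not words:
        return None
    # Ties are resolved in favour of the earliest word in the utterance.
    return min(words, key=lambda w: counts[w])

# Toy corpus, for illustration only.
corpus = ["the weather is nice", "the train is late", "is the weather cold"]
counts = build_frequency_list(corpus)
word = most_significant_word("the train is late", counts)
print(word)
```

Function words such as "the" and "is" occur in almost every utterance, so they rarely come out as the least frequent word.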
The Dialogue Corpora Used so Far
Minnesota: French dialogue corpus.
Spoken Afrikaans: Afrikaans dialogue corpus.
British National Corpus (BNC): Spoken transcripts.
The Holy book of Islam (Qur’an)
The Qur’an is written in the classical Arabic form.
Qur’an consists of 114 soora (chapters), which are grouped into 30 parts.
Each soora consists of sequential verses (sections).
The Original English Text Format of Qur’an
Sample:
THE DAYBREAK, DAWN, CHAPTER NO. 113
With the Name of Allah, the Merciful Benefactor, The Merciful Redeemer
113.1 Say: I seek refuge with the Lord of the Dawn
113.2 From the mischief of created things;
113.3 From the mischief of Darkness as it overspreads;
113.4 From the mischief of those who practise secret arts;
113.5 And from the mischief of the envious one as he practises envy.
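The layout of the sample above (a title line followed by lines of the form `chapter.verse text`) can be read with a short parser. This is a sketch that assumes every verse sits on a single line in exactly that layout; the names `VERSE_LINE` and `parse_verses` are hypothetical.

```python
import re

# Matches lines such as "113.1 Say: I seek refuge ...".
VERSE_LINE = re.compile(r"^(\d+)\.(\d+)\s+(.*)$")

def parse_verses(lines):
    """Return (chapter, verse, text) tuples; lines that are not verses are skipped."""
    verses = []
    for line in lines:
        match = VERSE_LINE.match(line.strip())
        if match:
            chapter, verse, text = match.groups()
            verses.append((int(chapter), int(verse), text))
    return verses

sample = [
    "THE DAYBREAK, DAWN, CHAPTER NO. 113",
    "113.1 Say: I seek refuge with the Lord of the Dawn",
    "113.2 From the mischief of created things;",
]
verses = parse_verses(sample)
for chapter, verse, text in verses:
    print(chapter, verse, text)
```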
Using the Qur’an as a Trainable Corpus
We selected the Qur’an to illustrate:
1. Whether we can access an information source via chatting.
2. How to convert a written text to the AIML format.
3. How to adapt ALICE to learn from a text that is not a dialogue transcript.
4. How to adapt the ALICE interpreter to recognise Arabic characters.
The Qur’an chatbot
In this chatbot we used a parallel English/Arabic corpus of the Qur’an.
Input: a statement, question or a verse in English.
Output: Verse(s) extracted from Qur’an in both English and Arabic
Problems raised:
1. How to divide a non-conversational text into utterance-like chunks?
2. How to enable the ALICE interpreter to recognise Arabic characters?
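One possible answer to the first problem, sketched under a stated assumption: each verse is treated as an utterance, and the verse that follows it in the same soora is treated as the reply. This pairing rule is an assumption made for illustration; the function name and the data layout are hypothetical.

```python
def verses_to_pairs(verses):
    """Turn (chapter, verse, text) tuples, given in reading order, into (pattern, template) pairs."""
    pairs = []
    for current, following in zip(verses, verses[1:]):
        # Only pair verses that belong to the same chapter (soora).
        if current[0] == following[0]:
            pairs.append((current[2], following[2]))
    return pairs

verses = [
    (112, 3, "He begetteth not, nor is He begotten;"),
    (112, 4, "And there is none like unto Him."),
    (113, 1, "Say: I seek refuge with the Lord of the Dawn"),
    (113, 2, "From the mischief of created things;"),
]
pairs = verses_to_pairs(verses)
for pattern, template in pairs:
    print(pattern, "->", template)
```

The last verse of a soora has no successor in that soora, so under this rule it yields no pair of its own.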
System Architecture
[Figure: system architecture diagram. Labels: Dialogue corpora; Reading files; Filtration; Text pre-processing Phase; Reiteration; Preparation; Restructuring Phase; Investigation Patterns Equalities; Investigation Templates Equalities; String Vector; AIML FILE.]
Results
Two versions of ALICE have been generated:
Qur’an0-13: http://www.pandorabots.com/pandora/talk?botid=832e8a088e348e83
Qur’an14-114: http://www.pandorabots.com/pandora/talk?botid=b60ec43cbe3481fe
Sample of Chatting Dialogue
Human: Is it right that our protector is Allah?
Qur'an14-114: {Muhammad (Muhammad)} 11 That is because Allah is the Protector of those who believe, but those who reject Allah have no protector.
ذلك بأن الله مولى الذين آمنوا وأن الكافرين لا مولى لهم (سورة محمد 11)
{The Morning Star (At-Tariq)} 4 There is no soul but has a protector over it.
إن كل نفس لما عليها حافظ (سورة الطارق 4)
Human: Allah, the eternal, Absolute
Qur'an14-114: {Sincerity (Al-Iklas)} 3 He begetteth not, nor is He begotten;
لم يلد ولم يولد (سورة الإخلاص 3)
Human: He begetteth not, nor is He begotten
Qur'an14-114: {Sincerity (Al-Iklas)} 4 And there is none like unto Him.
ولم يكن له كفوا أحد (سورة الإخلاص 4)
Evaluation
Problems of evaluation:
1. It is not easy to evaluate this kind of general information access.
2. The source text is not organised as answers to specific questions.
3. We cannot count the number of hits needed to compute precision and recall scores.
Evaluation (cont.)
Evaluate the System in Terms of:
1. Achieving the proposed objectives.
2. Finding possible usefulness for the system.
3. User satisfaction.
Evaluation (cont.)
1. We achieved our goals, which focused on using a text that is not conversational in nature and on handling the Arabic language.
2. Feedback from users was as follows:
Some users found the tool unsatisfactory since it does not provide answers to the questions.
Others found it interesting to:
a. Know more about Qur’an.
b. Find out which soora a certain verse comes from.
Conclusions
1. We presented a novel way of accessing information from an online source by having an informal chat.
2. The system may be used as a search tool for verses that contain the same words but have different connotations.
3. It may be useful for finding the name of the soora that a certain verse belongs to.
4. Students could use it as a new method to recite the Qur’an.
Thank YOU
?