26
Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems Engineering, School of Science and Engineering, Waseda University — A Case of Improvements for Quality of Educati

Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

  • View
    218

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

Knowledge Discovery from Questionnaires

Shigeichi Hirasawa

A Short Course at Tamkang University, May 2004

Department of Industrial and

Management Systems Engineering,

School of Science and Engineering,

Waseda University

— A Case of Improvements for Quality of Education —

Page 2: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.2

1. Introduction

Knowledge discovery from the questionnaire consisting of both the items and the texts. (1) An algorithm for simultaneously processing

answers with both the items and the texts and

(2) An algorithm for extracting important sentences from the text-parts of the documents are developed. By combining

(3) Statistical techniques applied to the item-parts, characteristics of each class or cluster are clarified.

(1) An algorithm for simultaneously processing answers with both the items and the texts and

(2) An algorithm for extracting important sentences from the text-parts of the documents are developed. By combining

(3) Statistical techniques applied to the item-parts, characteristics of each class or cluster are clarified. The results obtained in these analyses give

us useful knowledge to manage the object.

Page 3: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.3

2. Questionnaire Analysis Model

Model for the objectModel for the object

Questionnaire designQuestionnaire design

Documents

(3) Statistical techniques Data mining

(3) Statistical techniques Data mining

(1) Classification Clustering

(1) Classification Clustering

(2) Important sentences extraction

(2) Important sentences extraction

Evaluation & verificationEvaluation & verification

Fig. 2.1: Questionnaire analysis model

AnalysesItems Texts

ActionsActions

Page 4: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.4

(1) The set of documents is classified or clustered by the proposed algorithm for classification or clustering.

(2) For the texts only, important sentences, or the parts of them are extracted from the documents by the proposed algorithm for extracting important information.

(3) For the items only, statistical techniques are used to analyze the characteristics of each set of members. If the amount of the data is extremely large, a data mining technique is also used to analyze them.

Analyses phase:

Page 5: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.5

Text Mining:- Information Retrieval including- Clustering- Classification

Information Retrieval ModelBase Model

Set theory(Classical) Boolean ModelFuzzyExtended Boolean Model

Algebraic

(Classical) Vector Space Model (VSM) [BYRN99]Generalized VSMLatent Semantic Indexing (LSI) Model [BYRN99]Probabilistic LSI (PLSI) Model [Hofmann99]Neural Network Model

Probabilistic

(Classical) Probabilistic ModelExtended Probabilistic ModelInference Network ModelBayesian Network Model

Information Retrieval Model

Page 6: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.6

Format Example in paper archives matrix

Fixed format

Items

- The name of authors- The name of journals- The year of publication- The name of publishers

- The name of countries- The year of publication- The citation link

Free format

Text

The text of a paper - Introduction - Preliminaries ……. - Conclusion

DIG 1,0

DTH ,2,1,0

G = [ gmj ]: An item-document matrix

H = [ hij ] : A term-document matrix

dj : The j-th documentti : The i-th termim : The m-th item

gmj : The selected result of the m-th item (im ) in the j-th document (dj )

hij : The frequency of the i-th term (ti ) in the j-th document (dj )

Document

Page 7: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.7

3. Case of Student Questionnaire

Fixed format (multiple choice questions)Free format (text)

Questionnaire

The cycle of class improvement Class modelClass model

Questionnaire designQuestionnaire design

Analysis and verificationAnalysis and verification

Class management and syllabus planningClass management and syllabus planning

Student's satisfaction and score improvementStudent's satisfaction and score improvement

Japan Accreditation Board for Engineering Education (JABEE)

Page 8: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.8

3.1 Class model

Fig.3.1: Class model

Page 9: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.9

3.1 Class model

Table3.1: Contents of topics

Page 10: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.10

3.2 Contents of Questionnaire

Exercise ContentsInitial Questionnaire (IQ) Item type Text type

Midterm Test ( MT )Technical Reports ( TR )Final Test ( FT )Final Questionnaire ( FQ ) Item type Text type

7 questions (4-20 sub-questions each )5 questions (250-300 characters in Japanese each )5 subjects11 times ( each 1-2 subjects )5 questions

6 questions ( 6-21 sub-questions each )5 questions ( 250-300 characters in Japanese each )

Table 3.2 : Data of class

Page 11: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.11

“Introduction to Computer Science”

Compulsory Subject of

2nd Grade of Department of Industrial and Management Systems Engineering, School of Science and Engineering  

Mid April Mid May Mid July

Initial Questionnaire (I!Q)

Midterm Test (MT)

Final Questionnaire(FQ)

Final Test (FT)

Technical Reports (TR)

Page 12: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.12

Exercise Examples (sub questions)

IQ

Item-type

For how many years have you used computers? Do you have a plan to study abroad? Can you assemble a PC? Do you have a qualification related to information technology? Write 10 technical terms in information technology which you know.

Text-type

Write about your knowledge and experience on computer. What kind of work will you have after graduation? What do you imagine from the name of this class subject name?

Table 3.3(a) : Contents of a questionnaire (IQ)

Page 13: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.13

Exercise Examples (sub questions)

FQ

Item-type

Could you understand the contents of this lecture? Was the midterm test difficult? Was it easy to read the handwritings on the white-board? Do you think the contents of this lecture to be useful to yourself? Do you want to finish this course even if it is optional? Which are you interested in applied technology or the fundamentals of computers? Which do you choose class (S) or class (G)?

Text-type

Do you want to be a member of laboratories related to the information technology? In the future, will you get a job in industries related to the information technology? Did your image on computers change after taking this lecture?

Table 3.3(b) : Contents of a questionnaire (FQ)

This questionnaire is made in WEB form, and it is on the following Web Site. http : //hirasa.mgmt.waseda.ac.jp/users/comp-eng/

Page 14: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.14

3.3 Questionnaire analyses

(1) Prediction of scores

(2) Partition of students of class

(a) Partition by the contents of topics

     Class S (specialist): technical and professional topics

     Class G (generalist): wide and shallow technical topics

(b) Partition by the student's level

     Class H: a higher level

     Class L: a lower level

(c) Partition by the class managing method

     Class E: exercise- and practice-based lecture

     Class T: test-based lecture

3.3.1 Verification of class model by IQ

Page 15: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.15

Item-type and Text-type Questionnaire (IQ) → class division

Student's own choice (by curriculum)A clustering by PLSI method (similarity with the typical alumnus)

Table 3.4 : The classification of Class G and Class S

ClusteringA student's own choice

G S Total

G 29 20 49

S 34 28 62

Total 63 48 111

Condition : Only the initial questionnaire is used.

Class G (generalist)Class S (specialist)

Class G or Class S --- The different contents in each class

Page 16: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.16Table 3.5: Characteristics of Class G and Class S

Characteristics xl

Distinction coefficient

al

student's

choice

Interest in the theme of the class Interest in Information TechnologyHow long have you been using the PC?How many years have you had your own PC?A score of the test isn't concerned if a credit can be taken.A half year is enough for this class.This lecture is necessary for the department.I learn eagerly on also an uninterested subject.

   S   G

Clustering

I want attendance taken important.It is enough if I can only use a computer.I want to obtain a qualification in future.How long have you been using e-mail?Was this class necessary for yourself?I want to take a good score in all the subjects.Do you have the part-time job actively?Did you make a report by yourself?I learn eagerly on also an uninterested subject.

Discriminant function z 0≧   d class ∈Gz <0    d class ∈S

pp xaxaxaaz 22110

Discriminant analysis Classifying error rate23.6%

Page 17: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.17

Table 3.6: Prediction error for partition

(a) S or G (b) H or L (c) E or T

0.40 0.40 0.33

Page 18: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.18

3.3.2 Verification of class model by IQ and FQ(1) Scores of students

Table 3.7: Explanation of scores by item-type questionnaire

Contribution ratio =0.742

Explanatory variable xjl

Partial regression coefficient bl

Did you make a report by yourselfHow many years have you used PC? Do you want to study this field after the class?Do you have the part-time job actively?Are you interested in the principle of a computer?Did you feel the lecture difficult?Do you think that attendance and absence should be managed?Do you want to use a personal computer in the class?The degree of satisfaction to contents of this class.Are you a “science-type” ?Is it better for you to have the midterm test? (FQ)Are you interested in club?Do you want to have the midterm test?(IQ)Multiple linear regression analysis

Criterion variable (score) : ),0( 2110 Nxbxbby jppjj

Item-type and Text-type Questionnaire (IQ,FQ) → scores

Page 19: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.19

Table 3.8: Summarized sentences extracted from text-type questionnaire

Score Example of Sentences

High

Over70

- I think that this class is the one for giving interest to a computer.- Since I was interested in the structure of a computer, I wants to participate the lecture eagerly. However, since I have almost no prior knowledge, I am worried in the ability to catch up with a class.- Since I think that deep understanding in the field of information technology cannot be obtained without knowledge of computer structure, I want to learn firmly at this opportunity.

Low

Under69

- I can only use a few functions of computers. For example, Internet, Excel or Word etc.- I cannot effectively use a personal computer. Therefore, I think that I want to know various things related to computers.- I cannot imagine the contents of the class from the name of subject “Introduction to computer science” well. And this subject would be uninterested for me.

This technique is the one which is improved so that not only a text with high importance but contents might be covered.

Automatic sentence extraction method

Item-type and Text-type Questionnaire (IQ,FQ) → score (high or low )

Page 20: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.20

Explanatory variable

Partial regressio

n coefficien

t

t -value

FQ: I want to study this field after the class. FQ: The reports are necessary for every week. FQ: I actively attended the class. FQ: This lecture is necessary for our department. FQ: I had been interested in contents of a lecture. IQ : I want to have a qualification related to information technologies.IQ : Attendance should be taken. IQ : How many days in a week in average do you come to university?IQ : I don’t care if I lost a credit. IQ : I prefer science to literature.

1.51.00.90.80.9

-1.0-0.5-1.30.90.4

5.65.44.24.13.9

-3.6-3.5-3.23.02.8

Contribution ratio = 0.85

Table 3.9(a): Explanation of degree of satisfaction by item-type questionnaire

(2) Degree of satisfaction

Degree of satisfaction of Contents

Page 21: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.21

Explanatory variable

Partial regressio

n coefficien

t

t -valu

e

FQ: The reports are necessary for every week. FQ: The degree of interest of the contents of this class. IQ : I clearly have an object to learn in this class. FQ: I am interested in the contents of this class. IQ : I finished this class even if it is optioned. IQ : This class is sufficient for a half year. IQ : I don’t care even if I lost a credit. IQ : I checked a syllabus. FQ: I finished this class even if it is optioned. FQ: I want to have a qualification related to information technologies. FQ: I can use the most of a PC by this class. FQ: I made the reports by myself.

1.60.2

-1.41.51.31.21.50.60.9

-0.9-0.9-0.8

5.14.1

-3.83.73.33.13.02.72.6

-2.6-2.2-2.0Contribution ratio = 0.60

Table 3.9(b): Explanation of degree of satisfaction by item-type questionnaire

Degree of satisfaction for Management

Page 22: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.22

Explanatory variable

Partial regressio

n coefficien

t

F -ratio

FQ: The degree of interest point of the contents of this lecture. IQ : I have never never assembled a PC. IQ : I am a woman. IQ : How many years have you used your own PC. IQ : For how many years have you used a PC? FQ: The degree of interest in areas related to the information. FQ: This lecture is sufficient with in a half year. FQ: The degree of satisfaction of the contents of this lecture.

11.17.26.76.65.85.44.83.6

-0.2-3.12.1

-0.40.20.1

-0.6-0.2

Table 3.10: Explanation of favorite partition Class G or Class S

(3) Favorite partition

Mis-distinction ratio 21. 1%Class S: specialist Class G: generalist

The reasons why the students choose Class G or S, and Class E or T as their favorite partition (as objective variables) are shown in Table 3.10 and 3.11, respectively.

Page 23: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.23

Explanatory variablePartial

regression coefficient

F -ratio

FQ: You should call a roll every time. IQ : It is more interested in the foundation principle than the application of the computer. IQ : After the lecture, I want to study this field. IQ : It works for the subject as well which it isn't interested in and grapples. FQ: This lecture is a necessary lecture for itself. FQ: The end of a term report subject is better than a term examination. IQ : Web use 歴 .IQ : This lecture is the lecture which is necessary for the subject. FQ: I want to acquire a qualification related to the information.

23.910.910.4

8.57.55.65.25.05.0

1.61.3

-1.41.01.00.7

-0.5-1.0-0.9

Table 3.11: Explanation of favorite partition Class G or Class S

Mis-distinction ratio 11.2%Class E: exercise- and practice-based lectureClass T: test-based lecture

Page 24: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.243.4 Discussions(1) It is difficult to predict the student's final score by only IQ, and there is no expla

nation capability by the linear regression analysis. This can be, however, thought probably to be a natural result.

(2) Although the prediction error rates of the partition problems are 30-40[%] for which only the item-type questionnaire of IQ are used, combining with the important sentences extracted from the text-type questionnaire gives useful information for managing the class.According to the characteristics of each class, we can improve the quality of education.

(3) The student's own choice is insufficient for partition of Class G and Class S.(4) It is possible to explain the student's final score by IQ and FQ.

However, it is difficult to get the suggestion to improve the student's score, although we can get a student's tendency.

(5) It is a little difficult to explain the degree of satisfaction regarding the class management, but easy to explain that regarding the contents of topics by IQ and FQ.

(6) It is possible to explain the favorite partition to the students by IQ and FQ. This suggest us a proper partition to the next year by taking into account causal relations obtained in this year.

Page 25: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.25

(1) It can be concluded that we obtain useful information to improve the class management by student questionnaire with both the item-type and the text-type.

(2) The result shows verification of the class model for "Introduction to computer science".

(3) The degree of satisfaction for the students should be investigated in detail as a future work.

(4) Questionnaire must be carried out to collect data for several years, and their time series analysis and the review of the model also remain as further studies.

3.5 Conclusions and future works

Page 26: Knowledge Discovery from Questionnaires Shigeichi Hirasawa A Short Course at Tamkang University, May 2004 Department of Industrial and Management Systems

No.26

(1) We have shown the effective way for knowledge discovery from questionnaire by combining data mining and text mining techniques.

(2) One of the most remarkable points is to construct a questionnaire with both fixed format and free format and to simultaneously process them.

(3) The effective algorithm to extract the important sentences from questionnaire with free format (text) is also provided.

(4) A model which simply exhibits the real problem and the design of the questionnaire based on this model are important to successfully apply this method to actual problems.

(5) We have applied the method to improve the quality of education. Although there are many problems, we obtain effective information which leads to faculty developments.

(6) The developments of other algorithms such as data mining technique for classification and clustering which should be added to this method is a further research.

4. Concluding Remarks