DisenQNet: Disentangled Representation Learning for Educational Questions
Zhenya Huang, Xin Lin, Hao Wang, Qi Liu, Enhong Chen*, Jianhui Ma, Yu Su, Wei Tong
University of Science and Technology of China; iFLYTEK Co., Ltd.
The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD'2021)
Outline
¨ Introduction
¨ Preliminary
¨ Our Method
¨ Experiment
¨ Conclusion
Introduction
¨ Online learning systems
  o Collect millions of learning materials
    n Courses, questions, tests, etc.
  o Provide intelligent services to improve the learning experience
    n Students select suitable questions or courses to acquire knowledge
    n Systems provide personalized recommendations

(Figure: examples of a question and a course)
Introduction
¨ Real-world challenges with millions of learning materials
  o How to organize, search, and recommend questions?
  o How to promote question-based applications?
    n Search questions to find similar ones
    n Recommend questions with proper difficulty (for students)
    n ……
¨ Fundamental topic in AI education
  o Question understanding (automatic)
  o Goal: learning informative representations of questions
Related work
¨ Traditional NLP work (earlier)
  o Lexical analysis or semantic analysis
    n Design fine-grained rules or grammars
  o Representation: explicit trees or templates
¨ NLP-based work
  o End-to-end frameworks
    n Understand question content
    n Learn from application tasks, e.g., difficulty estimation, similarity search
  o Representation: latent semantic vector
¨ Recent pre-training work
  o Pre-training on a large question corpus
    n Enhance question semantics learning
  o Representation: latent semantic vector
Related work—Limitations
¨ Supervised manner
  o Requires sufficient labeled data
    n E.g., question difficulty, question-pair similarity
  o Scarcity of high-quality labels
    n E.g., difficulty must be examined in standardized tests (GRE)
Related work—Limitations
¨ Task-dependent representation
  o Different models for the same questions in different application tasks
  o Poor transferability across tasks

(Figure: one question requires separate models A, B, C for tasks A, B, C, with poor transfer between them)
Related work—Limitations
¨ One unified vector representation
  o All the information is integrated together
  o Questions with the same concept can still be quite different
    n Concept
    n Personal properties (difficulty, semantics)

(Figure: different questions sharing the same concept)
Introduction
¨ Ideal question representation model
  o Gets rid of labels in specific tasks
    n Learns the information of questions on its own
  o Distinguishes the different characteristics of questions
    n Reduces noise
    n Explicit, so it has good interpretability
  o Question representations should be flexible
    n Can be applied in different downstream tasks
    n Improve the applications in online learning systems
Our work—main idea
¨ Disentangled representation
  o Disentangle question information into two representations
    n Concept representation
    n Individual representation
  o Concept representation
    n High dependency on concept information (knowledge)
  o Individual representation
    n High dependency on individual information (difficulty, semantics, etc.)
  o The two representations are highly independent of each other
    n Contain no information from each other
Problem definition
¨ Unsupervised question representation learning
  o Given: a question corpus Q = {q_1, q_2, …, q_N}, where each question q = {w_1, w_2, …, w_M} is labeled with concepts k ∈ K
  o Goal: disentangled question representations
    n Concept representation v_k ∈ R^d
    n Individual representation v_i ∈ R^d
¨ Question-based supervised tasks
  o Given Q = Q_L ∪ Q_U with |Q_L| ≪ |Q_U|
    n Labeled Q_L = {q_1, q_2, …, q_L} with labels {y_1, y_2, …, y_L}
    n Unlabeled Q_U = {q_1, q_2, …, q_U}
  o Goal: predict properties of unseen questions
    n e.g., the difficulty of a question
    n e.g., the similarity of a question pair
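The data setup above can be sketched in a few lines of Python; the class and field names are illustrative, not from the paper's code:

```python
from dataclasses import dataclass

@dataclass
class Question:
    words: list          # token sequence w_1 ... w_M
    concepts: set        # concept labels k in K
    label: float = None  # task label y (e.g., difficulty); None if unlabeled

# A tiny corpus: Q = Q_L (labeled) + Q_U (unlabeled), with |Q_L| << |Q_U| in practice
q1 = Question(words=["solve", "f(x)=2x+1"], concepts={"function"}, label=0.4)
q2 = Question(words=["find", "the", "solution", "set"], concepts={"set"})
labeled = [q for q in (q1, q2) if q.label is not None]
unlabeled = [q for q in (q1, q2) if q.label is None]
```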
DisenQNet: glance
¨ Disentangled Question Network (DisenQNet)
  o Unsupervised model without labels
  o Question encoder
    n Learns to disentangle a question into two ideal representations
  o Self-supervised optimization
    n Three information estimators

(Figure: model architecture and model optimization)
DisenQNet
¨ Question Encoder
  o Learns to disentangle a question into two ideal representations
    n Concept representation v_k
    n Individual representation v_i
  o Key: they focus on different content
    n e.g., concept: "function"; individual: "f(-1)=2"

Integrated semantics: x_q = Enc({x_w : w ∈ q}), where Enc is, e.g., a TextCNN or LSTM
Disentangled semantics (attention network): v_q = (v_k, v_i), with
  v_k = Σ_{m=1}^{M} α_m x_m, α_m = Softmax(MLP([x_m; x_q]))
  (v_i is computed analogously with its own attention head)
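The attention pooling above can be sketched in numpy; the dimensions, random features, and single-layer scorers standing in for the MLP heads are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

M, d = 5, 8                       # M words, d-dimensional features
X = rng.normal(size=(M, d))       # word features x_1..x_M (stand-ins for TextCNN/LSTM outputs)
x_q = X.mean(axis=0)              # integrated question semantics x_q = Enc({x_w})

# One attention head per representation: alpha_m = Softmax(MLP([x_m; x_q])).
# A single linear scorer stands in for each head's MLP here.
w_head_k = rng.normal(size=2 * d)
w_head_i = rng.normal(size=2 * d)
scores_k = np.array([np.concatenate([x, x_q]) @ w_head_k for x in X])
scores_i = np.array([np.concatenate([x, x_q]) @ w_head_i for x in X])
v_k = softmax(scores_k) @ X       # concept representation: v_k = sum_m alpha_m x_m
v_i = softmax(scores_i) @ X       # individual representation, from its own head
```

Because the two heads score words differently, v_k and v_i pool the same word features with different attention weights, which is what lets them specialize in different content.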
DisenQNet
¨ How to optimize? — Self-supervised optimization
  o Three estimators measure the information dependencies

MI Estimator
Ø v_q = (v_k, v_i) should contain all the information of a question
Ø Maximize the mutual information between v_q and each word w ∈ q
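One common way to maximize such a mutual information term is a Jensen-Shannon-style neural lower bound (Deep InfoMax flavor), which scores true (v_q, word) pairs against mismatched ones. The sketch below assumes a bilinear scorer and random features; the paper's exact estimator may differ:

```python
import numpy as np

rng = np.random.default_rng(1)

def softplus(z):
    return np.logaddexp(0.0, z)

def jsd_mi_lower_bound(v_q, pos_words, neg_words, W):
    """JSD-style MI lower bound: score true (v_q, word) pairs high,
    mismatched pairs (words from another question) low."""
    pos = np.array([v_q @ W @ w for w in pos_words])
    neg = np.array([v_q @ W @ w for w in neg_words])
    return -softplus(-pos).mean() - softplus(neg).mean()

d = 8
v_q = rng.normal(size=2 * d)            # concatenation of (v_k, v_i)
W = rng.normal(size=(2 * d, d))         # bilinear scorer (stand-in for a small net)
own_words = rng.normal(size=(5, d))     # words w in q -> positive pairs
other_words = rng.normal(size=(5, d))   # words from another question -> negatives
loss = -jsd_mi_lower_bound(v_q, own_words, other_words, W)  # minimized in training
```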
Concept Estimator
Ø v_k contains the given concept meaning explicitly
Ø Multi-label concept classification task: predict the concepts from v_k
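A minimal sketch of that multi-label classification loss, assuming a linear classifier head and binary cross-entropy over independent per-concept sigmoids (the standard multi-label setup; the concrete head is an assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
d, num_concepts = 8, 4
v_k = rng.normal(size=d)                 # concept representation of one question
W = rng.normal(size=(d, num_concepts))   # stand-in for the classifier head

probs = 1.0 / (1.0 + np.exp(-(v_k @ W)))  # sigmoid: independent probability per concept
y = np.array([1.0, 0.0, 1.0, 0.0])        # multi-hot concept labels for the question
eps = 1e-12
bce = -(y * np.log(probs + eps) + (1 - y) * np.log(1 - probs + eps)).mean()
```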
Disentangling Estimator
Ø Keep v_k and v_i independent: v_k must not contain the information carried by v_i
Ø Minimize the mutual information between v_k and v_i (it cannot be learned directly)
Ø Method: WGAN-like adversarial training
Ø Minimize the Wasserstein distance between P(v_k, v_i) and P(v_k) ⊗ P(v_i)
   ⇒ P(v_k) ⊗ P(v_i) ≈ P(v_k, v_i) ⇒ v_k and v_i independent
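The adversarial step can be sketched as follows: samples of the product of marginals P(v_k) ⊗ P(v_i) are obtained by shuffling v_i within a batch, and a critic estimates the Wasserstein distance as the gap between its mean scores on the two sample sets. The linear critic and random features are simplifying assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
B, d = 16, 8
vk = rng.normal(size=(B, d))                     # batch of concept representations
vi = 0.8 * vk + 0.2 * rng.normal(size=(B, d))    # deliberately correlated with vk

w = rng.normal(size=2 * d)                       # linear critic (stand-in for a 1-Lipschitz net)

joint = np.concatenate([vk, vi], axis=1)                     # samples from P(v_k, v_i)
prod = np.concatenate([vk, vi[rng.permutation(B)]], axis=1)  # shuffled: P(v_k) (x) P(v_i)

# The critic maximizes this gap (a Wasserstein-distance estimate);
# the encoder minimizes it, pushing v_k and v_i toward independence.
w_gap = (joint @ w).mean() - (prod @ w).mean()
```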
DisenQNet+
¨ DisenQNet+ — question-based supervised tasks
  o Transfer v_i from DisenQNet to improve different applications
    n e.g., difficulty estimation, similarity search
  o Key: the individual representation v_i focuses more on unique information

Traditionally: end-to-end methods
Ø May suffer from overfitting due to insufficient labeled data
Our improvement: inject v_i from DisenQNet into the task model via mutual information maximization
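One way to realize that transfer is to add an MI-maximization regularizer between the task model's hidden state and the pretrained v_i to the supervised loss. The sketch below uses the same JSD-style bound as before; the weighting and scorer are illustrative assumptions, not the paper's exact objective:

```python
import numpy as np

rng = np.random.default_rng(4)
d = 8

def softplus(z):
    return np.logaddexp(0.0, z)

h_task = rng.normal(size=d)    # hidden state of the downstream task model for question q
v_i_pos = rng.normal(size=d)   # pretrained individual representation of q (from DisenQNet)
v_i_neg = rng.normal(size=d)   # individual representation of a different question

W = rng.normal(size=(d, d))    # bilinear scorer for the MI estimator
mi_lb = -softplus(-(h_task @ W @ v_i_pos)) - softplus(h_task @ W @ v_i_neg)

supervised_loss = 0.5          # placeholder loss on the few labeled questions
lam = 0.1                      # weight of the transfer regularizer
total_loss = supervised_loss - lam * mi_lb   # fit labels while maximizing MI(h_task, v_i)
```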
Experiment
¨ Datasets
  o System1: high-school-level questions
  o System2: middle-school-level questions
    n Concepts: "Function", "Triangle", "Set", etc.
  o Math23K: elementary-school-level questions
    n Concepts (five operations): +, −, ×, ÷, ∧
Experiment
¨ DisenQNet evaluation (v_k and v_i)
  o Task: concept prediction performance
  o Baselines: text models, NLP pre-trained models, a question pre-trained model

Disentangled representation learning is necessary
Ø DisenQNet-v_k predicts concepts well: v_k captures the concept information of questions
Ø DisenQNet-v_i fails to predict concepts: v_i removes the concept information
Experiment
¨ DisenQNet visualization (v_k and v_i)
Ø v_k vectors are easier to group by concept
Ø v_i vectors are scattered
Ø v_k attends more to concept words ("odd function", "solution set", "inequality")
Ø v_i focuses more on mathematical expressions ("f(-1) = 2")
Experiment
¨ DisenQNet+ evaluation
  o Two tasks: similarity ranking and difficulty ranking

Ø Disentangled learning is better than integrated learning
Ø v_i improves the application performance most
Ø It preserves the personal information of questions
Ø It transfers well across different tasks
Conclusion
¨ Summary
  o Disentangled representation learning for educational questions
  o Unsupervised DisenQNet
    n Distinguishes concept and individual information of questions
    n Good interpretability
  o Semi-supervised DisenQNet+
    n Improves the performance of different downstream tasks
    n Good transferability
¨ Future work
  o More sophisticated models for the disentanglement implementation
  o Heterogeneous questions, e.g., geometry
  o Deeper knowledge transfer
Thanks!
[email protected]@mail.ustc.edu.cn