Sentiment Analysis and Opinion Extraction of Game Reviews on Steam

Fan Ji (u6356164)

A report submitted for the course COMP8755 Individual Computing Project

Supervised by: Dr. Penny Kyburz
The Australian National University

October 2019
© Fan Ji (u6356164) 2019



Except where otherwise indicated, this report is my own original work.

Fan Ji (u6356164)
25 October 2019


Acknowledgments

First of all, I am very grateful to my supervisor Dr Penny Kyburz for her careful guidance of my project in the past 2 semesters, which greatly improved my understanding of academic writing and taught me a lot of specific research skills. I am also grateful to my group peers for inspiring discussion during this project.


Abstract

The purpose of this project is to build a natural language processing framework for sentiment analysis. We have also built a web crawler to download game reviews from the Steam platform, which serve as the corpus for this project. The results show that the two algorithms implemented in this project, Naive Bayes and SVM, can effectively perform sentiment analysis on game reviews. This report analyses the experimental results and proposes several improvements.


Contents

Acknowledgments

Abstract

1 Introduction
  1.1 Project Scope
  1.2 Motivations
  1.3 Report Outline

2 Background and Related Work
  2.1 Background
  2.2 Related work

3 Data Collection and Pre-processing
  3.1 Steam Reviews
  3.2 Data Collection
  3.3 Data Pre-Processing

4 Algorithms and Implementation
  4.1 Naive Bayes
  4.2 Support Vector Machine

5 Result and Discussion
  5.1 Result
  5.2 Improvement

6 Conclusion
  6.1 Future Work

Bibliography

Appendix


List of Figures

4.1 How to find hyperplane


List of Tables

3.1 Important review attributes
3.2 Raw reviews collected from steam platform
3.3 Prepared data for training and prediction
5.1 Prediction accuracy of different algorithms


Chapter 1

Introduction

In recent years, the video games market has experienced tremendous growth. People’s perceptions of video games are gradually moving beyond negative aspects such as the potential influence of violence, depression, and addiction [Granic et al., 2014]. For many people, video games have become one of the most common forms of entertainment in their daily life. There are more than 2.5 billion gamers around the world, who are expected to spend 152.1 billion dollars on games in 2019 [Tom, 2019]. In contrast, global movie box office revenue amounted to only 41.1 billion U.S. dollars in 2018 [Watson, 2019].

Reviews are an essential part of a game’s ecosystem: game developers need feedback to improve their products, and players rely on reviews to make purchasing decisions. In the past, game reviews were usually written by professional game critics or organizations with a certain authority. However, with the boom in internet technology, individuals can now share their game experience as reviews on social media or in game communities, which generates huge numbers of game reviews on the Internet. It is therefore not feasible for players to read every review one by one when they are deciding whether to buy a game.

The solution is quite straightforward: we can use artificial intelligence to analyze game reviews with natural language processing tools. Natural language processing (NLP) concerns the interaction between human language and computer programs, and can be described as teaching computers to understand human-generated text and complete specific linguistic tasks. Specifically, given sufficient training, computers are capable of identifying sentiment and emotion in text, summarizing the meaning of documents, translating between languages, clustering or categorizing large numbers of documents, and more [n.d., 2016]. In this project, we focus on sentiment analysis and opinion extraction, which here means predicting whether a game review is positive or negative.

Sentiment analysis aims to identify and extract subjective information from the target text [Cardie, 2014]. It is a basic but popular research field in natural language processing. Sentiment analysis systems are deployed in almost every business and social field because emotions are fundamental to nearly all human activities and strongly influence our behaviour. Our views and interpretations of reality, as well as our decisions, depend largely on how others see and judge the world. That is why we often seek advice when we need to make a decision.

1.1 Project Scope

The aim of this project is to apply natural language processing to Steam reviews, work through each stage of an NLP pipeline, and predict whether the text of a game review is positive or negative. The project is founded on supervised machine learning and the word2vec word embedding approach. First, we implement the Naive Bayes and Support Vector Machine algorithms in Python to analyze Steam reviews. Only basic Python libraries, such as NumPy and Pandas, are used in the main part of this project. Second, we analyze the performance of our algorithms and compare them with established tools in the field of natural language processing, for example BERT and scikit-learn. Third, we provide some suggestions for improving the sentiment analysis algorithms.

1.2 Motivations

Natural language processing is currently a prevalent sub-field of AI, and many highly integrated tools already exist for it. In this project, we did not aim to use these tools to accomplish a particular task. Instead, we aim to gain insight into natural language processing by implementing classic algorithms ourselves.

The motivation for this project has two aspects. First, to build an NLP framework that can support other research in the future. Second, to gain insight into the principles and formulas of various algorithms, which helps us learn how computers understand human languages.

1.3 Report Outline

The general idea of this project is presented in this introduction. Chapter 2, Background and Related Work, provides essential background knowledge for this report and reviews some research on game reviews. Chapter 3 describes the data source and the pre-processing approach. Chapter 4 discusses the algorithms used in this project. Chapter 5 presents the results and their analysis. Finally, the last chapter summarises what we have done and outlines plans for future work.


Chapter 2

Background and Related Work

Section 2.1 gives the background material necessary to read this report, and related work is discussed in Section 2.2.

2.1 Background

This project involves two basic concepts: natural language processing and sentiment analysis. Natural language processing (NLP) is a set of techniques that enable computers to understand human languages in order to perform certain tasks [Chowdhury, 2003]. An NLP project usually has five basic steps in its workflow: text pre-processing, text parsing, text representation, modeling, and deployment/evaluation. NLP tasks are often based on databases containing a significant amount of text. We train a model on the data in the database and then use it to complete the desired task.

Text information can be divided into two categories: facts and opinions. A fact is a specific description of a certain thing, whereas an opinion reflects people’s subjective sentiment. The purpose of sentiment analysis is to mine the subjective tendency of people in text information [Liu et al., 2010].

2.2 Related work

We have read through some articles related to sentiment analysis and game reviews. Strååt and Verhagen [2017] provide a method to collect and analyze consumer attitudes towards video games. In their paper, the features derived from a data set are based on word frequency: a fixed number of the most frequent words are used as features. Trneny [2017] explores how to predict whether a game will be successful, using a machine learning approach to analyse the relationship between the reviews of a game and its success. Dasgupta and Sengupta [2016] point out that in the Internet era it is not necessary to spend a lot of time and money on market research, because merchants can obtain people’s sentiment and opinions on a product directly from comments on social networks. They use the Samsung Galaxy S3 as an example and present some correlation analysis through natural language processing. Gifford [2013] analyzes which features distinguish game reviews from other types of reviews. For example, video game reviews pay more attention to the price of the work itself.


Chapter 3

Data Collection and Pre-processing

This chapter introduces how we collect and pre-process data from the Steam platform and discusses the advantages and disadvantages of using Steam reviews as a data source.

3.1 Steam Reviews

While collecting data, we found that the Steam platform has the following advantages as a data source. First, there is a massive number of game reviews on Steam. We collected more than 1.2 million reviews, yet these account for less than 10 percent of all reviews on the platform. Second, in addition to the text, each review includes a considerable number of other attributes. Last but not least, Steam is friendly to web crawlers, so we can download almost everything required. However, Steam reviews are less formal than newspaper articles or academic reports, which may have a detrimental influence on this project. Specifically, abbreviations and grammatical errors are more common in Steam reviews.

3.2 Data Collection

Through the official Steam API, we learnt that there are 30404 applications on the Steam platform. Because we split reviews into different datasets according to the application they belong to, only applications with more than 100 reviews qualify for this project. Since the number of qualified applications was not known before collection finished, we created a web crawler to detect how many reviews each application has and to write the qualified application ids to a CSV file. We then use the application ids as input for another web crawler, which downloads and indexes reviews from the Steam website.
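The selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the `app_id` column header and the third review count are hypothetical, and the network requests made by the actual crawler are left out entirely.

```python
import csv
import io

MIN_REVIEWS = 100  # threshold from the text: more than 100 reviews


def select_qualified_apps(review_counts, min_reviews=MIN_REVIEWS):
    """Return the sorted ids of applications with more than `min_reviews` reviews."""
    return sorted(app_id for app_id, n in review_counts.items() if n > min_reviews)


def write_app_ids(app_ids, handle):
    """Write one application id per row, below a header row, to an open text handle."""
    writer = csv.writer(handle)
    writer.writerow(["app_id"])  # hypothetical column name
    for app_id in app_ids:
        writer.writerow([app_id])


# Hypothetical input; in the project these counts come from the first crawler.
review_counts = {517630: 4050, 17480: 2167, 999999: 12}
qualified = select_qualified_apps(review_counts)

buffer = io.StringIO()  # stands in for the CSV file on disk
write_app_ids(qualified, buffer)
csv_text = buffer.getvalue()
```

Keeping the filter separate from the writer means the threshold can be changed without touching the file handling.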

For each game, all of its reviews are stored in one JSON file as a dataset. Each review contains 4 attributes: id, title, time, and comment. Table 3.1 describes these attributes in detail and Table 3.2 shows a couple of example reviews used in this project.
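As a minimal sketch of how such a dataset might be read and turned into training labels, the snippet below parses a small JSON string and maps the title attribute to a binary label. The exact JSON layout and the second review are assumptions made for illustration; only the four attribute names come from the text.

```python
import json

# Hypothetical two-review dataset using the four attributes named in the text.
# The second review is invented for illustration.
raw = """[
  {"id": "517630", "title": "Not Recommended", "time": 56.7,
   "comment": "Just Cause 2 and 3 were as..."},
  {"id": "517630", "title": "Recommended", "time": 3.2,
   "comment": "Great fun."}
]"""


def label_from_title(title):
    """Map the recommendation title to a binary label (1 = positive, 0 = negative)."""
    if title == "Recommended":
        return 1
    if title == "Not Recommended":
        return 0
    raise ValueError(f"unexpected title: {title!r}")


reviews = json.loads(raw)
labels = [label_from_title(review["title"]) for review in reviews]
```

Raising an error on an unexpected title makes malformed records visible early instead of silently mislabelling them.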


Table 3.1: Important review attributes

| Attribute | Description |
|---|---|
| id | the id of the application which this review belongs to |
| comment | the content of this review |
| title | ’Recommended’ if this comment is positive, otherwise ’Not Recommended’ |
| time | how many hours this reviewer has spent in this game |

Table 3.2: Raw reviews collected from steam platform

| index | comment | time | title | id |
|---|---|---|---|---|
| 0 | Just Cause 2 and 3 were as... | 56.7 | Not Recommended | reviews 517630 |
| 1 | Plus Points:+ Absolutely stunning... | 9.9 | Not Recommended | reviews 517630 |
| 2 | Feels like they’ve lost touch.... | 11.5 | Not Recommended | reviews 517630 |
| 3 | HOLD ON THERE STRANGER, SCRO ... | 7.5 | Not Recommended | reviews 517630 |
| 4 | Before you buy this game, you... | 12.4 | Not Recommended | reviews 517630 |

Table 3.3: Prepared data for training and prediction

| token | comment |
|---|---|
| [caus, perfect, exampl, great, game, could, fi.. | Just Cause 2 and 3 were as... |
| [plus, point, absolut, stun, map, best, world,. | Plus Points:+ Absolutely stunning... |
| [feel, like, lost, touch, fan, seem, understan... | Feels like they’ve lost touch.... |
| [hold, stranger, scroll, review, hear, word, l.. | HOLD ON THERE STRANGER, SCRO ... |
| [buy, game, realli, listen, negat, review, ign... | Before you buy this game, you... |


3.3 Data Pre-Processing

The review content must be pre-processed before analysis. We remove less meaningful words and map words with the same root to a common form; these steps are called ’stop word removal’ and ’stemming’ respectively. Table 3.3 shows the prepared data for training and prediction. Stop words are very commonly used words, such as "in", "at", "the", and "an". Even if these words are ignored, the meaning of the text is not changed or misconstrued, so we can remove all of them. Stemming is the process of reducing inflected words to their word stem, base, or root form, usually a written word form. The stem does not need to be identical to the word’s morphological root [Kashyap et al., 2017]; it is generally enough that related words map to the same stem, even if this stem is not itself a valid root [Lovins, 1968].
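The two steps can be illustrated with a short sketch. It is deliberately simplified: the stop list is a tiny hand-picked sample, and `toy_stem` is a hypothetical suffix stripper, not the stemming algorithm used in the project (the stems in Table 3.3, such as "stun", come from a more careful stemmer than this one).

```python
import re

# A tiny illustrative stop list; real stop lists are much longer.
STOP_WORDS = {"in", "at", "the", "an", "a", "is", "this", "and", "to", "of"}

# Suffixes removed by the toy stemmer, checked in this order.
# This is NOT the Porter or Lovins algorithm, only a simplified stand-in.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(word):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(comment):
    """Lowercase the text, keep alphabetic tokens, drop stop words, then stem."""
    tokens = re.findall(r"[a-z]+", comment.lower())
    return [toy_stem(token) for token in tokens if token not in STOP_WORDS]


tokens = preprocess("The maps in this game looked stunning")
# tokens == ["map", "game", "look", "stunn"]
```

The output shows both the benefit and the limit of naive suffix stripping: "maps" and "looked" reduce to sensible stems, while "stunning" becomes "stunn" rather than "stun".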


Chapter 4

Algorithms and Implementation

In the last chapter we described our process for collecting and pre-processing the Steam review data needed in this project. Each comment is converted from text into a sequence of tokens that can be handled directly by a computer program. This chapter focuses on two machine learning algorithms and presents how each prediction approach works, with its theoretical basis and formula derivation.

4.1 Naive Bayes

Naive Bayes is a straightforward probabilistic approach. It assumes that the features, which are the tokens in this project, are independent of each other when they enter the model. Specifically, the sentiment of a review is determined by the product of the probabilities of its individual tokens. The probability is

$$P(Y \mid t_1 t_2 \dots t_n) = P(Y \mid t_1)\, P(Y \mid t_2) \cdots P(Y \mid t_n) \tag{4.1}$$

where $Y$ is the sentiment type and $t_i$ represents token $i$ in the sequence. Therefore, if we can calculate the probability of a certain sentiment given each token $t_i$, we can obtain the probability of a review’s sentiment given a sequence of tokens. According to the conditional probability formula:

$$P(t \mid Y) = \frac{P(t \cap Y)}{P(Y)} \tag{4.2}$$

$$P(Y \mid t) = \frac{P(t \cap Y)}{P(t)} = \frac{P(t \mid Y) \cdot P(Y)}{P(t)} \tag{4.3}$$

Bayes’ rule lets us obtain $P(Y \mid t_i)$ from $P(t_i \mid Y)$, $P(Y)$ and $P(t_i)$, all of which can be estimated from the training dataset. However, there are some practical implementation issues. First, since there are at least two sentiment categories, we compute the probability of each category and choose the one with the highest probability. Second, because floating-point numbers have limited precision, the product of many small probabilities can underflow to zero, so we sum the logarithms of the probabilities instead. Third, some tokens in the test dataset will never have appeared in the training dataset. We smooth their probability with Laplace correction, raising it from 0 to a very small value, to avoid the case where the entire product becomes 0 because of an unknown token.

Figure 4.1: How to find hyperplane
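The three implementation points in the Naive Bayes discussion (choosing the most probable category, working with logarithms, and Laplace smoothing) can be sketched together in a few lines. The sketch assumes the standard multinomial formulation, with a class prior multiplied by smoothed per-token likelihoods, which may differ in detail from the implementation used in this project; the function names and the toy training data are hypothetical.

```python
import math
from collections import Counter


def train_nb(samples):
    """samples: list of (tokens, label) pairs. Returns the counts needed for prediction."""
    class_counts = Counter()
    token_counts = {}
    vocab = set()
    for tokens, label in samples:
        class_counts[label] += 1
        token_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab


def predict_nb(tokens, model, alpha=1.0):
    """Return the label with the highest log score; alpha is the Laplace constant."""
    class_counts, token_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label, n_docs in class_counts.items():
        counts = token_counts[label]
        denom = sum(counts.values()) + alpha * len(vocab)
        score = math.log(n_docs / total_docs)  # log prior
        for token in tokens:
            # Unseen tokens have count 0, so smoothing keeps the log finite.
            score += math.log((counts[token] + alpha) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy training set.
train = [
    (["great", "fun"], "pos"),
    (["love", "great"], "pos"),
    (["bad", "boring"], "neg"),
    (["bad", "crash"], "neg"),
]
model = train_nb(train)
prediction = predict_nb(["great", "game"], model)  # "game" never appears in training
```

Summing logarithms instead of multiplying probabilities avoids underflow on long reviews, and the smoothing term means a single unknown token no longer forces a score of zero.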

4.2 Support Vector Machine

Support-vector machines (SVMs) are supervised learning models applied to classification problems. Formally, an SVM builds a hyperplane in a high-dimensional space that divides the points in this space into two parts. In this project, we convert each comment into a vector in a high-dimensional space using an embedding technique, and then compute the hyperplane from the SVM model and the training dataset. With this hyperplane, we can predict the sentiment of each review in the test dataset. Figure 4.1 illustrates how a hyperplane separates a set of blue points from a set of red points. However, more than one hyperplane is usually valid, so we need to find the one with the largest margin.
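The text does not spell out the embedding technique, so the following is only one possible way to turn a token sequence into a fixed-length vector: averaging per-token vectors. The three-dimensional vectors in `EMBEDDINGS` are hypothetical placeholders; vectors learned by word2vec would normally have a hundred or more dimensions.

```python
# Hypothetical 3-dimensional token vectors, for illustration only.
EMBEDDINGS = {
    "great": (1.0, 0.0, 0.5),
    "game": (0.0, 1.0, 0.5),
    "bad": (-1.0, 0.0, 0.5),
}


def comment_vector(tokens, embeddings=EMBEDDINGS, dim=3):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    known = [embeddings[token] for token in tokens if token in embeddings]
    if not known:
        return (0.0,) * dim
    return tuple(sum(vec[d] for vec in known) / len(known) for d in range(dim))


vec = comment_vector(["great", "game", "unseen"])
# vec == (0.5, 0.5, 0.5)
```

Averaging gives every comment the same dimensionality regardless of its length, which is what a hyperplane-based classifier needs, at the cost of discarding word order.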

The hyperplane (4.4) and the dataset (4.5) can be defined as follows:

$$\vec{\omega} \cdot x + \vec{b} = 0 \tag{4.4}$$

$$T = \{(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)\} \tag{4.5}$$

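Therefore, for a sample $i$ in the dataset, its geometric margin with respect to the hyperplane can be expressed as:

$$\gamma_i = y_i \left( \frac{\vec{\omega}}{\lVert \vec{\omega} \rVert} \cdot x_i + \frac{\vec{b}}{\lVert \vec{\omega} \rVert} \right) \tag{4.6}$$

Letting $\gamma = \min_i \gamma_i$, we turn this into an optimisation problem with the following constraints:

$$\max \gamma \quad \text{s.t.} \quad y_i \left( \frac{\vec{\omega}}{\lVert \vec{\omega} \rVert} \cdot x_i + \frac{\vec{b}}{\lVert \vec{\omega} \rVert} \right) \ge \gamma, \quad i = 1, 2, \dots, n \tag{4.7}$$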
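To make (4.6) and (4.7) concrete, the sketch below evaluates the geometric margin of two candidate hyperplanes on a small two-dimensional data set. The data points and both hyperplanes are hypothetical, chosen so that the values can be checked by hand.

```python
import math


def geometric_margin(w, b, x, y):
    """Signed distance of sample (x, y) to the hyperplane w.x + b = 0, as in (4.6)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


# Hypothetical toy data: two positive and two negative points.
samples = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((0.0, 0.0), -1), ((-1.0, 0.0), -1)]


def dataset_margin(w, b):
    """gamma = min_i gamma_i, the quantity maximised in (4.7)."""
    return min(geometric_margin(w, b, x, y) for x, y in samples)


margin_a = dataset_margin((1.0, 1.0), -2.0)  # hyperplane x1 + x2 = 2
margin_b = dataset_margin((1.0, 0.0), -1.0)  # hyperplane x1 = 1
```

Both hyperplanes separate the two classes, but the first has a margin of about 1.414 and the second a margin of 1, so the first is preferred under the criterion in (4.7).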


According to Berwick [2003], we finally turn the problem into:

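$$\min_{\alpha} \; \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^{N} \alpha_i \quad \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \ \text{and} \ 0 \le \alpha_i \le C \tag{4.8}$$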

Therefore,

$$\omega^* = \sum_{i=1}^{N} \alpha_i^* y_i x_i, \qquad b^* = y_j - \sum_{i=1}^{N} \alpha_i^* y_i (x_i \cdot x_j) \tag{4.9}$$

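This gives the hyperplane equation, where $j$ in (4.9) indexes a support vector with $0 < \alpha_j^* < C$. Once we have $\omega^*$ and $b^*$ from (4.9), we can predict the sentiment of each review in the test dataset from the sign of $\omega^* \cdot x + b^*$.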
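The step from the dual solution to a usable classifier can be sketched as below. The dual multipliers are not computed here: for this hypothetical two-point problem they were worked out by hand, and solving (4.8) in general requires a quadratic programming or SMO-style solver that is outside the scope of this sketch.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def recover_hyperplane(alphas, xs, ys, j):
    """Apply (4.9): build w* and b* from the multipliers; j indexes a support vector."""
    dim = len(xs[0])
    w = [sum(a * y * x[d] for a, x, y in zip(alphas, xs, ys)) for d in range(dim)]
    b = ys[j] - sum(a * y * dot(x, xs[j]) for a, x, y in zip(alphas, xs, ys))
    return w, b


def predict(w, b, x):
    """Classify x by the side of the hyperplane it falls on."""
    return 1 if dot(w, x) + b >= 0 else -1


# Hypothetical problem with one point per class.
xs = [(1.0, 0.0), (-1.0, 0.0)]
ys = [1, -1]
alphas = [0.5, 0.5]  # hand-derived dual solution; note sum(alpha_i * y_i) == 0

w, b = recover_hyperplane(alphas, xs, ys, j=0)
# w == [1.0, 0.0] and b == 0.0, i.e. the separating line x1 = 0
```

With these values, `predict(w, b, (2.0, 5.0))` returns 1 and `predict(w, b, (-3.0, 1.0))` returns -1, as expected for points on either side of the line.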


Chapter 5

Result and Discussion

This chapter presents the results of the sentiment analysis models implemented in the previous chapters. It also compares the algorithms and analyses some feasible ways to improve model performance.

5.1 Result

We select four games to assess our implementations of the algorithms, comparing them with the standard Python machine learning library scikit-learn. So that these games can be found in the database and on Steam, we list their basic information below. The four games are:

- ID 517630: Just Cause 4. It has 4050 reviews, of which 42% are positive.

- ID 1106850: Totally Reliable Delivery Service Beta. It has 1004 reviews, of which 90% are positive.

- ID 17480: Command & Conquer: Red Alert 3. It has 2167 reviews, of which 80% are positive.

- ID 275850: No Man’s Sky. It has 81979 reviews, of which 53% are positive.

Table 5.1: Prediction accuracy of different algorithms

| ID | Naive Bayes | NB in scikit-learn | Support Vector Machine | SVM in scikit-learn |
|---|---|---|---|---|
| 517630 | 77.4% | 83.8% | 76.8% | 83.1% |
| 1106850 | 94.4% | 93.2% | 93.9% | 93.4% |
| 17480 | 84.1% | 86.2% | 86.0% | 87.9% |
| 275850 | 70.1% | 83.1% | 77.5% | 80.2% |

The experimental results are presented in Table 5.1. They suggest that our implementation is reasonably effective, but there is still room for improvement. We can also clearly see that when most reviews in a data set are positive, the prediction results are very accurate. Conversely, when the reviews are roughly half positive and half negative, it is more difficult for a computer to analyze the sentiment.
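One way to read the observation about skewed data sets is to compare each accuracy in Table 5.1 with a majority-class baseline, that is, the accuracy of always predicting the more frequent label. The sketch below is an illustration added under that assumption and is not part of the original experiments; it uses only the positive shares listed in Section 5.1.

```python
# Share of positive reviews per game, as listed in Section 5.1.
positive_share = {517630: 0.42, 1106850: 0.90, 17480: 0.80, 275850: 0.53}


def majority_baseline(p_positive):
    """Accuracy of a classifier that always predicts the more frequent class."""
    return max(p_positive, 1.0 - p_positive)


baselines = {app_id: majority_baseline(p) for app_id, p in positive_share.items()}
```

For game 1106850 the baseline is already 90%, so the 94.4% achieved by Naive Bayes is a gain of about four points, whereas for game 275850 the baseline is only 53% and the 70.1% result is a much larger improvement over guessing.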

In fact, for all machine learning problems, model learning (the training process) can be considered from two perspectives. One is to regard the model as a prediction function: training minimises a loss function to obtain the corresponding parameter w, and prediction takes the feature values x of a new sample as input and outputs the label value y. The other is to regard the model as a probability density function that represents the distribution of the data: training estimates the probability distribution of the feature values x, and prediction obtains the conditional probability P(Y|x).

Naive Bayes follows the second perspective, using Bayes’ rule to obtain this conditional probability. All we need to do is extract a certain number of features (which make up the feature values x), calculate the prior probability P(y) and the likelihood P(x|y), and then obtain the posterior P(y|x). This algorithm is simple but effective. Its computational complexity grows linearly with the number of samples or the number of features, which makes training and prediction very fast. When the number of features is constant, Naive Bayes still works well even if there are more than 1,000,000 reviews in the database. However, Naive Bayes relies on the assumption that features are independent and do not affect each other. It cannot learn interactions between features, which can greatly weaken its performance. For example, when we say "This game is not bad", we express a positive opinion of the game. But the model only learns that the sentence contains the features "not" and "bad", each of which makes a negative label more probable.

The Support Vector Machine follows the first perspective. Once we obtain the decision function from the support vectors in the training set, the value of y for samples in the test set can be calculated directly, and the sign of y gives the label of the sample. During implementation, we found that the computational cost of SVM training grows much faster than linearly with the number of samples, roughly quadratically to cubically for standard solvers. This limits the SVM to small-scale data, although it copes well with a high-dimensional feature space. Compared to NB, by converting a sample into a high-dimensional vector, SVM captures the interaction between features to some extent. However, when the data set is not linearly separable, we need to find a suitable kernel function to replace the linear kernel. Although other kernel functions can work well, it is difficult to derive their parameters theoretically. We need to keep trying different parameters and approach the best ones through repeated testing.
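Replacing the linear kernel amounts to swapping the inner product in the dual problem for another function of two samples. As a hedged illustration, the sketch below shows the linear kernel next to a Gaussian (RBF) kernel, one common alternative; the text does not say which non-linear kernels were tried, and the parameter `gamma` here is exactly the kind of value that has to be tuned by repeated testing.

```python
import math


def linear_kernel(u, v):
    """Plain inner product, as used in the linear SVM."""
    return sum(a * b for a, b in zip(u, v))


def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||u - v||^2); gamma must be tuned."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


k_lin = linear_kernel((1.0, 2.0), (3.0, 4.0))   # 1*3 + 2*4 = 11
k_same = rbf_kernel((1.0, 2.0), (1.0, 2.0))      # identical points give 1.0
k_far = rbf_kernel((0.0, 0.0), (1.0, 1.0))       # exp(-0.5 * 2) = exp(-1)
```

The RBF kernel depends only on the distance between two samples, so it can separate classes that no single hyperplane in the original space can, at the price of one more parameter to search over.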

5.2 Improvement

Based on the results and discussion above, we propose some solutions that can improve the performance of the model. First, as discussed in Section 5.1, the interaction between tokens is overlooked when each feature consists of only one token; when extracting features, we can therefore also select features consisting of two adjacent tokens. Second, it is a good idea to weight tokens differently according to their parts of speech, because different tokens influence sentiment analysis to different degrees [Benamara et al., 2007]. Adjectives and adverbs are often subjective and can directly indicate whether a review is positive or negative, whereas nouns tend to be more objective and express opinions indirectly. Third, when we tested different data sets, it was difficult to find the optimal number of extracted features; we usually had to try different numbers repeatedly to approach the best value. If we used a pre-trained model with fixed parameters, such as BERT [Devlin et al., 2018], to extract features from reviews, the performance could be better.
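The first suggestion, features made of two adjacent tokens, can be sketched in a few lines. The function names and the underscore used to join tokens are arbitrary choices for this illustration.

```python
def bigrams(tokens):
    """Return features made of two adjacent tokens, joined by an underscore."""
    return [f"{a}_{b}" for a, b in zip(tokens, tokens[1:])]


def unigram_and_bigram_features(tokens):
    """Keep the single-token features and append the bi-gram features."""
    return list(tokens) + bigrams(tokens)


features = unigram_and_bigram_features(["game", "not", "bad"])
# features == ["game", "not", "bad", "game_not", "not_bad"]
```

The feature "not_bad" can now receive its own probability, separate from "not" and "bad". This only helps if negation words such as "not" survive pre-processing, so they would have to be kept out of the stop word list.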


Chapter 6

Conclusion

In conclusion, we have built a working natural language model based on supervised machine learning algorithms. This model can analyze the sentiment of game reviews on Steam. We also performed a number of experiments to compare different algorithms. Lastly, we proposed three feasible ways to improve the performance of the model: bi-gram feature extraction, part-of-speech labelling, and using a state-of-the-art pre-trained model to convert review content into tokens.

6.1 Future Work

In this project, we have built a framework for natural language processing. For any task, all we have to do is pre-process the text, extract the features, select the features, and then classify them with a certain algorithm. But unlike images, the characteristics of text are often very complex and irregular. Understanding language requires not only a large amount of accumulated background knowledge, but also the context in which the language is used.

In future work, on the one hand I want to compare and analyze more machine learning algorithms. On the other hand, I hope to study how to train computers with massive amounts of text data so that they can learn to extract the features of review content automatically, as humans do.


Bibliography

Benamara, F.; Cesarano, C.; Picariello, A.; Recupero, D. R.; and Subrahmanian, V. S., 2007. Sentiment analysis: Adjectives and adverbs are better than adjectives alone. In ICWSM, 1–7. Citeseer.

Berwick, R., 2003. An idiot’s guide to support vector machines (svms). Retrieved on October, 21 (2003), 2011.

Cardie, C., 2014. Sentiment analysis and opinion mining bing liu (university of illinois at chicago) morgan & claypool (synthesis lectures on human language technologies, edited by graeme hirst, 5 (1)), 2012, 167 pp; paperbound, isbn 978-1-60845-884-4.

Chowdhury, G. G., 2003. Natural language processing. Annual review of information science and technology, 37, 1 (2003), 51–89.

Dasgupta, S. and Sengupta, K., 2016. Analyzing consumer reviews with text mining approach: A case study on samsung galaxy s3. Paradigm, 20, 1 (2016), 56–68.

Devlin, J.; Chang, M.-W.; Lee, K.; and Toutanova, K., 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, (2018).

Gifford, B., 2013. Reviewing the critics: Examining popular video game reviews through a comparative content analysis. Ph.D. thesis, Cleveland State University.

Google, 2019. Google research bert. https://github.com/google-research/bert.

Granic, I.; Lobel, A.; and Engels, R. C., 2014. The benefits of playing video games. American psychologist, 69, 1 (2014), 66.

Kashyap, N.; Choudhury, T.; Mehta, I. S.; and Srivastava, A. V., 2017. Humor grounded taxonomy of music by investigating expressive statistics using audio feature extraction. In 2017 International Conference on Computing, Communication and Automation (ICCCA), 189–195. IEEE.

Liu, B. et al., 2010. Sentiment analysis and subjectivity. Handbook of natural language processing, 2, 2010 (2010), 627–666.


Lovins, J. B., 1968. Development of a stemming algorithm. Mech. Translat. & Comp. Linguistics, 11, 1-2 (1968), 22–31. (cited on page 7)

n.d., 2016. Natural language processing application. https://expertsystem.com/natural-language-processing-applications/. Accessed September 14, 2019. (cited on page 1)

Strååt, B. and Verhagen, H., 2017. Using user created game reviews for sentiment analysis: A method for researching user attitudes. In GHITALY@CHItaly. (cited on page 3)

Tom, W., 2019. The global games market will generate $152.1 billion in 2019 as the u.s. overtakes china as the biggest market. https://newzoo.com/insights/articles/the-global-games-market-will-generate-152-1-billion-in-2019-as-the-u-s-overtakes-china-as-the-biggest-market/. Accessed October 1, 2019. (cited on page 1)

Trneny, M., 2017. Machine learning for predicting success of video games. (2017). (cited on page 3)

Watson, A., 2019. Global box office revenue from 2005 to 2018 (in billion u.s. dollars). https://www.statista.com/statistics/271856/global-box-office-revenue/. Accessed October 4, 2019. (cited on page 1)


Appendix

Appendix 1: Project Description

This project will involve developing and applying text analysis and/or topic mining algorithms to video game reviews. The student will read existing papers and research existing approaches to text analysis, combine these methods and theory with game reviews on Steam, and develop a program to analyze web-based game reviews. To be specific, this project first needs to grab game reviews from Steam and classify them in a certain way. It then uses a text-based analysis method to evaluate whether different games are worth recommending. The project will involve reviewing literature, developing a prototype, performing tests, and reporting on the results.

Appendix 2: Independent Study Contract

The contract for this project is presented at the end of this report.

Appendix 3: Description of Artefacts Produced

The artefacts run on Jupyter Notebook and Python 3 and consist of three main parts.

• Web crawler and review data sets. This part contains the programs for downloading Steam reviews. All reviews are stored in the Reivew folder as JSON files.

• main.ipynb and other Python files. These are the main program of this project and its supporting library. I wrote this part of the code myself.

• The folder Bert contains Google's TensorFlow code and the pre-trained BERT model, retrieved from Google [2019]. The main body of this code was written by Google. In this project we only supply the sentiment analysis task and the data set, in order to investigate how to improve the model built earlier.

Appendix 4: ReadMe

The ReadMe file for the artefacts is presented at the end of this report.
