Dissertation supervised by Prof. Ana Paula Rocha and Prof. António Castro
LIACC
Context
General
• Name
• Founded
• Origin
• Employees
• Sector
Financial
• Stock Price
• Market Capitalisation
• Revenue
Social
• Facebook Fans
• SocialBakers Total Score
• SocialBakers Fan Score
• SocialBakers Content Score
• SocialBakers Engagement Score
• SocialBakers Warning Score
• ReviewCentre Score
• ReviewCentre Reviewers
Automatic Stereotype Extractor
• Group Similarity Index
• C4.5 Classification Tree
• Frequency Increase
SocialBakers Total Score > 48.9%
C4.5 Employees < 500 000
Stock Price < 144 dollars
SocialBakers Fan Score > 5%
Activity Sector: IT, consumer goods or transportation
GSI SocialBakers attributes
Activity Sector
FIc
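The conditions listed above can be read as a rule set describing one group of enterprises. The following is a minimal sketch of how such a rule set could be checked against a single enterprise record. It assumes the conditions are combined conjunctively, which the slides do not state; the field names, the `matches_stereotype` helper, and the example record are hypothetical.

```python
# Sectors named in the stereotype above.
ALLOWED_SECTORS = {"IT", "consumer goods", "transportation"}


def matches_stereotype(enterprise):
    """Return True if the record satisfies every condition listed above.

    Assumption: all conditions must hold at once (logical AND).
    Percentages are stored as plain numbers, e.g. 48.9 means 48.9%.
    """
    return (
        enterprise["socialbakers_total_score"] > 48.9
        and enterprise["employees"] < 500_000
        and enterprise["stock_price"] < 144          # dollars
        and enterprise["socialbakers_fan_score"] > 5
        and enterprise["sector"] in ALLOWED_SECTORS
    )


# Hypothetical record, for illustration only.
example = {
    "socialbakers_total_score": 60.0,
    "employees": 120_000,
    "stock_price": 95.5,
    "socialbakers_fan_score": 12.0,
    "sector": "IT",
}

print(matches_stereotype(example))
```

Changing any single field so that it violates its threshold (for example, setting `stock_price` to 200) makes the check fail under the conjunctive reading.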
Classification of new Enterprises
Faster than clustering
Missing Values
Classification Algorithms
• Multilayer Perceptron
• Sequential minimal optimization
• k-nearest neighbors
• Naive Bayes
• C4.5
• Random Forest
• Radial Basis Function Network
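To illustrate how a new enterprise could be assigned to an existing cluster when some of its attributes are missing, here is a minimal k-nearest neighbors sketch in plain Python. It is not the implementation used in the work: the slides do not say how missing values were handled, so skipping missing attributes in the distance is an assumption, and the records, attribute order, and cluster labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def distance(a, b):
    """Euclidean distance over the attributes present in both records.

    Missing values are represented as None and skipped (an assumption of
    this sketch). The sum is divided by the number of shared attributes so
    that records with more missing values are not artificially closer.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")  # no common attribute to compare
    return sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_classify(train, query, k=3):
    """Return the majority cluster label among the k nearest training records."""
    neighbours = sorted(train, key=lambda rec: distance(rec[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


def error_rate(train, test, k=3):
    """Fraction of test records whose predicted label differs from the true one."""
    wrong = sum(1 for features, label in test
                if knn_classify(train, features, k) != label)
    return wrong / len(test)


# Hypothetical, pre-scaled records:
# (SocialBakers total score, SocialBakers fan score, revenue), cluster label
TRAIN = [
    ((0.90, 0.80, 0.70), "A"),
    ((0.80, 0.90, 0.60), "A"),
    ((0.85, 0.70, 0.80), "A"),
    ((0.20, 0.10, 0.30), "B"),
    ((0.10, 0.20, 0.20), "B"),
    ((0.15, 0.30, 0.10), "B"),
]

# A new enterprise with a missing financial value, and one with missing social values.
print(knn_classify(TRAIN, (0.8, 0.8, None)))
print(knn_classify(TRAIN, (None, None, 0.2)))
```

The `error_rate` helper mirrors the evaluation measure used to compare classifiers: it counts misclassified records in a labelled test set and divides by the test set size.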
Classifiers Test

Dataset           Missing Values             Best Algorithm   Error Rate
Initial Dataset   No Missing Values          Naïve Bayes      4%
Initial Dataset   Missing Social Values      RBF Network      17%
Initial Dataset   Missing Financial Values   Naïve Bayes      2%
Test Dataset      No Missing Values          RBF Network      2%
Test Dataset      Missing Social Values      Naïve Bayes      34%
Test Dataset      Missing Financial Values   RBF Network      3%
A valid process was defined for enterprise profiling
A method capable of identifying the meaning of the clusters was created
Software to automate the process was developed
Hierarchical Clustering
More Enterprises
More Attributes
Questions?