
LABDA at TASS-2018 Task 3: Convolutional Neural Networks for Relation Classification in Spanish eHealth documents

LABDA en TASS-2018 Task 3: Redes Neuronales Convolucionales para Clasificación de Relaciones en documentos electrónicos de salud en español

Víctor Suárez-Paniagua1, Isabel Segura-Bedmar2, Paloma Martínez3

Computer Science Department, Carlos III University of Madrid, Leganés 28911, Madrid, Spain

1vspaniag, 2isegura, 3[email protected]

Resumen: Este trabajo presenta la participación del equipo LABDA en la subtarea de clasificación de relaciones entre dos entidades identificadas en documentos electrónicos de salud (eHealth) escritos en español. Usamos una Red Neuronal Convolucional con el word embedding y el position embedding de cada palabra para clasificar el tipo de la relación entre dos entidades de la oración. Anteriormente, este método de aprendizaje automático ya ha mostrado buen rendimiento para capturar las características relevantes en documentos electrónicos de salud que describen relaciones. Nuestra arquitectura obtuvo una F1 de 44.44 % en el escenario 3 de la tarea, llamado Setting semantic relationships. Solo cinco equipos presentaron resultados para la subtarea. Nuestro sistema alcanzó el segundo F1 más alto, muy similar al resultado más alto (micro F1=44.8 %) y superior al del resto de los equipos. Una de las principales ventajas de nuestra aproximación es que no requiere ningún recurso de conocimiento externo como características.
Palabras clave: Extracción de relaciones, aprendizaje profundo, redes neuronales convolucionales, textos biomédicos

Abstract: This work presents the participation of the LABDA team in the subtask of classifying relationships between two identified entities in electronic health (eHealth) documents written in Spanish. We used a Convolutional Neural Network (CNN) with the word embedding and the position embedding of each word to classify the type of relation between two entities in a sentence. This machine learning method has previously shown good performance at capturing the relevant features of electronic health documents that describe relationships. Our architecture obtained an F1 of 44.44% in scenario 3 of the shared task, named Setting semantic relationships. Only five teams submitted results for this subtask. Our system achieved the second highest F1, very close to the top score (micro F1=44.8%) and higher than the remaining teams. One of the main advantages of our approach is that it does not require any external knowledge resource as features.
Keywords: Relation Classification, Deep Learning, Convolutional Neural Network, biomedical texts

1 Introduction

The number of scientific articles published every year keeps growing rapidly, which shows that we are living in an emerging knowledge era. This explosion of information makes it nearly impossible for doctors and biomedical researchers to keep up to date with the literature in their fields. The development of automatic systems to extract and analyse information from electronic health (eHealth) documents can significantly reduce the workload of doctors.

The TASS workshop proposes shared tasks on sentiment analysis in Spanish each year. Concretely, the goal of TASS-2018 Task 3 (Martínez-Cámara et al., 2018) is to create a competition where Natural Language Processing (NLP) experts can train their systems for extracting the relevant information from Spanish eHealth documents and evaluate them in an objective and fair way.

Recently, Deep Learning has had a big impact on NLP tasks, becoming the state-of-the-art technique. The Convolutional Neural Network (CNN) is a Deep Learning architecture which has shown good performance in Computer Vision tasks such as image classification (Krizhevsky, Sutskever, and Hinton, 2012) and face recognition (Lawrence et al., 1997).

The system described in (Kim, 2014) was the first work to use a CNN for an NLP task. It created a vector representation of each sentence by extracting the relevant information with different filters in order to classify the sentences into predefined categories, obtaining good results. In addition, a CNN achieved good performance for relation classification between nominals in the work of (Zeng et al., 2014). Furthermore, this architecture has also been used in the biomedical domain for the extraction of drug-drug interactions in (Suárez-Paniagua, Segura-Bedmar, and Martínez, 2017a). That system did not require any external biomedical knowledge and still provided results very close to those obtained using many hand-crafted features. We also employed the same approach as (Suárez-Paniagua, Segura-Bedmar, and Martínez, 2017b), which was used for extracting relationships between keyphrases in SemEval-2017 Task 10: ScienceIE (Augenstein et al., 2017), a task with subtasks very similar to those defined in TASS-2018 Task 3.

In this work, we describe the participation of the LABDA team in subtask C, the classification of relationships between two identified entities in Spanish health documents. In this subtask, the test dataset includes the text as well as the boundaries and types of its entities, which are used to generate the predictions.

2 Dataset

The task provides an annotated corpus built from MedlinePlus documents, which is divided into a training set for the learning step, a development set for validation and a test set for the evaluation of the systems.

The relationships between entities defined as concepts are: is-a, part-of, property-of and same-as. There are also relationships defined as roles: subject and target. The training set contains 559 sentences with 3,276 entities, 1,012 relations and 1,385 roles, and the development set contains another 285 sentences. A detailed description of the method used to collect and process the documents can be found in (Martínez-Cámara et al., 2018).

Unlike the two previous subtasks, the documents include annotated entities with their boundaries and types. In this way, it is possible to measure and compare the different approaches while focusing only on the goal of subtask C.

2.1 Pre-processing phase

As some of the relationship types are asymmetrical, we generate two instances for each pair of entities marked in the sentence. Thus, a sentence with n entities yields n × (n − 1) instances. Each instance is labelled with one of the six classes is-a, part-of, property-of, same-as, subject and target; an additional None class covers pairs of entities with no relationship. Because some entities overlap, we consider each sentence as a graph whose vertices are the entities and whose edges connect entities that do not overlap, and we recursively obtain all possible paths without overlaps; in this way we generate separate instances for each set of overlapping entities. Table 2 shows the resulting number of instances for each class in the train, validation and test sets.

Label          Train   Validation   Test
is-a             238          299     41
part-of          222          171     36
property-of      600          366     84
same-as           42           19      8
subject         1018          636    206
target          1510          988    308
None           27112        20631   5265

Table 2: Number of instances for each relationship type in each dataset: train, validation and test.

After that, we tokenize and clean the sentences following an approach similar to that described in (Kim, 2014): converting numbers to a common placeholder token, lower-casing words, replacing Spanish accented characters with their plain ASCII forms (e.g., ñ to n), and separating special characters from words with white spaces by means of regular expressions.
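A minimal sketch of this cleaning step is shown below; the concrete regular expressions and the name of the number placeholder token are assumptions, since the paper does not specify them.

    import re
    import unicodedata

    NUMBER_TOKEN = "num"  # placeholder token for numbers (assumed name)

    def clean_sentence(text):
        """Approximate the cleaning described above (details are assumptions)."""
        text = text.lower()
        # Strip Spanish accents/diacritics, e.g. 'ñ' -> 'n', 'í' -> 'i'.
        text = unicodedata.normalize("NFKD", text)
        text = "".join(c for c in text if not unicodedata.combining(c))
        # Replace every number with a common placeholder token.
        text = re.sub(r"\d+([.,]\d+)?", NUMBER_TOKEN, text)
        # Separate special characters from words with white spaces.
        text = re.sub(r"([^\w\s])", r" \1 ", text)
        return text.split()

    print(clean_sentence("Un ataque de asma se produce cuando los síntomas empeoran."))
    # ['un', 'ataque', 'de', 'asma', 'se', 'produce', 'cuando', 'los', 'sintomas', 'empeoran', '.']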


Relationship between entities    Instances after entity blinding                                  Label
(ataque de asma → produce)       'un entity1 se entity2 cuando los entity0 entity0 .'             None
(ataque de asma ← produce)       'un entity2 se entity1 cuando los entity0 entity0 .'             target
(ataque de asma → síntomas)      'un entity1 se entity0 cuando los entity2 entity0 .'             None
(ataque de asma ← síntomas)      'un entity2 se entity0 cuando los entity1 entity0 .'             None
(ataque de asma → empeoran)      'un entity1 se entity0 cuando los entity0 entity2 .'             None
(ataque de asma ← empeoran)      'un entity2 se entity0 cuando los entity0 entity1 .'             None
(produce → síntomas)             'un entity0 se entity1 cuando los entity2 entity0 .'             None
(produce ← síntomas)             'un entity0 se entity2 cuando los entity1 entity0 .'             None
(produce → empeoran)             'un entity0 se entity1 cuando los entity0 entity2 .'             subject
(produce ← empeoran)             'un entity0 se entity2 cuando los entity0 entity1 .'             None
(síntomas → empeoran)            'un entity0 se entity0 cuando los entity1 entity2 .'             None
(síntomas ← empeoran)            'un entity0 se entity0 cuando los entity2 entity1 .'             target
(asma → produce)                 'un ataque de entity1 se entity2 cuando los entity0 entity0 .'   None
(asma ← produce)                 'un ataque de entity2 se entity1 cuando los entity0 entity0 .'   None
(asma → síntomas)                'un ataque de entity1 se entity0 cuando los entity2 entity0 .'   None
(asma ← síntomas)                'un ataque de entity2 se entity0 cuando los entity1 entity0 .'   None
(asma → empeoran)                'un ataque de entity1 se entity0 cuando los entity0 entity2 .'   None
(asma ← empeoran)                'un ataque de entity2 se entity0 cuando los entity0 entity1 .'   None
(produce → síntomas)             'un ataque de entity0 se entity1 cuando los entity2 entity0 .'   None
(produce ← síntomas)             'un ataque de entity0 se entity2 cuando los entity1 entity0 .'   None
(produce → empeoran)             'un ataque de entity0 se entity1 cuando los entity0 entity2 .'   subject
(produce ← empeoran)             'un ataque de entity0 se entity2 cuando los entity0 entity1 .'   None
(síntomas → empeoran)            'un ataque de entity0 se entity0 cuando los entity1 entity2 .'   None
(síntomas ← empeoran)            'un ataque de entity0 se entity0 cuando los entity2 entity1 .'   target

Table 1: Instances for each ordered pair of entities after the pre-processing phase with entity blinding, for the sentence 'Un ataque de asma se produce cuando los síntomas empeoran.'.


Furthermore, the two target entities of each instance are replaced by the labels "entity1" and "entity2", and the remaining entities by "entity0". This method is known as entity blinding, and it supports the generalization of the model. For instance, the sentence in Figure 1, 'Un ataque de asma se produce cuando los síntomas empeoran.', with the entities ataque de asma, asma, produce, síntomas and empeoran, is transformed into the relation instances shown in Table 1.

Figure 1: Relationships and entities in the sentence 'Un ataque de asma se produce cuando los síntomas empeoran.'.

We observed that some instances involve a relationship between an entity and another entity that overlaps it; we remove these from the dataset because the entity blinding process cannot deal with such relations. Moreover, some relationships have more than one label; in this case we keep just one label, because our system cannot cope with a multi-label problem.
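Below is a minimal sketch of the instance generation with entity blinding, assuming entities are given as token spans over the already-cleaned sentence; the helper names and the span/gold encodings are illustrative, not the authors' implementation. As a sanity check, the example sentence of Figure 1 contains two non-overlapping entity sets of four entities each, so it yields 2 × (4 × 3) = 24 instances, exactly the rows of Table 1.

    from itertools import permutations

    def blind_entities(tokens, spans, e1, e2):
        """Replace the two candidate entities by 'entity1'/'entity2' and every
        other entity by 'entity0' (entity blinding)."""
        out, i = [], 0
        while i < len(tokens):
            for idx, (start, end) in enumerate(spans):
                if i == start:
                    tag = "entity1" if idx == e1 else "entity2" if idx == e2 else "entity0"
                    out.append(tag)
                    i = end
                    break
            else:
                out.append(tokens[i])
                i += 1
        return out

    def make_instances(tokens, spans, gold):
        """One instance per ordered entity pair; unrelated pairs get the None label.
        `spans` are (start, end) token offsets, `gold` maps ordered pairs to labels."""
        instances = []
        for e1, e2 in permutations(range(len(spans)), 2):
            instances.append((blind_entities(tokens, spans, e1, e2), gold.get((e1, e2), "None")))
        return instances

    # One of the two non-overlapping entity sets of Figure 1: ataque de asma, produce, sintomas, empeoran.
    tokens = "un ataque de asma se produce cuando los sintomas empeoran .".split()
    spans = [(1, 4), (5, 6), (8, 9), (9, 10)]
    gold = {(1, 0): "target", (1, 3): "subject", (3, 2): "target"}   # labels taken from Table 1
    for words, label in make_instances(tokens, spans, gold):
        print(" ".join(words), "->", label)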

3 CNN model

In this section, we present the CNN architecture used for the task of relation extraction in electronic health documents. Figure 2 shows the entire CNN pipeline, from a sentence with marked entities to the final prediction.

3.1 Word table layer

After the pre-processing phase, we created an input matrix suitable for the CNN architecture. The input matrix should represent all training instances for the CNN model; therefore, they should have the same length. We determined the maximum sentence length over all instances (denoted by n), and then extended the sentences shorter than n by padding them with an auxiliary token "0".

Moreover, each word has to be represented by a vector. To do this, we randomly initialized a vector for each distinct word, which allows us to replace each word by its word embedding vector taken from W_e ∈ R^{|V|×m_e}, where |V| is the vocabulary size and m_e is the word embedding dimension. In this way we obtained a vector x = [x_1; x_2; ...; x_n] for each instance, where each word of the sentence is represented by its corresponding vector from the word embedding matrix. We denote by p_1 and p_2 the positions in the sentence of the two entities to be classified.


Figure 2: CNN model for the Setting semantic relationships subtask of TASS-2018 Task 3. (The diagram depicts, for the marked sentence 'Un <e1>ataque de asma</e1> se <e2>produce</e2> cuando los <e0>síntomas</e0> <e0>empeoran</e0>.', the pre-processing step, the look-up table layer with word embeddings W_e and position embeddings W_d1 and W_d2, the convolutional layer, the pooling layer and the softmax layer with dropout.)

The following step involves calculating the relative position of each word with respect to the two candidate entities as i − p_1 and i − p_2, where i is the word position in the sentence (padding tokens included), in the same way as (Zeng et al., 2014). In order to avoid negative values, we shifted the range (−n+1, n−1) to the range (1, 2n−1). Then, we mapped these distances into real-valued vectors using two position embedding matrices W_d1 ∈ R^{(2n−1)×m_d} and W_d2 ∈ R^{(2n−1)×m_d}. Finally, we created an input matrix X ∈ R^{n×(m_e+2m_d)}, which is the concatenation of the word embedding and the two position embeddings of each word in the instance.
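The look-up table layer can be sketched as follows; the random initialisation, the reserved padding index 0 and the toy vocabulary size are assumptions made for illustration only.

    import numpy as np

    def build_input_matrix(token_ids, p1, p2, n, W_e, W_d1, W_d2):
        """Concatenate word and position embeddings into X of shape (n, m_e + 2*m_d).
        token_ids index into W_e; index 0 is reserved for the padding token."""
        padded = token_ids + [0] * (n - len(token_ids))   # pad to the maximum length n
        rows = []
        for i, tok in enumerate(padded):
            d1 = (i - p1) + n                             # shift (-n+1, n-1) into (1, 2n-1)
            d2 = (i - p2) + n
            rows.append(np.concatenate([W_e[tok], W_d1[d1], W_d2[d2]]))
        return np.stack(rows)                             # X in R^{n x (m_e + 2*m_d)}

    # Toy setup; the paper uses m_e = 300, m_d = 10 and n = 38 (see Table 4).
    n, m_e, m_d, V = 38, 300, 10, 5000
    rng = np.random.default_rng(0)
    W_e  = rng.normal(size=(V, m_e))        # randomly initialised word embeddings
    W_d1 = rng.normal(size=(2 * n, m_d))    # position embeddings w.r.t. entity1 (row 0 unused)
    W_d2 = rng.normal(size=(2 * n, m_d))    # position embeddings w.r.t. entity2 (row 0 unused)
    X = build_input_matrix([12, 7, 3, 49, 8], p1=1, p2=3, n=n, W_e=W_e, W_d1=W_d1, W_d2=W_d2)
    print(X.shape)   # (38, 320)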

3.2 Convolutional layer

Once we obtained the input matrix, we applied a filter matrix f = [f_1; f_2; ...; f_w] ∈ R^{w×(m_e+2m_d)} over a context window of size w in the convolutional layer to create higher-level features. For each filter, we obtained a score sequence s = [s_1; s_2; ...; s_{n−w+1}] ∈ R^{(n−w+1)×1} for the whole sentence as

$s_i = g\Big(\sum_{j=1}^{w} f_j \, x_{i+j-1}^{T} + b\Big)$

where b is a bias term and g is a non-linear function (such as the hyperbolic tangent or the sigmoid). Note that in Figure 2 we represent the total number of filters, denoted by m, all with the same size w, in a matrix S ∈ R^{(n−w+1)×m}. However, the same process can be applied to filters with different sizes by creating additional matrices that are concatenated in the following layer.
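A NumPy sketch of this convolution for a single window size is given below (ReLU stands in for the generic non-linearity g, matching Table 4); it illustrates the operation and is not the authors' code.

    import numpy as np

    def convolve(X, F, b):
        """X: (n, m_e + 2*m_d) input matrix; F: (m, w, m_e + 2*m_d) filters; b: (m,) biases.
        Returns S of shape (n - w + 1, m), one score sequence per filter."""
        n, _ = X.shape
        m, w, _ = F.shape
        S = np.empty((n - w + 1, m))
        for i in range(n - w + 1):
            window = X[i:i + w]                      # w consecutive word representations
            S[i] = np.einsum('wd,mwd->m', window, F) + b
        return np.maximum(S, 0)                      # g = ReLU, as in Table 4

    S = convolve(np.random.randn(38, 320), np.random.randn(200, 3, 320), np.zeros(200))
    print(S.shape)   # (36, 200)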

3.3 Pooling layer

In this layer, the goal is to extract the most relevant features of each filter using an aggregating function. We used the max function, which produces a single value for each filter as z_f = max{s} = max{s_1; s_2; ...; s_{n−w+1}}. Thus, we created a vector z = [z_1, z_2, ..., z_m], whose dimension is the total number of filters m, representing the relation instance. If there are filters with different sizes, their output values are concatenated in this layer.
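In the same illustrative NumPy terms, this max-over-time pooling is a single reduction per filter, and the outputs for the three window sizes of Table 4 are concatenated:

    import numpy as np

    # S_w3, S_w4, S_w5: score matrices for window sizes 3, 4 and 5, each with 200 filters.
    S_w3, S_w4, S_w5 = (np.random.randn(38 - w + 1, 200) for w in (3, 4, 5))
    z = np.concatenate([S.max(axis=0) for S in (S_w3, S_w4, S_w5)])   # max over the sentence axis
    print(z.shape)   # (600,) = 3 window sizes x 200 filters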

3.4 Softmax layer

Prior to performing the classification, we applied dropout to prevent overfitting: we obtained a reduced vector z_d by randomly setting elements of z to zero with probability p, following a Bernoulli distribution. After that, we fed this vector into a fully connected softmax layer with weights W_s ∈ R^{m×k} to compute the output prediction values for the classification as o = z_d W_s + d, where d is a bias term; we have k = 6 relation classes in the dataset plus the None class. At test time, the vector z of a new instance is classified directly by the softmax layer, without dropout.

Label          Correct   Missing   Spurious   Precision   Recall    F1
is-a                 8        61          8      50%       11.59%   18.82%
part-of              5        27          5      50%       15.63%   23.81%
property-of          9        53         12      42.86%    14.52%   21.69%
same-as              1         4          0     100%       20%      33.33%
subject             50        87         37      57.47%    36.5%    44.64%
target             113        99         72      61.08%    53.3%    56.93%
Scenario 3         186       331        134      58.12%    35.98%   44.44%

Table 3: Results over the test set using the CNN with position embeddings.

3.5 Learning

For the training phase, we need to learn the CNN parameter set θ = (W_e, W_d1, W_d2, W_s, d, F_m, b), where F_m denotes the set of all m filters f. For this purpose, we use the conditional probability of a relation r obtained by the softmax operation,

$p(r \mid x, \theta) = \frac{\exp(o_r)}{\sum_{l=1}^{k} \exp(o_l)}$

and minimize the cross-entropy function over all instances (x_i, y_i) in the training set T,

$J(\theta) = -\sum_{i=1}^{|T|} \log p(y_i \mid x_i, \theta)$

We minimize this objective function by using stochastic gradient descent over shuffled mini-batches with the Adam update rule (Kingma and Ba, 2014).
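For orientation, the architecture and training procedure of Sections 3.1 to 3.5 can be assembled with the Keras API roughly as follows. This is a sketch under the hyperparameters of Table 4, not the authors' actual implementation; the vocabulary size V is a placeholder, and k = 7 assumes the six relation types plus the None class.

    import tensorflow as tf

    n, V, m_e, m_d, k = 38, 20000, 300, 10, 7   # V is a placeholder vocabulary size

    tokens = tf.keras.Input(shape=(n,), dtype="int32", name="token_ids")
    dist1  = tf.keras.Input(shape=(n,), dtype="int32", name="distance_to_entity1")
    dist2  = tf.keras.Input(shape=(n,), dtype="int32", name="distance_to_entity2")

    x = tf.keras.layers.Concatenate()([
        tf.keras.layers.Embedding(V, m_e)(tokens),       # word embeddings W_e
        tf.keras.layers.Embedding(2 * n, m_d)(dist1),    # position embeddings W_d1
        tf.keras.layers.Embedding(2 * n, m_d)(dist2),    # position embeddings W_d2
    ])
    pooled = [
        tf.keras.layers.GlobalMaxPooling1D()(            # max pooling per window size
            tf.keras.layers.Conv1D(200, w, activation="relu")(x))
        for w in (3, 4, 5)                               # filter window sizes of Table 4
    ]
    z = tf.keras.layers.Dropout(0.5)(tf.keras.layers.Concatenate()(pooled))
    out = tf.keras.layers.Dense(k, activation="softmax")(z)   # softmax layer W_s

    model = tf.keras.Model([tokens, dist1, dist2], out)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="sparse_categorical_crossentropy",     # cross entropy of Section 3.5
                  metrics=["accuracy"])
    # model.fit([token_ids, d1, d2], labels, batch_size=50, epochs=...)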

4 Results and Discussion

The CNN model was trained on the training set, and we obtained the best value of each hyperparameter by fine-tuning on the validation set (see Table 4).

The results were measured with precision (P), recall (R) and F1, defined as:

$P = \frac{C}{C+S} \qquad R = \frac{C}{C+M} \qquad F1 = 2\,\frac{P \times R}{P+R}$

where Correct (C) is the number of relations present in both the prediction and the test set, Missing (M) is the number of relations that are in the test set but not in the prediction, and Spurious (S) is the number of relations that are in the prediction but not in the test set.
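These measures can be computed directly from the Correct, Missing and Spurious counts; the small sketch below reproduces the Scenario 3 row of Table 3.

    def precision_recall_f1(correct, missing, spurious):
        """Micro metrics from the Correct/Missing/Spurious counts defined above."""
        p = correct / (correct + spurious) if correct + spurious else 0.0
        r = correct / (correct + missing) if correct + missing else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f1

    # Scenario 3 totals from Table 3: C=186, M=331, S=134.
    print(precision_recall_f1(186, 331, 134))   # approx. (0.5812, 0.3598, 0.4444)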

Parameter                                   Value
Maximum sentence length in the dataset, n      38
Word embedding dimension, m_e                 300
Position embedding dimension, m_d              10
Filter window sizes, w                    3, 4, 5
Filters for each window size, m               200
Dropout rate, p                               0.5
Non-linear function, g                       ReLU
Mini-batch size                                50
Learning rate                               0.001

Table 4: The CNN model parameters and the values used for the reported results.

Table 3 shows the results of the CNN configuration with position embeddings. We observe that the number of Missing relations is very high. This may be due to the fact that the dataset is very unbalanced, so many of these instances are classified as None by the system. In fact, the better-represented classes achieve a higher recall. To alleviate this problem, we propose to use sampling techniques to increase the number of instances of the less represented classes.

Only five teams submitted results for this subtask. Our system achieved the second highest F1, very close to the top score (micro F1=44.8%) and well above the other teams, which remain below 11% F1. One of the main advantages of our approach is that it does not require any external knowledge resource.

5 Conclusions and Future Work

In this paper, we propose a CNN model for subtask C (Setting semantic relationships) of TASS-2018 Task 3. The official results show that the CNN is a very promising system, because neither expert domain knowledge nor external features are needed. The configuration of the architecture is very simple, with a basic pre-processing adapted for Spanish documents.

The results also show that the system produces many false negatives. We think this may be due to the unbalanced nature of the dataset. To address this problem, we propose to use oversampling techniques to increase the number of instances of the less represented classes. Our system also seems to have difficulties distinguishing the directionality of the relationships; for this reason, we will explore more complex settings of the architecture to tackle the directionality problem.

Moreover, we plan to use external features as part of the embeddings, such as the entity labels given by the second subtask, the Part-of-Speech (PoS) tags and the dependency type of each word in the Spanish documents, in order to enrich the information of each sentence. We want to explore the contribution of each feature in detail and fine-tune all the parameters. Furthermore, we will use rules over the entity labels to distinguish relations from roles and train two different classifiers, which should make them more accurate. In addition, we will explore other neural network architectures, such as Recurrent Neural Networks, and possible combinations with the CNN.

Funding

This work was supported by the Research Program of the Ministry of Economy and Competitiveness - Government of Spain (DeepEMR project, TIN2017-87548-C2-1-R).

Bibliography

Augenstein, I., M. Das, S. Riedel, L. Vikraman, and A. McCallum. 2017. SemEval 2017 Task 10: ScienceIE - Extracting keyphrases and relations from scientific publications. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), pages 546-555, Vancouver, Canada, August. Association for Computational Linguistics.

Kim, Y. 2014. Convolutional neural networks for sentence classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1746-1751.

Kingma, D. P. and J. Ba. 2014. Adam: A method for stochastic optimization. CoRR, abs/1412.6980.

Krizhevsky, A., I. Sutskever, and G. E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pages 1097-1105. Curran Associates, Inc.

Lawrence, S., C. L. Giles, A. C. Tsoi, and A. D. Back. 1997. Face recognition: a convolutional neural-network approach. IEEE Transactions on Neural Networks, 8(1):98-113, January.

Martínez-Cámara, E., Y. Almeida-Cruz, M. C. Díaz-Galiano, S. Estévez-Velarde, M. A. García-Cumbreras, M. García-Vega, Y. Gutiérrez, A. Montejo-Ráez, A. Montoyo, R. Muñoz, A. Piad-Morffis, and J. Villena-Román. 2018. Overview of TASS 2018: Opinions, health and emotions. In E. Martínez-Cámara, Y. Almeida Cruz, M. C. Díaz-Galiano, S. Estévez Velarde, M. A. García-Cumbreras, M. García-Vega, Y. Gutiérrez Vázquez, A. Montejo Ráez, A. Montoyo Guijarro, R. Muñoz Guillena, A. Piad Morffis, and J. Villena-Román, editors, Proceedings of TASS 2018: Workshop on Semantic Analysis at SEPLN (TASS 2018), volume 2172 of CEUR Workshop Proceedings, Sevilla, Spain, September. CEUR-WS.

Suárez-Paniagua, V., I. Segura-Bedmar, and P. Martínez. 2017a. Exploring convolutional neural networks for drug-drug interaction extraction. Database, 2017:bax019.

Suárez-Paniagua, V., I. Segura-Bedmar, and P. Martínez. 2017b. LABDA at SemEval-2017 Task 10: Relation classification between keyphrases via convolutional neural network. In Proceedings of the 11th International Workshop on Semantic Evaluation, SemEval@ACL 2017, Vancouver, Canada, August 3-4, 2017, pages 969-972.

Zeng, D., K. Liu, S. Lai, G. Zhou, and J. Zhao. 2014. Relation classification via convolutional deep neural network. In Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014), Technical Papers, pages 2335-2344, Dublin, Ireland, August. Dublin City University and Association for Computational Linguistics.
