Open Access

Coreference Resolution for Anaphoric Pronouns in Texts on Medical Products


Coreference resolution is the task of finding all expressions in a text that refer to the same entity. It is one of the higher-level natural language processing (NLP) tasks. It makes it possible, for example, to extract more information about medical products from longer texts. A product such as ‘ambidextrous gloves’ may appear in a text in many different forms. For example, they could be referred to by the pronoun ‘they’, as in this sentence. The algorithm presented in this paper finds pronouns and, for each of them (except the pleonastic ‘it’), creates coreference candidates by pairing it with entities that appeared earlier in the same sentence or in the previous sentence. Each candidate (a pair of mentions) is described by 48 binary features that represent the grammatical and positional properties of the two mentions. In the training set, each pair is labelled as coreferent or not, and a decision tree classifier is trained on these labels. The resulting classifier achieved a high precision of 0.94 and a decent recall of 0.61 on the training set, and still a good out-of-sample precision of 0.64.
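
The candidate-generation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Mention` class, the pronoun list, the `is_pleonastic` flag (assumed to come from an upstream detector), and the three example features are all hypothetical stand-ins, and the paper's actual 48 features are not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical, deliberately small pronoun list (assumption, not from the paper).
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}


@dataclass(frozen=True)
class Mention:
    text: str
    sentence: int          # index of the sentence containing the mention
    position: int          # token index inside that sentence
    is_plural: bool
    is_pleonastic: bool = False  # assumed to be set by an upstream detector


def is_pronoun(mention: Mention) -> bool:
    return mention.text.lower() in PRONOUNS


def candidate_pairs(mentions):
    """Pair each non-pleonastic pronoun with entities that appear earlier
    in the same sentence or anywhere in the previous sentence."""
    pairs = []
    for pronoun in mentions:
        if not is_pronoun(pronoun) or pronoun.is_pleonastic:
            continue
        for entity in mentions:
            if is_pronoun(entity):
                continue  # assumption: only non-pronoun entities are antecedents
            earlier_same = (entity.sentence == pronoun.sentence
                            and entity.position < pronoun.position)
            previous = entity.sentence == pronoun.sentence - 1
            if earlier_same or previous:
                pairs.append((entity, pronoun))
    return pairs


def binary_features(entity: Mention, pronoun: Mention) -> dict:
    """Three illustrative binary features (the paper uses 48)."""
    return {
        "same_sentence": int(entity.sentence == pronoun.sentence),
        "previous_sentence": int(entity.sentence == pronoun.sentence - 1),
        "number_agreement": int(entity.is_plural == pronoun.is_plural),
    }


# "Ambidextrous gloves fit either hand. They are sold in boxes. It is important to ..."
mentions = [
    Mention("gloves", sentence=0, position=1, is_plural=True),
    Mention("hand", sentence=0, position=4, is_plural=False),
    Mention("They", sentence=1, position=0, is_plural=True),
    Mention("boxes", sentence=1, position=4, is_plural=True),
    Mention("It", sentence=2, position=0, is_plural=False, is_pleonastic=True),
]

pairs = candidate_pairs(mentions)
for entity, pronoun in pairs:
    print(entity.text, pronoun.text, binary_features(entity, pronoun))
```

In this sketch, ‘They’ is paired with ‘gloves’ and ‘hand’ from the previous sentence, ‘boxes’ is skipped because it follows the pronoun, and the pleonastic ‘It’ produces no candidates. Feature vectors of this kind, together with coreferent/non-coreferent labels, are what a decision tree learner would then be trained on.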

Publication timeframe:
4 times per year
Journal Subjects:
Philosophy, other