Evaluation of Expert vs. Non-expert Annotations Quality for Curation of NLP to Extract Potential Drug-Drug Interactions (DDIs) from Drug Product Labeling

Seminar Date: 
Seminar Time: 11am - 12pm
Seminar Location: 5607 Baum Boulevard, Room 407A
Andres Hernandez, BS


As more published sources report studies of DDIs, the need grows for automated tools that can predict and understand DDIs at large scale. Integrating data sources also becomes a major challenge as the amount of information grows; currently, around 8 relevant DDI databases are used in most of these studies. Because of these challenges, most DDIs are still discovered by accident in the clinic or during Phase IV clinical trials. The DailyMed database is a compilation of the drug labels approved and regulated by the FDA. The structured labels it contains are not enough to automate or fully support the detection of the DDIs that are reported daily. One common alternative for automatic detection is NLP, but it requires manual curation, a task that is tedious, difficult, and expensive. It usually requires multiple annotations by people with some degree of training, depending on the field the project belongs to.

Our presentation is an overview of our ongoing research project, which aims to evaluate the quality of annotations produced by two different groups (experts vs. non-experts) in the detection of drug-drug interactions from drug product labels for NLP curation. If we can show that non-experts reach performance comparable to that of experts in detecting DDIs, we can show that it is possible to lower the cost and difficulty of annotation experiments. In this talk, we will explain the two main tools that we are using to support annotation and to evaluate inter-annotator agreement. The first is an NLP pipeline for the detection of drug-drug interactions. The process involves named entity recognition, translation to JSON to describe drug mentions within a text, and finally extraction of drug-drug interactions, with a post-processing module to increase performance. The second tool is the Domeo annotation tool, an extensible web application built with the Google Web Toolkit (GWT) that enables the creation of annotations using the Open Annotation Ontology (AO), which provides a common model for documenting metadata from text mining and manual annotation of scientific papers. Domeo also supports annotation algorithms for mass-scale manual annotation.
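
As an illustration of how agreement between two annotators could be quantified, the following is a minimal sketch that computes Cohen's kappa. The abstract does not state which agreement measure the project uses, so the choice of kappa is an assumption, and the function name, the binary labels (1 = sentence annotated as describing a DDI, 0 = not), and the sample data are all hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")

    n = len(labels_a)

    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Expected agreement by chance, from each annotator's label frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    all_labels = counts_a.keys() | counts_b.keys()
    expected = sum(counts_a[l] * counts_b[l] for l in all_labels) / (n * n)

    if expected == 1.0:
        # Both annotators used a single identical label; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for eight label sentences (illustrative data only).
expert = [1, 1, 0, 0, 1, 0, 1, 0]
non_expert = [1, 1, 0, 0, 1, 0, 0, 1]

print(cohens_kappa(expert, non_expert))
```

In this made-up example the annotators agree on 6 of 8 sentences, and with balanced labels the chance agreement is 0.5, so kappa comes out at 0.5. A value near 1 would indicate that the non-expert annotations closely match the expert ones.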