Natural Language Processing and Deep Phenotyping

Precise phenotype information is needed to unravel the effects of genetic, epigenetic, and other factors on tumor behavior and responsiveness. Current models for correlating electronic medical record (EMR) data with -omics data largely ignore clinical text, which remains one of the most important sources of phenotype information for cancer patients. Unlocking the value of clinical text could enable new insights into cancer initiation, progression, metastasis, and response to treatment.

Dr. Harry Hochheiser, in collaboration with Dr. Guergana Savova, will develop novel methods for extracting detailed phenotype information directly from EMR data. Disseminating the resulting software will enhance cancer researchers' ability to abstract meaningful clinical data for translational research. If successful, the systematic capture and representation of these phenotypes from EMR data could later be used to drive clinical genomic decision support.

Selected Related Publications:

Hochheiser H, Jacobson R, Washington N, Denny J, Savova G. 2015. Natural language processing for phenotype extraction: challenges and representation. AMIA Annual Symposium. Nov 2015, San Francisco, CA.

Dligach D, Miller T, Savova GK. 2015. Semi-supervised Learning for Phenotyping Tasks. AMIA Annual Symposium. Nov 2015, San Francisco, CA.

Chen L, Dligach D, Miller T, Bethard S, Savova G. 2015. Layered temporal modeling for the clinical domain. Journal of the American Medical Informatics Association.

Chen L, Miller T, Dligach D, Bethard S, Savova G. 2016. Improving Temporal Relation Extraction with Training Instance Augmentation. BioNLP Workshop at the Annual Meeting of the Association for Computational Linguistics. Aug 2016, Berlin, Germany.

Hochheiser H, Castine M, Harris D, Savova G, Jacobson R. 2016. An Information Model for Cancer Phenotypes. BMC Medical Informatics and Decision Making.

Project Wiki: