Semantic data integration techniques for transforming big biomedical data into actionable knowledge

authored by
Maria Esther Vidal, Samaneh Jozashoori
Abstract

FAIR principles and the Open Data initiatives have motivated the publication of large volumes of data. Specifically, in the biomedical domain, the size of the data has increased exponentially in the last decade, and with the advances in the technologies to collect and generate data, a faster growth rate is expected for the next years. The available collections of data are characterized by the dominant dimensions of big data, i.e., they are not only large in volume, but they can be also heterogeneous and present quality issues. These data complexity problems impact on the typical tasks of data management, and particularly, in the task of integrating big biomedical data sources. We tackle the problem of big data integration and present a knowledge-driven framework able to extract and integrate data collected from structured and unstructured data sources. The proposed framework resorts to Natural Language Processing techniques to extract knowledge from unstructured data and short text. Furthermore, ontologies and controlled vocabularies, e.g., UMLS, are utilized to annotate the extracted entities and relations with terms from the ontology or controlled vocabulary. The annotated data is integrated into a knowledge graph. A unified schema is used to describe the meaning of the integrated data as well as the main properties and relations. As proof of concept, we show the results of applying the proposed framework to integrate clinical records from lung cancer patients with data extracted from open data sources like Drugbank and PubMed. The created knowledge graph enables the discovery of interactions between drugs in the treatments prescribed to lung cancer patients.

Organisation(s)
L3S Research Centre
External Organisation(s)
German National Library of Science and Technology (TIB)
Type
Conference contribution
Pages
563-566
No. of pages
4
Publication date
06.2019
Publication status
Published
Peer reviewed
Yes
ASJC Scopus subject areas
Radiology Nuclear Medicine and imaging, Computer Science Applications
Sustainable Development Goals
SDG 3 - Good Health and Well-being
Electronic version(s)
https://doi.org/10.1109/CBMS.2019.00116 (Access: Closed)