Electronic health record (EHR) data provides a new avenue to elucidate disease comorbidities and latent phenotypes for precision medicine. To fully exploit this potential, a realistic generative process for EHR data needs to be modelled. We present MixEHR-S to jointly infer specialist-disease topics from EHR data. As the key contribution, we model the specialist assignments and ICD-coded diagnoses as latent topics based on each patient's underlying disease topic mixture in a novel unified supervised hierarchical Bayesian topic model. For efficient inference, we developed a closed-form collapsed variational inference algorithm to learn the model distributions of MixEHR-S. We applied MixEHR-S to two independent large-scale EHR databases in Quebec with three targeted applications: (1) congenital heart disease (CHD) diagnostic prediction among 154,775 patients; (2) chronic obstructive pulmonary disease (COPD) diagnostic prediction among 73,791 patients; (3) future insulin treatment prediction among 78,712 patients diagnosed with diabetes as a means to assess disease exacerbation. In all three applications, MixEHR-S conferred clinically meaningful latent topics among the most predictive latent topics and achieved superior target prediction accuracy compared to existing methods, providing opportunities for prioritizing high-risk patients for healthcare services.
ML
Clustering count data with stochastic expectation propagation
Research networking is a difficult part of academia in spite of the multiple benefits that the Web has brought to this field in recent years. Even though scientific and business social networks provide a medium to discover peers worldwide, their usefulness reaches its limits when real-world requirements come in. The broad audience of those tools and other bibliographic databases leads them to ignore cultural and geographical aspects such as regional indexes and organizational structures, among others. In this poster, we introduce REDI, a Linked Data-powered research networking platform which combines both local (institutional/regional) and external (Web) scholarly sources in a consolidated knowledge base. Moreover, REDI leverages its knowledge base to cluster authors within similar research areas, easing networking and unveiling a variety of new information from the data for multiple purposes.
2017
ML & SW
Authors semantic disambiguation on heterogeneous bibliographic sources
Ortiz, José; Sumba, Xavier; Segarra, José; Cullcay, José; Espinoza, Mauricio; and Saquicela, Victor
In 2017 XLIII Latin American Computer Conference (CLEI)
2017
Data ambiguity across sources remains a complex problem that affects the services provided by digital libraries. When integrating information from different sources, author ambiguity is one of the most important challenges, and numerous methods have been proposed to deal with it using different approaches. They generally work in some scenarios, but they have important limitations, especially when dealing with heterogeneous sources. In this work, we review a group of existing methods and then propose a technique that combines some of them, also incorporating a distance measure based on semantic technologies, to resolve author ambiguity while integrating bibliographic data from various sources. This technique has been successfully tested in disambiguating Ecuadorian authors from both internal sources (institutional repositories) and external digital libraries.
SW
Identificación automática de artículos indexados en Latindex [Automatic identification of articles indexed in Latindex]
Searching for scientific publications online is an essential task for researchers working on a given topic. However, the extremely large number of scientific publications on the web makes finding a particular publication very difficult, while locating peers interested in collaborating on a specific topic or in reviewing literature is even more challenging. In this paper, we propose a novel architecture to join multiple bibliographic sources, with the aim of identifying common research areas and potential collaboration networks, through a combination of ontologies, vocabularies, and Linked Data technologies for enriching a base data model. Furthermore, we implement a prototype that provides a centralized repository of bibliographic sources and finds similar knowledge areas using data mining techniques, applied to the Ecuadorian research community.