Semantic Search as Inference: Applications in Health Informatics
My PhD was submitted to the Queensland University of Technology in December 2013 and awarded in May 2014. The thesis won an Outstanding PhD Doctoral Thesis Award for the top 5% of graduates for 2014.
Citation
B. Koopman. Semantic Search as Inference: Applications in Health Informatics. PhD thesis, Queensland University of Technology, Brisbane, Australia, May 2014.
@phdthesis{Koopman2014Semantic-Search,
  Address = {Brisbane, Australia},
  Author = {Bevan Koopman},
  Month = {May},
  School = {Queensland University of Technology},
  Title = {Semantic Search as Inference: Applications in Health Informatics},
  Year = {2014}
}
Abstract
In this thesis, we present models for semantic search: Information Retrieval (IR) models that elicit the meaning behind the words found in documents and queries rather than simply matching keywords. This is achieved by the integration of structured domain knowledge and data-driven information retrieval methods.
The research is set within the medical domain to tackle the unique challenges within this domain; specifically, how to bridge the "semantic gap", that is, how to overcome the mismatch between raw medical data and the way human beings interpret it. Bridging the semantic gap involves addressing two issues: semantics, that is, aligning the meaning or concepts behind words found in documents and queries; and inference, which utilises semantics to infer relevant information.
Three semantic search models, all utilising concept-based rather than term-based representations, are developed: the Bag-of-concepts model, which uses concepts taken from the SNOMED CT medical ontology as its underlying representation; the Graph-based Concept Weighting model, which captures concept dependence and importance in a novel weighting function; and the core contribution of the thesis, the Graph Inference model (GIN), a unified theoretical model of semantic search as inference, achieved by the integration of structured domain knowledge (ontologies) and statistical information retrieval methods. It is the GIN that provides the necessary mechanism for inference to bridge the semantic gap. All three models are empirically evaluated using clinical queries and a real-world collection of clinical records taken from the TREC Medical Records Track (MedTrack).
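To make the contrast between term-based and concept-based representations concrete, here is a minimal Python sketch. It is not the thesis's implementation: the lexicon, the concept IDs (`C001`, `C002`), the greedy phrase lookup and the simple overlap score are all assumptions chosen for illustration, and the IDs are invented placeholders rather than real SNOMED CT identifiers.

```python
from collections import Counter

# Hypothetical phrase-to-concept lexicon. The IDs are invented placeholders,
# not real SNOMED CT identifiers.
LEXICON = {
    "heart attack": "C001",
    "myocardial infarction": "C001",
    "high blood pressure": "C002",
    "hypertension": "C002",
}


def to_concepts(text):
    """Map text to a bag of concept IDs using greedy longest-phrase lookup."""
    tokens = text.lower().split()
    max_len = max(len(phrase.split()) for phrase in LEXICON)
    bag, i = Counter(), 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in LEXICON:
                bag[LEXICON[phrase]] += 1
                i += n
                break
        else:
            i += 1  # no known concept starts at this token; skip it
    return bag


def to_terms(text):
    """Plain bag-of-words representation for comparison."""
    return Counter(text.lower().split())


def overlap(query_bag, doc_bag):
    """Toy score: sum of document-side counts for items present in the query."""
    return sum(doc_bag[item] for item in query_bag)


query = "heart attack"
doc = "patient admitted with myocardial infarction and hypertension"

term_score = overlap(to_terms(query), to_terms(doc))
concept_score = overlap(to_concepts(query), to_concepts(doc))
```

In this toy example the query and the document share no words, so the term-based score is zero, while both map to the same placeholder concept and so match in the concept-based representation.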
Our evaluation shows that the use of concept-based representations in the Bag-of-concepts model leads to improved retrieval effectiveness. When concepts are combined within the Graph-based Concept Weighting model, further improvements are possible. The evaluation of GIN highlighted that its inference mechanism is suited to hard queries (those that perform poorly on a term-based system). In-depth analysis also revealed that the GIN returned many new documents not retrieved by term-based systems and therefore never evaluated for relevance as part of the TREC MedTrack. This highlights that using standard IR test collections may underestimate the effectiveness of semantic search systems.
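The effect of unjudged documents on a measured score can be illustrated with a small Python sketch. This is a generic illustration, not the evaluation method proposed in the thesis: the function names, the document IDs and the judged-only variant are assumptions made for the example.

```python
def precision_at_k(ranking, qrels, k, skip_unjudged=False):
    """Precision over the top-k documents of a ranking.

    qrels maps doc ID -> 1 (relevant) or 0 (not relevant); documents absent
    from qrels were never judged. By default unjudged documents count as
    non-relevant. With skip_unjudged=True they are removed before the cut-off.
    The score divides by the number of documents actually inspected.
    """
    if skip_unjudged:
        ranking = [d for d in ranking if d in qrels]
    top = ranking[:k]
    if not top:
        return 0.0
    return sum(qrels.get(d, 0) for d in top) / len(top)


def unjudged_fraction(ranking, qrels, k):
    """Share of the top-k documents that have no relevance judgement."""
    top = ranking[:k]
    if not top:
        return 0.0
    return sum(d not in qrels for d in top) / len(top)


# Hypothetical judgements and ranking: d7 and d8 were never judged.
qrels = {"d1": 1, "d2": 0, "d3": 1}
ranking = ["d1", "d7", "d3", "d8"]

standard = precision_at_k(ranking, qrels, k=4)
judged_only = precision_at_k(ranking, qrels, k=4, skip_unjudged=True)
missing = unjudged_fraction(ranking, qrels, k=4)
```

Here half of the top four documents are unjudged. Counting them as non-relevant gives a precision of 0.5, while scoring only the judged documents gives 1.0; the true value depends on judgements that were never collected.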
This work represents a significant step forward in the integration of structured domain knowledge and data-driven information retrieval methods. Furthermore, the thesis provides an understanding of inference: when and how it should be applied for effective semantic search. It shows that queries with certain characteristics benefit from inference, while others do not. The detailed investigation into the evaluation of semantic search systems shows how standard IR test collections may underestimate the effectiveness of such systems, and new methods of evaluation are suggested. The Graph Inference model, although developed within the medical domain, is generally defined and has implications in other areas, including web search, where an emerging research trend is to utilise structured knowledge resources for more effective semantic search.
Download full thesis (PDF)
Slide Deck from PhD Final Seminar
Supervision
Principal: Prof. Peter Bruza, Professor of Information Technology, Queensland University of Technology
Associate: Dr. Laurianne Sitbon, Lecturer, Queensland University of Technology
Associate: Dr. Michael Lawley, Principal Research Scientist, CSIRO