Medical information retrieval test collection
Background
This page provides a test collection, with associated queries and relevance judgements, for evaluating information retrieval over medical records. The design of the evaluation framework is described in the following paper:
Koopman, B., Bruza, P., Sitbon, L., Lawley, M. Evaluation of medical information retrieval. Poster proceedings of the 34th annual international ACM SIGIR conference on Research and development in information retrieval, 2011.
If you make use of this collection, please cite the above publication.
Test corpus
As our test corpus we use the BLULab NLP repository, a collection of 81,617 de-identified clinical records from multiple U.S. hospitals during 2007. The collection is available to the community for research purposes. Several medical record types are provided, including History and Physical Exams, Progress Notes, Consultation Reports, Radiology Reports, Emergency Department Reports, Discharge Summaries, Operative Reports, and Cardiology Reports. You will need to apply separately for access to the BLULab corpus.
Queries & Relevance Judgements
Below is a sample of the queries and relevance judgements for the BLULab collection. The complete data will be made available shortly. Queries are provided in two formats: Indri-style query XML documents and TREC-style topics.
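To make the two query formats concrete, the following minimal sketch writes the same queries in both styles. It assumes the `.iq` files follow the usual IndriRunQuery parameter layout (`<query>` elements with `<number>` and `<text>`) and that the `.topics` files use the classic TREC `<top>` layout; neither is confirmed by this page. The query texts and the helper names `build_indri_queries` and `build_trec_topics` are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_indri_queries(queries):
    """Build an Indri-style query parameter document from (number, text) pairs.

    Assumption: each query is wrapped in Indri's #combine operator.
    """
    root = ET.Element("parameters")
    for number, text in queries:
        query = ET.SubElement(root, "query")
        ET.SubElement(query, "number").text = str(number)
        ET.SubElement(query, "text").text = f"#combine({text})"
    return ET.tostring(root, encoding="unicode")


def build_trec_topics(queries):
    """Build TREC-style topic blocks from (number, text) pairs."""
    blocks = [
        f"<top>\n<num> Number: {number}\n<title> {text}\n</top>"
        for number, text in queries
    ]
    return "\n\n".join(blocks)


# Hypothetical queries, not taken from the collection.
sample_queries = [(1, "chest pain"), (2, "hip fracture")]

indri_xml = build_indri_queries(sample_queries)
trec_topics = build_trec_topics(sample_queries)

print(indri_xml)
print(trec_topics)
```

Comparing the generated text against a downloaded `sample.iq` or `sample.topics` file is a quick way to check whether these layout assumptions hold.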
| Description | Document List | Queries (Indri) | Queries (TREC) | Relevance Judgements | Evaluation Run |
|---|---|---|---|---|---|
| Samples | sample.doclist | sample.iq | sample.topics | sample.iq.qrel | sample.iq.eval |
| All | all.doclist | all.iq | all.topics | all.iq.qrel | all.iq.eval |
| Discharge Summaries, History & Physical Exams, Emergency Department Reports. No laboratory based reports. | ds_er_hp.doclist | ds_er_hp.iq | ds_er_hp.topics | ds_er_hp.iq.qrel | ds_er_hp.iq.eval |
| Discharge Summaries | ds.doclist | ds.iq | ds.topics | ds.iq.qrel | ds.iq.eval |
| Discharge Summaries (excluding administrative non-clinical codes) | ds.doclist | ds_clinical.iq | ds_clinical.topics | ds_clinical.iq.qrel | ds_clinical.iq.eval |
| Discharge Summaries mapped to SNOMED CT concept descriptions | ds.doclist | snomedized-ds.iq | snomedized-ds.topics | snomedized-ds.iq.qrel | snomedized-ds.iq.eval |
| Download all files: med_eval_all.zip (9.6M) | | | | | |
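As an illustration of how a relevance judgement file and a retrieval run fit together, here is a minimal sketch that scores a ranked list against judgements. It assumes the `.qrel` files use the standard four-column TREC qrels layout (topic, iteration, document id, relevance) and that runs use the six-column TREC run layout; the page does not state this. The document ids and scores below are made up, and an established tool such as trec_eval is the better choice for reported results.

```python
from collections import defaultdict


def load_qrels(lines):
    """Parse TREC-style qrels lines: topic, iteration, docno, relevance."""
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        topic, _iteration, docno, relevance = parts
        qrels[topic][docno] = int(relevance)
    return qrels


def load_run(lines):
    """Parse TREC-style run lines: topic, Q0, docno, rank, score, run tag."""
    scored = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 6:
            continue
        topic, _q0, docno, _rank, score, _tag = parts
        scored[topic].append((float(score), docno))
    # Rank each topic's documents by descending score.
    return {
        topic: [docno for _, docno in sorted(docs, key=lambda pair: -pair[0])]
        for topic, docs in scored.items()
    }


def precision_at_k(ranked, judged, k):
    """Fraction of the top k documents judged relevant."""
    hits = sum(1 for docno in ranked[:k] if judged.get(docno, 0) > 0)
    return hits / k


def average_precision(ranked, judged):
    """Mean of the precision values at each relevant document's rank."""
    relevant = sum(1 for rel in judged.values() if rel > 0)
    if relevant == 0:
        return 0.0
    hits, total = 0, 0.0
    for rank, docno in enumerate(ranked, start=1):
        if judged.get(docno, 0) > 0:
            hits += 1
            total += hits / rank
    return total / relevant


# Hypothetical data for one topic.
qrel_lines = ["1 0 docA 1", "1 0 docB 0", "1 0 docC 1"]
run_lines = [
    "1 Q0 docA 1 -4.1 demo",
    "1 Q0 docB 2 -4.5 demo",
    "1 Q0 docC 3 -5.0 demo",
]

qrels = load_qrels(qrel_lines)
run = load_run(run_lines)
p_at_2 = precision_at_k(run["1"], qrels["1"], 2)
ap = average_precision(run["1"], qrels["1"])
print(f"P@2={p_at_2:.2f} AP={ap:.4f}")
```

Unjudged documents are treated as non-relevant here, which is a common convention but still a design choice worth stating when reporting numbers.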
Other resources
Indri parameters
Indri retrieval parameter configuration:

    <parameters>
      <index>/home/bevan/data/blulab/indri-index</index>
      <count>1500</count>
      <trecFormat>true</trecFormat>
      <baseline>tfidf</baseline>
      <stemmer>
        <name>porter</name>
      </stemmer>
    </parameters>
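The index path in the configuration above is specific to the authors' machine, so anyone reusing it will need to point it at their own index. The sketch below shows one way to do that with Python's standard library; the replacement path and the helper name `with_index_path` are placeholders, not part of the collection.

```python
import xml.etree.ElementTree as ET

# The parameter file from this page, reproduced as a string.
PARAMS = """<parameters>
  <index>/home/bevan/data/blulab/indri-index</index>
  <count>1500</count>
  <trecFormat>true</trecFormat>
  <baseline>tfidf</baseline>
  <stemmer>
    <name>porter</name>
  </stemmer>
</parameters>"""


def with_index_path(params_xml, index_path):
    """Return a copy of the parameter XML with <index> set to index_path."""
    root = ET.fromstring(params_xml)
    index = root.find("index")
    if index is None:
        index = ET.SubElement(root, "index")
    index.text = index_path
    return ET.tostring(root, encoding="unicode")


# Placeholder path: replace with the location of your own Indri index.
local_params = with_index_path(PARAMS, "/path/to/your/indri-index")

settings = ET.fromstring(local_params)
print(settings.findtext("index"))
print(settings.findtext("count"))         # results returned per query
print(settings.findtext("stemmer/name"))  # stemmer applied to queries
```

The resulting file would typically be passed to IndriRunQuery together with one of the `.iq` query files, although the exact invocation used by the authors is not given on this page.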