Literature – The Prevalence Project

PubMed has nearly 90,000 references where "prevalence" appears in the title.

There are over 400,000 PubMed references where "prevalence" appears in the title or the abstract.

Prevalence information is important but difficult to extract, which makes it hard to build a comprehensive overview of a medical condition across geographical areas and over time.

The goal of the Literature component of the Prevalence Project is to develop Natural Language Processing (NLP) algorithms that automatically extract prevalence information from titles and abstracts.

Challenge 1 – Extract medical condition information

  • What medical condition (disease/diagnosis) was the prevalence information for?
  • What other factors were included (age, sex)?
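
As a rough illustration of the "other factors" question, the sketch below pulls age groups and sex terms out of a sentence with regular expressions. The patterns and the function name extract_factors are assumptions made for this example, not the project's method, and the sample text is condensed from the PubMed example on this page.

```python
import re

# Hypothetical patterns: real abstracts phrase age groups in many more ways.
AGE_RANGE = re.compile(r"aged\s+(\d+)\s*(?:-|to)\s*(\d+)\s+years", re.IGNORECASE)
AGE_MIN = re.compile(
    r"aged\s+(\d+)(?:\s+years)?\s+and\s+(?:above|older|over)", re.IGNORECASE
)
SEX_TERMS = re.compile(r"\b(women|men|females?|males?|girls|boys)\b", re.IGNORECASE)


def extract_factors(text):
    """Return the age ranges, minimum ages, and sex terms mentioned in text."""
    age_ranges = sorted({(int(a), int(b)) for a, b in AGE_RANGE.findall(text)})
    age_minimums = sorted({int(a) for a in AGE_MIN.findall(text)})
    sexes = sorted({term.lower() for term in SEX_TERMS.findall(text)})
    return {"age_ranges": age_ranges, "age_minimums": age_minimums, "sexes": sexes}


sample = (
    "Children aged 1-9 years and adults aged 15 and above were assessed. "
    "The prevalence in women and men aged 15 years and above was 0.6%."
)
print(extract_factors(sample))
```

Keyword patterns like these miss paraphrases such as "school-age children", so they are best read as a baseline to compare against, not a complete solution.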

Challenge 2 – Extract geographical information

  • What city, region, or country is the prevalence information for?

Challenge 3 – Extract temporal information

  • What time period (years) was the prevalence information from?
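
One simple starting point for the temporal question is to look for four-digit years and year ranges. The sketch below assumes years fall between 1900 and 2099; the function name extract_years and the survey sentence are invented for illustration. Note that the PubMed example on this page states only a publication year, not a survey period, which is part of what makes this challenge hard.

```python
import re

# Assumption: relevant years fall in 1900-2099. Other four-digit numbers in
# that range (for example, a count of 2010 households) would be false positives.
YEAR_RANGE = re.compile(r"\b((?:19|20)\d{2})\s*(?:[-–]|to)\s*((?:19|20)\d{2})\b")
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_years(text):
    """Return explicit year ranges and all distinct years found in text."""
    ranges = [(int(start), int(end)) for start, end in YEAR_RANGE.findall(text)]
    years = sorted({int(year) for year in YEAR.findall(text)})
    return {"ranges": ranges, "years": years}


# Citation fragment taken from the PubMed example on this page.
citation = "PloS one. 2010, 5(2)."
# Hypothetical sentence, invented to show a year range.
hypothetical = "Data were collected from 2005 to 2007 and published in 2010."

print(extract_years(citation))
print(extract_years(hypothetical))
```

A fuller approach would also need to tell the study period apart from the publication date, which this sketch does not attempt.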

Challenge 4 – Extract prevalence information

  • What was the prevalence?
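
To make the prevalence question concrete, the sketch below matches a percentage, an optional confidence interval written as "(CI low-high)", and an optional "in <Place>" that follows it. The pattern and the function name extract_prevalence are assumptions for this example; the input sentence comes from the trachoma abstract on this page.

```python
import re

# Hypothetical pattern tuned to the phrasing "13.6% (CI 11.6-15.6) in Chikwawa".
PREVALENCE = re.compile(
    r"(\d+(?:\.\d+)?)\s*%"  # point estimate, e.g. 13.6%
    r"(?:\s*\(\s*CI\s*(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)\s*\))?"  # optional CI
    r"(?:\s+in\s+([A-Z][a-z]+))?"  # optional capitalized place name
)


def extract_prevalence(text):
    """Return one record per percentage found, with its CI and place if present."""
    records = []
    for match in PREVALENCE.finditer(text):
        value, low, high, place = match.groups()
        records.append(
            {
                "percent": float(value),
                "ci": (float(low), float(high)) if low and high else None,
                "place": place,
            }
        )
    return records


sentence = (
    "The prevalence of trachomatous inflammation, follicular (TF) among "
    "children aged 1-9 years was 13.6% (CI 11.6-15.6) in Chikwawa and "
    "21.7% (CI 19.5-23.9) in Mchinji districts respectively."
)
for record in extract_prevalence(sentence):
    print(record)
```

The pattern accepts any percentage, so a threshold such as "TF>10%" would also be returned (with no confidence interval). Deciding which percentages are actual prevalence estimates, and linking each to its condition and age group, is left open here.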

Download the PubMed Prevalence Title Dataset (90,000 references) [coming soon]

PubMed Example:
Kalua K, Chirwa T, Kalilani L, Abbenyi S, Mukaka M, Bailey R. Prevalence and risk factors for trachoma in central and southern Malawi. PloS one. 2010, 5(2).
BACKGROUND: Trachoma, one of the neglected tropical diseases is suspected to be endemic in Malawi. OBJECTIVES: To determine the prevalence of trachoma and associated risk factors in central and southern Malawi. METHODOLOGY/PRINCIPAL FINDINGS: A population based survey conducted in randomly selected clusters in Chikwawa district (population 438,895), southern Malawi and Mchinji district (population 456,558), central Malawi. Children aged 1-9 years and adults aged 15 and above were assessed for clinical signs of trachoma. In total, 1010 households in Chikwawa and 1016 households in Mchinji districts were enumerated within 108 clusters (54 clusters in each district). A total of 6,792 persons were examined for ocular signs of trachoma. The prevalence of trachomatous inflammation, follicular (TF) among children aged 1-9 years was 13.6% (CI 11.6-15.6) in Chikwawa and 21.7% (CI 19.5-23.9) in Mchinji districts respectively. The prevalence of trachoma trichiasis (TT) in women and men aged 15 years and above was 0.6% (CI 0.2-0.9) in Chikwawa and 0.3% (CI 0.04-0.6) in Mchinji respectively. The presence of a dirty face was significantly associated with trachoma follicular (TF) in both Chikwawa and Mchinji districts (P<0.001). CONCLUSION/SIGNIFICANCE: Prevalence rates of trachoma follicles (TF) in Central and Southern Malawi exceeds the WHO guidelines for the intervention with mass antibiotic distribution (TF>10%), and warrants the trachoma SAFE control strategy to be undertaken in Chikwawa and Mchinji districts.