Challenge 4 – Literature

Challenge 4 – Extract prevalence information

What was the prevalence?

PubMed example: 
Kalua K, Chirwa T, Kalilani L, Abbenyi S, Mukaka M, Bailey R. Prevalence and risk factors for trachoma in central and southern Malawi. PLoS ONE. 2010;5(2).
BACKGROUND: Trachoma, one of the neglected tropical diseases is suspected to be endemic in Malawi. OBJECTIVES: To determine the prevalence of trachoma and associated risk factors in central and southern Malawi. METHODOLOGY/PRINCIPAL FINDINGS: A population based survey conducted in randomly selected clusters in Chikwawa district (population 438,895), southern Malawi and Mchinji district (population 456,558), central Malawi. Children aged 1-9 years and adults aged 15 and above were assessed for clinical signs of trachoma. In total, 1010 households in Chikwawa and 1016 households in Mchinji districts were enumerated within 108 clusters (54 clusters in each district). A total of 6,792 persons were examined for ocular signs of trachoma. The prevalence of trachomatous inflammation, follicular (TF) among children aged 1-9 years was 13.6% (CI 11.6-15.6) in Chikwawa and 21.7% (CI 19.5-23.9) in Mchinji districts respectively. The prevalence of trachoma trichiasis (TT) in women and men aged 15 years and above was 0.6% (CI 0.2-0.9) in Chikwawa and 0.3% (CI 0.04-0.6) in Mchinji respectively. The presence of a dirty face was significantly associated with trachoma follicular (TF) in both Chikwawa and Mchinji districts (P<0.001). CONCLUSION/SIGNIFICANCE: Prevalence rates of trachoma follicles (TF) in Central and Southern Malawi exceeds the WHO guidelines for the intervention with mass antibiotic distribution (TF>10%), and warrants the trachoma SAFE control strategy to be undertaken in Chikwawa and Mchinji districts.

Prevalence: multiple values. For TF among children aged 1-9 years:

  • 13.6% (CI 11.6-15.6) in Chikwawa 
  • 21.7% (CI 19.5-23.9) in Mchinji 

Algorithmic Approach:

  • Extract the sentences that contain the word “prevalence”, then apply text processing to those sentences to pull out the reported values.
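
The approach above can be sketched in Java as follows. This is a minimal illustration, not the project's actual code: the class name PrevalenceExtractor, the Finding type, and both regular expressions are assumptions made for this example. The sentence splitter is deliberately naive, and the value pattern only recognises the layout used in the example abstract ("13.6% (CI 11.6-15.6) in Chikwawa").

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; all names here are hypothetical. */
final class PrevalenceExtractor {

    /** One reported value with its confidence interval and location. */
    static final class Finding {
        final double percent;
        final double ciLow;
        final double ciHigh;
        final String location;

        Finding(double percent, double ciLow, double ciHigh, String location) {
            this.percent = percent;
            this.ciLow = ciLow;
            this.ciHigh = ciHigh;
            this.location = location;
        }

        @Override
        public String toString() {
            return percent + "% (CI " + ciLow + "-" + ciHigh + ") in " + location;
        }
    }

    // Naive sentence boundary: end punctuation, whitespace, then a capital letter.
    private static final Pattern SENTENCE_BREAK =
            Pattern.compile("(?<=[.!?])\\s+(?=[A-Z])");

    // Assumed layout: "<value>% (CI <low>-<high>) in <Place>".
    private static final Pattern VALUE = Pattern.compile(
            "(\\d+(?:\\.\\d+)?)\\s*%\\s*\\(CI\\s+(\\d+(?:\\.\\d+)?)\\s*-\\s*"
            + "(\\d+(?:\\.\\d+)?)\\)\\s+in\\s+([A-Z][a-z]+)");

    /** Returns the sentences that mention "prevalence" (case-insensitive). */
    static List<String> prevalenceSentences(String text) {
        List<String> hits = new ArrayList<>();
        for (String sentence : SENTENCE_BREAK.split(text.trim())) {
            if (sentence.toLowerCase(Locale.ROOT).contains("prevalence")) {
                hits.add(sentence);
            }
        }
        return hits;
    }

    /** Parses every value/CI/location triple found in the prevalence sentences. */
    static List<Finding> extract(String text) {
        List<Finding> findings = new ArrayList<>();
        for (String sentence : prevalenceSentences(text)) {
            Matcher m = VALUE.matcher(sentence);
            while (m.find()) {
                findings.add(new Finding(
                        Double.parseDouble(m.group(1)),
                        Double.parseDouble(m.group(2)),
                        Double.parseDouble(m.group(3)),
                        m.group(4)));
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        String text = "A total of 6,792 persons were examined for ocular signs of trachoma. "
                + "The prevalence of trachomatous inflammation, follicular (TF) among children "
                + "aged 1-9 years was 13.6% (CI 11.6-15.6) in Chikwawa and 21.7% (CI 19.5-23.9) "
                + "in Mchinji districts respectively.";
        for (Finding f : extract(text)) {
            System.out.println(f);
        }
    }
}
```

Run on the two result sentences of the example abstract, this sketch would return four findings (two for TF, two for TT), which shows why a later step still has to decide which indicator each value belongs to.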

Project Status:

  • A Java program extracts sentences containing the word “prevalence” and parses them. The software takes its input by extracting text from the article PDF.

Challenges:

  • Sentence parsing, which is difficult because of the variety in how sentences are written and phrased.
  • Boundary detection, so that extracted information comes from the results section and not from the introduction or discussion sections.
  • Sentences that report prevalence without using the word “prevalence”.
  • Other statistics (e.g. confidence intervals, population counts, P values) that may be mistaken for prevalence values.
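
For structured abstracts such as the PubMed example, the boundary-detection challenge can be partly addressed by splitting on the upper-case section labels and keeping only the findings section. The Java sketch below is one possible approach under that assumption. The class name AbstractSections, the label pattern, and the "FINDINGS"/"RESULTS" keywords are all hypothetical choices for illustration, and it would not work as written on full-text PDFs, where section headings are formatted differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch only; all names here are hypothetical. */
final class AbstractSections {

    // Assumed label format: upper-case words (with '/' or spaces) followed by a colon,
    // e.g. "BACKGROUND:" or "METHODOLOGY/PRINCIPAL FINDINGS:".
    private static final Pattern LABEL =
            Pattern.compile("\\b([A-Z][A-Z/ ]{3,}):\\s*");

    /** Splits a structured abstract into label -> section text, in document order. */
    static Map<String, String> split(String abstractText) {
        Map<String, String> sections = new LinkedHashMap<>();
        Matcher m = LABEL.matcher(abstractText);
        String currentLabel = null;
        int bodyStart = 0;
        while (m.find()) {
            if (currentLabel != null) {
                // Close the previous section at the start of this label.
                sections.put(currentLabel,
                        abstractText.substring(bodyStart, m.start()).trim());
            }
            currentLabel = m.group(1).trim();
            bodyStart = m.end();
        }
        if (currentLabel != null) {
            sections.put(currentLabel, abstractText.substring(bodyStart).trim());
        }
        return sections;
    }

    /** Returns only the text of sections whose label mentions findings or results. */
    static String resultsText(String abstractText) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> e : split(abstractText).entrySet()) {
            String label = e.getKey();
            if (label.contains("FINDINGS") || label.contains("RESULTS")) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(e.getValue());
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "OBJECTIVES: To determine the prevalence of trachoma and associated "
                + "risk factors in central and southern Malawi. "
                + "METHODOLOGY/PRINCIPAL FINDINGS: A total of 6,792 persons were examined "
                + "for ocular signs of trachoma.";
        System.out.println(resultsText(text));
    }
}
```

In the example abstract the word “prevalence” also appears under OBJECTIVES and CONCLUSION/SIGNIFICANCE, so filtering by section before the keyword search would drop those sentences from consideration.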

Datasets:

  • None.