Patients share valuable advice and experiences with their peers in online patient discussion groups. These uncensored experiences can provide a complementary perspective to that of the health professional and thereby yield novel hypotheses which could be tested in further rigorous medical research. This thesis focuses on the development of automatic extraction methods to harvest these patient experiences from online patient forums using text mining techniques. We also examine the complementary value of these patient-reported outcomes to traditional sources of medical knowledge for scientific hypothesis generation. Specifically, we focus on the extraction of adverse drug events (i.e., side effects) and coping strategies for dealing with adverse drug events.
Dirkson, A.R.; Verberne, S.; Kraaij, W.; Oortmerssen, G. van; Gelderblom, H. 2022
Current methods of pharmacovigilance result in severe under-reporting of adverse drug events (ADEs). Patient forums have the potential to complement current pharmacovigilance practices by providing real-time, uncensored and unsolicited information. We are the first to explore the value of patient forums for rare cancers. To this end, we conduct a case study on a patient forum for Gastrointestinal Stromal Tumor patients. We have developed machine learning algorithms to automatically extract and aggregate side effects from messages on open online discussion forums. We show that patient forum data can provide suggestions for which ADEs impact quality of life the most: for many side effects the relative reporting rate differs decidedly from that of the registration trials, including for example cognitive impairment and alopecia as side effects of avapritinib. We also show that our methods can provide real-world data for long-term ADEs, such as osteoporosis and tremors for imatinib, and novel ADEs not found in registration trials, such as dry eyes and muscle cramping for imatinib. We thus posit that automated pharmacovigilance from patient forums can provide real-world data for ADEs and should be employed as input for medical hypotheses for rare cancers.
Hollander, D. den; Dirkson, A.R.; Verberne, S.; Kraaij, W.; Oortmerssen, G. van; Gelderblom, H.; ... ; Husson, O. 2022
Purpose: Treatment with the tyrosine kinase inhibitor (TKI) imatinib in patients with gastrointestinal stromal tumours (GIST) causes symptoms that could negatively impact health-related quality of life (HRQoL). Treatment-related symptoms are usually clinician-reported and little is known about patient reports. We used survey and online patient forum data to investigate (1) prevalence of patient-reported symptoms; (2) coverage of symptoms mentioned on the forum by existing HRQoL questionnaires; and (3) priorities of prevalent symptoms in HRQoL assessment. Methods: In the cross-sectional population-based survey study, Dutch GIST patients completed items from the EORTC QLQ-C30 and Symptom-Based Questionnaire (SBQ). In the forum study, machine learning algorithms were used to extract TKI side-effects from English messages on an international online forum for GIST patients. Prevalence of symptoms related to imatinib treatment in both sources was calculated and exploratively compared. Results: Fatigue and muscle pain or cramps were reported most frequently. Seven out of 10 most reported symptoms (i.e. fatigue, muscle pain or cramps, facial swelling, joint pain, skin problems, diarrhoea, and oedema) overlapped between the two sources. Alopecia was frequently mentioned on the forum, but not in the survey. Four out of 10 most reported symptoms on the online forum are covered by the EORTC QLQ-C30. The EORTC-SBQ and EORTC Item Library cover 9 and 10 symptoms, respectively. Conclusion: This first overview of patient-reported imatinib-related symptoms from two data sources helps to determine coverage of items in existing questionnaires, and prioritize HRQoL issues. Combining cancer-generic instruments with treatment-specific item lists will improve future HRQoL assessment in care and research in GIST patients using TKI.
Previous approaches to NLP tasks on online patient forums have been limited to single posts as units, thereby neglecting the overarching conversational structure. In this paper we explore the benefit of exploiting conversational context for filtering posts relevant to a specific medical topic. We experiment with two approaches to add conversational context to a BERT model: a sequential CRF layer and manually engineered features. Although neither approach can outperform the F1 score of the BERT baseline, we find that adding a sequential layer improves precision for all target classes whereas adding a non-sequential layer with manually engineered features leads to a higher recall for two out of three target classes. Thus, depending on the end goal, conversation-aware modelling may be beneficial for identifying relevant messages. We hope our findings encourage other researchers in this domain to move beyond studying messages in isolation towards more discourse-based data collection and classification. We release our code for the purpose of follow-up research.
Dirkson, A.R.; Verberne, S.; Sarker, A.; Kraaij, W. 2019
In the medical domain, user-generated social media text is increasingly used as a valuable complementary knowledge source to scientific medical literature. The extraction of this knowledge is complicated by colloquial language use and misspellings. However, lexical normalization of such data has not been addressed effectively. This paper presents a data-driven lexical normalization pipeline with a novel spelling correction module for medical social media. Our method significantly outperforms state-of-the-art spelling correction methods and can detect mistakes with an F1 of 0.63 despite extreme imbalance in the data. We also present the first corpus for spelling mistake detection and correction in a medical patient forum.
In the medical domain, user-generated social media text is increasingly used as a valuable complementary knowledge source to scientific medical literature. The extraction of this knowledge is complicated by colloquial language use and misspellings. Yet, lexical normalization of such data has not been addressed properly. This paper presents an unsupervised, data-driven spelling correction module for medical social media. Our method outperforms state-of-the-art spelling correction and can detect mistakes with an F0.5 of 0.888. Additionally, we present a novel corpus for spelling mistake detection and correction on a medical patient forum.
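For readers unfamiliar with the F0.5 measure mentioned in the abstract above: it is the F-beta score with beta = 0.5, which weights precision more heavily than recall. The following is a minimal sketch of the standard formula only; the function name `f_beta` and the example precision and recall values are illustrative assumptions and are not taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score: a weighted harmonic mean of precision and recall.

    beta < 1 favours precision, beta > 1 favours recall, beta == 1 gives F1.
    """
    if not (0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("precision and recall must lie in [0, 1]")
    beta_sq = beta * beta
    denominator = beta_sq * precision + recall
    if denominator == 0.0:
        # Both precision and recall are zero, so the score is defined as zero.
        return 0.0
    return (1.0 + beta_sq) * precision * recall / denominator


# Illustrative values only (hypothetical, not results from the paper).
high_precision_f05 = f_beta(precision=0.9, recall=0.5, beta=0.5)
high_precision_f1 = f_beta(precision=0.9, recall=0.5, beta=1.0)
```

With the same hypothetical inputs, the F0.5 value is higher than the F1 value because the stronger of the two components here is precision, which F0.5 rewards.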
Although narratives on patient forums are a valuable source of medical information, their systematic detection and analysis have so far been limited to a single study. In this study, we examine whether psycho-linguistic features or document embeddings can aid identification of narratives. We also investigate which features distinguish narratives from other social media posts. This study is the first to automatically identify the topics discussed in narratives on a patient forum. Our results show that for classifying narratives, character 3-grams outperform psycho-linguistic features and document embeddings. We found that narratives are characterized by the use of past tense, health-related words and first-person pronouns, whereas non-narrative text is associated with the future tense, emotional support words and second-person pronouns. Topic analysis of the patient narratives uncovered fourteen different medical topics, ranging from tumor surgery to side effects. Future work will use these methods to extract experiential patient knowledge from social media.
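To illustrate what character 3-gram features look like, the sketch below counts overlapping three-character windows in a post. It is a generic illustration under stated assumptions: the helper name `char_ngrams`, the lowercasing and the whitespace normalisation are choices made here and are not described in the abstract, and the authors' actual feature extraction may differ.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in a piece of text.

    Assumption for this sketch: text is lowercased and runs of whitespace
    are collapsed to single spaces before the windows are taken.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    normalized = " ".join(text.lower().split())
    # Slide a window of width n across the normalized string.
    return Counter(
        normalized[i:i + n] for i in range(len(normalized) - n + 1)
    )


# Hypothetical example post, not taken from the forum data.
features = char_ngrams("I had   surgery")
```

Counts of this kind can be fed to any bag-of-features classifier; because the windows cross word boundaries, they are fairly robust to the misspellings common in forum text.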