Previous approaches to NLP tasks on online patient forums have been limited to single posts as units, thereby neglecting the overarching conversational structure. In this paper we explore the benefit of exploiting conversational context for filtering posts relevant to a specific medical topic. We experiment with two approaches to add conversational context to a BERT model: a sequential CRF layer and manually engineered features. Although neither approach can outperform the F1 score of the BERT baseline, we find that adding a sequential layer improves precision for all target classes whereas adding a non-sequential layer with manually engineered features leads to a higher recall for two out of three target classes. Thus, depending on the end goal, conversation-aware modelling may be beneficial for identifying relevant messages. We hope our findings encourage other researchers in this domain to move beyond studying messages in isolation towards more discourse-based data collection and classification. We release our code for the purpose of follow-up research.
In the medical domain, user-generated social media text is increasingly used as a valuable complementary knowledge source to scientific medical literature. The extraction of this knowledge is complicated by colloquial language use and misspellings. Yet, lexical normalization of such data has not been addressed properly. This paper presents an unsupervised, data-driven spelling correction module for medical social media. Our method outperforms state-of-the-art spelling correction and can detect mistakes with an F0.5 of 0.888. Additionally, we present a novel corpus for spelling mistake detection and correction on a medical patient forum.
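For readers unfamiliar with the F0.5 metric mentioned above, the following is a minimal sketch of the standard F-beta formula, which weights precision more heavily than recall when beta is below 1. The function name `f_beta` and the example precision and recall values are illustrative assumptions; they are not taken from the paper.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """Standard F-beta score; beta < 1 favours precision over recall."""
    if precision == 0.0 and recall == 0.0:
        # Avoid division by zero when nothing is detected correctly.
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the asymmetry of F0.5.
    print(f_beta(0.9, 0.5))  # high precision, low recall
    print(f_beta(0.5, 0.9))  # low precision, high recall scores lower
```

With beta set to 0.5, a detector that flags few but mostly correct mistakes scores higher than one that flags many with more false alarms, which is presumably why the metric suits spelling mistake detection.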
Although narratives on patient forums are a valuable source of medical information, their systematic detection and analysis have so far been limited to a single study. In this study, we examine whether psycho-linguistic features or document embeddings can aid identification of narratives. We also investigate which features distinguish narratives from other social media posts. This study is the first to automatically identify the topics discussed in narratives on a patient forum. Our results show that for classifying narratives, character 3-grams outperform psycho-linguistic features and document embeddings. We found that narratives are characterized by the use of past tense, health-related words and first-person pronouns, whereas non-narrative text is associated with the future tense, emotional support words and second-person pronouns. Topic analysis of the patient narratives uncovered fourteen different medical topics, ranging from tumor surgery to side effects. Future work will use these methods to extract experiential patient knowledge from social media.
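The character 3-gram features mentioned above can be illustrated with a short sketch. This only shows how overlapping 3-character substrings are counted for a text; the function name `char_ngrams` is a hypothetical helper, and the paper's actual preprocessing and classifier are not specified here.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count all overlapping character n-grams in a string."""
    # For texts shorter than n, the range is empty and no n-grams result.
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    print(char_ngrams("banana"))
```

Counts like these could then serve as a feature vector for a text classifier, under the assumption that each distinct 3-gram is one feature dimension.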
Veeningen, M.; Chatterjea, S.; Horváth, A.Z.; Spindler, G.; Boersma, E.; Spek, P. van der; ... ; Veugen, T. 2018
This paper describes two experiments in the domain of Cross Language Information Retrieval. Our basic approach is to translate queries word by word using machine-readable dictionaries. The first experiment compared different strategies to deal with word sense ambiguity: (i) keeping all translations and integrating translation probabilities in the model, (ii) selecting a single translation on the basis of the number of occurrences in the dictionary, and (iii) translating word by word after word sense disambiguation in the source language. In a second experiment we constructed parallel corpora from web documents in order to construct bilingual dictionaries or improve translation probability estimates. We conclude that our best dictionary-based CLIR approach is based on keeping all possible translations, not by simple substitution of a query term by its translations but by creating a structured query and including reverse translation probabilities in the retrieval model.
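The idea of keeping all translations in a structured query, rather than substituting a single translation, can be sketched as follows. This is a toy illustration under stated assumptions: the dictionary contents, the names `translate_query` and `score_document`, and the probability-weighted term-frequency scoring are all hypothetical, and the scoring is a simplification that does not reproduce the retrieval model or the reverse translation probabilities used in the paper.

```python
def translate_query(query_terms, dictionary):
    """Map each source term to all its translations with normalized weights.

    `dictionary` maps a source term to {translation: count}; terms missing
    from the dictionary are kept untranslated with weight 1.0.
    """
    structured = []
    for term in query_terms:
        translations = dictionary.get(term, {term: 1.0})
        total = sum(translations.values())
        structured.append({t: w / total for t, w in translations.items()})
    return structured


def score_document(structured_query, doc_term_freqs):
    """Toy score: probability-weighted term frequency, summed over source terms."""
    score = 0.0
    for alternatives in structured_query:
        # Each group of alternatives stands for one source-language query term.
        score += sum(p * doc_term_freqs.get(t, 0) for t, p in alternatives.items())
    return score


if __name__ == "__main__":
    # Hypothetical dictionary counts for one ambiguous source term.
    dictionary = {"bank": {"oever": 1, "bank": 3}}
    query = translate_query(["bank", "rivier"], dictionary)
    print(query)
    print(score_document(query, {"bank": 2, "oever": 4}))
```

Grouping the alternatives per source term keeps an ambiguous word from dominating the score simply because it has many translations.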