In this resource paper we release ChiSCor, a new corpus containing 619 fantasy stories, told freely by 442 Dutch children aged 4-12. ChiSCor was compiled for studying how children render character perspectives, and unravelling language and cognition in development, with computational tools. Unlike existing resources, ChiSCor’s stories were produced in natural contexts, in line with recent calls for more ecologically valid datasets. ChiSCor hosts text, audio, and annotations for character complexity and linguistic complexity. Additional metadata (e.g. education of caregivers) is available for one third of the Dutch children. ChiSCor also includes a small set of 62 English stories. This paper details how ChiSCor was compiled and shows its potential for future work with three brief case studies: i) we show that the syntactic complexity of stories is strikingly stable across children’s ages; ii) we extend work on Zipfian distributions in free speech and show that ChiSCor obeys Zipf’s law closely, reflecting its social context; iii) we show that even though ChiSCor is relatively small, the corpus is rich enough to train informative lemma vectors that allow us to analyse children’s language use. We end with a reflection on the value of narrative datasets in computational linguistics.
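Case study ii) rests on Zipf's law, which predicts that a token's frequency is inversely proportional to its frequency rank: f(r) ∝ 1/r^α with α close to 1. As an illustration of the idea (not the paper's actual analysis or data), the exponent can be estimated by a least-squares fit on log rank against log frequency:

```python
import math
from collections import Counter

def zipf_exponent(tokens):
    """Estimate the Zipf exponent alpha via a least-squares fit of
    log(frequency) against log(rank) over the ranked vocabulary."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return -slope  # Zipf's law predicts slope ~ -1, i.e. alpha ~ 1

# Toy corpus whose counts follow an ideal 1/r profile: 12, 6, 4, 3.
toy = ["the"] * 12 + ["a"] * 6 + ["story"] * 4 + ["dragon"] * 3
print(round(zipf_exponent(toy), 2))  # → 1.0
```

A real analysis would fit over thousands of types; the toy corpus merely shows the mechanics of the rank-frequency fit.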
Duijn, M.J. van; Dijk, B.M.A. van; Kouwenhoven, T.; Valk, W. de; Spruit, M.; Putten, P.W.H. van der 2023
To what degree should we ascribe cognitive capacities to Large Language Models (LLMs), such as the ability to reason about intentions and beliefs known as Theory of Mind (ToM)? Here we add to this emerging debate by (i) testing 11 base- and instruction-tuned LLMs on capabilities relevant to ToM beyond the dominant false-belief paradigm, including non-literal language usage and recursive intentionality; (ii) using newly rewritten versions of standardized tests to gauge LLMs’ robustness; (iii) prompting and scoring for open besides closed questions; and (iv) benchmarking LLM performance against that of children aged 7-10 on the same tasks. We find that instruction-tuned LLMs from the GPT family outperform other models, and often also children. Base-LLMs are mostly unable to solve ToM tasks, even with specialized prompting. We suggest that the interlinked evolution and development of language and ToM may help explain what instruction-tuning adds: rewarding cooperative communication that takes into account interlocutor and context. We conclude by arguing for a nuanced perspective on ToM in LLMs.
Dijk, B.M.A. van; Kouwenhoven, T.; Spruit, M.; Duijn, M.J. van 2023
Current Large Language Models (LLMs) are unparalleled in their ability to generate grammatically correct, fluent text. LLMs are appearing rapidly, and debates on LLM capacities have taken off, but reflection is lagging behind. Thus, in this position paper, we first zoom in on the debate and critically assess three points recurring in critiques of LLM capacities: i) that LLMs only parrot statistical patterns in the training data; ii) that LLMs master formal but not functional language competence; and iii) that language learning in LLMs cannot inform human language learning. Drawing on empirical and theoretical arguments, we show that these points need more nuance. Second, we outline a pragmatic perspective on the issue of ‘real’ understanding and intentionality in LLMs. Understanding and intentionality pertain to unobservable mental states we attribute to other humans because they have pragmatic value: they allow us to abstract away from complex underlying mechanics and predict behaviour effectively. We reflect on the circumstances under which it would make sense for humans to similarly attribute mental states to LLMs, thereby outlining a pragmatic philosophical context for LLMs as an increasingly prominent technology in society.
Haastrecht, M. van; Brinkhuis, M.; Wools, S.; Spruit, M. 2023
The influx of technology in education has made it increasingly difficult to assess the validity of educational assessments. The field of information systems often ignores the social dimension during validation, whereas educational research neglects the technical dimensions of designed instruments. The inseparability of social and technical elements forms the bedrock of socio-technical systems. Therefore, the current lack of validation approaches that address both dimensions is a significant gap. We address this gap by introducing VAST: a validation framework for e-assessment solutions. Examples of such solutions are technology-enhanced learning systems and e-health applications. Using multi-grounded action research as our methodology, we investigate how we can synthesise existing knowledge from information systems and educational measurement to construct our validation framework. We develop an extensive user guideline complementing our framework and find through expert interviews that VAST facilitates a comprehensive, practical approach to validating e-assessment solutions.
Improving population health and reducing inequalities through better integrated health and social care services is high up on the agenda of policymakers internationally. In recent years, regional cross-domain partnerships have emerged in several countries, which aim to achieve better population health, quality of care and a reduction in the per capita costs. These cross-domain partnerships aim to have a strong data foundation and are committed to continuous learning in which data plays an essential role. This paper describes our approach towards the development of the regional integrative population-based data infrastructure Extramural LUMC (Leiden University Medical Center) Academic Network (ELAN), in which we linked routinely collected medical, social and public health data at the patient level from the greater The Hague and Leiden area. Furthermore, we discuss the methodological issues of routine care data and the lessons learned about privacy, legislation and reciprocities. The initiative presented in this paper is relevant for international researchers and policy-makers because a unique data infrastructure has been set up that contains data across different domains, providing insights into societal issues and scientific questions that are important for data driven population health management approaches.
OBJECTIVE To study the effects of a primary care medication review intervention centred around an electronic clinical decision support system (eCDSS) on appropriateness of medication and the number of prescribing omissions in older adults with multimorbidity and polypharmacy compared with a discussion about medication in line with usual care. DESIGN Cluster randomised clinical trial. SETTING Swiss primary care, between December 2018 and February 2021. PARTICIPANTS Eligible patients were ≥65 years of age with three or more chronic conditions and five or more long term medications. INTERVENTION The intervention to optimise pharmacotherapy centred around an eCDSS was conducted by general practitioners, followed by shared decision making between general practitioners and patients, and was compared with a discussion about medication in line with usual care between patients and general practitioners. MAIN OUTCOME MEASURES Primary outcomes were improvement in the Medication Appropriateness Index (MAI) and the Assessment of Underutilisation (AOU) at 12 months. Secondary outcomes included number of medications, falls, fractures, and quality of life. RESULTS In 43 general practitioner clusters, 323 patients were recruited (median age 77 (interquartile range 73-83) years; 45% (n=146) women). Twenty one general practitioners with 160 patients were assigned to the intervention group and 22 general practitioners with 163 patients to the control group. On average, one recommendation to stop or start a medication was reported to be implemented per patient.
At 12 months, the results of the intention-to-treat analysis of the improvement in appropriateness of medication (odds ratio 1.05, 95% confidence interval 0.59 to 1.87) and the number of prescribing omissions (0.90, 0.41 to 1.96) were inconclusive. The same was the case for the per protocol analysis. No clear evidence was found for a difference in safety outcomes at the 12 month follow-up, but fewer safety events were reported in the intervention group than in the control group at six and 12 months. CONCLUSIONS In this randomised trial of general practitioners and older adults, the results were inconclusive as to whether the medication review intervention centred around the use of an eCDSS led to an improvement in appropriateness of medication or a reduction in prescribing omissions at 12 months compared with a discussion about medication in line with usual care. Nevertheless, the intervention could be safely delivered without causing any harm to patients. TRIAL REGISTRATION ClinicalTrials.gov NCT03724539
Toledo, C. van; Schraagen, M.; Dijk, F. van; Brinkhuis, M.; Spruit, M. 2023
This paper introduces a novel method to predict when a Google translation is better than other machine translations (MT) in Dutch. Instead of considering fidelity, this approach considers fluency and readability indicators for when Google ranked best. This research explores an alternative approach in the field of quality estimation. The paper contributes by publishing a dataset with sentences from English to Dutch, with human-made classifications on a best-worst scale. Logistic regression shows a correlation between T-Scan output, such as readability measurements like lemma frequencies, and when Google translation was better than Azure and IBM. The last part of the results section shows the prediction possibilities. First by logistic regression and second by a generated automated machine learning model. Respectively, they have an accuracy of 0.59 and 0.61.
Learning analytics sits in the middle space between learning theory and data analytics. The inherent diversity of learning analytics manifests itself in an epistemology that strikes a balance between positivism and interpretivism, and knowledge that is sourced from theory and practice. In this paper, we argue that validation approaches for learning analytics systems should be cognisant of these diverse foundations. Through a systematic review of learning analytics validation research, we find that there is currently an over-reliance on positivistic validity criteria. Researchers tend to ignore interpretivistic criteria such as trustworthiness and authenticity. In the 38 papers we analysed, researchers covered positivistic validity criteria 221 times, whereas interpretivistic criteria were mentioned 37 times. We motivate that learning analytics can only move forward with holistic validation strategies that incorporate “thick descriptions” of educational experiences. We conclude by outlining a planned validation study using argument-based validation, which we believe will yield meaningful insights by considering a diverse spectrum of validity criteria.
Toledo, C. van; Schraagen, M.; Dijk, F. van; Brinkhuis, M.; Spruit, M. 2022
We explore the use case of question answering (QA) by a contact centre for 130,000 Dutch government employees in the domain of questions about human resources (HR). HR questions can be answered using personnel files or general documentation, with the latter being the focus of the current research. We created a Dutch HR QA dataset with over 300 questions in the format of the SQuAD 2.0 dataset, which distinguishes between answerable and unanswerable questions. We applied various BERT-based models, either directly or after finetuning on the new dataset. The F1-scores reached 0.47 for unanswerable questions and 1.0 for answerable questions depending on the topic; however, large variations in scores were observed. We conclude more data are needed to further improve the performance of this task.
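The F1-scores cited above combine precision and recall as their harmonic mean, F1 = 2PR/(P+R). A minimal sketch of the computation from raw confusion counts (the counts below are hypothetical, not the paper's):

```python
def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives:
    the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical confusion counts for one question class.
print(round(f1_score(tp=40, fp=50, fn=40), 2))  # → 0.47
```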
Small and Medium-sized Enterprises (SMEs) constitute a very large part of every country's economy and play an essential role in economic growth and social development. SMEs are frequent targets of cyberattacks. Unlike large enterprises, SMEs generally have limited capabilities regarding cybersecurity practices. Assessment and improvement of cybersecurity capabilities are crucial for SMEs to survive and sustain their operations. Despite the availability of maturity assessment models and standards to assess and improve cybersecurity capabilities, SMEs' specific requirements and roles in the digital ecosystem are often neglected. This paper presents high-level SME requirements regarding cybersecurity maturity assessment and standardization and translates them into an Adaptable Security Maturity Assessment and Standardization (ASMAS) framework to address this gap. The framework is demonstrated by a web-based software prototype. In the evaluation study conducted with SMEs, we obtained positive results for perceived usefulness, perceived ease of use of the framework, and intention to use it.
Inpatient violence is a common and severe problem within psychiatry. Knowing who might become violent can influence staffing levels and mitigate severity. Predictive machine learning models can assess each patient's likelihood of becoming violent based on clinical notes. Yet, while machine learning models benefit from having more data, data availability is limited as hospitals typically do not share their data for privacy preservation. Federated Learning (FL) can overcome the problem of data limitation by training models in a decentralised manner, without disclosing data between collaborators. However, although several FL approaches exist, none of these train Natural Language Processing models on clinical notes. In this work, we investigate the application of Federated Learning to clinical Natural Language Processing, applied to the task of Violence Risk Assessment by simulating a cross-institutional psychiatric setting. We train and compare four models: two local models, a federated model and a data-centralised model. Our results indicate that the federated model outperforms the local models and has similar performance as the data-centralised model. These findings suggest that Federated Learning can be used successfully in a cross-institutional setting and is a step towards new applications of Federated Learning based on clinical notes.
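Federated setups like the one described typically build the global model by averaging locally trained parameters, weighted by each site's data size (FedAvg-style). The abstract does not specify the exact aggregation rule, so the following is an illustrative sketch with hypothetical parameter vectors:

```python
def federated_average(local_weights, sample_counts):
    """Aggregate per-site parameter vectors into one global vector,
    weighting each site by its number of training samples (FedAvg-style).
    Only model parameters are exchanged; raw clinical notes stay on site."""
    total = sum(sample_counts)
    dim = len(local_weights[0])
    return [
        sum(w[i] * n for w, n in zip(local_weights, sample_counts)) / total
        for i in range(dim)
    ]

# Two hypothetical hospitals with different amounts of training data.
site_a = [0.2, 0.8]  # parameters after local training at hospital A
site_b = [0.6, 0.4]  # parameters after local training at hospital B
global_model = federated_average([site_a, site_b], [100, 300])
print(global_model)
```

Because hospital B contributes three times as many samples, its parameters dominate the average, pulling the global model toward [0.5, 0.5] here.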
The clinical notes in electronic health records have many possibilities for predictive tasks in text classification. The interpretability of these classification models for the clinical domain is critical for decision making. Using topic models for text classification of electronic health records for a predictive task allows for the use of topics as features, thus making the text classification more interpretable. However, selecting the most effective topic model is not trivial. In this work, we propose considerations for selecting a suitable topic model based on the predictive performance and interpretability measure for text classification. We compare 17 different topic models in terms of both interpretability and predictive performance in an inpatient violence prediction task using clinical notes. We find no correlation between interpretability and predictive performance. In addition, our results show that although no model outperforms the other models on both variables, our proposed fuzzy topic modeling algorithm (FLSA-W) performs best in most settings for interpretability, whereas two state-of-the-art methods (ProdLDA and LSI) achieve the best predictive performance.
Fairness and bias are crucial concepts in artificial intelligence, yet they are relatively ignored in machine learning applications in clinical psychiatry. We computed fairness metrics and present bias mitigation strategies using a model trained on clinical mental health data. We collected structured data related to the admission, diagnosis, and treatment of patients in the psychiatry department of the University Medical Center Utrecht. We trained a machine learning model to predict future administrations of benzodiazepines on the basis of past data. We found that gender plays an unexpected role in the predictions-this constitutes bias. Using the AI Fairness 360 package, we implemented reweighing and discrimination-aware regularization as bias mitigation strategies, and we explored their implications for model performance. This is the first application of bias exploration and mitigation in a machine learning model trained on real clinical psychiatry data.
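Reweighing, one of the mitigation strategies mentioned, assigns each (group, label) cell the weight P(group)·P(label) / P(group, label), so that group membership and outcome become statistically independent once instances are weighted. The paper used the AI Fairness 360 implementation; the stdlib-only sketch below, on made-up counts, just shows the underlying formula:

```python
from collections import Counter

def reweighing(samples):
    """Per-(group, label) instance weights
    w = P(group) * P(label) / P(group, label),
    which simplifies to group_count * label_count / (n * cell_count).
    Weighted this way, group and label are independent in the data."""
    n = len(samples)
    group_counts = Counter(g for g, _ in samples)
    label_counts = Counter(y for _, y in samples)
    cell_counts = Counter(samples)
    return {
        (g, y): group_counts[g] * label_counts[y] / (n * count)
        for (g, y), count in cell_counts.items()
    }

# Hypothetical data: gender x benzodiazepine-prescription label.
data = [("F", 1)] * 30 + [("F", 0)] * 10 + [("M", 1)] * 10 + [("M", 0)] * 10
weights = reweighing(data)
# Over-represented cells get weight < 1, under-represented cells > 1.
print({cell: round(w, 3) for cell, w in sorted(weights.items())})
```

Here ("F", 1) is over-represented relative to independence, so its weight drops below 1 (8/9), while ("F", 0) and ("M", 1) are boosted to 4/3.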
Spruit, M.; Verkleij, S.; Schepper, K. de; Scheepers, F. 2022
Diagnosing mental disorders is complex due to the genetic, environmental and psychological contributors and the individual risk factors. Language markers for mental disorders can help to diagnose a person. Research thus far on language markers and the associated mental disorders has been done mainly with the Linguistic Inquiry and Word Count (LIWC) program. In order to improve on this research, we employed a range of Natural Language Processing (NLP) techniques using LIWC, spaCy, fastText and RobBERT to analyse Dutch psychiatric interview transcriptions with both rule-based and vector-based approaches. Our primary objective was to predict whether a patient had been diagnosed with a mental disorder, and if so, the specific mental disorder type. Furthermore, the second goal of this research was to find out which words are language markers for which mental disorder. LIWC in combination with the random forest classification algorithm performed best in predicting whether a person had a mental disorder or not (accuracy: 0.952; Cohen's kappa: 0.889). SpaCy in combination with random forest predicted best which particular mental disorder a patient had been diagnosed with (accuracy: 0.429; Cohen's kappa: 0.304).
Haastrecht, M. van; Golpur, G.; Tzismadia, G.; Kab, R.; Priboi, C.; David, D.; ... ; Spruit, M. 2022
Background: Knowledge about adverse drug reactions (ADRs) in the population is limited because of underreporting, which hampers surveillance and assessment of drug safety. Therefore, gathering accurate information that can be retrieved from clinical notes about the incidence of ADRs is of great relevance. However, manual labeling of these notes is time-consuming, and automatization can improve the use of free-text clinical notes for the identification of ADRs. Furthermore, tools for language processing in languages other than English are not widely available. Objective: The aim of this study is to design and evaluate a method for automatic extraction of medication and Adverse Drug Reaction Identification in Clinical Notes (ADRIN). Methods: Dutch free-text clinical notes (N=277,398) and medication registrations (N=499,435) from the Cardiology Centers of the Netherlands database were used. All clinical notes were used to develop word embedding models. Vector representations of word embedding models and string matching with a medical dictionary (Medical Dictionary for Regulatory Activities [MedDRA]) were used for identification of ADRs and medication in a test set of clinical notes that were manually labeled. Several settings, including search area and punctuation, could be adjusted in the prototype to evaluate the optimal version of the prototype. Results: The ADRIN method was evaluated using a test set of 988 clinical notes written on the stop date of a drug. Multiple versions of the prototype were evaluated for a variety of tasks. Binary classification of ADR presence achieved the highest accuracy of 0.84. Reduced search area and inclusion of punctuation improved performance, whereas incorporation of the MedDRA did not improve the performance of the pipeline.
Conclusions: The ADRIN method and prototype are effective in recognizing ADRs in Dutch clinical notes from cardiac diagnostic screening centers. Surprisingly, incorporation of the MedDRA did not result in improved identification on top of word embedding models. The implementation of the ADRIN tool may help increase the identification of ADRs, resulting in better care and saving substantial health care costs.
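The combination described in the Methods, string matching against a lexicon plus nearest-neighbour lookups in a word embedding space, can be illustrated with a small sketch. The tokens, two-dimensional vectors, and threshold below are all hypothetical stand-ins (real fastText/word2vec vectors have hundreds of dimensions), not the ADRIN pipeline itself:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def flag_adr_terms(note_tokens, adr_lexicon, embeddings, threshold=0.8):
    """Flag tokens that either literally match an ADR lexicon entry or
    whose embedding lies close (cosine >= threshold) to a lexicon entry."""
    flagged = []
    for tok in note_tokens:
        if tok in adr_lexicon:
            flagged.append((tok, "exact"))
        elif tok in embeddings:
            best = max((cosine(embeddings[tok], embeddings[t])
                        for t in adr_lexicon if t in embeddings),
                       default=0.0)
            if best >= threshold:
                flagged.append((tok, "embedding"))
    return flagged

# Toy 2-d embeddings: the Dutch "duizeligheid" sits near "dizziness".
emb = {"duizeligheid": [0.9, 0.1], "dizziness": [0.88, 0.12], "fiets": [0.0, 1.0]}
lexicon = {"dizziness"}
print(flag_adr_terms(["duizeligheid", "fiets", "dizziness"], lexicon, emb))
```

The embedding route is what lets such a pipeline catch synonyms and spelling variants that a pure dictionary match like MedDRA string lookup would miss.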
Instant analysis of cybersecurity reports is a fundamental challenge for security experts as an immeasurable amount of cyber information is generated on a daily basis, which necessitates automated information extraction tools to facilitate querying and retrieval of data. Hence, we present Open-CyKG: an Open Cyber Threat Intelligence (CTI) Knowledge Graph (KG) framework that is constructed using an attention-based neural Open Information Extraction (OIE) model to extract valuable cyber threat information from unstructured Advanced Persistent Threat (APT) reports. More specifically, we first identify relevant entities by developing a neural cybersecurity Named Entity Recognizer (NER) that aids in labeling relation triples generated by the OIE model. Afterwards, the extracted structured data is canonicalized to build the KG by employing fusion techniques using word embeddings. As a result, security professionals can execute queries to retrieve valuable information from the Open-CyKG framework. Experimental results demonstrate that our proposed components that build up Open-CyKG outperform state-of-the-art models.
Haastrecht, M. van; Golpur, G.; Tzismadia, G.; Kab, R.; Priboi, C.; David, D.; ... ; Spruit, M. 2021
Small- and medium-sized enterprises (SMEs) frequently experience cyberattacks, but often do not have the means to counter these attacks. Therefore, cybersecurity researchers and practitioners need to aid SMEs in their defence against cyber threats. Research has shown that SMEs require solutions that are automated and adapted to their context. In recent years, we have seen a surge in initiatives to share cyber threat intelligence (CTI) to improve collective cybersecurity resilience. Shared CTI has the potential to answer the SME call for automated and adaptable solutions. Sadly, as we demonstrate in this paper, current shared intelligence approaches scarcely address SME needs. We must investigate how shared CTI can be used to improve SME cybersecurity resilience. In this paper, we tackle this challenge using a systematic review to discover current state-of-the-art approaches to using shared CTI. We find that threat intelligence sharing platforms such as MISP have the potential to address SME needs, provided that the shared intelligence is turned into actionable insights. Based on this observation, we developed a prototype application that processes MISP data automatically, prioritises cybersecurity threats for SMEs, and provides SMEs with actionable recommendations tailored to their context. Subsequent evaluations in operational environments will help to improve our application, such that SMEs are enabled to thwart cyberattacks in future.
The Cross-Industry Standard Process for Data Mining (CRISP-DM), despite being the most popular data mining process for more than two decades, is known to leave those organizations lacking operational data mining experience puzzled and unable to start their data mining projects. This is especially apparent in the first phase of Business Understanding, at the conclusion of which, the data mining goals of the project at hand should be specified, which arguably requires at least a conceptual understanding of the knowledge discovery process. We propose to bridge this knowledge gap from a Data Science perspective by applying Natural Language Processing techniques (NLP) to the organizations' e-mail exchange repositories to extract explicitly stated business goals from the conversations, thus bootstrapping the Business Understanding phase of CRISP-DM. Our NLP-Automated Method for Business Understanding (NAMBU) generates a list of business goals which can subsequently be used for further specification of data mining goals. The validation of the results on the basis of comparison to the results of manual business goal extraction from the Enron corpus demonstrates the usefulness of our NAMBU method when applied to large datasets.