Background: Persistent somatic symptoms (PSS) are common in primary care and often accompanied by an increasing disease burden for both the patient and healthcare. In medical practice, PSS is historically considered a diagnosis by exclusion or primarily seen as psychological. Besides, registration of PSS in electronic health records (EHR) is ambiguous and possibly does not reflect classification adequately. The present study explores how general practitioners (GPs) currently register PSS, and their view regarding the need for improvements in classification, registration, and consultations. Method: Dutch GPs were invited by email to participate in a national cross-sectional online survey. The survey addressed ICPC-codes used by GPs to register PSS, PSS-related terminology added to free text areas, usage of PSS-related syndrome codes, and GPs’ need for improvement of PSS classification, registration and care. Results: GPs (n = 259) were most likely to use codes specific to the symptom presented (89.3%). PSS-related terminology in free-text areas was used sparsely. PSS-related syndrome codes were reportedly used by 91.5% of GPs, but this was primarily the case for the code for irritable bowel syndrome. The ambiguous registration of PSS is reported as problematic by 47.9% of GPs. Over 56.7% of GPs reported needing additional training, tools or other support for PSS classification and consultation. GPs also reported needing other referral options and better guidelines. Conclusions: Registration of PSS in primary care is currently ambiguous. Approximately half of GPs felt a need for more options for registration of PSS and reported a need for further support. 
In order to improve classification, registration and care for patients with PSS, there is a need for a more appropriate coding scheme and additional training.
Background: Financial codes are often used to extract diagnoses from electronic health records. This approach is prone to false positives. Alternatively, queries are constructed, but these are highly center and language specific. A tantalizing alternative is the automatic identification of patients by employing machine learning on format-free text entries. Objective: The aim of this study was to develop an easily implementable workflow that builds a machine learning algorithm capable of accurately identifying patients with rheumatoid arthritis from format-free text fields in electronic health records. Methods: Two electronic health record data sets were employed: Leiden (n=3000) and Erlangen (n=4771). Using a portion of the Leiden data (n=2000), we compared 6 different machine learning methods and a naive word-matching algorithm using 10-fold cross-validation. Performances were compared using the area under the receiver operating characteristic curve (AUROC) and the area under the precision recall curve (AUPRC), and F1 score was used as the primary criterion for selecting the best method to build a classifying algorithm. We selected the optimal threshold of positive predictive value for case identification based on the output of the best method in the training data. This validation workflow was subsequently applied to a portion of the Erlangen data (n=4293). For testing, the best performing methods were applied to remaining data (Leiden n=1000; Erlangen n=478) for an unbiased evaluation. Results: For the Leiden data set, the word-matching algorithm demonstrated mixed performance (AUROC 0.90; AUPRC 0.33; F1 score 0.55), and 4 methods significantly outperformed word-matching, with support vector machines performing best (AUROC 0.98; AUPRC 0.88; F1 score 0.83). 
Applying this support vector machine classifier to the test data resulted in a similarly high performance (F1 score 0.81; positive predictive value [PPV] 0.94), and with this method, we could identify 2873 patients with rheumatoid arthritis in less than 7 seconds out of the complete collection of 23,300 patients in the Leiden electronic health record system. For the Erlangen data set, gradient boosting performed best (AUROC 0.94; AUPRC 0.85; F1 score 0.82) in the training set, and applied to the test data, resulted once again in good results (F1 score 0.67; PPV 0.97). Conclusions: We demonstrate that machine learning methods can extract the records of patients with rheumatoid arthritis from electronic health record data with high precision, allowing research on very large populations for limited costs. Our approach is language and center independent and could be applied to any type of diagnosis. We have developed our pipeline into a universally applicable and easy-to-implement workflow to equip centers with their own high-performing algorithm. This allows the creation of observational studies of unprecedented size covering different countries for low cost from already available data in electronic health record systems.
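
The threshold-selection step mentioned in the Methods (choosing a cutoff on the classifier output by positive predictive value in the training data) can be illustrated with a minimal sketch. This is not the authors' code: the function names `ppv_recall_f1` and `pick_threshold`, the rule of taking the lowest cutoff that reaches a target PPV, and the toy scores and labels are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def ppv_recall_f1(scores: List[float], labels: List[int],
                  threshold: float) -> Tuple[float, float, float]:
    """Return (PPV, recall, F1) when every score >= threshold is called a case."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    ppv = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * recall / (ppv + recall) if ppv + recall else 0.0
    return ppv, recall, f1


def pick_threshold(scores: List[float], labels: List[int],
                   target_ppv: float) -> Optional[float]:
    """Return the lowest observed score whose PPV reaches target_ppv.

    Recall can only fall as the cutoff rises, so the lowest qualifying
    cutoff keeps the most true cases. Returns None if no cutoff qualifies.
    """
    for t in sorted(set(scores)):
        ppv, _, _ = ppv_recall_f1(scores, labels, t)
        if ppv >= target_ppv:
            return t
    return None


# Invented toy data: classifier scores and reference labels (1 = case).
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.10]
toy_labels = [1, 1, 1, 0, 1, 0, 0, 0]

cutoff = pick_threshold(toy_scores, toy_labels, target_ppv=0.9)
print(cutoff, ppv_recall_f1(toy_scores, toy_labels, cutoff))
```

In a workflow like the one described, a cutoff chosen this way on the training portion would then be applied unchanged to the held-out test portion, so that the reported PPV and F1 score are not tuned on the data used for evaluation.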