A data mining scenario is a logical sequence of steps to infer patterns from data. In this thesis, we present two scenarios. Our first scenario aims to identify homogeneous subtypes in data. It was... Show moreA data mining scenario is a logical sequence of steps to infer patterns from data. In this thesis, we present two scenarios. Our first scenario aims to identify homogeneous subtypes in data. It was applied to clinical research on Osteoarthritis (OA) and Parkinson’s disease (PD) and in drug discovery. Thus, because OA and PD are characterized by clinical heterogeneity, a more sensitive classification of the cohort of patients may contribute to the search for the underlying diseases mechanism. In drug discovery, subtyping may improve the understanding of the similarity (and distance) between different phenotypic effects as induced by drugs and chemicals. Our second scenario aims to compare text classification algorithms. First, we show that common classifiers achieve comparable performance on most problems. Second, tightly constrained SVM solutions are high performers. In that situation, most training documents are bounded support vectors, SVM reduces to a nearest mean classifier and no training is necessary, which raises a question on SVM merits in sparse bag of words feature spaces. Also, SVM is shown to suffer from performance deterioration for particular combinations of training set size/number of features. This relate to outlying documents of distinct classes overlapping in the feature space. Show less