Real-life processes are characterized by dynamics involving time. Examples are walking, sleeping, disease progress in medical treatment, and events in a workflow. To understand complex behavior one... Show moreReal-life processes are characterized by dynamics involving time. Examples are walking, sleeping, disease progress in medical treatment, and events in a workflow. To understand complex behavior one needs expressive models, parsimonious enough to gain insight. Uncertainty is often fundamental for process characterization, e.g., because we sometimes can observe phenomena only partially. This makes probabilistic graphical models a suitable framework for process analysis. In this thesis, new probabilistic graphical models that offer the right balance between expressiveness and interpretability are proposed, inspired by the analysis of complex, real-world problems. We first investigate processes by introducing latent variables, which capture abstract notions from observable data (e.g., intelligence, health status). Such models often provide more accurate descriptions of processes. In medicine, such models can also reveal insight on patient treatment, such as predictive symptoms. The second viewpoint looks at processes by identifying time points in the data where the relationships between observable variables change. This provides an alternative characterization of process change. Finally, we try to better understand processes by identifying subgroups of data that deviate from the whole dataset, e.g., process workflows whose event dynamics differ from the general workflow. Show less
Finding subsets of a dataset that somehow deviate from the norm, i.e. where something interesting is going on, is a classical Data Mining task. In traditional local pattern mining methods, such... Show moreFinding subsets of a dataset that somehow deviate from the norm, i.e. where something interesting is going on, is a classical Data Mining task. In traditional local pattern mining methods, such deviations are measured in terms of a relatively high occurrence (frequent itemset mining), or an unusual distribution for one designated target attribute (subgroup discovery). These, however, do not encompass all forms of "interesting". To capture a more general notion of interestingness in subsets of a dataset, we develop Exceptional Model Mining (EMM). This is a supervised local pattern mining framework, where several target attributes are selected, and a model over these attributes is chosen to be the target concept. Then, subsets are sought on which this model is substantially different from the model on the whole dataset. For instance, we can find parts of the data where two target attributes have an unusual correlation, a classifier has a deviating predictive performance, or a Bayesian network fitted on several target attributes has an exceptional structure. We will discuss some real-world applications of EMM instances, including using the Bayesian network model to identify meteorological conditions under which food chains are displaced, and using a regression model to find the subset of households in the Chinese province of Hunan that do not follow the general economic law of demand. Show less