This dissertation presents the results of research into the importance of creativity for ICT students at Dutch universities of applied sciences (in Dutch: hogescholen), and examines the functioning of training courses that aim to promote creative abilities. The ability to generate new and potentially useful ideas, and the problem-solving skills that result from creative thinking, are important drivers of human evolution. According to many, creativity is a highly valued and sought-after accomplishment for today's society and for the future. In addition, computers, and everything related to them, have become an integral part of society. The computer is one of the most important innovations in the history of mankind and has radically changed our lives; it is hardly conceivable to innovate without ICT. It is therefore logical that ICT professionals play an extremely prominent role in innovation. This applies in particular to students taking a Bachelor of ICT course at a Dutch university of applied sciences, because they are trained as leading IT specialists. These phenomena led to two interrelated research questions: (i) "Is creativity training important for ICT students at Dutch hogescholen?"; and (ii) "Does creativity training work, as it is integrated in the curriculum of these ICT students?"
Machine learning is becoming an increasingly substantial technology for industry. It allows systems to learn from previous experience in an automated way and to make decisions based on the learned behaviour. Machine learning enables the development of completely new products, such as autonomous driving, and services that are purely data-driven. The development of such new data-driven products is often a long procedure; even the application of machine learning algorithms to specific problems is mostly not straightforward. To illustrate this, a data-driven service from the automotive industry, called Automated Damage Assessment, is introduced in this work. Based on the experience gained from such data-driven service developments, this dissertation proposes a methodology to develop data-driven services in an accurate and fast manner. The Automated Damage Assessment service is based on sensor data, i.e., data recorded from vehicle on-board sensors over time. Using such time series from more than one sensor results in a multivariate time series. Existing methods to solve multivariate time series classification problems are often complex and developed for specific problems without being scalable. To overcome this, suitable approaches with different complexities are proposed in this work. These approaches are applied to multiple publicly available data sets and to real-world data sets from the medical and industrial domains, with the result that two AutoML (Automated Machine Learning) approaches in particular, namely GAMA and ATM, as well as one of the proposed approaches (PHCP), are most suitable for solving these particular multivariate time series problems.
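To make the problem shape concrete, the sketch below classifies synthetic multivariate time series (several sensors, one label per recording) with a simple flatten-and-classify baseline. It is purely illustrative and assumes scikit-learn is available; it does not reproduce PHCP, GAMA or ATM.

```python
# Minimal sketch of a multivariate time series classification baseline.
# Illustrative only: a flatten-and-classify approach, not the PHCP or
# AutoML pipelines studied in the dissertation.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic data: 200 recordings, 3 sensors, 100 time steps each.
X = rng.normal(size=(200, 3, 100))
y = rng.integers(0, 2, size=200)
X[y == 1, 0] += 0.5          # class 1 has a shifted first sensor

# Flatten each multivariate series into one feature vector.
X_flat = X.reshape(len(X), -1)
X_tr, X_te, y_tr, y_te = train_test_split(X_flat, y, random_state=0)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```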
This thesis focuses on addressing four research problems in designing embedded streaming systems. Embedded streaming systems are systems that process a stream of input data coming from the environment and generate a stream of output data going into the environment. For many embedded streaming systems, timing is a critical design requirement: the correct behavior depends both on the correctness of the output data and on the time at which the data is produced. An embedded streaming system subject to such a timing requirement is called a real-time system. Examples of real-time embedded streaming systems can be found in various autonomous mobile systems, such as planes, self-driving cars, and drones. To handle the tight timing requirements of such real-time embedded streaming systems, modern embedded systems have been equipped with hardware platforms, so-called Multi-Processor Systems-on-Chip (MPSoC), that contain multiple processors, memories, interconnections, and other hardware peripherals on a single chip, in order to benefit from parallel execution. To efficiently exploit the computational capacity of an MPSoC platform, a streaming application that is going to be executed on the platform must be expressed primarily in a parallel fashion, i.e., the application is represented as a set of parallel executing and communicating tasks. The main challenge, then, is how to schedule the tasks spatially, i.e., task mapping, and temporally, i.e., task scheduling, on the MPSoC platform such that all timing requirements are satisfied while making efficient use of the available resources (e.g., processors, memory, energy) on the platform. Another challenge is how to implement and run the mapped and scheduled application tasks on the MPSoC platform. This thesis proposes several techniques to address these two challenges.
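As a toy illustration of the spatial half of the scheduling challenge, the sketch below greedily maps a set of tasks (with invented load figures) onto the least-loaded processor. The mapping techniques in the thesis are considerably more sophisticated and also account for timing requirements.

```python
# Minimal sketch of spatial task mapping: assign each task, heaviest
# first, to the processor with the lowest accumulated load.
# Hypothetical task loads; real mappers must also satisfy deadlines.
task_loads = {"t1": 30, "t2": 20, "t3": 25, "t4": 10, "t5": 15}
processors = {"p1": 0, "p2": 0}

mapping = {}
for task, load in sorted(task_loads.items(), key=lambda kv: -kv[1]):
    proc = min(processors, key=processors.get)   # least-loaded processor
    mapping[task] = proc
    processors[proc] += load

print(mapping)      # spatial assignment of tasks to processors
print(processors)   # resulting per-processor load
```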
It is a common technique in global optimization with expensive black-box functions to learn a surrogate model of the response function from past evaluations and use it to decide on the location of future evaluations. In surrogate-model-assisted optimization, selecting the right modeling technique without preliminary knowledge about the objective function can be challenging. It might be beneficial if the algorithm trains many different surrogate models and selects the model with the smallest training error. This approach is known as model selection. In this thesis, a generalization of this approach is developed: instead of choosing a single model, the optimal convex combination of model predictions is used to combine surrogate models into one more accurate ensemble surrogate model. This approach is studied in a fundamental way, by first evaluating minimalistic ensembles of only two surrogate models in detail and then proceeding to ensembles with more surrogate models. Finally, the approach is adopted and evaluated in the context of sequential parameter optimization. Besides discussing the general strategy, the optimal frequency of learning the weights of the convex combination is investigated. The results provide insights into the performance, scalability, and robustness of the approach.
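In symbols (notation assumed here, not taken verbatim from the thesis), the ensemble surrogate is a convex combination of the K individual surrogate predictions:

```latex
\hat{f}(x) \;=\; \sum_{k=1}^{K} w_k\, \hat{f}_k(x),
\qquad w_k \ge 0, \qquad \sum_{k=1}^{K} w_k = 1,
```

where the weights $w_k$ are chosen to minimize a training (or cross-validation) error of the ensemble. Plain model selection is recovered as the special case in which one weight equals 1 and all others are 0.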
In this PhD thesis, several new and existing data science applications are described that are particularly focused on applications for tax administrations. The thesis contains a chapter on the managerial side of analytics, with a balanced overview of the pros and cons of applying analytics within taxpayer supervision. Another topic is (tax) fraud detection with unsupervised anomaly detection techniques. Here a new type of outliers is described (singular outliers) and an algorithm is provided for finding them. Attention is also paid to improving risk selection models. It is noted that most current algorithms cannot handle interactions of categorical variables with many levels very well. An extension of logistic regression is provided that uses Factorization Machines, which resulted in a ten percent improvement in precision. A fourth topic is statistical testing for similar treatment of similar cases. A contribution is made by providing an algorithm to statistically test for similar treatment based on process logs. The thesis further contains a benchmark study of different anomaly detection algorithms. Finally, HR analytics, reinforcement learning and applications of fuzzy sets are briefly described.
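For reference, the standard second-order Factorization Machine score, combined with a logistic link for classification, reads as follows (generic formulation; the notation is assumed here, not taken from the thesis):

```latex
\hat{y}(x) \;=\; \sigma\!\Big( w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle\, x_i x_j \Big),
\qquad \sigma(z) = \frac{1}{1 + e^{-z}}.
```

Because each one-hot-encoded categorical level $i$ gets a low-dimensional latent vector $\mathbf{v}_i$, interactions between levels that rarely co-occur can still be estimated, which is what makes this extension suitable for categorical variables with many levels.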
This thesis is part of a bigger project, HEPGAME (High Energy Physics Game). The main objective of HEPGAME is the utilization of AI solutions, particularly by using MCTS for the simplification of HEP calculations. One of the issues is solving mathematical expressions of interest with millions of terms. These calculations can be solved with the FORM program, which is software for symbolic manipulation. Since these calculations are computationally intensive and take a large amount of time, the FORM program was parallelized to solve them in a reasonable amount of time. Therefore, any new algorithm based on MCTS should also be parallelized. This requirement was behind the problem statement of the thesis: "How do we design a structured pattern-based parallel programming approach for efficient parallelism of MCTS for both multi-core and manycore shared-memory machines?" To answer this question, the thesis approached the MCTS parallelization problem at three levels: (1) the implementation level, (2) the data structure level, and (3) the algorithm level. At the implementation level, we proposed task-level parallelization over thread-level parallelization. Task-level parallelization provides efficient parallelism for MCTS, utilizing the cores on both multi-core and manycore machines. At the data structure level, we presented a lock-free data structure that guarantees correctness. A lock-free data structure (1) removes the synchronization overhead when a parallel program needs many tasks to feed its cores and (2) improves both performance and scalability. At the algorithm level, we first explained how to use the pipeline pattern for parallelization of MCTS to overcome search overhead. Then, through a step-by-step approach, we proposed and detailed a structured parallel programming approach for Monte Carlo Tree Search.
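The sketch below illustrates the task-level idea on a toy scale: Monte Carlo playouts are submitted as independent tasks to a worker pool instead of being pinned to particular threads, so the runtime can keep all cores busy. The toy game and names are invented; the thesis's lock-free tree and pipeline pattern are not shown.

```python
# Minimal sketch of task-level parallelism for Monte Carlo search:
# each playout is an independent task handed to a pool, and the pool
# schedules tasks onto however many cores the machine has.
import random
from concurrent.futures import ProcessPoolExecutor

MOVES = [1, 2, 3]

def playout(first_move: int) -> float:
    """Play one random game after `first_move`; return 1.0 on a win."""
    total = first_move + sum(random.choice(MOVES) for _ in range(10))
    return 1.0 if total % 2 == 0 else 0.0   # arbitrary toy win condition

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        # 1000 playout tasks per candidate move, aggregated per move.
        wins = {m: sum(pool.map(playout, [m] * 1000)) for m in MOVES}
    best = max(wins, key=wins.get)
    print("estimated win counts:", wins, "-> best first move:", best)
```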
The development process of any software has become extremely important, not just in the IT industry but in almost every business or domain of research. The effort to make this process quick, efficient, reliable and automated has steadily evolved into a flow that delivers software incrementally, based on both the developers' best skills and the end users' feedback. Software modeling and modeling languages have the purpose of facilitating product development by designing correct and reliable applications. The concurrency model of the Abstract Behavioural Specification (ABS) language, with features for asynchronous programming and cooperative scheduling, is an important example of how modeling contributes to the reliability and robustness of a product. By abstracting from implementation details, program complexity and the inner workings of libraries, software modeling, and specifically ABS, allows for an easier use of formal analysis techniques and proofs to support product design. However, a gap still exists between modeling languages and programming languages, with the process of software development often following two separate paths for modeling and implementation. This potentially introduces errors and doubles the development effort. The overall objective of this research is to bridge the gap between modeling and programming in order to provide a smooth integration between formal methods and two of the most well-known and widely used languages for software development, Java and Scala. The research focuses mainly on sequential and highly parallelizable applications, but part of the research also involves theoretical proposals for distributed systems. It is a first step towards a programming language with support for formal models.
Optical projection tomography (OPT) is a tomographic 3D imaging technique used for specimens at the millimetre scale. 3D images are computed from a tomogram, and OPT is therefore considered a form of computational imaging. In order to provide imaging and image analysis solutions for large-scale biomedical research, optimisation of the OPT reconstruction is required. The aims of the optimisation presented in this thesis include: (1) accelerating the reconstruction process; (2) reducing the reconstruction artefacts; (3) improving the image quality of the 3D image; and (4) finding optimal parameters for the iterative reconstruction. Starting from the optimisations that we have elaborated and implemented in the OPT imaging workflow, we have worked on case studies in zebrafish imaging. In this thesis we present one such case study (5), as it falls nicely within the order of magnitude for specimens in OPT imaging. The case study concerns the quantification of tumours in zebrafish, which is explored with image segmentation and object detection using artificial intelligence (AI) techniques.
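As a minimal illustration of the reconstruction step being optimised, the sketch below simulates projections of a standard phantom and reconstructs the slice with filtered back projection. It assumes scikit-image is available; the OPT-specific accelerations and artefact corrections from the thesis are not included.

```python
# Minimal sketch of tomographic reconstruction by filtered back
# projection, the classical non-iterative baseline.
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon

image = shepp_logan_phantom()                      # ground-truth slice
theta = np.linspace(0.0, 180.0, 180, endpoint=False)

sinogram = radon(image, theta=theta)               # simulate projections
recon = iradon(sinogram, theta=theta, filter_name="ramp")

print("mean reconstruction error:", np.abs(recon - image).mean())
```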
The database research community has made tremendous strides in developing powerful database engines that allow for efficient analytical query processing. However, these powerful systems have gone largely unused by analysts and data scientists. This poor adoption is caused primarily by the state of database-client integration. In this thesis we attempt to overcome this challenge by investigating how we can facilitate efficient and painless integration of analytical tools and relational database management systems. We focus our investigation on the three primary methods for database-client integration: client-server connections, in-database processing, and embedding the database inside the client application.
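As a small illustration of the latter two methods, the sketch below uses Python's built-in SQLite, an engine embedded in the client process: the aggregation runs inside the database, and only the small result set reaches client code, with no socket to cross. This is a generic example, not the system developed in the thesis.

```python
# Minimal sketch of embedded + in-database processing: the engine
# (SQLite) runs inside the client process; a client-server setup
# would instead serialize results over a network connection.
import sqlite3

con = sqlite3.connect(":memory:")                 # in-process database
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany("INSERT INTO sales VALUES (?, ?)",
                [("north", 10.0), ("south", 20.0), ("north", 5.0)])

# The aggregation happens inside the engine; only the (small)
# result set is handed to the analysis code.
for region, total in con.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"):
    print(region, total)
```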
Optimization tasks in practice pose multifaceted challenges, as they are often black-box, subject to multiple equality and inequality constraints, and expensive to evaluate. The efficiency of a constrained optimizer is of crucial importance when it comes to selecting a suitable method for solving real-world optimization problems from industry under strict resource limitations. The primary concern of this work is to develop new black-box optimization algorithms which are generic enough to handle a broad set of constrained optimization problems (COPs) efficiently, without requiring a priori parameter tuning for different classes of problems. To achieve this goal we benefit from two main conceptual components in the development of new constrained solvers: (1) utilizing surrogate modeling techniques to save real function evaluations, and (2) automatically adjusting sensitive problem-dependent parameters based on the information gained about the problem during the optimization procedure. This work resulted in the development of two surrogate-assisted constrained solvers: SACOBRA and SOCU. It turns out that SACOBRA outperforms most other COP solvers in solving the well-known G-problem suite and MOPTA08 (a COP from the automotive industry) when the number of function evaluations is strongly limited.
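A minimal sketch of the surrogate-assisted loop, under assumptions of our own (RBF surrogates via SciPy, a fixed quadratic penalty, and an invented toy COP): each iteration fits cheap surrogates of the objective and the constraint to the points evaluated so far, optimizes the penalized surrogate, and spends one real evaluation at the proposed point. SACOBRA and SOCU add self-adjusting components not shown here.

```python
# Minimal sketch of surrogate-assisted constrained optimization on a
# toy problem: minimize f subject to g(x) <= 0, spending one real
# evaluation per iteration at the surrogate's proposal.
import numpy as np
from scipy.interpolate import RBFInterpolator
from scipy.optimize import minimize

f = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2   # "expensive" objective
g = lambda x: x[0] + x[1] - 2.0                        # constraint g(x) <= 0

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(10, 2))               # initial design
F = np.array([f(x) for x in X])
G = np.array([g(x) for x in X])

for _ in range(20):                                    # real-evaluation budget
    sf = RBFInterpolator(X, F, smoothing=1e-8)         # objective surrogate
    sg = RBFInterpolator(X, G, smoothing=1e-8)         # constraint surrogate
    pen = lambda z: sf(z[None])[0] + 100.0 * max(0.0, sg(z[None])[0]) ** 2
    x0 = X[np.argmin(F + 100.0 * np.maximum(0.0, G) ** 2)]
    xn = minimize(pen, x0, method="Nelder-Mead").x     # optimize the surrogate
    X = np.vstack([X, xn])                             # one real evaluation
    F = np.append(F, f(xn))
    G = np.append(G, g(xn))

feasible = G <= 1e-6
print("best feasible value:",
      F[feasible].min() if feasible.any() else "none found")
```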
A P300-based brain-computer interface (BCI) character speller, also known as a P300 speller, has been an important communication pathway, under extensive research, for people who have lost motor ability, such as patients with Amyotrophic Lateral Sclerosis or spinal-cord injury. A P300 speller allows humans to spell characters directly using eye gaze, thereby building communication between the human brain and a computer. Unfortunately, P300 spellers are still not used in daily life and remain at an experimental stage in research labs. The reason for this situation is that the performance and the efficiency of current P300 spellers are unacceptably low for BCI users in their daily life. Therefore, in this thesis, we have focused our attention on developing high-performance and efficient P300 spellers in order to bring them into practical use. More specifically, in order to increase the performance of a P300 speller, we have developed methods to increase the character spelling accuracy and the information transfer rate. In order to improve the efficiency of a P300 speller, we have developed methods to reduce the number of sensors needed to acquire EEG signals as well as to reduce the complexity of the classifier used in a P300 speller, without losing performance.
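A minimal sketch of the classification step inside a P300 speller: epochs (channels × time samples) are flattened and fed to linear discriminant analysis, a common baseline for P300 detection. The data below are synthetic with an artificial "P300 bump"; the thesis's accuracy, transfer-rate and sensor-reduction methods are not shown.

```python
# Minimal sketch of single-epoch P300 detection with LDA on
# synthetic EEG-like data (not real recordings).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
n_epochs, n_channels, n_samples = 400, 8, 64

X = rng.normal(size=(n_epochs, n_channels, n_samples))
y = rng.integers(0, 2, size=n_epochs)        # 1 = attended (target) flash
X[y == 1, :, 25:35] += 0.8                   # crude "P300 bump" ~300 ms

# Flatten channels x samples into one feature vector per epoch.
lda = LinearDiscriminantAnalysis()
lda.fit(X[:300].reshape(300, -1), y[:300])
print("held-out accuracy:", lda.score(X[300:].reshape(100, -1), y[300:]))
```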
Business Process Model and Notation (BPMN) has become the standard for business process diagrams. In order to provide tool support for analyzing the behavior of a BPMN model, we present a mapping of BPMN models to Reo networks. The Reo coordination language is an exogenous coordination language that realizes coordination patterns. We present a constraint-based framework which unifies various formal semantics of Reo. In this framework, the behavior of a Reo network is described using constraints. The constraint-based nature of our approach allows the simultaneous coexistence of several semantics in a simple fashion. The behavior of a Reo network is determined by the solutions to these constraints. Since any solution must satisfy all the encoded formal semantics, the framework eliminates any inconsistent behavior between the Reo formal semantics. Another advantage of our proposed constraint-based approach is its efficiency, due to the optimization techniques used in off-the-shelf constraint solvers. We support this claim with a case study. In this dissertation, we also present an alternative approach to modeling priority in Reo by extending our constraint-based framework with priority-aware premises. Further, we extend our priority-aware formal model to support not only a binary notion of priority but also numeric priorities.
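The idea of "behavior as constraint solutions" can be shown on a toy scale. The sketch below encodes a two-channel network (a Sync from port a to b, a LossySync from a to c) as Boolean constraints over "fires in this step" variables and enumerates the solutions; each solution is one admissible execution step. The encoding is a simplified illustration of the approach, not the full framework from the dissertation.

```python
# Minimal sketch: describe a coordination network by constraints and
# take the solutions as its admissible synchronization steps.
from itertools import product

# Boolean "fires in this step" variables for ports a, b, c in the
# network:  a --Sync--> b   and   a --LossySync--> c.
def constraints(a: bool, b: bool, c: bool) -> bool:
    sync_ab = (a == b)            # Sync: both ends fire together
    lossy_ac = (not c) or a       # LossySync: c can only fire if a does
    return sync_ab and lossy_ac

solutions = [s for s in product([False, True], repeat=3) if constraints(*s)]
for a, b, c in solutions:
    print(f"a={a}, b={b}, c={c}")   # each solution is one admissible step
```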
In multi/many-core Systems-on-Chip (SoCs), performance scales almost linearly with the number of processing elements. In order to achieve higher performance, many-core SoCs have to integrate more processing elements, which makes the communication between processing elements a bottleneck for further performance improvement. A Network-on-Chip (NoC), with low network latency, high bandwidth, good scalability, and reusability, is a promising communication fabric for many-core SoCs. However, NoCs consume too much power in real chips, which constrains the utilization of NoCs in future large-scale many-core SoCs. Meanwhile, as more advanced semiconductor technologies are applied in chip manufacturing, the static power consumption takes up a larger proportion of the total power consumption of a NoC. Thus, in this thesis, we have focused our attention on reducing the static power consumption of NoCs in two directions: applying efficient power gating to NoCs to reduce the static power consumption, and realizing confined-interference communication on a simplified NoC infrastructure to achieve energy-efficient packet transmission.
Real-life processes are characterized by dynamics involving time. Examples are walking, sleeping, disease progression during medical treatment, and events in a workflow. To understand complex behavior one needs expressive models that are nevertheless parsimonious enough to provide insight. Uncertainty is often fundamental to process characterization, e.g., because we can sometimes observe phenomena only partially. This makes probabilistic graphical models a suitable framework for process analysis. In this thesis, new probabilistic graphical models that offer the right balance between expressiveness and interpretability are proposed, inspired by the analysis of complex, real-world problems. We first investigate processes by introducing latent variables, which capture abstract notions from observable data (e.g., intelligence or health status). Such models often provide more accurate descriptions of processes. In medicine, they can also reveal insights into patient treatment, such as predictive symptoms. The second viewpoint looks at processes by identifying time points in the data where the relationships between observable variables change. This provides an alternative characterization of process change. Finally, we try to better understand processes by identifying subgroups of the data that deviate from the dataset as a whole, e.g., process workflows whose event dynamics differ from the general workflow.
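A minimal sketch of the first viewpoint, with invented numbers: a two-state hidden Markov model whose latent "health status" is never observed directly, and the forward recursion that computes the probability of an observation sequence and the posterior over the hidden state.

```python
# Minimal sketch of a latent-variable process model: the forward
# algorithm of a tiny hidden Markov model. All probabilities are
# invented for illustration.
states = ["healthy", "ill"]
start = {"healthy": 0.8, "ill": 0.2}
trans = {"healthy": {"healthy": 0.9, "ill": 0.1},
         "ill":     {"healthy": 0.3, "ill": 0.7}}
emit = {"healthy": {"ok": 0.8, "symptom": 0.2},
        "ill":     {"ok": 0.3, "symptom": 0.7}}

obs = ["ok", "symptom", "symptom"]

# Forward recursion: alpha[s] = P(observations so far, latent state = s).
alpha = {s: start[s] * emit[s][obs[0]] for s in states}
for o in obs[1:]:
    alpha = {s: sum(alpha[p] * trans[p][s] for p in states) * emit[s][o]
             for s in states}

print("P(observations) =", sum(alpha.values()))
print("posterior ill   =", alpha["ill"] / sum(alpha.values()))
```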
By training with virtual opponents known as computer generated forces (CGFs), trainee fighter pilots can build the experience necessary for air combat operations at a fraction of the cost of training with real aircraft. In practice, however, the variety of CGFs is not as wide as it could be. This is largely due to a lack of behaviour models for the CGFs. In this thesis we investigate to what extent behaviour models for the CGFs in air combat training simulations can be automatically generated by the use of machine learning. The domain of air combat is complex, and machine learning methods that operate within this domain must be suited to the challenges it poses. Our research shows that the dynamic scripting algorithm greatly facilitates the automatic generation of air combat behaviour models, while being sufficiently flexible to be moulded into answers to those challenges. However, ensuring the validity of the newly generated behaviour models remains a point of attention for future research.
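A minimal sketch of the dynamic scripting idea, with invented rules and a toy outcome model: behaviour rules carry weights, each simulated encounter draws a small "script" of rules in proportion to those weights, and the weights of the used rules are increased after a win and decreased after a loss, so effective behaviour gradually dominates.

```python
# Minimal sketch of dynamic scripting on a toy rule base. Rule names,
# weight updates and the outcome model are hypothetical.
import random

rules = {"climb": 10.0, "dive": 10.0, "fire_long": 10.0,
         "fire_short": 10.0, "evade": 10.0}

def build_script(k=3):
    """Sample k distinct rules, weighted by their current weights."""
    names, script = list(rules), []
    while len(script) < k:
        pick = random.choices(names, weights=[rules[n] for n in names])[0]
        names.remove(pick)
        script.append(pick)
    return script

def update(script, won, step=2.0):
    """Reward the used rules after a win, punish them after a loss."""
    for name in script:
        rules[name] = max(1.0, rules[name] + (step if won else -step))

for _ in range(200):                       # simulated training encounters
    script = build_script()
    won = "evade" in script and random.random() < 0.7   # toy outcome
    update(script, won)

print(sorted(rules.items(), key=lambda kv: -kv[1]))   # learned preferences
```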