In many real-world applications today, it is critical to continuously record and monitor certain machine or system health indicators to discover malfunctions or other abnormal behavior at an early stage and prevent potential harm. The demand for such reliable monitoring systems is expected to increase in the coming years. Particularly in the industrial context, in the course of ongoing digitization, it is becoming increasingly important to analyze growing volumes of data in an automated manner using state-of-the-art algorithms. In many practical applications, one has to deal with temporal data in the form of data streams or time series. The problem of detecting unusual (or anomalous) behavior in time series is commonly referred to as time series anomaly detection. Anomalies are events observed in the data that do not conform to the normal or expected behavior when viewed in their temporal context.

This thesis focuses on unsupervised machine learning algorithms for anomaly detection in time series. In an unsupervised learning setup, a model attempts to learn the normal behavior in a time series (which might already be contaminated with anomalies) without any external assistance. The model can then use its learned notion of normality to detect anomalous events.
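As a minimal sketch of this unsupervised setup (my illustration, not the thesis' algorithms): estimate "normal" from a rolling window of recent history, then flag points that deviate strongly from it. The window size and threshold below are hypothetical.

```python
import numpy as np

def anomaly_flags(series, window=50, k=3.0):
    """Flag points that deviate from a rolling, label-free model of normality.

    A point is anomalous when it lies more than k robust standard deviations
    (1.4826 * MAD) from the rolling median of the preceding window.
    """
    flags = np.zeros(len(series), dtype=bool)
    for t in range(window, len(series)):
        hist = series[t - window:t]
        med = np.median(hist)
        mad = np.median(np.abs(hist - med)) + 1e-9   # robust spread estimate
        flags[t] = abs(series[t] - med) > k * 1.4826 * mad
    return flags

# Toy example: a sine wave with one injected spike.
t = np.linspace(0, 20, 1000)
x = np.sin(t)
x[700] += 5.0
print(np.where(anomaly_flags(x))[0])   # should include index 700
```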
The research topic of this thesis is the extension of evolutionary multi-objective optimization to real-world scheduling problems. Several novel algorithms are proposed: the diversity indicator-based multi-objective evolutionary algorithm (DI-MOEA) achieves a uniformly distributed solution set; the preference-based MOEA obtains preferred solutions; the edge-rotated cone improves the performance of MOEAs for many-objective optimization; and the dynamic MOEA treats stability as an extra objective.

Besides the classical flexible job shop scheduling problem, the thesis proposes solutions for the novel problem domain of vehicle fleet maintenance scheduling optimization (VFMSO). The problem originated from the CIMPLO (Cross-Industry Predictive Maintenance Optimization Platform) project and its project partners Honda and KLM. The VFMSO problem is to determine the maintenance schedule for a vehicle fleet, that is, to find the best maintenance order, location, and time for each component in the fleet, based on the predicted remaining useful lifetimes of the components and the conditions of the available workshops. The maintenance schedule is optimized to bring business advantages to industry, namely to reduce maintenance time, increase safety, and save repair expenses. The problem is formulated as a scalable benchmark in an industrially relevant setting, and the proposed algorithms have been successfully used to solve VFMSO problem instances.
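For readers new to the area, all of the algorithms above build on the Pareto-dominance relation between objective vectors. The sketch below (illustrative only, not taken from the thesis) shows dominance and Pareto-front filtering for a toy two-objective minimization, with maintenance time and cost as hypothetical objectives.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimization):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Filter a set of objective vectors down to its Pareto front."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

# Toy maintenance-scheduling trade-off: (total maintenance time, cost).
solutions = [(10, 40), (12, 30), (11, 35), (15, 50)]
print(non_dominated(solutions))   # (15, 50) is dominated and drops out
```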
In this thesis it is posed that the central object of preference discovery is a co-creative process in which the Other can be represented by a machine. It explores efficient methods to enhance introverted intuition using extraverted intuition's communication lines. Possible implementations of such processes are presented using novel algorithms that perform divergent search to feed the users' intuition with many examples of high-quality solutions, allowing them to exert influence interactively. The machine feeds and reflects upon human intuition, combining both what is possible and what is preferred. The machine model and the divergent optimization algorithms are the motor behind this co-creative process, in which machine and users co-create and interactively choose branches of an ad hoc hierarchical decomposition of the solution space.

The proposed co-creative process consists of several elements: a formal model for interactive co-creative processes, evolutionary divergent search, diversity and similarity, data-driven methods to discover diversity, limitations of artificial creative agents, matters of efficiency in behavioral and morphological modeling, visualization, a connection to prototype theory, and methods that allow users to influence artificial creative agents. This thesis helps put the human back into the design loop in generative AI and optimization.
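As a rough illustration of divergent search (my sketch in the style of novelty search, not the thesis' algorithms): candidates are scored by their distance to previously archived solutions, so the archive accumulates many distinct options to feed a user's intuition rather than converging on a single optimum. The threshold and dimensions below are hypothetical.

```python
import numpy as np

def novelty(candidate, archive, k=3):
    """Novelty score: mean distance to the k nearest archived solutions.
    Divergent search rewards being different, not merely being good."""
    if not archive:
        return float("inf")
    d = sorted(np.linalg.norm(candidate - a) for a in archive)
    return float(np.mean(d[:k]))

rng = np.random.default_rng(0)
archive = []
for _ in range(200):
    x = rng.uniform(-1, 1, size=2)      # random candidate solution
    if novelty(x, archive) > 0.3:       # archive only sufficiently novel ones
        archive.append(x)
print(len(archive), "diverse solutions kept")
```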
In today's volatile market environments, companies must be able to continuously innovate. In this context, innovation refers not only to the development of new products or business models but often also affects the entire organization, which has to transform its structures, processes, and ways of working.

Corporate entrepreneurship (CE) programs are often used by established companies to address these innovation and transformation challenges. In general, they are understood as formalized entrepreneurial activities to (1) support internal corporate ventures or (2) work with external startups. The organizational design and value creation of CE programs exhibit a high degree of heterogeneity. On the one hand, this heterogeneity makes CE programs a valuable management tool that can be used for many purposes. On the other hand, it can be seen as a reason for the challenges that companies currently experience in effectively using and managing CE programs.

By systematically analyzing 54 different cases in established companies in Germany, Switzerland, and Austria, this study contributes to a better understanding of the heterogeneity of CE programs. The taxonomic approach provides clearly defined types of CE programs that are distinguished according to their organizational design and the outputs they generate.
In this work, we attempt to answer the question: "How to learn robust and interpretable rule-based models from data for machine learning and data mining, and how to define their optimality?"

Rules provide a simple form of storing and sharing information about the world. As humans, we use rules every day, such as the physician who diagnoses someone with flu, represented by "if a person has either a fever or a sore throat (among others), then she has the flu". Even though an individual rule can only describe simple events, several aggregated rules can represent more complex scenarios, such as the complete set of diagnostic rules employed by a physician.

The use of rules spans many fields in computer science, and in this dissertation we focus on rule-based models for machine learning and data mining. Machine learning focuses on learning the model that best predicts future (previously unseen) events from historical data. Data mining aims to find interesting patterns in the available data.

To answer our question, we use the Minimum Description Length (MDL) principle, which allows us to define the statistical optimality of rule-based models. Furthermore, we empirically show that this formulation is highly competitive for real-world problems.
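For intuition, here is a minimal sketch of the two-part MDL score that this kind of formulation builds on (my simplification, with a hypothetical fixed cost per rule): the best rule set minimizes the bits needed to encode the model plus the bits needed to encode the data given the model.

```python
import math

def data_cost_bits(n_covered, n_errors):
    """Bits to encode the labels of the examples a rule covers,
    using the binary entropy of its empirical error rate."""
    if n_errors in (0, n_covered):
        return 0.0
    p = n_errors / n_covered
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return n_covered * h

def mdl_score(rules, model_bits_per_rule=8.0):
    """Two-part MDL: L(model) + L(data | model); smaller is better."""
    model_cost = model_bits_per_rule * len(rules)
    data_cost = sum(data_cost_bits(n, e) for n, e in rules)
    return model_cost + data_cost

# Each rule summarized as (examples covered, misclassified among them).
print(mdl_score([(100, 5), (40, 2)]))   # compare candidate rule sets by this score
```

Adding a rule only pays off when the reduction in data cost outweighs the extra model cost, which is what gives MDL-based rule learning its built-in protection against overfitting.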
This thesis mainly focuses on cross-modal retrieval and single-modal image retrieval via deep learning methods, i.e., using deep convolutional neural networks.

For cross-modal retrieval, Shannon information entropy and adversarial learning are integrated to learn a common latent space for image data and text data. Furthermore, this thesis explores single-modal image retrieval in an incremental learning context to reduce the catastrophic forgetting of deep models, thereby extending their ability to keep retrieving as new data arrives. The efficacy of the proposed methods is verified by thorough experiments on the considered datasets. This thesis also gives an overview of new ideas and trends in multimodal content understanding.
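Once image and text embeddings share one latent space, retrieval itself reduces to nearest-neighbor search across modalities. The sketch below assumes such a space already exists (the entropy- and adversarial-based learning of it is the thesis' contribution and is not shown); the dimensions and data are hypothetical.

```python
import numpy as np

def retrieve(query_vec, gallery, top_k=5):
    """Rank gallery items (e.g. image embeddings) by cosine similarity
    to a query embedding (e.g. an embedded text query). Assumes both
    modalities were already mapped into one shared latent space."""
    q = query_vec / np.linalg.norm(query_vec)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return np.argsort(-(g @ q))[:top_k]

rng = np.random.default_rng(1)
text_query = rng.normal(size=64)             # hypothetical 64-d embedding
image_gallery = rng.normal(size=(1000, 64))  # 1000 images in the same space
print(retrieve(text_query, image_gallery))   # indices of the top-5 matches
```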
Interactive exploration of large volumes of data is increasingly common, as data scientists attempt to extract interesting information from large opaque data sets. This scenario presents a difficult challenge for traditional database systems, as (1) nothing is known about the query workload in advance, (2) the query workload is constantly changing, and (3) the system must provide interactive responses to the issued queries. This environment is challenging for index creation, as traditional database indexes require upfront creation, and hence a priori workload knowledge, to be efficient.

In this work, we introduce Progressive Indexing, a novel performance-driven indexing technique that focuses on automatic index creation while providing interactive response times to incoming queries. Its design allows queries to spend a limited budget on index creation. The indexing budget is automatically tuned for each query before query processing. This allows systems to provide interactive answers to queries during index creation while being robust against various workload patterns and data distributions.

We develop progressive algorithms to index one and multiple dimensions. In addition, we introduce Progressive Merges, a robust algorithm that merges appends into our Progressive Indexes without penalizing single queries.
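To make the budgeting idea concrete, here is a minimal sketch (mine, far simpler than the thesis' algorithms): each incoming query moves at most a fixed number of rows from the unindexed tail into a sorted index before answering, so index construction is amortized over the workload instead of paid upfront.

```python
import bisect

class ProgressiveIndex:
    """Budgeted, query-driven index construction: every lookup indexes at
    most `budget` more rows, then answers from the index plus a scan of
    whatever is still pending. A toy stand-in for the real algorithms."""

    def __init__(self, data, budget=1000):
        self.pending = list(data)   # rows not yet indexed
        self.keys = []              # sorted keys of indexed rows
        self.budget = budget

    def lookup(self, key):
        # Spend this query's budget on indexing before answering.
        chunk, self.pending = self.pending[:self.budget], self.pending[self.budget:]
        for k in chunk:
            bisect.insort(self.keys, k)
        # Answer from the index, falling back to a scan of the pending tail.
        i = bisect.bisect_left(self.keys, key)
        found = i < len(self.keys) and self.keys[i] == key
        return found or key in self.pending

idx = ProgressiveIndex(range(100000), budget=5000)
print(idx.lookup(42), idx.lookup(123456))   # True False; the index grows per query
```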
In deep reinforcement learning, searching and learning techniques are two important components. They can be used independently and in combination to deal with different problems in AI. These results have inspired research into artificial general intelligence (AGI).

We study table-based classic Q-learning on the General Game Playing (GGP) system, showing that classic Q-learning works on GGP, although convergence is slow and it is computationally expensive to learn complex games.

This dissertation uses an AlphaZero-like self-play framework to explore AGI on small games. By tuning different hyper-parameters, the role, effects, and contributions of searching and learning are studied. A further experiment shows that search techniques can contribute as experts to generate better training examples and thus speed up the start phase of training.

To extend the AlphaZero-like self-play approach to complex single-player games, the game of Morpion Solitaire is implemented in combination with the Ranked Reward method. Our first AlphaZero-based approach is able to achieve a near-human best record.

Overall, in this thesis, both searching and learning techniques are studied (by themselves and in combination) in GGP and AlphaZero-like self-play systems. We do so for the purpose of making steps towards artificial general intelligence, towards systems that exhibit intelligent behavior in more than one domain.
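For reference, tabular Q-learning as studied in the GGP chapters uses the classic backup below (a generic sketch with hypothetical hyper-parameter values, not the thesis' exact configuration).

```python
import random
from collections import defaultdict

Q = defaultdict(float)                   # table: (state, action) -> value estimate
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # hypothetical hyper-parameters

def choose(state, actions):
    """Epsilon-greedy action selection over the Q table."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, next_actions):
    """Classic Q-learning backup: move Q(s,a) toward r + gamma * max_a' Q(s',a')."""
    best_next = max((Q[(next_state, a)] for a in next_actions), default=0.0)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

update("s0", "a", 1.0, "s1", ["a", "b"])
print(Q[("s0", "a")])                    # 0.1 after one backup
```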
Natural history collections provide invaluable sources for researchers with different disciplinary backgrounds, aspiring to study the geographical distribution of flora and fauna across the globe as well as other evolutionary processes. They are of paramount importance for mapping out long-term changes: from culture, to ecology, to how natural history is practiced.

This thesis describes computational methods for knowledge extraction from archives of natural history collections, here referring to handwritten manuscripts and hand-drawn illustrations. As we are dealing with heterogeneous real-world data, the task becomes exceptionally challenging. Small samples and a long-tailed distribution, sometimes with very fine-grained distinctions between classes, hamper model learning. Prior knowledge is therefore needed to bootstrap the learning process. Moreover, archival content can be difficult to interpret and integrate, and should therefore be formally described to enable data integration within and across collections. By serving the extracted knowledge on the Semantic Web, collections are made amenable to research and integration with other biodiversity resources on the Web.
Today, knowledge is the most crucial element in stimulating organizational competitiveness and economic development. The ability of a firm to quickly recognize, assimilate, and utilize external knowledge is one of the core capabilities that bring organizational competitive advantage. This ability is called absorptive capacity (AC). This study focuses on three AC-related topics in the context of Chinese SMEs: 1) How do SMEs absorb external knowledge in terms of its recognition, assimilation, and utilization? 2) What challenges do SMEs face when absorbing external knowledge? And 3) which knowledge assimilation mechanisms have an impact on the performance of SMEs?
A New Technology-Based Firm (NTBF) is a significant enabler of job creation and a driver of the economy through stimulating innovation. In the last two decades, we have seen an enormous development of NTBFs. However, the liabilities of smallness, newness, and weak networking ties are three important obstacles in the early stages of an NTBF's lifecycle. Consequently, there is a high rate of failure among NTBFs.

A remedy to avoid these failures is to use the support and resources offered by Business Incubators (BIs). BIs provide supportive services to promote NTBFs' capabilities and to help them address their liabilities.

So far, there is almost no reliable evidence on the effect of BIs on the performance of NTBFs. Therefore, we aim to identify the supportive activities offered by BIs and to understand to what extent this support has a serious impact on the performance of their NTBFs. Building on qualitative and quantitative research methods, a model to measure the impact of BI support on the performance of NTBFs is developed and tested among Dutch and German NTBFs. The research results provide practical guidelines for the management teams of incubators, which can increase the effectiveness of their performance.
This dissertation presents results on the importance of creativity for ICT students at Dutch universities of applied sciences (in Dutch: hogescholen), and highlights the functioning of training courses that aim to promote creative abilities. The ability to generate new and potentially useful ideas and problem-solving skills as a result of creative thinking is an important driver of human evolution. According to many, creativity is a highly valued and sought-after accomplishment for today's society and for the future. In addition, computers, and everything related to them, have become an integral part of society. The 'computer' is one of the most important innovations in the history of mankind. Computers have radically changed our lives. It is hardly conceivable to innovate without ICT. It is therefore logical that ICT professionals play an extremely prominent role in innovation. This applies in particular to students taking a Bachelor of ICT course at a Dutch university of applied sciences, because they are trained as leading IT specialists.

These phenomena led to two interrelated research questions: (i) "Is creativity training important for ICT students at Dutch hogescholen?"; and (ii) "Does creativity training work, as it is integrated in the curriculum of these ICT students?"
Machine learning is becoming a more and more substantial technology for industry. It makes it possible to learn from previous experience in an automated way and to make decisions based on the learned behavior. Machine learning enables the development of completely new products, like autonomous driving, or services that are purely driven by data.

The development of such new data-driven products is often a long procedure. Even the application of machine learning algorithms to specific problems is mostly not straightforward. To illustrate this, a data-driven service from the automotive industry, called Automated Damage Assessment, is introduced in this work. Based on the experience gained from such data-driven service developments, this dissertation proposes a methodology to develop data-driven services in an accurate and fast manner. The Automated Damage Assessment service is based on sensor data, i.e., data recorded from vehicle on-board sensors over time. Using such time series from more than one sensor results in a multivariate time series. Existing methods to solve multivariate time series classification problems are often complex and developed for specific problems without being scalable. To overcome this, suitable approaches with different complexities are proposed in this work. These approaches are applied to multiple publicly available data sets and to real-world data sets from the medical and industrial domains, with the result that especially two AutoML (Automated Machine Learning) approaches, namely GAMA and ATM, as well as one of the proposed approaches (PHCP), are most suitable for solving these particular multivariate time series problems.
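To ground the problem setting, here is a deliberately simple baseline for multivariate time series classification (my sketch, not one of the thesis' approaches): summarize each sensor channel with a few statistics and train a standard classifier on the summaries. The data and labels below are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def summarize(mts):
    """Flatten a multivariate time series (channels x timesteps) into
    per-channel summary statistics: a cheap, scalable representation."""
    feats = [f(mts, axis=1) for f in (np.mean, np.std, np.min, np.max)]
    return np.concatenate(feats)

rng = np.random.default_rng(0)
# 200 synthetic recordings, each with 3 sensor channels of 128 timesteps.
X = np.stack([summarize(rng.normal(size=(3, 128))) for _ in range(200)])
y = rng.integers(0, 2, size=200)                 # toy binary labels
clf = RandomForestClassifier(n_estimators=50).fit(X, y)
print(clf.score(X, y))                           # training accuracy on toy data
```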
This thesis focuses on addressing four research problems in designing embedded streaming systems. Embedded streaming systems are systems that process a stream of input data coming from the environment and generate a stream of output data going into the environment. For many embedded streaming systems, timing is a critical design requirement: the correct behavior depends both on the correctness of the output data and on the time at which the data is produced. An embedded streaming system subject to such a timing requirement is called a real-time system. Examples of real-time embedded streaming systems can be found in various autonomous mobile systems, such as planes, self-driving cars, and drones. To handle the tight timing requirements of such real-time embedded streaming systems, modern embedded systems have been equipped with hardware platforms, so-called Multi-Processor Systems-on-Chip (MPSoC), that contain multiple processors, memories, interconnections, and other hardware peripherals on a single chip, to benefit from parallel execution. To efficiently exploit the computational capacity of an MPSoC platform, a streaming application that is going to be executed on the platform must be expressed in a parallel fashion, i.e., the application is represented as a set of parallel executing and communicating tasks. The main challenge is then how to schedule the tasks spatially, i.e., task mapping, and temporally, i.e., task scheduling, on the MPSoC platform such that all timing requirements are satisfied while making efficient use of the available resources (e.g., processors, memory, energy) on the platform. Another challenge is how to implement and run the mapped and scheduled application tasks on the MPSoC platform. This thesis proposes several techniques to address these two challenges.
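As a toy illustration of the spatial (mapping) half of that challenge (my sketch; real MPSoC mapping must also honor timing and communication constraints, which this ignores): a greedy longest-processing-time heuristic that always places the next task on the currently least-loaded processor.

```python
import heapq

def greedy_map(task_costs, n_processors):
    """Longest-processing-time-first mapping: sort tasks by cost and
    assign each to the least-loaded processor. A toy stand-in for the
    spatial mapping step of MPSoC scheduling."""
    heap = [(0.0, p) for p in range(n_processors)]   # (load, processor id)
    heapq.heapify(heap)
    mapping = {}
    for task, cost in sorted(task_costs.items(), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(heap)
        mapping[task] = p
        heapq.heappush(heap, (load + cost, p))
    return mapping

# Hypothetical task costs (e.g., worst-case execution times).
print(greedy_map({"fft": 5.0, "filter": 3.0, "io": 1.0, "ctrl": 2.0}, 2))
```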
It is a common technique in global optimization with expensive black-box functions to learn a surrogate model of the response function from past evaluations and use it to decide on the location of future evaluations.

In surrogate-model-assisted optimization, selecting the right modeling technique without preliminary knowledge about the objective function can be challenging. It might be beneficial if the algorithm trains many different surrogate models and selects the model with the smallest training error. This approach is known as model selection.

In this thesis, a generalization of this approach is developed. Instead of choosing a single model, the optimal convex combination of model predictions is used to combine surrogate models into one more accurate ensemble surrogate model.

This approach is studied in a fundamental way, by first evaluating minimalistic ensembles of only two surrogate models in detail and then proceeding to ensembles with more surrogate models.

Finally, the approach is adopted and evaluated in the context of sequential parameter optimization. Besides discussing the general strategy, the optimal frequency of relearning the convex combination weights is investigated. The results provide insights into the performance, scalability, and robustness of the approach.
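For the two-model case that the study starts from, the optimal convex combination has a closed form under squared error; the sketch below (my minimal version, assuming evaluation on held-out points) finds the weight w in [0, 1] for combining two surrogate predictions.

```python
import numpy as np

def best_convex_weight(pred_a, pred_b, y_true):
    """Minimize || w*pred_a + (1-w)*pred_b - y_true ||^2 over w in [0, 1].
    Setting the derivative to zero gives w = d.(y - b) / d.d with d = a - b,
    which is then clipped to the feasible interval."""
    d = pred_a - pred_b
    denom = float(d @ d)
    if denom == 0.0:
        return 0.5                      # identical predictions: any weight works
    w = float(d @ (y_true - pred_b)) / denom
    return min(1.0, max(0.0, w))

# Sanity check: truth is exactly 0.3*A + 0.7*B, so w should come out as 0.3.
rng = np.random.default_rng(2)
A, B = rng.normal(size=100), rng.normal(size=100)
y = 0.3 * A + 0.7 * B
print(best_convex_weight(A, B, y))      # 0.3 (exact here)
```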
In this PhD thesis, several new and existing data science applications are described that are particularly focused on applications for tax administrations. The thesis contains a chapter on the managerial side of analytics, with a balanced overview of the pros and cons of applying analytics within taxpayer supervision. Another topic is (tax) fraud detection with unsupervised anomaly detection techniques. Here a new type of outlier is described (singular outliers) and an algorithm is provided for finding them. Attention is also paid to improving risk selection models. It is noted that most current algorithms cannot handle interactions of categorical variables with many levels very well. An extension of logistic regression is provided that uses Factorization Machines, which resulted in a ten percent improvement in precision. A fourth topic is statistical testing of similar treatment of similar cases. A contribution is made by providing an algorithm to statistically test for similar treatment based on process logs. The thesis further contains a benchmark study of different anomaly detection algorithms. Finally, HR analytics, reinforcement learning, and applications of fuzzy sets are briefly described.
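To illustrate why Factorization Machines help with many-level categorical interactions: they replace an explicit weight per feature pair with low-rank latent factors, so the pairwise term can be computed in O(k*n). Below is a generic sketch of standard FM scoring (not the thesis' actual model; all sizes and data are hypothetical).

```python
import numpy as np

def fm_logit(x, w0, w, V):
    """Factorization-machine score: logistic-regression terms plus low-rank
    pairwise interactions sum_{i<j} <V_i, V_j> x_i x_j, computed with the
    identity 0.5 * sum_f ((V^T x)_f^2 - ((V^2)^T x^2)_f)."""
    linear = w0 + w @ x
    inter = 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))
    return 1.0 / (1.0 + np.exp(-(linear + inter)))   # e.g. a fraud-risk probability

n_features, k = 20, 4                    # k latent factors per feature
rng = np.random.default_rng(3)
x = rng.integers(0, 2, size=n_features).astype(float)   # one-hot-style input
w = rng.normal(size=n_features)
V = 0.1 * rng.normal(size=(n_features, k))
print(fm_logit(x, 0.0, w, V))
```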
This thesis is part of a bigger project, HEPGAME (High Energy Physics Game). The main objective of HEPGAME is the utilization of AI solutions, particularly by using MCTS for the simplification of HEP calculations. One of the issues is solving mathematical expressions of interest with millions of terms. These calculations can be solved with the FORM program, which is software for symbolic manipulation. Since these calculations are computationally intensive and take a large amount of time, the FORM program was parallelized to solve them in a reasonable amount of time. Therefore, any new algorithm based on MCTS should also be parallelized. This requirement was behind the problem statement of the thesis: "How do we design a structured pattern-based parallel programming approach for efficient parallelism of MCTS for both multi-core and manycore shared-memory machines?"

To answer this question, the thesis approaches the MCTS parallelization problem on three levels: (1) the implementation level, (2) the data structure level, and (3) the algorithm level.

At the implementation level, we propose task-level parallelization over thread-level parallelization. Task-level parallelization provides efficient parallelism for MCTS, utilizing the cores on both multi-core and manycore machines.

At the data structure level, we present a lock-free data structure that guarantees correctness. A lock-free data structure (1) removes the synchronization overhead when a parallel program needs many tasks to feed its cores and (2) improves both performance and scalability.

At the algorithm level, we first explain how to use the pipeline pattern for the parallelization of MCTS to overcome search overhead. Then, through a step-by-step approach, we propose and detail a structured parallel programming approach for Monte Carlo Tree Search.
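A hedged miniature of the task-level idea (mine, in Python rather than the thesis' C++/FORM setting): independent MCTS playouts are submitted as tasks to a pool, and the runtime decides how they map onto cores. Note that CPython's GIL prevents real speedup for pure-Python work; the sketch shows the task abstraction, not the performance.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def rollout(seed):
    """One random playout from the root; a toy stand-in for the MCTS
    simulation step, returning a win/loss payoff."""
    rng = random.Random(seed)
    return 1.0 if rng.random() > 0.5 else 0.0

# Task-level parallelism: rollouts are independent tasks, so the pool
# (not the programmer) decides how work is spread over available cores.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(rollout, range(10000)))
print(sum(results) / len(results))   # estimated value of the root, ~0.5
```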
The development process of any software has become extremely important, not just in the IT industry but in almost every business or domain of research. The effort to make this process quick, efficient, reliable, and automated has constantly evolved into a flow that delivers software incrementally, based on both the developers' best skills and the end users' feedback. Software modeling and modeling languages have the purpose of facilitating product development by designing correct and reliable applications. The concurrency model of the Abstract Behavioural Specification (ABS) Language, with features for asynchronous programming and cooperative scheduling, is an important example of how modeling contributes to the reliability and robustness of a product. By abstracting from implementation details, program complexity, and the inner workings of libraries, software modeling, and specifically ABS, allows for an easier use of formal analysis techniques and proofs to support product design. However, there is still a gap between modeling languages and programming languages, with the process of software development often going down two separate paths with respect to modeling and implementation. This potentially introduces errors and doubles the development effort.

The overall objective of this research is to bridge the gap between modeling and programming in order to provide a smooth integration between formal methods and two of the most well-known and widely used languages for software development, Java and Scala. The research focuses mainly on sequential and highly parallelizable applications, but part of it also involves theoretical proposals for distributed systems. It is a first step towards having a programming language with support for formal models.
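To give a feel for cooperative scheduling (a loose analogy in Python's asyncio; this is my illustration, not ABS or the thesis' Java/Scala integration): a task runs until it explicitly suspends at an await point, much as an ABS process releases control when it awaits a future.

```python
import asyncio

async def worker(name, queue):
    """Cooperative scheduling in miniature: only one task runs at a time,
    and control changes hands only at explicit await points."""
    while not queue.empty():
        job = await queue.get()
        await asyncio.sleep(0)          # yield control cooperatively
        print(f"{name} handled job {job}")

async def main():
    queue = asyncio.Queue()
    for i in range(4):
        queue.put_nowait(i)
    # Two workers interleave deterministically at their await points.
    await asyncio.gather(worker("a", queue), worker("b", queue))

asyncio.run(main())
```

Because suspension points are explicit, the interleavings that a verifier must consider are far fewer than with preemptive threads, which is one reason this style of concurrency model lends itself to formal analysis.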
Optical projection tomography (OPT) is a tomographic 3D imaging technique used for specimens on the millimetre scale. 3D images are computed from a tomogram, and therefore OPT is considered a form of computational imaging. In order to provide imaging and image analysis solutions for large-scale biomedical research, optimisation of the OPT reconstruction is required. The aims of the optimisation presented in this thesis include: (1) accelerating the reconstruction process; (2) reducing the reconstruction artefacts; (3) improving the image quality of the 3D image; and (4) finding optimal parameters for the iterative reconstruction.

Starting from the optimisations that we have elaborated and implemented in the OPT imaging workflow, we have worked on case studies in zebrafish imaging. In this thesis we present one such particular case study (5), as it falls nicely within the order of magnitude for specimens in OPT imaging. The case study concerns the quantification of tumours in zebrafish, explored with image segmentation and object detection using artificial intelligence (AI) techniques.
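The reconstruction step being optimised is standard tomography. As a hedged, self-contained sketch (using scikit-image's radon/iradon on a synthetic phantom, not the thesis' OPT data or code), filtered back projection recovers a slice from its projections:

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, resize

# Simulate OPT-style projections of one slice, then reconstruct it.
image = resize(shepp_logan_phantom(), (128, 128))
angles = np.linspace(0.0, 180.0, 180, endpoint=False)
sinogram = radon(image, theta=angles)                       # forward projection
recon = iradon(sinogram, theta=angles, filter_name="ramp")  # filtered back projection
print(float(np.abs(recon - image).mean()))                  # mean reconstruction error
```

scikit-image also provides iradon_sart for iterative reconstruction, which is the kind of method whose parameters aim (4) above concerns.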