A convolutional neural network (CNN) is a biologically inspired algorithm that is highly capable at processing images and videos. CNNs are now widely known and used: they monitor our safety through CCTV cameras, help doctors diagnose diseases, navigate cars, and perform many other important tasks. A recent trend is to execute CNNs on edge devices such as cameras, mobile phones, and smart watches. This helps to run CNNs faster and ensures the privacy of the data they use. It is, however, difficult to do: edge devices are small and often lack the resources needed to execute CNNs. In my dissertation, I study this problem and offer solutions to it. I propose specific ways to design and execute CNNs so that they can run efficiently on edge devices.
CNN design and deployment on embedded edge-processing systems is an error-prone and effort-hungry process, which creates a need for accurate and effective automated assisting tools. In such tools, pre-evaluating platform-aware CNN metrics such as latency, energy cost, and throughput is a key requirement for reaching the implementation goals imposed by use-case constraints. Especially when more complex parallel and heterogeneous computing platforms are considered, currently used estimation methods are inaccurate or require extensive characterization experiments and effort. In this paper, we propose an alternative method, designed to be flexible, easy to use, and accurate at the same time. Using a modular platform and execution model that adequately describes the details of the platform and the scheduling of different CNN operators on different platform processing elements, our method precisely captures operations and data transfers and their deployment on computing and communication resources, significantly improving the evaluation accuracy. We have tested our method on more than 2000 CNN layers, targeting an FPGA-based accelerator and a GPU platform as reference example architectures. Results show that our evaluation method increases the estimation precision by up to 5× for execution time and by 2× for energy, compared to other widely used analytical methods. Moreover, we assessed the impact of the improved platform-awareness on a set of neural architecture search experiments, targeting both hardware platforms and enforcing 2 sets of latency constraints, with 5 trials on each search space, for a total of 20 experiments. The predictability is improved by 4×, yielding selection results that are clearly closer than those of the alternatives to results obtained with on-hardware measurements.