Eldorado Community: http://hdl.handle.net/2003/74

Title: Analyses and optimizations of timing-constrained embedded systems considering resource synchronization and machine learning approaches
Authors: Shi, Junjie
Abstract: Nowadays, embedded systems have become ubiquitous, powering a vast array of applications from consumer electronics to industrial automation. Concurrently, statistical and machine learning algorithms are being increasingly adopted across various application domains, such as medical diagnosis, autonomous driving, and environmental analysis, offering sophisticated data analysis and decision-making capabilities. As the demand for intelligent and time-sensitive applications continues to surge, accompanied by growing concerns regarding data privacy, the deployment of machine learning models on embedded devices has emerged as an indispensable requirement. However, this integration introduces both significant opportunities for performance enhancement and complex challenges in deployment optimization.
On the one hand, deploying machine learning models on embedded systems with limited computational capacity, power budgets, and stringent timing requirements necessitates additional adjustments to ensure optimal performance and meet the imposed timing constraints. On the other hand, the inherent capabilities of machine learning, such as self-adaptation during runtime, prove invaluable in addressing challenges encountered in embedded systems, aiding in optimization and decision-making processes.
This dissertation introduces two primary lines of work for the analysis and optimization of timing-constrained embedded systems. First, it addresses the relatively long access times required for the shared resources of machine learning tasks. Second, it considers the limited communication resources and the data privacy concerns that arise in distributed embedded systems when machine learning models are deployed. Additionally, this work provides a use case that employs a machine learning method to tackle challenges specific to embedded systems.
By addressing these key aspects, this dissertation contributes to the analysis and optimization of timing-constrained embedded systems, considering resource synchronization and machine learning models to enable improved performance and efficiency in real-time applications with stringent constraints.

Title: Complex scheduling models and analyses for property-based real-time embedded systems
Authors: Ueter, Niklas
Abstract: Modern multi-core architectures and parallel applications pose a significant challenge to worst-case centric real-time system verification and design efforts. The involved model and parameter uncertainty contests the fidelity of formal real-time analyses, which are mostly based on exact model assumptions. In this dissertation, various approaches that can accept parameter and model uncertainty are presented.
In an attempt to improve predictability in worst-case centric analyses, timing-predictable protocols are explored for parallel task scheduling on multiprocessors and for network-on-chip arbitration. A novel scheduling algorithm for gang tasks on multiprocessors, called stationary rigid gang scheduling, is proposed. With regard to fixed-priority wormhole-switched networks-on-chip, a more restrictive family of transmission protocols with predictability-enhancing properties, called simultaneous progression switching protocols, is proposed. Moreover, hierarchical scheduling for parallel DAG tasks under parameter uncertainty is studied to achieve temporal and spatial isolation.
Fault tolerance, as a supplementary reliability aspect of real-time systems, is examined in the presence of dynamic external causes of faults. Using various job variants, which trade increased execution time demand for increased error protection, a state-based policy selection strategy is proposed that provably assures an acceptable quality of service (QoS). Lastly, the temporal misalignment of sensor data in sensor fusion applications in cyber-physical systems is examined, and a modular analysis based on minimal properties is proposed to obtain an upper bound on the maximal sensor data time-stamp difference.

Title: Transfer learning for multi-channel time-series Human Activity Recognition
Authors: Moya Rueda, Fernando
Abstract: Methods of human activity recognition (HAR) have been developed for the purpose of automatically classifying recordings of human movements into a set of activities. Capturing, evaluating, and analysing sequential data to recognise human activities accurately is critical for many applications in pervasive and ubiquitous computing, e.g., mobile- or ambient-assisted living, smart homes, activities of daily living, health support and rehabilitation, sports, automotive surveillance, and Industry 4.0. For example, HAR is particularly interesting for optimisation in those industries where manual work remains dominant.
HAR takes as inputs signals from videos or from multi-channel time-series, e.g., human joint measurements from marker-based motion capturing systems and inertial measurements recorded by wearables or on-body devices. Wearables have become relevant as they extend the potential of HAR beyond constrained or laboratory settings. This thesis focuses on HAR using multi-channel time-series.
Multi-channel time-series HAR is, in general, a challenging classification task, because human activities and movements show large variation. Humans carry out semantically very distinct activities in a similar manner; conversely, they carry out the same activity in many different ways. Furthermore, multi-channel time-series HAR datasets suffer from the class imbalance problem, with more samples of certain activities than of others. This problem strongly depends on the annotation. Moreover, the definitions of human activities used for annotation are non-standard.
Methods based on Deep Neural Networks (DNNs) are prevalent for multi-channel time-series HAR. Nevertheless, the performance of DNNs has not increased as significantly as in other fields such as image classification or segmentation. DNNs show low sample efficiency, as they learn the temporal structure of activities entirely from data. For supervised DNNs, the scarcity of annotated data is the primary concern. Annotated data of human behaviour is scarce and costly to obtain: the annotation process demands enormous resources, and annotation reliability varies because annotations can be subject to human errors or to unclear and non-elaborated annotation protocols.
Transfer learning has been used to cope with a limited amount of annotated data, overfitting, zero-shot learning or classification of unseen human activities, and the class imbalance problem. Transfer learning can alleviate the problem of scarcity of annotated data: learnt parameters and feature representations from a specific source domain are transferred to a target domain. Transfer learning thus extends the usability of large annotated datasets from source domains to related problems.
This thesis proposes a general transfer learning approach to improve automatic multi-channel Time-Series HAR. The proposed transfer learning method combines a semantic attribute representation of activities and a specific deep neural network. It handles situations where the source and target domains differ, i.e., the sensor space and the set of activities change, without needing a large amount of annotated data from the target domain.
The method considers different levels of transferability. First, an architecture handles a variety of dataset configurations with regard to the number of devices and their type; it creates fixed-size representations of sensor recordings that are representative of the human limbs. These networks process sequences of movements from the human limbs, either from poses or from inertial measurements. Second, it introduces a search for semantic attribute representations that favourably represent signal segments for recognising human activities in unknown scenarios, which only provide annotations of activities and lack human-annotated semantic attributes. Third, it covers transferability from data of a variety of source datasets. The method takes advantage of a large human-pose dataset as a source domain, which was created during the development of this thesis. Furthermore, synthetic inertial measurements are derived from sequences of human poses, either from a marker-based motion capturing system or from video-based and pose-based HAR datasets; the latter specifically use the pixel-coordinate annotations of human poses as multi-channel time-series data. Real inertial measurements and these synthetic measurements are then deployed as a source domain for parameter transfer learning.
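To make the attribute idea concrete, the following is a minimal sketch of attribute-based zero-shot recognition: activities are described by shared semantic attributes, and a segment is assigned to the activity whose attribute vector best matches the predicted attribute scores. The activities, attributes, and scores here are hypothetical stand-ins, not the representations searched for in the thesis.

```python
import numpy as np

# Hypothetical semantic attributes for three activities (illustrative only):
# columns: [uses_arms, uses_legs, stationary]
activities = ["walking", "waving", "standing"]
A = np.array([
    [0, 1, 0],  # walking
    [1, 0, 1],  # waving
    [0, 0, 1],  # standing
], dtype=float)

def predict_activity(attr_scores: np.ndarray) -> str:
    """Map predicted attribute scores to the activity whose attribute
    vector is closest (cosine similarity). New activities only need an
    attribute vector, not training data, which enables zero-shot transfer."""
    sims = (A @ attr_scores) / (
        np.linalg.norm(A, axis=1) * np.linalg.norm(attr_scores) + 1e-9)
    return activities[int(np.argmax(sims))]

# Stand-in for a DNN's attribute predictions on one sensor segment.
scores = np.array([0.1, 0.9, 0.2])
print(predict_activity(scores))  # -> "walking"
```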
Experimentation on different target datasets demonstrates that the proposed transfer learning method improves performance, most evidently when a proportion of the targets' training material is deployed. This outcome suggests that the temporal convolutional filters are rather general, as they learn local temporal relations of human movements related to the semantic attributes, independent of the number of devices and their type. A human-limb-oriented deep architecture and an evolutionary algorithm provide an off-the-shelf predictor of semantic attributes that can be deployed directly on a new target scenario. Closely related problems can be addressed directly by manually specifying the attribute-to-activity relations, without the need for a search via an evolutionary algorithm. Besides, the learnt convolutional filters are activity-class dependent; hence, the classification performance on the activities shared among the datasets improves.

Title: Memory carousel: LLVM-based bitwise wear leveling for nonvolatile main memory
Authors: Hölscher, Nils; Hakert, Christian; Nassar, Hassan; Chen, Kuan-Hsun; Bauer, Lars; Chen, Jian-Jia; Henkel, Jörg
Abstract: Emerging nonvolatile memory yields, alongside many advantages, technical shortcomings, such as reduced cell lifetime. Although many wear-leveling approaches exist to extend the lifetime of such memories, usually a tradeoff for the granularity of wear leveling has to be made. Due to iterative write schemes (repeatedly sense and write), wear-out of memory in certain systems is directly dependent on the written bit value and thus can be highly imbalanced, requiring dedicated bit-wise wear leveling. Such bit-wise wear leveling has so far only been proposed together with special hardware support. However, if no dedicated hardware solutions are available, especially for commercial off-the-shelf systems with nonvolatile memories, a software solution can be crucial for the system lifetime. In this work, we propose entirely software-based bit-wise wear leveling, where the position of bits within CPU words in main memory is rotated on a regular basis. We leverage the LLVM intermediate representation to adjust load and store operations of the application with a custom compiler pass. Experimental evaluation shows that, by applying local rotation within the CPU word, the lifetime can be extended by a factor of up to 21×. We also show that our method can be combined with coarser-grained wear leveling, e.g., at block granularity, to achieve higher lifetime improvements.
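A minimal sketch of the rotation idea follows, in Python for brevity; the paper realizes it as an LLVM compiler pass that rewrites the application's load and store instructions, and the word size and rotation schedule here are illustrative.

```python
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1

def rotl(value: int, k: int) -> int:
    """Rotate a 32-bit word left by k bit positions."""
    k %= WORD_BITS
    return ((value << k) | (value >> (WORD_BITS - k))) & MASK

def rotr(value: int, k: int) -> int:
    return rotl(value, WORD_BITS - (k % WORD_BITS))

memory = {}
offset = 3  # current rotation offset; advancing it on a regular basis
            # requires re-rotating stored words to the new offset (omitted)

def store(addr: int, value: int) -> None:
    """Instrumented store: rotate before writing, so bit positions that
    are frequently 0 (or 1) wander across all physical cells over time."""
    memory[addr] = rotl(value, offset)

def load(addr: int) -> int:
    """Instrumented load: undo the rotation transparently."""
    return rotr(memory[addr], offset)

store(0x10, 0x0000FFFF)
assert load(0x10) == 0x0000FFFF  # the application sees unrotated values
```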
Title: Assessing the reliability of deep neural networks
Authors: Oberdiek, Philipp
Abstract: Deep Neural Networks (DNNs) have achieved astonishing results in the last two decades, fueled by ever larger datasets and the availability of high-performance compute hardware. This has led to breakthroughs in many applications such as image and speech recognition, natural language processing, autonomous driving, and drug discovery. Despite this success, the understanding of their internal workings and the interpretability of their predictions remain limited, and DNNs are often treated as "black boxes". Especially for safety-critical applications where the well-being of humans is at risk, decisions based on predictions should take the associated uncertainties into account. Autonomous vehicles, for example, operate in a highly complex environment with potentially unpredictable situations that can lead to safety risks for pedestrians and other road users. In medical applications, decisions based on incorrect predictions can have serious consequences for a patient's health.
As a consequence, the topic of Uncertainty Quantification (UQ) has received increasing attention in recent years. The goal of UQ is to assign uncertainties to predictions so that the decision-making process can account for potentially unreliable predictions. In addition, uncertainty estimates can support other tasks such as identifying model weaknesses, collecting additional data, or detecting malicious attacks. Unfortunately, UQ for DNNs is a particularly challenging task due to their high complexity and nonlinearity, and uncertainties that can be derived from traditional statistical models are often not directly applicable to DNNs. Therefore, the development of new UQ techniques for DNNs is of paramount importance for safety-aware decision-making. This thesis evaluates existing UQ methods and proposes improvements and novel approaches which contribute to the reliability and trustworthiness of modern deep learning methodology.
One of the core contributions of this work is the development of a novel generative learning framework with an integrated training of a One-vs-All (OvA) classifier. A Generative Adversarial Network (GAN) is trained in such a way that it is possible to sample from the boundary of the training distribution. These boundary samples shield the training dataset from the Out-of-Distribution (OoD) region. By making the GAN class-conditional, each class can be shielded separately, which integrates well with the formulation of an OvA classifier. The OvA classifier achieves outstanding results on the task of OoD detection and surpasses many previous works by large margins. In addition, the tight class shielding also improves the overall classification accuracy.
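As a rough illustration of the One-vs-All readout, the sketch below assumes per-class sigmoid scores from some trained classifier; the GAN-based boundary training that shields each class is not shown, and the threshold is an illustrative assumption.

```python
import numpy as np

def ova_predict(class_scores: np.ndarray, threshold: float = 0.5):
    """One-vs-All readout: each entry of class_scores is an independent
    sigmoid 'does this input belong to class k?' score in [0, 1].
    If no class claims the input, it is flagged as out-of-distribution."""
    best = int(np.argmax(class_scores))
    confidence = float(class_scores[best])
    if confidence < threshold:
        return None, confidence   # OoD: rejected by every class
    return best, confidence

# Stand-in scores from a hypothetical 4-class OvA classifier.
print(ova_predict(np.array([0.02, 0.91, 0.05, 0.10])))  # (1, 0.91)
print(ova_predict(np.array([0.10, 0.20, 0.15, 0.05])))  # (None, 0.2) -> OoD
```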
A comprehensive and consistent evaluation on the tasks of False Positive, Out-of-Distribution and Adversarial Example Detection on a diverse selection of datasets provides insights into the strengths and weaknesses of existing methods and the proposed approaches.

Title: Special issue on practical and robust design of real-time systems
Authors: Chen, Jian-Jia; Shrivastava, Aviral

Title: MODES: model-based optimization on distributed embedded systems
Authors: Shi, Junjie; Bian, Jiang; Richter, Jakob; Chen, Kuan-Hsun; Rahnenführer, Jörg; Xiong, Haoyi; Chen, Jian-Jia
Abstract: The predictive performance of a machine learning model highly depends on the corresponding hyper-parameter setting. Hence, hyper-parameter tuning is often indispensable. Normally, such tuning requires the dedicated machine learning model to be trained and evaluated on centralized data to obtain a performance estimate. However, in a distributed machine learning scenario, it is not always possible to collect all the data from all nodes due to privacy concerns or storage limitations. Moreover, if data has to be transferred through low-bandwidth connections, the transfer reduces the time available for tuning. Model-Based Optimization (MBO) is a state-of-the-art method for tuning hyper-parameters, but its application to distributed machine learning models or federated learning has received little research attention. This work proposes MODES, a framework that allows MBO to be deployed on resource-constrained distributed embedded systems. Each node trains an individual model based on its local data, and the goal is to optimize the combined prediction accuracy. The presented framework offers two optimization modes: (1) MODES-B considers the whole ensemble as a single black box and optimizes the hyper-parameters of each individual model jointly, and (2) MODES-I considers all models as clones of the same black box, which allows it to efficiently parallelize the optimization in a distributed setting. We evaluate MODES by conducting experiments on the optimization of the hyper-parameters of a random forest and a multi-layer perceptron. The experimental results demonstrate that, with an improvement in terms of mean accuracy (MODES-B), run-time efficiency (MODES-I), and statistical stability for both modes, MODES outperforms the baseline, i.e., carrying out tuning with MBO on each node individually with its local sub-dataset.
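A toy sketch of the two modes is given below, with plain random search standing in for MBO and a synthetic accuracy function standing in for node-local model training; the function names and the one-parameter configuration space are illustrative only.

```python
import random

NODES = 3  # each node tunes a model on its private local data

def local_accuracy(node: int, cfg: dict) -> float:
    """Toy stand-in for training and evaluating a node-local model."""
    optimum = [0.3, 0.5, 0.7][node]
    return 1.0 - abs(cfg["lr"] - optimum)

def sample_cfg() -> dict:
    return {"lr": random.uniform(0.0, 1.0)}

def modes_b(budget: int):
    """MODES-B style: the whole ensemble is one black box, so each
    evaluation proposes a joint configuration (one entry per node)."""
    best, best_acc = None, -1.0
    for _ in range(budget):
        joint = [sample_cfg() for _ in range(NODES)]
        acc = sum(local_accuracy(n, c) for n, c in enumerate(joint)) / NODES
        if acc > best_acc:
            best, best_acc = joint, acc
    return best, best_acc

def modes_i(budget: int):
    """MODES-I style: all models are clones of one black box, so one
    proposed configuration is evaluated on every node in parallel."""
    best, best_acc = None, -1.0
    for _ in range(budget):
        cfg = sample_cfg()
        acc = sum(local_accuracy(n, cfg) for n in range(NODES)) / NODES
        if acc > best_acc:
            best, best_acc = cfg, acc
    return best, best_acc

print(modes_b(50)[1], modes_i(50)[1])
```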
Title: Trace-based analysis of locks in operating systems (Aufzeichnungsbasierte Analyse von Sperren in Betriebssystemen)
Authors: Lochmann, Alexander
Abstract: Modern multi-core operating systems offer a variety of synchronization mechanisms. They serve to realize fine-grained locking, allowing the operating system as well as the applications running on it to exploit the performance of modern multi-core processors. Here, entire subsystems, individual data structures, or only parts of a data structure are protected by one or more locks. The more diverse the mechanisms and the finer-grained the locking, the more error-prone an operating system can become. It is therefore vitally important to understand how these synchronization mechanisms are used in a multi-core operating system in order to avoid synchronization errors.
Existing research in this area deals with finding specific synchronization problems, such as detecting data races on memory accesses. However, these approaches detect synchronization errors only after the fact; they do not derive any locking rules that could state how accesses must be protected correctly and thereby prevent errors in advance.
This is exactly the gap the present thesis attempts to close. It therefore addresses the questions of (a) whether trace-based analysis can provide insights into the synchronization behavior of multi-core operating systems, and (b) how these insights can be used to improve the software quality of modern multi-core operating systems.
This leads to the following research contributions: First, this thesis presents the design of the LockDoc approach. It records memory accesses and lock operations in an operating system kernel while a workload is executed, and from this derives correlations between accesses to data structures and lock operations. This can be used in three ways: (1) checking the existing locking documentation, i.e., whether the code still follows the documented rules; (2) deriving new locking rules for different data types, from which a new locking documentation can be generated in a further step; (3) detecting accesses that do not follow the derived rules. These so-called counterexamples indicate potential synchronization errors, including the call hierarchy and the locks actually held.
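The rule-derivation step can be illustrated with a small sketch: count which locks are held during accesses to a data-structure member, take the dominant lock as the rule, and report accesses violating it as counterexamples. The trace format and the majority heuristic are simplifications, not LockDoc's actual implementation.

```python
from collections import Counter

# Hypothetical trace: (member accessed, frozenset of locks held).
trace = [
    ("inode.i_size", frozenset({"inode.i_rwsem"})),
    ("inode.i_size", frozenset({"inode.i_rwsem"})),
    ("inode.i_size", frozenset({"inode.i_rwsem", "journal"})),
    ("inode.i_size", frozenset()),  # suspicious unprotected access
]

def derive_rule(trace, member):
    """Locking rule = the lock most frequently held during accesses,
    together with its empirical support."""
    held, total = Counter(), 0
    for m, locks in trace:
        if m != member:
            continue
        total += 1
        held.update(locks)
    lock, count = held.most_common(1)[0]
    return lock, count / total

rule, support = derive_rule(trace, "inode.i_size")
print(f"rule: hold {rule} ({support:.0%} of accesses)")
for m, locks in trace:                 # accesses violating the rule
    if m == "inode.i_size" and rule not in locks:
        print("counterexample:", m, "locks held:", set(locks) or "{}")
```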
In this thesis, the approach is applied to the Linux and FreeBSD kernels in case studies, following the three goals above. Based on the investigations carried out in this thesis, five patches to the Linux kernel were created by the author and accepted by the developer community. A further patch has already been approved but not yet accepted. The results of this work also led to a change in the locking documentation of the FreeBSD kernel. In addition, a synchronization bug in FreeBSD was uncovered.

Title: Software fault injection and localization in embedded systems
Authors: Gabor, Ulrich Thomas
Abstract: Injection and localization of software faults have been extensively researched, but the results are not directly transferable to embedded systems. The domain-specific constraints applying to these systems, such as limited resources and the predominant C/C++ programming languages, require a specific set of injection and localization techniques. In this thesis, we have assessed existing approaches and have contributed a set of novel methods for software fault injection and localization in embedded systems.
We have developed a method based on AspectC++ for the injection of errors at interfaces and a method based on Clang for the accurate injection of software faults directly into source code. Both approaches work particularly well in the context of embedded systems, because they do not require runtime support and modify binaries only when necessary. Nevertheless, they are also suitable for injecting software faults and errors into the software of other domains.
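The thesis targets C/C++ via AspectC++ and Clang; as a language-neutral illustration of interface-level error injection, the following Python sketch wraps a function so that its return value is corrupted with a configurable probability. The decorator, probability, and fault model are hypothetical, not the thesis's tooling.

```python
import functools
import random

def inject_interface_error(p: float, corrupt):
    """Wrap a function so its return value is replaced by an erroneous
    one with probability p, mimicking error injection at an interface."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            if random.random() < p:
                return corrupt(result)  # deliver a corrupted value
            return result
        return wrapper
    return decorator

@inject_interface_error(p=0.1, corrupt=lambda v: v ^ 0x1)  # single bit flip
def read_sensor() -> int:
    return 42  # stand-in for a driver or library call

readings = [read_sensor() for _ in range(20)]
print(readings)  # roughly 10% of the values are flipped to 43
```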
These contributions required a thorough assessment of fault injection techniques and fault models presented in literature over the years, which raised multiple questions regarding their validity in the context of C/C++. We found that macros (particularly header files), compile-time language constructs, and the commonly used optimization levels introduce a non-negligible bias to experimental results achieved by injection methods operating on any other layer than the source code. Additionally, we found that the textual specification of fault models is prone to ambiguities and misunderstandings. We have conceived an automatic fault classifier to solve this problem in a field study.
Regarding software fault localization, we have combined existing methods making use of program spectra and assertions, and have contributed a new oracle type for autonomous localization of software faults in the field. Our evaluation shows that this approach works particularly well in the context of embedded systems because the generated information can be processed in real-time and, therefore, it can run in an unsupervised manner.
In conclusion, we assessed a variety of injection and localization approaches in the context of embedded systems and contributed novel methods, where applicable, improving the current state of the art. Our results also point out weaknesses regarding the general validity of the majority of previous injection experiments in C/C++.

Title: Correspondence article: counterexample for suspension-aware schedulability analysis of EDF scheduling
Authors: Günzel, Mario; Chen, Jian-Jia

Title: A note on slack enforcement mechanisms for self-suspending tasks
Authors: Günzel, Mario; Chen, Jian-Jia
Abstract: This paper provides counterexamples for the slack enforcement mechanisms to handle segmented self-suspending real-time tasks by Lakshmanan and Rajkumar (Proceedings of the Real-Time and Embedded Technology and Applications Symposium (RTAS), pp 3–12, 2010).

Title: Nanoparticle classification using frequency domain analysis on resource-limited platforms
Authors: Yayla, Mikail; Toma, Anas; Chen, Kuan-Hsun; Lenssen, Jan Eric; Shpacovitch, Victoria; Hergenröder, Roland; Weichert, Frank; Chen, Jian-Jia
Abstract: A mobile system that can detect viruses in real time is urgently needed, due to the combination of virus emergence and evolution with increasing global travel and transport. A biosensor called PAMONO (for Plasmon Assisted Microscopy of Nano-sized Objects) represents a viable technology for mobile real-time detection of viruses and virus-like particles. It could be used for fast and reliable diagnoses in hospitals, airports, the open air, or other settings. For analysis of the images provided by the sensor, state-of-the-art methods based on convolutional neural networks (CNNs) can achieve high accuracy. However, such computationally intensive methods may not be suitable for most mobile systems. In this work, we propose nanoparticle classification approaches based on frequency domain analysis, which are less resource-intensive. We observe that, on average, classification takes 29 µs per image for the Fourier features and 17 µs for the Haar wavelet features. Although the CNN-based method scores 1–2.5 percentage points higher in classification accuracy, it takes 3370 µs per image on the same platform. With these results, we identify and explore the trade-off between resource efficiency and classification performance for nanoparticle classification of images provided by the PAMONO sensor.
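A minimal sketch of the frequency-domain idea: describe a small image patch by a handful of low-frequency Fourier magnitudes and feed them to a linear classifier. The patch size, number of retained bins, and weights are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def fourier_features(patch: np.ndarray, k: int = 4) -> np.ndarray:
    """Low-frequency Fourier magnitudes of an image patch: cheap to
    compute and compact enough for resource-limited platforms."""
    spectrum = np.abs(np.fft.fft2(patch))
    return spectrum[:k, :k].ravel()  # keep only the k x k lowest bins

def classify(patch: np.ndarray, weights: np.ndarray, bias: float) -> bool:
    """Linear decision on the features: True = nanoparticle present."""
    return float(fourier_features(patch) @ weights + bias) > 0.0

rng = np.random.default_rng(0)
patch = rng.random((16, 16))        # stand-in for a sensor image patch
weights = rng.standard_normal(16)   # stand-in for trained weights
print(classify(patch, weights, bias=-40.0))
```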
Title: Realistic scheduling models and analyses for advanced real-time embedded systems
Authors: Brüggen, Georg von der
Abstract: Focusing on real-time scheduling theory, the thesis demonstrates how essential realistic scheduling models and analyses are for guaranteeing timing correctness without over-provisioning the necessary system resources. It details potential pitfalls of the de facto standards for the theoretical examination of scheduling algorithms and schedulability tests, namely resource augmentation bounds and utilization bounds, and proposes parametric augmentation functions to improve their meaningfulness. Considering uncertain execution behaviour, systems with dynamic real-time guarantees are introduced to model this scenario more realistically than mixed-criticality systems, and the first technique that allows the worst-case deadline failure probability to be calculated precisely for task sets with a realistic number of tasks is provided. Furthermore, hybrid self-suspension models are proposed that bridge the gap between the over-flexible dynamic and the over-restrictive segmented self-suspension model, with different tradeoffs between accuracy and flexibility.

Title: Energy-aware design of hardware and software for ultra-low-power systems
Authors: Buschhoff, Markus
Abstract: Future visions of the Internet of Things and Industry 4.0 demand large-scale deployments of mobile devices while removing the numerous disadvantages of using batteries: degradation, scale, weight, pollution, and costs. However, this requires computing platforms with extremely low energy consumption, and thus the employment of ultra-low-power hardware, energy-harvesting solutions, and highly efficient power-management hardware and software.
The goal of these power-management solutions is either to achieve power neutrality, a condition where energy harvest and energy consumption equalize while the service quality is maximized, or to enhance power efficiency in order to conserve energy reserves. To reach these goals, intelligent power-management decisions are needed that utilize precise energy data.
This thesis discusses the measurement of energy in embedded systems, both online and by external equipment, and the utilization of the acquired data for modeling the power-consumption states of each involved hardware component. Furthermore, a method is shown to use the resulting models by instrumenting preexisting device drivers. These drivers enable new functionalities, such as online energy accounting and energy application interfaces, and facilitate intelligent power-management decisions.
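A minimal sketch of such state-based online energy accounting, as an instrumented driver might perform it: each component carries a power state, and energy is integrated as power times time on every state change. The component, its states, and the power values are hypothetical.

```python
import time

# Hypothetical power model: state -> power draw in milliwatts.
RADIO_MODEL = {"off": 0.0, "idle": 1.5, "rx": 12.0, "tx": 17.4}

class EnergyAccountant:
    """Integrates energy as power x elapsed time across power-state
    changes, the way an instrumented device driver could do online."""
    def __init__(self, model, state="off"):
        self.model, self.state = model, state
        self.t_last, self.energy_mj = time.monotonic(), 0.0

    def set_state(self, new_state: str) -> None:
        now = time.monotonic()
        # mW x s = mJ: charge the time spent in the previous state.
        self.energy_mj += self.model[self.state] * (now - self.t_last)
        self.state, self.t_last = new_state, now

radio = EnergyAccountant(RADIO_MODEL)
radio.set_state("tx")      # the driver switches the radio on to transmit
time.sleep(0.05)           # ... transmission ...
radio.set_state("idle")
print(f"{radio.energy_mj:.2f} mJ consumed so far")
```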
In order to reduce the additional effort of reimplementing device drivers and to avoid violating the separation-of-concerns paradigm, the approach shown in this thesis synthesizes instrumentation aspects for an aspect-oriented programming language, so that the original device-driver source code remains unaffected.
Eventually, an automated process of energy measurement and data analysis is presented. This process is able to yield precise energy models with low manual effort. In combination with the instrumentation synthesis of aspect code, this method enables an accelerated creation process for energy models of ultra-low-power systems. For all proposed methods, empirical accuracy and overhead measurements are presented.
To support the claims of the author, first practical energy-aware and wireless-radio-networked applications are showcased: an energy-neutral light sensor, a photovoltaic-powered seminar-room door plate, and a sensor-network experiment testbed for research and education.

Title: Segmentation-free word spotting with bag-of-features hidden Markov models
Authors: Rothacker, Leonard
Abstract: The method that is proposed in this thesis makes document images searchable with minimal manual effort. This works in the query-by-example scenario, where the user selects an exemplary occurrence of the query word in a document image. Afterwards, an entire collection of document images is searched automatically. The major challenge is to detect relevant words and to sort them according to similarity to the query. However, recognizing text in historic document images is extremely challenging. Different historic document collections have highly irregular visual appearances due to non-standardized layouts or the large variabilities in handwritten script. An automatic text recognizer requires huge amounts of annotated samples from the collection that are usually not directly available.
In order to search document images with just a single example of the query word, the information that is available about the problem domain is integrated at various levels. Bag-of-features are a powerful image representation that can be adapted to the data automatically. The query word is represented with a hidden Markov model. This statistical sequence model is very suitable for the sequential structure of text. An important assumption is that the visual variability of the text within a single collection is limited. For example, this is typically the case if the documents have been written by only a few writers. Furthermore, the proposed method requires only minimal heuristic assumptions about the visual appearance of text. This is achieved by processing document images as a whole without requiring a given segmentation of the images on word level or on line level. The detection of potentially relevant document regions is based on similarity to the query. It is not required to recognize words in general. Word size variabilities can be handled by the hidden Markov model. In order to make the computationally costly application of the sequence model feasible in practice, regions are retrieved according to approximate similarity with an efficient model decoding algorithm. Since the approximate approach retrieves regions with high recall, re-ranking these regions with the sequence model leads to highly accurate word spotting results. In addition, the method can be extended to textual queries, i.e., query-by-string, if annotated samples become available.
The method is evaluated on five benchmark datasets. In the segmentation-free query-by-example scenario where no annotated sample set is available, the method outperforms all other methods that have been evaluated on any of these five benchmarks. If only a small dataset of annotated samples is available, the performance in the query-by-string scenario is competitive with the state-of-the-art.

Title: Design of fault-tolerant virtual execution environments for cyber-physical systems
Authors: Jablkowski, Boguslaw
Abstract: The last decade revealed the vast economical and societal potential of Cyber-Physical Systems (CPS) which integrate computation with physical processes. In order to better exploit this potential, designers of CPS are trying to take advantage of novel technological opportunities provided by the unprecedented efficiency of today's hardware. There are, however, considerable challenges to this endeavor.
First, there is a strong trend towards softwarization. Functions that were originally implemented in hardware are now being increasingly realized in software. This fact, together with the ever-growing functionality of modern CPS, translates to unrestrained code generation which, in turn, directly influences their safety and security. Second, the spreading adoption of multi-core and many-core architectures, due to their considerable increase in computation power, additionally generates issues related to timing properties, resource partitioning, task mapping and scalability.
In order to overcome these challenges, this thesis investigates the idea of adopting virtualization technology to the domain of CPS. Several research questions originate from this idea, and the following work aims at answering those questions. It addresses both technological and methodological issues. With respect to the technological aspects, it investigates problems and proposes solutions related to timing properties of a virtualized execution platform as well as the high-availability technique based thereon. Regarding the methodological aspects, it discusses models and methods for the planning of safe and efficient virtualized CPS compute and control clusters, and proposes architectures for the development and verification of virtualized CPS applications as well as for the testing of non-functional characteristics of the underlying software and hardware infrastructure. Further, through a set of experiments, this thesis thoroughly evaluates the proposed solutions.
Finally, based upon the provided results and some new considerations regarding the requirements of future CPS applications, it gives an outlook towards a generic virtualized execution platform architecture for emerging CPS.

Title: Optimization and analysis for dependable application software on unreliable hardware platforms
Authors: Chen, Kuan-Hsun
Abstract: As chip technology keeps shrinking towards higher densities and lower operating voltages, memory and logic components are now vulnerable to electromagnetic interference and radiation, leading to transient faults in the underlying hardware, which may jeopardize the correctness of software execution and cause so-called soft errors. To mitigate the threats of soft errors, embedded-software developers have started to deploy Software-Implemented Hardware Fault Tolerance (SIHFT) techniques. However, the main cost is the significant amount of time due to the additional computation of the SIHFT techniques. To support safety-critical systems, e.g., computing systems in automotive and avionic devices, real-time system technology has been primarily used and widely studied. When considering hardware transient faults and SIHFT techniques with real-time system technology, novel scheduling approaches and schedulability analyses are desired to provide a less pessimistic off-line guarantee for timeliness, or at least to provide a certain degree of performance for new application models. Moreover, reliability optimizations also need to be designed thoughtfully while considering different resource constraints.
In this dissertation, we present three treatments for soft errors. Firstly, we study how to allow erroneous computations without deadline misses by modeling inherent safety margins and noise tolerance in control applications as (m, k) constraints. We further discuss how a given (m, k) requirement can be satisfied by individual error detection and flexible compensation while satisfying the given hard real-time constraints.
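Assuming the usual reading of an (m, k) constraint, that at least m out of any k consecutive jobs must produce correct results, a small sketch for checking a job history looks as follows; the dissertation's error detection and compensation scheduling are, of course, more involved.

```python
from collections import deque

def satisfies_mk(results, m: int, k: int) -> bool:
    """Check the (m, k) requirement: every window of k consecutive
    jobs must contain at least m correct results (truthy entries)."""
    window = deque(maxlen=k)
    for ok in results:
        window.append(ok)
        if len(window) == k and sum(window) < m:
            return False
    return True

# 1 = correct execution, 0 = soft error tolerated by compensation.
history = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
print(satisfies_mk(history, m=3, k=5))          # True: >= 3 correct per 5
print(satisfies_mk([1, 0, 0, 1, 0], m=3, k=5))  # False: only 2 correct
```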
Secondly, we analyze the probability of deadline misses and the deadline miss rate in soft real-time systems, which allow occasional deadline misses without erroneous computations. Thirdly, we consider how to deploy redundant multi-threading techniques to improve the system reliability under two different system models for multi-core systems: (1) under core-to-core frequency variations, we address the reliability-aware task-mapping problem; (2) we decide on redundancy levels for each task while satisfying the given real-time constraints and the limited number of redundant cores, even under multi-tasking. Finally, an enhancement for real-time operating systems is provided to maintain strict periodicity for task overruns due to potential transient faults, especially on one popular platform, the Real-Time Executive for Multiprocessor Systems (RTEMS).

Title: Learning attribute representations with deep convolutional neural networks for word spotting
Authors: Sudholt, Sebastian
Abstract: Understanding the contents of handwritten texts from document images has long been a traditional field of research in computer science. The ultimate goal is to automatically transcribe the text in the images into an electronic format. This would make the documents from which the images were generated much easier to access and would also allow for a fast extraction of information. Especially for historical documents, a possibility to easily sift through large document image collections would be of high interest. There exist vast amounts of manuscripts all over the world storing substantial amounts of yet untapped information on cultural heritage. Being able to extract this information for large and diverse corpora would allow historians unprecedented insight into various aspects of ancient human life. The desired goal is thus to obtain information on the text embedded in digital document images with no manual human interaction at all.
A well-known approach for achieving this is to make use of models known from the field of pattern recognition and machine learning in order to classify the text in the images into electronic representations of characters or words. This approach is known as Optical Character Recognition or text recognition and belongs to the oldest applications of pattern recognition and computer science in general. Despite its long history, handwritten text recognition is still considered an unsolved task, as classification systems are still not able to consistently achieve results as are common for machine-printed text recognition. This is especially true for historical documents, as the text to be recognized typically exhibits different amounts of degradation as well as large variability in handwriting for the same characters and words.
Depending on the task at hand, a full transcription of the text might, however, not be necessary. If a potential user is only interested in whether a certain word or text portion is present in a given document collection or not, retrieval-based approaches are able to produce more robust results than recognition-based ones. These retrieval-based approaches compare parts of the document images to a sought-after query and decide if the individual parts are similar to the query. For a given method, the result is then a list of parts of the document images which are deemed relevant by the method. In the field of document image analysis, this retrieval approach is known as keyword spotting or simply word spotting. Word spotting is the problem of interest in this thesis.
In particular, a method is presented which allows for using neural network models in order to approach different word spotting tasks. This method is inspired by a recent state-of-the-art approach which utilizes semantic attributes for word spotting. In pattern recognition and computer vision, semantic attributes describe characteristics of classes which may be shared between classes. This sharing ability enables attribute representations to encode which parts of different classes are common and which are not. For example, when classifying animals, the classes tiger and zebra may share an attribute striped. For word spotting, attributes have been used to encode the occurrence and position of certain characters. The success of any attribute-based method is, of course, highly dependent on the ability of a classifier to correctly predict the individual attributes.
In order to accomplish an accurate prediction of attributes for word spotting tasks, the use of Convolutional Neural Networks (CNNs) is proposed in this thesis. CNNs have recently attracted a substantial amount of research interest, as they are able to consistently achieve state-of-the-art results in virtually all fields of computer vision. Their main advantage compared to other methods is their ability to jointly optimize a classifier and the feature representations obtained from the images. This characteristic is known as end-to-end learning. While CNNs have been used extensively for classifying data into one of multiple classes for various tasks, predicting attributes with these neural networks has largely been done for face and fashion attributes only. For the method presented in this thesis, a CNN is trained to predict attribute representations extracted from word strings in an end-to-end fashion. These attributes are leveraged in order to perform word spotting.
The core contribution lies in the design and evaluation of different neural network architectures which are specifically designed to be applied to document images. A big part of this design is to determine suitable loss functions for the CNNs. Loss functions are a crucial ingredient in the training of neural networks in general and largely determine what kind of annotations the individual networks are able to learn for the given images. In particular, two loss functions are derived, which allow for learning binary attribute representations as well as real-valued representations that can be considered attribute-like. Besides the loss functions, the second major contribution is the design of three CNN architectures which are tailor-made for being applied to problems involving handwritten text as data. Using the loss functions and the three architectures, a number of experiments are conducted in which the neural networks are trained to predict the attribute or attribute-like representations Pyramidal Histogram of Characters (PHOC), Spatial Pyramid of Characters (SPOC) and Discrete Cosine Transform of Words (DCToW).
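As an illustration of such an attribute representation, the following sketch builds a simplified PHOC vector: for each pyramid level, the word is split into regions, and one bit per (region, character) pair records the character's presence. The original PHOC uses an interval-overlap rule and further levels; this version simply assigns characters by their centre position.

```python
import string

ALPHABET = string.ascii_lowercase + string.digits

def phoc(word: str, levels=(1, 2, 3)) -> list:
    """Simplified PHOC: at level l the word is split into l regions;
    a bit is set per (region, character) if the character's centre
    falls into that region."""
    word = word.lower()
    vec = []
    for l in levels:
        for region in range(l):
            lo, hi = region / l, (region + 1) / l
            chars = set()
            for i, ch in enumerate(word):
                centre = (i + 0.5) / len(word)
                if lo <= centre < hi:
                    chars.add(ch)
            vec.extend(1 if c in chars else 0 for c in ALPHABET)
    return vec

v = phoc("spotting")
print(len(v))  # 6 regions x 36 characters = 216 bits
```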
It is shown experimentally that the proposed approach of using neural networks for predicting attribute representations achieves state-of-the-art results on various word spotting benchmarks.

Title: Partially supervised learning of models for visual scene and object recognition
Authors: Grzeszick, René
Abstract: When creating a visual recognition system for a novel task, one of the main burdens is the collection and annotation of data. Often several thousand samples need to be manually reviewed and labeled so that the recognition system achieves the desired accuracy. The goal of this thesis is to provide methods that lower the annotation effort for visual scene and object recognition. These methods are applicable to traditional pattern recognition approaches as well as methods from the field of deep learning. The contributions are three-fold and range from feature augmentation, over semi-supervised learning for natural scene classification to zero-shot object recognition.
The contribution in the field of feature augmentation deals with handcrafted feature representations. A novel method for incorporating additional information at feature level has been introduced. This information is subsequently integrated in a Bag-of-Features representation. The additional information can, for example, be of spatial or temporal nature, encoding a local feature's position within a sample in its feature descriptor. The information is quantized and appended to the feature vector and thus also integrated in the unsupervised learning step of the Bag-of-Features representation. As a result more specific codebook entries are computed for different regions within the samples.
The results in the field of image classification for natural scenes and objects, as well as in the field of acoustic event detection, show that the proposed approach allows for learning compact feature representations without reducing the accuracy of the subsequent classification.
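A minimal numpy sketch of the feature-augmentation idea: quantized (x, y) positions are appended to each local descriptor before the codebook step, so the resulting Bag-of-Features histogram becomes location-aware. The toy codebook below (random subsampling) merely stands in for the unsupervised clustering.

```python
import numpy as np

def augment_with_position(descriptors, positions, bins=4):
    """Append a quantized (x, y) position to each local descriptor, so
    the later unsupervised codebook learning becomes location-aware."""
    quantized = np.floor(positions * bins) / bins   # positions in [0, 1)
    return np.hstack([descriptors, quantized])

def bag_of_features(descriptors, codebook):
    """Standard BoF step: histogram of nearest codebook entries."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assignment = d2.argmin(axis=1)
    return np.bincount(assignment, minlength=len(codebook))

rng = np.random.default_rng(1)
desc = rng.random((100, 32))   # stand-in local features of one image
pos = rng.random((100, 2))     # their normalized image coordinates
aug = augment_with_position(desc, pos)
codebook = aug[rng.choice(100, 16, replace=False)]  # toy codebook
print(bag_of_features(aug, codebook))
```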
In the field of semi-supervised learning, a novel approach for learning annotations in large image collections of natural scene images has been proposed. The approach is based on the active learning principle and incorporates multiple views on the data. The views, i.e. different feature representations, are clustered independently of each other. A human in the loop is asked to label each data cluster. The clusters are then iteratively refined based on cluster evaluation measures and additional labels are assigned to the dataset. Ultimately, a voting over all views creates a partially labeled sample set that is used for training a classifier.
The results on natural scene images show that a powerful visual classifier can be learned with minimal annotation effort. The approach has been evaluated for traditional handcrafted features as well as for features derived from a convolutional neural network. For semi-supervised learning it is desirable to have a compact feature representation; for traditional features, those obtained by the proposed feature augmentation approach are a good example of such a representation. Especially for applications in the field of deep learning, which usually require large amounts of labeled samples for training or even for adapting a deep neural network, the semi-supervised learning approach is beneficial.
For the zero-shot object prediction, a method that combines visual and semantic information about natural scenes is proposed. A convolutional neural network is trained in order to distinguish different scene categories. Furthermore, the relations between scene categories and visual object classes are learned based on their semantic relation in large text corpora. The probability for a given image to show a certain scene is derived from the network and combined with the semantic relations based on a statistical approach. This allows for predicting the presence of certain object classes in an image without having any visual training sample from any of the object classes.
The results on a challenging dataset depicting various objects in natural scene images show that, especially in cluttered scenes, the semantic relations can be a powerful information cue. Furthermore, when post-processing the results of a visual object predictor, the detection accuracy can be improved at the minimal cost of providing additional scene labels.
When combining these contributions, it is shown that a scene classifier can be trained with minimal human effort and its predictions can still be leveraged for object prediction. Thus, information about natural scene images and the object classes within these images can be gained without having the burden to manually label tremendous amounts of images beforehand.

Title: Methods for efficient resource utilization in statistical machine learning algorithms
Authors: Kotthaus, Helena
Abstract: In recent years, statistical machine learning has emerged as a key technique for tackling problems that elude a classic algorithmic approach. One such problem, with a major impact on human life, is the analysis of complex biomedical data. Solving this problem in a fast and efficient manner is of major importance, as it enables, e.g., the prediction of the efficacy of different drugs for therapy selection. While achieving the highest possible prediction quality appears desirable, doing so is often simply infeasible due to resource constraints. Statistical learning algorithms for predicting the health status of a patient or for finding the best algorithm configuration for the prediction require an excessively high amount of resources. Furthermore, these algorithms are often implemented with no awareness of the underlying system architecture, which leads to sub-optimal resource utilization.
This thesis presents methods for efficient resource utilization of statistical learning applications. The goal is to reduce the resource demands of these algorithms to meet a given time budget while simultaneously preserving the prediction quality. As a first step, the resource consumption characteristics of learning algorithms are analyzed, as well as their scheduling on underlying parallel architectures, in order to develop optimizations that enable these algorithms to scale to larger problem sizes. For this purpose, new profiling mechanisms are incorporated into a holistic profiling framework.
The results show that one major contributor to the resource issues is memory consumption. To overcome this obstacle, a new optimization based on dynamic sharing of memory is developed that speeds up computation by several orders of magnitude in situations where available main memory is the bottleneck and memory is swapped out. One important application, which can be used for automated parameter tuning of learning algorithms, is model-based optimization. Within a huge search space, algorithm configurations are evaluated to find the configuration with the best prediction quality. An important step towards better managing this search space is to parallelize the search process itself.
However, a high runtime variance within the configuration space can cause inefficient resource utilization. For this purpose, new resource-aware scheduling strategies are developed that efficiently map evaluations of configurations to the parallel architecture, depending on their resource demands. In contrast to classical scheduling problems, the new scheduling interacts with the configuration proposal mechanism to select configurations with suitable resource demands. With these strategies, it becomes possible to make use of the full potential of parallel architectures.
Compared to established parallel execution models, the results show that the new approach enables model-based optimization to converge faster to the optimum within a given time budget.

Title: Efficient implementation of resource-constrained cyber-physical systems using multi-core parallelism
Authors: Neugebauer, Olaf
Abstract: The quest for more performance of applications and systems has become more challenging in recent years. Especially in the cyber-physical and mobile domains, the performance requirements have increased significantly. Applications previously found in the high-performance domain now emerge in the resource-constrained domain. Modern heterogeneous high-performance MPSoCs provide a solid foundation to satisfy the high demand. Such systems combine general processors with specialized accelerators ranging from GPUs to machine learning chips. On the other side of the performance spectrum, the demand for small energy-efficient systems exposed by modern IoT applications has increased vastly. Developing efficient software for such resource-constrained multi-core systems is an error-prone, time-consuming and challenging task. With PA4RES, this thesis provides a holistic semiautomatic approach to parallelize and implement applications for such platforms efficiently. Our solution supports the developer in finding good trade-offs to tackle the requirements exposed by modern applications and systems. With PICO, we propose a comprehensive approach to express parallelism in sequential applications. PICO detects data dependencies and implements the required synchronization automatically. Using a genetic algorithm, PICO optimizes the data synchronization. The evolutionary algorithm considers channel capacity, memory mapping, channel merging and the flexibility offered by the channel implementation with respect to execution time, energy consumption and memory footprint. PICO's communication optimization phase was able to generate a speedup of almost 2 or an energy improvement of 30% for certain benchmarks.
The PAMONO sensor approach enables fast detection of biological viruses using optical methods. With sophisticated virus detection software, real-time virus detection running on stationary computers was achieved.
Within this thesis, we were able to derive a soft real-time capable virus detection running on a high-performance embedded system, commonly found in today's smartphones. This was accomplished with a smart DSE algorithm which optimizes for execution time, energy consumption and detection quality. Compared to a baseline implementation, our solution achieved a speedup of 4.1 and 87% energy savings and satisfied the soft real-time requirements. Accepting a degradation of the detection quality, which is still usable in a medical context, led to a speedup of 11.1. This work provides the fundamentals for a truly mobile real-time virus detection solution. The growing demand for processing power can no longer be satisfied by well-known approaches such as higher frequencies. These so-called performance walls pose a serious challenge to the growing performance demand. Approximate computing is a promising approach to overcome, or at least shift, the performance walls by accepting a degradation in output quality to gain improvements in other objectives. Especially for the safe integration of approximation into existing applications, or during the development of new approximation techniques, a method to assess the impact on output quality is essential.
With QCAPES, we provide a multi-metric assessment framework to analyze the impact of approximation.
Furthermore, QCAPES provides useful insights into the impact of approximation on execution time and energy consumption. With ApproxPICO, we propose an extension to PICO to consider approximate computing during the parallelization of sequential applications.2018-01-01T00:00:00ZAcoustic sensor network geometry calibration and applicationsPlinge, Axelhttp://hdl.handle.net/2003/363432018-01-26T02:40:48Z2017-01-01T00:00:00ZTitle: Acoustic sensor network geometry calibration and applications
Authors: Plinge, Axel
Abstract: In the modern world, we are increasingly surrounded by computation devices with communication links and one or more microphones.
Such devices are, for example, smartphones, tablets, laptops or hearing aids. These devices can work together as nodes in an acoustic sensor network (ASN). Such networks are a growing platform that opens the possibility for many practical applications. ASN-based speech enhancement, source localization, and event detection can be applied for teleconferencing, camera control, automation, or assisted living. For these kinds of applications, the awareness of auditory objects and their spatial positioning are key properties. In order to provide these two kinds of information, novel methods have been developed in this thesis. Information on the type of auditory objects is provided by a novel real-time sound classification method. Information on the position of human speakers is provided by a novel localization and tracking method. In order to localize with respect to the ASN, the relative arrangement of the sensor nodes has to be known. Therefore, different novel geometry calibration methods were developed.
Sound classification
The first method addresses the task of identifying auditory objects. A novel application of the bag-of-features (BoF) paradigm to acoustic event classification and detection was introduced. It can be used for event and speech detection as well as for speaker identification.
The use of both mel frequency cepstral coefficient (MFCC) and Gammatone frequency cepstral coefficient (GFCC) features improves the classification accuracy. By using soft quantization and introducing supervised training for the BoF model, superior accuracy is achieved. The method generalizes well from limited training data. It works online and can be computed in a fraction of real time.
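A minimal sketch of the soft-quantization step of a BoF representation, assuming numpy and a precomputed codebook; the Gaussian kernel, codebook size and toy data are illustrative assumptions, not the thesis's trained BoF model:

import numpy as np

def soft_bof_histogram(frames, codebook, sigma=1.0):
    # Soft-quantize feature frames (e.g., MFCC/GFCC vectors) against a codebook:
    # each frame contributes to all codewords, weighted by a Gaussian kernel on
    # the Euclidean distance, instead of voting only for its nearest codeword.
    hist = np.zeros(len(codebook))
    for f in frames:
        d2 = np.sum((codebook - f) ** 2, axis=1)   # squared distances to all codewords
        w = np.exp(-d2 / (2.0 * sigma ** 2))       # soft assignment weights
        hist += w / max(w.sum(), 1e-12)            # normalized contribution of this frame
    return hist / max(hist.sum(), 1e-12)           # normalized BoF histogram

# Toy usage: 100 random 13-dimensional "MFCC" frames against a 32-word codebook.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(32, 13))
frames = rng.normal(size=(100, 13))
print(soft_bof_histogram(frames, codebook).shape)  # (32,)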
By a dedicated training strategy based on a hierarchy of stationarity, the detection of speech in mixtures with noise was realized. This makes the method robust against severe noise levels corrupting the speech signal. Thus it is possible to provide control information to a beamformer in order to realize blind speech enhancement. A reliable improvement is achieved in the presence of one or more stationary noise sources.
Speaker localization
The localization method enables each node to determine the direction of arrival (DoA) of concurrent sound sources. The author's neuro-biologically inspired speaker localization method for microphone arrays was refined for use in ASNs. By implementing a dedicated cochlear and midbrain model, it is robust against the reverberation found in indoor rooms. In order to better model the unknown number of concurrent speakers, an application of the EM algorithm that realizes probabilistic clustering according to auditory scene analysis (ASA) principles was introduced.
Based on this approach, a system for Euclidean tracking in ASNs was designed. Each node applies the node-wise localization method and shares probabilistic DoA estimates together with an estimate of the spectral distribution with the network. As this information is relatively sparse, it can be transmitted with low bandwidth. The system is robust against jitter and transmission errors. The information from all nodes is integrated according to spectral similarity to correctly associate concurrent speakers. By incorporating the intersection angle in the triangulation, the precision of the Euclidean localization is improved. Tracks of concurrent speakers are computed over time, as is shown with recordings in a reverberant room.
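The benefit of incorporating the intersection angle can be made concrete with a minimal two-node triangulation sketch; the positions, angles and the sine-based confidence weight are illustrative assumptions, not the thesis's exact formulation:

import numpy as np

def triangulate(p1, theta1, p2, theta2):
    # Intersect two DoA bearing lines from nodes at 2-D positions p1, p2 with
    # absolute bearing angles theta1, theta2 (radians). Near-parallel bearings
    # make the system ill-conditioned, which the intersection-angle weight reflects.
    d1 = np.array([np.cos(theta1), np.sin(theta1)])
    d2 = np.array([np.cos(theta2), np.sin(theta2)])
    A = np.column_stack([d1, -d2])
    t = np.linalg.solve(A, np.asarray(p2, float) - np.asarray(p1, float))  # line parameters
    source = np.asarray(p1, float) + t[0] * d1
    weight = abs(np.sin(theta1 - theta2))  # 1.0 at right angles, 0.0 for parallel bearings
    return source, weight

src, w = triangulate((0, 0), np.pi / 4, (4, 0), 3 * np.pi / 4)
print(src, w)  # approx. (2, 2) with weight 1.0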
Geometry calibration
The central task of geometry calibration has been solved with special focus on sensor nodes equipped with multiple microphones. Novel methods were developed for different scenarios. An audio-visual method was introduced for the calibration of ASNs in video conferencing scenarios. The DoA estimates are fused with visual speaker tracking in order to provide sensor positions in a common coordinate system.
A novel acoustic calibration method determines the relative positioning of the nodes from ambient sounds alone. Unlike previous methods that only infer the positioning of distributed microphones, the DoA is incorporated, and thus it becomes possible to calibrate the orientation of the nodes with high accuracy. This is very important for all applications using the spatial information, as the triangulation error increases dramatically with bad orientation estimates. Since speech events can be used, calibration becomes possible without the requirement of playing dedicated calibration sounds.
Based on this, an online method employing a genetic algorithm with incremental measurements was introduced. By using the robust speech localization method, the calibration is computed in parallel to the tracking. The online method is able to calibrate ASNs in real time, as is shown with recordings of natural speakers in a reverberant room.
The informed acoustic sensor network
All new methods are important building blocks for the use of ASNs. The online methods for localization and calibration both make use of the neuro-biologically inspired processing in the nodes, which leads to state-of-the-art results, even in reverberant enclosures. The high robustness and reliability can be improved even further by including the event detection method in order to exclude non-speech events. When all methods are combined, both semantic information on what is happening in the acoustic scene and spatial information on the positioning of the speakers and sensor nodes are automatically acquired in real time. This realizes truly informed audio processing in ASNs. Practical applicability is shown by application to recordings in reverberant rooms. The contribution of this thesis is thus not only to advance the state of the art in automatically acquiring information on the acoustic scene, but also to push the practical applicability of such methods.2017-01-01T00:00:00ZCo-Konfiguration von Hardware- und Systemsoftware-ProduktlinienMeier, Matthiashttp://hdl.handle.net/2003/360062017-06-24T02:00:17Z2017-01-01T00:00:00ZTitle: Co-Konfiguration von Hardware- und Systemsoftware-Produktlinien
Authors: Meier, Matthias
Abstract: Hardware architectures in the context of embedded systems are becoming ever more complex and will increasingly move towards multi- or many-core systems. So that these systems can deliver their full performance for the often highly specialized tasks found in embedded systems, entire branches of research are concerned with tailoring such systems to specific applications. The popularity of hardware description languages contributes its share to this development. However, even with hardware description languages and the higher abstraction level they provide, developing such systems remains laborious and error-prone.
The use of hardware description languages, on the other hand, blurs the line between hardware and software, since hardware can now be described in textual form, much like software. This opens up opportunities for transferring concepts from software development to hardware development. One concept for coping with the growing complexity of software development is the organized reuse of components as practiced in product-line engineering. To what extent product-line concepts can be transferred to hardware architectures, and how hardware product lines can be designed, is examined in detail in this work. The advantages of product-line techniques, such as the reuse of proven and reliable components, could then also be exploited for hardware architectures in order to reduce development complexity and to develop application-specific hardware architectures with considerably less effort. In addition, the shared code base of a product line enables a faster time to market at lower development cost.
Building on these new concepts, this work also addresses the question of how such parallel systems can be programmed and automatically optimized in the future, supporting the developer with an automated tool chain reaching from the application through the system software down to the hardware. The focus lies on the techniques designed in this work for the end-to-end configuration of hardware and system software. These techniques essentially rely on the programming interfaces between the layers, whose access patterns can be analyzed statically. The configuration information obtained in this way can then be used to automatically tailor the system-software and hardware product lines to a specific application scenario.
The application-specific optimization of the systems is carried out in this work by means of a design-space exploration. The focus of this design-space exploration, however, is not solely on the hardware architecture; it covers the software level as well. Besides tailoring the system software, the application built on top of a parallel programming interface is also scaled automatically within the design-space exploration in order to exploit the performance of many-core systems.2017-01-01T00:00:00ZMemory-aware platform description and framework for source-level embedded MPSoC software optimizationPyka, Roberthttp://hdl.handle.net/2003/360032017-06-24T02:00:08Z2017-01-01T00:00:00ZTitle: Memory-aware platform description and framework for source-level embedded MPSoC software optimization
Authors: Pyka, Robert
Abstract: Developing optimizing source-level transformations consists of numerous non-trivial subtasks. Besides identifying actual optimization goals within a particular target-platform and compiler setup, the actual implementation is a tedious, error-prone and often recurring work. Providing appropriate support for this development work is a challenging task. Defining and implementing a well-suited target-platform description which can be used by a wide set of optimization techniques while being precise and easy to maintain is one dimension of this challenge. Another dimension, which has also been tackled in this work, deals with the provision of an infrastructure for optimization-step representation, interaction and data retention. Finally, an appropriate source-code representation has been integrated into this approach. These contributions are tightly related to each other; they have been bundled into the MACCv2 framework, a full-fledged optimization-technique implementation and integration approach. Together, they significantly reduce the effort required for the implementation of source-level memory-aware optimization techniques for Multi Processor Systems on a Chip (MPSoCs).
The system-modeling approach presented in this dissertation has been located at the processor-memory-switch (PMS) abstraction level. It offers a novel combined structural and semantical description. It combines a locally-scoped, structural modeling approach, as preferred by system designers, and a fast, database-like interface, best suited for optimization technique developers. It supports model refinement and requires only limited effort for an initial abstract system model.
The general structure consists of components and channels. Based on this structure, the system model provides mechanisms for database-like access to system-global target-platform properties, while requiring only definition of locally-scoped input data annotated to system-model items. A typical set of these properties contains energy-consumption and access-latency values. The request-based retrieval of system properties is a unique feature, which makes this approach superior to state-of-the-art table-lookup-based or full-system-simulation-based approaches.
Combining such component-local properties into system-global target-platform data is performed via aspect handlers. These handlers define computational rules which are applied to correlated locally-scoped data along access paths in the memory-subsystem hierarchy. This approach is capable of calculating these system-global values at a rate similar to plain table lookups, while maintaining a precision close to full-system-simulation-based estimations. This has been shown for both energy-consumption and access-latency values of the MPARM platform.
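The request-based property retrieval can be pictured with a minimal sketch; Component, HANDLERS and query are invented names for illustration, not the MACCv2 API, and the annotation values are assumptions:

class Component:
    def __init__(self, name, latency_cycles, energy_nj):
        self.name = name
        self.latency_cycles = latency_cycles  # locally annotated access latency
        self.energy_nj = energy_nj            # locally annotated access energy

# Illustrative aspect handlers: rules that fold local annotations along an access path.
HANDLERS = {
    "latency": lambda path: sum(c.latency_cycles for c in path),
    "energy":  lambda path: sum(c.energy_nj for c in path),
}

def query(property_name, access_path):
    # Database-like lookup of a system-global property for one access path
    # through the memory hierarchy (e.g., CPU -> bus -> SRAM).
    return HANDLERS[property_name](access_path)

cpu = Component("cpu", 1, 0.1)
bus = Component("bus", 2, 0.4)
sram = Component("sram", 1, 0.5)
print(query("latency", [cpu, bus, sram]))  # 4 cycles for this path
print(query("energy",  [cpu, bus, sram]))  # 1.0 nJ for this path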
The MACCv2 framework provides a set of fundamental services to the optimization technique developer. On top of these services, a system model and source-code representation are provided. Further, framework-based optimization-technique implementations are encapsulated into self-contained entities exposing well-defined interfaces.
This framework has been successfully used within the European Commission funded MNEMEE project. The hierarchical processing-step representation in MACCv2 allows for the encapsulation of tasks at various granularity levels. For simplified reuse in future projects, the entire toolchain as well as individual optimization techniques have been represented as processing-step entities in terms of MACCv2. A common notion of target-platform structure and properties, as well as inter-processing-step communication, is achieved via framework-provided services.
The system-modeling approach and the framework show the right set of properties needed to support the development of memory-aware optimization techniques. The MNEMEE project, continued research work, teaching activities and PhD theses have successfully built on the approaches and the framework proposed in this dissertation.2017-01-01T00:00:00ZScheduling algorithms and timing analysis for hard real-time systemsHuang, Wen-Hung Kevinhttp://hdl.handle.net/2003/359842017-06-09T02:00:08Z2017-01-01T00:00:00ZTitle: Scheduling algorithms and timing analysis for hard real-time systems
Authors: Huang, Wen-Hung Kevin
Abstract: Real-time systems are designed for applications in which response time is critical. As timing is a major property of such systems, proving timing correctness is of utmost importance. To achieve this, a two-fold approach of timing analysis is traditionally involved: (i) worst-case execution time (WCET) analysis, which computes an upper bound on the execution time of a single job of a task running in isolation; and (ii) schedulability analysis using the WCET as input, which determines whether multiple tasks are guaranteed to meet their deadlines. Formal models used for representing recurrent real-time tasks have traditionally been characterized by a collection of independent jobs that are released periodically. However, such modeling may result in resource under-utilization in systems whose behaviors are not entirely periodic or independent. Examples are (i) multicore platforms where tasks share a communication fabric, like a bus, for accesses to a shared memory besides processors; (ii) tasks with synchronization, where no two concurrent accesses to one shared resource are allowed to be in their critical sections at the same time; and (iii) automotive systems, where tasks are linked to rotation (e.g., of the crankshaft, gears, or wheels) and their activation rate is proportional to the angular velocity of a specific device. This dissertation presents multiple approaches towards designing scheduling algorithms and schedulability analyses for a variety of real-time systems with different characteristics. Specifically, we look at those design problems from the perspective of the speedup factor, a metric that quantifies both the pessimism of the analysis and the non-optimality of the scheduling algorithm. The proposed solutions are shown to be promising not only in terms of speedup factor but also through extensive evaluations.2017-01-01T00:00:00ZAspect-oriented technology for dependable operating systemsBorchert, Christophhttp://hdl.handle.net/2003/359752017-05-27T02:00:11Z2017-01-01T00:00:00ZTitle: Aspect-oriented technology for dependable operating systems
Authors: Borchert, Christoph
Abstract: Modern computer devices exhibit transient hardware faults that disturb the electrical behavior but do not cause permanent physical damage to the devices. Transient faults are caused by a multitude of sources, such as fluctuation of the supply voltage, electromagnetic interference, and radiation from the natural environment. Therefore, dependable computer systems must incorporate methods of fault tolerance to cope with transient faults. Software-implemented fault tolerance represents a promising approach that does not need expensive hardware redundancy for reducing the probability of failure to an acceptable level.
This thesis focuses on software-implemented fault tolerance for operating systems because they are the most critical pieces of software in a computer system: all computer programs depend on the integrity of the operating system. However, the C/C++ source code of common operating systems tends to be exceedingly complex already, so that a manual extension by fault tolerance is not a viable solution. Thus, this thesis proposes a generic solution based on Aspect-Oriented Programming (AOP).
To evaluate AOP as a means to improve the dependability of operating systems, this thesis presents the design and implementation of a library of aspect-oriented fault-tolerance mechanisms. These mechanisms constitute separate program modules that can be integrated automatically into common off-the-shelf operating systems using a compiler for the AOP language. Thus, the aspect-oriented approach facilitates improving the dependability of large-scale software systems without affecting the maintainability of the source code. The library allows choosing between several error-detection and error-correction schemes, and provides wait-free synchronization for handling asynchronous and multi-threaded operating-system code.
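The weaving idea behind such aspect-oriented fault-tolerance mechanisms can be illustrated outside of C++; the following Python decorator applies triple-modular redundancy to a pure function as an analogy only, since the dissertation's mechanisms are AspectC++ modules woven into operating-system code:

import functools

def triple_modular(fn):
    # Aspect-like wrapper: execute a (pure) function three times and take a
    # majority vote, masking a single transient fault. A decorator is Python's
    # closest analogue to weaving a cross-cutting concern into existing code.
    @functools.wraps(fn)
    def wrapped(*args, **kwargs):
        results = [fn(*args, **kwargs) for _ in range(3)]
        for r in results:
            if results.count(r) >= 2:
                return r  # the majority value wins
        raise RuntimeError("uncorrectable: all three runs disagree")
    return wrapped

@triple_modular
def checksum(data):
    return sum(data) & 0xFFFF

print(checksum([1, 2, 3]))  # 6, computed redundantly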
This thesis evaluates the aspect-oriented approach to fault tolerance on the basis of two off-the-shelf operating systems. Furthermore, the evaluation also considers one user-level program for protection, as the library of fault-tolerance mechanisms is highly generic and transparent and, thus, not limited to operating systems. Exhaustive fault-injection experiments show an excellent trade-off between runtime overhead and fault tolerance, which can be adjusted and optimized by fine-grained selective placement of the fault-tolerance mechanisms. Finally, this thesis provides evidence for the effectiveness of the approach in detecting and correcting radiation-induced hardware faults: High-energy particle radiation experiments confirm improvements in fault tolerance by almost 80 percent.2017-01-01T00:00:00ZMemory-aware mapping strategies for heterogeneous MPSoC systemsHolzkamp, Oliverahttp://hdl.handle.net/2003/359582017-05-09T02:00:11Z2017-01-01T00:00:00ZTitle: Memory-aware mapping strategies for heterogeneous MPSoC systems
Authors: Holzkamp, Olivera
Abstract: Embedded systems, such as mobile phones, integrate more and more features, e.g. multiple cameras, GPS sensors and many other sensors and actuators. These kinds of embedded systems deal with increasing complexity due to demands on performance and constraints on energy consumption. The performance of such systems can be increased by executing application tasks in parallel. To achieve this, multiprocessor systems-on-chip (MPSoC) devices were introduced. On the other hand, the energy consumption of these systems has to be decreased, especially for battery-driven embedded systems. A reduction in energy consumption can be achieved by efficiently utilizing the hardware resources on these devices. MPSoC devices can be either homogeneous or heterogeneous. Homogeneous MPSoC devices usually contain the same type of processors with the same speed, i.e. clock frequency, and the same type and size of memories for each processor. In heterogeneous MPSoC devices, the processor types and/or clock frequencies and memory types and/or sizes may vary.
During the last decade, research has dealt with optimizations for the efficient utilization of hardware resources on MPSoCs. Central issues are the extraction of parallelism from sequential code and the efficient mapping of the parallelized application tasks onto the processors of the system. A few frameworks have been developed which distribute parallelized application tasks to the available processors while optimizing for one or more objectives such as performance and energy consumption. They usually integrate all required foregoing steps, such as the extraction of parallelized tasks from sequential code and the extraction of a task graph as input for the mapping optimization. These steps are performed either manually or in an automated way. These kinds of frameworks help the embedded system designer to significantly reduce design time. Unfortunately, the influence of memories or memory hierarchies is neglected in mapping optimizations, even though it is a well-known fact that memories have a drastic impact on the runtime and energy consumption of the system.
This dissertation investigates the effect of memory hierarchies in MPSoC mapping. Since a thread-based application model is used, a thread graph extraction tool is introduced. Furthermore, two approaches for memory-aware mapping optimization for homogeneous and heterogeneous embedded MPSoC devices are presented. The thread graph extraction tool extracts a flat thread graph with important annotations for software requirements, hardware performance and energy consumption. This thread graph represents all required input information for the subsequent memory-aware mapping optimizations. Depending on the complexity of the application, the designer can choose between a fine-grained and a coarse-grained thread graph and thus influence the overall design time.
The first presented memory-aware mapping approach handles single-objective optimizations, which reduce either the runtime or the energy consumption of the system. The second presented memory-aware mapping approach handles a multi-objective optimization, which reduces both runtime and energy consumption. All approaches additionally reduce the work of the embedded system designer and thus the design time. They work in a fully automated way and are integrated within the MACCv2/MNEMEE tool flow. The MNEMEE tool flow also provides all required foregoing steps, such as the parallelization of sequential application code. The presented evaluations show that considering memory mapping during MPSoC mapping optimization significantly reduces the application runtime and energy consumption. The single-objective optimizations achieve an average reduction in runtime of about 21% and an average reduction in energy consumption of about 28%. The multi-objective memory-aware mapping optimization achieves an average reduction in runtime of about 21% and an average reduction in energy consumption of about 26%. Both presented optimization approaches were validated for homogeneous and heterogeneous MPSoC devices. The results clearly show that neglecting the memory subsystem can lead to wasted optimization potential.2017-01-01T00:00:00ZModeling and training options for handwritten Arabic text recognitionAhmad, Irfanhttp://hdl.handle.net/2003/358992017-03-25T03:00:11Z2016-01-01T00:00:00ZTitle: Modeling and training options for handwritten Arabic text recognition
Authors: Ahmad, Irfan2016-01-01T00:00:00ZDie Detektion interessanter Objekte unter Verwendung eines objektbasierten AufmerksamkeitsmodellsNaße, Fabianhttp://hdl.handle.net/2003/357832017-02-09T03:00:07Z2016-01-01T00:00:00ZTitle: Die Detektion interessanter Objekte unter Verwendung eines objektbasierten Aufmerksamkeitsmodells
Authors: Naße, Fabian
Abstract: The human visual system is able to effortlessly cope with complex tasks such as recognizing objects and persons. Computer vision denotes a field of research centered on the question of how comparable capabilities can be achieved in technical systems. In this dissertation, the principle of visual attention, an important aspect of the human visual system, is considered in this regard. It states that conscious perception is preceded by an unconscious process through which attention is selectively directed to potentially important or interesting visual content. This is a strategy of efficient information processing that allows fast reactions to relevant content. In this context, the notion of visual saliency denotes the property of visual content to stand out from its surroundings and therefore to attract attention. In general, such content has a comparatively high probability of being of interest to the observing individual. The subject of this work is attention-based object detection. The topic is motivated as an alternative to knowledge-based object detection methods, in which classification models are trained using annotated example images. Such methods generally involve a high manual preparation effort, exhibit high complexity, and scale poorly with the number of object categories considered. The central question of this work is therefore whether saliency can be used as a criterion for a more efficient localization of objects in images. Building on the thesis that it is precisely the interesting objects of a scene that are visually salient, an attention-based approach is intended to enable a fast and low-effort detection of such objects. This work first explains important foundations from the fields of pattern recognition, machine learning, and image processing. Subsequently, classical strategies for localizing objects in images are presented, and the advantages and disadvantages of different localization strategies are considered with respect to the attention-based approach. After that, fundamental concepts as well as influential theories and models of human visual attention are presented, followed by a review of mathematical attention models from the literature. Building on this, an attention model of our own is proposed that determines object proposals and ranks them by their saliency. For the sake of generic applicability, a purely data-driven approach is favored that deliberately refrains from using problem-specific prior knowledge. The method is finally evaluated on a challenging benchmark, where comparisons with other models from the literature highlight the advantages of the proposed methods.
Furthermore, the discussion of the results shows that saliency is an important criterion for the generic localization of objects in complex images.2016-01-01T00:00:00ZLampung handwritten character recognitionJunaidi, Akmalhttp://hdl.handle.net/2003/353212017-04-28T08:11:54Z2016-01-01T00:00:00ZTitle: Lampung handwritten character recognition
Authors: Junaidi, Akmal
Abstract: The Lampung script is a local script from Lampung province, Indonesia. It is a non-cursive script written from left to right and consists of 20 characters. It also has 7 unique diacritics that can be placed on top of, below, or to the right of a character. Taking these positions into account, the number of diacritics grows to 12. This research is devoted to recognizing Lampung characters along with their diacritics. It aims to attract more attention to this script, especially from Indonesian researchers, and is also an endeavor to preserve the script from extinction. Recognition is carried out by a multi-step processing system, the so-called Lampung handwritten character recognition framework. It starts with the preprocessing of a document image given as input; in this stage, characters and diacritics are separated. Characters are classified by a multistage scheme: the first stage classifies 18 character classes, and the second stage classifies special characters which consist of two components, so that the number of classes after the second stage becomes 20. Diacritics are classified into 7 classes. These diacritics then have to be associated with the characters to form compound characters. The association is performed in two steps. First, each diacritic detects the characters nearby, and the character with the closest distance to that diacritic is selected as its association; this is repeated until every diacritic has its character. Second, since every diacritic now has a one-to-one association with a character, the pivot element is switched to the character: each character collects all its diacritics, composing the compound characters.
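A minimal sketch of this two-step association, with (x, y) centroids, diacritic names and distances all being illustrative assumptions rather than the framework's actual data structures:

def associate(characters, diacritics):
    # Step-1/step-2 association as described above: each diacritic first picks
    # the closest character; then the pivot switches to characters, and each
    # character collects all diacritics assigned to it, forming compounds.
    def dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    # Step 1: one host character per diacritic (closest centroid wins).
    choice = {d: min(characters, key=lambda c: dist(characters[c], diacritics[d]))
              for d in diacritics}

    # Step 2: pivot on characters; each collects its diacritics.
    compounds = {c: [] for c in characters}
    for d, c in choice.items():
        compounds[c].append(d)
    return compounds

chars = {"ka": (10, 10), "nga": (30, 10)}   # character centroids (toy values)
marks = {"ulan": (11, 2), "bicek": (29, 18)}  # diacritic centroids (toy values)
print(associate(chars, marks))  # {'ka': ['ulan'], 'nga': ['bicek']}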
This framework has been evaluated on a Lampung dataset created and annotated during this work, which is hosted at the Department of Computer Science, TU Dortmund, Germany. The proposed framework achieved an 80.64% recognition rate on this data.2016-01-01T00:00:00ZEfficient fault-injection-based assessment of software-implemented hardware fault toleranceSchirmeier, Horst Benjaminhttp://hdl.handle.net/2003/351752016-08-12T08:42:15Z2016-01-01T00:00:00ZTitle: Efficient fault-injection-based assessment of software-implemented hardware fault tolerance
Authors: Schirmeier, Horst Benjamin
Abstract: With continuously shrinking semiconductor structure sizes and lower supply voltages, the per-device susceptibility to transient and permanent hardware faults is on the rise. A class of countermeasures with growing popularity is Software-Implemented Hardware Fault Tolerance (SIHFT), which avoids expensive hardware mechanisms and can be applied application-specifically. However, SIHFT can, against intuition, cause more harm than good, because its overhead in execution time and memory space also increases the figurative "attack surface" of the system; it turns out that application-specific configuration of SIHFT is in fact a necessity rather than just an advantage. Consequently, target programs need to be analyzed for particularly critical spots to harden. SIHFT-hardened programs need to be measured and compared throughout all development phases of the program to observe reliability improvements or deteriorations over time. Additionally, SIHFT implementations need to be tested.
The contributions of this dissertation focus on Fault Injection (FI) as an assessment technique satisfying all these requirements: analysis, measurement and comparison, and test. I describe the design and implementation of an FI tool, named Fail*, that overcomes several shortcomings in the state of the art, and enables research on the general drawbacks of simulation-based FI. As demonstrated in four case studies in the context of SIHFT research, Fail* provides novel fine-grained analysis techniques that exploit the newly gained possibility to analyze FI results from complete fault-space exploration. These analysis techniques aid SIHFT design decisions on the level of program modules, functions, variables, source-code lines, or single machine instructions. Based on the experience from the case studies, I address the problem of the large computation efforts that accompany exhaustive fault-space exploration from two different angles: Firstly, I develop a heuristic fault-space pruning technique that allows freely trading the total FI-experiment count for result accuracy, while still providing information on all possible fault-space coordinates. Secondly, I speed up individual TAP-based FI experiments by improving the fast-forwarding operation by several orders of magnitude for most workloads.
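The count-versus-accuracy trade-off can be pictured with a minimal sampling sketch; this is illustrative only and is not Fail*'s actual pruning heuristic, and the fault space, experiment predicate and budget are assumptions:

import random

def prune_and_estimate(fault_space, experiment, budget):
    # Instead of injecting at every (cycle, bit) coordinate, spend a fixed
    # experiment budget on a random subset and extrapolate the failure count
    # to the whole space. Accuracy degrades gracefully as the budget shrinks,
    # yet an estimate exists for the full fault space.
    sample = random.sample(fault_space, min(budget, len(fault_space)))
    failures = sum(1 for coord in sample if experiment(coord))
    return failures / len(sample) * len(fault_space)  # extrapolated failure count

# Toy fault space: 10,000 (cycle, bit) coordinates; a fault "fails" if it hits
# a hypothetical critical window.
space = [(t, b) for t in range(1000) for b in range(10)]
est = prune_and_estimate(space, lambda c: c[0] < 100 and c[1] < 3, budget=500)
print(round(est))  # close to the true count of 300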
Finally, I dissect current practices in FI-based evaluation of SIHFT-hardened programs, identify three widespread pitfalls in the result interpretation, and advance the state of the art by defining a novel comparison metric.2016-01-01T00:00:00ZTight integration of cache, path and task-interference modeling for the analysis of hard real time systemsKleinsorge, Jan C.http://hdl.handle.net/2003/343322015-11-16T07:32:34Z2015-01-01T00:00:00ZTitle: Tight integration of cache, path and task-interference modeling for the analysis of hard real time systems
Authors: Kleinsorge, Jan C.
Abstract: Traditional timing analysis for hard real-time systems is a two-step approach consisting of isolated per-task timing analysis and subsequent scheduling analysis, which is conceptually entirely separate and based only on execution time bounds of whole tasks. Today this model is outdated, as it relies on technical assumptions that are no longer feasible on modern processor architectures. The key limiting factor in this traditional model is the interfacing from the micro-architectural analysis of individual tasks to scheduling analysis; in particular, path analysis as the binding step between the two is a major obstacle. In this thesis, we contribute to traditional techniques that overcome this problem by bypassing path analysis entirely, and propose a general path analysis and several derivatives of it to support improved interfacing. Specifically, we discuss, on the basis of a precise cache analysis, how existing metrics to bound cache-related preemption delay (CRPD) can be derived from the cache representation without separate analyses, and suggest optimizations to further reduce analysis complexity and to increase accuracy. In addition, we propose two new estimation methods for CRPD based on the explicit elimination of infeasible task-interference scenarios. The first one is conventional in that path analysis is ignored; the second one specifically relies on it.
We formally define a general path analysis framework in accordance with the principles of program analysis, as opposed to most existing approaches that differ conceptually and therefore either increase complexity or entail an inherent loss of information, and propose solutions for several problems specific to timing analysis in this context. First, we suggest new and efficient methods for loop identification. Based on this, we show how path analysis itself is applied to the traditional problem of per-task worst-case execution time bounds, define its generalization to sub-tasks, discuss several optimizations and present an efficient reference algorithm. We further propose analyses to solve related problems in this domain, such as the estimation of bounds on best-case execution times, latest execution times, maximum blocking times and execution frequencies. Finally, we demonstrate the utility of this additional information in scheduling analysis by proposing a new CRPD bound.2015-01-01T00:00:00ZFlexible error handling for embedded real time systemsHeinig, Andreashttp://hdl.handle.net/2003/340982015-08-13T01:43:42Z2015-01-01T00:00:00ZTitle: Flexible error handling for embedded real time systems
Authors: Heinig, Andreas
Abstract: Due to advances in semiconductor fabrication that lead to shrinking geometries and lowered supply voltages of semiconductor devices, transient fault rates will increase significantly for future semiconductor generations [Int13]. To cope with transient faults, error detection and correction is mandatory. However, additional resources are required for their implementation. This is a serious problem in embedded systems development, since embedded systems possess only a limited number of resources, like processing time, memory, and energy. To cope with this problem, a software-based flexible error handling approach is proposed in this dissertation. The goal of flexible error handling is to decide if, how, and when errors have to be corrected. By applying this approach, deadline misses are reduced by up to 97% for the considered video decoding benchmark. Furthermore, it is shown that the approach is able to cope with very high error rates of nearly 50 errors per second.2015-01-01T00:00:00ZCache-Kohärenz in hart echtzeitfähigen Mehrkern-ProzessorenPyka, Arthurhttp://hdl.handle.net/2003/340972015-08-12T20:19:31Z2015-01-01T00:00:00ZTitle: Cache-Kohärenz in hart echtzeitfähigen Mehrkern-Prozessoren
Authors: Pyka, Arthur
Abstract: In the field of real-time systems, multi-core processors are increasingly coming into focus. Real-time systems place special demands on the system architecture employed: besides logical correctness, temporally predictable execution is crucial. Cache memories play a special role in this respect. On the one hand, they are necessary to guarantee fast accesses to instructions and data; on the other hand, they impair the temporal predictability of execution. When accessing shared data in multi-core processors, a cache coherence protocol is additionally required. Common coherence protocols cannot sufficiently satisfy the demands on performance and real-time capability: the coherence operations employed in hardware-based coherence protocols make a precise WCET estimation infeasible. The On-Demand Coherent Cache (ODC2) is a cache coherence protocol that was developed with its use in real-time systems in mind. It dispenses with the mutual interference of cache memories through coherence operations and thereby achieves sufficient temporal predictability of accesses to shared data. The ODC2 approach aims at the most efficient possible use of the cache memory. Compared to common software-based approaches, it enables significantly higher (worst-case) performance.2015-01-01T00:00:00ZWCET analysis and optimization for multi-core real-time systemsKelter, Timonhttp://hdl.handle.net/2003/339922015-08-13T01:42:19Z2015-01-01T00:00:00ZTitle: WCET analysis and optimization for multi-core real-time systems
Authors: Kelter, Timon2015-01-01T00:00:00ZAutomatic parallelization for embedded multi-core systems using high level cost modelsCordes, Daniel Alexanderhttp://hdl.handle.net/2003/317962015-08-12T23:15:05Z2013-12-20T00:00:00ZTitle: Automatic parallelization for embedded multi-core systems using high level cost models
Authors: Cordes, Daniel Alexander
Abstract: Nowadays, embedded and cyber-physical systems are utilized in nearly all operational areas in order to support and enrich people's everyday lives. To cope with the demands imposed by modern embedded systems, the employment of MPSoC devices is often the most profitable solution. However, many embedded applications are still written in a sequential way. In order to benefit from the multiple cores available on those devices, the application code has to be divided into concurrently executed tasks. Since performing this partitioning manually is an error-prone and time-consuming job, many automatic parallelization approaches have been developed in the past. Most of these existing approaches were developed in the context of high-performance and desktop computers, so their applicability to embedded devices is limited. Many new challenges arise if applications are to be ported to embedded MPSoCs in an efficient way. Therefore, novel parallelization techniques were developed in the context of this thesis that are tailored towards the special requirements of embedded multi-core devices.
All approaches presented in this thesis are based on sophisticated parallelization techniques employing high-level cost models to estimate the benefit of parallel execution. This enables the creation of well-balanced tasks, which is essential if applications are to be parallelized efficiently. In addition, several other requirements of embedded devices are covered, like the consideration of multiple objectives simultaneously. As a result, beneficial trade-offs between several objectives, such as energy consumption and execution time, can be found, enabling the extraction of solutions which are highly optimized for a specific application scenario.
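The role of such high-level cost models can be illustrated with a minimal sketch; the cycle constants and the linear communication term are assumptions chosen for illustration, not the thesis's calibrated models:

def parallel_benefit(seq_cycles, iterations, cores, comm_cycles_per_task):
    # Parallel execution only pays off if the per-task communication and
    # synchronization overhead does not eat the gain. Returns estimated cycles
    # for sequential vs. parallel execution of a loop-like workload.
    sequential = seq_cycles * iterations
    per_core = (iterations + cores - 1) // cores          # iterations per core
    parallel = per_core * seq_cycles + cores * comm_cycles_per_task
    return sequential, parallel

seq, par = parallel_benefit(seq_cycles=50, iterations=1000, cores=4,
                            comm_cycles_per_task=2000)
print(seq, par, "-> parallelize" if par < seq else "-> keep sequential")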
To be applicable to many embedded application domains, approaches extracting different kinds of parallelism were also developed. The structure of the global parallelization approach facilitates the combination of different approaches in a plug-and-play fashion. Thus, the advantages of multiple parallelization techniques can easily be combined. Finally, in addition to parallelization approaches for homogeneous MPSoCs, optimized ones for heterogeneous devices were also developed in this thesis since the trend towards heterogeneous multi-core architectures is inexorable.
To the best of the author's knowledge, most of these objectives, and especially their combination, have not been covered by existing parallelization frameworks so far. By combining all of them, a parallelization framework that is well optimized for embedded multi-core devices was developed in the context of this thesis.2013-12-20T00:00:00ZResource efficient processing and communication in sensor/actuator environmentsTimm, Constantinhttp://hdl.handle.net/2003/297312015-08-12T22:14:56Z2012-10-29T00:00:00ZTitle: Resource efficient processing and communication in sensor/actuator environments
Authors: Timm, Constantin
Abstract: The future of computer systems will not be dominated by personal-computer-like hardware platforms but by embedded and cyber-physical systems assisting humans in a hidden but omnipresent manner. These pervasive computing devices can, for example, be utilized in the home automation sector to create sensor/actuator networks supporting the inhabitants of a house in everyday life. The efficient usage of resources is an important topic at design time and operation time of mobile embedded and cyber-physical systems. Therefore, this thesis presents methods which allow an efficient use of energy and processing resources in sensor/actuator networks. These networks comprise different nodes cooperating for a “smart” joint control function. Sensor/actuator nodes are typical cyber-physical systems comprising sensors/actuators and processing and communication components. The processing components of today's sensor nodes can comprise many-core chips. This thesis introduces new methods for optimizing the code and the application mapping of the aforementioned systems and presents novel results with regard to design space explorations for energy-efficient embedded many-core systems. The considered many-core systems are graphics processing units. The application code for these graphics processing units is optimized for a particular platform variant with the objectives of minimal energy consumption and/or minimal runtime. These two objectives are targeted with the utilization of multi-objective optimization techniques. The mapping optimizations are realized by means of multi-objective design space explorations. Furthermore, this thesis introduces new techniques and functions for a resource-efficient middleware design employing service-oriented architectures. To this end, a middleware framework based on a service-oriented architecture is presented which comprises a lightweight service orchestration. In addition, a flexible resource management mechanism is introduced. This resource management adapts resource utilization and services to an environmental context and provides methods to reduce the energy consumption of sensor nodes.2012-10-29T00:00:00ZMemory-based optimization techniques for real-time systemsPlazar, Saschahttp://hdl.handle.net/2003/295002015-08-12T23:49:40Z2012-07-06T00:00:00ZTitle: Memory-based optimization techniques for real-time systems
Authors: Plazar, Sascha
Abstract: Embedded/cyber-physical systems have become popular in a wide range of application scenarios. Such systems are called real-time systems if they are subject to strict timing constraints. To verify whether such systems can meet their deadlines, knowledge of an upper bound on a program's execution time is mandatory. This upper bound is also called the worst-case execution time (WCET) and is estimated by static timing analyzers.
Established optimizing compilers are not aware of the WCET as an objective since they focus on the minimization of the average-case execution time (ACET). To overcome this obstacle, this thesis presents memory-based optimization techniques which focus on the reduction of the WCET of programs. All presented optimizations are integrated into the WCET-aware C Compiler (WCC) framework.
Since the memory interface of a system often turns out to be a bottleneck which limits the performance of a system, the presented optimizations are applied to different levels of the memory hierarchy. Starting within a CPU core, the instruction fetch buffer is the most tightly coupled memory, which tries to provide the next few instructions to be executed. Optimization techniques are presented improving the efficiency of this buffer w.r.t. the WCET of a system. Instruction caches placed between the CPU core and the main memory try to speed up accesses to the main memory by storing local copies in fast, small cache memories. In order to improve the efficiency of this part of the memory hierarchy, a memory content selection approach is introduced which improves the WCET of a program by improving the cache performance.
Due to the fact that multi-task systems are employed in almost all domains, this thesis presents elaborate extensions to a compiler supporting the compilation and WCET-aware optimization of multi-task systems. These extensions are exploited to develop a number of novel optimizations for systems running multiple tasks. As a first optimization, a WCET-driven software-based cache partitioning demonstrates the effectiveness of considering the WCET for the optimization of a set of tasks. Furthermore, many embedded systems integrate so-called scratchpad memories (SPMs) as tightly coupled memories. An optimization approach for SPM allocation in a multi-task scenario is proposed. Besides, a holistic view of memory architecture compilation considers a number of memory-based WCET optimizations and presents approaches for a combined application.
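A WCET-driven cache partitioning can be pictured with a minimal greedy sketch; the wcet-per-sets curves, the greedy strategy and all names are assumptions for illustration, not WCC's actual interface or algorithm:

def partition_cache(tasks, total_sets):
    # Repeatedly grant one more cache set to the task whose estimated WCET
    # drops the most. tasks maps a task name to a monotone WCET estimate,
    # assumed to come from a static timing analyzer.
    share = {t: 1 for t in tasks}                 # every task gets one set to start
    for _ in range(total_sets - len(tasks)):
        def gain(t):
            return tasks[t](share[t]) - tasks[t](share[t] + 1)
        best = max(share, key=gain)               # task with the largest WCET reduction
        share[best] += 1
    return share

# Toy WCET curves: cycles as a function of assigned cache sets.
tasks = {"ctrl": lambda s: 8000 // s, "dsp": lambda s: 20000 // s}
print(partition_cache(tasks, total_sets=8))  # the dsp task receives most sets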
Existing compiler frameworks which are able to consider the WCET during optimization are limited to a particular hardware platform. In order to support multiple platforms, this thesis presents techniques to extend an existing WCET-aware compiler framework. Based on these extensions, a novel static cache locking optimization selects memory blocks which are statically locked into the instruction cache, driven by WCET reductions.
Applying these optimizations, the WCET of real-time applications can be reduced by about 35% to 48%. These results underline the need for specialized WCET-driven optimization techniques integrated into a sophisticated compiler framework. Otherwise, immense optimization potential would remain unused, resulting in oversized and thus costly embedded/cyber-physical systems.2012-01-19T00:00:00ZVideobasierte Gestenerkennung in einer intelligenten UmgebungRicharz, Janhttp://hdl.handle.net/2003/292872015-08-12T18:06:22Z2012-01-19T00:00:00ZTitle: Videobasierte Gestenerkennung in einer intelligenten Umgebung
Authors: Richarz, Jan
Abstract: This dissertation covers the design of a touchless, user-independent visual classification of arm gestures based on their spatio-temporal motion patterns, using methods from computer vision, pattern recognition, and machine learning. The application scenario is an intelligent conference room equipped with several off-the-shelf cameras. This scenario poses a particular challenge for three reasons: First, for an interaction that is as intuitive as possible, recognition must be realized independently of the user's position and orientation in the room; simplifying assumptions about the relative positions of user and camera are thus largely ruled out. Second, a realistic indoor scenario is considered, in which the environmental conditions can change abruptly and very different camera viewing angles occur. This requires the development of adaptive methods that can quickly adjust to such changes or are robust against them within wide limits. Third, the use of an unsynchronized multi-camera system is a novelty, which means that, during the 3D reconstruction of hypotheses from different camera images, particular attention must be paid to handling the resulting temporal offset. This also has consequences for the classification task, since corresponding inaccuracies must be expected in the reconstructed 3D trajectories.
An important criterion for the acceptance of a gesture-based human-machine interface is its reactivity. Particular attention is therefore paid in the design to the efficient realizability of the chosen methods. In particular, a parallel processing structure is realized in which the different camera data streams are processed separately and the individual results are subsequently combined. Within the scope of this dissertation, the complete image processing pipeline was realized as a prototype. Among other things, it comprises the steps of person detection, person tracking, hand detection, 3D reconstruction of the hypotheses, and classification of the spatio-temporal gesture trajectories with semi-continuous hidden Markov models (HMMs). The realized methods are evaluated in detail on realistic, demanding data sets. Very good results are achieved for both person and hand detection. The gesture classification reaches classification rates of nearly 90% for nine different gestures.2012-01-19T00:00:00ZSubword-based Stochastic Segment Modeling for Offline Arabic Handwriting RecognitionCao, HuaiguManohar, VasantNatarajan, PremPrasad, RohitSubramanian, Krishnahttp://hdl.handle.net/2003/275642015-08-13T00:00:58Z2011-01-12T00:00:00ZTitle: Subword-based Stochastic Segment Modeling for Offline Arabic Handwriting Recognition
Authors: Cao, Huaigu; Manohar, Vasant; Natarajan, Prem; Prasad, Rohit; Subramanian, Krishna
Abstract: In this paper, we describe several experiments in which we use a stochastic segment model (SSM) to improve offline handwriting recognition (OHR) performance. We use the SSM to re-rank (re-score) multiple decoder hypotheses. Then, a probabilistic multi-class SVM is trained to model stochastic segments obtained from force-aligning transcriptions with the underlying image. We extract multiple features from the stochastic segments that are sensitive to larger context spans to train the SVM. Our experiments show that using confidence scores from the trained SVM within the SSM framework can significantly improve OHR performance. We also show that OHR performance can be improved by using a combination of character-based and parts-of-Arabic-words (PAW)-based SSMs.2011-01-12T00:00:00ZArabic Handwritten Alphanumeric Character Recognition using Fuzzy Attributed Turning FunctionsMahmoud, SabriParvez, Mohammad Tanvirhttp://hdl.handle.net/2003/275632015-08-12T22:55:28Z2011-01-12T00:00:00ZTitle: Arabic Handwritten Alphanumeric Character Recognition using Fuzzy Attributed Turning Functions
Authors: Mahmoud, Sabri; Parvez, Mohammad Tanvir
Abstract: In this paper, we present a novel method for the recognition of unconstrained handwritten Arabic alphanumeric characters. The algorithm binarizes the character image, smooths it and extracts its contour. A novel approach for the polygonal approximation of handwritten character contours is applied. Direction and length features are extracted from the polygonal approximation. These features are used to build character models in the training phase. For recognition, we introduce Fuzzy Attributed Turning Functions (FATF) and define a dissimilarity measure based on FATF for comparing polygonal shapes. Experimental results demonstrate the effectiveness of our algorithm for the recognition of handwritten Arabic characters. We have obtained around 98% accuracy for Arabic handwritten characters and more than 97% accuracy for handwritten Arabic numerals.2011-01-12T00:00:00ZArabic Handwriting SynthesisAl-Muhtaseb, HusniElarian, YousefGhouti, Lahouarihttp://hdl.handle.net/2003/275622015-08-12T23:56:57Z2011-01-12T00:00:00ZTitle: Arabic Handwriting Synthesis
Authors: Al-Muhtaseb, Husni; Elarian, Yousef; Ghouti, Lahouari
Abstract: Training and testing data for optical character recognition are cumbersome to obtain. If large amounts of data can be produced from small amounts, much time and effort can be saved. This paper presents an approach to synthesize Arabic handwriting. We segment word images into labeled characters and then use these in synthesizing arbitrary words. The synthesized text should look natural; hence, we define some criteria to decide on what is acceptable as natural-looking.
For evaluation, text synthesized using the natural-looking constraint is compared to text synthesized without it.2011-01-12T00:00:00ZA Lexicon of Connected Components for Arabic Optical Text RecognitionElarian, YousefIdris, Fayezhttp://hdl.handle.net/2003/275612015-08-13T02:28:30Z2011-01-12T00:00:00ZTitle: A Lexicon of Connected Components for Arabic Optical Text Recognition
Authors: Elarian, Yousef; Idris, Fayez
Abstract: Arabic is a cursive script that lacks the ease of character segmentation. Hence, we suggest a unit that is discrete in nature, viz. the connected component, for Arabic text recognition. A lexicon listing valid Arabic connected components is necessary to any system that is to use such unit. Here, we produce and analyze a comprehensive lexicon of connected components.
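The notion of a connected component can be made concrete: Arabic letters such as ا, د, ذ, ر, ز and و never join to the following letter, so a word splits deterministically at those points. A minimal sketch, with a simplified letter set and without the paper's additional tokenization and point-normalization steps:

# Letters that never connect to a following letter; a connected component
# ends after any of them (simplified, illustrative set).
NON_CONNECTING = set("اأإآدذرزوؤء")

def connected_components(word):
    # Split an Arabic word into connected components by cutting after every
    # non-left-joining letter.
    parts, current = [], ""
    for ch in word:
        current += ch
        if ch in NON_CONNECTING:
            parts.append(current)
            current = ""
    if current:
        parts.append(current)
    return parts

print(connected_components("مدرسة"))  # ['مد', 'ر', 'سة']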
A lexicon can be extracted from corpora or synthesized from morphemes. We follow both approaches and merge their results. Besides, the generation of a lexicon of connected components encompasses extra tokenization and point-normalization steps to make the size of the lexicon tractable. We produce a lexicon of surface words, reduce it to a lexicon of connected components, and finally to a lexicon of point-normalized connected components. The lexicon of point-normalized connected components contains 684,743 entries, a decrease of 97.17% from the word lexicon.2011-01-12T00:00:00ZWriter Identification of Arabic Handwritten DigitsAwaida, SamehMahmoud, Sabrihttp://hdl.handle.net/2003/275602015-08-12T16:50:13Z2011-01-12T00:00:00ZTitle: Writer Identification of Arabic Handwritten Digits
Authors: Awaida, Sameh; Mahmoud, Sabri
Abstract: This paper addresses writer identification from Arabic handwritten digits. In addition to digit identifiability, the paper presents digit recognition. The digit image is divided into grids based on the distribution of the black pixels in the image. Several types of features are extracted from the grid segments (viz. gradient, curvature, density, horizontal and vertical run lengths, stroke, and concavity features). K-Nearest Neighbor and Nearest Mean classifiers are used. A database of 70,000 Arabic handwritten digit samples written by 700 writers is used in the analysis and experimentation.
The identifiability of isolated and combined digits is tested. The analysis of the results indicates that the Arabic digits 3 (٣), 4 (٤), 8 (٨), and 9 (٩) are more identifiable than other digits, while the Arabic digits 0 (٠) and 1 (١) are the least identifiable. In addition, the paper shows that combining a writer's digits increases the discriminability power of Arabic handwritten digits. Combining the features of all digits, K-NN provided the best accuracy in text-independent writer identification, with a top-1 result of 88.14%, a top-5 result of 94.81%, and a top-10 result of 96.48%.2011-01-12T00:00:00ZA new System for offline Printed Arabic Recognition for Large Vocabulary : SPARLVDhouib, Mariem MilediKanoun, Slimhttp://hdl.handle.net/2003/275592015-08-13T00:52:15Z2011-01-12T00:00:00ZTitle: A new System for offline Printed Arabic Recognition for Large Vocabulary : SPARLV
A new System for offline Printed Arabic Recognition for Large Vocabulary : SPARLVDhouib, Mariem MilediKanoun, Slimhttp://hdl.handle.net/2003/275592015-08-13T00:52:15Z2011-01-12T00:00:00ZTitle: A new System for offline Printed Arabic Recognition for Large Vocabulary : SPARLV
Authors: Dhouib, Mariem Miledi; Kanoun, Slim
Abstract: This paper presents a contribution to printed Arabic recognition, focusing on the recognition of printed decomposable Arabic words. The proposed system follows the analytical approach: segmentation into characters leads to the generation of letter hypotheses as well as word hypotheses, which are checked by lexical verification against a pre-established dictionary of the language. Thanks to this lexical verification, our proposed system SPARLV is able to produce valid word hypotheses.2011-01-12T00:00:00Z
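A minimal sketch of the lexical-verification step described above, assuming per-segment letter hypotheses with scores and a dictionary represented as a plain set (a prefix trie and beam search would bound the expansion in practice):

from itertools import product

def word_hypotheses(letter_hyps, dictionary, max_out=10):
    # letter_hyps: one list of (letter, score) pairs per segmented position
    candidates = []
    for combo in product(*letter_hyps):
        word = "".join(letter for letter, _ in combo)
        if word in dictionary:                     # lexical verification
            candidates.append((sum(s for _, s in combo), word))
    return [w for _, w in sorted(candidates, reverse=True)[:max_out]]

# e.g. word_hypotheses([[("b", .9), ("p", .4)], [("a", .8)], [("t", .7), ("l", .5)]],
#                      dictionary={"bat", "pal"}) -> ["bat", "pal"]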
Towards Feature Learning for HMM-based Offline Handwriting RecognitionFink, Gernot A.Hammerla, Nils Y.Plötz, ThomasVajda, Szilárdhttp://hdl.handle.net/2003/275562015-08-12T20:35:41Z2011-01-12T00:00:00ZTitle: Towards Feature Learning for HMM-based Offline Handwriting Recognition
Authors: Fink, Gernot A.; Hammerla, Nils Y.; Plötz, Thomas; Vajda, Szilárd
Abstract: Statistical modelling techniques for automatic reading systems rely substantially on the availability of compact and meaningful feature representations. State-of-the-art feature extraction for offline handwriting recognition is usually based on heuristic approaches that describe either basic geometric properties or statistical distributions of raw pixel values. Although such features work well on average, fundamental insights into the nature of handwriting are still desired. In this paper we present a novel approach for the automatic extraction of appearance-based representations of offline handwriting data. Within the framework of deep belief networks -- Restricted Boltzmann Machines -- a two-stage method for feature learning and optimization is developed. Using two standard corpora of Arabic and Roman handwriting data, it is demonstrated across script boundaries that automatically learned features achieve recognition results comparable to state-of-the-art handcrafted features. Given these promising results, the potential of feature learning for future reading systems is discussed.2011-01-12T00:00:00Z
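For illustration, a minimal numpy sketch of the first stage, unsupervised RBM training with one-step contrastive divergence (CD-1); hyperparameters are placeholders, not the paper's settings:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden=64, lr=0.05, epochs=10):
    # data: (n_samples, n_visible) binary pixel vectors
    n_vis = data.shape[1]
    W = 0.01 * rng.standard_normal((n_vis, n_hidden))
    b_v = np.zeros(n_vis)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            p_h0 = sigmoid(v0 @ W + b_h)                 # positive phase
            h0 = (rng.random(n_hidden) < p_h0).astype(float)
            p_v1 = sigmoid(h0 @ W.T + b_v)               # reconstruction
            p_h1 = sigmoid(p_v1 @ W + b_h)               # negative phase
            W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
            b_v += lr * (v0 - p_v1)
            b_h += lr * (p_h0 - p_h1)
    return W, b_h   # hidden activations sigmoid(x @ W + b_h) serve as features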
Advanced ensemble methods for automatic classification of 1H-NMR spectraLienemann, Kaihttp://hdl.handle.net/2003/273212017-06-03T18:12:27Z2010-08-03T00:00:00ZTitle: Advanced ensemble methods for automatic classification of 1H-NMR spectra
Authors: Lienemann, Kai2010-08-03T00:00:00ZMikroarchitektur-Synthese mit genetischen AlgorithmenLorenz, Markushttp://hdl.handle.net/2003/214532015-08-12T19:02:29Z2005-06-01T00:00:00ZTitle: Mikroarchitektur-Synthese mit genetischen Algorithmen
Authors: Lorenz, Markus2005-06-01T00:00:00ZHardware-Partitionierung für Prototypen-BoardsFalk, Heikohttp://hdl.handle.net/2003/214522015-08-12T19:02:26Z2005-06-01T00:00:00ZTitle: Hardware-Partitionierung für Prototypen-Boards
Authors: Falk, Heiko2005-06-01T00:00:00ZCodeerzeugung für den digitalen Signalprozessor TI TMS320X5xBarschdorf, Thomashttp://hdl.handle.net/2003/214512015-08-12T19:02:24Z2005-06-01T00:00:00ZTitle: Codeerzeugung für den digitalen Signalprozessor TI TMS320X5x
Authors: Barschdorf, Thomas2005-06-01T00:00:00ZVergleich von CLP und ILP basierten Optimierungsstrategien am Beispiel der Codegenerierung für DSPsMenne, Torstenhttp://hdl.handle.net/2003/214502015-08-12T19:02:22Z2005-06-01T00:00:00ZTitle: Vergleich von CLP und ILP basierten Optimierungsstrategien am Beispiel der Codegenerierung für DSPs
Authors: Menne, Torsten2005-06-01T00:00:00ZAnalysen und Methoden optimierender Compiler zur Steigerung der Effizienz von Speicherzugriffen in eingebetteten SystemenFranke, Bjoernhttp://hdl.handle.net/2003/214492015-08-12T23:53:06Z2005-06-01T00:00:00ZTitle: Analysen und Methoden optimierender Compiler zur Steigerung der Effizienz von Speicherzugriffen in eingebetteten Systemen
Authors: Franke, Bjoern2005-06-01T00:00:00ZEntwurf und Realisierung eines skalierbaren FPGA-PrototypenboardsRave, Stefanhttp://hdl.handle.net/2003/214482015-08-13T00:43:46Z2005-06-01T00:00:00ZTitle: Entwurf und Realisierung eines skalierbaren FPGA-Prototypenboards
Authors: Rave, Stefan2005-06-01T00:00:00ZEnergiemessung von ARM7TDMI Prozessor-InstruktionTheokharidis, Michaelhttp://hdl.handle.net/2003/214472015-08-12T23:53:08Z2005-06-01T00:00:00ZTitle: Energiemessung von ARM7TDMI Prozessor-Instruktion
Authors: Theokharidis, Michael2005-06-01T00:00:00ZReduktion des Energiebedarfs von Programmen für den ARM-Prozessor durch RegisterpipeliningSchwarz, Rüdigerhttp://hdl.handle.net/2003/214462015-08-12T23:53:01Z2005-06-01T00:00:00ZTitle: Reduktion des Energiebedarfs von Programmen für den ARM-Prozessor durch Registerpipelining
Authors: Schwarz, Rüdiger2005-06-01T00:00:00ZAdresszuweisung für den M3-DSPKottmann, Davidhttp://hdl.handle.net/2003/214452015-08-12T23:53:03Z2005-06-01T00:00:00ZTitle: Adresszuweisung für den M3-DSP
Authors: Kottmann, David2005-06-01T00:00:00ZÜbersetzung und Optimierung objektorientierter Programmiersprachen unter besonderer Berücksichtigung eingebetteter SystemeJagla, Frankhttp://hdl.handle.net/2003/214442015-08-12T23:52:55Z2005-06-01T00:00:00ZTitle: Übersetzung und Optimierung objektorientierter Programmiersprachen unter besonderer Berücksichtigung eingebetteter Systeme
Authors: Jagla, Frank2005-06-01T00:00:00ZSpeicherpartitionierung in DSP-CompilernKotte, Danielhttp://hdl.handle.net/2003/214432015-08-12T23:52:59Z2005-06-01T00:00:00ZTitle: Speicherpartitionierung in DSP-Compilern
Authors: Kotte, Daniel2005-06-01T00:00:00ZMessung des Energieverbrauchs von Caches am Beispiel de StrongARM-ProzessorsSapsford, Gregoryhttp://hdl.handle.net/2003/214422015-08-12T23:52:57Z2005-06-01T00:00:00ZTitle: Messung des Energieverbrauchs von Caches am Beispiel de StrongARM-Prozessors
Authors: Sapsford, Gregory2005-06-01T00:00:00ZCodierungsverfahren zur Reduktion des Energiebedarfs von ProgrammenKnauer, Markushttp://hdl.handle.net/2003/214412015-08-12T23:51:29Z2005-06-01T00:00:00ZTitle: Codierungsverfahren zur Reduktion des Energiebedarfs von Programmen
Authors: Knauer, Markus2005-06-01T00:00:00ZEnergieeinsparung durch compilergesteuerte Nutzung des On-Chip-SpeichersZobiegala, Christophhttp://hdl.handle.net/2003/214402015-08-12T23:52:53Z2005-06-01T00:00:00ZTitle: Energieeinsparung durch compilergesteuerte Nutzung des On-Chip-Speichers
Authors: Zobiegala, Christoph2005-06-01T00:00:00ZVergleich des Energieverbrauchs von Cache- und Scratch-Pad-Speichern für den ARM7-ProzessorLee, Bo-Sikhttp://hdl.handle.net/2003/214392015-08-12T23:52:51Z2005-06-01T00:00:00ZTitle: Vergleich des Energieverbrauchs von Cache- und Scratch-Pad-Speichern für den ARM7-Prozessor
Authors: Lee, Bo-Sik2005-06-01T00:00:00ZGenerische Low-Level Optimierungen für RISC-ArchitekturenHornbach, Larshttp://hdl.handle.net/2003/214382015-08-12T23:52:49Z2005-06-01T00:00:00ZTitle: Generische Low-Level Optimierungen für RISC-Architekturen
Authors: Hornbach, Lars2005-06-01T00:00:00ZXML-basierte generische Zwischendarstellung für CompilerFiesel, Markushttp://hdl.handle.net/2003/214362015-08-12T23:52:47Z2005-06-01T00:00:00ZTitle: XML-basierte generische Zwischendarstellung für Compiler
Authors: Fiesel, Markus2005-06-01T00:00:00ZArchitekturunabhängige Quellcodeoptimierung durch MustererkennungJakubowski, Jacekhttp://hdl.handle.net/2003/214352015-08-12T23:52:05Z2005-06-01T00:00:00ZTitle: Architekturunabhängige Quellcodeoptimierung durch Mustererkennung
Authors: Jakubowski, Jacek2005-06-01T00:00:00ZEnergieminimierung eingebetteter Programme durch die dynamische Nutzung eines Scratchpad-SpeichersGrundwald, Nilshttp://hdl.handle.net/2003/214342015-08-12T19:02:19Z2005-06-01T00:00:00ZTitle: Energieminimierung eingebetteter Programme durch die dynamische Nutzung eines Scratchpad-Speichers
Authors: Grundwald, Nils2005-06-01T00:00:00ZCodegrößenreduktion eingebetteter Systeme durch kombiniertes In- und ExliningImhoff, Peterhttp://hdl.handle.net/2003/214332015-08-13T00:42:06Z2005-06-01T00:00:00ZTitle: Codegrößenreduktion eingebetteter Systeme durch kombiniertes In- und Exlining
Authors: Imhoff, Peter2005-06-01T00:00:00ZEntwicklung eines generischen Codegenerators für RISC-ArchitekturenKamphausen, Jörghttp://hdl.handle.net/2003/214322015-08-12T23:52:44Z2005-06-01T00:00:00ZTitle: Entwicklung eines generischen Codegenerators für RISC-Architekturen
Authors: Kamphausen, Jörg2005-06-01T00:00:00ZCompilergestützte Optimierung von Zugriffen auf partitionierte SpeicherHelmig, Urshttp://hdl.handle.net/2003/214312015-08-12T23:52:42Z2005-06-01T00:00:00ZTitle: Compilergestützte Optimierung von Zugriffen auf partitionierte Speicher
Authors: Helmig, Urs2005-06-01T00:00:00ZPlattformabhängige Eliminierung gemeinsamer Teilausdrücke auf Quellcode-EbeneVogt, Michaelhttp://hdl.handle.net/2003/214302015-08-12T23:52:40Z2005-06-01T00:00:00ZTitle: Plattformabhängige Eliminierung gemeinsamer Teilausdrücke auf Quellcode-Ebene
Authors: Vogt, Michael2005-06-01T00:00:00ZCompilergestützte Energiereduktion von SDRAM- und Flash-basierten SpeichertechnologienKernchen, Andréhttp://hdl.handle.net/2003/214292015-08-12T23:52:37Z2005-06-01T00:00:00ZTitle: Compilergestützte Energiereduktion von SDRAM- und Flash-basierten Speichertechnologien
Authors: Kernchen, André2005-06-01T00:00:00ZDidaktik der Informatik - Teil 1 (Sommersemester 2004)Humbert, Ludgerhttp://hdl.handle.net/2003/213462015-08-12T23:44:03Z2004-07-28T00:00:00ZTitle: Didaktik der Informatik - Teil 1 (Sommersemester 2004)
Authors: Humbert, Ludger2004-07-28T00:00:00ZHumbert, Ludger: Didaktik der Informatik -Teil 2 (Wintersemester 2003/2004)Humbert, Ludgerhttp://hdl.handle.net/2003/213452021-04-12T14:08:20Z2004-02-18T00:00:00ZTitle: Humbert, Ludger: Didaktik der Informatik -Teil 2 (Wintersemester 2003/2004)
Authors: Humbert, Ludger2004-02-18T00:00:00ZDidaktik der Informatik für die Sekundarstufe IHumbert, Ludgerhttp://hdl.handle.net/2003/213442021-04-12T14:07:17Z2004-02-18T00:00:00ZTitle: Didaktik der Informatik für die Sekundarstufe I
Authors: Humbert, Ludger2004-02-18T00:00:00ZDidaktik der Informatik - Teil 1Humbert, Ludgerhttp://hdl.handle.net/2003/213432021-04-12T14:06:00Z2003-11-06T00:00:00ZTitle: Didaktik der Informatik - Teil 1
Authors: Humbert, Ludger2003-11-06T00:00:00ZIntroduction to embedded systemsMarwedel, Peterhttp://hdl.handle.net/2003/203642021-04-12T14:01:10Z2005-04-25T00:00:00ZTitle: Introduction to embedded systems
Authors: Marwedel, Peter2005-04-25T00:00:00ZProzessrechnertechnikMarwedel, Peterhttp://hdl.handle.net/2003/203632015-08-13T02:18:41Z1999-10-14T00:00:00ZTitle: Prozessrechnertechnik
Authors: Marwedel, Peter1999-10-14T00:00:00ZRechnerarchitekturMarwedel, Peterhttp://hdl.handle.net/2003/203622015-08-13T02:18:38Z1999-10-14T00:00:00ZTitle: Rechnerarchitektur
Authors: Marwedel, Peter1999-10-14T00:00:00ZRechnergestützter Entwurf / Produktion (MikroelektronikLeupers, Rainerhttp://hdl.handle.net/2003/203612021-04-12T14:00:00Z1999-10-13T00:00:00ZTitle: Rechnergestützter Entwurf / Produktion (Mikroelektronik
Authors: Leupers, Rainer1999-10-13T00:00:00ZBegleitmaterial zur Vorlesung Einführung in die Didaktik der InformatikSchubert, Sigridhttp://hdl.handle.net/2003/27712015-08-12T19:07:12Z1999-10-14T00:00:00ZTitle: Begleitmaterial zur Vorlesung Einführung in die Didaktik der Informatik; Einführung in die Didaktik der Informati
Authors: Schubert, Sigrid1999-10-14T00:00:00ZPerformance- und energieeffiziente Compilierung für digitale SIMD-Signalprozessoren mittels genetischer AlgorithmenLorenz, Markushttp://hdl.handle.net/2003/27702015-08-13T00:05:50Z2003-06-03T00:00:00ZTitle: Performance- und energieeffiziente Compilierung für digitale SIMD-Signalprozessoren mittels genetischer Algorithmen
Authors: Lorenz, Markus
Abstract: In recent years, embedded systems have been deployed in an ever-growing number of products of our daily lives. These systems frequently have to satisfy special requirements regarding real-time capability, small size and, increasingly, low energy consumption. To meet these requirements while retaining a high degree of flexibility in system design, digital signal processors (DSPs) are often employed for data processing instead of application-specific hardware. With DSPs, specification changes in late development phases usually do not require a costly and time-consuming redevelopment of the hardware. Unfortunately, manually translating an application program into assembly code for the target processor is an extremely time-consuming and error-prone task. For this reason, compilers are needed that can translate a given application into efficient assembly code. Compared to general-purpose processors (GPPs), however, DSPs exhibit special architectural features that conventional compiler techniques exploit only insufficiently or not at all. The goal of this thesis is to develop new compiler techniques for DSPs in order to improve the quality of compiler-generated code, particularly with respect to execution time and energy consumption. To enable reuse of the developed techniques in other compilers, they are built on top of the new intermediate representation GeLIR (Generic Low-Level Intermediate Representation), which is also described in this thesis. As the main contribution, a code generator is presented that performs graph-based code selection and additionally solves the phases of code selection, instruction scheduling (including compaction) and register allocation simultaneously, in the sense of phase coupling. Since this amounts to solving an NP-hard optimization problem, the code generator is based on an optimization method using a genetic algorithm. Interactions with the subsequent address-code generation are already taken into account while performing the subtasks of code selection, instruction selection and register allocation. Owing to the flexible specification of cost functions in genetic optimization methods, the code generator can perform an energy-efficient selection and scheduling of instructions using an energy cost model. As a further contribution, optimization techniques for the effective exploitation of parallel data paths and of SIMD memory accesses are presented. By integrating the energy cost model into the code generator and the simulator, this thesis is the first to investigate, with compiler support, the potential of SIMD operations for the energy-efficient execution of DSP programs.
The exemplary implementation of the techniques for one DSP architecture, and the retargeting of the genetic code generator to a further DSP, demonstrate the applicability to real processors.2003-06-03T00:00:00Z
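A loose illustration of the genetic code-generation idea from the abstract above: one gene per operation encodes an instruction alternative and a scheduling priority, and the fitness mixes cycle and energy estimates. All cost numbers and parameters are invented placeholders, not GeLIR internals:

import random

random.seed(1)

ALTERNATIVES = 3     # instruction alternatives per operation (assumption)
N_OPS = 12           # operations in the basic block (assumption)

def random_genome():
    return [(random.randrange(ALTERNATIVES), random.random())
            for _ in range(N_OPS)]

def fitness(genome):
    # lower is better: placeholder cycle/energy model
    cycles = sum(1 + alt for alt, _ in genome)          # alt 0 = fastest
    energy = sum(2.0 - alt * 0.5 for alt, _ in genome)  # alt 2 = cheapest
    return cycles + 0.3 * energy                        # weighted objective

def evolve(pop_size=30, generations=50):
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(N_OPS)               # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(N_OPS)                 # point mutation
            child[i] = (random.randrange(ALTERNATIVES), random.random())
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)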
Untersuchung des Energieeinsparungspotenzials in eingebetteten Systemen durch energieoptimierende CompilertechnikSteinke, Stefanhttp://hdl.handle.net/2003/27692015-08-13T00:01:13Z2003-01-27T00:00:00ZTitle: Untersuchung des Energieeinsparungspotenzials in eingebetteten Systemen durch energieoptimierende Compilertechnik
Authors: Steinke, Stefan
Abstract: In both professional and leisure contexts, the use of mobile electronic devices such as mobile phones and PDAs has grown strongly in recent years. The functions of these devices keep increasing in number as well as in complexity, so that the capacity limits of the batteries are reached more and more often. This restricts users and motivates the reduction of energy consumption. Moreover, other new mobile applications will only become feasible once energy consumption has been reduced further. Besides the well-known optimization of device hardware for energy consumption, the growing share of software offers a new potential for energy savings. The goal of this thesis is a systematic investigation of this energy-saving potential in the execution of application software, achievable through modified or new compiler techniques. The thesis begins by examining the fundamentals of energy consumption and deriving starting points for energy reduction through software. Within the considered design flow for embedded systems, the software-synthesis phase offers the opportunity to influence the generated machine code. Sufficient information for estimating the later energy demand is available in the compiler once a suitable energy model is integrated. The new energy model presented in this thesis accounts for differences in energy consumption depending on the executed instructions, the functional units they use, the accesses to different memories, and the bit patterns of the data transported over buses. These properties are a necessary prerequisite for a comprehensive investigation of the potential in code generation. Based on this energy model, the various components and phases of a compiler are examined systematically with respect to their savings potential and the possible integration of energy consumption as an optimization goal. The phases in the compiler front end offer few starting points, since no relation to the machine instructions and their respective energy consumption can be established there yet. The focus is therefore on the back-end phases: instruction selection, instruction scheduling, register allocation and machine-dependent optimizations. The phases and optimizations in which energy consumption influences processing are considered in detail, and the energy-saving optimizations with the largest effect are described extensively. Memory accesses in particular account for a high share of the total energy consumption, which yields a large potential. Optimizations for more efficient memory usage therefore form the focus of the investigations. Besides applying known optimizations for a more efficient use of processor registers, new optimizations are presented that support the efficient use of small, freely addressable on-chip memories. The caches used so far contain hardware control for loading frequently used program parts and data. This mechanism can speed up program execution considerably, but its additional hardware consumes a relatively large amount of energy for frequent address comparisons.
Taking the information available during the compiler run into account when deciding which program parts and data are moved to the on-chip memory offers a high energy-saving potential. The required method is described both as a static variant with a fixed assignment of program parts and data to main memory and on-chip memory, and in an extended variant with integrated copying of blocks during program execution. The thesis concludes by investigating how alternative encodings on buses can be used to reduce energy consumption. Overall, this thesis demonstrates the energy-saving potential of a compiler in its respective phases and presents new techniques that generate memory accesses more efficiently. In the considered case studies, the energy consumption of an application can thereby be reduced by about 50% compared to systems in use today.2003-01-27T00:00:00Z
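The static on-chip-memory variant can be sketched as a 0/1 knapsack, which is one standard way to formulate such an allocation (an assumption here, not necessarily the thesis' exact method); sizes and per-block energy savings stand in for profiling results:

def allocate_scratchpad(blocks, capacity):
    # blocks: list of (name, size_bytes, energy_saving); returns chosen names
    n = len(blocks)
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i, (_, size, save) in enumerate(blocks, 1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if size <= c:
                best[i][c] = max(best[i][c], best[i - 1][c - size] + save)
    chosen, c = [], capacity
    for i in range(n, 0, -1):                  # backtrack the DP table
        name, size, _ = blocks[i - 1]
        if best[i][c] != best[i - 1][c]:
            chosen.append(name)
            c -= size
    return chosen

# e.g. allocate_scratchpad([("main_loop", 512, 9.1), ("lut", 256, 4.0),
#                           ("isr", 384, 2.5)], capacity=768)
# -> ["lut", "main_loop"]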
Constraintbasierte Codegenerierung für eingebettete ProzessorenBashford, Stevenhttp://hdl.handle.net/2003/27682015-08-13T02:18:03Z2001-10-25T00:00:00ZTitle: Constraintbasierte Codegenerierung für eingebettete Prozessoren
Authors: Bashford, Steven
Abstract: Embedded systems are gaining influence in many areas of our everyday lives, e.g. in telecommunications, automotive electronics, medical technology and consumer electronics. These systems are subject to strict constraints, such as real-time requirements and energy consumption. In the design of embedded systems, realizing as many system components as possible with so-called embedded processors plays a major role. These processors can be reused for a multitude of systems, eliminating an expensive and extremely time-consuming design, test and manufacturing process for dedicated hardware. Developing software enables considerably faster design processes and additionally provides a high degree of flexibility, since design errors can still be corrected in late design phases. In software development there is, of course, the desire to use modern high-level languages. The problem here is a lack of good compilers, especially in the area of digital signal processors. Traditional compiler techniques are not suited to effectively exploiting the specific properties of these processors, and the quality of the generated code falls far short of the constraints imposed on the systems. To enable the use of processors nonetheless, software development often resorts to assembly programming, which has major drawbacks: development times and the phases for testing and debugging usually grow considerably, and reusing software after a processor change is hardly possible. The goal of this thesis is to develop new compiler techniques for embedded processors, with a focus on digital signal processors that have highly irregular data paths with restricted instruction-level parallelism. The aim is to generate very high code quality with respect to execution speed and code size. The development of new techniques strongly targets the integration of subphases of code generation, in the sense of phase coupling. Furthermore, the inclusion of graph-based techniques for instruction selection plays an important role. To meet these demands, the use of new programming and optimization methods is absolutely necessary for a manageable implementation of such compiler techniques. In this thesis, techniques based on constraint logic programming (CLP) are designed and realized, and it is shown to what extent the use of CLP is suitable in this problem domain. A further goal is the design of concepts that allow a fast adaptation of compilers to new processors.2001-10-25T00:00:00ZSystem level modeling and design with the SpecC languageDömer, Rainerhttp://hdl.handle.net/2003/27672015-08-13T00:51:00Z2000-04-11T00:00:00ZTitle: System level modeling and design with the SpecC language
Authors: Dömer, Rainer
Abstract: The semiconductor roadmap estimates that the design complexity of digital systems will continue to increase according to Moore's law. In the next years, embedded systems with tens of millions of transistors on one chip will be standard technology. System-on-Chip (SOC) designs will integrate processor cores, memories and special-purpose custom logic into a complete system fitting on a single die. However, the increased complexity of SOC designs requires more effort, more efficient tools and new methodologies. Increasing the design time is not an option due to market pressures.
System-level design reduces the complexity of the design models by raising the level of abstraction. Starting from an abstract specification model, the system is stepwise refined with the help of computer-aided design (CAD) tools. Using codesign techniques, the system is partitioned into hardware and software parts and finally implemented on a target architecture. Established design methodologies for behavioral synthesis and standard software design are utilized. However, moving to higher abstraction levels is not sufficient.
The key to coping with the complexity involved in SOC designs is the reuse of Intellectual Property (IP). The integration of complex components, which are pre-designed and well-tested, drastically reduces the design complexity and thus saves design time and allows a shorter time-to-market. Since the idea of IP reuse promises great benefits, it must become an integral part of the system design methodology. Furthermore, the use of IP components must be directly supported by the design models, the tools and the languages being used throughout the design process. For example, it must be easy to insert and replace IP components in the design model ("plug-and-play").
This work addresses the main issues in SOC design, namely the system design methodology, system-level modeling, and the specification language. First, an IP-centric system design methodology is proposed which is based on the reuse of IP. It allows the reuse and integration of IP components at any level and at any time during the design process. Starting with an abstract executable specification of the system, architecture exploration and communication synthesis are performed in order to map the design model onto the target architecture. At any stage, the system's functionality and its characteristics can be evaluated and validated.
The model being used in the methodology to represent the system must meet system design requirements. It must be suitable to represent abstract properties at early stages as well as specific details about design decisions later in the design process. In order to support IP, the model must clearly separate communication from computation. In this work, a hierarchical model is described which encapsulates computation and communication in separate entities, namely behaviors and channels. This model naturally supports reuse, integration and protection of IP. In order to formally describe a design model, a language should be used which directly represents the properties and characteristics of the model. This work presents a newly developed language, called SpecC, which allows modeling concepts to be mapped onto language constructs in a one-to-one fashion. Unlike other system-level languages, the SpecC language precisely covers the unique requirements for embedded systems design in an orthogonal manner. Built on top of the C language, the de-facto standard for software development, SpecC supports additional concepts needed in hardware design and allows IP-centric modeling. Recently, the SpecC language has been proposed as a standard system-level language for adoption in industry by some of Japan's top-tier electronics and semiconductor companies.
The proposed methodology and the SpecC language have been implemented in the SpecC design environment. In a graphical framework, the SpecC design environment integrates a set of CAD tools which support system-level modeling, design validation, design space exploration, and (semi-)automatic refinement. The framework and all tools rely on a powerful, central design representation, the SpecC Internal Representation (SIR).
Using the SpecC design environment, the IP-centric methodology has been successfully applied to several designs of industrial size, including a GSM vocoder used in mobile telecommunication.2000-04-11T00:00:00Z
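A loose Python analogue of the behavior/channel separation described above; SpecC itself extends C, so this is a structural illustration only, with invented names:

from queue import Queue

class Channel:
    # encapsulates communication; behaviors never touch each other directly
    def __init__(self):
        self._q = Queue()
    def send(self, data):
        self._q.put(data)
    def receive(self):
        return self._q.get()

class Producer:
    def __init__(self, out):
        self.out = out
    def main(self):
        for sample in range(4):
            self.out.send(sample)          # computation stays local

class Consumer:
    def __init__(self, inp):
        self.inp = inp
    def main(self):
        return [self.inp.receive() for _ in range(4)]

# Swapping Channel for another implementation (bus model, FIFO IP, ...)
# leaves Producer and Consumer untouched -- the "plug-and-play" property.
ch = Channel()
Producer(ch).main()
print(Consumer(ch).main())   # [0, 1, 2, 3]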
Novel Code Optimization Techniques for DSPsLeupers, Rainerhttp://hdl.handle.net/2003/27652015-08-12T19:07:02Z1998-07-02T00:00:00ZTitle: Novel Code Optimization Techniques for DSPs
Authors: Leupers, Rainer
Abstract: Software development for DSPs is frequently a bottleneck in the system design process, due to the poor code quality delivered by many current C compilers. As a consequence, most of the DSP software still has to be written manually in assembly language. In order to overcome this problem, new DSP-specific code optimization techniques are required, which, in contrast to classical compiler technology, take the detailed processor architecture sufficiently into account. This paper describes several new DSP code optimization techniques: maximum utilization of parallel address generation units, exploitation of instruction-level parallelism through exact code compaction, and optimized code generation for IF-statements by means of conditional instructions. Experimental results indicate significant improvements in code quality as compared to existing compilers.1998-07-02T00:00:00ZRetargierbare Codeerzeugung für digitale SignalprozessorenLeupers, Rainerhttp://hdl.handle.net/2003/27662016-02-02T14:13:39Z1998-07-02T00:00:00ZTitle: Retargierbare Codeerzeugung für digitale Signalprozessoren
Authors: Leupers, Rainer
Abstract: Digital signal processors (DSPs) are programmable devices with special instruction sets optimized for computation-intensive applications, used above all for signal processing under real-time conditions. Owing to the lack of DSP-specific optimization techniques, current high-level language compilers for DSPs mostly generate very poor code, so that the bulk of DSP software still has to be developed laboriously in assembly languages. This constitutes a considerable bottleneck in the development of embedded systems. This thesis presents new compiler techniques that take the special constraints of the DSP domain into account. These include optimization techniques that exploit the characteristic hardware properties of DSPs (among others, specialized registers, parallel machine instructions and separate address-generation units) to improve code quality, with the goal of making compilers viable in the DSP domain as well. At the same time, these techniques are kept sufficiently general to be applicable to a whole class of DSPs. This property is called retargetability. Retargetable compilers help in optimizing processor architectures for given applications. The compiler system RECORD presented in this thesis enables the automatic adaptation of compilers to new processors on the basis of processor models specified in a hardware description language. This builds the necessary bridge between compiler construction and the computer-aided design of integrated circuits. Experimental results for realistic processors show the practical applicability of the presented techniques.1998-07-02T00:00:00ZSynthesis of Communicating Controllers for Concurrent Hardware/Software SystemsMarwedel, PeterNiemann, Ralfhttp://hdl.handle.net/2003/27642015-08-12T19:06:49Z1998-07-02T00:00:00ZTitle: Synthesis of Communicating Controllers for Concurrent Hardware/Software Systems
Authors: Marwedel, Peter; Niemann, Ralf
Abstract: Two main aspects in hardware/software codesign are hardware/software partitioning and co-synthesis. Most codesign approaches address only one of these problems. In this paper, a fully automatic approach coupling hardware/software partitioning and co-synthesis is presented. The techniques have been integrated into the codesign tool COOL (COdesign toOL), supporting the complete design flow from system specification to board-level implementation for multi-processor and multi-ASIC target architectures for data-flow-dominated applications.1998-07-02T00:00:00ZOptimized Array Index Computation in DSP ProgramsBasu, AnupamLeupers, RainerMarwedel, Peterhttp://hdl.handle.net/2003/27632015-08-12T19:07:09Z1998-07-02T00:00:00ZTitle: Optimized Array Index Computation in DSP Programs
Authors: Basu, Anupam; Leupers, Rainer; Marwedel, Peter
Abstract: An increasing number of components in embedded systems are implemented by software running on embedded processors. This trend creates a need for compilers for embedded processors capable of generating high quality machine code. Particularly for DSPs, such compilers are hardly available, and novel DSP-specific code optimization techniques are required. In this paper we focus on efficient address computation for array accesses in loops. Based on previous work, we present a new and optimal algorithm for address register allocation and provide an experimental evaluation of different algorithms. Furthermore, an efficient and close-to-optimum heuristic is proposed for large problems.1998-07-02T00:00:00ZRetargetable Code Generation based on Structural Processor DescriptionsLeupers, RainerMarwedel, Peterhttp://hdl.handle.net/2003/27622015-08-12T19:07:04Z1998-07-02T00:00:00ZTitle: Retargetable Code Generation based on Structural Processor Descriptions
Authors: Leupers, Rainer; Marwedel, Peter
Abstract: Design automation for embedded systems comprising both hardware and software components demands code generators integrated into electronic CAD systems. These code generators provide the necessary link between software synthesis tools in HW/SW codesign systems and embedded processors. General-purpose compilers for standard processors are often insufficient, because they do not provide flexibility with respect to different target processors and also suffer from inferior code quality. While recent research on code generation for embedded processors has primarily focussed on code quality issues, in this contribution we emphasize the importance of retargetability, and we describe an approach to achieve it. We propose the usage of uniform, external target processor models in code generation, which describe embedded processors by means of RT-level netlists. Such structural models incorporate more hardware details than purely behavioral models, thereby permitting a close link to hardware design tools and fast adaptation to different target processors. The MSSQ compiler, which is part of the MIMOLA hardware design system, operates on structural models. We describe input formats, central data structures, and code generation techniques in MSSQ. The compiler has been successfully retargeted to a number of real-life processors, which proves the feasibility of our approach with respect to retargetability. We discuss capabilities and limitations of MSSQ, and identify possible areas of improvement.1998-07-02T00:00:00ZInterface Synthesis for Embedded Applications in a Codesign EnvironmentBasu, AnupamMarwedel, PeterMitra, Raj S.http://hdl.handle.net/2003/27612015-08-13T00:09:57Z1998-07-02T00:00:00ZTitle: Interface Synthesis for Embedded Applications in a Codesign Environment
Authors: Basu, Anupam; Marwedel, Peter; Mitra, Raj S.
Abstract: In embedded systems, programmable peripherals are often coupled with the main programmable processor to achieve the desired functionality. Interfacing such peripherals with the processor qualifies as an important task of hardware/software codesign. In this paper, three important aspects of such interfacing, namely the allocation of addresses to the devices, the allocation of device drivers, and approaches to handling events and transitions, are discussed. The proposed approaches have been incorporated in a codesign system, MICKEY. The paper includes a number of examples, taken from the results synthesized by MICKEY, to illustrate the ideas.1998-07-02T00:00:00ZRegister-Constrained Address Computation in DSP ProgramsBasu, AnupamLeupers, RainerMarwedel, Peterhttp://hdl.handle.net/2003/27602015-08-12T18:04:07Z1998-07-02T00:00:00ZTitle: Register-Constrained Address Computation in DSP Programs
Authors: Basu, Anupam; Leupers, Rainer; Marwedel, Peter
Abstract: This paper describes a new code optimization technique for digital signal processors (DSPs). One important characteristic of DSP algorithms is iterative access to data array elements within loops. DSPs support efficient address computations for such array accesses by means of dedicated address generation units (AGUs). We present a heuristic technique which, given an AGU with a fixed number of address registers, minimizes the number of instructions needed for array address computations in a program loop.1998-07-02T00:00:00Z
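A sketch of the underlying cost model with a greedy assignment for illustration (not the paper's heuristic): with auto-increment/decrement AGUs, a step of +/-1 between consecutive accesses is free, while any other step costs one explicit address instruction:

def assign_accesses(access_seq, n_regs):
    # greedily map each array-access offset to one of n_regs address registers
    last = [None] * n_regs          # last offset held by each register
    cost = 0
    for off in access_seq:
        def step_cost(r):
            # free if the register is fresh or can auto-inc/dec to `off`
            return 0 if last[r] is None or abs(off - last[r]) <= 1 else 1
        r = min(range(n_regs), key=step_cost)
        cost += step_cost(r)
        last[r] = off
    return cost

# e.g. assign_accesses([0, 1, 5, 2, 6, 3], n_regs=2) -> 0 extra instructions,
# while a single register would need several explicit address loads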
Processor-Core Based Design and TestMarwedel, Peterhttp://hdl.handle.net/2003/27582015-08-12T19:07:00Z1998-07-04T00:00:00ZTitle: Processor-Core Based Design and Test
Authors: Marwedel, Peter
Abstract: This tutorial responds to the rapidly increasing use of various cores for implementing systems-on-a-chip. It specifically focusses on processor cores. We will give some examples of cores, including DSP cores and application-specific instruction-set processors (ASIPs). We will mention market trends for these components, and we will touch on design procedures, in particular the use of compilers. Finally, we will discuss the problem of testing core-based designs. Existing solutions include boundary scan, embedded in-circuit emulation (ICE), the use of processor resources for stimuli/response compaction, and self-test programs.1998-07-04T00:00:00ZAn Algorithm for Hardware/Software Partitioning Using Mixed Integer LinearMarwedel, PeterNiemann, Ralfhttp://hdl.handle.net/2003/27592015-08-12T19:06:40Z1998-07-04T00:00:00ZTitle: An Algorithm for Hardware/Software Partitioning Using Mixed Integer Linear
Authors: Marwedel, Peter; Niemann, Ralf
Abstract: One of the key problems in hardware/software codesign is hardware/software partitioning. This paper describes a new approach to hardware/software partitioning using integer programming (IP). The advantage of using IP is that optimal results are calculated for a chosen objective function. The partitioning approach works fully automatically and supports multi-processor systems, interfacing and hardware sharing. In contrast to other approaches, where special estimators are used, we use compilation and synthesis tools for cost estimation. The increased time for calculating values for the cost metrics is compensated by an improved quality of the values; therefore, fewer iteration steps for partitioning are needed. The paper presents an algorithm using integer programming for solving the hardware/software partitioning problem, leading to promising results.1998-07-04T00:00:00Z
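For illustration, a brute-force counterpart of such an IP formulation: choose hardware or software per task so that a deadline is met at minimal hardware cost. The task numbers below are invented; a real flow would hand the same objective to an (M)ILP solver instead:

from itertools import product

# (sw_time, hw_time, hw_area) per task -- hypothetical estimates obtained
# from compilation and synthesis tools
TASKS = [(10, 2, 5), (8, 3, 4), (6, 1, 7), (12, 4, 6)]
DEADLINE = 18

def partition(tasks, deadline):
    best = None
    for choice in product((0, 1), repeat=len(tasks)):   # 1 = hardware
        time = sum(hw if c else sw for c, (sw, hw, _) in zip(choice, tasks))
        area = sum(a for c, (_, _, a) in zip(choice, tasks) if c)
        if time <= deadline and (best is None or area < best[0]):
            best = (area, choice)
    return best   # (minimal area, mapping) or None if infeasible

print(partition(TASKS, DEADLINE))   # (15, (1, 1, 0, 1))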
Compilers for Embedded ProcessorsMarwedel, Peterhttp://hdl.handle.net/2003/27572015-08-12T20:20:17Z1998-07-04T00:00:00ZTitle: Compilers for Embedded Processors
Authors: Marwedel, Peter
Abstract: This talk responds to the rapidly increasing use of embedded processors for implementing systems. Such processors come in the form of discrete processors as well as in the form of core processors. They are available both from vendors and within system companies. Applications can be found in most segments of the embedded system market, such as automotive electronics and telecommunications. These applications demand extremely efficient processor architectures, optimized for a certain application domain or even a certain application. Current compiler technology supports these architectures very poorly and has recently been recognized as a major bottleneck for designing systems quickly, efficiently and reliably. A number of recent research projects aim at removing this bottleneck. The talk will briefly discuss the trend towards embedded processors. We will show market trends and examples of recent embedded processors. We will also introduce the terms "application-specific instruction-set processors" (ASIPs), "application-specific signal processors" (ASSPs), "soft cores" and "hard cores". We will then present new code optimization approaches taking the special characteristics of embedded processor architectures into account. In particular, we will present new memory allocation and code compaction algorithms. In the final section of the talk, we will present techniques for retargeting compilers to new architectures easily. These techniques are motivated by the need for domain- or application-dependent optimizations of processor architectures. The scope for such optimizations should not be restricted to hardware architectures but has to include the corresponding work on compilers as well. We will show how compilers can be generated from descriptions of processor architectures. Presented techniques aim at bridging the gap between electronic CAD and compiler generation.1998-07-04T00:00:00ZCode Generation for Core ProcessorsMarwedel, Peterhttp://hdl.handle.net/2003/27562015-08-12T19:06:55Z1998-07-04T00:00:00ZTitle: Code Generation for Core Processors
Authors: Marwedel, Peter
Abstract: This tutorial responds to the rapidly increasing use of cores in general, and of processor cores in particular, for implementing systems-on-a-chip. In the first part of this text, we will provide a brief introduction to various cores. Applications can be found in most segments of the embedded systems market. These applications demand extreme efficiency, and in particular efficient processor architectures and efficient embedded software. In the second part of this text, we will show that current compilers do not provide the required efficiency, and we will give an overview of new compiler optimization techniques which aim at making assembly language programming for embedded software obsolete. These new techniques take advantage of the special characteristics of embedded software and embedded architectures. Due to efficiency considerations, processor architectures optimized for application domains or even for particular applications are of interest. This results in a large number of architectures and instruction sets, leading to the requirement of retargeting compilers to those numerous architectures. In the final section of the tutorial, we will present techniques for retargeting compilers to new architectures easily. We will show how compilers can be generated from descriptions of processors. One of the approaches closes the gap which so far existed between electronic CAD and compiler generation.1998-07-04T00:00:00ZIntroducing Complex Components into Architectural SynthesisDömer, RainerLandwehr, BirgerMarwedel, Peterhttp://hdl.handle.net/2003/27552015-08-12T20:20:15Z1998-07-04T00:00:00ZTitle: Introducing Complex Components into Architectural Synthesis
Authors: Dömer, Rainer; Landwehr, Birger; Marwedel, Peter
Abstract: In this paper, we extend the set of library components which are usually considered in architectural synthesis by components with built-in chaining. For such components, the result of some internally computed arithmetic function is made available as an argument to some other function through a local connection. These components can be used to implement chaining in a data-path within a single component. Components with built-in chaining are combinational circuits; they correspond to "complex gates" in logic synthesis. Compared to implementations with several components, components with built-in chaining usually provide a denser layout, reduced power consumption, and a shorter delay time. Multiplier/accumulators are the most prominent example of such components. Such components require new approaches for library mapping in architectural synthesis. In this paper, we describe an IP-based approach taken in our OSCAR synthesis system.1998-07-04T00:00:00Z
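A small sketch of library mapping with a built-in-chaining component: covering an expression tree so that an addition fed by a multiplication becomes a single multiplier/accumulator (MAC). The cost figures are invented:

COST = {"add": 2, "mul": 5, "mac": 6}   # MAC cheaper than mul+add (7)

def cover(node):
    # node: ("op", left, right) tuple or a leaf string; returns minimal cost
    if isinstance(node, str):
        return 0
    op, left, right = node
    plain = COST[op] + cover(left) + cover(right)
    # pattern a + (b * c): absorb the multiply into a MAC
    if op == "add":
        for mul, other in ((left, right), (right, left)):
            if isinstance(mul, tuple) and mul[0] == "mul":
                plain = min(plain,
                            COST["mac"] + cover(mul[1]) + cover(mul[2])
                            + cover(other))
    return plain

expr = ("add", "a", ("mul", "b", "c"))   # a + b*c
print(cover(expr))   # 6 with a MAC instead of 7 with separate add and mul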
Time-Constrained Code Compaction for DSPsLeupers, RainerMarwedel, Peterhttp://hdl.handle.net/2003/27542015-08-12T20:20:08Z1998-07-04T00:00:00ZTitle: Time-Constrained Code Compaction for DSPs
Authors: Leupers, Rainer; Marwedel, Peter
Abstract: This paper addresses instruction-level parallelism in code generation for DSPs. In the presence of potential parallelism, the task of code generation includes code compaction, which parallelizes primitive processor operations under given dependency and resource constraints. Furthermore, DSP algorithms are in most cases required to guarantee real-time response. Since the exact execution speed of a DSP program is only known after compaction, real-time constraints should be taken into account during the compaction phase. While previous DSP code generators rely on rigid heuristics for compaction, we propose a novel approach to exact local code compaction based on an Integer Programming model which handles time constraints. Due to a general problem formulation, the IP model also captures encoding restrictions and handles instructions having alternative encodings and side effects, and therefore applies to a large class of instruction formats. Capabilities and limitations of our approach are discussed for different DSPs.1998-07-04T00:00:00Z
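A greedy list-scheduling stand-in (illustration only; the paper's point is exact IP-based compaction) that packs dependent micro-operations into instruction words under resource constraints and then checks the time constraint. Resource classes and slot counts are assumptions:

def compact(ops, deps, slots_per_word, deadline):
    # ops: {name: resource_class}; deps: set of (before, after) pairs
    scheduled, words = set(), []
    while len(scheduled) < len(ops):
        used, word = {}, []
        for op in sorted(o for o in ops if o not in scheduled):
            ready = all(b in scheduled for b, a in deps if a == op)
            res = ops[op]
            if ready and used.get(res, 0) < slots_per_word.get(res, 1):
                word.append(op)
                used[res] = used.get(res, 0) + 1
        if not word:
            raise ValueError("cyclic dependencies")
        scheduled.update(word)
        words.append(word)
    if len(words) > deadline:
        raise ValueError("time constraint violated; exact compaction needed")
    return words

print(compact({"ld1": "mem", "ld2": "mem", "mul": "alu", "st": "mem",
               "dec": "alu"},
              deps={("ld1", "mul"), ("ld2", "mul"), ("mul", "st")},
              slots_per_word={"mem": 1, "alu": 1}, deadline=4))
# -> [['dec', 'ld1'], ['ld2'], ['mul'], ['st']]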
Retargetable Generation of Code Selectors from HDL Processor ModelsLeupers, RainerMarwedel, Peterhttp://hdl.handle.net/2003/27532015-08-12T19:06:44Z1998-07-04T00:00:00ZTitle: Retargetable Generation of Code Selectors from HDL Processor Models
Authors: Leupers, Rainer; Marwedel, Peter
Abstract: Besides high code quality, a primary issue in embedded code generation is the retargetability of code generators. This paper presents techniques for the automatic generation of code selectors from externally specified processor models. In contrast to previous work, our retargetable compiler RECORD does not require tool-specific modelling formalisms, but starts from general HDL processor models. From an HDL model, all processor aspects needed for code generation are derived automatically. As demonstrated by experimental results, short turnaround times for retargeting are achieved, which permits studying the HW/SW trade-off between processor architectures and program execution speed.1998-07-04T00:00:00ZRetargetable Compilers for Embedded DSPsLeupers, RainerMarwedel, Peterhttp://hdl.handle.net/2003/27522015-08-12T20:20:13Z1998-07-04T00:00:00ZTitle: Retargetable Compilers for Embedded DSPs
Authors: Leupers, Rainer; Marwedel, Peter
Abstract: Programmable devices are a key technology for the design of embedded systems, such as in the consumer electronics market. Processor cores are used as building blocks for more and more embedded system designs, since they provide a unique combination of features: flexibility and reusability. Processor-based design implies that compilers capable of generating efficient machine code are necessary. However, highly efficient compilers for embedded processors are hardly available. In particular, this holds for digital signal processors (DSPs). This contribution is intended to outline different aspects of DSP compiler technology. First, we cover demands on compilers for embedded DSPs, which are partially in sharp contrast to traditional compiler construction. Secondly, we present recent advances in DSP code optimization techniques, which explore a comparatively large search space in order to achieve high code quality. Finally, we discuss the different approaches to retargetability of compilers, that is, techniques for automatic generation of compilers from processor models.1998-07-04T00:00:00ZOptimierende Compiler für DSPs: Was ist verfügbar?Leupers, RainerMarwedel, Peterhttp://hdl.handle.net/2003/27512015-08-12T19:06:26Z1998-07-04T00:00:00ZTitle: Optimierende Compiler für DSPs: Was ist verfügbar?
Authors: Leupers, Rainer; Marwedel, Peter
Abstract: Software development for embedded processors today still takes place largely at the assembly level. The reason for this state of affairs, untenable in the long run, is the poor availability of good C compilers. In recent years, however, substantial progress has been made in code optimization, especially for DSPs, which so far has found its way into commercial products only insufficiently. This contribution identifies the principal sources of optimization and summarizes the state of the art. The central methods are complex optimization techniques that go beyond traditional compiler technology, as well as the exploitation of DSP-specific hardware architectures for efficiently translating C language constructs into DSP machine instructions. Some of these techniques can also be applied in general to assembly programs, whether compiler-generated or hand-written.1998-07-04T00:00:00Z