Eldorado Community: http://hdl.handle.net/2003/74

Title: Analyses and optimizations of timing-constrained embedded systems considering resource synchronization and machine learning approaches
Authors: Shi, Junjie
Abstract: Nowadays, embedded systems have become ubiquitous, powering a vast array of applications from consumer electronics to industrial automation. Concurrently, statistical and machine learning algorithms are being increasingly adopted across various application domains, such as medical diagnosis, autonomous driving, and environmental analysis, offering sophisticated data analysis and decision-making capabilities. As the demand for intelligent and time-sensitive applications continues to surge, accompanied by growing concerns regarding data privacy, the deployment of machine learning models on embedded devices has emerged as an indispensable requirement. However, this integration introduces both significant opportunities for performance enhancement and complex challenges in deployment optimization.
On the one hand, deploying machine learning models on embedded systems with limited computational capacity, power budgets, and stringent timing requirements necessitates additional adjustments to ensure optimal performance and meet the imposed timing constraints. On the other hand, the inherent capabilities of machine learning, such as self-adaptation during runtime, prove invaluable in addressing challenges encountered in embedded systems, aiding in optimization and decision-making processes.
This dissertation introduces two primary lines of work for the analysis and optimization of timing-constrained embedded systems. First, it addresses the relatively long access times required for the shared resources of machine learning tasks. Second, it considers the limited communication resources and the data privacy concerns that arise in distributed embedded systems when machine learning models are deployed. Additionally, this work provides a use case that employs a machine learning method to tackle challenges specific to embedded systems.
By addressing these key aspects, this dissertation contributes to the analysis and optimization of timing-constrained embedded systems, considering resource synchronization and machine learning models to enable improved performance and efficiency in real-time applications with stringent constraints.

Title: Complex scheduling models and analyses for property-based real-time embedded systems
Authors: Ueter, Niklas
Abstract: Modern multi-core architectures and parallel applications pose a significant challenge to worst-case centric real-time system verification and design efforts. The involved model and parameter uncertainty contests the fidelity of formal real-time analyses, which are mostly based on exact model assumptions. In this dissertation, various approaches that can accept parameter and model uncertainty are presented.
In an attempt to improve predictability in worst-case centric analyses, timing-predictable protocols are explored for parallel task scheduling on multiprocessors and for network-on-chip arbitration. A novel scheduling algorithm for gang tasks on multiprocessors, called stationary rigid gang scheduling, is proposed. With regard to fixed-priority wormhole-switched networks-on-chip, a more restrictive family of transmission protocols with predictability-enhancing properties, called simultaneous progression switching protocols, is proposed. Moreover, hierarchical scheduling for parallel DAG tasks under parameter uncertainty is studied to achieve temporal and spatial isolation.
Fault tolerance, as a supplementary reliability aspect of real-time systems, is examined in the presence of dynamic external causes of faults. Using various job variants, which trade increased execution time demand for increased error protection, a state-based policy selection strategy is proposed that provably assures an acceptable quality of service (QoS). Lastly, the temporal misalignment of sensor data in sensor fusion applications in cyber-physical systems is examined, and a modular analysis based on minimal properties is proposed to obtain an upper bound on the maximal sensor data time-stamp difference.

Title: Transfer learning for multi-channel time-series Human Activity Recognition
Authors: Moya Rueda, Fernando
Abstract: Methods of human activity recognition (HAR) have been developed for the purpose of automatically classifying recordings of human movements into a set of activities. Capturing, evaluating, and analysing sequential data to recognise human activities accurately is critical for many applications in pervasive and ubiquitous computing, e.g., mobile- or ambient-assisted living, smart homes, activities of daily living, health support and rehabilitation, sports, automotive surveillance, and Industry 4.0. For example, HAR is particularly interesting for optimisation in those industries where manual work remains dominant.
HAR takes as inputs signals from videos or from multi-channel time-series, e.g., human joint measurements from marker-based motion capturing systems and inertial measurements recorded by wearables or on-body devices. Wearables have become relevant as they extend the potential of HAR beyond constrained or laboratory settings. This thesis focuses on HAR using multi-channel time-series.
Multi-channel time-series HAR is, in general, a challenging classification task, because human activities and movements show large variation. Humans carry out semantically very distinct activities in a similar manner; conversely, they carry out the same activity in many different ways. Furthermore, multi-channel time-series HAR datasets suffer from the class imbalance problem, with more samples of certain activities than of others. This problem strongly depends on the annotation. Moreover, the definitions of human activities used for annotation are non-standard.
Methods based on Deep Neural Networks (DNNs) are prevalent for multi-channel time-series HAR. Nevertheless, the performance of DNNs has not increased as significantly as in other fields such as image classification or segmentation. DNNs show low sample efficiency, as they learn the temporal structure of activities entirely from data. For supervised DNNs, the scarcity of annotated data is the primary concern. Annotated data of human behaviour is scarce and costly to obtain: the annotation process demands enormous resources, and annotation reliability varies because annotations can be subject to human errors or to unclear and non-elaborated annotation protocols.
Transfer learning has been used to cope with a limited amount of annotated data, overfitting, zero-shot learning or classification of unseen human activities, and the class imbalance problem. Transfer learning can alleviate the problem of scarcity of annotated data: learnt parameters and feature representations from a specific source domain are transferred to a target domain. Transfer learning thus extends the usability of large annotated datasets from source domains to related problems.
This thesis proposes a general transfer learning approach to improve automatic multi-channel Time-Series HAR. The proposed transfer learning method combines a semantic attribute representation of activities and a specific deep neural network. It handles situations where the source and target domains differ, i.e., the sensor space and the set of activities change, without needing a large amount of annotated data from the target domain.
The method considers different levels of transferability. First, an architecture handles a variety of dataset configurations with regard to the number of devices and their type; it creates fixed-size representations of sensor recordings that are representative of the human limbs. These networks process sequences of movements from the human limbs, either from poses or from inertial measurements. Second, it introduces a search for semantic attribute representations that favourably represent signal segments for recognising human activities in unknown scenarios, which only provide annotations of activities and lack human-annotated semantic attributes. Third, it covers transferability from data of a variety of source datasets. The method takes advantage of a large human-pose dataset as a source domain, which was created during the development of this thesis. Furthermore, synthetic inertial measurements are derived from sequences of human poses, either from a marker-based motion capturing system or from video-based and pose-based HAR datasets; the latter specifically use the pixel-coordinate annotations of human poses as multi-channel time-series data. Real inertial measurements and these synthetic measurements are then deployed as a source domain for parameter transfer learning.
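To make the attribute idea concrete, the following is a minimal sketch of attribute-based zero-shot recognition: activities are described by shared semantic attributes, and a segment is assigned to the activity whose attribute vector best matches the predicted attribute scores. The activities, attributes, and scores here are hypothetical stand-ins, not the representations searched for in the thesis.

```python
import numpy as np

# Hypothetical semantic attributes for three activities (illustrative only):
# columns: [uses_arms, uses_legs, stationary]
activities = ["walking", "waving", "standing"]
A = np.array([
    [0, 1, 0],  # walking
    [1, 0, 1],  # waving
    [0, 0, 1],  # standing
], dtype=float)

def predict_activity(attr_scores: np.ndarray) -> str:
    """Map predicted attribute scores to the activity whose attribute
    vector is closest (cosine similarity). New activities only need an
    attribute vector, not training data, which enables zero-shot transfer."""
    sims = (A @ attr_scores) / (
        np.linalg.norm(A, axis=1) * np.linalg.norm(attr_scores) + 1e-9)
    return activities[int(np.argmax(sims))]

# Stand-in for a DNN's attribute predictions on one sensor segment.
scores = np.array([0.1, 0.9, 0.2])
print(predict_activity(scores))  # -> "walking"
```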
Experimentation on different target datasets demonstrates that the proposed transfer learning method improves performance, most evidently when a proportion of the targets' training material is deployed. This outcome suggests that the temporal convolutional filters are rather general, as they learn local temporal relations of human movements related to the semantic attributes, independent of the number of devices and their type. A human-limb-oriented deep architecture and an evolutionary algorithm provide an off-the-shelf predictor of semantic attributes that can be deployed directly on a new target scenario. Closely related problems can be addressed directly by manually specifying the attribute-to-activity relations, without the need for a search via an evolutionary algorithm. Besides, the learnt convolutional filters are activity-class dependent; hence, the classification performance on the activities shared among the datasets improves.

Title: Memory carousel: LLVM-based bitwise wear leveling for nonvolatile main memory
Authors: Hölscher, Nils; Hakert, Christian; Nassar, Hassan; Chen, Kuan-Hsun; Bauer, Lars; Chen, Jian-Jia; Henkel, Jörg
Abstract: Emerging nonvolatile memory yields, alongside many advantages, technical shortcomings, such as reduced cell lifetime. Although many wear-leveling approaches exist to extend the lifetime of such memories, usually a tradeoff for the granularity of wear leveling has to be made. Due to iterative write schemes (repeatedly sense and write), wear-out of memory in certain systems is directly dependent on the written bit value and thus can be highly imbalanced, requiring dedicated bit-wise wear leveling. Such bit-wise wear leveling has so far only been proposed together with special hardware support. However, if no dedicated hardware solutions are available, especially for commercial off-the-shelf systems with nonvolatile memories, a software solution can be crucial for the system lifetime. In this work, we propose entirely software-based bit-wise wear leveling, where the position of bits within CPU words in main memory is rotated on a regular basis. We leverage the LLVM intermediate representation to adjust load and store operations of the application with a custom compiler pass. Experimental evaluation shows that, by applying local rotation within the CPU word, the lifetime can be extended by a factor of up to 21×. We also show that our method can be combined with coarser-grained wear leveling, e.g., at block granularity, to achieve higher lifetime improvements.
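A minimal sketch of the rotation idea follows, in Python for brevity; the paper realizes it as an LLVM compiler pass that rewrites the application's load and store instructions, and the word size and rotation schedule here are illustrative.

```python
WORD_BITS = 32
MASK = (1 << WORD_BITS) - 1

def rotl(value: int, k: int) -> int:
    """Rotate a 32-bit word left by k bit positions."""
    k %= WORD_BITS
    return ((value << k) | (value >> (WORD_BITS - k))) & MASK

def rotr(value: int, k: int) -> int:
    return rotl(value, WORD_BITS - (k % WORD_BITS))

memory = {}
offset = 3  # current rotation offset; advancing it on a regular basis
            # requires re-rotating stored words to the new offset (omitted)

def store(addr: int, value: int) -> None:
    """Instrumented store: rotate before writing, so bit positions that
    are frequently 0 (or 1) wander across all physical cells over time."""
    memory[addr] = rotl(value, offset)

def load(addr: int) -> int:
    """Instrumented load: undo the rotation transparently."""
    return rotr(memory[addr], offset)

store(0x10, 0x0000FFFF)
assert load(0x10) == 0x0000FFFF  # the application sees unrotated values
```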
Title: Assessing the reliability of deep neural networks
Authors: Oberdiek, Philipp
Abstract: Deep Neural Networks (DNNs) have achieved astonishing results in the last two decades, fueled by ever larger datasets and the availability of high-performance compute hardware. This has led to breakthroughs in many applications such as image and speech recognition, natural language processing, autonomous driving, and drug discovery. Despite this success, the understanding of their internal workings and the interpretability of their predictions remain limited, and DNNs are often treated as "black boxes". Especially for safety-critical applications where the well-being of humans is at risk, decisions based on predictions should take the associated uncertainties into account. Autonomous vehicles, for example, operate in a highly complex environment with potentially unpredictable situations that can lead to safety risks for pedestrians and other road users. In medical applications, decisions based on incorrect predictions can have serious consequences for a patient's health.
As a consequence, the topic of Uncertainty Quantification (UQ) has received increasing attention in recent years. The goal of UQ is to assign uncertainties to predictions so that the decision-making process can account for potentially unreliable predictions. In addition, uncertainty estimates can support other tasks such as identifying model weaknesses, collecting additional data, or detecting malicious attacks. Unfortunately, UQ for DNNs is a particularly challenging task due to their high complexity and nonlinearity, and uncertainties that can be derived from traditional statistical models are often not directly applicable to DNNs. Therefore, the development of new UQ techniques for DNNs is of paramount importance for safety-aware decision-making. This thesis evaluates existing UQ methods and proposes improvements and novel approaches which contribute to the reliability and trustworthiness of modern deep learning methodology.
One of the core contributions of this work is the development of a novel generative learning framework with an integrated training of a One-vs-All (OvA) classifier. A Generative Adversarial Network (GAN) is trained in such a way that it is possible to sample from the boundary of the training distribution. These boundary samples shield the training dataset from the Out-of-Distribution (OoD) region. By making the GAN class-conditional, each class can be shielded separately, which integrates well with the formulation of an OvA classifier. The OvA classifier achieves outstanding results on the task of OoD detection and surpasses many previous works by large margins. In addition, the tight class shielding also improves the overall classification accuracy.
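As a rough illustration of the One-vs-All readout, the sketch below assumes per-class sigmoid scores from some trained classifier; the GAN-based boundary training that shields each class is not shown, and the threshold is an illustrative assumption.

```python
import numpy as np

def ova_predict(class_scores: np.ndarray, threshold: float = 0.5):
    """One-vs-All readout: each entry of class_scores is an independent
    sigmoid 'does this input belong to class k?' score in [0, 1].
    If no class claims the input, it is flagged as out-of-distribution."""
    best = int(np.argmax(class_scores))
    confidence = float(class_scores[best])
    if confidence < threshold:
        return None, confidence   # OoD: rejected by every class
    return best, confidence

# Stand-in scores from a hypothetical 4-class OvA classifier.
print(ova_predict(np.array([0.02, 0.91, 0.05, 0.10])))  # (1, 0.91)
print(ova_predict(np.array([0.10, 0.20, 0.15, 0.05])))  # (None, 0.2) -> OoD
```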
A comprehensive and consistent evaluation on the tasks of False Positive, Out-of-Distribution and Adversarial Example Detection on a diverse selection of datasets provides insights into the strengths and weaknesses of existing methods and the proposed approaches.

Title: Special issue on practical and robust design of real-time systems
Authors: Chen, Jian-Jia; Shrivastava, Aviral

Title: MODES: model-based optimization on distributed embedded systems
Authors: Shi, Junjie; Bian, Jiang; Richter, Jakob; Chen, Kuan-Hsun; Rahnenführer, Jörg; Xiong, Haoyi; Chen, Jian-Jia
Abstract: The predictive performance of a machine learning model highly depends on the corresponding hyper-parameter setting. Hence, hyper-parameter tuning is often indispensable. Normally, such tuning requires the dedicated machine learning model to be trained and evaluated on centralized data to obtain a performance estimate. However, in a distributed machine learning scenario, it is not always possible to collect all the data from all nodes due to privacy concerns or storage limitations. Moreover, if data has to be transferred through low-bandwidth connections, the transfer reduces the time available for tuning. Model-Based Optimization (MBO) is a state-of-the-art method for tuning hyper-parameters, but its application to distributed machine learning models or federated learning has received little research attention. This work proposes MODES, a framework that allows MBO to be deployed on resource-constrained distributed embedded systems. Each node trains an individual model based on its local data, and the goal is to optimize the combined prediction accuracy. The presented framework offers two optimization modes: (1) MODES-B considers the whole ensemble as a single black box and optimizes the hyper-parameters of each individual model jointly, and (2) MODES-I considers all models as clones of the same black box, which allows it to efficiently parallelize the optimization in a distributed setting. We evaluate MODES by conducting experiments on the optimization of the hyper-parameters of a random forest and a multi-layer perceptron. The experimental results demonstrate that, with an improvement in terms of mean accuracy (MODES-B), run-time efficiency (MODES-I), and statistical stability for both modes, MODES outperforms the baseline, i.e., carrying out tuning with MBO on each node individually with its local sub-dataset.
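A toy sketch of the two modes is given below, with plain random search standing in for MBO and a synthetic accuracy function standing in for node-local model training; the function names and the one-parameter configuration space are illustrative only.

```python
import random

NODES = 3  # each node tunes a model on its private local data

def local_accuracy(node: int, cfg: dict) -> float:
    """Toy stand-in for training and evaluating a node-local model."""
    optimum = [0.3, 0.5, 0.7][node]
    return 1.0 - abs(cfg["lr"] - optimum)

def sample_cfg() -> dict:
    return {"lr": random.uniform(0.0, 1.0)}

def modes_b(budget: int):
    """MODES-B style: the whole ensemble is one black box, so each
    evaluation proposes a joint configuration (one entry per node)."""
    best, best_acc = None, -1.0
    for _ in range(budget):
        joint = [sample_cfg() for _ in range(NODES)]
        acc = sum(local_accuracy(n, c) for n, c in enumerate(joint)) / NODES
        if acc > best_acc:
            best, best_acc = joint, acc
    return best, best_acc

def modes_i(budget: int):
    """MODES-I style: all models are clones of one black box, so one
    proposed configuration is evaluated on every node in parallel."""
    best, best_acc = None, -1.0
    for _ in range(budget):
        cfg = sample_cfg()
        acc = sum(local_accuracy(n, cfg) for n in range(NODES)) / NODES
        if acc > best_acc:
            best, best_acc = cfg, acc
    return best, best_acc

print(modes_b(50)[1], modes_i(50)[1])
```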
Title: Trace-based analysis of locks in operating systems (Aufzeichnungsbasierte Analyse von Sperren in Betriebssystemen)
Authors: Lochmann, Alexander
Abstract: Modern multi-core operating systems offer a variety of synchronization mechanisms. They serve to realize fine-grained locking, allowing the operating system as well as the applications running on it to exploit the performance of modern multi-core processors. Here, entire subsystems, individual data structures, or only parts of a data structure are protected by one or more locks. The more diverse the mechanisms and the finer-grained the locking, the more error-prone an operating system can become. It is therefore vitally important to understand how these synchronization mechanisms are used in a multi-core operating system in order to avoid synchronization errors.
Existing research in this area deals with finding specific synchronization problems, such as detecting data races on memory accesses. However, these approaches detect synchronization errors only after the fact; they do not derive any locking rules that could state how accesses must be protected correctly and thereby prevent errors in advance.
This is exactly the gap the present thesis attempts to close. It therefore addresses the questions of (a) whether trace-based analysis can provide insights into the synchronization behavior of multi-core operating systems, and (b) how these insights can be used to improve the software quality of modern multi-core operating systems.
This leads to the following research contributions: First, this thesis presents the design of the LockDoc approach. It records memory accesses and lock operations in an operating system kernel while a workload is executed, and from this derives correlations between accesses to data structures and lock operations. This can be used in three ways: (1) checking the existing locking documentation, i.e., whether the code still follows the documented rules; (2) deriving new locking rules for different data types, from which a new locking documentation can be generated in a further step; (3) detecting accesses that do not follow the derived rules. These so-called counterexamples indicate potential synchronization errors, including the call hierarchy and the locks actually held.
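The rule-derivation step can be illustrated with a small sketch: count which locks are held during accesses to a data-structure member, take the dominant lock as the rule, and report accesses violating it as counterexamples. The trace format and the majority heuristic are simplifications, not LockDoc's actual implementation.

```python
from collections import Counter

# Hypothetical trace: (member accessed, frozenset of locks held).
trace = [
    ("inode.i_size", frozenset({"inode.i_rwsem"})),
    ("inode.i_size", frozenset({"inode.i_rwsem"})),
    ("inode.i_size", frozenset({"inode.i_rwsem", "journal"})),
    ("inode.i_size", frozenset()),  # suspicious unprotected access
]

def derive_rule(trace, member):
    """Locking rule = the lock most frequently held during accesses,
    together with its empirical support."""
    held, total = Counter(), 0
    for m, locks in trace:
        if m != member:
            continue
        total += 1
        held.update(locks)
    lock, count = held.most_common(1)[0]
    return lock, count / total

rule, support = derive_rule(trace, "inode.i_size")
print(f"rule: hold {rule} ({support:.0%} of accesses)")
for m, locks in trace:                 # accesses violating the rule
    if m == "inode.i_size" and rule not in locks:
        print("counterexample:", m, "locks held:", set(locks) or "{}")
```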
In this thesis, the approach is applied to the Linux and FreeBSD kernels in case studies, following the three goals above. Based on the investigations carried out in this thesis, five patches to the Linux kernel were created by the author and accepted by the developer community. A further patch has already been approved but not yet accepted. The results of this work also led to a change in the locking documentation of the FreeBSD kernel. In addition, a synchronization bug in FreeBSD was uncovered.

Title: Software fault injection and localization in embedded systems
Authors: Gabor, Ulrich Thomas
Abstract: Injection and localization of software faults have been extensively researched, but the results are not directly transferable to embedded systems. The domain-specific constraints applying to these systems, such as limited resources and the predominant C/C++ programming languages, require a specific set of injection and localization techniques. In this thesis, we have assessed existing approaches and have contributed a set of novel methods for software fault injection and localization in embedded systems.
We have developed a method based on AspectC++ for the injection of errors at interfaces and a method based on Clang for the accurate injection of software faults directly into source code. Both approaches work particularly well in the context of embedded systems, because they do not require runtime support and modify binaries only when necessary. Nevertheless, they are also suitable for injecting software faults and errors into the software of other domains.
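The thesis targets C/C++ via AspectC++ and Clang; as a language-neutral illustration of interface-level error injection, the following Python sketch wraps a function so that its return value is corrupted with a configurable probability. The decorator, probability, and fault model are hypothetical, not the thesis's tooling.

```python
import functools
import random

def inject_interface_error(p: float, corrupt):
    """Wrap a function so its return value is replaced by an erroneous
    one with probability p, mimicking error injection at an interface."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            if random.random() < p:
                return corrupt(result)  # deliver a corrupted value
            return result
        return wrapper
    return decorator

@inject_interface_error(p=0.1, corrupt=lambda v: v ^ 0x1)  # single bit flip
def read_sensor() -> int:
    return 42  # stand-in for a driver or library call

readings = [read_sensor() for _ in range(20)]
print(readings)  # roughly 10% of the values are flipped to 43
```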
These contributions required a thorough assessment of fault injection techniques and fault models presented in literature over the years, which raised multiple questions regarding their validity in the context of C/C++. We found that macros (particularly header files), compile-time language constructs, and the commonly used optimization levels introduce a non-negligible bias to experimental results achieved by injection methods operating on any other layer than the source code. Additionally, we found that the textual specification of fault models is prone to ambiguities and misunderstandings. We have conceived an automatic fault classifier to solve this problem in a field study.
Regarding software fault localization, we have combined existing methods making use of program spectra and assertions, and have contributed a new oracle type for autonomous localization of software faults in the field. Our evaluation shows that this approach works particularly well in the context of embedded systems because the generated information can be processed in real-time and, therefore, it can run in an unsupervised manner.
In conclusion, we assessed a variety of injection and localization approaches in the context of embedded systems and contributed novel methods, where applicable, improving the current state of the art. Our results also point out weaknesses regarding the general validity of the majority of previous injection experiments in C/C++.

Title: Correspondence article: counterexample for suspension-aware schedulability analysis of EDF scheduling
Authors: Günzel, Mario; Chen, Jian-Jia

Title: A note on slack enforcement mechanisms for self-suspending tasks
Authors: Günzel, Mario; Chen, Jian-Jia
Abstract: This paper provides counterexamples for the slack enforcement mechanisms to handle segmented self-suspending real-time tasks by Lakshmanan and Rajkumar (Proceedings of the Real-Time and Embedded Technology and Applications Symposium (RTAS), pp 3–12, 2010).

Title: Nanoparticle classification using frequency domain analysis on resource-limited platforms
Authors: Yayla, Mikail; Toma, Anas; Chen, Kuan-Hsun; Lenssen, Jan Eric; Shpacovitch, Victoria; Hergenröder, Roland; Weichert, Frank; Chen, Jian-Jia
Abstract: A mobile system that can detect viruses in real time is urgently needed, due to the combination of virus emergence and evolution with increasing global travel and transport. A biosensor called PAMONO (for Plasmon Assisted Microscopy of Nano-sized Objects) represents a viable technology for mobile real-time detection of viruses and virus-like particles. It could be used for fast and reliable diagnoses in hospitals, airports, the open air, or other settings. For analysis of the images provided by the sensor, state-of-the-art methods based on convolutional neural networks (CNNs) can achieve high accuracy. However, such computationally intensive methods may not be suitable for most mobile systems. In this work, we propose nanoparticle classification approaches based on frequency domain analysis, which are less resource-intensive. We observe that, on average, classification takes 29 µs per image for the Fourier features and 17 µs for the Haar wavelet features. Although the CNN-based method scores 1–2.5 percentage points higher in classification accuracy, it takes 3370 µs per image on the same platform. With these results, we identify and explore the trade-off between resource efficiency and classification performance for nanoparticle classification of images provided by the PAMONO sensor.
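A minimal sketch of the frequency-domain idea: describe a small image patch by a handful of low-frequency Fourier magnitudes and feed them to a linear classifier. The patch size, number of retained bins, and weights are illustrative assumptions, not the paper's actual pipeline.

```python
import numpy as np

def fourier_features(patch: np.ndarray, k: int = 4) -> np.ndarray:
    """Low-frequency Fourier magnitudes of an image patch: cheap to
    compute and compact enough for resource-limited platforms."""
    spectrum = np.abs(np.fft.fft2(patch))
    return spectrum[:k, :k].ravel()  # keep only the k x k lowest bins

def classify(patch: np.ndarray, weights: np.ndarray, bias: float) -> bool:
    """Linear decision on the features: True = nanoparticle present."""
    return float(fourier_features(patch) @ weights + bias) > 0.0

rng = np.random.default_rng(0)
patch = rng.random((16, 16))        # stand-in for a sensor image patch
weights = rng.standard_normal(16)   # stand-in for trained weights
print(classify(patch, weights, bias=-40.0))
```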
Title: Realistic scheduling models and analyses for advanced real-time embedded systems
Authors: Brüggen, Georg von der
Abstract: Focusing on real-time scheduling theory, the thesis demonstrates how essential realistic scheduling models and analyses are for guaranteeing timing correctness without over-provisioning the necessary system resources. It details potential pitfalls of the de facto standards for the theoretical examination of scheduling algorithms and schedulability tests, namely resource augmentation bounds and utilization bounds, and proposes parametric augmentation functions to improve their meaningfulness. Considering uncertain execution behaviour, systems with dynamic real-time guarantees are introduced to model this scenario more realistically than mixed-criticality systems, and the first technique that allows the worst-case deadline failure probability to be calculated precisely for task sets with a realistic number of tasks is provided. Furthermore, hybrid self-suspension models are proposed that bridge the gap between the over-flexible dynamic and the over-restrictive segmented self-suspension model, with different tradeoffs between accuracy and flexibility.

Title: Energy-aware design of hardware and software for ultra-low-power systems
Authors: Buschhoff, Markus
Abstract: Future visions of the Internet of Things and Industry 4.0 demand large-scale deployments of mobile devices while removing the numerous disadvantages of using batteries: degradation, scale, weight, pollution, and costs. However, this requires computing platforms with extremely low energy consumption, and thus the employment of ultra-low-power hardware, energy-harvesting solutions, and highly efficient power-management hardware and software.
The goal of these power-management solutions is either to achieve power neutrality, a condition where energy harvest and energy consumption equalize while the service quality is maximized, or to enhance power efficiency in order to conserve energy reserves. To reach these goals, intelligent power-management decisions are needed that utilize precise energy data.
This thesis discusses the measurement of energy in embedded systems, both online and by external equipment, and the utilization of the acquired data for modeling the power-consumption states of each involved hardware component. Furthermore, a method is shown to use the resulting models by instrumenting preexisting device drivers. These drivers enable new functionalities, such as online energy accounting and energy application interfaces, and facilitate intelligent power-management decisions.
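A minimal sketch of such state-based online energy accounting, as an instrumented driver might perform it: each component carries a power state, and energy is integrated as power times time on every state change. The component, its states, and the power values are hypothetical.

```python
import time

# Hypothetical power model: state -> power draw in milliwatts.
RADIO_MODEL = {"off": 0.0, "idle": 1.5, "rx": 12.0, "tx": 17.4}

class EnergyAccountant:
    """Integrates energy as power x elapsed time across power-state
    changes, the way an instrumented device driver could do online."""
    def __init__(self, model, state="off"):
        self.model, self.state = model, state
        self.t_last, self.energy_mj = time.monotonic(), 0.0

    def set_state(self, new_state: str) -> None:
        now = time.monotonic()
        # mW x s = mJ: charge the time spent in the previous state.
        self.energy_mj += self.model[self.state] * (now - self.t_last)
        self.state, self.t_last = new_state, now

radio = EnergyAccountant(RADIO_MODEL)
radio.set_state("tx")      # the driver switches the radio on to transmit
time.sleep(0.05)           # ... transmission ...
radio.set_state("idle")
print(f"{radio.energy_mj:.2f} mJ consumed so far")
```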
In order to reduce the additional effort of reimplementing device drivers and to avoid violating the separation-of-concerns paradigm, the approach shown in this thesis synthesizes instrumentation aspects for an aspect-oriented programming language, so that the original device-driver source code remains unaffected.
Eventually, an automated process of energy measurement and data analysis is presented. This process is able to yield precise energy models with low manual effort. In combination with the instrumentation synthesis of aspect code, this method enables an accelerated creation process for energy models of ultra-low-power systems. For all proposed methods, empirical accuracy and overhead measurements are presented.
To support the claims of the author, first practical energy-aware and wireless-radio-networked applications are showcased: an energy-neutral light sensor, a photovoltaic-powered seminar-room door plate, and a sensor-network experiment testbed for research and education.

Title: Segmentation-free word spotting with bag-of-features hidden Markov models
Authors: Rothacker, Leonard
Abstract: The method that is proposed in this thesis makes document images searchable with minimal manual effort. This works in the query-by-example scenario, where the user selects an exemplary occurrence of the query word in a document image. Afterwards, an entire collection of document images is searched automatically. The major challenge is to detect relevant words and to sort them according to similarity to the query. However, recognizing text in historic document images is extremely challenging. Different historic document collections have highly irregular visual appearances due to non-standardized layouts or the large variabilities in handwritten script. An automatic text recognizer requires huge amounts of annotated samples from the collection that are usually not directly available.
In order to search document images with just a single example of the query word, the information that is available about the problem domain is integrated at various levels. Bag-of-features are a powerful image representation that can be adapted to the data automatically. The query word is represented with a hidden Markov model. This statistical sequence model is very suitable for the sequential structure of text. An important assumption is that the visual variability of the text within a single collection is limited. For example, this is typically the case if the documents have been written by only a few writers. Furthermore, the proposed method requires only minimal heuristic assumptions about the visual appearance of text. This is achieved by processing document images as a whole without requiring a given segmentation of the images on word level or on line level. The detection of potentially relevant document regions is based on similarity to the query. It is not required to recognize words in general. Word size variabilities can be handled by the hidden Markov model. In order to make the computationally costly application of the sequence model feasible in practice, regions are retrieved according to approximate similarity with an efficient model decoding algorithm. Since the approximate approach retrieves regions with high recall, re-ranking these regions with the sequence model leads to highly accurate word spotting results. In addition, the method can be extended to textual queries, i.e., query-by-string, if annotated samples become available.
The method is evaluated on five benchmark datasets. In the segmentation-free query-by-example scenario where no annotated sample set is available, the method outperforms all other methods that have been evaluated on any of these five benchmarks. If only a small dataset of annotated samples is available, the performance in the query-by-string scenario is competitive with the state-of-the-art.

Title: Design of fault-tolerant virtual execution environments for cyber-physical systems
Authors: Jablkowski, Boguslaw
Abstract: The last decade revealed the vast economical and societal potential of Cyber-Physical Systems (CPS) which integrate computation with physical processes. In order to better exploit this potential, designers of CPS are trying to take advantage of novel technological opportunities provided by the unprecedented efficiency of today's hardware. There are, however, considerable challenges to this endeavor.
First, there is a strong trend towards softwarization. Functions that were originally implemented in hardware are now being increasingly realized in software. This fact, together with the ever-growing functionality of modern CPS, translates to unrestrained code generation which, in turn, directly influences their safety and security. Second, the spreading adoption of multi-core and many-core architectures, due to their considerable increase in computation power, additionally generates issues related to timing properties, resource partitioning, task mapping and scalability.
In order to overcome these challenges, this thesis investigates the idea of adopting virtualization technology to the domain of CPS. Several research questions originate from this idea, and the following work aims at answering those questions. It addresses both technological and methodological issues. With respect to the technological aspects, it investigates problems and proposes solutions related to timing properties of a virtualized execution platform as well as the high-availability technique based thereon. Regarding the methodological aspects, it discusses models and methods for the planning of safe and efficient virtualized CPS compute and control clusters, and proposes architectures for the development and verification of virtualized CPS applications as well as for the testing of non-functional characteristics of the underlying software and hardware infrastructure. Further, through a set of experiments, this thesis thoroughly evaluates the proposed solutions.
Finally, based upon the provided results and some new considerations regarding the requirements of future CPS applications, it gives an outlook towards a generic virtualized execution platform architecture for emerging CPS.

Title: Optimization and analysis for dependable application software on unreliable hardware platforms
Authors: Chen, Kuan-Hsun
Abstract: As chip technology keeps shrinking towards higher densities and lower operating voltages, memory and logic components are now vulnerable to electromagnetic interference and radiation, leading to transient faults in the underlying hardware, which may jeopardize the correctness of software execution and cause so-called soft errors. To mitigate the threats of soft errors, embedded-software developers have started to deploy Software-Implemented Hardware Fault Tolerance (SIHFT) techniques. However, the main cost is the significant amount of time due to the additional computation of the SIHFT techniques. To support safety-critical systems, e.g., computing systems in automotive and avionic devices, real-time system technology has been primarily used and widely studied. When considering hardware transient faults and SIHFT techniques with real-time system technology, novel scheduling approaches and schedulability analyses are desired to provide a less pessimistic off-line guarantee for timeliness, or at least to provide a certain degree of performance for new application models. Moreover, reliability optimizations also need to be designed thoughtfully while considering different resource constraints.
In this dissertation, we present three treatments for soft errors. Firstly, we study how to allow erroneous computations without deadline misses by modeling inherent safety margins and noise tolerance in control applications as (m, k) constraints. We further discuss how a given (m, k) requirement can be satisfied by individual error detection and flexible compensation while satisfying the given hard real-time constraints.
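Assuming the usual reading of an (m, k) constraint, that at least m out of any k consecutive jobs must produce correct results, a small sketch for checking a job history looks as follows; the dissertation's error detection and compensation scheduling are, of course, more involved.

```python
from collections import deque

def satisfies_mk(results, m: int, k: int) -> bool:
    """Check the (m, k) requirement: every window of k consecutive
    jobs must contain at least m correct results (truthy entries)."""
    window = deque(maxlen=k)
    for ok in results:
        window.append(ok)
        if len(window) == k and sum(window) < m:
            return False
    return True

# 1 = correct execution, 0 = soft error tolerated by compensation.
history = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
print(satisfies_mk(history, m=3, k=5))          # True: >= 3 correct per 5
print(satisfies_mk([1, 0, 0, 1, 0], m=3, k=5))  # False: only 2 correct
```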
Secondly, we analyze the probability of deadline misses and the deadline miss rate in soft real-time systems, which allow occasional deadline misses without erroneous computations. Thirdly, we consider how to deploy redundant multi-threading techniques to improve the system reliability under two different system models for multi-core systems: (1) under core-to-core frequency variations, we address the reliability-aware task-mapping problem; (2) we decide on redundancy levels for each task while satisfying the given real-time constraints and the limited number of redundant cores, even under multi-tasking. Finally, an enhancement for real-time operating systems is provided to maintain strict periodicity for task overruns due to potential transient faults, especially on one popular platform, the Real-Time Executive for Multiprocessor Systems (RTEMS).

Title: Learning attribute representations with deep convolutional neural networks for word spotting
Authors: Sudholt, Sebastian
Abstract: Understanding the contents of handwritten texts from document images has long been a traditional field of research in computer science. The ultimate goal is to automatically transcribe the text in the images into an electronic format. This would make the documents from which the images were generated much easier to access and would also allow for a fast extraction of information. Especially for historical documents, a possibility to easily sift through large document image collections would be of high interest. There exist vast amounts of manuscripts all over the world storing substantial amounts of yet untapped information on cultural heritage. Being able to extract this information for large and diverse corpora would allow historians unprecedented insight into various aspects of ancient human life. The desired goal is thus to obtain information on the text embedded in digital document images with no manual human interaction at all.
A well-known approach for achieving this is to make use of models known from the field of pattern recognition and machine learning in order to classify the text in the images into electronic representations of characters or words. This approach is known as Optical Character Recognition or text recognition and belongs to the oldest applications of pattern recognition and computer science in general. Despite its long history, handwritten text recognition is still considered an unsolved task, as classification systems are still not able to consistently achieve results as are common for machine-printed text recognition. This is especially true for historical documents, as the text to be recognized typically exhibits different amounts of degradation as well as large variability in handwriting for the same characters and words.
Depending on the task at hand, a full transcription of the text might, however, not be necessary. If a potential user is only interested in whether a certain word or text portion is present in a given document collection or not, retrieval-based approaches are able to produce more robust results than recognition-based ones. These retrieval-based approaches compare parts of the document images to a sought-after query and decide if the individual parts are similar to the query. For a given method, the result is then a list of parts of the document images which are deemed relevant by the method. In the field of document image analysis, this retrieval approach is known as keyword spotting or simply word spotting. Word spotting is the problem of interest in this thesis.
In particular, a method is presented which allows for using neural network models in order to approach different word spotting tasks. This method is inspired by a recent state-of-the-art approach which utilizes semantic attributes for word spotting. In pattern recognition and computer vision, semantic attributes describe characteristics of classes which may be shared between classes. This sharing ability enables attribute representations to encode which parts of different classes are common and which are not. For example, when classifying animals, the classes tiger and zebra may share an attribute striped. For word spotting, attributes have been used to encode the occurrence and position of certain characters. The success of any attribute-based method is, of course, highly dependent on the ability of a classifier to correctly predict the individual attributes.
In order to accomplish an accurate prediction of attributes for word spotting tasks, the use of Convolutional Neural Networks (CNNs) is proposed in this thesis. CNNs have recently attracted a substantial amount of research interest, as they are able to consistently achieve state-of-the-art results in virtually all fields of computer vision. Their main advantage compared to other methods is their ability to jointly optimize a classifier and the feature representations obtained from the images. This characteristic is known as end-to-end learning. While CNNs have been used extensively for classifying data into one of multiple classes for various tasks, predicting attributes with these neural networks has largely been done for face and fashion attributes only. For the method presented in this thesis, a CNN is trained to predict attribute representations extracted from word strings in an end-to-end fashion. These attributes are leveraged in order to perform word spotting.
The core contribution lies in the design and evaluation of different neural network architectures which are specifically designed to be applied to document images. A big part of this design is to determine suitable loss functions for the CNNs. Loss functions are a crucial ingredient in the training of neural networks in general and largely determine what kind of annotations the individual networks are able to learn for the given images. In particular, two loss functions are derived, which allow for learning binary attribute representations as well as real-valued representations that can be considered attribute-like. Besides the loss functions, the second major contribution is the design of three CNN architectures which are tailor-made for being applied to problems involving handwritten text as data. Using the loss functions and the three architectures, a number of experiments are conducted in which the neural networks are trained to predict the attribute or attribute-like representations Pyramidal Histogram of Characters (PHOC), Spatial Pyramid of Characters (SPOC) and Discrete Cosine Transform of Words (DCToW).
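As an illustration of such an attribute representation, the following sketch builds a simplified PHOC vector: for each pyramid level, the word is split into regions, and one bit per (region, character) pair records the character's presence. The original PHOC uses an interval-overlap rule and further levels; this version simply assigns characters by their centre position.

```python
import string

ALPHABET = string.ascii_lowercase + string.digits

def phoc(word: str, levels=(1, 2, 3)) -> list:
    """Simplified PHOC: at level l the word is split into l regions;
    a bit is set per (region, character) if the character's centre
    falls into that region."""
    word = word.lower()
    vec = []
    for l in levels:
        for region in range(l):
            lo, hi = region / l, (region + 1) / l
            chars = set()
            for i, ch in enumerate(word):
                centre = (i + 0.5) / len(word)
                if lo <= centre < hi:
                    chars.add(ch)
            vec.extend(1 if c in chars else 0 for c in ALPHABET)
    return vec

v = phoc("spotting")
print(len(v))  # 6 regions x 36 characters = 216 bits
```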
It is shown experimentally that the proposed approach of using neural networks for predicting attribute representations achieves state-of-the-art results on various word spotting benchmarks.

Title: Partially supervised learning of models for visual scene and object recognition
Authors: Grzeszick, René
Abstract: When creating a visual recognition system for a novel task, one of the main burdens is the collection and annotation of data. Often several thousand samples need to be manually reviewed and labeled so that the recognition system achieves the desired accuracy. The goal of this thesis is to provide methods that lower the annotation effort for visual scene and object recognition. These methods are applicable to traditional pattern recognition approaches as well as methods from the field of deep learning. The contributions are three-fold and range from feature augmentation, over semi-supervised learning for natural scene classification to zero-shot object recognition.
The contribution in the field of feature augmentation deals with handcrafted feature representations. A novel method for incorporating additional information at feature level has been introduced. This information is subsequently integrated in a Bag-of-Features representation. The additional information can, for example, be of spatial or temporal nature, encoding a local feature's position within a sample in its feature descriptor. The information is quantized and appended to the feature vector and thus also integrated in the unsupervised learning step of the Bag-of-Features representation. As a result more specific codebook entries are computed for different regions within the samples.
The results in the field of image classification for natural scenes and objects, as well as in the field of acoustic event detection, show that the proposed approach allows for learning compact feature representations without reducing the accuracy of the subsequent classification.
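A minimal numpy sketch of the feature-augmentation idea: quantized (x, y) positions are appended to each local descriptor before the codebook step, so the resulting Bag-of-Features histogram becomes location-aware. The toy codebook below (random subsampling) merely stands in for the unsupervised clustering.

```python
import numpy as np

def augment_with_position(descriptors, positions, bins=4):
    """Append a quantized (x, y) position to each local descriptor, so
    the later unsupervised codebook learning becomes location-aware."""
    quantized = np.floor(positions * bins) / bins   # positions in [0, 1)
    return np.hstack([descriptors, quantized])

def bag_of_features(descriptors, codebook):
    """Standard BoF step: histogram of nearest codebook entries."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    assignment = d2.argmin(axis=1)
    return np.bincount(assignment, minlength=len(codebook))

rng = np.random.default_rng(1)
desc = rng.random((100, 32))   # stand-in local features of one image
pos = rng.random((100, 2))     # their normalized image coordinates
aug = augment_with_position(desc, pos)
codebook = aug[rng.choice(100, 16, replace=False)]  # toy codebook
print(bag_of_features(aug, codebook))
```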
In the field of semi-supervised learning, a novel approach for learning annotations in large image collections of natural scene images has been proposed. The approach is based on the active learning principle and incorporates multiple views on the data. The views, i.e. different feature representations, are clustered independently of each other. A human in the loop is asked to label each data cluster. The clusters are then iteratively refined based on cluster evaluation measures and additional labels are assigned to the dataset. Ultimately, a voting over all views creates a partially labeled sample set that is used for training a classifier.
The results on natural scene images show that a powerful visual classifier can be learned with minimal annotation effort. The approach has been evaluated for traditional handcrafted features as well as for features derived from a convolutional neural network. For semi-supervised learning it is desirable to have a compact feature representation; for traditional features, those obtained by the proposed feature augmentation approach are a good example of such a representation. Especially for applications in the field of deep learning, which usually require large amounts of labeled samples for training or even for adapting a deep neural network, the semi-supervised learning approach is beneficial.
For the zero-shot object prediction, a method that combines visual and semantic information about natural scenes is proposed. A convolutional neural network is trained in order to distinguish different scene categories. Furthermore, the relations between scene categories and visual object classes are learned based on their semantic relation in large text corpora. The probability for a given image to show a certain scene is derived from the network and combined with the semantic relations based on a statistical approach. This allows for predicting the presence of certain object classes in an image without having any visual training sample from any of the object classes.
The results on a challenging dataset depicting various objects in natural scene images show that, especially in cluttered scenes, the semantic relations can be a powerful information cue. Furthermore, when post-processing the results of a visual object predictor, the detection accuracy can be improved at the minimal cost of providing additional scene labels.
When combining these contributions, it is shown that a scene classifier can be trained with minimal human effort and its predictions can still be leveraged for object prediction. Thus, information about natural scene images and the object classes within these images can be gained without having the burden to manually label tremendous amounts of images beforehand.

Title: Methods for efficient resource utilization in statistical machine learning algorithms
Authors: Kotthaus, Helena
Abstract: In recent years, statistical machine learning has emerged as a key technique for tackling problems that elude a classic algorithmic approach. One such problem, with a major impact on human life, is the analysis of complex biomedical data. Solving this problem in a fast and efficient manner is of major importance, as it enables, e.g., the prediction of the efficacy of different drugs for therapy selection. While achieving the highest possible prediction quality appears desirable, doing so is often simply infeasible due to resource constraints. Statistical learning algorithms for predicting the health status of a patient or for finding the best algorithm configuration for the prediction require an excessively high amount of resources. Furthermore, these algorithms are often implemented with no awareness of the underlying system architecture, which leads to sub-optimal resource utilization.
This thesis presents methods for efficient resource utilization of statistical learning applications. The goal is to reduce the resource demands of these algorithms to meet a given time budget while simultaneously preserving the prediction quality. As a first step, the resource consumption characteristics of learning algorithms are analyzed, as well as their scheduling on underlying parallel architectures, in order to develop optimizations that enable these algorithms to scale to larger problem sizes. For this purpose, new profiling mechanisms are incorporated into a holistic profiling framework.
The results show that one major contributor to the resource issues is memory consumption. To overcome this obstacle, a new optimization based on dynamic sharing of memory is developed that speeds up computation by several orders of magnitude in situations where available main memory is the bottleneck and memory is swapped out. One important application, which can be used for automated parameter tuning of learning algorithms, is model-based optimization. Within a huge search space, algorithm configurations are evaluated to find the configuration with the best prediction quality. An important step towards better managing this search space is to parallelize the search process itself.
However, a high runtime variance within the configuration space can cause inefficient resource utilization. For this purpose, new resource-aware scheduling strategies are developed that efficiently map evaluations of configurations to the parallel architecture, depending on their resource demands. In contrast to classical scheduling problems, the new scheduling interacts with the configuration proposal mechanism to select configurations with suitable resource demands. With these strategies, it becomes possible to make use of the full potential of parallel architectures.
Compared to established parallel execution models, the results show that the new approach enables model-based optimization to converge faster to the optimum within a given time budget.

Title: Efficient implementation of resource-constrained cyber-physical systems using multi-core parallelism
Authors: Neugebauer, Olaf
Abstract: The quest for more performance of applications and systems has become more challenging in recent years. Especially in the cyber-physical and mobile domains, the performance requirements have increased significantly. Applications previously found in the high-performance domain now emerge in the resource-constrained domain. Modern heterogeneous high-performance MPSoCs provide a solid foundation to satisfy the high demand. Such systems combine general processors with specialized accelerators ranging from GPUs to machine learning chips. On the other side of the performance spectrum, the demand for small energy-efficient systems exposed by modern IoT applications has increased vastly. Developing efficient software for such resource-constrained multi-core systems is an error-prone, time-consuming and challenging task. With PA4RES, this thesis provides a holistic semiautomatic approach to parallelize and implement applications for such platforms efficiently. Our solution supports the developer in finding good trade-offs to tackle the requirements exposed by modern applications and systems. With PICO, we propose a comprehensive approach to express parallelism in sequential applications. PICO detects data dependencies and implements the required synchronization automatically. Using a genetic algorithm, PICO optimizes the data synchronization. The evolutionary algorithm considers channel capacity, memory mapping, channel merging and the flexibility offered by the channel implementation with respect to execution time, energy consumption and memory footprint. PICO's communication optimization phase was able to generate a speedup of almost 2 or an energy improvement of 30% for certain benchmarks.
The PAMONO sensor approach enables fast detection of biological viruses using optical methods. With sophisticated virus detection software, real-time virus detection running on stationary computers was achieved.
Within this thesis, we were able to derive a soft real-time capable virus detection running on a high-performance embedded system, commonly found in today's smartphones. This was accomplished with a smart DSE algorithm which optimizes for execution time, energy consumption and detection quality. Compared to a baseline implementation, our solution achieved a speedup of 4.1 and 87% energy savings and satisfied the soft real-time requirements. Accepting a degradation of the detection quality, which is still usable in a medical context, led to a speedup of 11.1. This work provides the fundamentals for a truly mobile real-time virus detection solution. The growing demand for processing power can no longer be satisfied by well-known approaches such as higher frequencies. These so-called performance walls pose a serious challenge to the growing performance demand. Approximate computing is a promising approach to overcome, or at least shift, the performance walls by accepting a degradation in output quality to gain improvements in other objectives. Especially for the safe integration of approximation into existing applications, or during the development of new approximation techniques, a method to assess the impact on output quality is essential.
With QCAPES, we provide a multi-metric assessment framework to analyze the impact of approximation.
Furthermore, QCAPES provides useful insights into the impact of approximation on execution time and energy consumption. With ApproxPICO, we propose an extension to PICO to consider approximate computing during the parallelization of sequential applications.2018-01-01T00:00:00ZAcoustic sensor network geometry calibration and applicationsPlinge, Axelhttp://hdl.handle.net/2003/363432018-01-26T02:40:48Z2017-01-01T00:00:00ZTitle: Acoustic sensor network geometry calibration and applications
Authors: Plinge, Axel
Abstract: In the modern world, we are increasingly surrounded by computation devices with communication links and one or more microphones.
Such devices are, for example, smartphones, tablets, laptops or hearing aids. These devices can work together as nodes in an acoustic sensor network (ASN). Such networks are a growing platform that opens the possibility for many practical applications. ASN-based speech enhancement, source localization, and event detection can be applied for teleconferencing, camera control, automation, or assisted living. For these kinds of applications, the awareness of auditory objects and their spatial positioning are key properties. In order to provide these two kinds of information, novel methods have been developed in this thesis. Information on the type of auditory objects is provided by a novel real-time sound classification method. Information on the position of human speakers is provided by a novel localization and tracking method. In order to localize with respect to the ASN, the relative arrangement of the sensor nodes has to be known. Therefore, different novel geometry calibration methods were developed.
Sound classification
The first method addresses the task of identifying auditory objects. A novel application of the bag-of-features (BoF) paradigm to acoustic event classification and detection was introduced. It can be used for event and speech detection as well as for speaker identification.
The use of both mel frequency cepstral coefficient (MFCC) and Gammatone frequency cepstral coefficient (GFCC) features improves the classification accuracy. By using soft quantization and introducing supervised training for the BoF model, superior accuracy is achieved. The method generalizes well from limited training data. It works online and can be computed in a fraction of real time.
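A minimal sketch of the soft-quantization step of a BoF representation, assuming numpy and a precomputed codebook; the Gaussian kernel, codebook size and toy data are illustrative assumptions, not the thesis's trained BoF model:

import numpy as np

def soft_bof_histogram(frames, codebook, sigma=1.0):
    # Soft-quantize feature frames (e.g., MFCC/GFCC vectors) against a codebook:
    # each frame contributes to all codewords, weighted by a Gaussian kernel on
    # the Euclidean distance, instead of voting only for its nearest codeword.
    hist = np.zeros(len(codebook))
    for f in frames:
        d2 = np.sum((codebook - f) ** 2, axis=1)   # squared distances to all codewords
        w = np.exp(-d2 / (2.0 * sigma ** 2))       # soft assignment weights
        hist += w / max(w.sum(), 1e-12)            # normalized contribution of this frame
    return hist / max(hist.sum(), 1e-12)           # normalized BoF histogram

# Toy usage: 100 random 13-dimensional "MFCC" frames against a 32-word codebook.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(32, 13))
frames = rng.normal(size=(100, 13))
print(soft_bof_histogram(frames, codebook).shape)  # (32,)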
By a dedicated training strategy based on a hierarchy of stationarity, the detection of speech in mixtures with noise was realized. This makes the method robust against severe noise levels corrupting the speech signal. Thus it is possible to provide control information to a beamformer in order to realize blind speech enhancement. A reliable improvement is achieved in the presence of one or more stationary noise sources.
Speaker localization
The localization method enables each node to determine the direction of arrival (DoA) of concurrent sound sources. The author's neuro-biologically inspired speaker localization method for microphone arrays was refined for use in ASNs. By implementing a dedicated cochlear and midbrain model, it is robust against the reverberation found in indoor rooms. In order to better model the unknown number of concurrent speakers, an application of the EM algorithm that realizes probabilistic clustering according to auditory scene analysis (ASA) principles was introduced.
Based on this approach, a system for Euclidean tracking in ASNs was designed. Each node applies the node-wise localization method and shares probabilistic DoA estimates together with an estimate of the spectral distribution with the network. As this information is relatively sparse, it can be transmitted with low bandwidth. The system is robust against jitter and transmission errors. The information from all nodes is integrated according to spectral similarity to correctly associate concurrent speakers. By incorporating the intersection angle in the triangulation, the precision of the Euclidean localization is improved. Tracks of concurrent speakers are computed over time, as is shown with recordings in a reverberant room.
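The benefit of incorporating the intersection angle can be made concrete with a minimal two-node triangulation sketch; the positions, angles and the sine-based confidence weight are illustrative assumptions, not the thesis's exact formulation:

import numpy as np

def triangulate(p1, theta1, p2, theta2):
    # Intersect two DoA bearing lines from nodes at 2-D positions p1, p2 with
    # absolute bearing angles theta1, theta2 (radians). Near-parallel bearings
    # make the system ill-conditioned, which the intersection-angle weight reflects.
    d1 = np.array([np.cos(theta1), np.sin(theta1)])
    d2 = np.array([np.cos(theta2), np.sin(theta2)])
    A = np.column_stack([d1, -d2])
    t = np.linalg.solve(A, np.asarray(p2, float) - np.asarray(p1, float))  # line parameters
    source = np.asarray(p1, float) + t[0] * d1
    weight = abs(np.sin(theta1 - theta2))  # 1.0 at right angles, 0.0 for parallel bearings
    return source, weight

src, w = triangulate((0, 0), np.pi / 4, (4, 0), 3 * np.pi / 4)
print(src, w)  # approx. (2, 2) with weight 1.0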
Geometry calibration
The central task of geometry calibration has been solved with special focus on sensor nodes equipped with multiple microphones. Novel methods were developed for different scenarios. An audio-visual method was introduced for the calibration of ASNs in video conferencing scenarios. The DoA estimates are fused with visual speaker tracking in order to provide sensor positions in a common coordinate system.
A novel acoustic calibration method determines the relative positioning of the nodes from ambient sounds alone. Unlike previous methods that only infer the positioning of distributed microphones, the DoA is incorporated, and thus it becomes possible to calibrate the orientation of the nodes with high accuracy. This is very important for all applications using the spatial information, as the triangulation error increases dramatically with bad orientation estimates. Since speech events can be used, calibration becomes possible without the requirement of playing dedicated calibration sounds.
Based on this, an online method employing a genetic algorithm with incremental measurements was introduced. By using the robust speech localization method, the calibration is computed in parallel to the tracking. The online method is able to calibrate ASNs in real time, as is shown with recordings of natural speakers in a reverberant room.
The informed acoustic sensor network
All new methods are important building blocks for the use of ASNs. The online methods for localization and calibration both make use of the neuro-biologically inspired processing in the nodes, which leads to state-of-the-art results, even in reverberant enclosures. The high robustness and reliability can be improved even further by including the event detection method in order to exclude non-speech events. When all methods are combined, both semantic information on what is happening in the acoustic scene and spatial information on the positioning of the speakers and sensor nodes are automatically acquired in real time. This realizes truly informed audio processing in ASNs. Practical applicability is shown by application to recordings in reverberant rooms. The contribution of this thesis is thus not only to advance the state of the art in automatically acquiring information on the acoustic scene, but also to push the practical applicability of such methods.2017-01-01T00:00:00ZCo-Konfiguration von Hardware- und Systemsoftware-ProduktlinienMeier, Matthiashttp://hdl.handle.net/2003/360062017-06-24T02:00:17Z2017-01-01T00:00:00ZTitle: Co-Konfiguration von Hardware- und Systemsoftware-Produktlinien
Authors: Meier, Matthias
Abstract: Hardware architectures in the context of embedded systems are becoming ever more complex and will increasingly move towards multi- or many-core systems. So that these systems can deliver their full performance for the often highly specialized tasks found in embedded systems, entire branches of research are concerned with tailoring such systems to specific applications. The popularity of hardware description languages contributes its share to this development. However, even with hardware description languages and the higher abstraction level they provide, developing such systems remains laborious and error-prone.
The use of hardware description languages, on the other hand, blurs the line between hardware and software, since hardware can now be described in textual form, much like software. This opens up opportunities for transferring concepts from software development to hardware development. One concept for coping with the growing complexity of software development is the organized reuse of components as practiced in product-line engineering. To what extent product-line concepts can be transferred to hardware architectures, and how hardware product lines can be designed, is examined in detail in this work. The advantages of product-line techniques, such as the reuse of proven and reliable components, could then also be exploited for hardware architectures in order to reduce development complexity and to develop application-specific hardware architectures with considerably less effort. In addition, the shared code base of a product line enables a faster time to market at lower development cost.
Building on these new concepts, this work also addresses the question of how such parallel systems can be programmed and automatically optimized in the future, supporting the developer with an automated tool chain reaching from the application through the system software down to the hardware. The focus lies on the techniques designed in this work for the end-to-end configuration of hardware and system software. These techniques essentially rely on the programming interfaces between the layers, whose access patterns can be analyzed statically. The configuration information obtained in this way can then be used to automatically tailor the system-software and hardware product lines to a specific application scenario.
The application-specific optimization of the systems is carried out in this work by means of a design-space exploration. The focus of this design-space exploration, however, is not solely on the hardware architecture; it covers the software level as well. Besides tailoring the system software, the application built on top of a parallel programming interface is also scaled automatically within the design-space exploration in order to exploit the performance of many-core systems.2017-01-01T00:00:00ZMemory-aware platform description and framework for source-level embedded MPSoC software optimizationPyka, Roberthttp://hdl.handle.net/2003/360032017-06-24T02:00:08Z2017-01-01T00:00:00ZTitle: Memory-aware platform description and framework for source-level embedded MPSoC software optimization
Authors: Pyka, Robert
Abstract: Developing optimizing source-level transformations consists of numerous non-trivial subtasks. Besides identifying actual optimization goals within a particular target-platform and compiler setup, the actual implementation is a tedious, error-prone and often recurring work. Providing appropriate support for this development work is a challenging task. Defining and implementing a well-suited target-platform description which can be used by a wide set of optimization techniques while being precise and easy to maintain is one dimension of this challenge. Another dimension, which has also been tackled in this work, deals with the provision of an infrastructure for optimization-step representation, interaction and data retention. Finally, an appropriate source-code representation has been integrated into this approach. These contributions are tightly related to each other; they have been bundled into the MACCv2 framework, a full-fledged optimization-technique implementation and integration approach. Together, they significantly reduce the effort required for the implementation of source-level memory-aware optimization techniques for Multi Processor Systems on a Chip (MPSoCs).
The system-modeling approach presented in this dissertation has been located at the processor-memory-switch (PMS) abstraction level. It offers a novel combined structural and semantical description. It combines a locally-scoped, structural modeling approach, as preferred by system designers, and a fast, database-like interface, best suited for optimization technique developers. It supports model refinement and requires only limited effort for an initial abstract system model.
The general structure consists of components and channels. Based on this structure, the system model provides mechanisms for database-like access to system-global target-platform properties, while requiring only definition of locally-scoped input data annotated to system-model items. A typical set of these properties contains energy-consumption and access-latency values. The request-based retrieval of system properties is a unique feature, which makes this approach superior to state-of-the-art table-lookup-based or full-system-simulation-based approaches.
Combining such component-local properties into system-global target-platform data is performed via aspect handlers. These handlers define computational rules which are applied to correlated locally-scoped data along access paths in the memory-subsystem hierarchy. This approach is capable of calculating these system-global values at a rate similar to plain table lookups, while maintaining a precision close to full-system-simulation-based estimations. This has been shown for both energy-consumption and access-latency values of the MPARM platform.
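The request-based property retrieval can be pictured with a minimal sketch; Component, HANDLERS and query are invented names for illustration, not the MACCv2 API, and the annotation values are assumptions:

class Component:
    def __init__(self, name, latency_cycles, energy_nj):
        self.name = name
        self.latency_cycles = latency_cycles  # locally annotated access latency
        self.energy_nj = energy_nj            # locally annotated access energy

# Illustrative aspect handlers: rules that fold local annotations along an access path.
HANDLERS = {
    "latency": lambda path: sum(c.latency_cycles for c in path),
    "energy":  lambda path: sum(c.energy_nj for c in path),
}

def query(property_name, access_path):
    # Database-like lookup of a system-global property for one access path
    # through the memory hierarchy (e.g., CPU -> bus -> SRAM).
    return HANDLERS[property_name](access_path)

cpu = Component("cpu", 1, 0.1)
bus = Component("bus", 2, 0.4)
sram = Component("sram", 1, 0.5)
print(query("latency", [cpu, bus, sram]))  # 4 cycles for this path
print(query("energy",  [cpu, bus, sram]))  # 1.0 nJ for this path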
The MACCv2 framework provides a set of fundamental services to the optimization technique developer. On top of these services, a system model and source-code representation are provided. Further, framework-based optimization-technique implementations are encapsulated into self-contained entities exposing well-defined interfaces.
This framework has been successfully used within the European Commission funded MNEMEE project. The hierarchical processing-step representation in MACCv2 allows for the encapsulation of tasks at various granularity levels. For simplified reuse in future projects, the entire toolchain as well as individual optimization techniques have been represented as processing-step entities in terms of MACCv2. A common notion of target-platform structure and properties, as well as inter-processing-step communication, is achieved via framework-provided services.
The system-modeling approach and the framework show the right set of properties needed to support the development of memory-aware optimization techniques. The MNEMEE project, continued research work, teaching activities and PhD theses have successfully built on the approaches and the framework proposed in this dissertation.2017-01-01T00:00:00ZScheduling algorithms and timing analysis for hard real-time systemsHuang, Wen-Hung Kevinhttp://hdl.handle.net/2003/359842017-06-09T02:00:08Z2017-01-01T00:00:00ZTitle: Scheduling algorithms and timing analysis for hard real-time systems
Authors: Huang, Wen-Hung Kevin
Abstract: Real-time systems are designed for applications in which response time is critical. As timing is a major property of such systems, proving timing correctness is of utmost importance. To achieve this, a two-fold approach of timing analysis is traditionally involved: (i) worst-case execution time (WCET) analysis, which computes an upper bound on the execution time of a single job of a task running in isolation; and (ii) schedulability analysis using the WCET as input, which determines whether multiple tasks are guaranteed to meet their deadlines. Formal models used for representing recurrent real-time tasks have traditionally been characterized by a collection of independent jobs that are released periodically. However, such modeling may result in resource under-utilization in systems whose behaviors are not entirely periodic or independent. Examples are (i) multicore platforms where tasks share a communication fabric, like a bus, for accesses to a shared memory besides processors; (ii) tasks with synchronization, where no two concurrent accesses to one shared resource are allowed to be in their critical sections at the same time; and (iii) automotive systems, where tasks are linked to rotation (e.g., of the crankshaft, gears, or wheels) and their activation rate is proportional to the angular velocity of a specific device. This dissertation presents multiple approaches towards designing scheduling algorithms and schedulability analyses for a variety of real-time systems with different characteristics. Specifically, we look at those design problems from the perspective of the speedup factor, a metric that quantifies both the pessimism of the analysis and the non-optimality of the scheduling algorithm. The proposed solutions are shown to be promising not only in terms of speedup factor but also through extensive evaluations.2017-01-01T00:00:00ZAspect-oriented technology for dependable operating systemsBorchert, Christophhttp://hdl.handle.net/2003/359752017-05-27T02:00:11Z2017-01-01T00:00:00ZTitle: Aspect-oriented technology for dependable operating systems
Authors: Borchert, Christoph
Abstract: Modern computer devices exhibit transient hardware faults that disturb the electrical behavior but do not cause permanent physical damage to the devices. Transient faults are caused by a multitude of sources, such as fluctuation of the supply voltage, electromagnetic interference, and radiation from the natural environment. Therefore, dependable computer systems must incorporate methods of fault tolerance to cope with transient faults. Software-implemented fault tolerance represents a promising approach that does not need expensive hardware redundancy for reducing the probability of failure to an acceptable level.
This thesis focuses on software-implemented fault tolerance for operating systems because they are the most critical pieces of software in a computer system: all computer programs depend on the integrity of the operating system. However, the C/C++ source code of common operating systems tends to be exceedingly complex already, so that a manual extension by fault tolerance is not a viable solution. Thus, this thesis proposes a generic solution based on Aspect-Oriented Programming (AOP).
To evaluate AOP as a means to improve the dependability of operating systems, this thesis presents the design and implementation of a library of aspect-oriented fault-tolerance mechanisms. These mechanisms constitute separate program modules that can be integrated automatically into common off-the-shelf operating systems using a compiler for the AOP language. Thus, the aspect-oriented approach facilitates improving the dependability of large-scale software systems without affecting the maintainability of the source code. The library allows choosing between several error-detection and error-correction schemes, and provides wait-free synchronization for handling asynchronous and multi-threaded operating-system code.
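The weaving idea behind such aspect-oriented fault-tolerance mechanisms can be illustrated outside of C++; the following Python decorator applies triple-modular redundancy to a pure function as an analogy only, since the dissertation's mechanisms are AspectC++ modules woven into operating-system code:

import functools

def triple_modular(fn):
    # Aspect-like wrapper: execute a (pure) function three times and take a
    # majority vote, masking a single transient fault. A decorator is Python's
    # closest analogue to weaving a cross-cutting concern into existing code.
    @functools.wraps(fn)
    def wrapped(*args, **kwargs):
        results = [fn(*args, **kwargs) for _ in range(3)]
        for r in results:
            if results.count(r) >= 2:
                return r  # the majority value wins
        raise RuntimeError("uncorrectable: all three runs disagree")
    return wrapped

@triple_modular
def checksum(data):
    return sum(data) & 0xFFFF

print(checksum([1, 2, 3]))  # 6, computed redundantly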
This thesis evaluates the aspect-oriented approach to fault tolerance on the basis of two off-the-shelf operating systems. Furthermore, the evaluation also considers one user-level program for protection, as the library of fault-tolerance mechanisms is highly generic and transparent and, thus, not limited to operating systems. Exhaustive fault-injection experiments show an excellent trade-off between runtime overhead and fault tolerance, which can be adjusted and optimized by fine-grained selective placement of the fault-tolerance mechanisms. Finally, this thesis provides evidence for the effectiveness of the approach in detecting and correcting radiation-induced hardware faults: High-energy particle radiation experiments confirm improvements in fault tolerance by almost 80 percent.2017-01-01T00:00:00ZMemory-aware mapping strategies for heterogeneous MPSoC systemsHolzkamp, Oliverahttp://hdl.handle.net/2003/359582017-05-09T02:00:11Z2017-01-01T00:00:00ZTitle: Memory-aware mapping strategies for heterogeneous MPSoC systems
Authors: Holzkamp, Olivera
Abstract: Embedded systems, such as mobile phones, integrate more and more features, e.g. multiple cameras, GPS sensors and many other sensors and actuators. These kinds of embedded systems deal with increasing complexity due to demands on performance and constraints on energy consumption. The performance of such systems can be increased by executing application tasks in parallel. To achieve this, multiprocessor systems-on-chip (MPSoC) devices were introduced. On the other hand, the energy consumption of these systems has to be decreased, especially for battery-driven embedded systems. A reduction in energy consumption can be achieved by efficiently utilizing the hardware resources on these devices. MPSoC devices can be either homogeneous or heterogeneous. Homogeneous MPSoC devices usually contain the same type of processors with the same speed, i.e. clock frequency, and the same type and size of memories for each processor. In heterogeneous MPSoC devices, the processor types and/or clock frequencies and memory types and/or sizes may vary.
During the last decade, research has dealt with optimizations for the efficient utilization of hardware resources on MPSoCs. Central issues are the extraction of parallelism from sequential code and the efficient mapping of the parallelized application tasks onto the processors of the system. A few frameworks have been developed which distribute parallelized application tasks to the available processors while optimizing for one or more objectives such as performance and energy consumption. They usually integrate all required foregoing steps, such as the extraction of parallelized tasks from sequential code and the extraction of a task graph as input for the mapping optimization. These steps are performed either manually or in an automated way. These kinds of frameworks help the embedded system designer to significantly reduce design time. Unfortunately, the influence of memories or memory hierarchies is neglected in mapping optimizations, even though it is a well-known fact that memories have a drastic impact on the runtime and energy consumption of the system.
This dissertation investigates the effect of memory hierarchies in MPSoC mapping. Since a thread-based application model is used, a thread graph extraction tool is introduced. Furthermore, two approaches for memory-aware mapping optimization for homogeneous and heterogeneous embedded MPSoC devices are presented. The thread graph extraction tool extracts a flat thread graph with important annotations for software requirements, hardware performance and energy consumption. This thread graph represents all required input information for the subsequent memory-aware mapping optimizations. Depending on the complexity of the application, the designer can choose between a fine-grained and a coarse-grained thread graph and thus influence the overall design time.
The first presented memory-aware mapping approach handles single-objective optimizations, which reduce either the runtime or the energy consumption of the system. The second presented memory-aware mapping approach handles a multi-objective optimization, which reduces both runtime and energy consumption. All approaches additionally reduce the work of the embedded system designer and thus the design time. They work in a fully automated way and are integrated within the MACCv2/MNEMEE tool flow. The MNEMEE tool flow also provides all required foregoing steps, such as the parallelization of sequential application code. The presented evaluations show that considering memory mapping during MPSoC mapping optimization significantly reduces the application runtime and energy consumption. The single-objective optimizations achieve an average reduction in runtime of about 21% and an average reduction in energy consumption of about 28%. The multi-objective memory-aware mapping optimization achieves an average reduction in runtime of about 21% and an average reduction in energy consumption of about 26%. Both presented optimization approaches were validated for homogeneous and heterogeneous MPSoC devices. The results clearly show that neglecting the memory subsystem can lead to wasted optimization potential.2017-01-01T00:00:00ZModeling and training options for handwritten Arabic text recognitionAhmad, Irfanhttp://hdl.handle.net/2003/358992017-03-25T03:00:11Z2016-01-01T00:00:00ZTitle: Modeling and training options for handwritten Arabic text recognition
Authors: Ahmad, Irfan2016-01-01T00:00:00ZDie Detektion interessanter Objekte unter Verwendung eines objektbasierten AufmerksamkeitsmodellsNaße, Fabianhttp://hdl.handle.net/2003/357832017-02-09T03:00:07Z2016-01-01T00:00:00ZTitle: Die Detektion interessanter Objekte unter Verwendung eines objektbasierten Aufmerksamkeitsmodells
Authors: Naße, Fabian
Abstract: The human visual system is able to effortlessly cope with complex tasks such as recognizing objects and persons. Computer vision denotes a field of research centered on the question of how comparable capabilities can be achieved in technical systems. In this dissertation, the principle of visual attention, an important aspect of the human visual system, is considered in this regard. It states that conscious perception is preceded by an unconscious process through which attention is selectively directed to potentially important or interesting visual content. This is a strategy of efficient information processing that allows fast reactions to relevant content. In this context, the notion of visual saliency denotes the property of visual content to stand out from its surroundings and therefore to attract attention. In general, such content has a comparatively high probability of being of interest to the observing individual. The subject of this work is attention-based object detection. The topic is motivated as an alternative to knowledge-based object detection methods, in which classification models are trained using annotated example images. Such methods generally involve a high manual preparation effort, exhibit high complexity, and scale poorly with the number of object categories considered. The central question of this work is therefore whether saliency can be used as a criterion for a more efficient localization of objects in images. Building on the thesis that it is precisely the interesting objects of a scene that are visually salient, an attention-based approach is intended to enable a fast and low-effort detection of such objects. This work first explains important foundations from the fields of pattern recognition, machine learning, and image processing. Subsequently, classical strategies for localizing objects in images are presented, and the advantages and disadvantages of different localization strategies are considered with respect to the attention-based approach. After that, fundamental concepts as well as influential theories and models of human visual attention are presented, followed by a review of mathematical attention models from the literature. Building on this, an attention model of our own is proposed that determines object proposals and ranks them by their saliency. For the sake of generic applicability, a purely data-driven approach is favored that deliberately refrains from using problem-specific prior knowledge. The method is finally evaluated on a challenging benchmark, where comparisons with other models from the literature highlight the advantages of the proposed methods.
Furthermore, the discussion of the results shows that saliency is an important criterion for the generic localization of objects in complex images.2016-01-01T00:00:00ZLampung handwritten character recognitionJunaidi, Akmalhttp://hdl.handle.net/2003/353212017-04-28T08:11:54Z2016-01-01T00:00:00ZTitle: Lampung handwritten character recognition
Authors: Junaidi, Akmal
Abstract: The Lampung script is a local script from Lampung province, Indonesia. It is a non-cursive script written from left to right and consists of 20 characters. It also has 7 unique diacritics that can be placed on top of, below, or to the right of a character. Taking these positions into account, the number of diacritics grows to 12. This research is devoted to recognizing Lampung characters along with their diacritics. It aims to attract more attention to this script, especially from Indonesian researchers, and is also an endeavor to preserve the script from extinction. Recognition is carried out by a multi-step processing system, the so-called Lampung handwritten character recognition framework. It starts with the preprocessing of a document image given as input; in this stage, characters and diacritics are separated. Characters are classified by a multistage scheme: the first stage classifies 18 character classes, and the second stage classifies special characters which consist of two components, so that the number of classes after the second stage becomes 20. Diacritics are classified into 7 classes. These diacritics then have to be associated with the characters to form compound characters. The association is performed in two steps. First, each diacritic detects the characters nearby, and the character with the closest distance to that diacritic is selected as its association; this is repeated until every diacritic has its character. Second, since every diacritic now has a one-to-one association with a character, the pivot element is switched to the character: each character collects all its diacritics, composing the compound characters.
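A minimal sketch of this two-step association, with (x, y) centroids, diacritic names and distances all being illustrative assumptions rather than the framework's actual data structures:

def associate(characters, diacritics):
    # Step-1/step-2 association as described above: each diacritic first picks
    # the closest character; then the pivot switches to characters, and each
    # character collects all diacritics assigned to it, forming compounds.
    def dist(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    # Step 1: one host character per diacritic (closest centroid wins).
    choice = {d: min(characters, key=lambda c: dist(characters[c], diacritics[d]))
              for d in diacritics}

    # Step 2: pivot on characters; each collects its diacritics.
    compounds = {c: [] for c in characters}
    for d, c in choice.items():
        compounds[c].append(d)
    return compounds

chars = {"ka": (10, 10), "nga": (30, 10)}   # character centroids (toy values)
marks = {"ulan": (11, 2), "bicek": (29, 18)}  # diacritic centroids (toy values)
print(associate(chars, marks))  # {'ka': ['ulan'], 'nga': ['bicek']}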
This framework has been evaluated on a Lampung dataset created and annotated during this work, which is hosted at the Department of Computer Science, TU Dortmund, Germany. The proposed framework achieved an 80.64% recognition rate on this data.2016-01-01T00:00:00ZEfficient fault-injection-based assessment of software-implemented hardware fault toleranceSchirmeier, Horst Benjaminhttp://hdl.handle.net/2003/351752016-08-12T08:42:15Z2016-01-01T00:00:00ZTitle: Efficient fault-injection-based assessment of software-implemented hardware fault tolerance
Authors: Schirmeier, Horst Benjamin
Abstract: With continuously shrinking semiconductor structure sizes and lower supply voltages, the per-device susceptibility to transient and permanent hardware faults is on the rise. A class of countermeasures with growing popularity is Software-Implemented Hardware Fault Tolerance (SIHFT), which avoids expensive hardware mechanisms and can be applied application-specifically. However, SIHFT can, against intuition, cause more harm than good, because its overhead in execution time and memory space also increases the figurative "attack surface" of the system; it turns out that application-specific configuration of SIHFT is in fact a necessity rather than just an advantage. Consequently, target programs need to be analyzed for particularly critical spots to harden. SIHFT-hardened programs need to be measured and compared throughout all development phases of the program to observe reliability improvements or deteriorations over time. Additionally, SIHFT implementations need to be tested.
The contributions of this dissertation focus on Fault Injection (FI) as an assessment technique satisfying all these requirements: analysis, measurement and comparison, and test. I describe the design and implementation of an FI tool, named Fail*, that overcomes several shortcomings in the state of the art, and enables research on the general drawbacks of simulation-based FI. As demonstrated in four case studies in the context of SIHFT research, Fail* provides novel fine-grained analysis techniques that exploit the newly gained possibility to analyze FI results from complete fault-space exploration. These analysis techniques aid SIHFT design decisions on the level of program modules, functions, variables, source-code lines, or single machine instructions. Based on the experience from the case studies, I address the problem of the large computation efforts that accompany exhaustive fault-space exploration from two different angles: Firstly, I develop a heuristic fault-space pruning technique that allows freely trading the total FI-experiment count for result accuracy, while still providing information on all possible fault-space coordinates. Secondly, I speed up individual TAP-based FI experiments by improving the fast-forwarding operation by several orders of magnitude for most workloads.
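The count-versus-accuracy trade-off can be pictured with a minimal sampling sketch; this is illustrative only and is not Fail*'s actual pruning heuristic, and the fault space, experiment predicate and budget are assumptions:

import random

def prune_and_estimate(fault_space, experiment, budget):
    # Instead of injecting at every (cycle, bit) coordinate, spend a fixed
    # experiment budget on a random subset and extrapolate the failure count
    # to the whole space. Accuracy degrades gracefully as the budget shrinks,
    # yet an estimate exists for the full fault space.
    sample = random.sample(fault_space, min(budget, len(fault_space)))
    failures = sum(1 for coord in sample if experiment(coord))
    return failures / len(sample) * len(fault_space)  # extrapolated failure count

# Toy fault space: 10,000 (cycle, bit) coordinates; a fault "fails" if it hits
# a hypothetical critical window.
space = [(t, b) for t in range(1000) for b in range(10)]
est = prune_and_estimate(space, lambda c: c[0] < 100 and c[1] < 3, budget=500)
print(round(est))  # close to the true count of 300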
Finally, I dissect current practices in FI-based evaluation of SIHFT-hardened programs, identify three widespread pitfalls in the result interpretation, and advance the state of the art by defining a novel comparison metric.2016-01-01T00:00:00ZTight integration of cache, path and task-interference modeling for the analysis of hard real time systemsKleinsorge, Jan C.http://hdl.handle.net/2003/343322015-11-16T07:32:34Z2015-01-01T00:00:00ZTitle: Tight integration of cache, path and task-interference modeling for the analysis of hard real time systems
Authors: Kleinsorge, Jan C.
Abstract: Traditional timing analysis for hard real-time systems is a two-step approach consisting of isolated per-task timing analysis and subsequent scheduling analysis, which is conceptually entirely separate and based only on execution time bounds of whole tasks. Today this model is outdated, as it relies on technical assumptions that are no longer feasible on modern processor architectures. The key limiting factor in this traditional model is the interfacing from the micro-architectural analysis of individual tasks to scheduling analysis; in particular, path analysis as the binding step between the two is a major obstacle. In this thesis, we contribute to traditional techniques that overcome this problem by bypassing path analysis entirely, and propose a general path analysis and several derivatives of it to support improved interfacing. Specifically, we discuss, on the basis of a precise cache analysis, how existing metrics to bound cache-related preemption delay (CRPD) can be derived from the cache representation without separate analyses, and suggest optimizations to further reduce analysis complexity and to increase accuracy. In addition, we propose two new estimation methods for CRPD based on the explicit elimination of infeasible task-interference scenarios. The first one is conventional in that path analysis is ignored; the second one specifically relies on it.
We formally define a general path analysis framework in accordance with the principles of program analysis, as opposed to most existing approaches that differ conceptually and therefore either increase complexity or entail an inherent loss of information, and propose solutions for several problems specific to timing analysis in this context. First, we suggest new and efficient methods for loop identification. Based on this, we show how path analysis itself is applied to the traditional problem of per-task worst-case execution time bounds, define its generalization to sub-tasks, discuss several optimizations and present an efficient reference algorithm. We further propose analyses to solve related problems in this domain, such as the estimation of bounds on best-case execution times, latest execution times, maximum blocking times and execution frequencies. Finally, we demonstrate the utility of this additional information in scheduling analysis by proposing a new CRPD bound.2015-01-01T00:00:00ZFlexible error handling for embedded real time systemsHeinig, Andreashttp://hdl.handle.net/2003/340982015-08-13T01:43:42Z2015-01-01T00:00:00ZTitle: Flexible error handling for embedded real time systems
Authors: Heinig, Andreas
Abstract: Due to advances in semiconductor fabrication that lead to shrinking geometries and lowered supply voltages of semiconductor devices, transient fault rates will increase significantly for future semiconductor generations [Int13]. To cope with transient faults, error detection and correction is mandatory. However, additional resources are required for their implementation. This is a serious problem in embedded systems development, since embedded systems possess only a limited number of resources, like processing time, memory, and energy. To cope with this problem, a software-based flexible error handling approach is proposed in this dissertation. The goal of flexible error handling is to decide if, how, and when errors have to be corrected. By applying this approach, deadline misses are reduced by up to 97% for the considered video decoding benchmark. Furthermore, it is shown that the approach is able to cope with very high error rates of nearly 50 errors per second.2015-01-01T00:00:00ZCache-Kohärenz in hart echtzeitfähigen Mehrkern-ProzessorenPyka, Arthurhttp://hdl.handle.net/2003/340972015-08-12T20:19:31Z2015-01-01T00:00:00ZTitle: Cache-Kohärenz in hart echtzeitfähigen Mehrkern-Prozessoren
Authors: Pyka, Arthur
Abstract: In the field of real-time systems, multi-core processors are increasingly coming into focus. Real-time systems place special demands on the system architecture employed: besides logical correctness, temporally predictable execution is crucial. Cache memories play a special role in this respect. On the one hand, they are necessary to guarantee fast accesses to instructions and data; on the other hand, they impair the temporal predictability of execution. When accessing shared data in multi-core processors, a cache coherence protocol is additionally required. Common coherence protocols cannot sufficiently satisfy the demands on performance and real-time capability: the coherence operations employed in hardware-based coherence protocols make a precise WCET estimation infeasible. The On-Demand Coherent Cache (ODC2) is a cache coherence protocol that was developed with its use in real-time systems in mind. It dispenses with the mutual interference of cache memories through coherence operations and thereby achieves sufficient temporal predictability of accesses to shared data. The ODC2 approach aims at the most efficient possible use of the cache memory. Compared to common software-based approaches, it enables significantly higher (worst-case) performance.2015-01-01T00:00:00ZWCET analysis and optimization for multi-core real-time systemsKelter, Timonhttp://hdl.handle.net/2003/339922015-08-13T01:42:19Z2015-01-01T00:00:00ZTitle: WCET analysis and optimization for multi-core real-time systems
Authors: Kelter, Timon2015-01-01T00:00:00ZAutomatic parallelization for embedded multi-core systems using high level cost modelsCordes, Daniel Alexanderhttp://hdl.handle.net/2003/317962015-08-12T23:15:05Z2013-12-20T00:00:00ZTitle: Automatic parallelization for embedded multi-core systems using high level cost models
Authors: Cordes, Daniel Alexander
Abstract: Nowadays, embedded and cyber-physical systems are utilized in nearly all operational areas in order to support and enrich people's everyday lives. To cope with the demands imposed by modern embedded systems, the employment of MPSoC devices is often the most profitable solution. However, many embedded applications are still written in a sequential way. In order to benefit from the multiple cores available on those devices, the application code has to be divided into concurrently executed tasks. Since performing this partitioning manually is an error-prone and time-consuming job, many automatic parallelization approaches have been developed in the past. Most of these existing approaches were developed in the context of high-performance and desktop computers, so their applicability to embedded devices is limited. Many new challenges arise if applications are to be ported to embedded MPSoCs in an efficient way. Therefore, novel parallelization techniques were developed in the context of this thesis that are tailored towards the special requirements of embedded multi-core devices.
All approaches presented in this thesis are based on sophisticated parallelization techniques employing high-level cost models to estimate the benefit of parallel execution. This enables the creation of well-balanced tasks, which is essential if applications are to be parallelized efficiently. In addition, several other requirements of embedded devices are covered, like the consideration of multiple objectives simultaneously. As a result, beneficial trade-offs between several objectives, such as energy consumption and execution time, can be found, enabling the extraction of solutions which are highly optimized for a specific application scenario.
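The role of such high-level cost models can be illustrated with a minimal sketch; the cycle constants and the linear communication term are assumptions chosen for illustration, not the thesis's calibrated models:

def parallel_benefit(seq_cycles, iterations, cores, comm_cycles_per_task):
    # Parallel execution only pays off if the per-task communication and
    # synchronization overhead does not eat the gain. Returns estimated cycles
    # for sequential vs. parallel execution of a loop-like workload.
    sequential = seq_cycles * iterations
    per_core = (iterations + cores - 1) // cores          # iterations per core
    parallel = per_core * seq_cycles + cores * comm_cycles_per_task
    return sequential, parallel

seq, par = parallel_benefit(seq_cycles=50, iterations=1000, cores=4,
                            comm_cycles_per_task=2000)
print(seq, par, "-> parallelize" if par < seq else "-> keep sequential")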
To be applicable to many embedded application domains, approaches extracting different kinds of parallelism were also developed. The structure of the global parallelization approach facilitates the combination of different approaches in a plug-and-play fashion. Thus, the advantages of multiple parallelization techniques can easily be combined. Finally, in addition to parallelization approaches for homogeneous MPSoCs, optimized ones for heterogeneous devices were also developed in this thesis since the trend towards heterogeneous multi-core architectures is inexorable.
To the best of the author's knowledge, most of these objectives, and especially their combination, have not been covered by existing parallelization frameworks so far. By combining all of them, a parallelization framework that is well optimized for embedded multi-core devices was developed in the context of this thesis.2013-12-20T00:00:00ZResource efficient processing and communication in sensor/actuator environmentsTimm, Constantinhttp://hdl.handle.net/2003/297312015-08-12T22:14:56Z2012-10-29T00:00:00ZTitle: Resource efficient processing and communication in sensor/actuator environments
Authors: Timm, Constantin
Abstract: The future of computer systems will not be dominated by personal-computer-like hardware platforms but by embedded and cyber-physical systems assisting humans in a hidden but omnipresent manner. These pervasive computing devices can, for example, be utilized in the home automation sector to create sensor/actuator networks supporting the inhabitants of a house in everyday life. The efficient usage of resources is an important topic at design time and operation time of mobile embedded and cyber-physical systems. Therefore, this thesis presents methods which allow an efficient use of energy and processing resources in sensor/actuator networks. These networks comprise different nodes cooperating for a “smart” joint control function. Sensor/actuator nodes are typical cyber-physical systems comprising sensors/actuators and processing and communication components. The processing components of today's sensor nodes can comprise many-core chips. This thesis introduces new methods for optimizing the code and the application mapping of the aforementioned systems and presents novel results with regard to design space explorations for energy-efficient embedded many-core systems. The considered many-core systems are graphics processing units. The application code for these graphics processing units is optimized for a particular platform variant with the objectives of minimal energy consumption and/or minimal runtime. These two objectives are targeted with the utilization of multi-objective optimization techniques. The mapping optimizations are realized by means of multi-objective design space explorations. Furthermore, this thesis introduces new techniques and functions for a resource-efficient middleware design employing service-oriented architectures. To this end, a middleware framework based on a service-oriented architecture is presented which comprises a lightweight service orchestration. In addition, a flexible resource management mechanism is introduced. This resource management adapts resource utilization and services to an environmental context and provides methods to reduce the energy consumption of sensor nodes.2012-10-29T00:00:00ZMemory-based optimization techniques for real-time systemsPlazar, Saschahttp://hdl.handle.net/2003/295002015-08-12T23:49:40Z2012-07-06T00:00:00ZTitle: Memory-based optimization techniques for real-time systems
Authors: Plazar, Sascha
Abstract: Embedded/cyber-physical systems have become popular in a wide range of application scenarios. Such systems are called real-time systems if they are subject to strict timing constraints. To verify whether such systems can meet their deadlines, knowledge of an upper bound on a program's execution time is mandatory. This upper bound is also called the worst-case execution time (WCET) and is estimated by static timing analyzers.
Established optimizing compilers are not aware of the WCET as an objective since they focus on the minimization of the average-case execution time (ACET). To overcome this obstacle, this thesis presents memory-based optimization techniques which focus on the reduction of the WCET of programs. All presented optimizations are integrated into the WCET-aware C Compiler (WCC) framework.
Since the memory interface of a system often turns out to be a bottleneck which limits the performance of a system, the presented optimizations are applied to different levels of the memory hierarchy. Starting within a CPU core, the instruction fetch buffer is the most tightly coupled memory, which tries to provide the next few instructions to be executed. Optimization techniques are presented improving the efficiency of this buffer w.r.t. the WCET of a system. Instruction caches placed between the CPU core and the main memory try to speed up accesses to the main memory by storing local copies in fast, small cache memories. In order to improve the efficiency of this part of the memory hierarchy, a memory content selection approach is introduced which improves the WCET of a program by improving the cache performance.
Due to the fact that multi-task systems are employed in almost all domains, this thesis presents elaborate extensions to a compiler supporting the compilation and WCET-aware optimization of multi-task systems. These extensions are exploited to develop a number of novel optimizations for systems running multiple tasks. As a first optimization, a WCET-driven software-based cache partitioning demonstrates the effectiveness of considering the WCET for the optimization of a set of tasks. Furthermore, many embedded systems integrate so-called scratchpad memories (SPMs) as tightly coupled memories. An optimization approach for SPM allocation in a multi-task scenario is proposed. Besides, a holistic view of memory architecture compilation considers a number of memory-based WCET optimizations and presents approaches for a combined application.
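A WCET-driven cache partitioning can be pictured with a minimal greedy sketch; the wcet-per-sets curves, the greedy strategy and all names are assumptions for illustration, not WCC's actual interface or algorithm:

def partition_cache(tasks, total_sets):
    # Repeatedly grant one more cache set to the task whose estimated WCET
    # drops the most. tasks maps a task name to a monotone WCET estimate,
    # assumed to come from a static timing analyzer.
    share = {t: 1 for t in tasks}                 # every task gets one set to start
    for _ in range(total_sets - len(tasks)):
        def gain(t):
            return tasks[t](share[t]) - tasks[t](share[t] + 1)
        best = max(share, key=gain)               # task with the largest WCET reduction
        share[best] += 1
    return share

# Toy WCET curves: cycles as a function of assigned cache sets.
tasks = {"ctrl": lambda s: 8000 // s, "dsp": lambda s: 20000 // s}
print(partition_cache(tasks, total_sets=8))  # the dsp task receives most sets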
Existing compiler frameworks which are able to consider the WCET during optimization are limited to a particular hardware platform. In order to support multiple platforms, this thesis presents techniques to extend an existing WCET-aware compiler framework. Based on these extensions, a novel static cache locking optimization selects memory blocks which are statically locked into the instruction cache, driven by WCET reductions.
Applying these optimizations, the WCET of real-time applications can be reduced by about 35% to 48%. These results underline the need for specialized WCET-driven optimization techniques integrated into a sophisticated compiler framework. Otherwise, immense optimization potential would remain unused, resulting in oversized and thus costly embedded/cyber-physical systems.2012-01-19T00:00:00ZVideobasierte Gestenerkennung in einer intelligenten UmgebungRicharz, Janhttp://hdl.handle.net/2003/292872015-08-12T18:06:22Z2012-01-19T00:00:00ZTitle: Videobasierte Gestenerkennung in einer intelligenten Umgebung
Authors: Richarz, Jan
Abstract: This dissertation covers the design of a touchless, user-independent visual classification of arm gestures based on their spatio-temporal motion patterns, using methods from computer vision, pattern recognition, and machine learning. The application scenario is an intelligent conference room equipped with several off-the-shelf cameras. This scenario poses a particular challenge for three reasons: First, for an interaction that is as intuitive as possible, recognition must be realized independently of the user's position and orientation in the room; simplifying assumptions about the relative positions of user and camera are thus largely ruled out. Second, a realistic indoor scenario is considered, in which the environmental conditions can change abruptly and very different camera viewing angles occur. This requires the development of adaptive methods that can quickly adjust to such changes or are robust against them within wide limits. Third, the use of an unsynchronized multi-camera system is a novelty, which means that, during the 3D reconstruction of hypotheses from different camera images, particular attention must be paid to handling the resulting temporal offset. This also has consequences for the classification task, since corresponding inaccuracies must be expected in the reconstructed 3D trajectories.
An important criterion for the acceptance of a gesture-based human-machine interface is its reactivity. Particular attention is therefore paid in the design to the efficient realizability of the chosen methods. In particular, a parallel processing structure is realized in which the different camera data streams are processed separately and the individual results are subsequently combined. Within the scope of this dissertation, the complete image processing pipeline was realized as a prototype. Among other things, it comprises the steps of person detection, person tracking, hand detection, 3D reconstruction of the hypotheses, and classification of the spatio-temporal gesture trajectories with semi-continuous hidden Markov models (HMMs). The realized methods are evaluated in detail on realistic, demanding data sets. Very good results are achieved for both person and hand detection. The gesture classification reaches classification rates of nearly 90% for nine different gestures.2012-01-19T00:00:00ZSubword-based Stochastic Segment Modeling for Offline Arabic Handwriting RecognitionCao, HuaiguManohar, VasantNatarajan, PremPrasad, RohitSubramanian, Krishnahttp://hdl.handle.net/2003/275642015-08-13T00:00:58Z2011-01-12T00:00:00ZTitle: Subword-based Stochastic Segment Modeling for Offline Arabic Handwriting Recognition
Authors: Cao, Huaigu; Manohar, Vasant; Natarajan, Prem; Prasad, Rohit; Subramanian, Krishna
Abstract: In this paper, we describe several experiments in which we use a stochastic segment model (SSM) to improve offline handwriting recognition (OHR) performance. We use the SSM to re-rank (re-score) multiple decoder hypotheses. Then, a probabilistic multi-class SVM is trained to model stochastic segments obtained from force-aligning transcriptions with the underlying image. We extract multiple features from the stochastic segments that are sensitive to larger context spans to train the SVM. Our experiments show that using confidence scores from the trained SVM within the SSM framework can significantly improve OHR performance. We also show that OHR performance can be improved by using a combination of character-based and parts-of-Arabic-words (PAW)-based SSMs.2011-01-12T00:00:00ZArabic Handwritten Alphanumeric Character Recognition using Fuzzy Attributed Turning FunctionsMahmoud, SabriParvez, Mohammad Tanvirhttp://hdl.handle.net/2003/275632015-08-12T22:55:28Z2011-01-12T00:00:00ZTitle: Arabic Handwritten Alphanumeric Character Recognition using Fuzzy Attributed Turning Functions
Authors: Mahmoud, Sabri; Parvez, Mohammad Tanvir
Abstract: In this paper, we present a novel method for the recognition of unconstrained handwritten Arabic alphanumeric characters. The algorithm binarizes the character image, smooths it and extracts its contour. A novel approach for the polygonal approximation of handwritten character contours is applied. Direction and length features are extracted from the polygonal approximation. These features are used to build character models in the training phase. For recognition, we introduce Fuzzy Attributed Turning Functions (FATF) and define a dissimilarity measure based on FATF for comparing polygonal shapes. Experimental results demonstrate the effectiveness of our algorithm for the recognition of handwritten Arabic characters. We have obtained around 98% accuracy for Arabic handwritten characters and more than 97% accuracy for handwritten Arabic numerals.2011-01-12T00:00:00ZArabic Handwriting SynthesisAl-Muhtaseb, HusniElarian, YousefGhouti, Lahouarihttp://hdl.handle.net/2003/275622015-08-12T23:56:57Z2011-01-12T00:00:00ZTitle: Arabic Handwriting Synthesis
Authors: Al-Muhtaseb, Husni; Elarian, Yousef; Ghouti, Lahouari
Abstract: Training and testing data for optical character recognition are cumbersome to obtain. If large amounts of data can be produced from small amounts, much time and effort can be saved. This paper presents an approach to synthesize Arabic handwriting. We segment word images into labeled characters and then use these in synthesizing arbitrary words. The synthesized text should look natural; hence, we define some criteria to decide on what is acceptable as natural-looking.
For evaluation, text synthesized using the natural-looking constraint is compared to text synthesized without it.2011-01-12T00:00:00ZA Lexicon of Connected Components for Arabic Optical Text RecognitionElarian, YousefIdris, Fayezhttp://hdl.handle.net/2003/275612015-08-13T02:28:30Z2011-01-12T00:00:00ZTitle: A Lexicon of Connected Components for Arabic Optical Text Recognition
Authors: Elarian, Yousef; Idris, Fayez
Abstract: Arabic is a cursive script that lacks the ease of character segmentation. Hence, we suggest a unit that is discrete in nature, viz. the connected component, for Arabic text recognition. A lexicon listing valid Arabic connected components is necessary to any system that is to use such unit. Here, we produce and analyze a comprehensive lexicon of connected components.
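The notion of a connected component can be made concrete: Arabic letters such as ا, د, ذ, ر, ز and و never join to the following letter, so a word splits deterministically at those points. A minimal sketch, with a simplified letter set and without the paper's additional tokenization and point-normalization steps:

# Letters that never connect to a following letter; a connected component
# ends after any of them (simplified, illustrative set).
NON_CONNECTING = set("اأإآدذرزوؤء")

def connected_components(word):
    # Split an Arabic word into connected components by cutting after every
    # non-left-joining letter.
    parts, current = [], ""
    for ch in word:
        current += ch
        if ch in NON_CONNECTING:
            parts.append(current)
            current = ""
    if current:
        parts.append(current)
    return parts

print(connected_components("مدرسة"))  # ['مد', 'ر', 'سة']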
A lexicon can be extracted from corpora or synthesized from morphemes. We follow both approaches and merge their results. Besides, the generation of a lexicon of connected components encompasses extra tokenization and point-normalization steps to make the size of the lexicon tractable. We produce a lexicon of surface words, reduce it to a lexicon of connected components, and finally to a lexicon of point-normalized connected components. The lexicon of point-normalized connected components contains 684,743 entries, a decrease of 97.17% from the word lexicon.2011-01-12T00:00:00ZWriter Identification of Arabic Handwritten DigitsAwaida, SamehMahmoud, Sabrihttp://hdl.handle.net/2003/275602015-08-12T16:50:13Z2011-01-12T00:00:00ZTitle: Writer Identification of Arabic Handwritten Digits
Authors: Awaida, Sameh; Mahmoud, Sabri
Abstract: This paper addresses writer identification from Arabic handwritten digits. In addition to digit identifiability, the paper presents digit recognition. The digit image is divided into grids based on the distribution of the black pixels in the image. Several types of features are extracted from the grid segments (viz. gradient, curvature, density, horizontal and vertical run lengths, stroke, and concavity features). K-Nearest Neighbor and Nearest Mean classifiers are used. A database of 70,000 Arabic handwritten digit samples written by 700 writers is used in the analysis and experimentation.
The identifiability of isolated and combined digits is tested. The analysis of the results indicates that the Arabic digits 3 (٣), 4 (٤), 8 (٨), and 9 (٩) are more identifiable than other digits, while the Arabic digits 0 (٠) and 1 (١) are the least identifiable. In addition, the paper shows that combining a writer's digits increases the discriminability power of Arabic handwritten digits. Combining the features of all digits, K-NN provided the best accuracy in text-independent writer identification, with a top-1 result of 88.14%, a top-5 result of 94.81%, and a top-10 result of 96.48%.2011-01-12T00:00:00ZA new System for offline Printed Arabic Recognition for Large Vocabulary : SPARLVDhouib, Mariem MilediKanoun, Slimhttp://hdl.handle.net/2003/275592015-08-13T00:52:15Z2011-01-12T00:00:00ZTitle: A new System for offline Printed Arabic Recognition for Large Vocabulary : SPARLV
A new System for offline Printed Arabic Recognition for Large Vocabulary : SPARLVDhouib, Mariem MilediKanoun, Slimhttp://hdl.handle.net/2003/275592015-08-13T00:52:15Z2011-01-12T00:00:00ZTitle: A new System for offline Printed Arabic Recognition for Large Vocabulary : SPARLV
Authors: Dhouib, Mariem Miledi; Kanoun, Slim
Abstract: This paper presents a contribution to printed Arabic recognition, focusing on the recognition of printed decomposable Arabic words. The proposed system follows the analytical approach: segmentation into characters leads to the generation of letter hypotheses as well as word hypotheses, which are checked by lexical verification against a pre-established dictionary of the language. Thanks to this lexical verification, our proposed system SPARLV is able to produce valid word hypotheses.2011-01-12T00:00:00Z
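A minimal sketch of the lexical-verification step described above, assuming per-segment letter hypotheses with scores and a dictionary represented as a plain set (a prefix trie and beam search would bound the expansion in practice):

from itertools import product

def word_hypotheses(letter_hyps, dictionary, max_out=10):
    # letter_hyps: one list of (letter, score) pairs per segmented position
    candidates = []
    for combo in product(*letter_hyps):
        word = "".join(letter for letter, _ in combo)
        if word in dictionary:                     # lexical verification
            candidates.append((sum(s for _, s in combo), word))
    return [w for _, w in sorted(candidates, reverse=True)[:max_out]]

# e.g. word_hypotheses([[("b", .9), ("p", .4)], [("a", .8)], [("t", .7), ("l", .5)]],
#                      dictionary={"bat", "pal"}) -> ["bat", "pal"]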
Towards Feature Learning for HMM-based Offline Handwriting RecognitionFink, Gernot A.Hammerla, Nils Y.Plötz, ThomasVajda, Szilárdhttp://hdl.handle.net/2003/275562015-08-12T20:35:41Z2011-01-12T00:00:00ZTitle: Towards Feature Learning for HMM-based Offline Handwriting Recognition
Authors: Fink, Gernot A.; Hammerla, Nils Y.; Plötz, Thomas; Vajda, Szilárd
Abstract: Statistical modelling techniques for automatic reading systems rely substantially on the availability of compact and meaningful feature representations. State-of-the-art feature extraction for offline handwriting recognition is usually based on heuristic approaches that describe either basic geometric properties or statistical distributions of raw pixel values. Although such features work well on average, fundamental insights into the nature of handwriting are still desired. In this paper we present a novel approach for the automatic extraction of appearance-based representations of offline handwriting data. Within the framework of deep belief networks -- Restricted Boltzmann Machines -- a two-stage method for feature learning and optimization is developed. Using two standard corpora of Arabic and Roman handwriting data, it is demonstrated across script boundaries that automatically learned features achieve recognition results comparable to state-of-the-art handcrafted features. Given these promising results, the potential of feature learning for future reading systems is discussed.2011-01-12T00:00:00Z
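For illustration, a minimal numpy sketch of the first stage, unsupervised RBM training with one-step contrastive divergence (CD-1); hyperparameters are placeholders, not the paper's settings:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden=64, lr=0.05, epochs=10):
    # data: (n_samples, n_visible) binary pixel vectors
    n_vis = data.shape[1]
    W = 0.01 * rng.standard_normal((n_vis, n_hidden))
    b_v = np.zeros(n_vis)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        for v0 in data:
            p_h0 = sigmoid(v0 @ W + b_h)                 # positive phase
            h0 = (rng.random(n_hidden) < p_h0).astype(float)
            p_v1 = sigmoid(h0 @ W.T + b_v)               # reconstruction
            p_h1 = sigmoid(p_v1 @ W + b_h)               # negative phase
            W += lr * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))
            b_v += lr * (v0 - p_v1)
            b_h += lr * (p_h0 - p_h1)
    return W, b_h   # hidden activations sigmoid(x @ W + b_h) serve as features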
Advanced ensemble methods for automatic classification of 1H-NMR spectraLienemann, Kaihttp://hdl.handle.net/2003/273212017-06-03T18:12:27Z2010-08-03T00:00:00ZTitle: Advanced ensemble methods for automatic classification of 1H-NMR spectra
Authors: Lienemann, Kai2010-08-03T00:00:00ZMikroarchitektur-Synthese mit genetischen AlgorithmenLorenz, Markushttp://hdl.handle.net/2003/214532015-08-12T19:02:29Z2005-06-01T00:00:00ZTitle: Mikroarchitektur-Synthese mit genetischen Algorithmen
Authors: Lorenz, Markus2005-06-01T00:00:00ZHardware-Partitionierung für Prototypen-BoardsFalk, Heikohttp://hdl.handle.net/2003/214522015-08-12T19:02:26Z2005-06-01T00:00:00ZTitle: Hardware-Partitionierung für Prototypen-Boards
Authors: Falk, Heiko2005-06-01T00:00:00ZCodeerzeugung für den digitalen Signalprozessor TI TMS320X5xBarschdorf, Thomashttp://hdl.handle.net/2003/214512015-08-12T19:02:24Z2005-06-01T00:00:00ZTitle: Codeerzeugung für den digitalen Signalprozessor TI TMS320X5x
Authors: Barschdorf, Thomas2005-06-01T00:00:00ZVergleich von CLP und ILP basierten Optimierungsstrategien am Beispiel der Codegenerierung für DSPsMenne, Torstenhttp://hdl.handle.net/2003/214502015-08-12T19:02:22Z2005-06-01T00:00:00ZTitle: Vergleich von CLP und ILP basierten Optimierungsstrategien am Beispiel der Codegenerierung für DSPs
Authors: Menne, Torsten2005-06-01T00:00:00ZAnalysen und Methoden optimierender Compiler zur Steigerung der Effizienz von Speicherzugriffen in eingebetteten SystemenFranke, Bjoernhttp://hdl.handle.net/2003/214492015-08-12T23:53:06Z2005-06-01T00:00:00ZTitle: Analysen und Methoden optimierender Compiler zur Steigerung der Effizienz von Speicherzugriffen in eingebetteten Systemen
Authors: Franke, Bjoern2005-06-01T00:00:00ZEntwurf und Realisierung eines skalierbaren FPGA-PrototypenboardsRave, Stefanhttp://hdl.handle.net/2003/214482015-08-13T00:43:46Z2005-06-01T00:00:00ZTitle: Entwurf und Realisierung eines skalierbaren FPGA-Prototypenboards
Authors: Rave, Stefan2005-06-01T00:00:00ZEnergiemessung von ARM7TDMI Prozessor-InstruktionTheokharidis, Michaelhttp://hdl.handle.net/2003/214472015-08-12T23:53:08Z2005-06-01T00:00:00ZTitle: Energiemessung von ARM7TDMI Prozessor-Instruktion
Authors: Theokharidis, Michael2005-06-01T00:00:00ZReduktion des Energiebedarfs von Programmen für den ARM-Prozessor durch RegisterpipeliningSchwarz, Rüdigerhttp://hdl.handle.net/2003/214462015-08-12T23:53:01Z2005-06-01T00:00:00ZTitle: Reduktion des Energiebedarfs von Programmen für den ARM-Prozessor durch Registerpipelining
Authors: Schwarz, Rüdiger2005-06-01T00:00:00ZAdresszuweisung für den M3-DSPKottmann, Davidhttp://hdl.handle.net/2003/214452015-08-12T23:53:03Z2005-06-01T00:00:00ZTitle: Adresszuweisung für den M3-DSP
Authors: Kottmann, David2005-06-01T00:00:00ZÜbersetzung und Optimierung objektorientierter Programmiersprachen unter besonderer Berücksichtigung eingebetteter SystemeJagla, Frankhttp://hdl.handle.net/2003/214442015-08-12T23:52:55Z2005-06-01T00:00:00ZTitle: Übersetzung und Optimierung objektorientierter Programmiersprachen unter besonderer Berücksichtigung eingebetteter Systeme
Authors: Jagla, Frank2005-06-01T00:00:00ZSpeicherpartitionierung in DSP-CompilernKotte, Danielhttp://hdl.handle.net/2003/214432015-08-12T23:52:59Z2005-06-01T00:00:00ZTitle: Speicherpartitionierung in DSP-Compilern
Authors: Kotte, Daniel2005-06-01T00:00:00ZMessung des Energieverbrauchs von Caches am Beispiel de StrongARM-ProzessorsSapsford, Gregoryhttp://hdl.handle.net/2003/214422015-08-12T23:52:57Z2005-06-01T00:00:00ZTitle: Messung des Energieverbrauchs von Caches am Beispiel de StrongARM-Prozessors
Authors: Sapsford, Gregory2005-06-01T00:00:00ZCodierungsverfahren zur Reduktion des Energiebedarfs von ProgrammenKnauer, Markushttp://hdl.handle.net/2003/214412015-08-12T23:51:29Z2005-06-01T00:00:00ZTitle: Codierungsverfahren zur Reduktion des Energiebedarfs von Programmen
Authors: Knauer, Markus2005-06-01T00:00:00ZEnergieeinsparung durch compilergesteuerte Nutzung des On-Chip-SpeichersZobiegala, Christophhttp://hdl.handle.net/2003/214402015-08-12T23:52:53Z2005-06-01T00:00:00ZTitle: Energieeinsparung durch compilergesteuerte Nutzung des On-Chip-Speichers
Authors: Zobiegala, Christoph2005-06-01T00:00:00ZVergleich des Energieverbrauchs von Cache- und Scratch-Pad-Speichern für den ARM7-ProzessorLee, Bo-Sikhttp://hdl.handle.net/2003/214392015-08-12T23:52:51Z2005-06-01T00:00:00ZTitle: Vergleich des Energieverbrauchs von Cache- und Scratch-Pad-Speichern für den ARM7-Prozessor
Authors: Lee, Bo-Sik2005-06-01T00:00:00ZGenerische Low-Level Optimierungen für RISC-ArchitekturenHornbach, Larshttp://hdl.handle.net/2003/214382015-08-12T23:52:49Z2005-06-01T00:00:00ZTitle: Generische Low-Level Optimierungen für RISC-Architekturen
Authors: Hornbach, Lars2005-06-01T00:00:00ZXML-basierte generische Zwischendarstellung für CompilerFiesel, Markushttp://hdl.handle.net/2003/214362015-08-12T23:52:47Z2005-06-01T00:00:00ZTitle: XML-basierte generische Zwischendarstellung für Compiler
Authors: Fiesel, Markus2005-06-01T00:00:00ZArchitekturunabhängige Quellcodeoptimierung durch MustererkennungJakubowski, Jacekhttp://hdl.handle.net/2003/214352015-08-12T23:52:05Z2005-06-01T00:00:00ZTitle: Architekturunabhängige Quellcodeoptimierung durch Mustererkennung
Authors: Jakubowski, Jacek2005-06-01T00:00:00ZEnergieminimierung eingebetteter Programme durch die dynamische Nutzung eines Scratchpad-SpeichersGrundwald, Nilshttp://hdl.handle.net/2003/214342015-08-12T19:02:19Z2005-06-01T00:00:00ZTitle: Energieminimierung eingebetteter Programme durch die dynamische Nutzung eines Scratchpad-Speichers
Authors: Grundwald, Nils2005-06-01T00:00:00ZCodegrößenreduktion eingebetteter Systeme durch kombiniertes In- und ExliningImhoff, Peterhttp://hdl.handle.net/2003/214332015-08-13T00:42:06Z2005-06-01T00:00:00ZTitle: Codegrößenreduktion eingebetteter Systeme durch kombiniertes In- und Exlining
Authors: Imhoff, Peter2005-06-01T00:00:00ZEntwicklung eines generischen Codegenerators für RISC-ArchitekturenKamphausen, Jörghttp://hdl.handle.net/2003/214322015-08-12T23:52:44Z2005-06-01T00:00:00ZTitle: Entwicklung eines generischen Codegenerators für RISC-Architekturen
Authors: Kamphausen, Jörg2005-06-01T00:00:00ZCompilergestützte Optimierung von Zugriffen auf partitionierte SpeicherHelmig, Urshttp://hdl.handle.net/2003/214312015-08-12T23:52:42Z2005-06-01T00:00:00ZTitle: Compilergestützte Optimierung von Zugriffen auf partitionierte Speicher
Authors: Helmig, Urs2005-06-01T00:00:00ZPlattformabhängige Eliminierung gemeinsamer Teilausdrücke auf Quellcode-EbeneVogt, Michaelhttp://hdl.handle.net/2003/214302015-08-12T23:52:40Z2005-06-01T00:00:00ZTitle: Plattformabhängige Eliminierung gemeinsamer Teilausdrücke auf Quellcode-Ebene
Authors: Vogt, Michael2005-06-01T00:00:00ZCompilergestützte Energiereduktion von SDRAM- und Flash-basierten SpeichertechnologienKernchen, Andréhttp://hdl.handle.net/2003/214292015-08-12T23:52:37Z2005-06-01T00:00:00ZTitle: Compilergestützte Energiereduktion von SDRAM- und Flash-basierten Speichertechnologien
Authors: Kernchen, André2005-06-01T00:00:00ZDidaktik der Informatik - Teil 1 (Sommersemester 2004)Humbert, Ludgerhttp://hdl.handle.net/2003/213462015-08-12T23:44:03Z2004-07-28T00:00:00ZTitle: Didaktik der Informatik - Teil 1 (Sommersemester 2004)
Authors: Humbert, Ludger2004-07-28T00:00:00ZHumbert, Ludger: Didaktik der Informatik -Teil 2 (Wintersemester 2003/2004)Humbert, Ludgerhttp://hdl.handle.net/2003/213452021-04-12T14:08:20Z2004-02-18T00:00:00ZTitle: Humbert, Ludger: Didaktik der Informatik -Teil 2 (Wintersemester 2003/2004)
Authors: Humbert, Ludger2004-02-18T00:00:00ZDidaktik der Informatik für die Sekundarstufe IHumbert, Ludgerhttp://hdl.handle.net/2003/213442021-04-12T14:07:17Z2004-02-18T00:00:00ZTitle: Didaktik der Informatik für die Sekundarstufe I
Authors: Humbert, Ludger2004-02-18T00:00:00ZDidaktik der Informatik - Teil 1Humbert, Ludgerhttp://hdl.handle.net/2003/213432021-04-12T14:06:00Z2003-11-06T00:00:00ZTitle: Didaktik der Informatik - Teil 1
Authors: Humbert, Ludger2003-11-06T00:00:00ZIntroduction to embedded systemsMarwedel, Peterhttp://hdl.handle.net/2003/203642021-04-12T14:01:10Z2005-04-25T00:00:00ZTitle: Introduction to embedded systems
Authors: Marwedel, Peter2005-04-25T00:00:00ZProzessrechnertechnikMarwedel, Peterhttp://hdl.handle.net/2003/203632015-08-13T02:18:41Z1999-10-14T00:00:00ZTitle: Prozessrechnertechnik
Authors: Marwedel, Peter1999-10-14T00:00:00ZRechnerarchitekturMarwedel, Peterhttp://hdl.handle.net/2003/203622015-08-13T02:18:38Z1999-10-14T00:00:00ZTitle: Rechnerarchitektur
Authors: Marwedel, Peter1999-10-14T00:00:00ZRechnergestützter Entwurf / Produktion (MikroelektronikLeupers, Rainerhttp://hdl.handle.net/2003/203612021-04-12T14:00:00Z1999-10-13T00:00:00ZTitle: Rechnergestützter Entwurf / Produktion (Mikroelektronik
Authors: Leupers, Rainer1999-10-13T00:00:00ZBegleitmaterial zur Vorlesung Einführung in die Didaktik der InformatikSchubert, Sigridhttp://hdl.handle.net/2003/27712015-08-12T19:07:12Z1999-10-14T00:00:00ZTitle: Begleitmaterial zur Vorlesung Einführung in die Didaktik der Informatik; Einführung in die Didaktik der Informati
Authors: Schubert, Sigrid1999-10-14T00:00:00ZPerformance- und energieeffiziente Compilierung für digitale SIMD-Signalprozessoren mittels genetischer AlgorithmenLorenz, Markushttp://hdl.handle.net/2003/27702015-08-13T00:05:50Z2003-06-03T00:00:00ZTitle: Performance- und energieeffiziente Compilierung für digitale SIMD-Signalprozessoren mittels genetischer Algorithmen
Authors: Lorenz, Markus
Abstract: In recent years, embedded systems have been deployed in an ever-growing number of products of our daily lives. These systems frequently have to satisfy special requirements regarding real-time capability, small size and, increasingly, low energy consumption. To meet these requirements while retaining a high degree of flexibility in system design, digital signal processors (DSPs) are often employed for data processing instead of application-specific hardware. With DSPs, specification changes in late development phases usually do not require a costly and time-consuming redevelopment of the hardware. Unfortunately, manually translating an application program into assembly code for the target processor is an extremely time-consuming and error-prone task. For this reason, compilers are needed that can translate a given application into efficient assembly code. Compared to general-purpose processors (GPPs), however, DSPs exhibit special architectural features that conventional compiler techniques exploit only insufficiently or not at all. The goal of this thesis is to develop new compiler techniques for DSPs in order to improve the quality of compiler-generated code, particularly with respect to execution time and energy consumption. To enable reuse of the developed techniques in other compilers, they are built on top of the new intermediate representation GeLIR (Generic Low-Level Intermediate Representation), which is also described in this thesis. As the main contribution, a code generator is presented that performs graph-based code selection and additionally solves the phases of code selection, instruction scheduling (including compaction) and register allocation simultaneously, in the sense of phase coupling. Since this amounts to solving an NP-hard optimization problem, the code generator is based on an optimization method using a genetic algorithm. Interactions with the subsequent address-code generation are already taken into account while performing the subtasks of code selection, instruction selection and register allocation. Owing to the flexible specification of cost functions in genetic optimization methods, the code generator can perform an energy-efficient selection and scheduling of instructions using an energy cost model. As a further contribution, optimization techniques for the effective exploitation of parallel data paths and of SIMD memory accesses are presented. By integrating the energy cost model into the code generator and the simulator, this thesis is the first to investigate, with compiler support, the potential of SIMD operations for the energy-efficient execution of DSP programs.
The exemplary implementation of the techniques for one DSP architecture, and the retargeting of the genetic code generator to a further DSP, demonstrate the applicability to real processors.2003-06-03T00:00:00Z
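A loose illustration of the genetic code-generation idea from the abstract above: one gene per operation encodes an instruction alternative and a scheduling priority, and the fitness mixes cycle and energy estimates. All cost numbers and parameters are invented placeholders, not GeLIR internals:

import random

random.seed(1)

ALTERNATIVES = 3     # instruction alternatives per operation (assumption)
N_OPS = 12           # operations in the basic block (assumption)

def random_genome():
    return [(random.randrange(ALTERNATIVES), random.random())
            for _ in range(N_OPS)]

def fitness(genome):
    # lower is better: placeholder cycle/energy model
    cycles = sum(1 + alt for alt, _ in genome)          # alt 0 = fastest
    energy = sum(2.0 - alt * 0.5 for alt, _ in genome)  # alt 2 = cheapest
    return cycles + 0.3 * energy                        # weighted objective

def evolve(pop_size=30, generations=50):
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[: pop_size // 2]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            cut = random.randrange(N_OPS)               # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(N_OPS)                 # point mutation
            child[i] = (random.randrange(ALTERNATIVES), random.random())
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)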
Untersuchung des Energieeinsparungspotenzials in eingebetteten Systemen durch energieoptimierende CompilertechnikSteinke, Stefanhttp://hdl.handle.net/2003/27692015-08-13T00:01:13Z2003-01-27T00:00:00ZTitle: Untersuchung des Energieeinsparungspotenzials in eingebetteten Systemen durch energieoptimierende Compilertechnik
Authors: Steinke, Stefan
Abstract: In both professional and leisure contexts, the use of mobile electronic devices such as mobile phones and PDAs has grown strongly in recent years. The functions of these devices keep increasing in number as well as in complexity, so that the capacity limits of the batteries are reached more and more often. This restricts users and motivates the reduction of energy consumption. Moreover, other new mobile applications will only become feasible once energy consumption has been reduced further. Besides the well-known optimization of device hardware for energy consumption, the growing share of software offers a new potential for energy savings. The goal of this thesis is a systematic investigation of this energy-saving potential in the execution of application software, achievable through modified or new compiler techniques. The thesis begins by examining the fundamentals of energy consumption and deriving starting points for energy reduction through software. Within the considered design flow for embedded systems, the software-synthesis phase offers the opportunity to influence the generated machine code. Sufficient information for estimating the later energy demand is available in the compiler once a suitable energy model is integrated. The new energy model presented in this thesis accounts for differences in energy consumption depending on the executed instructions, the functional units they use, the accesses to different memories, and the bit patterns of the data transported over buses. These properties are a necessary prerequisite for a comprehensive investigation of the potential in code generation. Based on this energy model, the various components and phases of a compiler are examined systematically with respect to their savings potential and the possible integration of energy consumption as an optimization goal. The phases in the compiler front end offer few starting points, since no relation to the machine instructions and their respective energy consumption can be established there yet. The focus is therefore on the back-end phases: instruction selection, instruction scheduling, register allocation and machine-dependent optimizations. The phases and optimizations in which energy consumption influences processing are considered in detail, and the energy-saving optimizations with the largest effect are described extensively. Memory accesses in particular account for a high share of the total energy consumption, which yields a large potential. Optimizations for more efficient memory usage therefore form the focus of the investigations. Besides applying known optimizations for a more efficient use of processor registers, new optimizations are presented that support the efficient use of small, freely addressable on-chip memories. The caches used so far contain hardware control for loading frequently used program parts and data. This mechanism can speed up program execution considerably, but its additional hardware consumes a relatively large amount of energy for frequent address comparisons.
Taking the information available during the compiler run into account when deciding which program parts and data are moved to the on-chip memory offers a high energy-saving potential. The required method is described both as a static variant with a fixed assignment of program parts and data to main memory and on-chip memory, and in an extended variant with integrated copying of blocks during program execution. The thesis concludes by investigating how alternative encodings on buses can be used to reduce energy consumption. Overall, this thesis demonstrates the energy-saving potential of a compiler in its respective phases and presents new techniques that generate memory accesses more efficiently. In the considered case studies, the energy consumption of an application can thereby be reduced by about 50% compared to systems in use today.2003-01-27T00:00:00Z
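The static on-chip-memory variant can be sketched as a 0/1 knapsack, which is one standard way to formulate such an allocation (an assumption here, not necessarily the thesis' exact method); sizes and per-block energy savings stand in for profiling results:

def allocate_scratchpad(blocks, capacity):
    # blocks: list of (name, size_bytes, energy_saving); returns chosen names
    n = len(blocks)
    best = [[0.0] * (capacity + 1) for _ in range(n + 1)]
    for i, (_, size, save) in enumerate(blocks, 1):
        for c in range(capacity + 1):
            best[i][c] = best[i - 1][c]
            if size <= c:
                best[i][c] = max(best[i][c], best[i - 1][c - size] + save)
    chosen, c = [], capacity
    for i in range(n, 0, -1):                  # backtrack the DP table
        name, size, _ = blocks[i - 1]
        if best[i][c] != best[i - 1][c]:
            chosen.append(name)
            c -= size
    return chosen

# e.g. allocate_scratchpad([("main_loop", 512, 9.1), ("lut", 256, 4.0),
#                           ("isr", 384, 2.5)], capacity=768)
# -> ["lut", "main_loop"]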
Constraintbasierte Codegenerierung für eingebettete ProzessorenBashford, Stevenhttp://hdl.handle.net/2003/27682015-08-13T02:18:03Z2001-10-25T00:00:00ZTitle: Constraintbasierte Codegenerierung für eingebettete Prozessoren
Authors: Bashford, Steven
Abstract: Embedded systems are gaining influence in many areas of our everyday lives, e.g. in telecommunications, automotive electronics, medical technology and consumer electronics. These systems are subject to strict constraints, such as real-time requirements and energy consumption. In the design of embedded systems, realizing as many system components as possible with so-called embedded processors plays a major role. These processors can be reused for a multitude of systems, eliminating an expensive and extremely time-consuming design, test and manufacturing process for dedicated hardware. Developing software enables considerably faster design processes and additionally provides a high degree of flexibility, since design errors can still be corrected in late design phases. In software development there is, of course, the desire to use modern high-level languages. The problem here is a lack of good compilers, especially in the area of digital signal processors. Traditional compiler techniques are not suited to effectively exploiting the specific properties of these processors, and the quality of the generated code falls far short of the constraints imposed on the systems. To enable the use of processors nonetheless, software development often resorts to assembly programming, which has major drawbacks: development times and the phases for testing and debugging usually grow considerably, and reusing software after a processor change is hardly possible. The goal of this thesis is to develop new compiler techniques for embedded processors, with a focus on digital signal processors that have highly irregular data paths with restricted instruction-level parallelism. The aim is to generate very high code quality with respect to execution speed and code size. The development of new techniques strongly targets the integration of subphases of code generation, in the sense of phase coupling. Furthermore, the inclusion of graph-based techniques for instruction selection plays an important role. To meet these demands, the use of new programming and optimization methods is absolutely necessary for a manageable implementation of such compiler techniques. In this thesis, techniques based on constraint logic programming (CLP) are designed and realized, and it is shown to what extent the use of CLP is suitable in this problem domain. A further goal is the design of concepts that allow a fast adaptation of compilers to new processors.2001-10-25T00:00:00ZSystem level modeling and design with the SpecC languageDömer, Rainerhttp://hdl.handle.net/2003/27672015-08-13T00:51:00Z2000-04-11T00:00:00ZTitle: System level modeling and design with the SpecC language
Authors: Dömer, Rainer
Abstract: The semiconductor roadmap estimates that the design complexity of digital systems will continue to increase according to Moore's law. In the next years, embedded systems with tens of millions of transistors on one chip will be standard technology. System-on-Chip (SOC) designs will integrate processor cores, memories and special-purpose custom logic into a complete system fitting on a single die. However, the increased complexity of SOC designs requires more effort, more efficient tools and new methodologies. Increasing the design time is not an option due to market pressures.
System-level design reduces the complexity of the design models by raising the level of abstraction. Starting from an abstract specification model, the system is stepwise refined with the help of computer-aided design (CAD) tools. Using codesign techniques, the system is partitioned into hardware and software parts and finally implemented on a target architecture. Established design methodologies for behavioral synthesis and standard software design are utilized. However, moving to higher abstraction levels is not sufficient.
The key to coping with the complexity involved in SOC designs is the reuse of Intellectual Property (IP). The integration of complex components, which are pre-designed and well-tested, drastically reduces the design complexity and thus saves design time and allows a shorter time-to-market. Since the idea of IP reuse promises great benefits, it must become an integral part of the system design methodology. Furthermore, the use of IP components must be directly supported by the design models, the tools and the languages being used throughout the design process. For example, it must be easy to insert and replace IP components in the design model ("plug-and-play").
This work addresses the main issues in SOC design, namely the system design methodology, system-level modeling, and the specification language. First, an IP-centric system design methodology is proposed which is based on the reuse of IP. It allows the reuse and integration of IP components at any level and at any time during the design process. Starting with an abstract executable specification of the system, architecture exploration and communication synthesis are performed in order to map the design model onto the target architecture. At any stage, the system's functionality and its characteristics can be evaluated and validated.
The model being used in the methodology to represent the system must meet system design requirements. It must be suitable to represent abstract properties at early stages as well as specific details about design decisions later in the design process. In order to support IP, the model must clearly separate communication from computation. In this work, a hierarchical model is described which encapsulates computation and communication in separate entities, namely behaviors and channels. This model naturally supports reuse, integration and protection of IP. In order to formally describe a design model, a language should be used which directly represents the properties and characteristics of the model. This work presents a newly developed language, called SpecC, which allows modeling concepts to be mapped onto language constructs in a one-to-one fashion. Unlike other system-level languages, the SpecC language precisely covers the unique requirements for embedded systems design in an orthogonal manner. Built on top of the C language, the de-facto standard for software development, SpecC supports additional concepts needed in hardware design and allows IP-centric modeling. Recently, the SpecC language has been proposed as a standard system-level language for adoption in industry by some of Japan's top-tier electronics and semiconductor companies.
The proposed methodology and the SpecC language have been implemented in the SpecC design environment. In a graphical framework, the SpecC design environment integrates a set of CAD tools which support system-level modeling, design validation, design space exploration, and (semi-)automatic refinement. The framework and all tools rely on a powerful, central design representation, the SpecC Internal Representation (SIR).
Using the SpecC design environment, the IP-centric methodology has been successfully applied to several designs of industrial size, including a GSM vocoder used in mobile telecommunication.2000-04-11T00:00:00Z
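A loose Python analogue of the behavior/channel separation described above; SpecC itself extends C, so this is a structural illustration only, with invented names:

from queue import Queue

class Channel:
    # encapsulates communication; behaviors never touch each other directly
    def __init__(self):
        self._q = Queue()
    def send(self, data):
        self._q.put(data)
    def receive(self):
        return self._q.get()

class Producer:
    def __init__(self, out):
        self.out = out
    def main(self):
        for sample in range(4):
            self.out.send(sample)          # computation stays local

class Consumer:
    def __init__(self, inp):
        self.inp = inp
    def main(self):
        return [self.inp.receive() for _ in range(4)]

# Swapping Channel for another implementation (bus model, FIFO IP, ...)
# leaves Producer and Consumer untouched -- the "plug-and-play" property.
ch = Channel()
Producer(ch).main()
print(Consumer(ch).main())   # [0, 1, 2, 3]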
Novel Code Optimization Techniques for DSPsLeupers, Rainerhttp://hdl.handle.net/2003/27652015-08-12T19:07:02Z1998-07-02T00:00:00ZTitle: Novel Code Optimization Techniques for DSPs
Authors: Leupers, Rainer
Abstract: Software development for DSPs is frequently a bottleneck in the system design process, due to the poor code quality delivered by many current C compilers. As a consequence, most of the DSP software still has to be written manually in assembly language. In order to overcome this problem, new DSP-specific code optimization techniques are required, which, in contrast to classical compiler technology, take the detailed processor architecture sufficiently into account. This paper describes several new DSP code optimization techniques: maximum utilization of parallel address generation units, exploitation of instruction-level parallelism through exact code compaction, and optimized code generation for IF-statements by means of conditional instructions. Experimental results indicate significant improvements in code quality as compared to existing compilers.1998-07-02T00:00:00ZRetargierbare Codeerzeugung für digitale SignalprozessorenLeupers, Rainerhttp://hdl.handle.net/2003/27662016-02-02T14:13:39Z1998-07-02T00:00:00ZTitle: Retargierbare Codeerzeugung für digitale Signalprozessoren
Authors: Leupers, Rainer
Abstract: Digital signal processors (DSPs) are programmable devices with special instruction sets optimized for computation-intensive applications, used above all for signal processing under real-time conditions. Owing to the lack of DSP-specific optimization techniques, current high-level language compilers for DSPs mostly generate very poor code, so that the bulk of DSP software still has to be developed laboriously in assembly languages. This constitutes a considerable bottleneck in the development of embedded systems. This thesis presents new compiler techniques that take the special constraints of the DSP domain into account. These include optimization techniques that exploit the characteristic hardware properties of DSPs (among others, specialized registers, parallel machine instructions and separate address-generation units) to improve code quality, with the goal of making compilers viable in the DSP domain as well. At the same time, these techniques are kept sufficiently general to be applicable to a whole class of DSPs. This property is called retargetability. Retargetable compilers help in optimizing processor architectures for given applications. The compiler system RECORD presented in this thesis enables the automatic adaptation of compilers to new processors on the basis of processor models specified in a hardware description language. This builds the necessary bridge between compiler construction and the computer-aided design of integrated circuits. Experimental results for realistic processors show the practical applicability of the presented techniques.1998-07-02T00:00:00ZSynthesis of Communicating Controllers for Concurrent Hardware/Software SystemsMarwedel, PeterNiemann, Ralfhttp://hdl.handle.net/2003/27642015-08-12T19:06:49Z1998-07-02T00:00:00ZTitle: Synthesis of Communicating Controllers for Concurrent Hardware/Software Systems
Authors: Marwedel, Peter; Niemann, Ralf
Abstract: Two main aspects in hardware/software codesign are hardware/software partitioning and co-synthesis. Most codesign approaches address only one of these problems. In this paper, a fully automatic approach coupling hardware/software partitioning and co-synthesis is presented. The techniques have been integrated into the codesign tool COOL (COdesign toOL), supporting the complete design flow from system specification to board-level implementation for multi-processor and multi-ASIC target architectures for data-flow-dominated applications.1998-07-02T00:00:00ZOptimized Array Index Computation in DSP ProgramsBasu, AnupamLeupers, RainerMarwedel, Peterhttp://hdl.handle.net/2003/27632015-08-12T19:07:09Z1998-07-02T00:00:00ZTitle: Optimized Array Index Computation in DSP Programs
Authors: Basu, Anupam; Leupers, Rainer; Marwedel, Peter
Abstract: An increasing number of components in embedded systems are implemented by software running on embedded processors. This trend creates a need for compilers for embedded processors capable of generating high quality machine code. Particularly for DSPs, such compilers are hardly available, and novel DSP-specific code optimization techniques are required. In this paper we focus on efficient address computation for array accesses in loops. Based on previous work, we present a new and optimal algorithm for address register allocation and provide an experimental evaluation of different algorithms. Furthermore, an efficient and close-to-optimum heuristic is proposed for large problems.1998-07-02T00:00:00ZRetargetable Code Generation based on Structural Processor DescriptionsLeupers, RainerMarwedel, Peterhttp://hdl.handle.net/2003/27622015-08-12T19:07:04Z1998-07-02T00:00:00ZTitle: Retargetable Code Generation based on Structural Processor Descriptions
Authors: Leupers, Rainer; Marwedel, Peter
Abstract: Design automation for embedded systems comprising both hardware and software components demands code generators integrated into electronic CAD systems. These code generators provide the necessary link between software synthesis tools in HW/SW codesign systems and embedded processors. General-purpose compilers for standard processors are often insufficient, because they do not provide flexibility with respect to different target processors and also suffer from inferior code quality. While recent research on code generation for embedded processors has primarily focussed on code quality issues, in this contribution we emphasize the importance of retargetability, and we describe an approach to achieve it. We propose the usage of uniform, external target processor models in code generation, which describe embedded processors by means of RT-level netlists. Such structural models incorporate more hardware details than purely behavioral models, thereby permitting a close link to hardware design tools and fast adaptation to different target processors. The MSSQ compiler, which is part of the MIMOLA hardware design system, operates on structural models. We describe input formats, central data structures, and code generation techniques in MSSQ. The compiler has been successfully retargeted to a number of real-life processors, which proves the feasibility of our approach with respect to retargetability. We discuss capabilities and limitations of MSSQ, and identify possible areas of improvement.1998-07-02T00:00:00ZInterface Synthesis for Embedded Applications in a Codesign EnvironmentBasu, AnupamMarwedel, PeterMitra, Raj S.http://hdl.handle.net/2003/27612015-08-13T00:09:57Z1998-07-02T00:00:00ZTitle: Interface Synthesis for Embedded Applications in a Codesign Environment
Authors: Basu, Anupam; Marwedel, Peter; Mitra, Raj S.
Abstract: In embedded systems, programmable peripherals are often coupled with the main programmable processor to achieve the desired functionality. Interfacing such peripherals with the processor qualifies as an important task of hardware/software codesign. In this paper, three important aspects of such interfacing, namely the allocation of addresses to the devices, the allocation of device drivers, and approaches to handling events and transitions, are discussed. The proposed approaches have been incorporated in a codesign system, MICKEY. The paper includes a number of examples, taken from the results synthesized by MICKEY, to illustrate the ideas.1998-07-02T00:00:00ZRegister-Constrained Address Computation in DSP ProgramsBasu, AnupamLeupers, RainerMarwedel, Peterhttp://hdl.handle.net/2003/27602015-08-12T18:04:07Z1998-07-02T00:00:00ZTitle: Register-Constrained Address Computation in DSP Programs
Authors: Basu, Anupam; Leupers, Rainer; Marwedel, Peter
Abstract: This paper describes a new code optimization technique for digital signal processors (DSPs). One important characteristic of DSP algorithms is iterative access to data array elements within loops. DSPs support efficient address computations for such array accesses by means of dedicated address generation units (AGUs). We present a heuristic technique which, given an AGU with a fixed number of address registers, minimizes the number of instructions needed for array address computations in a program loop.1998-07-02T00:00:00Z
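A sketch of the underlying cost model with a greedy assignment for illustration (not the paper's heuristic): with auto-increment/decrement AGUs, a step of +/-1 between consecutive accesses is free, while any other step costs one explicit address instruction:

def assign_accesses(access_seq, n_regs):
    # greedily map each array-access offset to one of n_regs address registers
    last = [None] * n_regs          # last offset held by each register
    cost = 0
    for off in access_seq:
        def step_cost(r):
            # free if the register is fresh or can auto-inc/dec to `off`
            return 0 if last[r] is None or abs(off - last[r]) <= 1 else 1
        r = min(range(n_regs), key=step_cost)
        cost += step_cost(r)
        last[r] = off
    return cost

# e.g. assign_accesses([0, 1, 5, 2, 6, 3], n_regs=2) -> 0 extra instructions,
# while a single register would need several explicit address loads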
Processor-Core Based Design and TestMarwedel, Peterhttp://hdl.handle.net/2003/27582015-08-12T19:07:00Z1998-07-04T00:00:00ZTitle: Processor-Core Based Design and Test
Authors: Marwedel, Peter
Abstract: This tutorial responds to the rapidly increasing use of various cores for implementing systems-on-a-chip. It specifically focusses on processor cores. We will give some examples of cores, including DSP cores and application-specific instruction-set processors (ASIPs). We will mention market trends for these components, and we will touch on design procedures, in particular the use of compilers. Finally, we will discuss the problem of testing core-based designs. Existing solutions include boundary scan, embedded in-circuit emulation (ICE), the use of processor resources for stimuli/response compaction, and self-test programs.1998-07-04T00:00:00ZAn Algorithm for Hardware/Software Partitioning Using Mixed Integer LinearMarwedel, PeterNiemann, Ralfhttp://hdl.handle.net/2003/27592015-08-12T19:06:40Z1998-07-04T00:00:00ZTitle: An Algorithm for Hardware/Software Partitioning Using Mixed Integer Linear
Authors: Marwedel, Peter; Niemann, Ralf
Abstract: One of the key problems in hardware/software codesign is hardware/software partitioning. This paper describes a new approach to hardware/software partitioning using integer programming (IP). The advantage of using IP is that optimal results are calculated for a chosen objective function. The partitioning approach works fully automatically and supports multi-processor systems, interfacing and hardware sharing. In contrast to other approaches, where special estimators are used, we use compilation and synthesis tools for cost estimation. The increased time for calculating values for the cost metrics is compensated by an improved quality of the values; therefore, fewer iteration steps for partitioning are needed. The paper presents an algorithm using integer programming for solving the hardware/software partitioning problem, leading to promising results.1998-07-04T00:00:00Z
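For illustration, a brute-force counterpart of such an IP formulation: choose hardware or software per task so that a deadline is met at minimal hardware cost. The task numbers below are invented; a real flow would hand the same objective to an (M)ILP solver instead:

from itertools import product

# (sw_time, hw_time, hw_area) per task -- hypothetical estimates obtained
# from compilation and synthesis tools
TASKS = [(10, 2, 5), (8, 3, 4), (6, 1, 7), (12, 4, 6)]
DEADLINE = 18

def partition(tasks, deadline):
    best = None
    for choice in product((0, 1), repeat=len(tasks)):   # 1 = hardware
        time = sum(hw if c else sw for c, (sw, hw, _) in zip(choice, tasks))
        area = sum(a for c, (_, _, a) in zip(choice, tasks) if c)
        if time <= deadline and (best is None or area < best[0]):
            best = (area, choice)
    return best   # (minimal area, mapping) or None if infeasible

print(partition(TASKS, DEADLINE))   # (15, (1, 1, 0, 1))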
Compilers for Embedded ProcessorsMarwedel, Peterhttp://hdl.handle.net/2003/27572015-08-12T20:20:17Z1998-07-04T00:00:00ZTitle: Compilers for Embedded Processors
Authors: Marwedel, Peter
Abstract: This talk responds to the rapidly increasing use of embedded processors for implementing systems. Such processors come in the form of discrete processors as well as in the form of core processors. They are available both from vendors and within system companies. Applications can be found in most segments of the embedded system market, such as automotive electronics and telecommunications. These applications demand extremely efficient processor architectures, optimized for a certain application domain or even a certain application. Current compiler technology supports these architectures very poorly and has recently been recognized as a major bottleneck for designing systems quickly, efficiently and reliably. A number of recent research projects aim at removing this bottleneck. The talk will briefly discuss the trend towards embedded processors. We will show market trends and examples of recent embedded processors. We will also introduce the terms "application-specific instruction-set processors" (ASIPs), "application-specific signal processors" (ASSPs), "soft cores" and "hard cores". We will then present new code optimization approaches taking the special characteristics of embedded processor architectures into account. In particular, we will present new memory allocation and code compaction algorithms. In the final section of the talk, we will present techniques for retargeting compilers to new architectures easily. These techniques are motivated by the need for domain- or application-dependent optimizations of processor architectures. The scope for such optimizations should not be restricted to hardware architectures but has to include the corresponding work on compilers as well. We will show how compilers can be generated from descriptions of processor architectures. Presented techniques aim at bridging the gap between electronic CAD and compiler generation.1998-07-04T00:00:00ZCode Generation for Core ProcessorsMarwedel, Peterhttp://hdl.handle.net/2003/27562015-08-12T19:06:55Z1998-07-04T00:00:00ZTitle: Code Generation for Core Processors
Authors: Marwedel, Peter
Abstract: This tutorial responds to the rapidly increasing use of cores in general, and of processor cores in particular, for implementing systems-on-a-chip. In the first part of this text, we will provide a brief introduction to various cores. Applications can be found in most segments of the embedded systems market. These applications demand extreme efficiency, and in particular efficient processor architectures and efficient embedded software. In the second part of this text, we will show that current compilers do not provide the required efficiency, and we will give an overview of new compiler optimization techniques which aim at making assembly language programming for embedded software obsolete. These new techniques take advantage of the special characteristics of embedded software and embedded architectures. Due to efficiency considerations, processor architectures optimized for application domains or even for particular applications are of interest. This results in a large number of architectures and instruction sets, leading to the requirement of retargeting compilers to those numerous architectures. In the final section of the tutorial, we will present techniques for retargeting compilers to new architectures easily. We will show how compilers can be generated from descriptions of processors. One of the approaches closes the gap which so far existed between electronic CAD and compiler generation.1998-07-04T00:00:00ZIntroducing Complex Components into Architectural SynthesisDömer, RainerLandwehr, BirgerMarwedel, Peterhttp://hdl.handle.net/2003/27552015-08-12T20:20:15Z1998-07-04T00:00:00ZTitle: Introducing Complex Components into Architectural Synthesis
Authors: Dömer, Rainer; Landwehr, Birger; Marwedel, Peter
Abstract: In this paper, we extend the set of library components which are usually considered in architectural synthesis by components with built-in chaining. For such components, the result of some internally computed arithmetic function is made available as an argument to some other function through a local connection. These components can be used to implement chaining in a data-path within a single component. Components with built-in chaining are combinational circuits; they correspond to "complex gates" in logic synthesis. Compared to implementations with several components, components with built-in chaining usually provide a denser layout, reduced power consumption, and a shorter delay time. Multiplier/accumulators are the most prominent example of such components. Such components require new approaches for library mapping in architectural synthesis. In this paper, we describe an IP-based approach taken in our OSCAR synthesis system.1998-07-04T00:00:00Z
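A small sketch of library mapping with a built-in-chaining component: covering an expression tree so that an addition fed by a multiplication becomes a single multiplier/accumulator (MAC). The cost figures are invented:

COST = {"add": 2, "mul": 5, "mac": 6}   # MAC cheaper than mul+add (7)

def cover(node):
    # node: ("op", left, right) tuple or a leaf string; returns minimal cost
    if isinstance(node, str):
        return 0
    op, left, right = node
    plain = COST[op] + cover(left) + cover(right)
    # pattern a + (b * c): absorb the multiply into a MAC
    if op == "add":
        for mul, other in ((left, right), (right, left)):
            if isinstance(mul, tuple) and mul[0] == "mul":
                plain = min(plain,
                            COST["mac"] + cover(mul[1]) + cover(mul[2])
                            + cover(other))
    return plain

expr = ("add", "a", ("mul", "b", "c"))   # a + b*c
print(cover(expr))   # 6 with a MAC instead of 7 with separate add and mul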
Time-Constrained Code Compaction for DSPsLeupers, RainerMarwedel, Peterhttp://hdl.handle.net/2003/27542015-08-12T20:20:08Z1998-07-04T00:00:00ZTitle: Time-Constrained Code Compaction for DSPs
Authors: Leupers, Rainer; Marwedel, Peter
Abstract: This paper addresses instruction-level parallelism in code generation for DSPs. In the presence of potential parallelism, the task of code generation includes code compaction, which parallelizes primitive processor operations under given dependency and resource constraints. Furthermore, DSP algorithms are in most cases required to guarantee real-time response. Since the exact execution speed of a DSP program is only known after compaction, real-time constraints should be taken into account during the compaction phase. While previous DSP code generators rely on rigid heuristics for compaction, we propose a novel approach to exact local code compaction based on an Integer Programming model which handles time constraints. Due to a general problem formulation, the IP model also captures encoding restrictions and handles instructions having alternative encodings and side effects, and therefore applies to a large class of instruction formats. Capabilities and limitations of our approach are discussed for different DSPs.1998-07-04T00:00:00Z
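A greedy list-scheduling stand-in (illustration only; the paper's point is exact IP-based compaction) that packs dependent micro-operations into instruction words under resource constraints and then checks the time constraint. Resource classes and slot counts are assumptions:

def compact(ops, deps, slots_per_word, deadline):
    # ops: {name: resource_class}; deps: set of (before, after) pairs
    scheduled, words = set(), []
    while len(scheduled) < len(ops):
        used, word = {}, []
        for op in sorted(o for o in ops if o not in scheduled):
            ready = all(b in scheduled for b, a in deps if a == op)
            res = ops[op]
            if ready and used.get(res, 0) < slots_per_word.get(res, 1):
                word.append(op)
                used[res] = used.get(res, 0) + 1
        if not word:
            raise ValueError("cyclic dependencies")
        scheduled.update(word)
        words.append(word)
    if len(words) > deadline:
        raise ValueError("time constraint violated; exact compaction needed")
    return words

print(compact({"ld1": "mem", "ld2": "mem", "mul": "alu", "st": "mem",
               "dec": "alu"},
              deps={("ld1", "mul"), ("ld2", "mul"), ("mul", "st")},
              slots_per_word={"mem": 1, "alu": 1}, deadline=4))
# -> [['dec', 'ld1'], ['ld2'], ['mul'], ['st']]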
Retargetable Generation of Code Selectors from HDL Processor ModelsLeupers, RainerMarwedel, Peterhttp://hdl.handle.net/2003/27532015-08-12T19:06:44Z1998-07-04T00:00:00ZTitle: Retargetable Generation of Code Selectors from HDL Processor Models
Authors: Leupers, Rainer; Marwedel, Peter
Abstract: Besides high code quality, a primary issue in embedded code generation is the retargetability of code generators. This paper presents techniques for the automatic generation of code selectors from externally specified processor models. In contrast to previous work, our retargetable compiler RECORD does not require tool-specific modelling formalisms, but starts from general HDL processor models. From an HDL model, all processor aspects needed for code generation are derived automatically. As demonstrated by experimental results, short turnaround times for retargeting are achieved, which permits studying the HW/SW trade-off between processor architectures and program execution speed.1998-07-04T00:00:00ZRetargetable Compilers for Embedded DSPsLeupers, RainerMarwedel, Peterhttp://hdl.handle.net/2003/27522015-08-12T20:20:13Z1998-07-04T00:00:00ZTitle: Retargetable Compilers for Embedded DSPs
Authors: Leupers, Rainer; Marwedel, Peter
Abstract: Programmable devices are a key technology for the design of embedded systems, such as in the consumer electronics market. Processor cores are used as building blocks for more and more embedded system designs, since they provide a unique combination of features: flexibility and reusability. Processor-based design implies that compilers capable of generating efficient machine code are necessary. However, highly efficient compilers for embedded processors are hardly available. In particular, this holds for digital signal processors (DSPs). This contribution is intended to outline different aspects of DSP compiler technology. First, we cover demands on compilers for embedded DSPs, which are partially in sharp contrast to traditional compiler construction. Secondly, we present recent advances in DSP code optimization techniques, which explore a comparatively large search space in order to achieve high code quality. Finally, we discuss the different approaches to retargetability of compilers, that is, techniques for automatic generation of compilers from processor models.1998-07-04T00:00:00ZOptimierende Compiler für DSPs: Was ist verfügbar?Leupers, RainerMarwedel, Peterhttp://hdl.handle.net/2003/27512015-08-12T19:06:26Z1998-07-04T00:00:00ZTitle: Optimierende Compiler für DSPs: Was ist verfügbar?
Authors: Leupers, Rainer; Marwedel, Peter
Abstract: Software development for embedded processors today still takes place largely at the assembly level. The reason for this state of affairs, untenable in the long run, is the poor availability of good C compilers. In recent years, however, substantial progress has been made in code optimization, especially for DSPs, which so far has found its way into commercial products only insufficiently. This contribution identifies the principal sources of optimization and summarizes the state of the art. The central methods are complex optimization techniques that go beyond traditional compiler technology, as well as the exploitation of DSP-specific hardware architectures for efficiently translating C language constructs into DSP machine instructions. Some of these techniques can also be applied in general to assembly programs, whether compiler-generated or hand-written.1998-07-04T00:00:00Z