Design Automation for Embedded Systems
As the name suggests, the DAES group focuses on the design automation of embedded systems: the group works on methodologies and tools that automate or support the design of embedded systems. Design automation aims at reducing the cost of designing embedded systems and at improving properties of the resulting designs (e.g., energy consumption or timing predictability). The focus is on building tool chains for designing embedded software. Such tool chains include compilers, optimizers, and tools that map applications to execution platforms.
Recent Submissions
Item: Parallel path progression DAG scheduling (2023-05-25)
Ueter, Niklas; Günzel, Mario; Brüggen, Georg von der; Chen, Jian-Jia
The increasing performance needs of modern cyber-physical systems lead to multiprocessor architectures being increasingly utilized. To efficiently exploit their potential parallelism in hard real-time systems, appropriate task models and scheduling algorithms that make it possible to provide timing guarantees are required. Such scheduling algorithms and the corresponding worst-case response time analyses usually suffer from resource over-provisioning due to pessimistic analyses based on worst-case assumptions. Hence, scheduling algorithms and analyses with high resource efficiency are required. A prominent fine-grained parallel task model is the directed-acyclic-graph (DAG) task model, which is composed of precedence-constrained subjobs. This paper studies the hierarchical real-time scheduling problem of sporadic arbitrary-deadline DAG tasks. We propose a parallel path progression scheduling property, implemented with only two distinct subtask priorities, which makes it possible to quantify the parallel execution of a user-chosen collection of complete paths in the response time analysis. This novel approach significantly improves the state-of-the-art response time analyses for parallel DAG tasks with highly parallel DAG structures and can provably exhaust large core counts. Two hierarchical scheduling algorithms are designed based on this property; they extend the parallel path progression property and improve the response time analysis for sporadic arbitrary-deadline DAG task sets.
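The scheduling property above rests on choosing a collection of complete paths through the DAG whose progression is then prioritized via two priority levels. As a rough illustration of that ingredient only (not the paper's algorithm or its response time analysis), the following sketch greedily covers a DAG with vertex-disjoint paths and marks a chosen subset as high priority; the function names and the greedy rule are our own assumptions.

```python
from collections import defaultdict

def greedy_path_cover(vertices, edges):
    """Greedily cover a DAG with vertex-disjoint paths (toy heuristic)."""
    succ, indeg = defaultdict(list), defaultdict(int)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    covered, paths = set(), []
    for v in vertices:                     # grow a path from every source
        if v in covered or indeg[v] > 0:
            continue
        path, cur = [v], v
        covered.add(v)
        while True:
            free = [w for w in succ[cur] if w not in covered]
            if not free:
                break
            cur = free[0]                  # follow the first free successor
            path.append(cur)
            covered.add(cur)
        paths.append(path)
    for v in vertices:                     # leftovers become singleton paths
        if v not in covered:
            paths.append([v])
    return paths

edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
paths = greedy_path_cover([1, 2, 3, 4], edges)
chosen = {v for p in paths[:1] for v in p}          # user-chosen collection
priority = {v: "high" if v in chosen else "low" for v in [1, 2, 3, 4]}
print(paths, priority)
```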
Item: Software exploitation of traditional interfaces for modern technologies (2024)
Hakert, Christian; Chen, Jian-Jia; Castrillon, Jeronimo
Modern computer technologies are advancing to spheres that seemed unimaginable only years ago. Quantum-effect petabyte-sized storage devices or deep cache hierarchies acting within nanoseconds are only a few examples. At the same time, the interfaces for communicating with such technologies are well established and remain largely unaffected by this development. While loading and storing a word at a given memory address was the standard interface for communicating with memory devices in the very early stages of computer systems, it still has a similar shape today. Unsurprisingly, modern computing technologies come with an increasing demand for management, such as lifetime management for Non-Volatile Memory (NVM) or prefetching and eviction strategies for cache hierarchies. Leaving this management solely to the hardware offers only a limited design space and little room for optimization. Consequently, software has to find ways that allow either direct or indirect management of these technologies over the traditional interfaces. This dissertation picks up this need and studies selected modern technologies and their management requirements. It presents methods that systematically exploit existing traditional interfaces in order to provide extended functionality for the management of modern technologies. The exploitations in this thesis are conducted solely at the software level and do not require any changes to the available hardware. The first part targets memory technologies, in particular Non-Volatile Memory (NVM). The thesis discusses the lifetime issue of these technologies and the resulting need for wear leveling. Various approaches are introduced that allow different forms of wear leveling at different levels of the software stack, ranging from wear-leveling procedures inside the operating system and system software to direct application integration for extending the memory lifetime. Apart from the lifetime issue, the latency and energy properties of a specific type of emerging memory, namely Racetrack Memory (RTM), are considered. Dedicated to the application of Random Forest (RF) models, the access properties are optimized directly at the application level. The last part of the thesis moves the focus from memories to arithmetic computation. RF models are kept as the target application and their execution on modern computation technologies is considered. The use of floating-point numbers is a major focus, and the memory behavior of floating-point numbers is optimized. By proposing alternative computation schemes for floating-point numbers, which are realized entirely in software and leave the hardware untouched, significant performance improvements are gained.

Item: Unlocking efficiency in BNNs: global by local thresholding for analog-based HW accelerators (2023-09-14)
Yayla, Mikail; Frustaci, Fabio; Spagnolo, Fanny; Chen, Jian-Jia; Amrouch, Hussam
For accelerating Binarized Neural Networks (BNNs), analog computing-based crossbar accelerators, utilizing XNOR gates and additional interface circuits, have been proposed. Such accelerators demand a large number of analog-to-digital converters (ADCs) and registers, resulting in expensive designs. To increase the inference efficiency, the state of the art divides the interface circuit into an Analog Path (AP), utilizing (cheap) analog comparators, and a Digital Path (DP), utilizing (expensive) ADCs and registers. During BNN execution, one of the two paths is selectively triggered. Ideally, as inference via the AP is more efficient, it should be triggered as often as possible. However, we reveal that, unless the number of weights is very small, the AP is rarely triggered. To overcome this, we propose a novel BNN inference scheme, called Local Thresholding Approximation (LTA), which approximates the global thresholdings in BNNs by local thresholdings. This enables the use of the AP through most of the execution, which significantly increases the interface circuit efficiency. In our evaluations with two BNN architectures, using LTA reduces the area by 42x and 54x, the energy by 2.7x and 4.2x, and the latency by 3.8x and 1.15x, compared to state-of-the-art crossbar-based BNN accelerators.
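To make the idea of replacing one global thresholding by local thresholdings concrete, here is a minimal functional sketch. The block split, the scaled local thresholds, and the majority vote are our own simplifying assumptions for illustration, not the exact LTA scheme or its hardware mapping.

```python
import numpy as np

def bnn_neuron_global(x, w, theta):
    """Reference BNN neuron: XNOR-popcount against one global threshold."""
    match = np.sum(x == w)              # popcount of XNOR(x, w)
    return 1 if match >= theta else 0

def bnn_neuron_local(x, w, theta, blocks=8):
    """Local-thresholding approximation (illustrative): each block votes
    against a scaled local threshold and the votes are majority-combined,
    so only cheap per-block comparators are needed."""
    xs, ws = np.array_split(x, blocks), np.array_split(w, blocks)
    local_theta = theta / blocks
    votes = sum(1 for xb, wb in zip(xs, ws) if np.sum(xb == wb) >= local_theta)
    return 1 if votes >= (blocks + 1) // 2 else 0

rng = np.random.default_rng(0)
x, w = rng.integers(0, 2, 256), rng.integers(0, 2, 256)
print(bnn_neuron_global(x, w, theta=128), bnn_neuron_local(x, w, theta=128))
```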
Item: Probabilistic reaction time analysis (2023-09-09)
Günzel, Mario; Ueter, Niklas; Chen, Kuan-Hsun; Brüggen, Georg von der; Chen, Jian-Jia
In many embedded systems, for instance in the automotive, avionic, or robotics domain, critical functionalities are implemented via chains of communicating recurrent tasks. To ensure the safety and correctness of such systems, guarantees on the reaction time, that is, the delay between a cause (e.g., an external activity or the reading of a sensor) and the corresponding effect, must be provided. Current approaches focus on the maximum reaction time, considering the worst-case system behavior. However, in many scenarios, probabilistic guarantees on the reaction time are sufficient: it suffices to guarantee that the reaction does not exceed a certain threshold with (at least) a certain probability. This work provides such probabilistic guarantees on the reaction time, considering two types of randomness: response time randomness and failure probabilities. To the best of our knowledge, this is the first work that defines and analyzes the probabilistic reaction time for cause-effect chains based on sporadic tasks.
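The flavor of such a probabilistic guarantee can be illustrated with a toy Monte Carlo model. The paper derives analytical guarantees; the sketch below only estimates P(reaction time <= bound) empirically, under an assumed per-stage model (uniform release offsets, sampled response times, and a per-job failure probability) that is ours, not the paper's.

```python
import random

def simulate_reaction(chain, trials=100_000):
    """Toy cause-effect-chain model: per stage, data waits for the next job
    release, a job may fail with fail_prob (the data is then picked up one
    period later), and completion adds a sampled response time."""
    samples = []
    for _ in range(trials):
        t = 0.0
        for period, sample_response, fail_prob in chain:
            t += random.uniform(0.0, period)      # wait for next release
            while random.random() < fail_prob:    # failed job: retry next period
                t += period
            t += sample_response()                # randomized response time
        samples.append(t)
    return samples

chain = [(10.0, lambda: random.uniform(1.0, 3.0), 0.01),
         (5.0,  lambda: random.uniform(0.5, 2.0), 0.00)]
samples = simulate_reaction(chain)
bound = 20.0
print(sum(s <= bound for s in samples) / len(samples))  # empirical guarantee
```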
Item: A vision for edge AI (2024)
Yayla, Mikail; Chen, Jian-Jia; Teich, Jürgen
Edge Artificial Intelligence is progressively pervading all aspects of our life. However, to perform complex tasks, a massive number of matrix multiplications needs to be computed. At the same time, the available hardware resources for computation are highly limited. This pressing need for efficiency serves as the motivation for this dissertation, in which we propose a vision for highly resource-constrained future intelligent systems composed of robust Binarized Neural Networks operating with approximate memory and approximate computing units, while remaining trainable on the edge.

Item: Analyses and optimizations of timing-constrained embedded systems considering resource synchronization and machine learning approaches (2023)
Shi, Junjie; Chen, Jian-Jia; Biondi, Alessandro
Nowadays, embedded systems have become ubiquitous, powering a vast array of applications from consumer electronics to industrial automation. Concurrently, statistical and machine learning algorithms are being increasingly adopted across various application domains, such as medical diagnosis, autonomous driving, and environmental analysis, offering sophisticated data analysis and decision-making capabilities. As the demand for intelligent and time-sensitive applications continues to surge, accompanied by growing concerns regarding data privacy, the deployment of machine learning models on embedded devices has emerged as an indispensable requirement. However, this integration introduces both significant opportunities for performance enhancement and complex challenges in deployment optimization. On the one hand, deploying machine learning models on embedded systems with limited computational capacity, power budgets, and stringent timing requirements necessitates additional adjustments to ensure optimal performance and meet the imposed timing constraints. On the other hand, the inherent capabilities of machine learning, such as self-adaptation during runtime, prove invaluable in addressing challenges encountered in embedded systems, aiding in optimization and decision-making processes. This dissertation introduces two primary contributions to the analysis and optimization of timing-constrained embedded systems. First, it addresses the relatively long access times required for the shared resources of machine learning tasks. Second, it considers the limited communication resources and data privacy concerns in distributed embedded systems when deploying machine learning models. Additionally, this work provides a use case that employs a machine learning method to tackle challenges specific to embedded systems. By addressing these key aspects, this dissertation contributes to the analysis and optimization of timing-constrained embedded systems, considering resource synchronization and machine learning models to enable improved performance and efficiency in real-time applications with stringent constraints.

Item: Complex scheduling models and analyses for property-based real-time embedded systems (2023)
Ueter, Niklas; Chen, Jian-Jia; Li, Jing
Modern multicore architectures and parallel applications pose a significant challenge to worst-case-centric real-time system verification and design efforts. The involved model and parameter uncertainty contest the fidelity of formal real-time analyses, which are mostly based on exact model assumptions. In this dissertation, various approaches that can accept parameter and model uncertainty are presented. In an attempt to improve predictability in worst-case-centric analyses, timing-predictable protocols are explored for parallel task scheduling on multiprocessors and for network-on-chip arbitration. A novel scheduling algorithm for gang tasks on multiprocessors, called stationary rigid gang scheduling, is proposed. With regard to fixed-priority wormhole-switched networks-on-chip, a more restrictive family of transmission protocols with predictability-enhancing properties, called simultaneous progression switching protocols, is proposed. Moreover, hierarchical scheduling for parallel DAG tasks under parameter uncertainty is studied to achieve temporal and spatial isolation. Fault tolerance is examined as a supplementary reliability aspect of real-time systems in the presence of dynamic external causes of faults. Using various job variants, which trade off increased execution time demand against increased error protection, a state-based policy selection strategy is proposed that provably assures an acceptable quality of service (QoS). Lastly, the temporal misalignment of sensor data in sensor fusion applications of cyber-physical systems is examined. A modular analysis based on minimal properties is proposed to obtain an upper bound on the maximal sensor data time-stamp difference.
Item: Memory carousel: LLVM-based bitwise wear leveling for nonvolatile main memory (2022-12-14)
Hölscher, Nils; Hakert, Christian; Nassar, Hassan; Chen, Kuan-Hsun; Bauer, Lars; Chen, Jian-Jia; Henkel, Jörg
Emerging nonvolatile memories yield, alongside many advantages, technical shortcomings such as reduced cell lifetime. Although many wear-leveling approaches exist to extend the lifetime of such memories, usually a tradeoff in the granularity of wear leveling has to be made. Due to iterative write schemes (repeatedly sense and write), memory wear in certain systems depends directly on the written bit value and thus can be highly imbalanced, requiring dedicated bit-wise wear leveling. So far, such bit-wise wear leveling has only been proposed together with special hardware support. However, if no dedicated hardware solutions are available, especially for commercial off-the-shelf systems with nonvolatile memories, a software solution can be crucial for the system lifetime. In this work, we propose entirely software-based bit-wise wear leveling, where the position of bits within CPU words in main memory is rotated on a regular basis. We leverage the LLVM intermediate representation to adjust the load and store operations of the application with a custom compiler pass. Experimental evaluation shows that, by applying local rotation within the CPU word, the lifetime can be extended by a factor of up to 21x. We also show that our method can be combined with coarser-grained wear leveling, e.g., at block granularity, and helps to achieve higher lifetime improvements.
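The core mechanism, rotating bit positions within a CPU word so that a frequently toggled bit position is spread over all cells of the word across epochs, can be pictured with a small model. The paper realizes this as an LLVM compiler pass rewriting loads and stores; the Python sketch below only shows the rotation bookkeeping (a real scheme must also migrate already-stored words when the rotation offset advances, which is omitted here).

```python
WORD_BITS = 64
MASK = (1 << WORD_BITS) - 1

def rotl(x, r):
    """Rotate a WORD_BITS-wide value left by r positions."""
    r %= WORD_BITS
    return ((x << r) | (x >> (WORD_BITS - r))) & MASK if r else x

def store(mem, addr, value, epoch):
    """Store with bit positions rotated by the current epoch."""
    mem[addr] = rotl(value, epoch)

def load(mem, addr, epoch):
    """Invert the rotation applied when the word was stored."""
    return rotl(mem[addr], WORD_BITS - (epoch % WORD_BITS))

mem = {}
store(mem, 0x1000, 0xFF, epoch=8)      # hot low byte lands in cells 8..15
assert load(mem, 0x1000, epoch=8) == 0xFF
```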
Item: Special issue on practical and robust design of real-time systems (2022-08-20)
Chen, Jian-Jia; Shrivastava, Aviral

Item: MODES: model-based optimization on distributed embedded systems (2021-06-04)
Shi, Junjie; Bian, Jiang; Richter, Jakob; Chen, Kuan-Hsun; Rahnenführer, Jörg; Xiong, Haoyi; Chen, Jian-Jia
The predictive performance of a machine learning model highly depends on the corresponding hyper-parameter setting. Hence, hyper-parameter tuning is often indispensable. Normally, such tuning requires the dedicated machine learning model to be trained and evaluated on centralized data to obtain a performance estimate. However, in a distributed machine learning scenario, it is not always possible to collect all the data from all nodes due to privacy concerns or storage limitations. Moreover, if data has to be transferred through low-bandwidth connections, the time available for tuning is reduced. Model-Based Optimization (MBO) is a state-of-the-art method for tuning hyper-parameters, but its application to distributed machine learning models and federated learning lacks research. This work proposes MODES, a framework that allows deploying MBO on resource-constrained distributed embedded systems. Each node trains an individual model based on its local data, and the goal is to optimize the combined prediction accuracy. The framework offers two optimization modes: (1) MODES-B considers the whole ensemble as a single black box and optimizes the hyper-parameters of each individual model jointly, and (2) MODES-I considers all models as clones of the same black box, which makes it possible to efficiently parallelize the optimization in a distributed setting. We evaluate MODES by conducting experiments on the optimization of the hyper-parameters of a random forest and a multi-layer perceptron. The experimental results demonstrate that, with improvements in terms of mean accuracy (MODES-B), run-time efficiency (MODES-I), and statistical stability for both modes, MODES outperforms the baseline, i.e., carrying out tuning with MBO on each node individually with its local sub-data set.
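The difference between the two modes is essentially a difference in search-space structure, which the sketch below tries to make tangible. Random sampling stands in for MBO's surrogate-guided proposals purely for brevity, and the objective is a synthetic stand-in for training on node-local data; SPACE, node_accuracy, and the budget are all illustrative assumptions.

```python
import random

SPACE = {"n_trees": [10, 50, 100], "max_depth": [4, 8, 16]}

def node_accuracy(cfg, node):
    """Synthetic stand-in for training/evaluating one node's local model."""
    rnd = random.Random(hash((node, cfg["n_trees"], cfg["max_depth"])))
    return 0.80 + 0.05 * rnd.random()

def ensemble_accuracy(cfgs):
    return sum(node_accuracy(c, i) for i, c in enumerate(cfgs)) / len(cfgs)

def modes_b(n_nodes, budget):
    """MODES-B style: one black box over the joint space, one config per node."""
    def draw():
        return [{k: random.choice(v) for k, v in SPACE.items()}
                for _ in range(n_nodes)]
    return max((draw() for _ in range(budget)), key=ensemble_accuracy)

def modes_i(n_nodes, budget):
    """MODES-I style: all nodes are clones, so a single shared config is tuned."""
    def draw():
        return {k: random.choice(v) for k, v in SPACE.items()}
    best = max((draw() for _ in range(budget)),
               key=lambda c: ensemble_accuracy([c] * n_nodes))
    return [best] * n_nodes

print(modes_b(4, budget=50))   # 4 independent configs, a much larger space
print(modes_i(4, budget=50))   # 1 shared config, cheap to parallelize
```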
Item: Correspondence article: counterexample for suspension-aware schedulability analysis of EDF scheduling (2020-08-18)
Günzel, Mario; Chen, Jian-Jia

Item: A note on slack enforcement mechanisms for self-suspending tasks (2021-01-27)
Günzel, Mario; Chen, Jian-Jia
This paper provides counterexamples for the slack enforcement mechanisms for handling segmented self-suspending real-time tasks proposed by Lakshmanan and Rajkumar (Proceedings of the Real-Time and Embedded Technology and Applications Symposium (RTAS), pp. 3–12, 2010).

Item: Nanoparticle classification using frequency domain analysis on resource-limited platforms (2019-09-24)
Yayla, Mikail; Toma, Anas; Chen, Kuan-Hsun; Lenssen, Jan Eric; Shpacovitch, Victoria; Hergenröder, Roland; Weichert, Frank; Chen, Jian-Jia
A mobile system that can detect viruses in real time is urgently needed, due to the combination of virus emergence and evolution with increasing global travel and transport. A biosensor called PAMONO (for Plasmon Assisted Microscopy of Nano-sized Objects) represents a viable technology for the mobile real-time detection of viruses and virus-like particles. It could be used for fast and reliable diagnoses in hospitals, airports, the open air, or other settings. For the analysis of the images provided by the sensor, state-of-the-art methods based on convolutional neural networks (CNNs) can achieve high accuracy. However, such computationally intensive methods may not be suitable for most mobile systems. In this work, we propose nanoparticle classification approaches based on frequency domain analysis, which are less resource-intensive. We observe that on average the classification takes 29 µs per image for the Fourier features and 17 µs for the Haar wavelet features. Although the CNN-based method scores 1–2.5 percentage points higher in classification accuracy, it takes 3370 µs per image on the same platform. With these results, we identify and explore the trade-off between resource efficiency and classification performance for the nanoparticle classification of images provided by the PAMONO sensor.
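A minimal sketch of the frequency-domain feature idea follows; the exact feature set, patch size, and downstream classifier are our assumptions (the paper also evaluates Haar wavelet features, omitted here).

```python
import numpy as np

def fourier_features(patch, k=8):
    """Magnitudes of the k x k lowest spatial frequencies of an image patch:
    a small, fixed-size descriptor that is cheap to compute and classify."""
    spectrum = np.abs(np.fft.rfft2(patch))
    return spectrum[:k, :k].ravel()

patch = np.random.rand(32, 32)         # stand-in for a PAMONO image patch
feats = fourier_features(patch)
print(feats.shape)                     # (64,) -> feed any small classifier
```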
Item: Realistic scheduling models and analyses for advanced real-time embedded systems (2019)
Brüggen, Georg von der; Chen, Jian-Jia; Davis, Robert I.
Focusing on real-time scheduling theory, the thesis demonstrates how essential realistic scheduling models and analyses are when guaranteeing timing correctness without over-provisioning the necessary system resources. It details potential pitfalls of the de facto standards for the theoretical examination of scheduling algorithms and schedulability tests, namely resource augmentation bounds and utilization bounds, and proposes parametric augmentation functions to improve their meaningfulness. Considering uncertain execution behaviour, systems with dynamic real-time guarantees are introduced to model this scenario more realistically than mixed-criticality systems, and the first technique is provided that makes it possible to precisely calculate the worst-case deadline failure probability for task sets with a realistic number of tasks. Furthermore, hybrid self-suspension models are proposed that bridge the gap between the over-flexible dynamic and the over-restrictive segmented self-suspension model, with different tradeoffs between accuracy and flexibility.
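A standard building block behind deadline failure probabilities is the convolution of discrete execution-time distributions, sketched below. This is a textbook step under an independence assumption, not the thesis' full (and more efficient) technique.

```python
from collections import defaultdict

def convolve(d1, d2):
    """Convolution of two discrete execution-time distributions
    given as {execution_time: probability}."""
    out = defaultdict(float)
    for c1, p1 in d1.items():
        for c2, p2 in d2.items():
            out[c1 + c2] += p1 * p2
    return dict(out)

def deadline_failure_probability(jobs, deadline):
    """P(total demand of independent jobs exceeds the deadline)."""
    total = {0: 1.0}
    for dist in jobs:
        total = convolve(total, dist)
    return sum(p for c, p in total.items() if c > deadline)

# Two jobs, each short with prob. 0.9 and long (e.g., after a fault) with 0.1:
job = {2: 0.9, 5: 0.1}
print(deadline_failure_probability([job, job], deadline=8))  # 0.01
```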
Item: Optimization and analysis for dependable application software on unreliable hardware platforms (2019)
Chen, Kuan-Hsun; Chen, Jian-Jia; Ernst, Rolf
As chip technology keeps shrinking towards higher densities and lower operating voltages, memory and logic components are now vulnerable to electromagnetic interference and radiation, leading to transient faults in the underlying hardware, which may jeopardize the correctness of software execution and cause so-called soft errors. To mitigate the threat of soft errors, embedded-software developers have started to deploy Software-Implemented Hardware Fault Tolerance (SIHFT) techniques. However, their main cost is the significant amount of additional computation time they incur. To support safety-critical systems, e.g., computing systems in automotive and avionic devices, real-time system technology has been primarily used and widely studied. When considering hardware transient faults and SIHFT techniques together with real-time system technology, novel scheduling approaches and schedulability analyses are desired to provide a less pessimistic off-line guarantee for timeliness, or at least a certain degree of performance for new application models. Moreover, reliability optimizations also need to be designed thoughtfully while considering different resource constraints. In this dissertation, we present three treatments for soft errors. Firstly, we study how to allow erroneous computations without deadline misses by modeling the inherent safety margins and noise tolerance of control applications as (m, k) constraints. We further discuss how a given (m, k) requirement can be satisfied by individual error detection and flexible compensation while satisfying the given hard real-time constraints. Secondly, we analyze the probability of deadline misses and the deadline miss rate in soft real-time systems, which allow occasional deadline misses without erroneous computations. Thirdly, we consider how to deploy redundant multi-threading techniques to improve system reliability under two different system models for multi-core systems: 1) under core-to-core frequency variations, we address the reliability-aware task-mapping problem; 2) we decide on redundancy levels for each task while satisfying the given real-time constraints and the limited number of redundant cores, even under multi-tasking. Finally, an enhancement for real-time operating systems is provided to maintain strict periodicity under task overruns due to potential transient faults, especially on the popular platform Real-Time Executive for Multiprocessor Systems (RTEMS).
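An (m, k) constraint requires at least m correct job instances within any window of k consecutive instances. A minimal runtime monitor for such a constraint could look as follows; this is an illustrative sketch only, since the dissertation's contribution is satisfying the constraint by scheduling and compensation, not merely monitoring it.

```python
from collections import deque

class MKMonitor:
    """Sliding-window monitor for an (m, k) constraint: at least m of any
    k consecutive job instances must be correct."""
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.window = deque(maxlen=k)   # outcomes of the last k jobs

    def job_finished(self, correct: bool) -> bool:
        """Record one job outcome; return True while (m, k) still holds."""
        self.window.append(correct)
        if len(self.window) < self.k:   # window not yet full
            return True
        return sum(self.window) >= self.m

mon = MKMonitor(m=2, k=3)
for outcome in [True, False, True, False, False]:
    print(mon.job_finished(outcome))    # True True True False False
```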
Item: Methods for efficient resource utilization in statistical machine learning algorithms (2018)
Kotthaus, Helena; Marwedel, Peter; Rahnenführer, Jörg
In recent years, statistical machine learning has emerged as a key technique for tackling problems that elude a classic algorithmic approach. One such problem, with a major impact on human life, is the analysis of complex biomedical data. Solving this problem in a fast and efficient manner is of major importance, as it enables, e.g., the prediction of the efficacy of different drugs for therapy selection. While achieving the highest possible prediction quality appears desirable, doing so is often simply infeasible due to resource constraints. Statistical learning algorithms for predicting the health status of a patient or for finding the best algorithm configuration for the prediction require an excessively high amount of resources. Furthermore, these algorithms are often implemented with no awareness of the underlying system architecture, which leads to sub-optimal resource utilization. This thesis presents methods for the efficient resource utilization of statistical learning applications. The goal is to reduce the resource demands of these algorithms to meet a given time budget while simultaneously preserving the prediction quality. As a first step, the resource consumption characteristics of learning algorithms are analyzed, as well as their scheduling on the underlying parallel architectures, in order to develop optimizations that enable these algorithms to scale to larger problem sizes. For this purpose, new profiling mechanisms are incorporated into a holistic profiling framework. The results show that one major contributor to the resource issues is memory consumption. To overcome this obstacle, a new optimization based on the dynamic sharing of memory is developed that speeds up computation by several orders of magnitude in situations where available main memory is the bottleneck and would otherwise lead to swapping. One important application is model-based optimization for the automated parameter tuning of learning algorithms: within a huge search space, algorithm configurations are evaluated to find the configuration with the best prediction quality. An important step towards better managing this search space is to parallelize the search process itself. However, high runtime variance within the configuration space can cause inefficient resource utilization. For this purpose, new resource-aware scheduling strategies are developed that efficiently map evaluations of configurations to the parallel architecture, depending on their resource demands. In contrast to classical scheduling problems, the new scheduler interacts with the configuration-proposal mechanism to select configurations with suitable resource demands. With these strategies, it becomes possible to exploit the full potential of parallel architectures. Compared to established parallel execution models, the results show that the new approach enables model-based optimization to converge faster to the optimum within a given time budget.

Item: Efficient implementation of resource-constrained cyber-physical systems using multi-core parallelism (2018)
Neugebauer, Olaf; Marwedel, Peter; Müller, Heinrich
The quest for more performance of applications and systems became more challenging in recent years. Especially in the cyber-physical and mobile domain, performance requirements have increased significantly, and applications previously found in the high-performance domain now emerge in the resource-constrained domain. Modern heterogeneous high-performance MPSoCs provide a solid foundation to satisfy this high demand. Such systems combine general processors with specialized accelerators ranging from GPUs to machine learning chips. On the other side of the performance spectrum, the demand for small, energy-efficient systems posed by modern IoT applications has increased vastly. Developing efficient software for such resource-constrained multi-core systems is an error-prone, time-consuming, and challenging task. This thesis provides, with PA4RES, a holistic semi-automatic approach to parallelize and implement applications for such platforms efficiently. Our solution supports the developer in finding good trade-offs to tackle the requirements posed by modern applications and systems. With PICO, we propose a comprehensive approach to express parallelism in sequential applications. PICO detects data dependencies and implements the required synchronization automatically. Using a genetic algorithm, PICO optimizes the data synchronization. The evolutionary algorithm considers channel capacity, memory mapping, channel merging, and the flexibility offered by the channel implementation with respect to execution time, energy consumption, and memory footprint. PICO's communication optimization phase was able to generate a speedup of almost 2 or an energy improvement of 30% for certain benchmarks. The PAMONO sensor approach enables the fast detection of biological viruses using optical methods. With sophisticated virus-detection software, real-time virus detection running on stationary computers was achieved. Within this thesis, we derive a soft real-time capable virus detection running on a high-performance embedded system commonly found in today's smartphones. This was accomplished with a smart DSE algorithm that optimizes for execution time, energy consumption, and detection quality. Compared to a baseline implementation, our solution achieved a speedup of 4.1 and 87% energy savings and satisfied the soft real-time requirements. Accepting a degradation of the detection quality, which is still usable in a medical context, led to a speedup of 11.1. This work provides the fundamentals for a truly mobile real-time virus detection solution. The growing demand for processing power can no longer be satisfied by well-known approaches such as higher frequencies. These so-called performance walls pose a serious challenge to the growing performance demand. Approximate computing is a promising approach to overcome, or at least shift, the performance walls by accepting a degradation in output quality to gain improvements in other objectives. Especially for the safe integration of approximation into existing applications, or during the development of new approximation techniques, a method to assess the impact on output quality is essential. With QCAPES, we provide a multi-metric assessment framework to analyze the impact of approximation. Furthermore, QCAPES provides useful insights into the impact of approximation on execution time and energy consumption. With ApproxPICO, we propose an extension to PICO that considers approximate computing during the parallelization of sequential applications.

Item: Memory-aware platform description and framework for source-level embedded MPSoC software optimization (2017)
Pyka, Robert; Marwedel, Peter; Teubner, Jens
Developing optimizing source-level transformations consists of numerous non-trivial subtasks. Besides identifying the actual optimization goals within a particular target-platform and compiler setup, the actual implementation is tedious, error-prone, and often recurring work. Providing appropriate support for this development work is a challenging task. Defining and implementing a well-suited target-platform description that can be used by a wide set of optimization techniques while being precise and easy to maintain is one dimension of this challenge. Another dimension, also tackled in this work, deals with providing an infrastructure for optimization-step representation, interaction, and data retention. Finally, an appropriate source-code representation has been integrated into this approach. These contributions are tightly related to each other and have been bundled into the MACCv2 framework, a full-fledged approach for implementing and integrating optimization techniques. Together, they significantly reduce the effort required to implement source-level memory-aware optimization techniques for multiprocessor systems-on-chip (MPSoCs). The system-modeling approach presented in this dissertation is located at the processor-memory-switch (PMS) abstraction level. It offers a novel combined structural and semantic description. It unites a locally scoped, structural modeling approach, as preferred by system designers, with a fast, database-like interface best suited for optimization-technique developers. It supports model refinement and requires only limited effort for an initial abstract system model. The general structure consists of components and channels. Based on this structure, the system model provides mechanisms for database-like access to system-global target-platform properties, while requiring only the definition of locally scoped input data annotated to system-model items. A typical set of these properties contains energy-consumption and access-latency values. The request-based retrieval of system properties is a unique feature that makes this approach superior to state-of-the-art approaches based on table lookups or full-system simulation. Combining such component-local properties into system-global target-platform data is performed via aspect handlers. These handlers define computational rules that are applied to correlated locally scoped data along access paths in the memory-subsystem hierarchy. This approach is capable of calculating such system-global values at a rate similar to plain table lookups, while maintaining a precision close to full-system-simulation-based estimations. This has been shown for both the energy-consumption and the access-latency values of the MPARM platform. The MACCv2 framework provides a set of fundamental services to the optimization-technique developer. On top of these services, a system model and a source-code representation are provided. Further, framework-based optimization-technique implementations are encapsulated into self-contained entities exposing well-defined interfaces. This framework has been successfully used within the European Commission funded MNEMEE project. The hierarchical processing-step representation in MACCv2 allows the encapsulation of tasks at various granularity levels. For simplified reuse in future projects, the entire toolchain as well as individual optimization techniques have been represented as processing-step entities in terms of MACCv2. A common notion of target-platform structure and properties, as well as inter-processing-step communication, is achieved via framework-provided services. The system-modeling approach and the framework exhibit the right set of properties to support the development of memory-aware optimization techniques. The MNEMEE project, continued research work, teaching activities, and PhD theses have been successfully founded on the approaches and the framework proposed in this dissertation.
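The request-based property retrieval can be pictured as folding component-local annotations along an access path. The additive rule below is a deliberately simple stand-in for MACCv2's aspect handlers, which support arbitrary computational rules; the class and function names are our own.

```python
class Component:
    """A system-model item carrying only locally scoped annotations."""
    def __init__(self, name, latency=0, energy=0.0):
        self.name, self.latency, self.energy = name, latency, energy

def access_cost(path):
    """Toy 'aspect handler': fold local latency/energy annotations along a
    memory access path into system-global values (additive rule assumed)."""
    return (sum(c.latency for c in path), sum(c.energy for c in path))

cpu = Component("CPU")
bus = Component("Bus", latency=2, energy=0.4)
spm = Component("Scratchpad", latency=1, energy=0.1)
print(access_cost([cpu, bus, spm]))   # (3, 0.5) per access along this path
```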
Item: Scheduling algorithms and timing analysis for hard real-time systems (2017)
Huang, Wen-Hung Kevin; Chen, Jian-Jia; Reineke, Jan
Real-time systems are designed for applications in which response time is critical. As timing is a major property of such systems, proving timing correctness is of utmost importance. To achieve this, a two-fold approach to timing analysis is traditionally involved: (i) worst-case execution time (WCET) analysis, which computes an upper bound on the execution time of a single job of a task running in isolation; and (ii) schedulability analysis, using the WCET as input, which determines whether multiple tasks are guaranteed to meet their deadlines. Formal models used for representing recurrent real-time tasks have traditionally been characterized by a collection of independent jobs that are released periodically. However, such modeling may result in resource under-utilization in systems whose behavior is not entirely periodic or independent. Examples are (i) multicore platforms where tasks share a communication fabric, like a bus, for accesses to a shared memory besides the processors; (ii) tasks with synchronization, where no two concurrent accesses to one shared resource are allowed to be in their critical sections at the same time; and (iii) automotive systems where tasks are linked to rotation (e.g., of the crankshaft, gears, or wheels) and their activation rate is proportional to the angular velocity of a specific device. This dissertation presents multiple approaches towards designing scheduling algorithms and schedulability analyses for a variety of real-time systems with different characteristics. Specifically, we look at those design problems from the perspective of the speedup factor, a metric that quantifies both the pessimism of the analysis and the non-optimality of the scheduling algorithm. The proposed solutions are shown to be promising not only by means of the speedup factor but also through extensive evaluations.
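For orientation, the speedup factor can be stated as follows (a standard definition from the literature, not specific to this thesis): a scheduling algorithm $\mathcal{A}$ together with its schedulability test has speedup factor $s \geq 1$ if, for every task set $\tau$,

```latex
\tau \text{ is feasible on a unit-speed platform}
\;\Longrightarrow\;
\tau \text{ is deemed schedulable by } \mathcal{A}
\text{ on the same platform running at speed } s .
```

The closer $s$ is to 1, the less pessimistic the analysis and the closer the algorithm is to an optimal one.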
Item: Memory-aware mapping strategies for heterogeneous MPSoC systems (2017)
Holzkamp, Olivera; Marwedel, Peter; Teubner, Jens
Embedded systems, such as mobile phones, integrate more and more features, e.g., multiple cameras, GPS, and many other sensors and actuators. These kinds of embedded systems face increasing complexity due to demands on performance and constraints on energy consumption. The performance of such systems can be increased by executing application tasks in parallel. To achieve this, multiprocessor system-on-chip (MPSoC) devices were introduced. At the same time, the energy consumption of these systems has to be decreased, especially for battery-driven embedded systems. A reduction in energy consumption can be achieved by efficiently utilizing the hardware resources of these devices. MPSoC devices can be either homogeneous or heterogeneous. Homogeneous MPSoC devices usually contain the same type of processors with the same speed, i.e., clock frequency, and the same type and size of memories for each processor. In heterogeneous MPSoC devices, the processor types and/or clock frequencies and the memory types and/or sizes may vary. During the last decade, research has dealt with optimizations for the efficient utilization of hardware resources on MPSoCs. Central issues are the extraction of parallelism from sequential code and the efficient mapping of the parallelized application tasks onto the processors of the system. A few frameworks have been developed that distribute parallelized application tasks to the available processors while optimizing for one or more objectives such as performance and energy consumption. They usually integrate all required preceding steps, such as the extraction of parallelized tasks from sequential code and the extraction of a task graph as input for the mapping optimization. These steps are performed either manually or in an automated way. Such frameworks help the embedded system designer to significantly reduce design time. Unfortunately, the influence of memories or memory hierarchies is neglected in mapping optimizations, even though it is a well-known fact that memories have a drastic impact on the runtime and energy consumption of the system. This dissertation investigates the effect of memory hierarchies in MPSoC mapping. Since a thread-based application model is used, a thread graph extraction tool is introduced. Furthermore, two approaches for memory-aware mapping optimization for homogeneous and heterogeneous embedded MPSoC devices are presented. The thread graph extraction tool extracts a flat thread graph with important annotations for software requirements, hardware performance, and energy consumption. This thread graph represents all required input information for the subsequent memory-aware mapping optimizations. Depending on the complexity of the application, the designer can choose between a fine-grained and a coarse-grained thread graph and thus influence the overall design time. The first presented memory-aware mapping approach handles single-objective optimizations, which reduce either the runtime or the energy consumption of the system. The second presented memory-aware mapping approach handles multiobjective optimization, which reduces both runtime and energy consumption. All approaches additionally reduce the work of the embedded system designer and thus the design time. They work in a fully automated way and are integrated within the MACCv2/MNEMEE tool flow. The MNEMEE tool flow also provides all required preceding steps, such as the parallelization of sequential application code. The presented evaluations show that considering memory mapping during MPSoC mapping optimization significantly reduces the application runtime and energy consumption. The single-objective optimizations achieve an average reduction in runtime of about 21% and an average reduction in energy consumption of about 28%. The multiobjective memory-aware mapping optimization achieves an average reduction in runtime of about 21% and an average reduction in energy consumption of about 26%. Both presented optimization approaches were validated for homogeneous and heterogeneous MPSoC devices. The results clearly show that neglecting the memory subsystem can lead to wasted optimization potential.
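To give a flavor of what a (much simplified) memory-aware mapping decision looks like, the sketch below greedily places each thread on the processor/memory pair that minimizes a weighted runtime/energy cost. The cost model, the numbers, and the greedy rule are illustrative assumptions; the dissertation employs genuine single- and multiobjective optimization rather than this one-shot heuristic.

```python
def map_threads(threads, resources, alpha=0.5):
    """Greedy memory-aware mapping sketch: weighted runtime/energy cost."""
    mapping = {}
    for t in threads:
        best = min(resources,
                   key=lambda r: alpha * t["cycles"] / r["freq"]
                                 + (1 - alpha) * t["accesses"] * r["mem_energy"])
        mapping[t["name"]] = best["name"]
    return mapping

threads = [{"name": "t0", "cycles": 1e6, "accesses": 2e4},
           {"name": "t1", "cycles": 4e5, "accesses": 8e4}]
resources = [{"name": "fast_core+DRAM", "freq": 8e8, "mem_energy": 2e-9},
             {"name": "slow_core+SPM",  "freq": 2e8, "mem_energy": 4e-10}]
print(map_threads(threads, resources))
```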