Design Automation for Embedded Systems
As the name suggests, the DAES group focuses on the design automation of embedded systems: the group works on methodologies and tools that automate or support the design of embedded systems. Design automation aims at reducing the cost of designing embedded systems and at improving properties of the resulting designs (e.g., energy consumption or timing predictability). The focus is on building tool chains for designing embedded software. Such tool chains include compilers, optimizers, and tools that map applications to execution platforms.
Recent Submissions
Item: Parallel path progression DAG scheduling (2023-05-25)
Ueter, Niklas; Günzel, Mario; Brüggen, Georg von der; Chen, Jian-Jia
The increasing performance needs of modern cyber-physical systems lead to multiprocessor architectures being increasingly utilized. To efficiently exploit their potential parallelism in hard real-time systems, appropriate task models and scheduling algorithms that make it possible to provide timing guarantees are required. Such scheduling algorithms and the corresponding worst-case response time analyses usually suffer from resource over-provisioning due to pessimistic analyses based on worst-case assumptions. Hence, scheduling algorithms and analyses with high resource efficiency are required. A prominent fine-grained parallel task model is the directed-acyclic-graph (DAG) task model, which is composed of precedence-constrained subjobs. This paper studies the hierarchical real-time scheduling problem of sporadic arbitrary-deadline DAG tasks. We propose a parallel path progression scheduling property, implemented with only two distinct subtask priorities, which makes it possible to quantify the parallel execution of a user-chosen collection of complete paths in the response time analysis. This novel approach significantly improves the state-of-the-art response time analyses for parallel DAG tasks with highly parallel DAG structures and can provably exhaust large core counts. Two hierarchical scheduling algorithms are designed based on this property; they extend the parallel path progression property and improve the response time analysis for sporadic arbitrary-deadline DAG task sets.
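The scheduling property above rests on choosing a collection of complete paths through the DAG whose progression is then prioritized via two priority levels. As a rough illustration of that ingredient only (not the paper's algorithm or its response time analysis), the following sketch greedily covers a DAG with vertex-disjoint paths and marks a chosen subset as high priority; the function names and the greedy rule are our own assumptions.

```python
from collections import defaultdict

def greedy_path_cover(vertices, edges):
    """Greedily cover a DAG with vertex-disjoint paths (toy heuristic)."""
    succ, indeg = defaultdict(list), defaultdict(int)
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1
    covered, paths = set(), []
    for v in vertices:                     # grow a path from every source
        if v in covered or indeg[v] > 0:
            continue
        path, cur = [v], v
        covered.add(v)
        while True:
            free = [w for w in succ[cur] if w not in covered]
            if not free:
                break
            cur = free[0]                  # follow the first free successor
            path.append(cur)
            covered.add(cur)
        paths.append(path)
    for v in vertices:                     # leftovers become singleton paths
        if v not in covered:
            paths.append([v])
    return paths

edges = [(1, 2), (1, 3), (2, 4), (3, 4)]
paths = greedy_path_cover([1, 2, 3, 4], edges)
chosen = {v for p in paths[:1] for v in p}          # user-chosen collection
priority = {v: "high" if v in chosen else "low" for v in [1, 2, 3, 4]}
print(paths, priority)
```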
Item: Software exploitation of traditional interfaces for modern technologies (2024)
Hakert, Christian; Chen, Jian-Jia; Castrillon, Jeronimo
Modern computer technologies are advancing to spheres that seemed unimaginable only years ago. Quantum-effect petabyte-sized storage devices or deep cache hierarchies acting within nanoseconds are only a few examples. At the same time, the interfaces for communicating with such technologies are well established and remain largely unaffected by this development. While loading and storing a word at a given memory address was the standard interface for communicating with memory devices in the very early stages of computer systems, it still has a similar shape today. Unsurprisingly, modern computing technologies come with an increasing demand for management, such as lifetime management for Non-Volatile Memory (NVM) or prefetching and eviction strategies for cache hierarchies. Leaving this management solely to the hardware offers only a limited design space and little room for optimization. Consequently, software has to find ways that allow either direct or indirect management of these technologies over the traditional interfaces. This dissertation picks up this need and studies selected modern technologies and their management requirements. It presents methods that systematically exploit existing traditional interfaces in order to provide extended functionality for the management of modern technologies. The exploitations in this thesis are conducted solely at the software level and do not require any changes to the available hardware. The first part targets memory technologies, in particular Non-Volatile Memory (NVM). The thesis discusses the lifetime issue of these technologies and the resulting need for wear leveling. Various approaches are introduced that allow different forms of wear leveling at different levels of the software stack, ranging from wear-leveling procedures inside the operating system and system software to direct application integration for extending the memory lifetime. Apart from the lifetime issue, the latency and energy properties of a specific type of emerging memory, namely Racetrack Memory (RTM), are considered. Dedicated to the application of Random Forest (RF) models, the access properties are optimized directly at the application level. The last part of the thesis moves the focus from memories to arithmetic computation. RF models are kept as the target application and their execution on modern computation technologies is considered. The use of floating-point numbers is a major focus, and the memory behavior of floating-point numbers is optimized. By proposing alternative computation schemes for floating-point numbers, which are realized entirely in software and leave the hardware untouched, significant performance improvements are gained.

Item: Unlocking efficiency in BNNs: global by local thresholding for analog-based HW accelerators (2023-09-14)
Yayla, Mikail; Frustaci, Fabio; Spagnolo, Fanny; Chen, Jian-Jia; Amrouch, Hussam
For accelerating Binarized Neural Networks (BNNs), analog computing-based crossbar accelerators, utilizing XNOR gates and additional interface circuits, have been proposed. Such accelerators demand a large number of analog-to-digital converters (ADCs) and registers, resulting in expensive designs. To increase the inference efficiency, the state of the art divides the interface circuit into an Analog Path (AP), utilizing (cheap) analog comparators, and a Digital Path (DP), utilizing (expensive) ADCs and registers. During BNN execution, one of the two paths is selectively triggered. Ideally, as inference via the AP is more efficient, it should be triggered as often as possible. However, we reveal that, unless the number of weights is very small, the AP is rarely triggered. To overcome this, we propose a novel BNN inference scheme, called Local Thresholding Approximation (LTA), which approximates the global thresholdings in BNNs by local thresholdings. This enables the use of the AP through most of the execution, which significantly increases the interface circuit efficiency. In our evaluations with two BNN architectures, using LTA reduces the area by 42x and 54x, the energy by 2.7x and 4.2x, and the latency by 3.8x and 1.15x, compared to state-of-the-art crossbar-based BNN accelerators.
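To make the idea of replacing one global thresholding by local thresholdings concrete, here is a minimal functional sketch. The block split, the scaled local thresholds, and the majority vote are our own simplifying assumptions for illustration, not the exact LTA scheme or its hardware mapping.

```python
import numpy as np

def bnn_neuron_global(x, w, theta):
    """Reference BNN neuron: XNOR-popcount against one global threshold."""
    match = np.sum(x == w)              # popcount of XNOR(x, w)
    return 1 if match >= theta else 0

def bnn_neuron_local(x, w, theta, blocks=8):
    """Local-thresholding approximation (illustrative): each block votes
    against a scaled local threshold and the votes are majority-combined,
    so only cheap per-block comparators are needed."""
    xs, ws = np.array_split(x, blocks), np.array_split(w, blocks)
    local_theta = theta / blocks
    votes = sum(1 for xb, wb in zip(xs, ws) if np.sum(xb == wb) >= local_theta)
    return 1 if votes >= (blocks + 1) // 2 else 0

rng = np.random.default_rng(0)
x, w = rng.integers(0, 2, 256), rng.integers(0, 2, 256)
print(bnn_neuron_global(x, w, theta=128), bnn_neuron_local(x, w, theta=128))
```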
Item: Probabilistic reaction time analysis (2023-09-09)
Günzel, Mario; Ueter, Niklas; Chen, Kuan-Hsun; Brüggen, Georg von der; Chen, Jian-Jia
In many embedded systems, for instance in the automotive, avionic, or robotics domain, critical functionalities are implemented via chains of communicating recurrent tasks. To ensure the safety and correctness of such systems, guarantees on the reaction time, that is, the delay between a cause (e.g., an external activity or the reading of a sensor) and the corresponding effect, must be provided. Current approaches focus on the maximum reaction time, considering the worst-case system behavior. However, in many scenarios, probabilistic guarantees on the reaction time are sufficient: it suffices to guarantee that the reaction does not exceed a certain threshold with (at least) a certain probability. This work provides such probabilistic guarantees on the reaction time, considering two types of randomness: response time randomness and failure probabilities. To the best of our knowledge, this is the first work that defines and analyzes the probabilistic reaction time for cause-effect chains based on sporadic tasks.
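The flavor of such a probabilistic guarantee can be illustrated with a toy Monte Carlo model. The paper derives analytical guarantees; the sketch below only estimates P(reaction time <= bound) empirically, under an assumed per-stage model (uniform release offsets, sampled response times, and a per-job failure probability) that is ours, not the paper's.

```python
import random

def simulate_reaction(chain, trials=100_000):
    """Toy cause-effect-chain model: per stage, data waits for the next job
    release, a job may fail with fail_prob (the data is then picked up one
    period later), and completion adds a sampled response time."""
    samples = []
    for _ in range(trials):
        t = 0.0
        for period, sample_response, fail_prob in chain:
            t += random.uniform(0.0, period)      # wait for next release
            while random.random() < fail_prob:    # failed job: retry next period
                t += period
            t += sample_response()                # randomized response time
        samples.append(t)
    return samples

chain = [(10.0, lambda: random.uniform(1.0, 3.0), 0.01),
         (5.0,  lambda: random.uniform(0.5, 2.0), 0.00)]
samples = simulate_reaction(chain)
bound = 20.0
print(sum(s <= bound for s in samples) / len(samples))  # empirical guarantee
```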
Item: A vision for edge AI (2024)
Yayla, Mikail; Chen, Jian-Jia; Teich, Jürgen
Edge Artificial Intelligence is progressively pervading all aspects of our life. However, to perform complex tasks, a massive number of matrix multiplications needs to be computed. At the same time, the available hardware resources for computation are highly limited. This pressing need for efficiency serves as the motivation for this dissertation, in which we propose a vision for highly resource-constrained future intelligent systems composed of robust Binarized Neural Networks operating with approximate memory and approximate computing units, while remaining trainable on the edge.

Item: Analyses and optimizations of timing-constrained embedded systems considering resource synchronization and machine learning approaches (2023)
Shi, Junjie; Chen, Jian-Jia; Biondi, Alessandro
Nowadays, embedded systems have become ubiquitous, powering a vast array of applications from consumer electronics to industrial automation. Concurrently, statistical and machine learning algorithms are being increasingly adopted across various application domains, such as medical diagnosis, autonomous driving, and environmental analysis, offering sophisticated data analysis and decision-making capabilities. As the demand for intelligent and time-sensitive applications continues to surge, accompanied by growing concerns regarding data privacy, the deployment of machine learning models on embedded devices has emerged as an indispensable requirement. However, this integration introduces both significant opportunities for performance enhancement and complex challenges in deployment optimization. On the one hand, deploying machine learning models on embedded systems with limited computational capacity, power budgets, and stringent timing requirements necessitates additional adjustments to ensure optimal performance and meet the imposed timing constraints. On the other hand, the inherent capabilities of machine learning, such as self-adaptation during runtime, prove invaluable in addressing challenges encountered in embedded systems, aiding in optimization and decision-making processes. This dissertation introduces two primary contributions to the analysis and optimization of timing-constrained embedded systems. First, it addresses the relatively long access times required for the shared resources of machine learning tasks. Second, it considers the limited communication resources and data privacy concerns in distributed embedded systems when deploying machine learning models. Additionally, this work provides a use case that employs a machine learning method to tackle challenges specific to embedded systems. By addressing these key aspects, this dissertation contributes to the analysis and optimization of timing-constrained embedded systems, considering resource synchronization and machine learning models to enable improved performance and efficiency in real-time applications with stringent constraints.

Item: Complex scheduling models and analyses for property-based real-time embedded systems (2023)
Ueter, Niklas; Chen, Jian-Jia; Li, Jing
Modern multicore architectures and parallel applications pose a significant challenge to worst-case-centric real-time system verification and design efforts. The involved model and parameter uncertainty contest the fidelity of formal real-time analyses, which are mostly based on exact model assumptions. In this dissertation, various approaches that can accept parameter and model uncertainty are presented. In an attempt to improve predictability in worst-case-centric analyses, timing-predictable protocols are explored for parallel task scheduling on multiprocessors and for network-on-chip arbitration. A novel scheduling algorithm for gang tasks on multiprocessors, called stationary rigid gang scheduling, is proposed. With regard to fixed-priority wormhole-switched networks-on-chip, a more restrictive family of transmission protocols with predictability-enhancing properties, called simultaneous progression switching protocols, is proposed. Moreover, hierarchical scheduling for parallel DAG tasks under parameter uncertainty is studied to achieve temporal and spatial isolation. Fault tolerance is examined as a supplementary reliability aspect of real-time systems in the presence of dynamic external causes of faults. Using various job variants, which trade off increased execution time demand against increased error protection, a state-based policy selection strategy is proposed that provably assures an acceptable quality of service (QoS). Lastly, the temporal misalignment of sensor data in sensor fusion applications of cyber-physical systems is examined. A modular analysis based on minimal properties is proposed to obtain an upper bound on the maximal sensor data time-stamp difference.
Item: Memory carousel: LLVM-based bitwise wear leveling for nonvolatile main memory (2022-12-14)
Hölscher, Nils; Hakert, Christian; Nassar, Hassan; Chen, Kuan-Hsun; Bauer, Lars; Chen, Jian-Jia; Henkel, Jörg
Emerging nonvolatile memories yield, alongside many advantages, technical shortcomings such as reduced cell lifetime. Although many wear-leveling approaches exist to extend the lifetime of such memories, usually a tradeoff in the granularity of wear leveling has to be made. Due to iterative write schemes (repeatedly sense and write), memory wear in certain systems depends directly on the written bit value and thus can be highly imbalanced, requiring dedicated bit-wise wear leveling. So far, such bit-wise wear leveling has only been proposed together with special hardware support. However, if no dedicated hardware solutions are available, especially for commercial off-the-shelf systems with nonvolatile memories, a software solution can be crucial for the system lifetime. In this work, we propose entirely software-based bit-wise wear leveling, where the position of bits within CPU words in main memory is rotated on a regular basis. We leverage the LLVM intermediate representation to adjust the load and store operations of the application with a custom compiler pass. Experimental evaluation shows that, by applying local rotation within the CPU word, the lifetime can be extended by a factor of up to 21x. We also show that our method can be combined with coarser-grained wear leveling, e.g., at block granularity, and helps to achieve higher lifetime improvements.
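The core mechanism, rotating bit positions within a CPU word so that a frequently toggled bit position is spread over all cells of the word across epochs, can be pictured with a small model. The paper realizes this as an LLVM compiler pass rewriting loads and stores; the Python sketch below only shows the rotation bookkeeping (a real scheme must also migrate already-stored words when the rotation offset advances, which is omitted here).

```python
WORD_BITS = 64
MASK = (1 << WORD_BITS) - 1

def rotl(x, r):
    """Rotate a WORD_BITS-wide value left by r positions."""
    r %= WORD_BITS
    return ((x << r) | (x >> (WORD_BITS - r))) & MASK if r else x

def store(mem, addr, value, epoch):
    """Store with bit positions rotated by the current epoch."""
    mem[addr] = rotl(value, epoch)

def load(mem, addr, epoch):
    """Invert the rotation applied when the word was stored."""
    return rotl(mem[addr], WORD_BITS - (epoch % WORD_BITS))

mem = {}
store(mem, 0x1000, 0xFF, epoch=8)      # hot low byte lands in cells 8..15
assert load(mem, 0x1000, epoch=8) == 0xFF
```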
Item: Special issue on practical and robust design of real-time systems (2022-08-20)
Chen, Jian-Jia; Shrivastava, Aviral

Item: MODES: model-based optimization on distributed embedded systems (2021-06-04)
Shi, Junjie; Bian, Jiang; Richter, Jakob; Chen, Kuan-Hsun; Rahnenführer, Jörg; Xiong, Haoyi; Chen, Jian-Jia
The predictive performance of a machine learning model highly depends on the corresponding hyper-parameter setting. Hence, hyper-parameter tuning is often indispensable. Normally, such tuning requires the dedicated machine learning model to be trained and evaluated on centralized data to obtain a performance estimate. However, in a distributed machine learning scenario, it is not always possible to collect all the data from all nodes due to privacy concerns or storage limitations. Moreover, if data has to be transferred through low-bandwidth connections, the time available for tuning is reduced. Model-Based Optimization (MBO) is a state-of-the-art method for tuning hyper-parameters, but its application to distributed machine learning models and federated learning lacks research. This work proposes MODES, a framework that allows deploying MBO on resource-constrained distributed embedded systems. Each node trains an individual model based on its local data, and the goal is to optimize the combined prediction accuracy. The framework offers two optimization modes: (1) MODES-B considers the whole ensemble as a single black box and optimizes the hyper-parameters of each individual model jointly, and (2) MODES-I considers all models as clones of the same black box, which makes it possible to efficiently parallelize the optimization in a distributed setting. We evaluate MODES by conducting experiments on the optimization of the hyper-parameters of a random forest and a multi-layer perceptron. The experimental results demonstrate that, with improvements in terms of mean accuracy (MODES-B), run-time efficiency (MODES-I), and statistical stability for both modes, MODES outperforms the baseline, i.e., carrying out tuning with MBO on each node individually with its local sub-data set.
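The difference between the two modes is essentially a difference in search-space structure, which the sketch below tries to make tangible. Random sampling stands in for MBO's surrogate-guided proposals purely for brevity, and the objective is a synthetic stand-in for training on node-local data; SPACE, node_accuracy, and the budget are all illustrative assumptions.

```python
import random

SPACE = {"n_trees": [10, 50, 100], "max_depth": [4, 8, 16]}

def node_accuracy(cfg, node):
    """Synthetic stand-in for training/evaluating one node's local model."""
    rnd = random.Random(hash((node, cfg["n_trees"], cfg["max_depth"])))
    return 0.80 + 0.05 * rnd.random()

def ensemble_accuracy(cfgs):
    return sum(node_accuracy(c, i) for i, c in enumerate(cfgs)) / len(cfgs)

def modes_b(n_nodes, budget):
    """MODES-B style: one black box over the joint space, one config per node."""
    def draw():
        return [{k: random.choice(v) for k, v in SPACE.items()}
                for _ in range(n_nodes)]
    return max((draw() for _ in range(budget)), key=ensemble_accuracy)

def modes_i(n_nodes, budget):
    """MODES-I style: all nodes are clones, so a single shared config is tuned."""
    def draw():
        return {k: random.choice(v) for k, v in SPACE.items()}
    best = max((draw() for _ in range(budget)),
               key=lambda c: ensemble_accuracy([c] * n_nodes))
    return [best] * n_nodes

print(modes_b(4, budget=50))   # 4 independent configs, a much larger space
print(modes_i(4, budget=50))   # 1 shared config, cheap to parallelize
```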
Item: Correspondence article: counterexample for suspension-aware schedulability analysis of EDF scheduling (2020-08-18)
Günzel, Mario; Chen, Jian-Jia

Item: A note on slack enforcement mechanisms for self-suspending tasks (2021-01-27)
Günzel, Mario; Chen, Jian-Jia
This paper provides counterexamples for the slack enforcement mechanisms for handling segmented self-suspending real-time tasks proposed by Lakshmanan and Rajkumar (Proceedings of the Real-Time and Embedded Technology and Applications Symposium (RTAS), pp. 3–12, 2010).

Item: Nanoparticle classification using frequency domain analysis on resource-limited platforms (2019-09-24)
Yayla, Mikail; Toma, Anas; Chen, Kuan-Hsun; Lenssen, Jan Eric; Shpacovitch, Victoria; Hergenröder, Roland; Weichert, Frank; Chen, Jian-Jia
A mobile system that can detect viruses in real time is urgently needed, due to the combination of virus emergence and evolution with increasing global travel and transport. A biosensor called PAMONO (for Plasmon Assisted Microscopy of Nano-sized Objects) represents a viable technology for the mobile real-time detection of viruses and virus-like particles. It could be used for fast and reliable diagnoses in hospitals, airports, the open air, or other settings. For the analysis of the images provided by the sensor, state-of-the-art methods based on convolutional neural networks (CNNs) can achieve high accuracy. However, such computationally intensive methods may not be suitable for most mobile systems. In this work, we propose nanoparticle classification approaches based on frequency domain analysis, which are less resource-intensive. We observe that on average the classification takes 29 µs per image for the Fourier features and 17 µs for the Haar wavelet features. Although the CNN-based method scores 1–2.5 percentage points higher in classification accuracy, it takes 3370 µs per image on the same platform. With these results, we identify and explore the trade-off between resource efficiency and classification performance for the nanoparticle classification of images provided by the PAMONO sensor.
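A minimal sketch of the frequency-domain feature idea follows; the exact feature set, patch size, and downstream classifier are our assumptions (the paper also evaluates Haar wavelet features, omitted here).

```python
import numpy as np

def fourier_features(patch, k=8):
    """Magnitudes of the k x k lowest spatial frequencies of an image patch:
    a small, fixed-size descriptor that is cheap to compute and classify."""
    spectrum = np.abs(np.fft.rfft2(patch))
    return spectrum[:k, :k].ravel()

patch = np.random.rand(32, 32)         # stand-in for a PAMONO image patch
feats = fourier_features(patch)
print(feats.shape)                     # (64,) -> feed any small classifier
```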
Item: Realistic scheduling models and analyses for advanced real-time embedded systems (2019)
Brüggen, Georg von der; Chen, Jian-Jia; Davis, Robert I.
Focusing on real-time scheduling theory, the thesis demonstrates how essential realistic scheduling models and analyses are when guaranteeing timing correctness without over-provisioning the necessary system resources. It details potential pitfalls of the de facto standards for the theoretical examination of scheduling algorithms and schedulability tests, namely resource augmentation bounds and utilization bounds, and proposes parametric augmentation functions to improve their meaningfulness. Considering uncertain execution behaviour, systems with dynamic real-time guarantees are introduced to model this scenario more realistically than mixed-criticality systems, and the first technique is provided that makes it possible to precisely calculate the worst-case deadline failure probability for task sets with a realistic number of tasks. Furthermore, hybrid self-suspension models are proposed that bridge the gap between the over-flexible dynamic and the over-restrictive segmented self-suspension model, with different tradeoffs between accuracy and flexibility.
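A standard building block behind deadline failure probabilities is the convolution of discrete execution-time distributions, sketched below. This is a textbook step under an independence assumption, not the thesis' full (and more efficient) technique.

```python
from collections import defaultdict

def convolve(d1, d2):
    """Convolution of two discrete execution-time distributions
    given as {execution_time: probability}."""
    out = defaultdict(float)
    for c1, p1 in d1.items():
        for c2, p2 in d2.items():
            out[c1 + c2] += p1 * p2
    return dict(out)

def deadline_failure_probability(jobs, deadline):
    """P(total demand of independent jobs exceeds the deadline)."""
    total = {0: 1.0}
    for dist in jobs:
        total = convolve(total, dist)
    return sum(p for c, p in total.items() if c > deadline)

# Two jobs, each short with prob. 0.9 and long (e.g., after a fault) with 0.1:
job = {2: 0.9, 5: 0.1}
print(deadline_failure_probability([job, job], deadline=8))  # 0.01
```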
Item: Optimization and analysis for dependable application software on unreliable hardware platforms (2019)
Chen, Kuan-Hsun; Chen, Jian-Jia; Ernst, Rolf
As chip technology keeps shrinking towards higher densities and lower operating voltages, memory and logic components are now vulnerable to electromagnetic interference and radiation, leading to transient faults in the underlying hardware, which may jeopardize the correctness of software execution and cause so-called soft errors. To mitigate the threat of soft errors, embedded-software developers have started to deploy Software-Implemented Hardware Fault Tolerance (SIHFT) techniques. However, their main cost is the significant amount of additional computation time they incur. To support safety-critical systems, e.g., computing systems in automotive and avionic devices, real-time system technology has been primarily used and widely studied. When considering hardware transient faults and SIHFT techniques together with real-time system technology, novel scheduling approaches and schedulability analyses are desired to provide a less pessimistic off-line guarantee for timeliness, or at least a certain degree of performance for new application models. Moreover, reliability optimizations also need to be designed thoughtfully while considering different resource constraints. In this dissertation, we present three treatments for soft errors. Firstly, we study how to allow erroneous computations without deadline misses by modeling the inherent safety margins and noise tolerance of control applications as (m, k) constraints. We further discuss how a given (m, k) requirement can be satisfied by individual error detection and flexible compensation while satisfying the given hard real-time constraints. Secondly, we analyze the probability of deadline misses and the deadline miss rate in soft real-time systems, which allow occasional deadline misses without erroneous computations. Thirdly, we consider how to deploy redundant multi-threading techniques to improve system reliability under two different system models for multi-core systems: 1) under core-to-core frequency variations, we address the reliability-aware task-mapping problem; 2) we decide on redundancy levels for each task while satisfying the given real-time constraints and the limited number of redundant cores, even under multi-tasking. Finally, an enhancement for real-time operating systems is provided to maintain strict periodicity under task overruns due to potential transient faults, especially on the popular platform Real-Time Executive for Multiprocessor Systems (RTEMS).
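An (m, k) constraint requires at least m correct job instances within any window of k consecutive instances. A minimal runtime monitor for such a constraint could look as follows; this is an illustrative sketch only, since the dissertation's contribution is satisfying the constraint by scheduling and compensation, not merely monitoring it.

```python
from collections import deque

class MKMonitor:
    """Sliding-window monitor for an (m, k) constraint: at least m of any
    k consecutive job instances must be correct."""
    def __init__(self, m, k):
        self.m, self.k = m, k
        self.window = deque(maxlen=k)   # outcomes of the last k jobs

    def job_finished(self, correct: bool) -> bool:
        """Record one job outcome; return True while (m, k) still holds."""
        self.window.append(correct)
        if len(self.window) < self.k:   # window not yet full
            return True
        return sum(self.window) >= self.m

mon = MKMonitor(m=2, k=3)
for outcome in [True, False, True, False, False]:
    print(mon.job_finished(outcome))    # True True True False False
```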
Item: Methods for efficient resource utilization in statistical machine learning algorithms (2018)
Kotthaus, Helena; Marwedel, Peter; Rahnenführer, Jörg
In recent years, statistical machine learning has emerged as a key technique for tackling problems that elude a classic algorithmic approach. One such problem, with a major impact on human life, is the analysis of complex biomedical data. Solving this problem in a fast and efficient manner is of major importance, as it enables, e.g., the prediction of the efficacy of different drugs for therapy selection. While achieving the highest possible prediction quality appears desirable, doing so is often simply infeasible due to resource constraints. Statistical learning algorithms for predicting the health status of a patient or for finding the best algorithm configuration for the prediction require an excessively high amount of resources. Furthermore, these algorithms are often implemented with no awareness of the underlying system architecture, which leads to sub-optimal resource utilization. This thesis presents methods for the efficient resource utilization of statistical learning applications. The goal is to reduce the resource demands of these algorithms to meet a given time budget while simultaneously preserving the prediction quality. As a first step, the resource consumption characteristics of learning algorithms are analyzed, as well as their scheduling on the underlying parallel architectures, in order to develop optimizations that enable these algorithms to scale to larger problem sizes. For this purpose, new profiling mechanisms are incorporated into a holistic profiling framework. The results show that one major contributor to the resource issues is memory consumption. To overcome this obstacle, a new optimization based on the dynamic sharing of memory is developed that speeds up computation by several orders of magnitude in situations where available main memory is the bottleneck and would otherwise lead to swapping. One important application is model-based optimization for the automated parameter tuning of learning algorithms: within a huge search space, algorithm configurations are evaluated to find the configuration with the best prediction quality. An important step towards better managing this search space is to parallelize the search process itself. However, high runtime variance within the configuration space can cause inefficient resource utilization. For this purpose, new resource-aware scheduling strategies are developed that efficiently map evaluations of configurations to the parallel architecture, depending on their resource demands. In contrast to classical scheduling problems, the new scheduler interacts with the configuration-proposal mechanism to select configurations with suitable resource demands. With these strategies, it becomes possible to exploit the full potential of parallel architectures. Compared to established parallel execution models, the results show that the new approach enables model-based optimization to converge faster to the optimum within a given time budget.

Item: Efficient implementation of resource-constrained cyber-physical systems using multi-core parallelism (2018)
Neugebauer, Olaf; Marwedel, Peter; Müller, Heinrich
The quest for more performance of applications and systems became more challenging in recent years. Especially in the cyber-physical and mobile domain, performance requirements have increased significantly, and applications previously found in the high-performance domain now emerge in the resource-constrained domain. Modern heterogeneous high-performance MPSoCs provide a solid foundation to satisfy this high demand. Such systems combine general processors with specialized accelerators ranging from GPUs to machine learning chips. On the other side of the performance spectrum, the demand for small, energy-efficient systems posed by modern IoT applications has increased vastly. Developing efficient software for such resource-constrained multi-core systems is an error-prone, time-consuming, and challenging task. This thesis provides, with PA4RES, a holistic semi-automatic approach to parallelize and implement applications for such platforms efficiently. Our solution supports the developer in finding good trade-offs to tackle the requirements posed by modern applications and systems. With PICO, we propose a comprehensive approach to express parallelism in sequential applications. PICO detects data dependencies and implements the required synchronization automatically. Using a genetic algorithm, PICO optimizes the data synchronization. The evolutionary algorithm considers channel capacity, memory mapping, channel merging, and the flexibility offered by the channel implementation with respect to execution time, energy consumption, and memory footprint. PICO's communication optimization phase was able to generate a speedup of almost 2 or an energy improvement of 30% for certain benchmarks. The PAMONO sensor approach enables the fast detection of biological viruses using optical methods. With sophisticated virus-detection software, real-time virus detection running on stationary computers was achieved. Within this thesis, we derive a soft real-time capable virus detection running on a high-performance embedded system commonly found in today's smartphones. This was accomplished with a smart DSE algorithm that optimizes for execution time, energy consumption, and detection quality. Compared to a baseline implementation, our solution achieved a speedup of 4.1 and 87% energy savings and satisfied the soft real-time requirements. Accepting a degradation of the detection quality, which is still usable in a medical context, led to a speedup of 11.1. This work provides the fundamentals for a truly mobile real-time virus detection solution. The growing demand for processing power can no longer be satisfied by well-known approaches such as higher frequencies. These so-called performance walls pose a serious challenge to the growing performance demand. Approximate computing is a promising approach to overcome, or at least shift, the performance walls by accepting a degradation in output quality to gain improvements in other objectives. Especially for the safe integration of approximation into existing applications, or during the development of new approximation techniques, a method to assess the impact on output quality is essential. With QCAPES, we provide a multi-metric assessment framework to analyze the impact of approximation. Furthermore, QCAPES provides useful insights into the impact of approximation on execution time and energy consumption. With ApproxPICO, we propose an extension to PICO that considers approximate computing during the parallelization of sequential applications.

Item: Memory-aware platform description and framework for source-level embedded MPSoC software optimization (2017)
Pyka, Robert; Marwedel, Peter; Teubner, Jens
Developing optimizing source-level transformations consists of numerous non-trivial subtasks. Besides identifying the actual optimization goals within a particular target-platform and compiler setup, the actual implementation is tedious, error-prone, and often recurring work. Providing appropriate support for this development work is a challenging task. Defining and implementing a well-suited target-platform description that can be used by a wide set of optimization techniques while being precise and easy to maintain is one dimension of this challenge. Another dimension, also tackled in this work, deals with providing an infrastructure for optimization-step representation, interaction, and data retention. Finally, an appropriate source-code representation has been integrated into this approach. These contributions are tightly related to each other and have been bundled into the MACCv2 framework, a full-fledged approach for implementing and integrating optimization techniques. Together, they significantly reduce the effort required to implement source-level memory-aware optimization techniques for multiprocessor systems-on-chip (MPSoCs). The system-modeling approach presented in this dissertation is located at the processor-memory-switch (PMS) abstraction level. It offers a novel combined structural and semantic description. It unites a locally scoped, structural modeling approach, as preferred by system designers, with a fast, database-like interface best suited for optimization-technique developers. It supports model refinement and requires only limited effort for an initial abstract system model. The general structure consists of components and channels. Based on this structure, the system model provides mechanisms for database-like access to system-global target-platform properties, while requiring only the definition of locally scoped input data annotated to system-model items. A typical set of these properties contains energy-consumption and access-latency values. The request-based retrieval of system properties is a unique feature that makes this approach superior to state-of-the-art approaches based on table lookups or full-system simulation. Combining such component-local properties into system-global target-platform data is performed via aspect handlers. These handlers define computational rules that are applied to correlated locally scoped data along access paths in the memory-subsystem hierarchy. This approach is capable of calculating such system-global values at a rate similar to plain table lookups, while maintaining a precision close to full-system-simulation-based estimations. This has been shown for both the energy-consumption and the access-latency values of the MPARM platform. The MACCv2 framework provides a set of fundamental services to the optimization-technique developer. On top of these services, a system model and a source-code representation are provided. Further, framework-based optimization-technique implementations are encapsulated into self-contained entities exposing well-defined interfaces. This framework has been successfully used within the European Commission funded MNEMEE project. The hierarchical processing-step representation in MACCv2 allows the encapsulation of tasks at various granularity levels. For simplified reuse in future projects, the entire toolchain as well as individual optimization techniques have been represented as processing-step entities in terms of MACCv2. A common notion of target-platform structure and properties, as well as inter-processing-step communication, is achieved via framework-provided services. The system-modeling approach and the framework exhibit the right set of properties to support the development of memory-aware optimization techniques. The MNEMEE project, continued research work, teaching activities, and PhD theses have been successfully founded on the approaches and the framework proposed in this dissertation.
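The request-based property retrieval can be pictured as folding component-local annotations along an access path. The additive rule below is a deliberately simple stand-in for MACCv2's aspect handlers, which support arbitrary computational rules; the class and function names are our own.

```python
class Component:
    """A system-model item carrying only locally scoped annotations."""
    def __init__(self, name, latency=0, energy=0.0):
        self.name, self.latency, self.energy = name, latency, energy

def access_cost(path):
    """Toy 'aspect handler': fold local latency/energy annotations along a
    memory access path into system-global values (additive rule assumed)."""
    return (sum(c.latency for c in path), sum(c.energy for c in path))

cpu = Component("CPU")
bus = Component("Bus", latency=2, energy=0.4)
spm = Component("Scratchpad", latency=1, energy=0.1)
print(access_cost([cpu, bus, spm]))   # (3, 0.5) per access along this path
```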
Item: Scheduling algorithms and timing analysis for hard real-time systems (2017)
Huang, Wen-Hung Kevin; Chen, Jian-Jia; Reineke, Jan
Real-time systems are designed for applications in which response time is critical. As timing is a major property of such systems, proving timing correctness is of utmost importance. To achieve this, a two-fold approach to timing analysis is traditionally involved: (i) worst-case execution time (WCET) analysis, which computes an upper bound on the execution time of a single job of a task running in isolation; and (ii) schedulability analysis, using the WCET as input, which determines whether multiple tasks are guaranteed to meet their deadlines. Formal models used for representing recurrent real-time tasks have traditionally been characterized by a collection of independent jobs that are released periodically. However, such modeling may result in resource under-utilization in systems whose behavior is not entirely periodic or independent. Examples are (i) multicore platforms where tasks share a communication fabric, like a bus, for accesses to a shared memory besides the processors; (ii) tasks with synchronization, where no two concurrent accesses to one shared resource are allowed to be in their critical sections at the same time; and (iii) automotive systems where tasks are linked to rotation (e.g., of the crankshaft, gears, or wheels) and their activation rate is proportional to the angular velocity of a specific device. This dissertation presents multiple approaches towards designing scheduling algorithms and schedulability analyses for a variety of real-time systems with different characteristics. Specifically, we look at those design problems from the perspective of the speedup factor, a metric that quantifies both the pessimism of the analysis and the non-optimality of the scheduling algorithm. The proposed solutions are shown to be promising not only by means of the speedup factor but also through extensive evaluations.
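For orientation, the speedup factor can be stated as follows (a standard definition from the literature, not specific to this thesis): a scheduling algorithm $\mathcal{A}$ together with its schedulability test has speedup factor $s \geq 1$ if, for every task set $\tau$,

```latex
\tau \text{ is feasible on a unit-speed platform}
\;\Longrightarrow\;
\tau \text{ is deemed schedulable by } \mathcal{A}
\text{ on the same platform running at speed } s .
```

The closer $s$ is to 1, the less pessimistic the analysis and the closer the algorithm is to an optimal one.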
Item: Memory-aware mapping strategies for heterogeneous MPSoC systems (2017)
Holzkamp, Olivera; Marwedel, Peter; Teubner, Jens
Embedded systems, such as mobile phones, integrate more and more features, e.g., multiple cameras, GPS, and many other sensors and actuators. These kinds of embedded systems face increasing complexity due to demands on performance and constraints on energy consumption. The performance of such systems can be increased by executing application tasks in parallel. To achieve this, multiprocessor system-on-chip (MPSoC) devices were introduced. At the same time, the energy consumption of these systems has to be decreased, especially for battery-driven embedded systems. A reduction in energy consumption can be achieved by efficiently utilizing the hardware resources of these devices. MPSoC devices can be either homogeneous or heterogeneous. Homogeneous MPSoC devices usually contain the same type of processors with the same speed, i.e., clock frequency, and the same type and size of memories for each processor. In heterogeneous MPSoC devices, the processor types and/or clock frequencies and the memory types and/or sizes may vary. During the last decade, research has dealt with optimizations for the efficient utilization of hardware resources on MPSoCs. Central issues are the extraction of parallelism from sequential code and the efficient mapping of the parallelized application tasks onto the processors of the system. A few frameworks have been developed that distribute parallelized application tasks to the available processors while optimizing for one or more objectives such as performance and energy consumption. They usually integrate all required preceding steps, such as the extraction of parallelized tasks from sequential code and the extraction of a task graph as input for the mapping optimization. These steps are performed either manually or in an automated way. Such frameworks help the embedded system designer to significantly reduce design time. Unfortunately, the influence of memories or memory hierarchies is neglected in mapping optimizations, even though it is a well-known fact that memories have a drastic impact on the runtime and energy consumption of the system. This dissertation investigates the effect of memory hierarchies in MPSoC mapping. Since a thread-based application model is used, a thread graph extraction tool is introduced. Furthermore, two approaches for memory-aware mapping optimization for homogeneous and heterogeneous embedded MPSoC devices are presented. The thread graph extraction tool extracts a flat thread graph with important annotations for software requirements, hardware performance, and energy consumption. This thread graph represents all required input information for the subsequent memory-aware mapping optimizations. Depending on the complexity of the application, the designer can choose between a fine-grained and a coarse-grained thread graph and thus influence the overall design time. The first presented memory-aware mapping approach handles single-objective optimizations, which reduce either the runtime or the energy consumption of the system. The second presented memory-aware mapping approach handles multiobjective optimization, which reduces both runtime and energy consumption. All approaches additionally reduce the work of the embedded system designer and thus the design time. They work in a fully automated way and are integrated within the MACCv2/MNEMEE tool flow. The MNEMEE tool flow also provides all required preceding steps, such as the parallelization of sequential application code. The presented evaluations show that considering memory mapping during MPSoC mapping optimization significantly reduces the application runtime and energy consumption. The single-objective optimizations achieve an average reduction in runtime of about 21% and an average reduction in energy consumption of about 28%. The multiobjective memory-aware mapping optimization achieves an average reduction in runtime of about 21% and an average reduction in energy consumption of about 26%. Both presented optimization approaches were validated for homogeneous and heterogeneous MPSoC devices. The results clearly show that neglecting the memory subsystem can lead to wasted optimization potential.
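To give a flavor of what a (much simplified) memory-aware mapping decision looks like, the sketch below greedily places each thread on the processor/memory pair that minimizes a weighted runtime/energy cost. The cost model, the numbers, and the greedy rule are illustrative assumptions; the dissertation employs genuine single- and multiobjective optimization rather than this one-shot heuristic.

```python
def map_threads(threads, resources, alpha=0.5):
    """Greedy memory-aware mapping sketch: weighted runtime/energy cost."""
    mapping = {}
    for t in threads:
        best = min(resources,
                   key=lambda r: alpha * t["cycles"] / r["freq"]
                                 + (1 - alpha) * t["accesses"] * r["mem_energy"])
        mapping[t["name"]] = best["name"]
    return mapping

threads = [{"name": "t0", "cycles": 1e6, "accesses": 2e4},
           {"name": "t1", "cycles": 4e5, "accesses": 8e4}]
resources = [{"name": "fast_core+DRAM", "freq": 8e8, "mem_energy": 2e-9},
             {"name": "slow_core+SPM",  "freq": 2e8, "mem_energy": 4e-10}]
print(map_threads(threads, resources))
```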