Optimization and analysis for dependable application software on unreliable hardware platforms
Date
2019
Abstract
As chip technology keeps shrinking towards higher densities and lower operating voltages, memory and logic components are now vulnerable to electromagnetic interference and radiation. This leads to transient faults in the underlying hardware, which may jeopardize the correctness of software execution and cause so-called soft errors. To mitigate the threat of soft errors, embedded-software developers have started to deploy Software-Implemented Hardware Fault Tolerance (SIHFT) techniques. However, the main cost of SIHFT techniques is the significant time overhead of the additional computation they introduce. To support safety-critical systems, e.g., computing systems in automotive and avionic devices, real-time systems technology has been primarily used and widely studied. When hardware transient faults and SIHFT techniques are considered together with real-time systems technology, novel scheduling approaches and schedulability analyses are needed to provide a less pessimistic off-line guarantee of timeliness, or at least to provide a certain degree of performance for new application models. Moreover, reliability optimizations also need to be designed carefully under different resource constraints.
In this dissertation, we present three treatments for soft errors. Firstly, we study how to tolerate erroneous computations without deadline misses by modeling the inherent safety margins and noise tolerance of control applications as (m, k) constraints. We further discuss how a given (m, k) requirement can be satisfied by individual error detection and flexible compensation while meeting the given hard real-time constraints. Secondly, we analyze the probability of deadline misses and the deadline-miss rate in soft real-time systems, which allow occasional deadline misses but no erroneous computations.
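To make the (m, k) constraint concrete, the sketch below checks a recorded sequence of job outcomes against it. It assumes the common weakly-hard reading, in which at least m of any k consecutive jobs must be correct (equivalently, at most k - m of them may be erroneous); the function name `satisfies_mk` and the boolean encoding of outcomes are illustrative assumptions, not the dissertation's formulation or analysis.

```python
from collections import deque
from typing import Iterable


def satisfies_mk(outcomes: Iterable[bool], m: int, k: int) -> bool:
    """Check an (m, k) constraint on a sequence of job outcomes.

    Hypothetical helper. outcomes: True for a correct (error-free) job,
    False for an erroneous one. The constraint holds if every window of
    k consecutive jobs contains at least m correct jobs, i.e. at most
    k - m erroneous ones. Windows shorter than k (at the start) are held
    to the same error budget.
    """
    if not 0 < m <= k:
        raise ValueError("require 0 < m <= k")
    window = deque(maxlen=k)  # outcomes of the most recent k jobs
    errors = 0                # erroneous jobs currently inside the window
    for ok in outcomes:
        if len(window) == k and not window[0]:
            errors -= 1       # the oldest (erroneous) job is about to leave the window
        window.append(ok)     # the deque drops the oldest entry automatically
        if not ok:
            errors += 1
        if errors > k - m:
            return False      # some window has fewer than m correct jobs
    return True


if __name__ == "__main__":
    # (3, 5): at least 3 of any 5 consecutive jobs must be correct.
    print(satisfies_mk([True, False, True, False, True, True], 3, 5))   # True
    print(satisfies_mk([True, False, False, False, True, True], 3, 5))  # False
```

The sliding-window counter keeps the check linear in the number of jobs; an on-line variant of the same counter could tell a scheduler whether skipping error handling for the next job would still keep the (m, k) requirement satisfied.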
Thirdly, we consider how to deploy redundant multi-threading techniques to improve system reliability under two different system models for multi-core systems: 1) under core-to-core frequency variations, we address the reliability-aware task-mapping problem; 2) we decide on the redundancy level of each task while satisfying the given real-time constraints and the limited number of redundant cores, even under multi-tasking. Finally, we provide an enhancement for real-time operating systems that maintains strict periodicity in the presence of task overruns caused by potential transient faults, in particular on one popular platform, the Real-Time Executive for Multiprocessor Systems (RTEMS).
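As a toy illustration of deciding redundancy levels under a limited number of cores, the sketch below brute-forces one level per task and maximizes system reliability. It assumes that every replica runs on its own core, fails independently with a per-task probability, and that replicas are combined by majority voting (N-modular redundancy with odd N). It ignores timing entirely, so it does not capture the real-time constraints or multi-tasking considered in the dissertation; `nmr_reliability`, `choose_levels`, and the level set `(1, 3)` are hypothetical.

```python
from itertools import product
from math import comb, prod


def nmr_reliability(p_fail: float, n: int) -> float:
    """Probability that majority voting over n replicas yields a correct result.

    Assumes n is odd and each replica fails independently with p_fail.
    """
    r = 1.0 - p_fail
    need = n // 2 + 1  # number of correct replicas needed to win the vote
    return sum(comb(n, i) * r**i * p_fail**(n - i) for i in range(need, n + 1))


def choose_levels(p_fails, cores, levels=(1, 3)):
    """Brute-force one redundancy level per task (toy model, no timing).

    Maximizes the product of per-task reliabilities subject to the total
    number of replicas not exceeding `cores`. Returns (None, -1.0) if no
    combination fits.
    """
    best, best_rel = None, -1.0
    for choice in product(levels, repeat=len(p_fails)):
        if sum(choice) > cores:
            continue  # this combination needs more cores than available
        rel = prod(nmr_reliability(p, n) for p, n in zip(p_fails, choice))
        if rel > best_rel:
            best, best_rel = choice, rel
    return best, best_rel


if __name__ == "__main__":
    levels, rel = choose_levels([0.01, 0.05], cores=5)
    print(levels, round(rel, 4))  # (1, 3) 0.9828
```

With only five cores, triplicating both tasks is infeasible, and the search spends the spare cores on the less reliable task. Exhaustive search grows exponentially with the number of tasks, so it only serves to show the trade-off, not as a practical optimization method.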
Keywords
Real-time systems, Fault tolerance, Embedded systems