Efficient fault-injection-based assessment of software-implemented hardware fault tolerance

Schirmeier, Horst Benjamin

Files

Dissertation.pdf (4.54 MB)

Date

2016

Authors

Schirmeier, Horst Benjamin

Abstract

With continuously shrinking semiconductor structure sizes and lower supply voltages, the per-device susceptibility to transient and permanent hardware faults is on the rise. A class of countermeasures with growing popularity is Software-Implemented Hardware Fault Tolerance (SIHFT), which avoids expensive hardware mechanisms and can be tailored to the individual application. However, SIHFT can, counterintuitively, cause more harm than good, because its overhead in execution time and memory space also increases the figurative “attack surface” of the system. It turns out that application-specific configuration of SIHFT is in fact a necessity rather than just an advantage. Consequently, target programs need to be analyzed for particularly critical spots to harden. SIHFT-hardened programs need to be measured and compared throughout all development phases of the program to observe reliability improvements or deteriorations over time. Additionally, SIHFT implementations need to be tested. The contributions of this dissertation focus on Fault Injection (FI) as an assessment technique satisfying all these requirements: analysis, measurement and comparison, and test. I describe the design and implementation of an FI tool, named Fail*, that overcomes several shortcomings in the state of the art, and enables research on the general drawbacks of simulation-based FI. As demonstrated in four case studies in the context of SIHFT research, Fail* provides novel fine-grained analysis techniques that exploit the newly gained possibility to analyze FI results from complete fault-space exploration. These analysis techniques aid SIHFT design decisions on the level of program modules, functions, variables, source-code lines, or single machine instructions.
Based on the experience from the case studies, I address the problem of large computation efforts that accompany exhaustive fault-space exploration from two different angles: First, I develop a heuristic fault-space pruning technique that allows the total FI-experiment count to be traded freely against result accuracy, while still providing information on all possible fault-space coordinates. Second, I speed up individual TAP-based FI experiments by improving the fast-forwarding operation by several orders of magnitude for most workloads. Finally, I dissect current practices in FI-based evaluation of SIHFT-hardened programs, identify three widespread pitfalls in the result interpretation, and advance the state of the art by defining a novel comparison metric.

Keywords

Fault injection, Transient memory faults, Software-implemented hardware fault tolerance, Criticality analysis, Fault-tolerance assessment, FAIL*, Fault-similarity pruning, Smart-hopping, Extrapolated absolute failure count, Software-based fault tolerance, Software test

Subjects based on RSWK

Fault tolerance (Fehlertoleranz), Software development (Softwareentwicklung)

URI

http://hdl.handle.net/2003/35175
http://dx.doi.org/10.17877/DE290R-17222

Collections

Eingebettete Systemsoftware
