Publication:
Efficient selective replication of critical code regions for SDC mitigation leveraging redundant multithreading

dc.contributor.author: ARSLAN YILMAZ, SANEM
dc.contributor.authors: Arslan, Sanem; Unsal, Osman
dc.date.accessioned: 2022-03-12T22:57:20Z
dc.date.accessioned: 2026-01-10T19:46:11Z
dc.date.available: 2022-03-12T22:57:20Z
dc.date.issued: 2021
dc.description.abstract: Redundant multithreading (RMT) is an effective reliability solution that provides thread-level replication; however, it imposes additional overheads in terms of performance loss or energy consumption. Partial-RMT is an alternative solution that provides partial redundancy of an executing thread to reduce such overheads while trading off full coverage from faults. In this study, we propose a software-level RMT approach that offers lightweight replication of partial code regions within the same application process. Our software-level RMT approach is particularly suitable for applications with varying code criticality, where we determine the critical code regions by performing a fault injection campaign in addition to execution time profile analysis. Using the results of the previous step, the application programmer annotates the source code to indicate the specific code regions that should be executed redundantly without re-implementing the application program from scratch. Our lightweight software-level RMT tool improves the average silent data corruption (SDC) rate of 30 applications of the PolyBench benchmark suite by around 7.6x with average performance and energy consumption overheads of 22 and 37%, respectively, compared to the original version of the program.
dc.identifier.doi: 10.1007/s11227-021-03804-6
dc.identifier.eissn: 1573-0484
dc.identifier.issn: 0920-8542
dc.identifier.uri: https://hdl.handle.net/11424/237027
dc.identifier.wos: WOS:000648823700006
dc.language.iso: eng
dc.publisher: SPRINGER
dc.relation.ispartof: JOURNAL OF SUPERCOMPUTING
dc.rights: info:eu-repo/semantics/closedAccess
dc.subject: Redundant multithreading
dc.subject: Fault tolerance
dc.subject: Soft error reliability
dc.subject: Software reliability
dc.title: Efficient selective replication of critical code regions for SDC mitigation leveraging redundant multithreading
dc.type: article
dspace.entity.type: Publication
oaire.citation.endPage: 14160
oaire.citation.issue: 12
oaire.citation.startPage: 14130
oaire.citation.title: JOURNAL OF SUPERCOMPUTING
oaire.citation.volume: 77
