Publication:
Exploring Memory Error Vulnerability for Parallel Programming Models

dc.contributor.authors: Oz, Isil; Gil, Marisa; Utrera, Gladys; Martorell, Xavier
dc.contributor.editor: Wyrzykowski, R
dc.contributor.editor: Deelman, E
dc.contributor.editor: Dongarra, J
dc.contributor.editor: Karczewski, K
dc.contributor.editor: Kitowski, J
dc.contributor.editor: Wiatr, K
dc.date.accessioned: 2022-03-12T16:16:27Z
dc.date.accessioned: 2026-01-11T14:28:08Z
dc.date.available: 2022-03-12T16:16:27Z
dc.date.issued: 2016
dc.description.abstract: Transistor size reduction and more aggressive power modes in HPC platforms make chip components more error prone. In this context, HPC applications can have a diverse level of tolerance to memory errors that may change the execution in different ways. As the tolerance to memory errors depends on write frequency and access patterns, different programming models may exhibit a different behavior in the rate of failures and alleviate the performance loss caused by the overhead of fault-tolerance mechanisms. In this paper, we explore how tolerant to memory errors are two main parallel programming models, message-passing and shared memory: we perform a memory vulnerability analysis and also conduct error propagation experiments to observe the effect of memory errors through program flow. Our results show the need for soft error resiliency methods based on memory behavior of programs, and the evaluation of the tradeoffs between performance and reliability.
dc.identifier.doi: 10.1007/978-3-319-32149-3_1
dc.identifier.eissn: 1611-3349
dc.identifier.isbn: 978-3-319-32149-3; 978-3-319-32148-6
dc.identifier.issn: 0302-9743
dc.identifier.uri: https://hdl.handle.net/11424/225755
dc.identifier.wos: WOS:000400134500001
dc.language.iso: eng
dc.publisher: SPRINGER INTERNATIONAL PUBLISHING AG
dc.relation.ispartof: PARALLEL PROCESSING AND APPLIED MATHEMATICS, PPAM 2015, PT I
dc.relation.ispartofseries: Lecture Notes in Computer Science
dc.rights: info:eu-repo/semantics/closedAccess
dc.subject: Memory errors
dc.subject: Reliability
dc.subject: SDC
dc.subject: Programming models
dc.title: Exploring Memory Error Vulnerability for Parallel Programming Models
dc.type: conferenceObject
dspace.entity.type: Publication
oaire.citation.endPage: 11
oaire.citation.startPage: 3
oaire.citation.title: PARALLEL PROCESSING AND APPLIED MATHEMATICS, PPAM 2015, PT I
oaire.citation.volume: 9573
