Publication: Exploring Memory Error Vulnerability for Parallel Programming Models
| dc.contributor.authors | Oz, Isil; Gil, Marisa; Utrera, Gladys; Martorell, Xavier | |
| dc.contributor.editor | Wyrzykowski, R | |
| dc.contributor.editor | Deelman, E | |
| dc.contributor.editor | Dongarra, J | |
| dc.contributor.editor | Karczewski, K | |
| dc.contributor.editor | Kitowski, J | |
| dc.contributor.editor | Wiatr, K | |
| dc.date.accessioned | 2022-03-12T16:16:27Z | |
| dc.date.accessioned | 2026-01-11T14:28:08Z | |
| dc.date.available | 2022-03-12T16:16:27Z | |
| dc.date.issued | 2016 | |
| dc.description.abstract | Transistor size reduction and more aggressive power modes in HPC platforms make chip components more error prone. In this context, HPC applications can have diverse levels of tolerance to memory errors that may change the execution in different ways. As the tolerance to memory errors depends on write frequency and access patterns, different programming models may exhibit different behavior in the rate of failures and may alleviate the performance loss caused by the overhead of fault-tolerance mechanisms. In this paper, we explore how tolerant two main parallel programming models, message passing and shared memory, are to memory errors: we perform a memory vulnerability analysis and conduct error propagation experiments to observe the effect of memory errors through program flow. Our results show the need for soft error resiliency methods based on the memory behavior of programs, and for the evaluation of the tradeoffs between performance and reliability. | |
| dc.identifier.doi | 10.1007/978-3-319-32149-3_1 | |
| dc.identifier.eissn | 1611-3349 | |
| dc.identifier.isbn | 978-3-319-32149-3; 978-3-319-32148-6 | |
| dc.identifier.issn | 0302-9743 | |
| dc.identifier.uri | https://hdl.handle.net/11424/225755 | |
| dc.identifier.wos | WOS:000400134500001 | |
| dc.language.iso | eng | |
| dc.publisher | SPRINGER INTERNATIONAL PUBLISHING AG | |
| dc.relation.ispartof | PARALLEL PROCESSING AND APPLIED MATHEMATICS, PPAM 2015, PT I | |
| dc.relation.ispartofseries | Lecture Notes in Computer Science | |
| dc.rights | info:eu-repo/semantics/closedAccess | |
| dc.subject | Memory errors | |
| dc.subject | Reliability | |
| dc.subject | SDC | |
| dc.subject | Programming models | |
| dc.title | Exploring Memory Error Vulnerability for Parallel Programming Models | |
| dc.type | conferenceObject | |
| dspace.entity.type | Publication | |
| oaire.citation.endPage | 11 | |
| oaire.citation.startPage | 3 | |
| oaire.citation.title | PARALLEL PROCESSING AND APPLIED MATHEMATICS, PPAM 2015, PT I | |
| oaire.citation.volume | 9573 |
