Dr. Paul V. Gratz, an assistant professor in the Department of Electrical and Computer Engineering at Texas A&M University, received a HiPEAC Paper Award from the HiPEAC (High Performance and Embedded Architecture and Compilation) Network of Excellence.
Gratz, along with his student Hyungjun Kim, his colleague Professor Vassos Soteriou of the Cyprus University of Technology, and postdoctoral researcher Dr. Arseniy Vitkovskiy, received the award for their paper "Use it or Lose it: Wear-out and Lifetime in Future Chip Multiprocessors," which appeared in the proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-46) in 2013.
Their paper examines how, in future chip multiprocessors (CMPs) with tens or even hundreds of interconnected cores or tiles, prolonged operational stress will accelerate wearout and failure. This wearout stems from several physical failure mechanisms, including Hot Carrier Injection (HCI) and Negative Bias Temperature Instability (NBTI). Each mechanism correlates with a different kind of usage-based stress, and all of them can eventually produce permanent faults.
While the wearout of an individual core in a many-core CMP may not be catastrophic for the system, a single fault in the inter-processor Network-on-Chip (NoC) fabric could render the entire chip useless: it could cause protocol-level deadlocks or even cut off vital components, such as the memory controller or other critical I/O, from the rest of the chip.
In the paper, the authors develop critical path models for HCI- and NBTI-induced wear under the actual stresses produced by real workloads, and apply these models to the interconnect microarchitecture. A key finding of this modeling is that wearout in the CMP on-chip interconnect correlates with a lack of load on the NoC routers, rather than with high load. Building on this finding, the authors develop a novel wearout-decelerating scheme in which the wearout-sensitive components of lightly loaded routers are deliberately exercised, without significantly affecting the cycle time, pipeline depth, area, or power consumption of the overall router. The proposed design increases CMP lifetime by 13.8x to 65x.
Gratz is a member of the department's computer engineering and systems group. He received his Ph.D. in electrical and computer engineering from the University of Texas at Austin in 2008. His research interests include power, reliability and performance in multicore and distributed computer architectures, processor memory systems, and on-chip interconnection networks. His honors include a Teaching Excellence Award from The Texas A&M University System and a Best Paper Award at the ASPLOS '09 conference.