Power consumption and process variability are two important, interconnected challenges for future-generation large-scale HPC data centers. Current production petaflop supercomputers consume more than 10 megawatts of power, which costs millions of dollars every year. As HPC moves towards exascale, power consumption is expected to become a major concern. Not only the dynamic behavior of HPC applications (such as irregular or imbalanced applications) but also the dynamic behavior of HPC systems (such as thermal, power, and frequency variations among processors) make it challenging to optimize the performance and power efficiency of large-scale applications. Smart and adaptive runtime systems have great potential to handle these challenges transparently to the application.
In my thesis, I first analyze frequency, temperature, and power variations in large-scale HPC systems using thousands of cores and different applications. After identifying the causes of these variations, I propose solutions that mitigate them to improve performance and power efficiency.