High-Performance Mixed Precision Numerical Linear Algebra

  • Time: 14:00 - 15:30
  • Address: Online via Zoom
  • Speaker: Erin Claire Carson

Support for floating-point arithmetic in multiple precisions is becoming increasingly common in emerging architectures. For example, half precision is now available on the NVIDIA V100 and A100 GPUs, where it runs twice as fast as single precision with a proportional savings in energy consumption. Further, the specialized half-precision tensor cores can provide up to a 16x speedup over double-precision computations. Mixed precision capabilities are already included in many machines on the TOP500 list and are expected to be a crucial hardware feature in upcoming exascale machines. From a computational scientist's perspective, the goal is to determine how and where we can exploit mixed precision computation in our codes. This requires an understanding both of performance characteristics and of the numerical behavior of algorithms in finite precision arithmetic.

In this talk, we discuss recent and ongoing efforts in this area. In particular, we present and analyze a general algorithm for solving n×n nonsingular linear systems Ax = b based on iterative refinement in three precisions. From this, we develop GMRES-IR, a three-precision GMRES-based iterative refinement scheme that works even for ill-conditioned systems. We discuss performance results on modern GPU architectures and present the HPL-AI benchmark, based on GMRES-IR, which runs at 445 PFLOP/s in mixed precision on the Summit supercomputer, nearly triple the 148 PFLOP/s that Summit achieved on the standard HPL benchmark used for the TOP500.
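To make the three-precision iterative refinement idea concrete, here is a minimal NumPy/SciPy sketch, not the speaker's implementation. Since NumPy/SciPy offer no half-precision LU factorization, float32 stands in for the low (factorization) precision, float64 is the working precision, and longdouble stands in for the extra precision used in the residual computation; the function name ir_three_precision and its parameters are illustrative.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def ir_three_precision(A, b, max_iter=10, tol=None):
    """Iterative refinement for Ax = b using three precisions (sketch).

    Factorization precision: float32 (stand-in for half precision)
    Working precision:       float64
    Residual precision:      longdouble (stand-in for extra precision)
    """
    A64 = np.asarray(A, dtype=np.float64)
    b64 = np.asarray(b, dtype=np.float64)
    n = A64.shape[0]
    if tol is None:
        tol = n * np.finfo(np.float64).eps

    # Factor A once in the low precision; the factors are reused
    # for every correction solve below.
    lu, piv = lu_factor(A64.astype(np.float32))

    # Initial solve in low precision, promoted to working precision.
    x = lu_solve((lu, piv), b64.astype(np.float32)).astype(np.float64)

    for _ in range(max_iter):
        # Residual computed in extra precision, then rounded back
        # to working precision.
        r = (b64.astype(np.longdouble)
             - A64.astype(np.longdouble) @ x.astype(np.longdouble)
             ).astype(np.float64)
        if np.linalg.norm(r, np.inf) <= tol * np.linalg.norm(b64, np.inf):
            break
        # Correction solve in low precision with the existing factors.
        d = lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
        # Update in working precision.
        x += d
    return x

# Example usage on a random well-conditioned system.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100)) + 100 * np.eye(100)
x_true = rng.standard_normal(100)
x = ir_three_precision(A, A @ x_true)
print(np.linalg.norm(x - x_true, np.inf))
```

In GMRES-IR, the correction solve (the lu_solve call inside the loop) would instead be performed by a few GMRES iterations on the system preconditioned with the low-precision LU factors; per the abstract, this is what extends convergence to ill-conditioned systems where plain low-precision refinement stalls.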