Tensor Cores are specialized hardware units for matrix multiplication and addition, available on recent NVIDIA GPUs.
Converting the input matrices to half precision for use on Tensor Cores results in a loss of accuracy.
We recover the accuracy by applying an error correction technique and by avoiding the round-toward-zero (RZ) rounding performed inside Tensor Cores.
See our paper for more details.
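
The idea behind the error correction can be illustrated on a single dot product. The following is a minimal sketch in pure Python, not the library's implementation: each input is split into a half-precision value plus a half-precision residual, and the residual products are added back as a correction term. The helper names (`to_half`, `split`, `dot_half`, `dot_corrected`, `max_errors`) and the residual scaling factor `SCALE = 2**11` are assumptions made for this illustration. Accumulation is done in Python's double precision, so the sketch models only the input conversion error and does not model the RZ rounding inside Tensor Cores.

```python
import math
import random
import struct

# Assumption for this sketch: scale the residual so it stays well inside
# the representable range of half precision.
SCALE = 2.0 ** 11


def to_half(x):
    """Round a float to the nearest IEEE 754 binary16 value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_single(x):
    """Round a float to the nearest IEEE 754 binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def split(x):
    """Split x into a half-precision value and a scaled half-precision residual."""
    hi = to_half(x)
    lo = to_half((x - hi) * SCALE)
    return hi, lo


def dot_half(a, b):
    """Dot product with both inputs simply rounded to half precision."""
    return sum(to_half(x) * to_half(y) for x, y in zip(a, b))


def dot_corrected(a, b):
    """Dot product with a first-order correction for the conversion error."""
    main = 0.0
    corr = 0.0
    for x, y in zip(a, b):
        xh, xl = split(x)
        yh, yl = split(y)
        main += xh * yh              # product of the half-precision parts
        corr += xl * yh + xh * yl    # first-order residual terms (scaled)
    # The lo*lo term is dropped; it is far smaller than the other terms.
    return main + corr / SCALE


def max_errors(trials=20, n=256, seed=0):
    """Return the largest absolute error of each method over random inputs."""
    rng = random.Random(seed)
    err_half = 0.0
    err_corr = 0.0
    for _ in range(trials):
        a = [to_single(rng.uniform(-1.0, 1.0)) for _ in range(n)]
        b = [to_single(rng.uniform(-1.0, 1.0)) for _ in range(n)]
        ref = math.fsum(x * y for x, y in zip(a, b))
        err_half = max(err_half, abs(dot_half(a, b) - ref))
        err_corr = max(err_corr, abs(dot_corrected(a, b) - ref))
    return err_half, err_corr
```

Calling `max_errors()` returns the worst-case absolute error of the plain half-precision dot product and of the corrected one. For inputs in [-1, 1], the corrected error should be several orders of magnitude smaller.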