Learning Triton One Kernel at a Time: Matrix Multiplication

Matrix multiplication is arguably the most common operation performed by GPUs. It’s the fundamental building block of linear algebra and shows up across a wide spectrum of fields such as graphics, physics simulations and scientific computing, while being ubiquitous in machine learning.

In today’s article, we’ll break down the conceptual implementation of general matrix-matrix multiplication (GEMM) while introducing several optimisation concepts such as tiling and memory coalescing. Finally, we’ll implement GEMM in Triton!

This article is the second in a series on Triton and GPU kernels. If you are not familiar with Triton or need a refresher on GPU basics, check out the previous article! All the code showcased in this article is available on GitHub.