A CUDA C/C++ implementation comparing different loop unrolling strategies for matrix multiplication on the GPU. This project demonstrates the performance impact of various loop unrolling factors (2, 4, 8, ...
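To show what "unrolling factor" means, here is a minimal host-side sketch. It is plain C++, which nvcc also accepts as host code, and it is not the project's own kernels. The names `matmul_naive` and `matmul_unroll4` are hypothetical. Both functions compute C = A·B for row-major n×n matrices. The second one manually unrolls the inner `k` loop by a factor of 4 and uses a remainder loop for the leftover iterations when `n` is not a multiple of 4. Inside a CUDA kernel, the same transformation can also be requested with the `#pragma unroll 4` directive placed before the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference version: one multiply-add per loop iteration.
void matmul_naive(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}

// Same product with the k-loop manually unrolled by a factor of 4
// (hypothetical name; illustrates the technique, not the project's code).
void matmul_unroll4(const std::vector<float>& A, const std::vector<float>& B,
                    std::vector<float>& C, std::size_t n) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            std::size_t k = 0;
            // Main body: four multiply-adds per loop-counter update and branch.
            for (; k + 4 <= n; k += 4) {
                sum += A[i * n + k]     * B[k * n + j];
                sum += A[i * n + k + 1] * B[(k + 1) * n + j];
                sum += A[i * n + k + 2] * B[(k + 2) * n + j];
                sum += A[i * n + k + 3] * B[(k + 3) * n + j];
            }
            // Remainder loop: handles the last n % 4 iterations.
            for (; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
}
```

The unrolled loop adds the products in the same order as the naive loop, so both versions produce the same sums. Any speedup comes from executing fewer loop-counter updates and branches per multiply-add. Larger factors also raise register pressure, which is one reason the best factor has to be measured rather than assumed.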
One of the fundamental operations in machine learning is computing the inverse of a square matrix. However, not every square matrix has an inverse. The most common way to check whether a matrix has an inverse ...
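One standard test is that a square matrix is invertible exactly when its determinant is nonzero. The text above is cut off, so it is not known whether this is the method it goes on to describe. The following is a minimal sketch under that assumption. Cofactor expansion costs O(n!), so the determinant here is computed by Gaussian elimination with partial pivoting, which costs O(n³). The function names and the tolerance `eps` are hypothetical. Floating-point round-off rarely produces an exact zero, so a threshold is needed in practice.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Determinant of a row-major n x n matrix via Gaussian elimination with
// partial pivoting. M is taken by value because elimination overwrites it.
double determinant(std::vector<double> M, std::size_t n) {
    assert(M.size() == n * n);
    double det = 1.0;
    for (std::size_t col = 0; col < n; ++col) {
        // Choose the row with the largest |entry| in this column as pivot.
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < n; ++r)
            if (std::fabs(M[r * n + col]) > std::fabs(M[pivot * n + col]))
                pivot = r;
        if (M[pivot * n + col] == 0.0)
            return 0.0;  // no nonzero pivot in this column: singular
        if (pivot != col) {
            for (std::size_t c = 0; c < n; ++c)
                std::swap(M[col * n + c], M[pivot * n + c]);
            det = -det;  // each row swap flips the determinant's sign
        }
        const double p = M[col * n + col];
        det *= p;  // determinant = product of pivots (up to sign)
        // Zero out the entries below the pivot.
        for (std::size_t r = col + 1; r < n; ++r) {
            const double f = M[r * n + col] / p;
            for (std::size_t c = col; c < n; ++c)
                M[r * n + c] -= f * M[col * n + c];
        }
    }
    return det;
}

// Hypothetical helper: treats |det| <= eps as singular (eps is an assumed tolerance).
bool is_invertible(const std::vector<double>& M, std::size_t n, double eps = 1e-12) {
    return std::fabs(determinant(M, n)) > eps;
}
```

An absolute threshold is scale-dependent, because det(cA) = cⁿ·det(A). For badly scaled matrices, a relative test such as the matrix's condition number is usually more reliable.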