Whereas, CUDA programming focuses more on data parallelism. More specifically, large data can be handled using GPU where data is mapped to threads. Following diagram shows the architecture of CPU ...
Example: Each element of array a[0:N-1] is incremented by 1. On a multicore CPU: use coarse-grained parallelism; #threads = #cores. On a GPU: use fine-grained parallelism; one thread per data element!
NVIDIA’s CUDA is a general purpose parallel computing platform and programming model that accelerates deep learning and other compute-intensive apps by taking advantage of the parallel processing ...