This course will introduce you to the multiple forms of parallelism found in modern Intel architecture processors and teach you the programming frameworks for handling this parallelism in applications. You will get access to a cluster of modern manycore processors (Intel Xeon Phi architecture) for experiments with graded programming exercises.
This course applies to a range of HPC and datacenter workloads and frameworks, including artificial intelligence (AI). You will learn how to handle data parallelism with vector instructions, task parallelism in shared memory with threads, parallelism in distributed memory with message passing, and memory architecture parallelism with optimized data containers. This knowledge will help you accelerate computational applications by orders of magnitude while keeping your code portable and future-proof.
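As a rough taste of two of these forms of parallelism, here is a minimal sketch in C using OpenMP directives. It is not taken from the course materials: the function names `saxpy_simd` and `sum_threads` are made up for illustration. The first loop is marked for vectorization (data parallelism), and the second splits its iterations among threads and combines the partial sums with a reduction (shared-memory parallelism).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example, not from the course materials.
   Data parallelism: computes y[i] = a * x[i] + y[i].
   Iterations are independent and access memory with unit stride,
   so the loop is a good candidate for vector instructions. */
void saxpy_simd(size_t n, float a, const float *x, float *y) {
    assert(x != NULL && y != NULL);
    #pragma omp simd
    for (size_t i = 0; i < n; i++) {
        y[i] = a * x[i] + y[i];
    }
}

/* Hypothetical example, not from the course materials.
   Shared-memory parallelism: iterations are divided among threads,
   each thread accumulates a private partial sum, and the reduction
   clause combines the partial sums at the end of the loop. */
double sum_threads(size_t n, const double *x) {
    assert(x != NULL);
    double total = 0.0;
    #pragma omp parallel for reduction(+ : total)
    for (long i = 0; i < (long)n; i++) {
        total += x[i];
    }
    return total;
}
```

To enable the directives, compile with an OpenMP flag such as `-fopenmp` (GCC) or `-qopenmp` (Intel compiler). Without such a flag the pragmas are ignored and the same code runs serially, which is one way directive-based code stays portable.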
Who is this class for: developers of machine learning libraries and frameworks; innovators looking to combine machine learning and traditional computing in artificial intelligence systems; and engineers, students, and researchers in computational disciplines who are interested in parallel computing.
In the Introduction we will learn…
Graded: Modern code
Graded: Hello World
Graded: Vectorizing Monte-Carlo Diffusion
Multithreading with OpenMP
Graded: Multithreaded Filtering
Graded: Memory traffic
Graded: Batch FFTs in HBM
Clusters and MPI
Graded: MPI String Vibration