Tutorial I: Parallel Programming and Performance Optimization on GPUs.

Lectures with hands-on exercises will be given by Prof. P Sadayappan (Ohio State University) and Devi Sudheer Kumar (IBM Research, India).

Objective of tutorial: The one-day tutorial will focus on parallel programming and performance optimization techniques on GPUs.

12th December 2015

Time Sessions Topic
8:30 AM - 10:00 AM Session I
  • Review of shared-memory parallel programming with OpenMP
  • Introducing GPU architecture
  • Different types of memory on GPU
  • Multidimensional thread space
  • Introduction to CUDA programming
10:00 AM - 10:30 AM Tea Break

10:30 AM - 12:00 PM Session II 1 hour 10 min (lecture) + 20 min (hands-on exercises)
  • Scheduling and synchronization on GPUs
  • Fundamental factors affecting performance:
    • Coalesced global memory access
    • Warp occupancy to tolerate global memory latency
    • Contrast in loop optimizations for CPU versus GPU (loop forms in CUDA are the reverse of those in OpenMP: stride 1 vs. stride N)
Hands-on exercises
12:00 PM - 1:00 PM Lunch Break

1:00 PM - 2:30 PM Session III 1 hour (lecture) + 30 min (hands-on exercises)
  • GPU performance optimization
  • Reduction of global memory accesses via shared memory or caching
  • Thread coarsening
  • Choice of effective grid/thread-block size/shape
  • Illustrative examples of performance optimization
Hands-on exercises
2:30 PM - 3:00 PM Tea Break

3:00 PM - 4:30 PM Session IV GPU performance optimization
  • Minimizing thread divergence
  • Avoiding bank conflicts
  • Illustrative examples of performance optimization
Hands-on exercises



Lecture outline:

Instructors:

Prof. P Sadayappan  

Professor, Computer Science & Engineering

Ohio State University




Devi Sudheer Kumar  

IBM Research, Delhi, India




Email for communication: iscsgpu@googlegroups.com