HP2C is closed

The content of these web pages will not be updated anymore.

Fact Sheets on HP2C

Two fact sheets present the HP2C initiative and its projects.

HP2C Training Event: GPU Programming with CUDA Fortran and the PGI Accelerator Programming Model

Community HP2C
Wednesday 29 September 2010 - Thursday 30 September 2010

The HP2C platform announces a two-day intensive course on GPU programming with CUDA Fortran and the PGI directive-based accelerator programming model (syllabus below). Senior members of the PGI compiler development team, Michael Wolfe and Dave Norton, will conduct the two full-day tutorials and provide hands-on training. HP2C teams with a Fortran code base are expected to participate in this training event. Note that the number of participants is limited to 20.



If you want to participate you can register here (deadline: September 19, 2010).


Prerequisites: The CSCS visualization and GPGPU development cluster Eiger will be available for the hands-on training, so please ensure that you have an account on this system. For further information, contact help@cscs.ch. Participants are expected to bring their own laptops for the hands-on sessions.


Course Syllabus

1. Introduction

  • CPU architecture vs. GPU architecture
  • CPU architecture basics
  • Multicore and multiprocessor basics
  • GPU architecture basics
  • How is parallel programming for GPUs different from multicore programming?
  • What is a GPU thread and how does it execute?

2. CUDA: C and Fortran

  • The CUDA programming model
  • Host code to control GPU, allocate memory, launch kernels
  • Kernel code to execute on GPU
  • The host program
  • Declaring and allocating device memory data
  • Moving data to and from the device
  • Launching kernels
  • Writing kernels
  • What is allowed in a kernel vs. what is not allowed
  • Grids, blocks, threads, warps
  • Building and running CUDA programs
  • Compiler options
  • Running your program
  • The CUDA runtime API
  • CUDA Fortran vs. CUDA C
  • Performance tuning tips and tricks
  • Measuring performance using CUDAPROF
  • Occupancy, memory coalescing
  • Optimizing your kernels
  • Optimize communication between host and GPU
  • Optimize device memory accesses, shared memory usage
  • Optimize the kernel code
  • Debugging using emulation
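The CUDA Fortran topics above — declaring device memory, moving data, and launching kernels with grids and blocks — follow a common pattern. A minimal sketch of that pattern (illustrative only; the subroutine and array names are made up, and it assumes the PGI compiler with `pgfortran -Mcuda` and a CUDA-capable GPU):

```fortran
! Minimal CUDA Fortran sketch: scale a vector on the GPU.
module kernels
  use cudafor
contains
  ! Kernel code: attributes(global) marks a routine that runs on the GPU
  attributes(global) subroutine scale(a, n)
    real :: a(*)
    integer, value :: n
    integer :: i
    ! Global thread index from block/thread coordinates (1-based in Fortran)
    i = (blockidx%x - 1) * blockdim%x + threadidx%x
    if (i <= n) a(i) = 2.0 * a(i)
  end subroutine
end module

program main
  use cudafor
  use kernels
  integer, parameter :: n = 1024
  real :: a(n)
  real, device :: a_d(n)   ! the "device" attribute allocates GPU memory
  a = 1.0
  a_d = a                  ! assignment performs the host-to-device copy
  ! Launch: grid of (n+255)/256 blocks, 256 threads per block
  call scale<<<(n + 255) / 256, 256>>>(a_d, n)
  a = a_d                  ! device-to-host copy; a(i) is now 2.0
end program
```

The chevron syntax `<<<grid, block>>>` is the kernel launch configuration covered in the "Grids, blocks, threads, warps" topic; the implicit copies on assignment between host and device arrays are a CUDA Fortran convenience over the explicit `cudaMemcpy` calls of CUDA C.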

3. PGI Accelerator Programming Model

  • High-level GPU programming using the PGI Accelerator model
  • What role does a high-level model play?
  • Basic concepts and directive syntax
  • Accelerator compute and data regions
  • Appropriate algorithms for a GPU
  • Building and running PGI Accelerator programs
  • Command line options
  • Enabling and interpreting compiler feedback
  • Using the PGPROF source browser
  • Data movement feedback
  • Reading kernel schedules
  • Accelerator directive details
  • Compute regions
  • Clauses on the compute region directive
  • What can appear in a compute region
  • Obstacles to successful acceleration
  • Loop directive
  • Clauses on the loop directive
  • Loop schedules
  • Data regions
  • Clauses on the data region directive
  • Performance tuning tips and tricks
  • PGI Unified Binary for multiple host or multiple accelerators
  • Performance profiling information
  • Selecting an appropriate algorithm
  • Optimizing data movement between host and GPU
  • Optimizing kernel performance
  • Tuning the kernel schedule
  • Optimizing initialization time
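The directive-based approach in this section wraps existing loops in compute and data regions rather than rewriting them as kernels. A hedged sketch using the 2010-era PGI Accelerator directives (illustrative; the subroutine is made up, and this pre-OpenACC syntax assumes compilation with `pgfortran -ta=nvidia -Minfo=accel`):

```fortran
! PGI Accelerator model sketch: the compiler generates the GPU kernel
! from the annotated loop; clauses control data movement.
subroutine saxpy(n, a, x, y)
  integer :: n, i
  real :: a, x(n), y(n)
!$acc data region copyin(x(1:n)) copy(y(1:n))
!$acc region
  do i = 1, n
     y(i) = a * x(i) + y(i)
  end do
!$acc end region
!$acc end data region
end subroutine
```

Compiling with `-Minfo=accel` prints the data movement the compiler inferred and the kernel schedule chosen for each loop — the compiler feedback and kernel-schedule topics listed in the syllabus above.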

4. Wrap-up and Questions

  • Accelerators in HPC
  • Past, present, future role of accelerators in HPC
  • Past, present, future of programming models for accelerators
  • How to reach an exaflop