Large-Scale Density Functional Electronic Structure Calculations in a Systematic Wavelet Basis Set
Solving the electronic Schrödinger equation is the basis for the solution of the majority of problems in chemistry, solid-state physics, materials science, nanoscience and molecular biology. Even though density functional calculations are nowadays standard for small systems, they are at present too slow for large systems with more than a few thousand atoms.
BigDFT is a recently developed density functional theory (DFT) electronic structure code which uses Daubechies wavelets as its basis set. Wavelets combine the advantages of Gaussian basis sets and plane wave basis sets: like Gaussians, they are adaptive and localized in real space, and at the same time, like plane waves, they form a systematic basis set. A systematic basis set is a basis set that allows all quantities to be calculated with arbitrarily high accuracy provided the basis set is chosen sufficiently large.
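The systematic character rests on the defining conditions satisfied by the Daubechies filter coefficients. A minimal sketch, using the standard closed-form Daubechies-4 (D4) coefficients (not code from BigDFT), checks the normalization and orthogonality conditions that underlie the wavelet refinement relations:

```python
import math

# Daubechies-4 (D4) low-pass filter coefficients (standard closed form).
s3 = math.sqrt(3)
h = [(1 + s3), (3 + s3), (3 - s3), (1 - s3)]
h = [c / (4 * math.sqrt(2)) for c in h]

# Conditions underlying the systematic wavelet basis:
# 1) normalization:        sum(h_k)         == sqrt(2)
# 2) orthonormality:       sum(h_k^2)       == 1
# 3) shift orthogonality:  sum(h_k h_{k+2}) == 0
print(abs(sum(h) - math.sqrt(2)) < 1e-12)        # True
print(abs(sum(c * c for c in h) - 1.0) < 1e-12)  # True
print(abs(h[0] * h[2] + h[1] * h[3]) < 1e-12)    # True
```

Higher-order Daubechies families (BigDFT uses higher-order members than D4) satisfy the same conditions with longer filters.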
The BigDFT code is at present a mixed MPI/OpenMP code. Since one MPI process treats one or several Kohn-Sham orbitals, the number of MPI processes cannot exceed the number of orbitals. In order to scale this code to some $10^5$ or $10^6$ cores, the following possibilities exist. In the present MPI/OpenMP parallelization, one MPI process is executed on several cores of a node. Since the number of cores per node will increase strongly, the code will run faster once more cores become available. Given the modest OpenMP speedups of our (and other) codes, it is however questionable whether OpenMP is a promising approach. A large part of the version of the code for periodic boundary conditions has already been ported to GPUs. On a GPU multicore architecture the speedup is considerable (between 20 and 30 at present) in double precision.
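The orbital-to-process constraint can be sketched as follows; `distribute_orbitals` is a hypothetical illustration of a block distribution, not a BigDFT routine:

```python
def distribute_orbitals(n_orbitals, n_procs):
    """Block-distribute Kohn-Sham orbitals over MPI processes.

    Illustrative helper: with at least one orbital per process,
    the number of processes cannot exceed the number of orbitals.
    """
    if n_procs > n_orbitals:
        raise ValueError("more MPI processes than orbitals")
    base, extra = divmod(n_orbitals, n_procs)
    counts = [base + (1 if p < extra else 0) for p in range(n_procs)]
    starts = [sum(counts[:p]) for p in range(n_procs)]
    # One (first orbital, number of orbitals) pair per MPI rank.
    return list(zip(starts, counts))

# e.g. 10 orbitals on 4 processes:
print(distribute_orbitals(10, 4))  # [(0, 3), (3, 3), (6, 2), (8, 2)]
```

For a fixed system, the orbital count is fixed, which is why further scaling must come from intra-process parallelism (OpenMP, GPUs) or from larger systems.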
When the computing speed of a single node increases by a large amount through the use of GPUs, the relative cost of the communication part of our code will increase strongly unless the bandwidth of the communication network grows by the same factor as the single-node speed. This will presumably not happen. In order to reduce the time to solution at constant system size, one of the most important tasks in this project will therefore be to design and implement new communication algorithms which will accelerate the communication part.
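This shift in the balance can be illustrated with a toy timing model; the 90/10 compute-to-communication split and the node speedup of 25 below are assumed numbers for illustration, not measurements from BigDFT:

```python
def time_to_solution(t_comp, t_comm, node_speedup, net_speedup=1.0):
    """Toy model: node acceleration (e.g. GPUs) shrinks compute time,
    network improvements shrink communication time."""
    return t_comp / node_speedup + t_comm / net_speedup

# Assumed split: 90% compute, 10% communication before acceleration.
t = time_to_solution(90.0, 10.0, node_speedup=25.0)
comm_fraction = 10.0 / t
print(round(comm_fraction, 2))  # 0.74: communication now dominates
```

In this model the communication share jumps from 10% to roughly three quarters of the runtime once the compute part is accelerated 25x, which is why faster communication algorithms, rather than further node acceleration alone, determine the time to solution.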
Scaling to a very large number of cores can also be achieved by increasing the system size, i.e. the number of electrons. For more than some 1000 atoms a linear-scaling approach is recommended. In this case the electronic orbitals are no longer extended over the whole system but localized in smaller subvolumes. Such a localization will lead to new communication patterns which are less global than in a traditional cubically scaling algorithm.
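Why localization makes the communication less global can be sketched with a 1-D toy count of interacting orbital pairs; the unit-spaced chain and the cutoff value are assumptions for illustration only:

```python
def interacting_pairs(centers, cutoff):
    """Count orbital pairs whose localization regions overlap (1-D toy model)."""
    n = len(centers)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if abs(centers[i] - centers[j]) < cutoff)

# Orbital centers on a unit-spaced 1-D chain.
for n in (100, 200, 400):
    chain = list(range(n))
    local = interacting_pairs(chain, cutoff=5)  # grows linearly with n
    extended = n * (n - 1) // 2                 # all pairs: quadratic in n
    print(n, local, extended)
```

Localized orbitals couple only to a bounded number of neighbors, so both the arithmetic and the communication per orbital stay constant as the system grows, in contrast to extended orbitals where every pair interacts.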
A second, and relatively easy, method to scale to a very large number of processors is to combine an electronic structure calculation with a specific application which will lead to an additional level of parallelization. We will combine a parallel global optimization algorithm with the BigDFT program.
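The additional level of parallelization can be sketched by partitioning the available ranks into independent replica groups, each running one DFT calculation for the global optimizer, in the spirit of MPI_Comm_split; `split_ranks` is a hypothetical illustration, not part of BigDFT:

```python
def split_ranks(world_size, n_replicas):
    """Partition a flat set of MPI ranks into equal replica groups,
    mimicking MPI_Comm_split: each rank gets a (replica, local_rank) pair."""
    if world_size % n_replicas != 0:
        raise ValueError("world size must be divisible by the replica count")
    group = world_size // n_replicas
    return [(rank // group, rank % group) for rank in range(world_size)]

# 8 ranks, 2 concurrent DFT replicas driven by a global optimizer:
print(split_ranks(8, 2))
# [(0, 0), (0, 1), (0, 2), (0, 3), (1, 0), (1, 1), (1, 2), (1, 3)]
```

Since the replicas exchange only occasional optimizer-level data (energies, configurations), this outer level multiplies the usable core count with very little extra communication.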
- Prof. Stefan Goedecker, University of Basel
- Stephan Mohr