- Cholla-PM Tests on Summit
Cholla-PM Tests on Summit
This website documents the procedure for performing the Cholla-PM tests on Summit.
Changes to Cholla-PM for Summit Tests
Currently, we are lacking an initial conditions generator for Cholla for tests at scale. The issue is that we have been using MUSIC, which does not use MPI and therefore is limited to a single system memory.
This issue is compounded by the need to run on arbitrary even numbers of processes without keeping a cubic domain. We can subdivide a single cubic initial conditions, but that would require a large number of particles and a reduced timestep (for the hydro).
The plan instead is to simply replicate either a 128^3 or 256^3 box onto every process and use a computational domain that maps onto the domain of mpi processes determined via the usual functions in Cholla. This would make rectangular domains.
I have gone through the code and added “TILING” preprocessor definitions that alter the behavior of the code. Basically I have each process read in a single snapshot + particle file and replicate it locally, shifting appropriately for the processor location within the domain. This also requires simlinks to be created for each process to the single snapshot files, in the “0.h5.?” and “0_part.h5.?” formats.
Installation and Tests on Sparkle
First, I installed and tested Cholla-PM on sparkle. Here is the procedure
git clone https://github.com/bvillasen/cholla.git cd cholla git checkout -b particles git pull origin particles
Note we will build from the cholla-pm.tiling.tar.gz tarball on Summit.
wget http://www.fftw.org/fftw-3.3.7.tar.gz tar -zxvf fftw-3.3.7.tar.gz cd fftw-3.3.7/ ./bootstrap.sh ./configure --enable-mpi --enable-openmp --enable-threads --disable-shared (Add a path) make -j 20 make install
unzip pfft-master.zip cd pfft-master autoreconf --install ./configure --disable-fortran (Add a path) make -j 20 make install ./configure --enable-openmp --disable-fortran --disable-shared (Add a path, remake with openmp) make -j 20 make install
wget https://support.hdfgroup.org/ftp/HDF5/current18/src/hdf5-1.8.20.tar.gz tar -zxvf hdf5-1.8.20.tar.gz cd hdf5-1.8.20/ ./configure --enable-cxx --disable-shared (Add a path) make -j 20 make install
On sparkle, I’m using /home/brant/github/bruno/makefile_tiling. Note we require libz when compiling hdf5 with –disable-shared. And some of the compiling sequences needed to be changed.
Run simple tests
I have run simple tests with the set of 256^3 ICs that Bruno provided, using 2 and 4 processes on sparkle. The weak scaling is fine, with each step taking about 5 seconds.
Installation and Tests on Summit
Below I detail the process for installing cholla-pm and running tests on Summit.
Connecting to Summit
There is connection information on the Summit website.
Currently, you have to connect to an internal OLCF system first, via, e.g.,
home.ccs.ornl.gov. Connection passcode is pin + RSA key. Note that summit has a different address,
Source code location
Also copied at:
Modules for compilation
module load cuda module load hdf5 module load spectrum-mpi
Well, the FFTW module on Summit does not have mpi enabled! So we have a harder route.
wget http://www.fftw.org/fftw-3.3.7.tar.gz tar -zxvf fftw-3.3.7.tar.gz cd fftw-3.3.7/ ./bootstrap.sh ./configure --enable-mpi --enable-openmp --enable-threads --disable-shared --prefix=/ccs/home/brantr/code/fftw make -j 20 make install
First, load modules. Then:
cd ~/code/pfft unzip pfft-master.zip cd pfft-master ./bootstrap.sh ./configure --disable-fortran --disable-shared --with-fftw3=/ccs/home/brantr/code/fftw --prefix=/ccs/home/brantr/code/pfft make -j 20 make install
Compiling Cholla on Summit
I used the makefile at:
which is also at:
I had to change N_OMP_THREADS to 6 in global.h.
Reminders about LSF
To submit a job:
To check on your jobs
To kill a job
Running Cholla-PM on Summit
The tests were run out of the directory:
This directory contained the cholla-mp executable, the cosmo_tiling.txt file:
########################################## # number of grid cells in the x dimension nx=256 # number of grid cells in the y dimension ny=256 # number of grid cells in the z dimension nz=256 # output time tout=10000000000 # how often to output outstep=10000 times_output=output_list.txt # value of gamma gamma=1.66666667 # name of initial conditions init=Read_Grid nfile=0 n_parts_initFiles=1 time_max=1000000000 # domain properties xmin=0.0 ymin=0.0 zmin=0.0 xlen=115000.0 ylen=115000.0 zlen=115000.0 # type of boundary conditions xl_bcnd=1 xu_bcnd=1 yl_bcnd=1 yu_bcnd=1 zl_bcnd=1 zu_bcnd=1 outdir=./dat/ indir=./ics/ics_256/
The output_list.txt file:
There was a subdirectory
ics/ics_256/, which contained the
0_parts.h5 files. I created symlinks for the process ICs using the following script
#!/bin/bash I=$1 while [ $I -lt $(($2+1)) ]; do $BASE `printf "ln -s 0.h5 0.h5.%d" $I` $BASE `printf "ln -s 0_parts.h5 0_parts.h5.%d" $I` I=$(($I+1)) done
An ICs symlink needs to be made for each process.
To run the test, I used the following LSF script:
#!/bin/bash #BSUB -P CSC275robertson #BSUB -W 0:30 #BSUB -nnodes 1 #BSUB -alloc_flags gpumps #BSUB -J scaling_6 #BSUB -o scaling_6.%J #BSUB -e scaling_6.%J module load cuda module load hdf5 module load spectrum-mpi date cd /ccs/proj/csc275/brantr/projwork export OMP_NUM_THREADS=6 jsrun -n6 -r6 -a1 -g1 -c6 -b packed:6 -d packed -l GPU-CPU ./cholla-pm cosmo_tiling.txt mv tiling_timing.txt tiling_timing.6.txt