.. _hpc:

SDP HPC
=====================

The SDP can run parallel simulation codes on its configured HPC. This is
intended for generating data for the SD models.

-----------------------
HPC Environment
-----------------------

OpenHPC Modules
-----------------------

Two HPC environment modules have been deployed on the SDP HPC: the Intel and
GNU (GCC) compilers, each with OpenMPI 4 as the MPI library. To use the Intel
or GNU environment, load the ``ohpc-intel`` or ``ohpc-gnu12`` module,
respectively. The associated submodules (such as ``fftw`` and ``netcdf``) then
become visible and can be loaded. An example of using FFTW in the Intel
environment is given below.

.. code-block:: bash

    [xiangliu@localhost ~]$ module overview
    -------------------------------- /opt/ohpc/pub/modulefiles ---------------------------------
    ohpc-intel (1)    ohpc-gnu12 (1)

    [xiangliu@localhost ~]$ module load ohpc-intel
    Loading compiler version 2023.1.0
    [xiangliu@localhost ~]$ ml ov
    ------------------------- /opt/ohpc/pub/moduledeps/intel-openmpi4 --------------------------
    fftw (1)    netcdf-cxx (1)    netcdf (1)    phdf5 (1)

    [xiangliu@localhost ~]$ ml fftw
    [xiangliu@localhost ~]$ ml

    Currently Loaded Modules:
      1) compiler/2023.1.0   2) fftw/3.3.10      3) mkl/2023.1.0
      4) intel/2023.1.0      5) openmpi4/4.1.4   6) ohpc-intel
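As a rough sketch of how a user program could be compiled against the loaded
FFTW module (this is not an SDP-provided recipe): ``my_fft_code.c`` is a
placeholder, and the ``FFTW_INC``/``FFTW_LIB`` environment variables are
assumed to be exported by the OpenHPC ``fftw`` module, as is typical; run
``module show fftw`` on the SDP HPC to confirm the exact variable names.

.. code-block:: bash

    # Hypothetical compile line for a user program; my_fft_code.c is a placeholder.
    # FFTW_INC and FFTW_LIB are assumed to be set by the fftw module (typical for OpenHPC).
    module load ohpc-intel
    module load fftw
    mpicc -O3 -I"$FFTW_INC" my_fft_code.c -L"$FFTW_LIB" -lfftw3 -o my_fft_code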
Temporary Storage: Scratch
---------------------------

The SDP HPC provides temporary storage for the large amounts of data generated
by the simulation codes. Users must store large simulation data in the scratch
directory. The scratch directory is created automatically by the ``cdscratch``
command once the ``hpc-tools`` module is loaded. After running this command, a
soft link to the user's scratch directory is created at the top level of the
user's home directory.

.. code-block:: bash

    [xiangliu@localhost ~]$ ml hpc-tools
    [xiangliu@localhost ~]$ ml

    Currently Loaded Modules:
      1) hpc-tools

    [xiangliu@localhost ~]$ cdscratch
    User scratch created: /data/scratch/xiangliu
    Soft link created: /home/xiangliu/scratch
    Now you can type “cd ~/scratch” to change to your scratch directory.
    [xiangliu@localhost ~]$ ll
    lrwxrwxrwx 1 xiangliu xiangliu 18 Jul 26 scratch -> /data/scratch/xiangliu
    [xiangliu@localhost ~]$ cd scratch/
    [xiangliu@localhost scratch]$ pwd
    /home/xiangliu/scratch

.. warning::

    The scratch directory is cleaned periodically. Do not keep important files
    in this directory, and back up any simulation data you need.

-----------------------
SLURM Workload Manager
-----------------------

The SDP HPC uses the `SLURM `_ workload manager. Large-scale computing tasks
should be submitted to the SDP HPC through SLURM. A detailed guide to using
SLURM can be found in the `SLURM Documentation `_. This section gives a brief
tutorial based on the `OpenHPC Installation Guide `_.

.. hint::

    The following table provides approximate command equivalences between
    SLURM and OpenPBS:

    .. csv-table::
        :header: "Command", "OpenPBS", "SLURM"

        "Submit batch job", "qsub [job script]", "sbatch [job script]"
        "Request interactive shell", "qsub -I /bin/bash", "salloc"
        "Delete job", "qdel [job id]", "scancel [job id]"
        "Queue status", "qstat -q", "sinfo"
        "User's job list", "qstat -u user_name", "squeue -u user_name"
        "Job status", "qstat -f [job id]", "scontrol show job [job id]"
        "Node status", "pbsnodes [node name]", "scontrol show node [node id]"

Interactive Task Submission
---------------------------

If you want to **test a simulation code** or **run a heavy task** with
:ref:`programming ide`, you can request a temporary connection with a limited
time (wall time). Here, an example is given with a hello-world program. Switch
to the scratch directory (make sure you have followed
:ref:`temporary storage: scratch`) and copy the hello program into your
scratch directory with the ``cphello`` command.

.. code-block:: bash

    [xiangliu@localhost ~]$ ml hpc-tools
    [xiangliu@localhost ~]$ cd scratch/
    [xiangliu@localhost scratch]$ cphello
    [xiangliu@localhost scratch]$ ls
    hello.c

You can then compile the hello program with ``mpicc`` once the ``ohpc-intel``
module is loaded. If you run the compiled program directly, you will see that
it runs on the administration node.

.. code-block:: bash

    [xiangliu@localhost scratch]$ ml ohpc-intel
    [xiangliu@localhost scratch]$ mpicc -O3 hello.c
    [xiangliu@localhost scratch]$ ls
    hello.c  a.out
    [xiangliu@localhost scratch]$ ./a.out

    Hello, world (1 procs total)
        --> Process #   0 of   1 is alive. -> admin

Now you can run the hello program interactively after requesting computing
resources with the ``salloc`` command.

.. code-block:: bash

    [xiangliu@localhost scratch]$ salloc -n4 -N2    # "-n" and "-N" specify the number of processes and the number of nodes being requested
    salloc: Granted job allocation 41
    [xiangliu@localhost scratch]$ prun $SCRATCHPATH/a.out    # use the absolute path of the scratch directory; the hpc-tools module should be loaded

    Hello, world (4 procs total)
        --> Process #   0 of   4 is alive. -> cpu1
        --> Process #   1 of   4 is alive. -> cpu1
        --> Process #   2 of   4 is alive. -> cpu2
        --> Process #   3 of   4 is alive. -> cpu2
    [xiangliu@localhost scratch]$ exit
    exit
    salloc: Relinquishing job allocation 41
    salloc: Job allocation 41 has been revoked.

Batch File Submission
-----------------------

You can also submit the task with a batch file. Continuing with the hello
program above, use ``cpslurmjob`` to copy the batch job file into your scratch
directory, then submit the task with the ``sbatch`` command.

.. code-block:: bash

    [xiangliu@localhost ~]$ ml hpc-tools
    [xiangliu@localhost ~]$ cd scratch/
    [xiangliu@localhost scratch]$ cpslurmjob
    [xiangliu@localhost scratch]$ ls
    hello.c  job.sh  a.out
    [xiangliu@localhost scratch]$ sbatch job.sh
    Submitted batch job 69
    [xiangliu@localhost scratch]$ ls
    hello.c  job.sh  a.out  hellow.69.out  hellow.69.err
    [xiangliu@localhost scratch]$ cat hellow.69.out

    Hello, world (4 procs total)
        --> Process #   0 of   4 is alive. -> cpu1
        --> Process #   1 of   4 is alive. -> cpu1
        --> Process #   2 of   4 is alive. -> cpu2
        --> Process #   3 of   4 is alive. -> cpu2
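The contents of the ``job.sh`` file copied by ``cpslurmjob`` are not reproduced
here. As a rough sketch only, a minimal batch script producing this kind of
output could look like the following; the job name ``hellow`` matches the
output file names above, while the wall-time limit is an assumption, and the
actual SDP script may differ.

.. code-block:: bash

    #!/bin/bash
    #SBATCH -J hellow            # job name (matches the hellow.<jobid>.* files above)
    #SBATCH -o hellow.%j.out     # standard output file; %j expands to the job id
    #SBATCH -e hellow.%j.err     # standard error file
    #SBATCH -N 2                 # number of nodes
    #SBATCH -n 4                 # total number of MPI tasks
    #SBATCH -t 00:10:00          # wall-time limit (assumed value)

    # prun is the OpenHPC job-launch wrapper used in the interactive example above
    prun ./a.out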
SLURM Task Management
-----------------------

Queue information can be viewed with the ``sinfo`` command.

.. code-block:: bash

    [xiangliu@localhost scratch]$ sinfo
    PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST

Task status can be viewed with the ``squeue`` command.

.. code-block:: bash

    [xiangliu@localhost scratch]$ squeue
        JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           70         p interact xiangliu  R       0:06      1 c1

A submitted task can be cancelled with the ``scancel`` command.

.. code-block:: bash

    [xiangliu@localhost scratch]$ salloc -n2 -N1
    salloc: Granted job allocation 70
    [xiangliu@localhost scratch]$ squeue
        JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
           70         p interact xiangliu  R       0:06      1 c1
    [xiangliu@localhost scratch]$ scancel 70
    salloc: Job allocation 70 has been revoked.
    Hangup

-----------------------
Simulation Codes
-----------------------

Several simulation codes have been compiled and are ready to use on the SDP.
The compiled codes and their sources are located in
``/home/imas-public/hpc-codes``; the folders with the ``_bin`` suffix contain
the compiled codes. Before using them, load the compiler module.

.. code-block:: bash

    [xiangliu@localhost ~]$ module load compiler/GCC

CQL3D Code
-----------------------

The documentation can be found at the `CQL3D Documentation `_.

GENRAY Code
-----------------------

The documentation can be found at the `GENRAY Documentation `_.
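To tie the pieces together, the sketch below shows how one of the pre-compiled
codes might be run through SLURM from the scratch directory. It is an
illustration only: ``some_code_bin/some_code`` is a hypothetical placeholder
(check the actual ``*_bin`` folders and executable names under
``/home/imas-public/hpc-codes``), the input files required by each code are not
shown, and ``$SCRATCHPATH`` is assumed to be provided by the ``hpc-tools``
module as in the ``prun`` example above.

.. code-block:: bash

    #!/bin/bash
    #SBATCH -J sim_run           # job name
    #SBATCH -o sim_run.%j.out    # standard output
    #SBATCH -e sim_run.%j.err    # standard error
    #SBATCH -N 2                 # number of nodes (adjust to the code)
    #SBATCH -n 32                # total number of MPI tasks (adjust to the code)

    module load compiler/GCC     # compiler runtime for the pre-compiled codes
    module load hpc-tools        # provides the scratch helpers ($SCRATCHPATH)

    cd "$SCRATCHPATH"            # run from scratch so large output lands there

    # "some_code_bin/some_code" is a placeholder; check /home/imas-public/hpc-codes
    # for the actual *_bin folders and executable names, and place the required
    # input files in the working directory before submitting.
    prun /home/imas-public/hpc-codes/some_code_bin/some_code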