are allocated as a resource.
As a result, the 1000 processes are allocated across the 16 VHs.
Since 1000 / 16 = 62.5, 63 processes are allocated to each of the first 15 VHs,
and the remaining 55 processes are allocated to the last VH (job: 0015).
As a result, more than 8 processes are allocated to each VE of the last VH.
When the number of VEs per VH and the number of processes per VH are uneven as in this case,
NEC MPI allocates 8 processes to each of the 125 VEs (1000 / 125 = 8) by assigning VEs as hosts when executed with mpirun -venode -np 1000 ./a.out.
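The following is a minimal sketch (plain C, not part of NEC MPI) that only reproduces the block-distribution arithmetic described above; the process and VH counts are taken from this example.

/* Sketch: block distribution of 1000 processes over 16 VHs,
 * reproducing the figures quoted above.                      */
#include <stdio.h>

int main(void)
{
    const int nprocs = 1000, nvh = 16;
    int per_vh = (nprocs + nvh - 1) / nvh;      /* ceil(1000 / 16) = 63 */
    int last   = nprocs - per_vh * (nvh - 1);   /* 1000 - 63 * 15  = 55 */

    printf("first %d VHs: %d processes each\n", nvh - 1, per_vh);
    printf("last VH     : %d processes\n", last);
    return 0;
}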
Is there any possibility for MPI communication performance to be improved
on the models with 8 VEs per Intel Xeon VH (A300-8, A311-8, B300-8, A500-64, A511-64, etc.)?
Communication performance may be improved by switching to execution with 4 VEs per logical node,
for example:
* Interactive execution
mpirun -ve 0-7 -np 64 ve.out
would be:
NMPI_EXEC_LNODE=ON mpirun -host host_0 -ve 0-3 -np 32 -host host_0/A -ve 4-7 -np 32 ve.out
* NQSV batch execution
#PBS -b 2
#PBS --venum-lhost=8
mpirun -np 128 ve.out
would be:
#PBS -b 4
#PBS --venum-lhost=4
#PBS --use-hca=2
mpirun -np 128 ve.out
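In either case, ve.out can be any MPI executable. As a minimal sketch (the program below is illustrative, not part of NEC MPI), an executable like the following prints each rank's processor name and makes it easy to confirm where the processes are actually placed:

/* Sketch: print the placement of each MPI rank. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(name, &len);
    printf("rank %d of %d runs on %s\n", rank, size, name);
    MPI_Finalize();
    return 0;
}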
About PROGINF/FTRACE
- When thread parallelism is used, FTRACE shows a decreased amount of calculation.
Why does this happen?
When a function that is automatically parallelized or parallelized with OpenMP is compiled without the -ftrace option,
the performance information of threads other than the master thread is not included in the analysis result.
In addition, when the parallelized functions of NLC (NEC Numeric Library Collection) are used,
the performance information of the threads other than the master thread that are generated by those functions is not included in the analysis result.
Please refer to Section 2.4 "Notes" in the PROGINF/FTRACE User's Guide for details.
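For illustration, the sketch below shows an OpenMP-parallelized function; the compile commands in the comment assume the NEC C compiler ncc with the -fopenmp and -ftrace options and are given only as an example.

/* Sketch: an OpenMP-parallelized function.  If this file is compiled
 * without -ftrace (e.g. "ncc -fopenmp kernel.c"), FTRACE reports only
 * the master thread's share of this loop; compiling with
 * "ncc -fopenmp -ftrace kernel.c" also includes the other threads.   */
void scale(double *a, double s, int n)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++)
        a[i] *= s;
}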
- If the program terminates abnormally, are the standard error, PROGINF, and FTRACE results of all ranks output to the end?
If the program terminates abnormally, the output of the standard error, PROGINF, and FTRACE results is not guaranteed.
Therefore, some or all of these outputs may be missing.
About InfiniBand
- Is there any possibility of a problem when executing an MPI program over InfiniBand with different generations of InfiniBand HCAs?
In certain cases, an MPI program cannot be executed across different generations of InfiniBand HCAs.
Because of incompatibilities in InfiniBand communication
between EDR HCA and HDR HCA (HDR100), and between EDR HCA and NDR HCA,
an MPI program cannot be executed in these cases.
Aurora models are equipped with different generations of InfiniBand HCAs:
EDR HCA, HDR HCA (HDR100), and NDR HCA.
Please find below the Aurora models and the equipped InfiniBand HCAs.
Aurora models and equipped InfiniBand HCAs
EDR | (Rack Mount)    A300-2, A300-4, A300-8
    | (Supercomputer) A500-64, A511-64
HDR | (Rack Mount)    A311-4, A311-8, B300-8, A412-8, B401-8, B302-8
NDR | (Rack Mount)    C401-8
Revision history
2021/09/30 New release
2021/12/24 Updated information on questions and answers about NEC MPI Operating Procedures
2023/09/28 Updated information on questions and answers about NEC MPI, PROGINF/FTRACE, InfiniBand