FAQ

[SX-Aurora TSUBASA] NEC MPI for SX-Aurora TSUBASA FAQ

Question

About NEC MPI Operating Procedures

  1. How can we get the physical VE number?

  2. Please let me know how to redirect the output of each rank when we use NEC MPI with NQSV qsub.

  3. When using Accelerated I/O, the following error occurs when allocating VH memory.
    # mpid(7): Allocate_system_v_shared_memory: key = 0xxxxxxxxxx, len = xxxxxxxxxx
    # shmget allocation: Cannot allocate memory

  4. The following error occurs when executing a batch job.
    host1: mpid(0): bind or listen failed in listen_port_range: Address already in use
    mpirun: cannot find fifo file: /tmp/mpi2mpid_fifo.xx_xxxxxxx; jid xxxxxxx
    mpirun: fatal error : cannot find fifo file (not created by mpid)

  5. For the two setup scripts necmpivars.csh and necmpivars.sh, the
    "set echo" command leads to the following warning in the case of go.csh but does not in the case of go.sh.

    necmpivars.sh: Warning: invalid argument. LD_LIBRARY_PATH is not updated.
    Note: "necmpivars.sh [gnu|intel] [version]" format should only be used
    at runtime in order to use VH MPI shared libraries other than those specified
    by RUNPATH embedded in a MPI program executable by the MPI compile command.
    In other cases, "source /opt/nec/ve/mpi/2.x.0/bin/necmpivars.sh" should be used without arguments.
    version is a directory name in the following directory:
    /opt/nec/ve/mpi/2.x.0/lib64/vh/gnu (if gnu is specified)
    /opt/nec/ve/mpi/2.x.0/lib64/vh/intel (if intel is specified)

  6. In our system, more than one NEC MPI version has been installed.
    First I sourced the setup script of version A.
    Can I switch the NEC MPI version to version B and execute an MPI program by sourcing the version B setup script at runtime?


  7. Why does the following problem occur?
    The note in "2.13 Nonblocking MPI Procedures in Fortran Programs" of the NEC MPI User's Guide describes
    that incorrect results or an abnormality in the program execution can occur
    if one of the following items is used as the actual argument for the communication buffer or I/O buffer of a nonblocking MPI procedure.
    ・array section
    ・array expression
    ・array pointer
    ・assumed shape array

  8. I specified the number of VEs and the number of processes, but the number of processes allocated to each VE is not equal.

    #PBS --venode=125
    #PBS --venum-lhost=8
    mpirun -np 1000 ./a.out
    When I submit a request under the above conditions, the following situation occurs.
    ・8 processes are not allocated to each VE
    ・In particular, more than 8 processes are allocated to VE0 ~ VE3 of job:0015.

  9. I want to improve MPI communication performance on the 8VE models with Intel Xeon processors (A300-8, A311-8, B300-8, A500-64, A511-64, etc.).

About PROGINF/FTRACE

  1. When thread parallelism is performed, FTRACE shows that the amount of calculation has decreased.

  2. If the program terminates abnormally, can the standard error, PROGINF, and FTRACE results of all ranks be output to the end?

About InfiniBand

  1. Is there any possibility of a problem in an MPI execution across models equipped with different generations of InfiniBand?

Answer

Operating Procedures

  1. If you set the -v option for mpirun, the process generation information is output.
    You can find the physical VE number in this information.

    % mpirun -v -np 8 a.out
    mpid: Creating 8 process of './a.out' on VE 0 of local host
     ↑ Indicates that 8 processes have been created on physical VE number 0. 

    If you want to obtain the physical VE number within a program, you can refer to the environment variable VE_NODE_NUMBER (for example, with getenv("VE_NODE_NUMBER")).

  2. You can use mpisep.sh to redirect the output of each rank to a file.
    export NMPI_SEPSELECT=4 
    mpirun -np 4 /opt/nec/ve/bin/mpisep.sh ./a.out 
    By the above procedure, the standard output and standard error of each rank are redirected to a file named "std.<universe number>:<rank number>".
    (The universe number is normally 0. It is 1 or greater for processes generated by MPI_Comm_spawn etc.)
    Details are described in 3.3 Standard Output and Standard Error of MPI Programs of the NEC MPI User's Guide.
    If the NEC MPI process manager is hydra in the queue settings of the NQSV you are using,
    you can control the output destination by setting the environment variable NMPI_OUTPUT_COLLECT when executing the MPI program, as follows.

    ・NMPI_OUTPUT_COLLECT=ON
    The output of the MPI program is written to the standard output and standard error of the MPI execution command.
    ・NMPI_OUTPUT_COLLECT=OFF (default)
    The output of the MPI program is written for each logical node, as with mpd.
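
    As a minimal sketch (assuming the hydra process manager and a VE executable named ./a.out), the variable can be set directly in an NQSV job script:

    #PBS -b 1
    #PBS --venum-lhost=1
    source /opt/nec/ve/mpi/2.x.0/bin/necmpivars.sh
    export NMPI_OUTPUT_COLLECT=ON    # collect the output of all ranks into the job's standard output and standard error
    mpirun -np 4 ./a.out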

  3. It seems that HugePages are insufficient. In addition to those used by MPI, HugePages must also be configured for Accelerated I/O.
    HugePage configuration is described in the SX-Aurora TSUBASA Installation Guide, 4.11 Configuration of HugePages.
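
    As a quick check with standard Linux commands (the value 4096 below is only an illustrative placeholder; determine the required amount from the Installation Guide):

    grep Huge /proc/meminfo               # show the currently reserved and free HugePages
    sudo sysctl -w vm.nr_hugepages=4096   # example: enlarge the reservation (placeholder value)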

  4. The default external connection listening ports of NEC MPI, described in 4.7.2 Firewall of the SX-Aurora TSUBASA Installation Guide, are the 10 ports 25257-25266 (assuming a 10VE model).
    It is possible that 11 or more MPI daemons have been started at the same time,
    and the 11th and subsequent MPI daemons have failed to secure a port.
    If you want to start 11 or more MPI daemons on one node at the same time,
    you need to use the environment variable NMPI_PORT_RANGE to increase the number of ports used.
    For example, if you want to start 20 MPI daemons on one node at the same time, specify NMPI_PORT_RANGE=25257:25276.
    The variable can be set in the system configuration or with #PBS -v in the job script.
    The number of MPI daemons is determined by the number of jobs to be executed in the request, which is set by #PBS -b or qsub -b.
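
    As a sketch, the 20-port example above could be passed through the job script (the range is taken from the example; adjust it to the number of daemons you actually start):

    #PBS -v NMPI_PORT_RANGE=25257:25276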

  5. You can safely ignore the warning; see 3.11 Miscellaneous (19) in the NEC MPI User's Guide.

  6. By default, most NEC MPI libraries, including key features, are statically linked.
    Therefore, the version A library is used regardless of the version of the NEC MPI setup script that is sourced when the MPI program is run.
    If you want to change the NEC MPI version when running an MPI program, you need to dynamically link all MPI libraries by specifying the -shared-mpi option when compiling and linking.
    By this procedure you can switch the NEC MPI version to version B and execute an MPI program by sourcing the version B setup script at runtime.
    Please note that if all NEC MPI libraries are dynamically linked, the MPI communication performance may be reduced compared to the case of static linking.
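
    A minimal sketch, assuming the MPI Fortran compile command mpinfort and a source file a.f90 (use the compile command for your language; <version B> is a placeholder for the installed version directory):

    mpinfort -shared-mpi a.f90 -o a.out                      # compile/link with all MPI libraries dynamically linked
    source /opt/nec/ve/mpi/<version B>/bin/necmpivars.sh     # at runtime, source the setup script of version B
    mpirun -np 4 ./a.out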

  7. The behavior described in 2.13 Nonblocking MPI Procedures in Fortran Programs of the NEC MPI User's Guide is due to the MPI specification (the argument specifications of MPI procedures) and the optimization processing of the compiler.
    The compiler optimization process here is not unique to the NEC compiler,
    but refers to widely and generally performed optimizations such as instruction reordering and memory access reduction (register optimization).
    This is the issue explained in the MPI specification (MPI: A Message-Passing Interface Standard, Version 3.1, June 4, 2015),
    17.1.17 Problems with Code Movement and Register Optimization.

  8. In order to allocate the same number of processes to all VEs, you need to explicitly specify VEs as hosts with the -venode option.
    If you do not specify it, VHs are regarded as hosts and processes are evenly allocated to the VHs.
    As a result, if the number of VEs per VH differs between VHs, the number of processes allocated to each VE also differs.
    Options for mpirun, including -venode, are explained in the NEC MPI User's Guide, 3.2.2 Runtime Options.
    In this example VEs are not explicitly specified as hosts, so VHs are regarded as hosts.
    On the other hand, according to the NQSV conditions,
     ・15 VHs with 8 VEs
     ・1 VH with 5 VEs
    are allocated as resources.
    As a result, 1000 processes are evenly allocated to the 16 VHs.
    Since 1000 ÷ 16 = 62.5, 63 processes are allocated to the first 15 VHs, and the remaining 55 processes are allocated to the last VH (job: 0015).
    As a result, more than 8 processes are allocated to each VE of the last VH.
    When the number of VEs per VH and the number of processes per VH are uneven as in this case,
    MPI allocates 8 processes to each of the 125 VEs (1000 ÷ 125 = 8) if VEs are specified as hosts with mpirun -venode -np 1000 ./a.out.
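
    Putting it together, a corrected job script for this example would look like the following sketch (same resources as in the question; only the mpirun line changes):

    #PBS --venode=125
    #PBS --venum-lhost=8
    mpirun -venode -np 1000 ./a.out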

  9. Communication performance may be improved by changing to execution with 4 VEs per logical node.
    ・Interactive execution
    For example, instead of running 8 VEs on one logical node:
    mpirun -ve 0-7 -np 64 ve.out

    run with 4 VEs per logical node:
    NMPI_EXEC_LNODE=ON mpirun -host host_0 -ve 0-3 -np 32 -host host_0/A -ve 4-7 -np 32 ve.out
    ・NQSV batch execution
    Instead of 8 VEs per logical host:
    #PBS -b 2
    #PBS --venum-lhost=8
    mpirun -np 128 ve.out

    use 4 VEs per logical host:
    #PBS -b 4
    #PBS --venum-lhost=4
    #PBS --use-hca=2
    mpirun -np 128 ve.out

About PROGINF/FTRACE

  1. When a function that is automatically parallelized or parallelized with OpenMP is compiled without the -ftrace option,
    the performance information of threads other than the master thread is not included in the analysis result.
    In addition, when the parallel functions of NLC (NEC Numeric Library Collection) are used, the performance information of threads other than the master thread that are generated inside those functions is not included in the analysis result.
    Please refer to 2.4 Notes of the PROGINF/FTRACE User's Guide for this note.
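
    As a sketch (assuming the MPI Fortran compile command mpinfort and the -fopenmp option; adjust to your compiler and source files), compiling every parallelized function with -ftrace keeps the per-thread information in the FTRACE result:

    mpinfort -fopenmp -ftrace sub.f90 main.f90 -o a.out   # compile the parallelized functions with -ftrace as well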

  2. If the program terminates abnormally, the output of the standard error, PROGINF, and FTRACE results is not guaranteed.
    Therefore, all or some of these outputs may not be produced.

About InfiniBand

  1. MPI cannot be executed between models with different generations of InfiniBand.
    The first-generation models are equipped with EDR HCAs and the second-generation models with HDR (HDR100) HCAs,
    and some InfiniBand communication is not compatible between EDR and HDR.
    Please refer to the table below for the models equipped with EDR and HDR.

    EDR/HDR equipped model table
    Model Name      EDR                      HDR
    Rack Mount      A300-2, A300-4, A300-8   A311-4, A311-8, B300-8, A412-8, B401-8, B302-8
    Supercomputer   A500-64                  A511-64

Product Name

SX-Aurora TSUBASA Software

Note

Revision history

2021/09/30 New release
2021/12/24 Updated information on questions and answers about NEC MPI Operating Procedures

  • Content ID: 4150101113
  • Release date: 2021/09/30
  • Last updated: 2021/12/24
