Hi
I want to generate a timing log for MPI functions. I am using "export I_MPI_STATS=20" to enable logging, but this captures timing information on only one node. How can I get similar information from all the nodes used in the run?
Thanks
Biren
Our school project needs MPI and OpenGL, but we have failed to create an OpenGL window and system shared memory inside an Intel MPI process. Could anyone help us?
Our OS is Windows 10.
Hello!
I am running a quad-precision code with MPI. However, when I perform MPI_ALLREDUCE with MPI_REAL16 as the datatype, the code gives a segmentation fault. How do I perform quad-precision reduction operations in MPI? Any advice would be greatly appreciated.
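For reference, a minimal sketch of the kind of reduction being described (illustrative only, not the poster's code; it assumes the compiler's 16-byte real kind is the one MPI_REAL16 is meant to match):

program quad_allreduce
  use mpi
  implicit none
  ! Quad-precision kind (REAL*16 on ifort).
  integer, parameter :: qp = selected_real_kind(33, 4931)
  real(kind=qp) :: local_val, global_sum
  integer :: rank, ierr
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  local_val = real(rank, kind=qp) + 0.25_qp
  ! MPI_REAL16 is an optional datatype in the MPI standard, so support for
  ! predefined reductions on it varies between implementations; the buffer
  ! kind must match the 16-byte real the library expects.
  call MPI_Allreduce(local_val, global_sum, 1, MPI_REAL16, MPI_SUM, MPI_COMM_WORLD, ierr)
  if (rank == 0) write(*,*) 'global sum =', global_sum
  call MPI_Finalize(ierr)
end program quad_allreduce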
Regards
Suman Vajjala
I'm attempting a PETSc 3.8 build with Intel Parallel Studio 2017.0.5. The build fails without much information, but it appears to be an internal compiler error.
Some key output:
...
/home/cchang/Packages/petsc-3.8/src/vec/is/sf/impls/basic/sfbasic.c(528): (col. 1) remark: FetchAndInsert__blocktype_int_4_1 has been targeted for automatic cpu dispatch
": internal error: 0_76
compilation aborted for /home/cchang/Packages/petsc-3.8/src/vec/is/sf/impls/basic/sfbasic.c (code 4)
gmake[2]: *** [impi-intel/obj/src/vec/is/sf/impls/basic/sfbasic.o] Error 4
Could you tell me what this error 0_76 is? I can provide log files or environment info if they would help.
Thanks,
Chris
Gentlemen, could you please help with an issue?
I'm using the Intel compiler ifort version 18.0.2 and Intel MPI version 2018.2.199 in an attempt to run the WRF model on an HPE (formerly SGI) ICE X machine.
wrfoperador@dpns31:~> ifort -v
ifort version 18.0.2
wrfoperador@dpns31:~> mpirun -V
Intel(R) MPI Library for Linux* OS, Version 2018 Update 2 Build 20180125 (id: 18157)
Copyright 2003-2018 Intel Corporation.
wrfoperador@dpns31:~>
When I run the executable I receive the following message:
/opt/intel/intel_2018/compilers_and_libraries_2018.2.199/linux/mpi/intel64/bin/mpirun r1i1n0 12 /home/wrfoperador/wrf/wrf_metarea5/WPS/geogrid.exe
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
Could you help me solve this problem?
Thanks for your attention. I'm looking forward to your reply.
All,
I'm hoping the Intel MPI gurus can help with this. Recently I've been transitioning some code I help maintain (GEOS, a climate model) from HPE MPT (2.17, in this case) to Intel MPI (18.0.1; I'll test 18.0.2 soon). In both cases the compiler (Intel 18.0.1) is the same, and both run on the same set of Haswell nodes on an SGI/HPE cluster. The only difference is the MPI stack.
Now one part of the code (AGCM, the physics/dynamics part) is actually a little bit faster with Intel MPI than MPT, even on an SGI machine. That is nice. It's maybe 5-10% faster in some cases. Huzzah!
But another code (GSI, analysis of observation data) really, really, really does not like Intel MPI. This code displays two issues. First, after the code starts (both launch very fast), it eventually hits a point at which, we believe, the first collective occurs, and the whole code stalls as it...initializes buffers? Something with InfiniBand maybe? We don't know. MPT slows a bit there too, but doesn't show this issue nearly as badly as Intel MPI. We had another place like this in the AGCM where moving from a collective to an Isend/Recv/Wait type paradigm really helped (see the sketch below). This "stall" is annoying and, worse, it gets longer and longer as the number of cores increases. (We might have a reproducer for this one.)
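For context, a minimal sketch of that Isend/Recv/Wait pattern (hypothetical buffer names and layout, not the actual GEOS code): each rank posts nonblocking sends of one block to every peer, then does blocking receives, then waits for its sends to complete.

subroutine exchange_blocks(sendbuf, recvbuf, blocklen, comm)
  ! Sketch of trading an Alltoallv-style collective for explicit Isend/Recv/Waitall.
  use mpi
  implicit none
  integer, intent(in) :: blocklen, comm
  double precision, intent(in)  :: sendbuf(blocklen, *)   ! one block per destination rank
  double precision, intent(out) :: recvbuf(blocklen, *)   ! one block per source rank
  integer :: nranks, myrank, peer, ierr
  integer :: stat(MPI_STATUS_SIZE)
  integer, allocatable :: reqs(:)

  call MPI_Comm_size(comm, nranks, ierr)
  call MPI_Comm_rank(comm, myrank, ierr)
  allocate(reqs(nranks)); reqs = MPI_REQUEST_NULL

  ! Post all sends without blocking.
  do peer = 0, nranks-1
    if (peer /= myrank) call MPI_Isend(sendbuf(1,peer+1), blocklen, MPI_DOUBLE_PRECISION, &
                                       peer, 0, comm, reqs(peer+1), ierr)
  end do
  ! Receive one block from every other rank; copy the local block directly.
  do peer = 0, nranks-1
    if (peer /= myrank) then
      call MPI_Recv(recvbuf(1,peer+1), blocklen, MPI_DOUBLE_PRECISION, peer, 0, comm, stat, ierr)
    else
      recvbuf(1:blocklen,peer+1) = sendbuf(1:blocklen,peer+1)
    end if
  end do
  ! Wait for all posted sends to complete before reusing sendbuf.
  call MPI_Waitall(nranks, reqs, MPI_STATUSES_IGNORE, ierr)
  deallocate(reqs)
end subroutine exchange_blocks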
But that is minor, really: a minute or so, compared to the overall performance. On 240 cores, MPT 2.17 runs this code in 15:03 (minutes:seconds), Intel MPI 18.0.1 in 28:12. On 672 cores, MPT 2.17 runs the code in 12:02 and Intel MPI 18.0.2 in 21:47; neither scales well overall.
Using I_MPI_STATS, the code spends ~60% of its MPI time in Alltoallv (20% of wall time) at 240 cores; at 672, Barrier starts to dominate, but Alltoallv is still 40% of MPI time and 23% of wall time. I've tried setting both I_MPI_ADJUST_ALLTOALLV options (1 and 2) and it makes little difference (28:44 and 28:25 at 240).
I'm going to try and see if I can request/reserve a set of nodes for a long time to do an mpitune run, but since each run is ~30 minutes...mpitune will not be fun as it'd be 90 minutes for each option test.
Any ideas on what might be happening? Any advice for flags/environment variables to try? I understand that HPE MPT might/should work best on an SGI/HPE machine (like how Intel compilers seem to do best with Intel chips), but this seems a bit beyond the usual difference. I've requested MVAPICH2 be installed as well for another comparison.
Matt
I have been trying to set up NFS over RDMA on Omni-Path, following the instructions in the official documentation. IPoIB works fine, but I cannot get NFS over RDMA working. I have modified /etc/rdma/rdma.conf and added
NFSoRDMA_LOAD=yes
NFSoRDMA_PORT=2050
I have also loaded the appropriate modules (sunrpc on the client, xprtrdma on the server). However, the NFS mount fails (connection refused) when mounting with RDMA; note that it works fine if I do not specify rdma.
It appears that port 2050 for NFSoRDMA does not get created; when I run rpcinfo from the client against the server, I see port 2049 for nfs but nothing on 2050.
This is on CentOS 7.4. Any ideas/suggestions what may be wrong?
Hi, Intel support guys,
I am running tests on our Skylake computers. I am surprised to see that 4 cores per package are gone. Where are they?
Our computer system information is below:
Processor: Intel Xeon Gold 6148 CPU @ 2.40GHz, 2.39GHz (2 processors)
Installed memory: 384GB
System type: 64-bit operating system x64-based processor
OS: Windows server 2016 standard
Please see the following output, which shows that 4 cores per package are gone. Where are these 8 cores in total?
I am looking forward to hearing from you.
Thanks in advance
Best regards,
Dingjun
Computer Modelling Group Ltd.
Calgary, AB, Canada
VECTOR_SIMD_OPENMP_TEST
OMP: Info #211: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #209: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}
OMP: Info #156: KMP_AFFINITY: 32 available OS procs
OMP: Info #158: KMP_AFFINITY: Nonuniform topology
OMP: Info #179: KMP_AFFINITY: 2 packages x 20 cores/pkg x 1 threads/core (32 total cores)
OMP: Info #213: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 4
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 8
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 9
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 10
OMP: Info #171: KMP_AFFINITY: OS proc 8 maps to package 0 core 11
OMP: Info #171: KMP_AFFINITY: OS proc 9 maps to package 0 core 12
OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to package 0 core 16
OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to package 0 core 17
OMP: Info #171: KMP_AFFINITY: OS proc 12 maps to package 0 core 18
OMP: Info #171: KMP_AFFINITY: OS proc 13 maps to package 0 core 19
OMP: Info #171: KMP_AFFINITY: OS proc 14 maps to package 0 core 20
OMP: Info #171: KMP_AFFINITY: OS proc 15 maps to package 0 core 24
OMP: Info #171: KMP_AFFINITY: OS proc 16 maps to package 0 core 25
OMP: Info #171: KMP_AFFINITY: OS proc 17 maps to package 0 core 26
OMP: Info #171: KMP_AFFINITY: OS proc 18 maps to package 0 core 27
OMP: Info #171: KMP_AFFINITY: OS proc 19 maps to package 0 core 28
OMP: Info #171: KMP_AFFINITY: OS proc 20 maps to package 1 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 21 maps to package 1 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 22 maps to package 1 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 23 maps to package 1 core 3
OMP: Info #171: KMP_AFFINITY: OS proc 24 maps to package 1 core 4
OMP: Info #171: KMP_AFFINITY: OS proc 25 maps to package 1 core 8
OMP: Info #171: KMP_AFFINITY: OS proc 26 maps to package 1 core 9
OMP: Info #171: KMP_AFFINITY: OS proc 27 maps to package 1 core 10
OMP: Info #171: KMP_AFFINITY: OS proc 28 maps to package 1 core 11
OMP: Info #171: KMP_AFFINITY: OS proc 29 maps to package 1 core 12
OMP: Info #171: KMP_AFFINITY: OS proc 30 maps to package 1 core 16
OMP: Info #171: KMP_AFFINITY: OS proc 31 maps to package 1 core 17
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 4004 thread 0 bound to OS proc set {0}
The number of processors available = 32
The number of threads available = 20
HELLO from process 0
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 8956 thread 1 bound to OS proc set {1}
HELLO from process 1
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 8820 thread 2 bound to OS proc set {2}
HELLO from process 2
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9292 thread 3 bound to OS proc set {3}
HELLO from process 3
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9752 thread 4 bound to OS proc set {4}
HELLO from process 4
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 3776 thread 5 bound to OS proc set {5}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 8464 thread 6 bound to OS proc set {6}
HELLO from process 5
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 1416 thread 7 bound to OS proc set {7}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 3868 thread 8 bound to OS proc set {8}
HELLO from process 6
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 7396 thread 9 bound to OS proc set {9}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9772 thread 10 bound to OS proc set {10}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9280 thread 11 bound to OS proc set {11}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9948 thread 12 bound to OS proc set {12}
HELLO from process 7
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 8712 thread 13 bound to OS proc set {13}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 6092 thread 14 bound to OS proc set {14}
HELLO from process 11
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 8532 thread 15 bound to OS proc set {15}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9892 thread 16 bound to OS proc set {16}
HELLO from process 12
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 10640 thread 17 bound to OS proc set {17}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9060 thread 18 bound to OS proc set {18}
HELLO from process 14
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 8908 thread 19 bound to OS proc set {19}
HELLO from process 18
HELLO from process 16
HELLO from process 19
HELLO from process 13
HELLO from process 8
HELLO from process 17
HELLO from process 15
HELLO from process 10
HELLO from process 9
matrix multiplication completed
Elapsed wall clock time 2 = 133.379
Join the Intel® Parallel Studio XE 2019 Beta Program today and, for a limited time, get early access to new features plus an open invitation to tell us what you really think.
We want YOU to tell us what to improve so we can create high-quality software tools that meet your development needs.
Top New Features in Intel® Parallel Studio XE 2019 Beta
New Features in Intel® MPI Library
New Features in Intel® Cluster Checker
To learn more, visit the Intel® Parallel Studio XE 2019 Beta page.
Then sign up to get started.
Dear All,
I compiled the VASP package with Intel MPI successfully, but when I run the program it stops with the MPI errors listed below. Could anybody tell me how to fix it? Thanks!
Xiang YE
[0] MPI startup(): Intel(R) MPI Library, Version 2017 Update 4 Build 20170817 (id: 17752)
[0] MPI startup(): Copyright (C) 2003-2017 Intel Corporation. All rights reserved.
[0] MPI startup(): Multi-threaded optimized library
(each of the 28 ranks, 0-27, then prints the following startup lines, interleaved)
[rank] MPI startup(): Found 1 IB devices
[rank] MPI startup(): Open 0 IB device: mlx5_0
[rank] MPI startup(): Start 1 ports per adapter
[rank] MPID_nem_ofacm_init(): Init
[rank] MPI startup(): ofa data transfer mode
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(805): fail failed
MPID_Init(1866)......: fail failed
MPIR_Comm_commit(711): fail failed
(unknown)(): Other MPI error
Hi Everyone,
I just installed the newest version of the Intel MPI Library (2018.2.199); previously I was using Open MPI. I am using ifort 15.0.3.
I am trying to compile the following test program:
program main
  use mpi_f08
  implicit none
  integer :: rank, size, len
  character(len=MPI_MAX_LIBRARY_VERSION_STRING) :: version
  call MPI_INIT()
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, size)
  call MPI_GET_LIBRARY_VERSION(version, len)
  print *, "rank:", rank
  print *, "size:", size
  print *, "version: "//version
  print *, ' No Errors'
  call MPI_FINALIZE()
end
When I use openMPI it works fine. However, I am getting the following errors with Intel MPI:
% mpiifort test_F08.f90
test_F08.f90(2): error #7012: The module file cannot be read. Its format requires a more recent F90 compiler. [MPI_F08]
use mpi_f08
--------^
test_F08.f90(8): error #6404: This name does not have a type, and must have an explicit type. [MPI_COMM_WORLD]
call MPI_COMM_RANK(MPI_COMM_WORLD, rank)
-----------------------^
test_F08.f90(5): error #6279: A specification expression object must be a dummy argument, a COMMON block object, or an object accessible through host or use association. [MPI_MAX_LIBRARY_VERSION_STRING]
character(len=MPI_MAX_LIBRARY_VERSION_STRING) :: version
------------------^
test_F08.f90(5): error #6591: An automatic object is invalid in a main program. [VERSION]
character(len=MPI_MAX_LIBRARY_VERSION_STRING) :: version
-----------------------------------------------------^
compilation aborted for test_F08.f90 (code 1)
So, do I have to use the same version of ifort that was used to build the Intel MPI module files? That is not listed in the requirements for using Intel MPI.
Why does Intel MPI not create new module files using the Fortran compiler available on the system?
Is there anything that I can do to use Intel MPI with my compiler?
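In the meantime, one sketch of a workaround that avoids compiler-versioned module files altogether is the old mpif.h include interface, which is plain source text rather than a compiled module (illustrative only; it assumes the mpif.h shipped with this Intel MPI version defines MPI_MAX_LIBRARY_VERSION_STRING):

program main
  implicit none
  include 'mpif.h'          ! textual include, so no .mod compatibility issue
  integer :: rank, size, len, ierr
  character(len=MPI_MAX_LIBRARY_VERSION_STRING) :: version
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
  call MPI_GET_LIBRARY_VERSION(version, len, ierr)
  print *, 'rank:', rank
  print *, 'size:', size
  print *, 'version: '//trim(version)
  call MPI_FINALIZE(ierr)
end program main

Note that the F77-style interface requires the explicit IERR argument that the mpi_f08 bindings make optional.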
Thanks for your help,
Hector
Does Intel MPI support GPUdirect RDMA, with NVIDIA drivers and Cudatoolkit 9.x installed?
Is there any documentation on what drivers to install, and what fabric select env vars to set?
Thanks
Ron
My understanding is that Intel allows one to sell and transfer a license to someone else. I am a small-scale open-source developer and can't afford the price of the latest Cluster Edition Linux compilers. Send me a note if you have an older version you wouldn't mind transferring to me. For my needs, anything 2015 or newer would suffice.
Hi,
Is there a need to re-install Intel Parallel Studio after adding further compute nodes to my cluster? There are two InfiniBand islands; ibstat reports:
CA 'mlx4_0', CA type: MT4099 and CA 'mlx4_1', CA type: MT26428. The latest compute nodes are associated with MT4099.
These provider errors appear only on the newer nodes:
[2] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
[10] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
[12] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
node009:UCM:2d97:570fa700: 1249 us(1249 us): open_hca: device mlx4_0 not found
node009:UCM:2d9f:1626f700: 1262 us(1262 us): open_hca: device mlx4_0 not found
node009:UCM:2da1:7f214700: 1102 us(1102 us): open_hca: device mlx4_0 not found
Regards
Gert
Hello,
I'm just getting started with Intel MPI and am trying to understand how to use Trace Analyzer. My understanding is that linking with vt.lib and running an MPI application is sufficient to cause a *.stf file to be emitted. I have a simple Hello World MPI application. After linking with vt.lib and running through mpiexec, I see no .stf output.
There's not much more information to add. The setup could not be simpler. What am I missing?
Jeff
Hello,
I am experiencing issues when using MPI_Sendrecv on multiple machines. In the code, I am sending a vector in a circular manner in parallel: each process sends data to the subsequent process and receives data from the preceding process. Surprisingly, on the first execution of the SEND_DATA routine the output is correct, while on the second execution the output is incorrect. The code and the output are below.
PROGRAM SENDRECV_REPROD
  USE MPI
  USE ISO_FORTRAN_ENV,ONLY: INT32
  IMPLICIT NONE
  INTEGER(KIND=INT32) :: STATUS(MPI_STATUS_SIZE)
  INTEGER(KIND=INT32) :: RANK,NUM_PROCS,IERR
  CALL MPI_INIT(IERR)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD,RANK,IERR)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD,NUM_PROCS,IERR)
  CALL SEND_DATA(RANK,NUM_PROCS)
  CALL SEND_DATA(RANK,NUM_PROCS)
  CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
  CALL MPI_FINALIZE(IERR)
END PROGRAM

SUBROUTINE SEND_DATA(RANK,NUM_PROCS)
  USE ISO_FORTRAN_ENV,ONLY: INT32,REAL64
  USE MPI
  IMPLICIT NONE
  INTEGER(KIND=INT32),INTENT(IN) :: RANK
  INTEGER(KIND=INT32),INTENT(IN) :: NUM_PROCS
  INTEGER(KIND=INT32) :: IERR,ALLOC_ERROR
  INTEGER(KIND=INT32) :: VEC_SIZE,I_RANK,RANK_DESTIN,RANK_SOURCE,TAG_SEND,TAG_RECV
  REAL(KIND=REAL64), ALLOCATABLE :: COMM_BUFFER(:),VEC1(:)
  INTEGER(KIND=INT32) :: MPI_COMM_STATUS(MPI_STATUS_SIZE)
  ! Allocate communication arrays.
  VEC_SIZE = 374454
  ALLOCATE(COMM_BUFFER(VEC_SIZE),STAT=ALLOC_ERROR)
  ALLOCATE(VEC1(VEC_SIZE),STAT=ALLOC_ERROR)
  ! Define destination and source ranks for sending and receiving messages.
  RANK_DESTIN = MOD((RANK+1),NUM_PROCS)
  RANK_SOURCE = MOD((RANK+NUM_PROCS-1),NUM_PROCS)
  TAG_SEND = RANK+1
  TAG_RECV = RANK
  IF (RANK==0) TAG_RECV=NUM_PROCS
  VEC1=RANK
  COMM_BUFFER=0.0_REAL64
  CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
  DO I_RANK=1,NUM_PROCS
    IF (RANK==I_RANK-1) WRITE(*,*) 'R',RANK, VEC1(1),'B', COMM_BUFFER(1)
  ENDDO
  CALL MPI_SENDRECV(VEC1(1),VEC_SIZE,MPI_DOUBLE_PRECISION,RANK_DESTIN,TAG_SEND,COMM_BUFFER(1),&
    VEC_SIZE,MPI_DOUBLE_PRECISION,RANK_SOURCE,TAG_RECV,MPI_COMM_WORLD,MPI_COMM_STATUS,IERR)
  DO I_RANK=1,NUM_PROCS
    IF (RANK==I_RANK-1) WRITE(*,*) 'R' , RANK , VEC1(1),'A', COMM_BUFFER(1)
  ENDDO
END SUBROUTINE SEND_DATA
Output of four processes run on four machines:
R 0 0.000000000000000E+000 B 0.000000000000000E+000
R 1 1.00000000000000 B 0.000000000000000E+000
R 2 2.00000000000000 B 0.000000000000000E+000
R 3 3.00000000000000 B 0.000000000000000E+000
R 0 0.000000000000000E+000 A 3.00000000000000
R 1 1.00000000000000 A 0.000000000000000E+000
R 2 2.00000000000000 A 1.00000000000000
R 3 3.00000000000000 A 2.00000000000000
R 0 0.000000000000000E+000 B 0.000000000000000E+000
R 1 1.00000000000000 B 0.000000000000000E+000
R 2 2.00000000000000 B 0.000000000000000E+000
R 3 3.00000000000000 B 0.000000000000000E+000
R 0 0.000000000000000E+000 A 2.00000000000000
R 1 1.00000000000000 A 3.00000000000000
R 2 2.00000000000000 A 0.000000000000000E+000
R 3 3.00000000000000 A 1.00000000000000
As you can see, the output of the first SEND_DATA execution differs from the second. The results are correct if I run the reproducer on a single machine with multiple processes. I am compiling the code with mpiifort for the Intel(R) MPI Library 2017 Update 3 for Linux* (ifort version 17.0.4)
and running with mpirun version Intel(R) MPI Library for Linux* OS, Version 2017 Update 3 Build 20170405.
Do you have any idea what could be a source of this issue?
Thank you,
Piotr
Hi,
I've been running on 5 (distributed memory) nodes (each has 20 processors) by using mpirun -n 5 -ppn 1 -hosts nd1,nd2,nd3,nd4,nd5.
Sometimes it works, sometimes it gives inaccurate results, and sometimes it crashes with the error:
"[0:nd1] unexpected disconnect completion event from [35:nd2] Fatal error in PMPI_Comm_dup: Internal MPI error!, error stack ...".
Any suggestions for fixing this communication error when running on multiple nodes with Intel MPI (2017 Update 2)?
I have already set the stack size to unlimited in my .rc file. I tested this with two different applications (one is the well-known distributed-memory solver MUMPS) and have the same issue with both. This is not a very memory-demanding job. mpirun works perfectly on 1 node; this only happens on multiple nodes (even 2).
Thanks
Hi!
I am testing Intel Parallel Studio 2019.0.045 beta for Windows. The Intel MPI Library that comes with it does not support the Fortran module mpi_f08, whereas the Linux version provides that module. Why is this module not supported on Windows?
Are you planning to support the mpi_f08 module for Windows in the future?
Thanks for your help,
Hector
Hi,
I have observed that when trying to trace the following program with mpiexec -trace, everything works fine
as long as I stick with "use mpi". If I change that to "use mpi_f08", I do not get a trace file.
The reason I'm interested in using mpi_f08 is that I have an application to trace that uses the shared-memory MPI model, and it seems that the call to MPI_Comm_split_type used below is only possible with the mpi_f08 module, right?
Any hints on why I cannot trace that program when using "use mpi_f08"?
Some extra Info:
$ mpiifort -o shm shm.f90
$ mpiifort --version
ifort (IFORT) 18.0.2 20180210
$ mpiexec -trace -np 4 shm
program nicks_program
  ! use mpi_f08
  use mpi
  implicit none
  integer :: wrank, wsize, sm_rank, sm_size, ierr, send
  type(MPI_COMM) :: MPI_COMM_SHARED
  call MPI_Init(ierr)
  call MPI_comm_rank(MPI_COMM_WORLD, wrank, ierr)
  call MPI_comm_size(MPI_COMM_WORLD, wsize, ierr)
  ! call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, MPI_COMM_SHARED, ierr)
  send = wrank
  call MPI_Bcast( send, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr )
  ! call MPI_Bcast( send, 1, MPI_INTEGER, 0, MPI_COMM_SHARED, ierr )
  write(*,*) 'send = ', send
  write(*,*) 'ierr = ', ierr
  call MPI_Finalize(ierr)
end
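Incidentally, regarding the "only possible with mpi_f08" question, here is a minimal sketch (my assumption, not verified against the tracing setup) of calling MPI_Comm_split_type through the plain mpi module, where the communicator handle is a plain INTEGER rather than type(MPI_Comm) and MPI_COMM_TYPE_SHARED comes from the MPI-3 bindings:

program split_type_via_mpi
  use mpi
  implicit none
  integer :: ierr, wrank, shm_comm, shm_rank
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, wrank, ierr)
  ! MPI-3 routine; with the F90 'mpi' module the new communicator is an integer handle.
  call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, shm_comm, ierr)
  call MPI_Comm_rank(shm_comm, shm_rank, ierr)
  write(*,*) 'world rank', wrank, 'node-local rank', shm_rank
  call MPI_Comm_free(shm_comm, ierr)
  call MPI_Finalize(ierr)
end program split_type_via_mpi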
I'm working at a site configured with IMPI (2016.4.072) / Slurm (17.11.4). The MpiDefault is none.
When I run my MPICH2 code (defaulting to --mpi=none)
srun -N 2 -n 4 -l -vv ...
I get (trimming out duplicate error messages from other ranks)
0: PMII_singinit: execv failed: No such file or directory
0: [unset]: This singleton init program attempted to access some feature
0: [unset]: for which process manager support was required, e.g. spawn or universe_size.
0: [unset]: But the necessary mpiexec is not in your path.
0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=singinit_kvs_18014_0 key=P2-hostname
0: :
0: system msg for write_line failure : Bad file descriptor
0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=singinit_kvs_18014_0 key=P3-hostname
0: :
0: system msg for write_line failure : Bad file descriptor
0: 2018-05-25 09:00:14 2: MPI startup(): Multi-threaded optimized library
0: 2018-05-25 09:00:14 2: DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u
0: 2018-05-25 09:00:14 2: MPI startup(): DAPL provider ofa-v2-mlx4_0-1u
0: 2018-05-25 09:00:14 2: MPI startup(): shm and dapl data transfer modes
0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=singinit_kvs_18417_0 key=P1-businesscard-0
0: :
0: system msg for write_line failure : Bad file descriptor
0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=foobar key=foobar
0: :
0: system msg for write_line failure : Bad file descriptor
0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=singinit_kvs_18417_0 key=P1-businesscard-0
0: :
0: system msg for write_line failure : Bad file descriptor
0: Fatal error in PMPI_Init_thread: Other MPI error, error stack:
0: MPIR_Init_thread(784).................:
0: MPID_Init(1332).......................: channel initialization failed
0: MPIDI_CH3_Init(141)...................:
0: dapl_rc_setup_all_connections_20(1388): generic failure with errno = 872614415
0: getConnInfoKVS(849)...................: PMI_KVS_Get failed
If I run the same code with
srun --mpi=pmi2 ...
it works fine.
A couple of questions/comments:
1. In neither case do I set I_MPI_PMI_LIBRARY, which I thought I needed to -- how else does IMPI find the Slurm PMI? This might be why --mpi=none is failing, but for the moment, I can't set the variable because I can't find libpmi[1,2,x].so.
2. I would think that since none is the default, it should work. Under what conditions would none fail, but pmi2 work? Is it because IMPI supports pmi2?
3. If I do need to set I_MPI_PMI_LIBRARY, why does pmi2 still work without setting I_MPI_PMI_LIBRARY? Or do I not need to set it when using IMPI?
4. I'm still trying to understand a bit more about the relationship between libpmi.so and mpi_*.so. libpmi.so is the Slurm PMI library, correct? And mpi_* are the Slurm plug-in libraries (e.g. mpi_none, mpi_pmi2, etc.). How do these libraries fit together?
Thanks,
Raymond