Hi
I want to generate a timing log for MPI functions. I am using "export I_MPI_STATS=20" to enable logging, but this captures timing information on only one node. How can I get similar information from all the nodes used in the run?
Thanks
Biren
Our school project needs MPI and OpenGL, but we have failed to create an OpenGL window and system shared memory inside an Intel MPI process. Could anyone help us?
Our OS is Windows 10.
Hello!
I am running a quad-precision code with MPI. However, when I perform MPI_ALLREDUCE with MPI_REAL16 as the datatype, the code gives a segmentation fault. How do I perform quad-precision reduction operations in MPI? Any advice would be greatly appreciated.
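For reference, a minimal sketch of the kind of reduction being described (illustrative only, not the poster's code; it assumes the compiler's 16-byte real kind is the one MPI_REAL16 is meant to match):

program quad_allreduce
  use mpi
  implicit none
  ! Quad-precision kind (REAL*16 on ifort).
  integer, parameter :: qp = selected_real_kind(33, 4931)
  real(kind=qp) :: local_val, global_sum
  integer :: rank, ierr
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  local_val = real(rank, kind=qp) + 0.25_qp
  ! MPI_REAL16 is an optional datatype in the MPI standard, so support for
  ! predefined reductions on it varies between implementations; the buffer
  ! kind must match the 16-byte real the library expects.
  call MPI_Allreduce(local_val, global_sum, 1, MPI_REAL16, MPI_SUM, MPI_COMM_WORLD, ierr)
  if (rank == 0) write(*,*) 'global sum =', global_sum
  call MPI_Finalize(ierr)
end program quad_allreduce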
Regards
Suman Vajjala
I'm attempting a PETSc 3.8 build with Intel Parallel Studio 2017.0.5. The build fails without much information, but it appears to be an internal compiler error.
Some key output:
...
/home/cchang/Packages/petsc-3.8/src/vec/is/sf/impls/basic/sfbasic.c(528): (col. 1) remark: FetchAndInsert__blocktype_int_4_1 has been targeted for automatic cpu dispatch
": internal error: 0_76
compilation aborted for /home/cchang/Packages/petsc-3.8/src/vec/is/sf/impls/basic/sfbasic.c (code 4)
gmake[2]: *** [impi-intel/obj/src/vec/is/sf/impls/basic/sfbasic.o] Error 4
Could you tell me what this error 0_76 is? I can provide log files or environment info if they would help.
Thanks,
Chris
Gentlemen, could you please help with an issue?
I'm using the Intel compiler ifort version 18.0.2 and Intel MPI version 2018.2.199 in an attempt to run the WRF model on an HPE (formerly SGI) ICE X machine.
wrfoperador@dpns31:~> ifort -v
ifort version 18.0.2
wrfoperador@dpns31:~> mpirun -V
Intel(R) MPI Library for Linux* OS, Version 2018 Update 2 Build 20180125 (id: 18157)
Copyright 2003-2018 Intel Corporation.
wrfoperador@dpns31:~>
When I run the executable I receive the following message:
/opt/intel/intel_2018/compilers_and_libraries_2018.2.199/linux/mpi/intel64/bin/mpirun r1i1n0 12 /home/wrfoperador/wrf/wrf_metarea5/WPS/geogrid.exe
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
[proxy:0:0@dpns31] HYDU_create_process (../../utils/launch/launch.c:825): execvp error on file r1i1n0 (No such file or directory)
Could you help me solve this problem?
Thanks for your attention. I'm looking forward to your reply.
All,
I'm hoping the Intel MPI gurus can help with this. Recently I've been transitioning some code I help maintain (GEOS, a climate model) from HPE MPT (2.17, in this case) to Intel MPI (18.0.1; I'll test 18.0.2 soon). In both cases the compiler (Intel 18.0.1) is the same, and both run on the same set of Haswell nodes on an SGI/HPE cluster. The only difference is the MPI stack.
Now one part of the code (AGCM, the physics/dynamics part) is actually a little bit faster with Intel MPI than MPT, even on an SGI machine. That is nice. It's maybe 5-10% faster in some cases. Huzzah!
But another code (GSI, analysis of observation data) really, really, really does not like Intel MPI. This code displays two issues. First, after the code starts (both launch very fast), it eventually hits a point at which, we believe, the first collective occurs, and the whole code stalls as it...initializes buffers? Something with InfiniBand maybe? We don't know. MPT slows a bit there too, but doesn't show this issue nearly as badly as Intel MPI. We had another place like this in the AGCM where moving from a collective to an Isend/Recv/Wait type paradigm really helped (see the sketch below). This "stall" is annoying and, worse, it gets longer and longer as the number of cores increases. (We might have a reproducer for this one.)
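For context, a minimal sketch of that Isend/Recv/Wait pattern (hypothetical buffer names and layout, not the actual GEOS code): each rank posts nonblocking sends of one block to every peer, then does blocking receives, then waits for its sends to complete.

subroutine exchange_blocks(sendbuf, recvbuf, blocklen, comm)
  ! Sketch of trading an Alltoallv-style collective for explicit Isend/Recv/Waitall.
  use mpi
  implicit none
  integer, intent(in) :: blocklen, comm
  double precision, intent(in)  :: sendbuf(blocklen, *)   ! one block per destination rank
  double precision, intent(out) :: recvbuf(blocklen, *)   ! one block per source rank
  integer :: nranks, myrank, peer, ierr
  integer :: stat(MPI_STATUS_SIZE)
  integer, allocatable :: reqs(:)

  call MPI_Comm_size(comm, nranks, ierr)
  call MPI_Comm_rank(comm, myrank, ierr)
  allocate(reqs(nranks)); reqs = MPI_REQUEST_NULL

  ! Post all sends without blocking.
  do peer = 0, nranks-1
    if (peer /= myrank) call MPI_Isend(sendbuf(1,peer+1), blocklen, MPI_DOUBLE_PRECISION, &
                                       peer, 0, comm, reqs(peer+1), ierr)
  end do
  ! Receive one block from every other rank; copy the local block directly.
  do peer = 0, nranks-1
    if (peer /= myrank) then
      call MPI_Recv(recvbuf(1,peer+1), blocklen, MPI_DOUBLE_PRECISION, peer, 0, comm, stat, ierr)
    else
      recvbuf(1:blocklen,peer+1) = sendbuf(1:blocklen,peer+1)
    end if
  end do
  ! Wait for all posted sends to complete before reusing sendbuf.
  call MPI_Waitall(nranks, reqs, MPI_STATUSES_IGNORE, ierr)
  deallocate(reqs)
end subroutine exchange_blocks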
But that is minor, really: a minute or so, compared to the overall performance. On 240 cores, MPT 2.17 runs this code in 15:03 (minutes:seconds), Intel MPI 18.0.1 in 28:12. On 672 cores, MPT 2.17 runs the code in 12:02 and Intel MPI 18.0.2 in 21:47; neither scales well overall.
Using I_MPI_STATS, the code spends ~60% of its MPI time in Alltoallv (20% of wall time) at 240 cores; at 672, Barrier starts to dominate, but Alltoallv is still 40% of MPI time and 23% of wall time. I've tried setting both I_MPI_ADJUST_ALLTOALLV options (1 and 2) and it makes little difference (28:44 and 28:25 at 240).
I'm going to try and see if I can request/reserve a set of nodes for a long time to do an mpitune run, but since each run is ~30 minutes...mpitune will not be fun as it'd be 90 minutes for each option test.
Any ideas on what might be happening? Any advice for flags/environment variables to try? I understand that HPE MPT might/should work best on an SGI/HPE machine (like how Intel compilers seem to do best with Intel chips), but this seems a bit beyond the usual difference. I've requested MVAPICH2 be installed as well for another comparison.
Matt
I have been trying to set up NFS over RDMA on Omni-Path, following the instructions in the official documentation. IPoIB works fine, but I cannot get NFS over RDMA working. I have modified /etc/rdma/rdma.conf and added
NFSoRDMA_LOAD=yes
NFSoRDMA_PORT=2050
I have also loaded the appropriate modules (sunrpc on the client, xprtrdma on the server). However, the NFS mount fails (connection refused) when mounting with RDMA; note that it works fine if I do not specify rdma.
It appears that port 2050 for NFSoRDMA does not get created; when I run rpcinfo from the client against the server, I see port 2049 for nfs but nothing on 2050.
This is on CentOS 7.4. Any ideas/suggestions what may be wrong?
Hi, Intel support guys,
I am running tests on our Skylake computers. I am surprised to see that 4 cores per package are gone. Where are they?
Our computer system information is below:
Processor: Intel Xeon Gold 6148 CPU @ 2.40GHz, 2.39GHz (2 processors)
Installed memory: 384GB
System type: 64-bit operating system x64-based processor
OS: Windows server 2016 standard
Please see the following output, which shows that 4 cores per package are gone. Where are these 8 cores in total?
I am looking forward to hearing from you.
Thanks in advance
Best regards,
Dingjun
Computer Modelling Group Ltd.
Calgary, AB, Canada
VECTOR_SIMD_OPENMP_TEST
OMP: Info #211: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #209: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31}
OMP: Info #156: KMP_AFFINITY: 32 available OS procs
OMP: Info #158: KMP_AFFINITY: Nonuniform topology
OMP: Info #179: KMP_AFFINITY: 2 packages x 20 cores/pkg x 1 threads/core (32 total cores)
OMP: Info #213: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 0 maps to package 0 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 1 maps to package 0 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 3 maps to package 0 core 3
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 4
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 0 core 8
OMP: Info #171: KMP_AFFINITY: OS proc 6 maps to package 0 core 9
OMP: Info #171: KMP_AFFINITY: OS proc 7 maps to package 0 core 10
OMP: Info #171: KMP_AFFINITY: OS proc 8 maps to package 0 core 11
OMP: Info #171: KMP_AFFINITY: OS proc 9 maps to package 0 core 12
OMP: Info #171: KMP_AFFINITY: OS proc 10 maps to package 0 core 16
OMP: Info #171: KMP_AFFINITY: OS proc 11 maps to package 0 core 17
OMP: Info #171: KMP_AFFINITY: OS proc 12 maps to package 0 core 18
OMP: Info #171: KMP_AFFINITY: OS proc 13 maps to package 0 core 19
OMP: Info #171: KMP_AFFINITY: OS proc 14 maps to package 0 core 20
OMP: Info #171: KMP_AFFINITY: OS proc 15 maps to package 0 core 24
OMP: Info #171: KMP_AFFINITY: OS proc 16 maps to package 0 core 25
OMP: Info #171: KMP_AFFINITY: OS proc 17 maps to package 0 core 26
OMP: Info #171: KMP_AFFINITY: OS proc 18 maps to package 0 core 27
OMP: Info #171: KMP_AFFINITY: OS proc 19 maps to package 0 core 28
OMP: Info #171: KMP_AFFINITY: OS proc 20 maps to package 1 core 0
OMP: Info #171: KMP_AFFINITY: OS proc 21 maps to package 1 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 22 maps to package 1 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 23 maps to package 1 core 3
OMP: Info #171: KMP_AFFINITY: OS proc 24 maps to package 1 core 4
OMP: Info #171: KMP_AFFINITY: OS proc 25 maps to package 1 core 8
OMP: Info #171: KMP_AFFINITY: OS proc 26 maps to package 1 core 9
OMP: Info #171: KMP_AFFINITY: OS proc 27 maps to package 1 core 10
OMP: Info #171: KMP_AFFINITY: OS proc 28 maps to package 1 core 11
OMP: Info #171: KMP_AFFINITY: OS proc 29 maps to package 1 core 12
OMP: Info #171: KMP_AFFINITY: OS proc 30 maps to package 1 core 16
OMP: Info #171: KMP_AFFINITY: OS proc 31 maps to package 1 core 17
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 4004 thread 0 bound to OS proc set {0}
The number of processors available = 32
The number of threads available = 20
HELLO from process 0
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 8956 thread 1 bound to OS proc set {1}
HELLO from process 1
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 8820 thread 2 bound to OS proc set {2}
HELLO from process 2
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9292 thread 3 bound to OS proc set {3}
HELLO from process 3
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9752 thread 4 bound to OS proc set {4}
HELLO from process 4
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 3776 thread 5 bound to OS proc set {5}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 8464 thread 6 bound to OS proc set {6}
HELLO from process 5
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 1416 thread 7 bound to OS proc set {7}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 3868 thread 8 bound to OS proc set {8}
HELLO from process 6
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 7396 thread 9 bound to OS proc set {9}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9772 thread 10 bound to OS proc set {10}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9280 thread 11 bound to OS proc set {11}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9948 thread 12 bound to OS proc set {12}
HELLO from process 7
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 8712 thread 13 bound to OS proc set {13}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 6092 thread 14 bound to OS proc set {14}
HELLO from process 11
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 8532 thread 15 bound to OS proc set {15}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9892 thread 16 bound to OS proc set {16}
HELLO from process 12
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 10640 thread 17 bound to OS proc set {17}
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 9060 thread 18 bound to OS proc set {18}
HELLO from process 14
OMP: Info #249: KMP_AFFINITY: pid 10068 tid 8908 thread 19 bound to OS proc set {19}
HELLO from process 18
HELLO from process 16
HELLO from process 19
HELLO from process 13
HELLO from process 8
HELLO from process 17
HELLO from process 15
HELLO from process 10
HELLO from process 9
matrix multiplication completed
Elapsed wall clock time 2 = 133.379
Join the Intel® Parallel Studio XE 2019 Beta Program today and, for a limited time, get early access to new features plus an open invitation to tell us what you really think.
We want YOU to tell us what to improve so we can create high-quality software tools that meet your development needs.
Top New Features in Intel® Parallel Studio XE 2019 Beta
New Features in Intel® MPI Library
New Features in Intel® Cluster Checker
To learn more, visit the Intel® Parallel Studio XE 2019 Beta page.
Then sign up to get started.
Dear All,
I compiled the VASP package with Intel MPI successfully, but when I run the program it stops with the MPI errors listed below. Could anybody tell me how to fix it? Thanks!
Xiang YE
[0] MPI startup(): Intel(R) MPI Library, Version 2017 Update 4 Build 20170817 (id: 17752)
[0] MPI startup(): Copyright (C) 2003-2017 Intel Corporation. All rights reserved.
[0] MPI startup(): Multi-threaded optimized library
(each of the 28 ranks, 0-27, then prints the following startup lines, interleaved)
[rank] MPI startup(): Found 1 IB devices
[rank] MPI startup(): Open 0 IB device: mlx5_0
[rank] MPI startup(): Start 1 ports per adapter
[rank] MPID_nem_ofacm_init(): Init
[rank] MPI startup(): ofa data transfer mode
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(805): fail failed
MPID_Init(1866)......: fail failed
MPIR_Comm_commit(711): fail failed
(unknown)(): Other MPI error
Hi Everyone,
I just installed the newest version of the Intel MPI Library (2018.2.199); previously I was using Open MPI. I am using ifort 15.0.3.
I am trying to compile the following test program:
program main
  use mpi_f08
  implicit none
  integer :: rank, size, len
  character(len=MPI_MAX_LIBRARY_VERSION_STRING) :: version
  call MPI_INIT()
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, size)
  call MPI_GET_LIBRARY_VERSION(version, len)
  print *, "rank:", rank
  print *, "size:", size
  print *, "version: "//version
  print *, ' No Errors'
  call MPI_FINALIZE()
end
When I use openMPI it works fine. However, I am getting the following errors with Intel MPI:
% mpiifort test_F08.f90
test_F08.f90(2): error #7012: The module file cannot be read. Its format requires a more recent F90 compiler. [MPI_F08]
use mpi_f08
--------^
test_F08.f90(8): error #6404: This name does not have a type, and must have an explicit type. [MPI_COMM_WORLD]
call MPI_COMM_RANK(MPI_COMM_WORLD, rank)
-----------------------^
test_F08.f90(5): error #6279: A specification expression object must be a dummy argument, a COMMON block object, or an object accessible through host or use association. [MPI_MAX_LIBRARY_VERSION_STRING]
character(len=MPI_MAX_LIBRARY_VERSION_STRING) :: version
------------------^
test_F08.f90(5): error #6591: An automatic object is invalid in a main program. [VERSION]
character(len=MPI_MAX_LIBRARY_VERSION_STRING) :: version
-----------------------------------------------------^
compilation aborted for test_F08.f90 (code 1)
So, do I have to use the same version of ifort that was used to build the Intel MPI module files? That is not listed in the requirements for using Intel MPI.
Why does Intel MPI not create new module files using the Fortran compiler available on the system?
Is there anything that I can do to use Intel MPI with my compiler?
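In the meantime, one sketch of a workaround that avoids compiler-versioned module files altogether is the old mpif.h include interface, which is plain source text rather than a compiled module (illustrative only; it assumes the mpif.h shipped with this Intel MPI version defines MPI_MAX_LIBRARY_VERSION_STRING):

program main
  implicit none
  include 'mpif.h'          ! textual include, so no .mod compatibility issue
  integer :: rank, size, len, ierr
  character(len=MPI_MAX_LIBRARY_VERSION_STRING) :: version
  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, size, ierr)
  call MPI_GET_LIBRARY_VERSION(version, len, ierr)
  print *, 'rank:', rank
  print *, 'size:', size
  print *, 'version: '//trim(version)
  call MPI_FINALIZE(ierr)
end program main

Note that the F77-style interface requires the explicit IERR argument that the mpi_f08 bindings make optional.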
Thanks for your help,
Hector
Does Intel MPI support GPUdirect RDMA, with NVIDIA drivers and Cudatoolkit 9.x installed?
Is there any documentation on what drivers to install, and what fabric select env vars to set?
Thanks
Ron
My understanding is that Intel allows one to sell and transfer a license to someone else. I am a small-scale open-source developer and can't afford the price of the latest Cluster Edition Linux compilers. Send me a note if you have an older version you wouldn't mind transferring to me. For my needs, anything 2015 or newer would suffice.
Hi,
Is there a need to re-install Intel Parallel Studio after adding further compute nodes to my cluster? There are two InfiniBand islands; ibstat reports:
CA 'mlx4_0', CA type: MT4099 and CA 'mlx4_1', CA type: MT26428. The latest compute nodes are associated with MT4099.
These provider errors appear only on the newer nodes:
[2] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
[10] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
[12] MPI startup(): dapl fabric is not available and fallback fabric is not enabled
node009:UCM:2d97:570fa700: 1249 us(1249 us): open_hca: device mlx4_0 not found
node009:UCM:2d9f:1626f700: 1262 us(1262 us): open_hca: device mlx4_0 not found
node009:UCM:2da1:7f214700: 1102 us(1102 us): open_hca: device mlx4_0 not found
Regards
Gert
Hello,
I'm just getting started with Intel MPI and am trying to understand how to use Trace Analyzer. My understanding is that linking with vt.lib and running an MPI application is sufficient to cause a *.stf file to be emitted. I have a simple Hello World MPI application. After linking with vt.lib and running through mpiexec, I see no .stf output.
There's not much more information to add. The setup could not be simpler. What am I missing?
Jeff
Hello,
I am experiencing issues when using MPI_Sendrecv on multiple machines. In the code, I am sending a vector in a circular manner in parallel: each process sends data to the subsequent process and receives data from the preceding process. Surprisingly, on the first execution of the SEND_DATA routine the output is correct, while on the second execution the output is incorrect. The code and the output are below.
PROGRAM SENDRECV_REPROD
  USE MPI
  USE ISO_FORTRAN_ENV,ONLY: INT32
  IMPLICIT NONE
  INTEGER(KIND=INT32) :: STATUS(MPI_STATUS_SIZE)
  INTEGER(KIND=INT32) :: RANK,NUM_PROCS,IERR
  CALL MPI_INIT(IERR)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD,RANK,IERR)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD,NUM_PROCS,IERR)
  CALL SEND_DATA(RANK,NUM_PROCS)
  CALL SEND_DATA(RANK,NUM_PROCS)
  CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
  CALL MPI_FINALIZE(IERR)
END PROGRAM

SUBROUTINE SEND_DATA(RANK,NUM_PROCS)
  USE ISO_FORTRAN_ENV,ONLY: INT32,REAL64
  USE MPI
  IMPLICIT NONE
  INTEGER(KIND=INT32),INTENT(IN) :: RANK
  INTEGER(KIND=INT32),INTENT(IN) :: NUM_PROCS
  INTEGER(KIND=INT32) :: IERR,ALLOC_ERROR
  INTEGER(KIND=INT32) :: VEC_SIZE,I_RANK,RANK_DESTIN,RANK_SOURCE,TAG_SEND,TAG_RECV
  REAL(KIND=REAL64), ALLOCATABLE :: COMM_BUFFER(:),VEC1(:)
  INTEGER(KIND=INT32) :: MPI_COMM_STATUS(MPI_STATUS_SIZE)
  ! Allocate communication arrays.
  VEC_SIZE = 374454
  ALLOCATE(COMM_BUFFER(VEC_SIZE),STAT=ALLOC_ERROR)
  ALLOCATE(VEC1(VEC_SIZE),STAT=ALLOC_ERROR)
  ! Define destination and source ranks for sending and receiving messages.
  RANK_DESTIN = MOD((RANK+1),NUM_PROCS)
  RANK_SOURCE = MOD((RANK+NUM_PROCS-1),NUM_PROCS)
  TAG_SEND = RANK+1
  TAG_RECV = RANK
  IF (RANK==0) TAG_RECV=NUM_PROCS
  VEC1=RANK
  COMM_BUFFER=0.0_REAL64
  CALL MPI_BARRIER(MPI_COMM_WORLD,IERR)
  DO I_RANK=1,NUM_PROCS
    IF (RANK==I_RANK-1) WRITE(*,*) 'R',RANK, VEC1(1),'B', COMM_BUFFER(1)
  ENDDO
  CALL MPI_SENDRECV(VEC1(1),VEC_SIZE,MPI_DOUBLE_PRECISION,RANK_DESTIN,TAG_SEND,COMM_BUFFER(1),&
    VEC_SIZE,MPI_DOUBLE_PRECISION,RANK_SOURCE,TAG_RECV,MPI_COMM_WORLD,MPI_COMM_STATUS,IERR)
  DO I_RANK=1,NUM_PROCS
    IF (RANK==I_RANK-1) WRITE(*,*) 'R' , RANK , VEC1(1),'A', COMM_BUFFER(1)
  ENDDO
END SUBROUTINE SEND_DATA
Output of four processes run on four machines:
R 0 0.000000000000000E+000 B 0.000000000000000E+000
R 1 1.00000000000000 B 0.000000000000000E+000
R 2 2.00000000000000 B 0.000000000000000E+000
R 3 3.00000000000000 B 0.000000000000000E+000
R 0 0.000000000000000E+000 A 3.00000000000000
R 1 1.00000000000000 A 0.000000000000000E+000
R 2 2.00000000000000 A 1.00000000000000
R 3 3.00000000000000 A 2.00000000000000
R 0 0.000000000000000E+000 B 0.000000000000000E+000
R 1 1.00000000000000 B 0.000000000000000E+000
R 2 2.00000000000000 B 0.000000000000000E+000
R 3 3.00000000000000 B 0.000000000000000E+000
R 0 0.000000000000000E+000 A 2.00000000000000
R 1 1.00000000000000 A 3.00000000000000
R 2 2.00000000000000 A 0.000000000000000E+000
R 3 3.00000000000000 A 1.00000000000000
As you can see, the output of the first SEND_DATA execution differs from the second. The results are correct if I run the reproducer on a single machine with multiple processes. I am compiling the code with mpiifort for the Intel(R) MPI Library 2017 Update 3 for Linux* (ifort version 17.0.4)
and running with mpirun version Intel(R) MPI Library for Linux* OS, Version 2017 Update 3 Build 20170405.
Do you have any idea what could be a source of this issue?
Thank you,
Piotr
Hi,
I've been running on 5 (distributed memory) nodes (each has 20 processors) by using mpirun -n 5 -ppn 1 -hosts nd1,nd2,nd3,nd4,nd5.
Sometimes it works, sometimes it gives inaccurate results, and sometimes it crashes with the error:
"[0:nd1] unexpected disconnect completion event from [35:nd2] Fatal error in PMPI_Comm_dup: Internal MPI error!, error stack ...".
Any suggestions for fixing this communication error when running on multiple nodes with Intel MPI (2017 Update 2)?
I have already set the stack size to unlimited in my .rc file. I tested this with two different applications (one is the well-known distributed-memory solver MUMPS) and have the same issue with both. This is not a very memory-demanding job. mpirun works perfectly on 1 node; this only happens on multiple nodes (even 2).
Thanks
Hi!
I am testing Intel Parallel Studio 2019.0.045 beta for Windows. The Intel MPI Library that comes with it does not support the Fortran module mpi_f08, whereas the Linux version provides that module. Why is this module not supported on Windows?
Are you planning to support the mpi_f08 module for Windows in the future?
Thanks for your help,
Hector
Hi,
I have observed that when trying to trace the following program with mpiexec -trace, everything works fine
as long as I stick with "use mpi". If I change that to "use mpi_f08", I do not get a trace file.
The reason I'm interested in using mpi_f08 is that I have an application to trace that uses the shared-memory MPI model, and it seems that the call to MPI_Comm_split_type used below is only possible with the mpi_f08 module, right?
Any hints on why I cannot trace that program when using "use mpi_f08"?
Some extra Info:
$ mpiifort -o shm shm.f90
$ mpiifort --version
ifort (IFORT) 18.0.2 20180210
$ mpiexec -trace -np 4 shm
program nicks_program
  ! use mpi_f08
  use mpi
  implicit none
  integer :: wrank, wsize, sm_rank, sm_size, ierr, send
  type(MPI_COMM) :: MPI_COMM_SHARED
  call MPI_Init(ierr)
  call MPI_comm_rank(MPI_COMM_WORLD, wrank, ierr)
  call MPI_comm_size(MPI_COMM_WORLD, wsize, ierr)
  ! call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, MPI_COMM_SHARED, ierr)
  send = wrank
  call MPI_Bcast( send, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr )
  ! call MPI_Bcast( send, 1, MPI_INTEGER, 0, MPI_COMM_SHARED, ierr )
  write(*,*) 'send = ', send
  write(*,*) 'ierr = ', ierr
  call MPI_Finalize(ierr)
end
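Incidentally, regarding the "only possible with mpi_f08" question, here is a minimal sketch (my assumption, not verified against the tracing setup) of calling MPI_Comm_split_type through the plain mpi module, where the communicator handle is a plain INTEGER rather than type(MPI_Comm) and MPI_COMM_TYPE_SHARED comes from the MPI-3 bindings:

program split_type_via_mpi
  use mpi
  implicit none
  integer :: ierr, wrank, shm_comm, shm_rank
  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, wrank, ierr)
  ! MPI-3 routine; with the F90 'mpi' module the new communicator is an integer handle.
  call MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, shm_comm, ierr)
  call MPI_Comm_rank(shm_comm, shm_rank, ierr)
  write(*,*) 'world rank', wrank, 'node-local rank', shm_rank
  call MPI_Comm_free(shm_comm, ierr)
  call MPI_Finalize(ierr)
end program split_type_via_mpi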
I'm working at a site configured with IMPI (2016.4.072) / Slurm (17.11.4). The MpiDefault is none.
When I run my MPICH2 code (defaulting to --mpi=none)
srun -N 2 -n 4 -l -vv ...
I get (trimming out duplicate error messages from other ranks)
0: PMII_singinit: execv failed: No such file or directory
0: [unset]: This singleton init program attempted to access some feature
0: [unset]: for which process manager support was required, e.g. spawn or universe_size.
0: [unset]: But the necessary mpiexec is not in your path.
0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=singinit_kvs_18014_0 key=P2-hostname
0: :
0: system msg for write_line failure : Bad file descriptor
0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=singinit_kvs_18014_0 key=P3-hostname
0: :
0: system msg for write_line failure : Bad file descriptor
0: 2018-05-25 09:00:14 2: MPI startup(): Multi-threaded optimized library
0: 2018-05-25 09:00:14 2: DAPL startup(): trying to open DAPL provider from I_MPI_DAPL_PROVIDER: ofa-v2-mlx4_0-1u
0: 2018-05-25 09:00:14 2: MPI startup(): DAPL provider ofa-v2-mlx4_0-1u
0: 2018-05-25 09:00:14 2: MPI startup(): shm and dapl data transfer modes
0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=singinit_kvs_18417_0 key=P1-businesscard-0
0: :
0: system msg for write_line failure : Bad file descriptor
0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=foobar key=foobar
0: :
0: system msg for write_line failure : Bad file descriptor
0: [unset]: write_line error; fd=-1 buf=:cmd=get kvsname=singinit_kvs_18417_0 key=P1-businesscard-0
0: :
0: system msg for write_line failure : Bad file descriptor
0: Fatal error in PMPI_Init_thread: Other MPI error, error stack:
0: MPIR_Init_thread(784).................:
0: MPID_Init(1332).......................: channel initialization failed
0: MPIDI_CH3_Init(141)...................:
0: dapl_rc_setup_all_connections_20(1388): generic failure with errno = 872614415
0: getConnInfoKVS(849)...................: PMI_KVS_Get failed
If I run the same code with
srun --mpi=pmi2 ...
it works fine.
A couple of questions/comments:
1. In neither case do I set I_MPI_PMI_LIBRARY, which I thought I needed to -- how else does IMPI find the Slurm PMI? This might be why --mpi=none is failing, but for the moment, I can't set the variable because I can't find libpmi[1,2,x].so.
2. I would think that since none is the default, it should work. Under what conditions would none fail, but pmi2 work? Is it because IMPI supports pmi2?
3. If I do need to set I_MPI_PMI_LIBRARY, why does pmi2 still work without setting I_MPI_PMI_LIBRARY? Or do I not need to set it when using IMPI?
4. I'm still trying to understand a bit more about the relationship between libpmi.so and mpi_*.so. libpmi.so is the Slurm PMI library, correct? And mpi_* are the Slurm plug-in libraries (e.g. mpi_none, mpi_pmi2, etc.). How do these libraries fit together?
Thanks,
Raymond