Channel: Clusters and HPC Technology

Errors when compiling MUMPS


Hi,

I installed MUMPS using the Intel parallel libraries, but when I run the examples it shows fatal errors. Can you help me with the problem? Thanks in advance.

 

Here is the error:

[mlin4@min-workstation examples]$ ./dsimpletest < input_simpletest_real
Abort(1094543) on node 0 (rank 0 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
MPIR_Init_thread(649)......: 
MPID_Init(863).............: 
MPIDI_NM_mpi_init_hook(705): OFI addrinfo() failed (ofi_init.h:705:MPIDI_NM_mpi_init_hook:No data available)
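This failure happens inside MPI_Init itself (the OFI layer cannot resolve an address), so it is independent of MUMPS. A minimal diagnostic sketch, assuming Intel MPI 2019 with its bundled libfabric; I_MPI_DEBUG and FI_PROVIDER are standard variables, and fi_info is the libfabric query tool if it is on your PATH:

# Ask Intel MPI to report which fabric/provider it selects (no MUMPS involved)
I_MPI_DEBUG=5 mpirun -np 2 hostname

# As a test, force the plain sockets provider and rerun the example
FI_PROVIDER=sockets mpirun -np 2 ./dsimpletest < input_simpletest_real

# List the providers libfabric can actually see on this machine
fi_info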
 

Here is the make.inc file:

#
#  This file is part of MUMPS 5.1.2, released
#  on Mon Oct  2 07:37:01 UTC 2017
#
#Begin orderings

# NOTE that PORD is distributed within MUMPS by default. It is recommended to
# install other orderings. For that, you need to obtain the corresponding package
# and modify the variables below accordingly.
# For example, to have Metis available within MUMPS:
#          1/ download Metis and compile it
#          2/ uncomment (suppress # in first column) lines
#             starting with LMETISDIR,  LMETIS
#          3/ add -Dmetis in line ORDERINGSF
#             ORDERINGSF  = -Dpord -Dmetis
#          4/ Compile and install MUMPS
#             make clean; make   (to clean up previous installation)
#
#          Metis/ParMetis and SCOTCH/PT-SCOTCH (ver 6.0 and later) orderings are recommended.
#

#SCOTCHDIR  = ${HOME}/scotch_6.0
#ISCOTCH    = -I$(SCOTCHDIR)/include
#
# You have to choose one among the following two lines depending on
# the type of analysis you want to perform. If you want to perform only
# sequential analysis choose the first (remember to add -Dscotch in the ORDERINGSF
# variable below); for both parallel and sequential analysis choose the second 
# line (remember to add -Dptscotch in the ORDERINGSF variable below)

#LSCOTCH    = -L$(SCOTCHDIR)/lib -lesmumps -lscotch -lscotcherr
#LSCOTCH    = -L$(SCOTCHDIR)/lib -lptesmumps -lptscotch -lptscotcherr

LPORDDIR = $(topdir)/PORD/lib/
IPORD    = -I$(topdir)/PORD/include/
LPORD    = -L$(LPORDDIR) -lpord

LMETISDIR = /home/mlin4/metis/build/Linux-x86_64/libmetis
IMETIS    = /home/mlin4/metis/include

# You have to choose one among the following two lines depending on
# the type of analysis you want to perform. If you want to perform only
# sequential analysis choose the first (remember to add -Dmetis in the ORDERINGSF
# variable below); for both parallel and sequential analysis choose the second 
# line (remember to add -Dparmetis in the ORDERINGSF variable below)

LMETIS    = -L$(LMETISDIR) -lmetis
#LMETIS    = -L$(LMETISDIR) -lparmetis -lmetis

# The following variables will be used in the compilation process.
# Please note that -Dptscotch and -Dparmetis imply -Dscotch and -Dmetis respectively.
# If you want to use Metis 4.X or an older version, you should use -Dmetis4 instead of -Dmetis
# or in addition with -Dparmetis (if you are using parmetis 3.X or older).
#ORDERINGSF = -Dscotch -Dmetis -Dpord -Dptscotch -Dparmetis
#ORDERINGSF  = -Dpord -Dmetis -Dparmetis
ORDERINGSF  = -Dpord -Dmetis
ORDERINGSC  = $(ORDERINGSF)

LORDERINGS = $(LMETIS) $(LPORD) $(LSCOTCH)
IORDERINGSF = $(ISCOTCH)
IORDERINGSC = $(IMETIS) $(IPORD) $(ISCOTCH)

#End orderings
########################################################################
################################################################################

PLAT    =
LIBEXT  = .a
OUTC    = -o 
OUTF    = -o 
RM = /bin/rm -f
CC = mpiicc
FC = mpiifort
FL = mpiifort
AR = ar vr 
#RANLIB = ranlib
RANLIB  = echo
# Make this variable point to the path where the Intel MKL library is
# installed. It is set to the default install directory for Intel MKL.
MKLROOT=/home/mlin4/opt/intel/mkl/lib/intel64
LAPACK = -L$(MKLROOT) -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core
SCALAP = -L$(MKLROOT) -lmkl_scalapack_lp64 -lmkl_blacs_intelmpi_lp64

LIBPAR = $(SCALAP) $(LAPACK)

INCSEQ = -I$(topdir)/libseq
LIBSEQ  = $(LAPACK) -L$(topdir)/libseq -lmpiseq

LIBBLAS = -L$(MKLROOT) -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core 
LIBOTHERS = -lpthread

#Preprocessor defs for calling Fortran from C (-DAdd_ or -DAdd__ or -DUPPER)
CDEFS   = -DAdd_

#Begin Optimized options
OPTF    = -O -nofor_main -DBLR_MT -qopenmp # or -openmp for old compilers
OPTL    = -O -nofor_main -qopenmp
OPTC    = -O -qopenmp
#End Optimized options
 
INCS = $(INCPAR)
LIBS = $(LIBPAR)
LIBSEQNEEDED =

 

 

Best,

Min

 


How can I download a previous version of the Intel MPI library?


Hi,

I hope to download Intel MPI Library version 4.1.0 for 32-bit applications on Windows, but I can find nothing on Intel's website. I don't know if there are any backups of that Intel MPI Library on Intel's website. I would appreciate it if someone could share some clues.

Please, I am waiting for that.

amplxe: Error: Ftrace is already in use.


Hi,
I am trying to run VTune Amplifier 2019 to collect system-overview data as follows:

export NPROCS=36
export OMP_NUM_THREADS=1
mpirun -genv OMP_NUM_THREADS $OMP_NUM_THREADS -np $NPROCS  amplxe-cl -collect system-overview  -result-dir /home/puneet/run_node02_impi2019_profiler_systemoverview/profiles/attempt1_p${NPROCS}_t${OMP_NUM_THREADS}  -quiet $INSTALL_ROOT/main/wrf.exe

I had collected hpc-performance data without any issue. Afterwards, I ran the aforementioned command but had to kill it (the result dir was incorrect). When I re-ran amplxe-cl, I got the following error messages:
 

amplxe: Error: Ftrace is already in use. Make sure to stop previous collection first.
amplxe: Error: Ftrace is already in use. Make sure to stop previous collection first.
amplxe: Error: Ftrace is already in use. Make sure to stop previous collection first.

I have tried deleting /home/puneet/run_node02_impi2019_profiler_systemoverview/profiles/* and I have also rebooted the node, but those error messages still show up.
Please advise.
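One thing that may be worth checking (a sketch only, based on the assumption that the killed run left a collection registered or the kernel ftrace interface claimed; run it on the node that reports the error):

# Try to stop any collection VTune still thinks is running (result dir taken from the command above)
amplxe-cl -command stop -result-dir /home/puneet/run_node02_impi2019_profiler_systemoverview/profiles/attempt1_p36_t1

# Inspect the kernel ftrace state directly; a tracer other than "nop" or tracing_on=1
# left over from the killed collection could explain the "already in use" message
cat /sys/kernel/debug/tracing/current_tracer
cat /sys/kernel/debug/tracing/tracing_on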
 

Bug report: mpicc illegally pre-pends my LIBRARY_PATH


When I try to link my application using the mpicc wrapper of Intel MPI 2018.4, it prepends several paths to my LIBRARY_PATH. I set this variable to use a custom library instead of the one installed on my system. However, since the path to the system's library is also prepended, my program is silently linked against the wrong library. There is nothing I can do about this except specify the library path explicitly via an -L option during linking, and I don't want to do that!

In my opinion, all wrapper scripts should only append their paths to these environment variables! The user-specified paths must always win!
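For what it's worth, here is a quick way to see what the wrapper actually runs and which copy of a shared library the resulting binary picks up (a sketch; myapp and mylib are placeholder names):

# Print the underlying compile/link command without executing it
mpicc -show myapp.c -o myapp

# After linking, check which copy of the shared library is actually resolved
ldd ./myapp | grep mylib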

Georg

 

I_MPI_AUTH_METHOD not working with IMPI 19 Update 3


Hi,

I moved from Intel MPI 2018 Update 4 to Intel MPI 2019 Update 3, and it seems that the new version ignores setting the user authorization method for mpirun via the environment variable I_MPI_AUTH_METHOD=delegate. Setting it directly with mpiexec -delegate still works fine. Can anyone confirm that?

https://software.intel.com/en-us/mpi-developer-reference-windows-user-au...
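For reference, these are the two invocations being compared (a sketch with a placeholder executable; both forms are described in the documentation linked above, and testing -genv as a third variant may also be informative):

:: Works: passing the method on the command line
mpiexec -delegate -n 2 myapp.exe

:: Appears to be ignored in 2019 Update 3: setting it via the environment
set I_MPI_AUTH_METHOD=delegate
mpiexec -n 2 myapp.exe

:: Third variant worth testing: passing the variable through mpiexec itself
mpiexec -genv I_MPI_AUTH_METHOD delegate -n 2 myapp.exe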

Thanks and kind regards,

Volker Jacht

Wrong MPICH_NUMVERSION in mpi.h on Windows?


Hi,

We've recently moved to Intel MPI 2019 Update 3 and noticed that the MPICH_VERSION macro in mpi.h is wrong/confusing on Windows and is also different from the one on Linux:

2019.3.203/intel64/include/mpi.h:504 (windows)
#define MPICH_VERSION "0.0"
#define MPICH_NUMVERSION 300

on Linux:

#define MPICH_VERSION "3.3b3"
#define MPICH_NUMVERSION 30300103

The latter case is much more reasonable and works fine, while the first case seems to be broken.

In the end, that 3-digit version number makes PETSc assume it is using an old MPICH version; it therefore compares "0.0" with the output of MPI_Get_library_version(), which eventually fails on Windows.
https://bitbucket.org/petsc/petsc/src/f03f29e6b9f50a9f9419f7d348de13f7c6...

Can you please fix the version number on Windows?

Thank you and kind regards,

Volker Jacht

Cannot find impi.dll on Windows


Hello again,

I have Intel MPI 2019 Update 3 SDK installed on Windows and set up a "hello world" example like this:
https://software.intel.com/en-us/mpi-developer-guide-windows-configuring...

But when I try to start my program with mpiexec.exe, it cannot find impi.dll.

After some research I noticed that in prior versions impi.dll was located in the same folder as mpiexec (C:\Program Files (x86)\IntelSWTools\mpi\2019.3.203\intel64\bin) and it worked right out of the box, but in the current version it seems to be missing.

I have also noticed that all the library symlinks in C:\Program Files (x86)\IntelSWTools\mpi\2019.3.203\intel64\lib\ are missing, too.
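Two quick checks that may help narrow this down (a sketch; the paths assume the default 2019.3.203 install location quoted above, and mpivars.bat is assumed to be present under intel64\bin):

:: Load the Intel MPI environment for the current shell, then see what it put on PATH
"C:\Program Files (x86)\IntelSWTools\mpi\2019.3.203\intel64\bin\mpivars.bat"
where impi.dll

:: Search the whole install tree to see where (or whether) the DLL was installed
dir /s /b "C:\Program Files (x86)\IntelSWTools\mpi\2019.3.203\impi.dll"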

Thanks for your help and kind regards,

Volker Jacht

I_MPI_STARTUP_MODE documentation is missing


When trying the new release of the Intel(R) MPI Library for Linux* OS, Version 2019 Update 3 Build 20190214, I found that the environment variable I_MPI_WAIT_MODE is not supported anymore. The run-time diagnostics suggest the new variable I_MPI_STARTUP_MODE as a substitute. However, I could not find any documentation on the new variable beyond the mention in the release notes that it was introduced. This is my notice to the people in charge of the Intel MPI documentation.

While waiting for the documentation update, could anybody shed some light on the possible values for the new variable I_MPI_STARTUP_MODE?

 


Seg fault in Fortran MPI_COMM_CREATE_GROUP, works with Open MPI and MPICH


I'm getting a segmentation fault that I cannot really understand in a simple code that just:

  • calls the MPI_INIT
  • duplicates the global communicator, via MPI_COMM_DUP
  • creates a group with half of processes of the global communicator, via MPI_COMM_GROUP
  • finally from this group creates a new communicator via MPI_COMM_CREATE_GROUP

Specifically I use this last call, instead of just using MPI_COMM_CREATE, because it's only collective over the group of processes contained in group, while MPI_COMM_CREATE is collective over every process in COMM. The code is attached.

If instead of duplicating the COMM_WORLD, I directly create the group from the global communicator (commented line), everything works just fine.

The parallel debugger I'm using traces the seg fault back to a call to MPI_GROUP_TRANSLATE_RANKS, but, as far as I know, MPI_COMM_DUP duplicates all the attributes of the copied communicator, rank numbering included.

I am using ifort version 18.0.5, but I also tried 17.0.4 and 19.0.2 with no better results.
On the contrary, using Open MPI and MPICH 3.3 this program works just fine.

Attachment: mpi_comm_create_group.F90 (application/octet-stream, 1021 bytes)

bug: mpiexec segmentation fault


Hello,

Starting from Parallel Studio 2019 Update 1, mpiexec fails to run any executable. Example: "mpiexec -np 1 /bin/ls". Any call to mpiexec (except calls like "mpiexec -help") results in a segmentation fault.

Please help. I can provide additional information if necessary. However testing is a bit complicated because I had to revert to the Initial release, and Updates cannot be installed concurrently AFAIK, so please request testing only if absolutely necessary.

 

Note 1: It is on Linux Mint 19. As you may know, this distribution is heavily based on Ubuntu 18.04. By "heavily" I mean that only cosmetic packages differ, like the desktop environment packages. System packages (libc and the like) are taken directly from the Ubuntu repositories.

Note 2: This problem was originally reported in the C++ compiler forum, here. It was spotted on openSUSE (which shares most code with SLES, a distribution completely independent of Ubuntu).

MPI spawn placement of processes


Hi

I am trying to spawn processes across nodes using Intel MPI with the following code:

testmanager.py:

 

from mpi4py import MPI
import mpi4py
import sys
import argparse
import os
import distutils.spawn

def check_mpi():
    mpiexec_path, _ = os.path.split(distutils.spawn.find_executable("mpiexec"))
    for executable, path in mpi4py.get_config().items():
        if executable not in ['mpicc', 'mpicxx', 'mpif77', 'mpif90', 'mpifort']:
            continue
        if mpiexec_path not in path:
            raise ImportError("mpi4py may not be configured against the same version of 'mpiexec' that you are using. The 'mpiexec' path is {mpiexec_path} and mpi4py.get_config() returns:\n{mpi4py_config}\n".format(mpiexec_path=mpiexec_path, mpi4py_config=mpi4py.get_config()))
        # if 'Open MPI' not in MPI.get_vendor():
        #     raise ImportError("mpi4py must have been installed against Open MPI in order for StructOpt to function correctly.")
        vendor_number = ".".join([str(x) for x in MPI.get_vendor()[1]])
        if vendor_number not in mpiexec_path:
            print(MPI.get_vendor(), mpiexec_path)
        print(MPI.get_vendor(), mpiexec_path)
        # raise ImportError("The MPI version that mpi4py was compiled against does not match the version of 'mpiexec'. mpi4py's version number is {}, and mpiexec's path is {}".format(MPI.get_vendor(), mpiexec_path))



def main():
#    parser = argparse.ArgumentParser()
#    parser.add_argument('worker_count', type=int)
    worker_count = 20
#    args = parser.parse_args()
    check_mpi()
    mpi_info = MPI.Info.Create()
    mpi_info.Set("add-hostfile", "slurm.hosts")
    mpi_info.Set("host", "slurm.hosts")

    #print("about to spawn")
    comm = MPI.COMM_SELF.Spawn(sys.executable,
                               args=['testworker.py'], maxprocs=worker_count,
                               info=mpi_info).Merge()
    process_rank = comm.Get_rank()
    process_count = comm.Get_size()
    process_host = MPI.Get_processor_name()
    print('manager',process_rank, process_count, process_host)

main()

testworker.py:

from mpi4py import MPI

def main():
    print("Spawned")
    comm = MPI.Comm.Get_parent().Merge()

    process_rank = comm.Get_rank()
    process_count = comm.Get_size()
    process_host = MPI.Get_processor_name()

    print('worker', process_rank,process_count,process_host)

main()

 

I would like to know how to distribute the spawned processes. When I run the job as:

mpirun -hostfile slurm.hosts -np 1 python3 ./testmanager.py
 

with, for example, the following slurm.hosts:

 

node-105:16
node-114:16
node-127:16

I end up with the manager running as a single process on node-105 and the workers running on the other nodes. If I increase the number of workers beyond the total number of slots on the non-manager nodes, the job hangs. I want to be able to run on all available slots on the three nodes.
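One way to see where Hydra actually places the manager and the spawned workers is to raise the debug level (a sketch; I_MPI_DEBUG and -machinefile are standard Intel MPI options, and the info keys accepted by Spawn can differ between MPI implementations):

# Print the rank-to-node mapping for the initial job and the spawned workers
mpirun -genv I_MPI_DEBUG 5 -machinefile slurm.hosts -np 1 python3 ./testmanager.py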

Thanks!

 

Quantum Espresso job dies silently


Hi,

 

Summary

While testing scalability of the Quantum Espresso HPC software package, I stumbled on a very strange and annoying problem: when the jobs are run on too many cores, they undergo "sudden death" at some point. "Sudden death" means the job stops with no error message at all, and no core dump. "Too many" and "some point" mean: if the job is run with parallelization parameters above a given limit, it will stop during a given cycle; the higher the parameters, the sooner it will stop. The -np option is the most influential parameter.

Details

I'm compiling and running the PWscf software from Quantum Espresso 6.2.1. The server is one NUMA node with 4 sockets equipped with "Intel(R) Xeon(R) Platinum 8168 CPU @ 2.70GHz" and, initially, 1 DIMM of 16 GB on channel 0 of each socket.

Initially I used Parallel Studio 2019 Initial Release to compile and run, with only Intel MPI parallelization (no threading). Years ago we added these variables to the starting script: OMPI_MCA_mpi_yield_when_idle=1 OMPI_MCA_mpi_paffinity_alone=1 OMP_NUM_THREADS=1 (I guess they're irrelevant, but they're here).

To ensure isolation of the PWscf tasks from other software running on the node, I use the "cset" package tools (Linux Mint 19, clone of Ubuntu 18.04). Sockets 1-3 are devoted to PWscf, socket 0 is devoted to OS and other software. All tests are done with a multiple-of-3 number N of tasks, globally tied to N cores evenly distributed on sockets 1-3 (I don't do anything special to bind each task to a specific core).

I have a reproducible test case: repetition of a few examples showed that the running time is precise to approx. ±10s (it ranges from approx. 1h30 to 3h depending on N) for jobs that complete. For failing jobs, the failure always occurs in the same cycle, ±1, for given values of the -np mpirun option and of the -ndiag PWscf option.

Test results

First series: all jobs with -np values ≥51 fail, irrespective of -ndiag; they fail in the 16th or 17th cycle, irrespective of -ndiag.

We then upgraded the node to 3 DIMMs of 16, 8, and 8 GB on channels 0, 1, and 2 of each socket.

Second series: timings are better (confirming our hypothesis of bandwidth limiting performance, and of our provider misconfiguring the node by populating only 1 DIMM per socket). Jobs complete with -np values up to 54 or 57, depending on the -ndiag value. Failing jobs fail sooner when -np is higher. E.g. -np 57 leads to failure in the 59th cycle, -np 72 to failure in the 44th cycle (for a given -ndiag).

Thus it seems like the problem has to do with the amount of memory.

I then thought I'd try with an updated Parallel Studio, but stumbled on this bug I reported. With the workaround suggested there (I_MPI_HYDRA_TOPOLIB=ipl), I ran a third series, PWscf compiled with PS 2019 update 3: things are worse, the program outputs nothing at all, although the tasks use 100% CPU!

Fourth series: PWscf compiled with PS 2019 update 1: with I_MPI_HYDRA_TOPOLIB=ipl the program runs as usual, but the sudden deaths are still here and now they look quite unpredictable (yet still reproducible): -np 33 fails but -np 57 (or 60) completes (for a given -ndiag). I haven't done extensive tests for all -np values, but I'm not very inclined to do so since this series seems to have worse outcomes.
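Since a silent stop with no error message and no core dump, at a point that depends on memory, looks a lot like the kernel OOM killer, it may be worth checking the system logs right after a failed run (a generic sketch, nothing Quantum Espresso specific):

# Look for out-of-memory kills around the time the job died
dmesg -T | grep -i -E "out of memory|oom-killer|killed process"

# Check the resource limits in the environment the job actually runs in
ulimit -a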

 

Please advise on what to do next. Thanks.

Intel MPI library


Dear all,
I can't handle the following issue. I was linking GAMESS with Intel MPI and received the following error message:
ld: cannot find -lmpigf
ld: cannot find -lmpigi
Please help me solve this problem.

Before that, I had installed Intel Parallel Studio.
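-lmpigf and -lmpigi are library names used by older Intel MPI releases, so if the installed Parallel Studio is newer, the GAMESS link line may simply be pointing at libraries that no longer exist. A quick check (a sketch, assuming mpivars.sh has been sourced so that I_MPI_ROOT is set):

# List the MPI libraries this installation actually provides
ls $I_MPI_ROOT/intel64/lib | grep -i mpi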

Default switch-over limits for Intel MPI Library 2019


Hi,
The Intel MPI Library 2019 uses the PSM 2.x interface for the Omni-Path fabric and the PSM 1.x interface for the QDR fabric. For very small message sizes, is there a switch-over in the MPI implementation for the Omni-Path network but not for the InfiniBand interconnect? Further, what are the default eager limits for intra-node and inter-node communication in Intel MPI 2019, and which control variables tune these values? Thanks
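As a starting point for experimentation only (a sketch: these threshold variables are the ones documented for earlier Intel MPI releases, and whether Intel MPI 2019 still honors them is part of what is being asked here; 262144 is just an example value):

export I_MPI_EAGER_THRESHOLD=262144
export I_MPI_INTRANODE_EAGER_THRESHOLD=262144
mpirun -genv I_MPI_DEBUG 5 -np 2 ./a.out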

Mpiexec issue: order of machines


Hi all,

I am having an issue with mpiexec. I have a bundled install with Fire Dynamics Simulator (FDS) and I am attempting to run a simple hello world program that is bundled with FDS, called test_mpi. Link: https://github.com/firemodels/fds/blob/master/Utilities/test_mpi/test_mp...

The issue is that if I run:

 

'mpiexec -hosts 2 non-local-machine 1 local-machine 1 test_mpi'

I get the hello world with the rank. However, if I swap the order so that the local-machine is first, I only get a reply from the localhost machine, with the non-local-machine never replying.

Should this be an expected result or is there an issue somewhere?
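It may help to first confirm that each host can be reached on its own and to capture startup detail for the failing order (a sketch reusing the same -hosts syntax; hostname stands in for any trivial executable):

mpiexec -hosts 1 non-local-machine 1 hostname
mpiexec -hosts 1 local-machine 1 hostname
mpiexec -genv I_MPI_DEBUG 5 -hosts 2 local-machine 1 non-local-machine 1 test_mpi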


Wrong limit on disp parameter in ILP64 version of MPI_File_set_view?

$
0
0

Hi,

Both with some older (17.0.2) and the newest (19.0.3) Intel Fortran Compiler + Intel MPI I experience problems with the function MPI_File_set_view when using the 64-bit integer Fortran interface of Intel MPI (aka ILP64, using compiler switch -i8).

Whenever I set the write offset argument "disp" of this function outside of the int32 range, the call fails with the error code 201389836. However, the argument is of kind MPI_OFFSET_KIND, so it is supposed to support large values without any problems.

Curiously, this failure happens only with the ILP64 version, not with the LP64 version, which is the other way round from what one would expect. It is as if there were an erroneous range check somewhere in the ILP64 interface before calling the underlying LP64 implementation, which actually supports large offsets.

Below is an example program that demonstrates the issue. The program writes an exactly 2-GiB integer array to a file, starting at offset 0. Then it attempts to position the next writing view at the end of the just-written chunk and write one extra integer. It works well with Open MPI 4.0.0 ILP64 and Intel MPI 17.0.2/19.0.3 LP64 (prints 0) but fails with Intel MPI 17.0.2/19.0.3 ILP64 (prints 201389836). NB: This sample program is intended to be executed as a single process only.

Did I hit a bug in Intel MPI?

 

program mpi_io_offset

    use iso_fortran_env, only: int32
    use mpi

    implicit none

    integer(int32), parameter :: mpiint = kind(MPI_COMM_WORLD)
    integer(int32), parameter :: mpiofs = MPI_OFFSET_KIND

    integer(mpiint) :: ierr, fh, stat(MPI_STATUS_SIZE), one = 1, num = 2**29
    integer(mpiofs) :: zero = 0, two_GiB_bytes

    integer(int32)              :: four_B_int = -1
    integer(int32), allocatable :: two_GiB_array(:)

    allocate (two_GiB_array(num))
    two_GiB_array(:) = 0
    two_GiB_bytes = num * 4_mpiofs

    call MPI_Init(ierr)
    call MPI_File_open(MPI_COMM_WORLD, 'file.bin', MPI_MODE_CREATE + MPI_MODE_WRONLY, MPI_INFO_NULL, fh, ierr)
    call MPI_File_set_size(fh, zero, ierr)
    call MPI_File_set_view(fh, zero, MPI_INTEGER4, MPI_INTEGER4, 'native', MPI_INFO_NULL, ierr)
    call MPI_File_write_all(fh, two_GiB_array, num, MPI_INTEGER4, stat, ierr)
    call MPI_File_set_view(fh, two_GiB_bytes, MPI_INTEGER4, MPI_INTEGER4, 'native', MPI_INFO_NULL, ierr)

    print *, ierr

    call MPI_File_write_all(fh, four_B_int, one, MPI_INTEGER4, stat, ierr)
    call MPI_File_close(fh, ierr)
    call MPI_Finalize(ierr)

end program mpi_io_offset
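For completeness, this is roughly how the two variants are built and run (a sketch; the source file name is a placeholder, -i8 selects 64-bit default integers, and -ilp64 is, as I understand it, the wrapper option that links the ILP64 interface):

# ILP64 build (fails at the second MPI_File_set_view with error code 201389836)
mpiifort -i8 -ilp64 mpi_io_offset.f90 -o mpi_io_offset_ilp64
mpirun -np 1 ./mpi_io_offset_ilp64

# LP64 build (prints 0 as expected)
mpiifort mpi_io_offset.f90 -o mpi_io_offset_lp64
mpirun -np 1 ./mpi_io_offset_lp64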

 

ITAC Tool Not Tracing ISend and IRecv MPI Messages


Hi, I am using Intel ITAC 2019.3.032 and testing the NAS Parallel Benchmarks with MPI. It appears that the ITAC tool is not collecting message logs for non-blocking send/receive (MPI_Isend/MPI_Irecv). The message profile is simply blank. When I run programs which use blocking send/receive, the message profile is created with data. When I run the program, the following warning is generated:

 [0] Intel(R) Trace Collector WARNING: message logging: Intel(R) Trace Collector could not find pairs for 19272 (50.0%) sends and 19272 (50.0%) receives

Is there a way to get this to log the non-blocking sends/receives? 

Furthermore, is there a way to log all messages with timestamps, sizes, etc., instead of just getting a summary of the total count and max/min/avg?

Thank you!

Intel MPI 2018.4 error


Hi 

    Is there any way of diagnosing what might be causing the following error? 

PANIC in ../../src/mpid/ch3/channels/nemesis/netmod/ofa/cm/dapl/common/dapl_evd_cq_async_error_callb.c:71:dapl_evd_cq_async_error_callback
NULL == context
 

 Intel MPI 2018.4 run using release_mt version of libmpi.so

 I_MPI_FABRICS=shm:ofa

 Running with MPI_THREAD_MULTIPLE on Centos 7.2 with mlx_5 hardware
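To narrow it down, it may help to capture more startup detail and to see whether the panic is tied to the ofa path specifically (a sketch; I_MPI_DEBUG, I_MPI_HYDRA_DEBUG, and I_MPI_FABRICS are standard Intel MPI 2018 controls, and shm:dapl is simply an alternative fabric to compare against on the same hardware):

# More verbose startup and fabric-selection output
export I_MPI_DEBUG=6
export I_MPI_HYDRA_DEBUG=1

# Compare against the DAPL fabric instead of OFA
export I_MPI_FABRICS=shm:dapl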

Thanks

 Jamil

 

Internal MPI Errors


Hi all,

I'm using Intel MPI Library 2017 Update 1 (v.2017.1.143) for Windows on Windows Server 2012 R2 Standard 64-bit nodes. I'm using two identical nodes, each with the following specs:

CPU:  Intel Xeon CPU E5-2450 v2 @ 2.50GHz

RAM: DDR3 49086 MBytes Triple Channels (800 Mhz.)

GPU: NVIDIA Tesla K40c (driver version 24.21.14.1229)

Network: Mellanox ConnectX-3 Pro Ethernet Adapter (2)

  • Driver: Mellanox Infiniband 40Gbit ConnectX 3 Pro HBA driver, Version 5.10 (MLNX_VPI_WinOF-5_10_All_win2012R2_x64.exe)

I'm using the fabrics dapl:dapl or shm:tcp. The DAPL version is "DAPL-ND - DAPL NetworkDirect Stand Alone installer v1.4.5    [06-02-2016]"

When using shm:tcp, I'm getting "read from socket error". Here is the full trace:

mpiexec -l -genv I_MPI_FABRICS shm:tcp -genv I_MPI_PIN_DOMAIN=omp -genv I_MPI_WAIT_MODE=1 -genv I_MPI_DEBUG=1000 -n 1 -host 10.0.0.1 ../ped/Release/Cape.exe master --do stitching --dted --exposure --config-path ..\input\cape\cape_fl_veo_50_dual.xml --mqtt-ip 10.4.1.121 --mqtt-port 1883 --mqtt-id CAPE50 --dds-id 121 -i 10.4.1.* --gcp-db-ip 10.4.1.122 --group-name EO50 : -n 1 -host 10.0.0.1 ../ped/Release/Cape.exe slave --do stabilization --group-name EO50 : -n 1 -host 10.0.0.1 ../ped/Release/Cape.exe slave --do bgsubtraction --group-name EO50 : -n 1 -host 10.0.0.1 ../ped/Release/Cape.exe slave --do tracking --group-name EO50 : -n 1 -host 10.0.0.2 ../ped/Release/Cape.exe slave -d -o dds file --group-name EO50 : -n 1 -host 10.0.0.1 ../ped/Release/Cape.exe master --do stitching --exposure --config-path ..\input\cape\cape_fl_veo_100_dual.xml --mqtt-ip 10.4.1.121 --mqtt-port 1883 --mqtt-id CAPE100 --dds-id 121 -i 10.4.1.* --gcp-db-ip 10.4.1.122 --group-name EO100 : -n 1 -host 10.0.0.1 ../ped/Release/Cape.exe slave --do stabilization --group-name EO100 : -n 1 -host 10.0.0.1 ../ped/Release/Cape.exe slave --do bgsubtraction --group-name EO100 : -n 1 -host 10.0.0.1 ../ped/Release/Cape.exe slave --do tracking --group-name EO100 : -n 1 -host 10.0.0.2 ../ped/Release/Cape.exe slave -d -o dds file --group-name EO100 : -n 1 -host 10.0.0.1 ../ped/Release/CameraController.exe --config-path ..\input\cc\cc_veo.xml --mqtt-ip 10.4.1.121 --mqtt-port 1883 --mqtt-id CC : -n 1 -host 10.0.0.1 ../ped/Release/CameraGroupProxy.exe --config-path ..\input\cgp\configuration.xml --mqtt-ip 10.4.1.121 --mqtt-port 1883 --mqtt-id CGP : -n 1 -host 10.0.0.2 ../ped/Release/GroupMetadataSynchronizer.exe --config-path ..\input\gms\configuration.xml --mqtt-ip 10.4.1.121 --mqtt-port 1883 --mqtt-id GMS --dds-reader-id 121 --dds-reader-allow-interface 10.4.1.* --dds-writer-id 122 --dds-writer-allow-interface 10.4.1.*
[11] WARNING: Logging before InitGoogleLogging() is written to STDERR
[11] I0430 19:24:42.494819 13396 ArgParser.cpp:66] CGP MQTT IP set to 10.4.1.121
[11] I0430 19:24:42.494819 13396 ArgParser.cpp:73] CGP MQTT Port set to 1883
[11] I0430 19:24:42.494819 13396 ArgParser.cpp:80] CGP MQTT ID set to CGP
[10] WARNING: Logging before InitGoogleLogging() is written to STDERR
[10] I0430 19:24:42.501842 3068 ArgParser.cpp:93] CC Config path set to ..\input\cc\cc_veo.xml
[10] I0430 19:24:42.504853 3068 ArgParser.cpp:181] CC MQTT IP set to 10.4.1.121
[10] I0430 19:24:42.504853 3068 ArgParser.cpp:188] CC MQTT Port set to 1883
[10] I0430 19:24:42.504853 3068 ArgParser.cpp:195] CC MQTT ID set to CC
[10] I0430 19:24:42.504853 3068 Executor.cpp:53] initMpi
[2] WARNING: Logging before InitGoogleLogging() is written to STDERR
[3] WARNING: Logging before InitGoogleLogging() is written to STDERR
[3] W0430 19:24:42.506860 9924 ArgParser.cpp:81] No end process provided. Setting cape to execute only start process: tracking
[3] I0430 19:24:42.506860 9924 ArgParser.cpp:406] Group name set to EO50
[6] WARNING: Logging before InitGoogleLogging() is written to STDERR
[6] W0430 19:24:42.506860 5404 ArgParser.cpp:81] No end process provided. Setting cape to execute only start process: stabilization
[6] I0430 19:24:42.506860 5404 ArgParser.cpp:406] Group name set to EO100
[6] I0430 19:24:42.506860 5404 Executor.cpp:53] initMpi
[2] W0430 19:24:42.506860 16752 ArgParser.cpp:81] No end process provided. Setting cape to execute only start process: bgsubtraction
[2] I0430 19:24:42.506860 16752 ArgParser.cpp:406] Group name set to EO50
[2] I0430 19:24:42.506860 16752 Executor.cpp:53] initMpi
[3] I0430 19:24:42.506860 9924 Executor.cpp:53] initMpi
[5] WARNING: Logging before InitGoogleLogging() is written to STDERR
[8] WARNING: Logging before InitGoogleLogging() is written to STDERR
[8] W0430 19:24:42.506860 11428 ArgParser.cpp:81] No end process provided. Setting cape to execute only start process: tracking
[8] I0430 19:24:42.506860 11428 ArgParser.cpp:406] Group name set to EO100
[8] I0430 19:24:42.506860 11428 Executor.cpp:53] initMpi
[1] WARNING: Logging before InitGoogleLogging() is written to STDERR
[1] W0430 19:24:42.506860 14572 ArgParser.cpp:81] No end process provided. Setting cape to execute only start process: stabilization
[1] I0430 19:24:42.506860 14572 ArgParser.cpp:406] Group name set to EO50
[5] W0430 19:24:42.506860 16108 ArgParser.cpp:81] No end process provided. Setting cape to execute only start process: stitching
[1] I0430 19:24:42.506860 14572 Executor.cpp:53] initMpi
[5] I0430 19:24:42.506860 16108 ArgParser.cpp:356] Config path is set to ..\input\cape\cape_fl_veo_100_dual.xml
[5] I0430 19:24:42.506860 16108 ArgParser.cpp:363] MQTT IP is set to 10.4.1.121
[5] I0430 19:24:42.506860 16108 ArgParser.cpp:370] MQTT Port is set to 1883
[5] I0430 19:24:42.506860 16108 ArgParser.cpp:377] MQTT ID is set to CAPE100
[5] I0430 19:24:42.506860 16108 ArgParser.cpp:384] DDS ID is set to 121
[5] I0430 19:24:42.506860 16108 ArgParser.cpp:391] DDS Allow Interface is set to 10.4.1.*
[5] I0430 19:24:42.506860 16108 ArgParser.cpp:398] GCP DB IP set to 10.4.1.122
[5] I0430 19:24:42.506860 16108 ArgParser.cpp:406] Group name set to EO100
[5] I0430 19:24:42.506860 16108 ArgParser.cpp:184] Exposure will be executed along with provided processes
[7] WARNING: Logging before InitGoogleLogging() is written to STDERR
[7] W0430 19:24:42.506860 1656 ArgParser.cpp:81] No end process provided. Setting cape to execute only start process: bgsubtraction
[7] I0430 19:24:42.506860 1656 ArgParser.cpp:406] Group name set to EO100
[5] I0430 19:24:42.506860 16108 Executor.cpp:53] initMpi
[7] I0430 19:24:42.506860 1656 Executor.cpp:53] initMpi
[0] WARNING: Logging before InitGoogleLogging() is written to STDERR
[0] W0430 19:24:42.506860 14212 ArgParser.cpp:81] No end process provided. Setting cape to execute only start process: stitching
[0] I0430 19:24:42.507864 14212 ArgParser.cpp:356] Config path is set to ..\input\cape\cape_fl_veo_50_dual.xml
[0] I0430 19:24:42.507864 14212 ArgParser.cpp:363] MQTT IP is set to 10.4.1.121
[0] I0430 19:24:42.507864 14212 ArgParser.cpp:370] MQTT Port is set to 1883
[0] I0430 19:24:42.507864 14212 ArgParser.cpp:377] MQTT ID is set to CAPE50
[0] I0430 19:24:42.507864 14212 ArgParser.cpp:384] DDS ID is set to 121
[0] I0430 19:24:42.507864 14212 ArgParser.cpp:391] DDS Allow Interface is set to 10.4.1.*
[0] I0430 19:24:42.507864 14212 ArgParser.cpp:398] GCP DB IP set to 10.4.1.122
[0] I0430 19:24:42.507864 14212 ArgParser.cpp:406] Group name set to EO50
[0] I0430 19:24:42.507864 14212 ArgParser.cpp:184] Exposure will be executed along with provided processes
[0] I0430 19:24:42.507864 14212 ArgParser.cpp:240] CAPE will attempt to calculate elevation matrix from dted file if any processes require it.
[0] I0430 19:24:42.507864 14212 Executor.cpp:53] initMpi
[12] WARNING: Logging before InitGoogleLogging() is written to STDERR
[12] I0430 19:24:42.523279 261596 ArgParser.cpp:56] GMS MQTT IP set to 10.4.1.121
[12] I0430 19:24:42.523279 261596 ArgParser.cpp:63] GMS MQTT Port set to 1883
[12] I0430 19:24:42.523279 261596 ArgParser.cpp:70] GMS MQTT ID set to GMS
[12] I0430 19:24:42.523279 261596 ArgParser.cpp:77] GMS DDS Reader Domain ID set to 121
[12] I0430 19:24:42.523279 261596 ArgParser.cpp:84] GMS DDS Writer Domain ID set to 122
[12] I0430 19:24:42.523279 261596 ArgParser.cpp:91] GMS DDS Allow Interface for Reader set to 10.4.1.*
[12] I0430 19:24:42.523279 261596 ArgParser.cpp:98] GMS DDS Allow Interface for Writer set to 10.4.1.*
[4] WARNING: Logging before InitGoogleLogging() is written to STDERR
[4] I0430 19:24:42.537328 260632 ArgParser.cpp:133] Dissemination Medium Type is set as: 'DDS+FILE'
[4] I0430 19:24:42.537328 260632 ArgParser.cpp:406] Group name set to EO50
[4] I0430 19:24:42.537328 260632 Executor.cpp:53] initMpi
[9] WARNING: Logging before InitGoogleLogging() is written to STDERR
[9] I0430 19:24:42.538331 259540 ArgParser.cpp:133] Dissemination Medium Type is set as: 'DDS+FILE'
[9] I0430 19:24:42.538331 259540 ArgParser.cpp:406] Group name set to EO100
[9] I0430 19:24:42.538331 259540 Executor.cpp:53] initMpi
[11] I0430 19:24:43.496381 13396 Manager.cpp:150] Application Mode : INITIALIZING published!
[11] I0430 19:24:43.496381 13396 Executor.cpp:53] initMpi
[12] I0430 19:24:43.524725 261596 GMSApplication.cpp:134] DDS Initialization ...
[12] I0430 19:24:44.087589 261596 GMSApplication.cpp:144] DDS Reader initialized ! [121 - 10.4.1.*][12]
[12] I0430 19:24:44.087589 261596 GMSApplication.cpp:145] DDS Writer initialized ! [122 - 10.4.1.*]
[12] I0430 19:24:44.087589 261596 GMSApplication.cpp:87] MPI Initialization ...
[12] I0430 19:24:44.087589 261596 Executor.cpp:53] initMpi
[0] [0] MPI startup(): Intel(R) MPI Library, Version 2017 Update 1 Build 20161016[0]
[0] [0] MPI startup(): Copyright (C) 2003-2016 Intel Corporation. All rights reserved.
[0] [0] MPI startup(): Multi-threaded optimized library
[12] [12] MPI startup(): shm and tcp data transfer modes
[4] [4] MPI startup(): shm and tcp data transfer modes[4]
[9] [9] MPI startup(): shm and tcp data transfer modes[2] [2] MPI startup(): shm and tcp data transfer modes[2]
[9]
[3] [3] MPI startup(): shm and tcp data transfer modes
[1] [1] MPI startup(): shm and tcp data transfer modes
[0] [0] MPI startup(): shm and tcp data transfer modes
[11] [11] MPI startup(): shm and tcp data transfer modes
[5] [5] MPI startup(): shm and tcp data transfer modes
[7] [7] MPI startup(): shm and tcp data transfer modes
[6] [6] MPI startup(): shm and tcp data transfer modes
[10] [10] MPI startup(): shm and tcp data transfer modes
[8] [8] MPI startup(): shm and tcp data transfer modes
[10] Fatal error in PMPI_Init_thread: Other MPI error, error stack:
[10] MPIR_Init_thread(805)......................: fail failed
[10] MPID_Init(1783)............................: channel initialization failed
[10] MPIDI_CH3_Init(147)........................: fail failed
[10] MPID_nem_tcp_post_init(351)................: fail failed
[10] MPID_nem_newtcp_module_connpoll(3116)......: fail failed
[10] recv_id_or_tmpvc_info_success_handler(1336): read from socket failed - No error

When using dapl:dapl, I'm getting an "MPIR_Init_thread" error. Here is the full trace:

mpiexec -l -genv I_MPI_FABRICS dapl:dapl -genv I_MPI_PIN_DOMAIN=omp -genv I_MPI_WAIT_MODE=1           -n 1 -host 10.0.0.1 ../Cape-3.5.0-78-SNAPSHOT-windows-amd64-vc14/bin/Cape.exe master --do stitching --dted --exposure --config-path ..\input\cape\cape_fl_veo_50_dual.xml  --mqtt-ip 10.4.1.121  --mqtt-port 1883  --mqtt-id CAPE50  --dds-id 121  -i 10.4.1.*  --gcp-db-ip 10.4.1.122 --group-name EO50          : -n 1 -host 10.0.0.1 ../Cape-3.5.0-78-SNAPSHOT-windows-amd64-vc14/bin/Cape.exe slave --do stabilization --group-name EO50          : -n 1 -host 10.0.0.1 ../Cape-3.5.0-78-SNAPSHOT-windows-amd64-vc14/bin/Cape.exe slave --do bgsubtraction --group-name EO50          : -n 1 -host 10.0.0.1 ../Cape-3.5.0-78-SNAPSHOT-windows-amd64-vc14/bin/Cape.exe slave --do tracking --group-name EO50          : -n 1 -host 10.0.0.2 ../Cape-3.5.0-78-SNAPSHOT-windows-amd64-vc14/bin/Cape.exe slave -d -o dds file --group-name EO50          : -n 1 -host 10.0.0.1 ../Cape-3.5.0-78-SNAPSHOT-windows-amd64-vc14/bin/Cape.exe master --do stitching --exposure --config-path ..\input\cape\cape_fl_veo_100_dual.xml  --mqtt-ip 10.4.1.121  --mqtt-port 1883  --mqtt-id CAPE100  --dds-id 121  -i 10.4.1.*  --gcp-db-ip 10.4.1.122 --group-name EO100          : -n 1 -host 10.0.0.1 ../Cape-3.5.0-78-SNAPSHOT-windows-amd64-vc14/bin/Cape.exe slave --do stabilization --group-name EO100          : -n 1 -host 10.0.0.1 ../Cape-3.5.0-78-SNAPSHOT-windows-amd64-vc14/bin/Cape.exe slave --do bgsubtraction --group-name EO100          : -n 1 -host 10.0.0.1 ../Cape-3.5.0-78-SNAPSHOT-windows-amd64-vc14/bin/Cape.exe slave --do tracking --group-name EO100          : -n 1 -host 10.0.0.2 ../Cape-3.5.0-78-SNAPSHOT-windows-amd64-vc14/bin/Cape.exe slave -d -o dds file --group-name EO100          : -n 1 -host 10.0.0.1 ../CameraController-2.3.1-48-SNAPSHOT-windows-amd64-vc14/bin/CameraController.exe --config-path ..\input\cc\cc_veo.xml  --mqtt-ip 10.4.1.121  --mqtt-port 1883  --mqtt-id CC          : -n 1 -host 10.0.0.1 ../CameraGroupProxy-0.0.2-41-SNAPSHOT-windows-amd64-vc14/bin/CameraGroupProxy.exe --config-path ..\input\cgp\configuration.xml  --mqtt-ip 10.4.1.121  --mqtt-port 1883  --mqtt-id CGP          : -n 1 -host 10.0.0.2 ../GroupMetadataSynchronizer-0.0.2-42-SNAPSHOT-windows-amd64-vc14/bin/GroupMetadataSynchronizer.exe --config-path ..\input\gms\configuration.xml  --mqtt-ip 10.4.1.121  --mqtt-port 1883  --mqtt-id GMS  --dds-reader-id 121  --dds-reader-allow-interface 10.4.1.*  --dds-writer-id 122  --dds-writer-allow-interface 10.4.1.*
[11] WARNING: Logging before InitGoogleLogging() is written to STDERR
[11] I0503 09:44:51.214726  8132 ArgParser.cpp:66] CGP MQTT IP set to 10.4.1.121
[11] I0503 09:44:51.215728  8132 ArgParser.cpp:73] CGP MQTT Port set to 1883
[11] I0503 09:44:51.215728  8132 ArgParser.cpp:80] CGP MQTT ID set to CGP
[12] WARNING: Logging before InitGoogleLogging() is written to STDERR
[12] I0503 09:45:04.446153  7932 ArgParser.cpp:56] GMS MQTT IP set to 10.4.1.121
[12] I0503 09:45:04.446153  7932 ArgParser.cpp:63] GMS MQTT Port set to 1883
[12] I0503 09:45:04.447154  7932 ArgParser.cpp:70] GMS MQTT ID set to GMS
[12] I0503 09:45:04.447154  7932 ArgParser.cpp:77] GMS DDS Reader Domain ID set to 121
[12] I0503 09:45:04.447154  7932 ArgParser.cpp:84] GMS DDS Writer Domain ID set to 122
[12] I0503 09:45:04.447154  7932 ArgParser.cpp:91] GMS DDS Allow Interface for Reader set to 10.4.1.*
[12] I0503 09:45:04.447154  7932 ArgParser.cpp:98] GMS DDS Allow Interface for Writer set to 10.4.1.*
[9] WARNING: Logging before InitGoogleLogging() is written to STDERR
[4] WARNING: Logging before InitGoogleLogging() is written to STDERR
[4] I0503 09:45:04.471177  7412 ArgParser.cpp:133] Dissemination Medium Type is set as: 'DDS+FILE'
[4] I0503 09:45:04.471177  7412 ArgParser.cpp:406] Group name set to EO50
[9] I0503 09:45:04.471177  8356 ArgParser.cpp:133] Dissemination Medium Type is set as: 'DDS+FILE'
[9] I0503 09:45:04.471177  8356 ArgParser.cpp:406] Group name set to EO100
[12] I0503 09:45:05.447875  7932 GMSApplication.cpp:134] DDS Initialization ...
[12] I0503 09:45:06.021322  7932 GMSApplication.cpp:144] DDS Reader initialized ! [121 - 10.4.1.*]
[12] I0503 09:45:06.021322  7932 GMSApplication.cpp:145] DDS Writer initialized ! [122 - 10.4.1.*]
[12] I0503 09:45:06.021322  7932 GMSApplication.cpp:87] MPI Initialization ...
[11] I0503 09:44:53.217674  8132 Manager.cpp:150] Application Mode : INITIALIZING published!
[4] dapls_ib_get_dto_status() Unknown NT Error 0xc000021b? ret DAT_INTERNAL_ERR
[5] [5:10.0.0.1] unexpected DAPL event 0x4005
[5] Fatal error in PMPI_Init_thread: Internal MPI error!, error stack:
[5] MPIR_Init_thread(805): fail failed
[5] MPID_Init(1783)......: channel initialization failed
[5] MPIDI_CH3_Init(147)..: fail failed
[5] (unknown)(): Internal MPI error!

Any ideas?

Intelpython3


Hi everyone,

I'm trying desperately to install intelpython3 without success. I followed this page:
https://software.intel.com/en-us/articles/installing-intel-free-libs-and...

I'm getting various conflicts, e.g.

file /opt/intel/intelpython3/ReleaseNotes.txt from install of intelpython3-2019.3-075.x86_64 conflicts with file from package intel-python3-psxe-2019-2019.0-045.noarch

What can I do to remove these conflicts and finally install it?
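The conflict message suggests that the standalone intelpython3 RPM and the Parallel Studio bundled intel-python3-psxe package both own the same files. A sketch of how one might confirm and resolve that (package names are taken from the error above; removing a package is of course a judgment call on your system):

# See which Intel Python packages are already installed
rpm -qa | grep -i intel-python
rpm -qa | grep -i intelpython

# If the PSXE-bundled Python is not needed, removing it should clear the file conflict
sudo yum remove "intel-python3-psxe-2019*"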
