MVAPICH2 1.0.3 User Guide
MVAPICH Team
Network-Based Computing Laboratory, Department of Computer Science and Engineering, The Ohio State University
mvapich.cse.ohio-state.edu
Copyright (c) 2003-2008
Network-Based Computing Laboratory,
headed by Dr. D. K. Panda.
All rights reserved.
Last revised: July 2, 2008
Contents
1 Overview of the Open-Source MVAPICH Project
2 How to use this User Guide?
3 MVAPICH2 1.0 Features
4 Installation Instructions
4.1 Download MVAPICH2 source code
4.2 Prepare MVAPICH2 source code
4.3 Downloading MVAPICH2 Source Code from Anonymous SVN
4.4 Build MVAPICH2
4.4.1 Build MVAPICH2 with OpenFabrics Gen2-IB and iWARP
4.4.2 Build MVAPICH2 with uDAPL
4.4.3 Build MVAPICH2 with VAPI
4.4.4 Build MVAPICH2 with TCP/IPoIB
5 Basic Usage Instructions
5.1 Compile MPI Applications
5.2 Setting MPD Environment
5.3 Run MPI Applications Using mpiexec with OpenFabrics Gen2-IB or VAPI Device
5.3.1 Run MPI Applications using Shared Library Support
5.3.2 Run MPI Application using TotalView Debugger Support
5.4 Run MPI Application using OpenFabrics Gen2-iWARP Device
5.5 Run MPI Application using mpiexec with uDAPL Device
5.6 Run MPI Application using mpiexec with TCP/IP
5.6.1 The MPI Job Uses IPoIB
5.6.2 Both MPD And the MPI Job Use IPoIB
5.7 Run MPI Applications using SLURM
6 Advanced Usage Instructions
6.1 Run MPI applications on Multi-Rail Configurations (for OpenFabrics Gen2-IB and Gen2-iWARP Devices)
6.2 Run MPI application with Customized Optimizations (for OpenFabrics Gen2-IB and Gen2-iWARP Devices)
6.3 Run MPI application with Checkpoint/Restart Support (for OpenFabrics Gen2-IB Device)
6.4 Run MPI application with RDMA CM support (for OpenFabrics Gen2-IB and Gen2-iWARP Devices)
6.5 Run MPI application with Shared Memory Collectives
6.6 Run MPI Application with Hot-Spot and Congestion Avoidance (for OpenFabrics Gen2-IB Device)
6.7 Run MPI Application with Network Fault Tolerance Support (for OpenFabrics Gen2-IB Device)
7 Using OSU Benchmarks
8 FAQ and Troubleshooting with MVAPICH2
8.1 General Questions and Troubleshooting
8.1.1 Invalid Communicators Error
8.1.2 Are fork() and system() supported?
8.1.3 Cannot Build with the PathScale Compiler
8.1.4 Cannotfif
8.1.5 sched_setaffinity: Bad address
8.1.6 Multi-threaded programs seem to run sequentially
8.1.7 Running MPI programs built with gfortran
8.2 With Gen2 Interface
8.2.1 Cannot Open HCA
8.2.2 Checking state of IB Link
8.2.3 Undefined reference to ibv_get_device_list
8.2.4 Creation of CQ or QP failure
8.2.5 Hang with the HSAM Functionality
8.2.6 Failure with Automatic Path Migration
8.2.7 Error opening file
8.2.8 RDMA CM Address error
8.2.9 RDMA CM Route error
8.3 With Gen2-iWARP Interface
8.3.1 Error opening file
8.3.2 RDMA CM Address error
8.3.3 RDMA CM Route error
8.4 With VAPI Interface
8.4.1 Cannot pass MPI_Init
8.4.2 Cannot Open HCA
8.4.3 Cannot include vapi.h
8.4.4 VAPI_RETRY_EXEC_ERROR
8.4.5 ld: multiple definitions of symbol calloc error on MacOS
8.4.6 No Fortran interface on the MacOS platform
8.5 With UDAPL Interface
8.5.1 Cannot Open IA
8.5.2 DAT Insufficient Resource
8.5.3 Cannot find libdat.so
8.5.4 Cannotfif
8.6 The MPD mpiexec fails with “no msg recvd from mpd when expecting ack of request.”
8.7 Checkpoint/Restart
9 Scalable features for Large Scale Clusters and Performance Tuning
9.1 RDMA Based Point-to-Point tuning
9.2 Shared Receive Queue (SRQ) Tuning
9.3 Shared Memory Tuning
9.4 On-demand Connection Management Tuning
10 MVAPICH2 Parameters
10.1 MV2_CKPT_FILE
10.2 MV2_CKPT_INTERVAL
10.3 MV2_CKPT_MAX_SAVE_CKPTS
10.4 MV2_CKPT_MPD_BASE_PORT
10.5 MV2_CKPT_MPIEXEC_PORT
10.6 MV2_CKPT_NO_SYNC
10.7 MV2_CM_RECV_BUFFERS
10.8 MV2_CM_SPIN_COUNT
10.9 MV2_CM_TIMEOUT
10.10 MV2_DAPL_PROVIDER
10.11 MV2_DEFAULT_MTU
10.12 MV2_ENABLE_AFFINITY
10.13 MV2_GET_FALLBACK_THRESHOLD
10.14 MV2_IBA_EAGER_THRESHOLD
10.15 MV2_INITIAL_PREPOST_DEPTH
10.16 MV2_MPD_RECVTIMEOUT_MULTIPLIER
10.17 MV2_NDREG_ENTRIES
10.18 MV2_NUM_HCAS
10.19 MV2_NUM_PORTS
10.20 MV2_NUM_QP_PER_PORT
10.21 MV2_NUM_RDMA_BUFFER
10.22 MV2_ON_DEMAND_THRESHOLD
10.23 MV2_PREPOST_DEPTH
10.24 MV2_PUT_FALLBACK_THRESHOLD
10.25 MV2_RDMA_CM_ARP_TIMEOUT
10.26 MV2_RNDV_PROTOCOL
10.27 MV2_R3_THRESHOLD
10.28 MV2_R3_NOCACHE_THRESHOLD
10.29 MV2_SHMEM_COLL_MAX_MSG_SIZE
10.30 MV2_SHMEM_COLL_NUM_COMM
10.31 MV2_SRQ_LIMIT
10.32 MV2_SRQ_SIZE
10.33 MV2_USE_APM
10.34 MV2_USE_APM_TEST
10.35 MV2_USE_BLOCKING
10.36 MV2_USE_COALESCE
10.37 MV2_USE_HSAM
10.38 MV2_USE_IWARP_MODE
10.39 MV2_USE_LAZY_MEM_UNREGISTER
10.40 LAZY_MEM_UNREGISTER
10.41 MV2_USE_RDMA_CM
10.42 MV2_USE_RDMA_FAST_PATH
10.43 MV2_USE_RDMA_ONE_SIDED
10.44 MV2_USE_RING_STARTUP
10.45 MV2_USE_SHARED_MEM
10.46 MV2_USE_SHMEM_ALLREDUCE
10.47 MV2_USE_SHMEM_BARRIER
10.48 MV2_USE_SHMEM_COLL
10.49 MV2_USE_SHMEM_REDUCE
10.50 MV2_USE_SRQ
10.51 MV2_VBUF_POOL_SIZE
10.52 MV2_VBUF_SECONDARY_POOL_SIZE
10.53 MV2_VBUF_TOTAL_SIZE
10.54 SMP_EAGERSIZE
10.55 SMPI_LENGTH_QUEUE
10.56 SMP_NUM_SEND_BUFFER
10.57 SMP_SEND_BUF_SIZE