OpenMPI 3.1.X Known Issue

Supercomputing Infrastructure Center 2019. 4. 30. 10:02

This post describes a fix for an OpenFabrics device initialization error seen with OpenMPI 3.1.X at the KISTI Supercomputer Center.

A. Environment

  • Target system : Neuron

  • OS Version : Linux / CentOS 7.4

  • CPU : Intel Xeon E5-2670 v2

  • Mellanox OFED : MLNX_OFED_LINUX-4.4-2.0.7.0 (OFED-4.4-2.0.7)

  • MPI : OpenMPI-3.1.0

B. Error Output

[optpar02@login02 test]$ mpirun -np 2 ./host.x
ibv_exp_query_device: invalid comp_mask !!! (comp_mask = 0xffffffffffffffff valid_mask = 0x1)
[login02][[51376,1],0][btl_openib_component.c:1670:init_one_device] error obtaining device attributes for mlx4_1 errno says Invalid argument
ibv_exp_query_device: invalid comp_mask !!! (comp_mask = 0xffffffffffffffff valid_mask = 0x1)
[login02][[51376,1],1][btl_openib_component.c:1670:init_one_device] error obtaining device attributes for mlx4_1 errno says Invalid argument
--------------------------------------------------------------------------
WARNING: There was an error initializing an OpenFabrics device.
Local host: login02
Local device: mlx4_1
--------------------------------------------------------------------------
Hello, World! I am process 0 of size 2 on login02
Hello, World! I am process 1 of size 2 on login02
[login02:22081] 1 more process has sent help message help-mpi-btl-openib.txt / error in device init
[login02:22081] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages

C. Fix

  • Edit line 1666 of openmpi-3.1.0/opal/mca/btl/openib/btl_openib_component.c, then rebuild OpenMPI.

  • Reference : https://github.com/hppritcha/ompi/commit/8126779a354b3e0c720d3e1790f7b936dd5b93b2

[Before]
#if HAVE_DECL_IBV_EXP_QUERY_DEVICE
device->ib_exp_dev_attr.comp_mask = IBV_EXP_DEVICE_ATTR_RESERVED - 1;
[After]
#if HAVE_DECL_IBV_EXP_QUERY_DEVICE
memset(&device->ib_exp_dev_attr, 0, sizeof(device->ib_exp_dev_attr));
device->ib_exp_dev_attr.comp_mask = IBV_EXP_DEVICE_ATTR_RESERVED - 1;
