We are trying to run distributed training with Torchtitan on a GB200 NVL72 cluster, but using more than 10 trays (40 GPUs) fails with an NCCL segmentation fault. Using 10 or fewer trays works fine. We ...
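In case it helps, here is roughly how a failing run can be launched with NCCL debug logging turned on; the head-node name, port, and training entrypoint below are placeholders, not our exact job:

```bash
# Verbose NCCL logging, to capture init/transport state around the segfault
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,NET,GRAPH

# 11 trays x 4 GPUs = 44 GPUs, i.e. the first failing size.
# Rendezvous endpoint and entrypoint are placeholders for our actual job.
torchrun \
  --nnodes=11 \
  --nproc_per_node=4 \
  --rdzv_backend=c10d \
  --rdzv_endpoint=<head-node>:29500 \
  train.py
```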
Sanity tests:

- Running the GDR perf test (ib_send_bw --use_cuda -d mlx5_0) reaches 393.69 Gb/s.
- Running the nccl-tests all-reduce on a single host reaches up to 479.70 GB/s out-of-place bus bandwidth.

Here are the ...
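As a further isolation step beyond those single-host checks, a multi-node nccl-tests sweep can show whether plain NCCL also crashes past the 10-tray mark; the hostfile, process mapping, and message-size range here are assumptions, not a verified configuration:

```bash
# All-reduce sweep from 8 B to 8 GB across 11 trays (44 GPUs),
# one rank per GPU; hosts.txt lists the 11 tray hostnames.
mpirun -np 44 --hostfile hosts.txt --map-by ppr:4:node \
  -x NCCL_DEBUG=INFO \
  ./build/all_reduce_perf -b 8 -e 8G -f 2 -g 1
```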