This tutorial introduces more advanced features of Fully Sharded Data Parallel (FSDP) as part of the PyTorch 1.12 release. To get familiar with FSDP, please refer to the FSDP getting started tutorial.
The tutorial's main goal is to help build expertise in using FSDP for distributed AI training, and more videos are expected to be added to the series. PyTorch has launched a series of 10 free ...
There are two ways to save and load models with FSDP. The fifth FSDP tutorial walks through a notebook that uses one of them, full_state_dict. This is a unique model-saving process that puts together models ...
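To make the idea of a full state dict more concrete, here is a toy simulation in plain Python of reassembling per-rank shards into one complete set of parameters. It does not call PyTorch. The parameter names, the zero-padding scheme, and the helpers `shard_flat` and `gather_full_state_dict` are hypothetical and exist only for illustration. In PyTorch itself, the full-state-dict path is selected through `StateDictType.FULL_STATE_DICT` in `torch.distributed.fsdp`.

```python
# Toy simulation in plain Python; this does NOT call the PyTorch FSDP API.
from math import ceil


def shard_flat(values, world_size):
    """Split a flat list into world_size equal shards, zero-padding the tail."""
    shard_len = ceil(len(values) / world_size)
    padded = values + [0.0] * (shard_len * world_size - len(values))
    return [padded[r * shard_len:(r + 1) * shard_len] for r in range(world_size)]


def gather_full_state_dict(sharded, sizes):
    """Rebuild {name: full values} from per-rank shards, dropping the padding."""
    full = {}
    for name, shards in sharded.items():
        flat = [v for shard in shards for v in shard]  # concatenate in rank order
        full[name] = flat[:sizes[name]]                # trim padding
    return full


# Hypothetical two-parameter model, sharded across 4 simulated ranks.
params = {
    "linear.weight": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "linear.bias": [0.5, -0.5],
}
world_size = 4
sharded = {name: shard_flat(vals, world_size) for name, vals in params.items()}
sizes = {name: len(vals) for name, vals in params.items()}

# The "full state dict": every parameter reassembled in one place.
full_state = gather_full_state_dict(sharded, sizes)
```

In this sketch the gathered result equals the original unsharded parameters, which is the property a full state dict is meant to provide: a checkpoint that can be loaded without knowing how the model was sharded.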
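I liked this AI - Machine Learning Blog post, so I wrote up a summary. It uses DDP and FSDP as distributed training methods on Azure ML. Scaling from one GPU node to three made fine-tuning three times faster. By combining V100 GPUs (16 GB memory) across multiple nodes, a 70B-parameter model ...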
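As a rough sense of scale for the 70B figure, the sketch below estimates the memory needed for the weights alone. The parameter count and the 16 GB per GPU come from the text. The 2 bytes per parameter (fp16 weights) is an assumption, and gradients, optimizer state, and activations are deliberately left out, so the real requirement is higher.

```python
# Back-of-the-envelope memory estimate; the byte width is an assumption.
from math import ceil

PARAMS = 70e9          # 70B parameters (from the text)
GPU_MEM_GB = 16        # V100 with 16 GB of memory (from the text)
BYTES_PER_PARAM = 2    # assumption: weights stored in fp16

# Weights only: no gradients, optimizer state, or activations.
weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
min_gpus_for_weights = ceil(weights_gb / GPU_MEM_GB)

print(f"weights: {weights_gb:.0f} GB, at least {min_gpus_for_weights} GPUs")
```

Under these assumptions the weights alone exceed any single 16 GB card many times over, which is why a model of this size has to be spread across several GPUs and nodes.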
Finally slogged through the FSDP paper. My intuition about it (which may be wrong, please correct me) is that it doesn't really shard model state the way we normally think about sharding. Yes, a layer ...
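For reference, here is a toy sketch of the gather, compute, free cycle as FSDP is generally described: parameters are sharded at rest, one wrapped unit at a time is all-gathered in full for compute, and the non-local shards are released afterwards. This is a plain-Python simulation, not the PyTorch implementation. The layer sizes and the helper `peak_elements_per_rank` are hypothetical, and prefetching (which can keep more than one unit resident) is ignored.

```python
# Toy simulation in plain Python; not the PyTorch FSDP implementation.
def peak_elements_per_rank(layer_sizes, world_size):
    """Return (resting, peak) parameter elements held by one rank in a forward pass."""
    # At rest, each rank keeps a 1/world_size shard of every layer (ceil division).
    resting = sum(-(-n // world_size) for n in layer_sizes)
    peak = resting
    for n in layer_sizes:
        shard = -(-n // world_size)
        # All-gather: this one layer is temporarily held in full on every rank.
        during_compute = resting - shard + n
        peak = max(peak, during_compute)
        # After compute, the non-local shards are freed before the next layer.
    return resting, peak


layer_sizes = [1000, 4000, 1000]  # hypothetical per-layer parameter counts
resting, peak = peak_elements_per_rank(layer_sizes, world_size=4)
full_replica = sum(layer_sizes)   # what a DDP-style full copy would hold

print(f"resting={resting}, peak={peak}, full replica={full_replica}")
```

In this sketch each rank holds a quarter of the model at rest, but its peak is dominated by the largest layer being materialized in full. That peak still stays below the size of a complete replica.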