Distributed and Parallel Training Tutorials¶
Distributed training is a model training paradigm that involves spreading the training workload across multiple worker nodes, significantly improving training speed and, by making it practical to train larger models on more data, model accuracy. While distributed training can be used for any type of ML model training, it is most beneficial for large models and compute-demanding tasks such as deep learning.
There are a few ways you can perform distributed training in PyTorch, each with its own advantages for certain use cases; a brief sketch of one option follows below.
Read more about these options in Distributed Overview.
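As a quick illustration of one of these options, here is a minimal DistributedDataParallel (DDP) sketch. The toy linear model, the CPU-only "gloo" backend, the hard-coded rendezvous address, and the two-process world size are assumptions chosen for illustration, not recommendations from this page.

```python
# Minimal DDP sketch: two CPU processes, gradients synchronized each step.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP


def worker(rank, world_size):
    # Assumed rendezvous settings for a single-machine example.
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = torch.nn.Linear(10, 1)        # toy model for illustration
    ddp_model = DDP(model)                # wraps the model; syncs gradients across workers
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    for _ in range(5):                    # a few dummy training steps on random data
        optimizer.zero_grad()
        loss = ddp_model(torch.randn(32, 10)).sum()
        loss.backward()                   # gradient all-reduce happens during backward
        optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = 2
    mp.spawn(worker, args=(world_size,), nprocs=world_size)
```

In practice you would typically launch one process per GPU (passing `device_ids` to DDP) and use a launcher such as `torchrun`; the tutorials linked from the Distributed Overview cover those setups in detail.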