espnet.distributed package¶
Initialize sub package.
espnet.distributed.__init__¶
Initialize sub package.
espnet.distributed.pytorch_backend.launch¶
This is a helper module for distributed training.
The code uses the official distributed data parallel launcher implementation only as a reference: https://github.com/pytorch/pytorch/blob/v1.8.2/torch/distributed/launch.py. The main difference is that this code focuses on launching a simple function with given arguments.
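For illustration, the snippet below shows the general pattern of launching a simple function with given arguments via torch.multiprocessing.spawn; the function and its arguments here are hypothetical, and this is not espnet's own launcher API:

    # Sketch of the pattern: spawn workers that each run train(rank, *args).
    import torch.multiprocessing as mp

    def train(worker_id, ngpu, batch_size):
        # Each worker receives its rank first, then the args tuple.
        print(f"worker {worker_id}: ngpu={ngpu}, batch_size={batch_size}")

    if __name__ == "__main__":
        mp.spawn(train, args=(1, 32), nprocs=4)  # launch 4 workers and join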
exception espnet.distributed.pytorch_backend.launch.MainProcessError(*, signal_no)[source]¶
Bases: multiprocessing.context.ProcessError
An error that happened in the main process.
Initialize error class.
property signal_no¶
Return the signal number that stopped the main process.
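As a hypothetical sketch based only on the documented keyword-only signature, driver code could catch this error and inspect the signal:

    from espnet.distributed.pytorch_backend.launch import MainProcessError

    try:
        # Hypothetical failure path: the main process was stopped by SIGTERM.
        raise MainProcessError(signal_no=15)
    except MainProcessError as e:
        print(f"main process stopped by signal {e.signal_no}")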
exception espnet.distributed.pytorch_backend.launch.WorkerError(*, msg, exitcode, worker_id)[source]¶
Bases: multiprocessing.context.ProcessError
An error that happened within a worker.
Initialize error class.
property exitcode¶
Return the exit code from the worker process.
property worker_id¶
Return the worker ID of the process that caused this error.
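Similarly, a hypothetical sketch of handling a worker failure using the documented constructor and properties:

    from espnet.distributed.pytorch_backend.launch import WorkerError

    try:
        # Hypothetical failure path: worker 3 exited with a non-zero code.
        raise WorkerError(msg="worker crashed", exitcode=1, worker_id=3)
    except WorkerError as e:
        print(f"worker {e.worker_id} exited with code {e.exitcode}")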
espnet.distributed.pytorch_backend.launch.free_port()[source]¶
Find a free port using bind().
There is some interval between finding this port and using it, during which another process might claim it. Thus it is not guaranteed that the port is still free when it is used.
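A minimal sketch of how such a bind()-based helper can work (the name find_free_port is illustrative, not espnet's implementation):

    import socket

    def find_free_port() -> int:
        """Ask the OS for a currently free TCP port by binding to port 0."""
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.bind(("", 0))  # port 0 makes the OS pick an unused port
            return s.getsockname()[1]  # the port is released when s closes

    # Race caveat: another process may claim the port after close().
    print(find_free_port())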