espnet2.train package
espnet2.train.preprocessor
- class espnet2.train.preprocessor.CommonPreprocessor(train: bool, token_type: str = None, token_list: Union[pathlib.Path, str, Iterable[str]] = None, bpemodel: Union[pathlib.Path, str, Iterable[str]] = None, text_cleaner: Collection[str] = None, g2p_type: str = None, unk_symbol: str = '<unk>', space_symbol: str = '<space>', non_linguistic_symbols: Union[pathlib.Path, str, Iterable[str]] = None, delimiter: str = None, rir_scp: str = None, rir_apply_prob: float = 1.0, noise_scp: str = None, noise_apply_prob: float = 1.0, noise_db_range: str = '3_10', short_noise_thres: float = 0.5, speech_volume_normalize: float = None, speech_name: str = 'speech', text_name: str = 'text')
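A minimal usage sketch (not part of the original docstring), assuming the preprocessor is invoked per utterance as a callable taking a (uid, data-dict) pair, as in the ESPnet2 data pipeline; the toy token list and random audio are illustrative only:
>>> import numpy as np
>>> from espnet2.train.preprocessor import CommonPreprocessor
>>> # Character tokenization, no RIR/noise augmentation; toy token list.
>>> preprocessor = CommonPreprocessor(
...     train=True,
...     token_type="char",
...     token_list=["<blank>", "<unk>", "a", "b", "c", "<space>"],
... )
>>> data = {"speech": np.random.randn(16000).astype(np.float32), "text": "abc"}
>>> processed = preprocessor("utt1", data)
>>> # processed["text"] is now an integer ID sequence (numpy array).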
- class espnet2.train.preprocessor.CommonPreprocessor_multi(train: bool, token_type: str = None, token_list: Union[pathlib.Path, str, Iterable[str]] = None, bpemodel: Union[pathlib.Path, str, Iterable[str]] = None, text_cleaner: Collection[str] = None, g2p_type: str = None, unk_symbol: str = '<unk>', space_symbol: str = '<space>', non_linguistic_symbols: Union[pathlib.Path, str, Iterable[str]] = None, delimiter: str = None, speech_name: str = 'speech', text_name: List[str] = ['text'])
- class espnet2.train.preprocessor.DynamicMixingPreprocessor(train: bool, source_scp: str = None, ref_num: int = 2, dynamic_mixing_gain_db: float = 0.0, speech_name: str = 'speech_mix', speech_ref_name_prefix: str = 'speech_ref', mixture_source_name: str = None, utt2spk: str = None)
- class espnet2.train.preprocessor.EnhPreprocessor(train: bool, rir_scp: str = None, rir_apply_prob: float = 1.0, noise_scp: str = None, noise_apply_prob: float = 1.0, noise_db_range: str = '3_10', short_noise_thres: float = 0.5, speech_volume_normalize: float = None, speech_name: str = 'speech_mix', speech_ref_name_prefix: str = 'speech_ref', noise_ref_name_prefix: str = 'noise_ref', dereverb_ref_name_prefix: str = 'dereverb_ref', use_reverberant_ref: bool = False, num_spk: int = 1, num_noise_type: int = 1, sample_rate: int = 8000, force_single_channel: bool = False)
Bases: espnet2.train.preprocessor.CommonPreprocessor
Preprocessor for Speech Enhancement (Enh) task.
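A construction sketch (not from the original docs); the scp paths are hypothetical, and the comment on noise_db_range reflects its typical reading as an SNR range in dB:
>>> from espnet2.train.preprocessor import EnhPreprocessor
>>> preprocessor = EnhPreprocessor(
...     train=True,
...     rir_scp="data/rirs.scp",      # hypothetical RIR list
...     rir_apply_prob=0.5,
...     noise_scp="data/noises.scp",  # hypothetical noise list
...     noise_apply_prob=1.0,
...     noise_db_range="3_10",        # SNR sampled from [3, 10] dB
...     num_spk=2,
... )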
- class espnet2.train.preprocessor.MutliTokenizerCommonPreprocessor(train: bool, token_type: List[str] = [None], token_list: List[Union[pathlib.Path, str, Iterable[str]]] = [None], bpemodel: List[Union[pathlib.Path, str, Iterable[str]]] = [None], text_cleaner: Collection[str] = None, g2p_type: str = None, unk_symbol: str = '<unk>', space_symbol: str = '<space>', non_linguistic_symbols: Union[pathlib.Path, str, Iterable[str]] = None, delimiter: str = None, rir_scp: str = None, rir_apply_prob: float = 1.0, noise_scp: str = None, noise_apply_prob: float = 1.0, noise_db_range: str = '3_10', short_noise_thres: float = 0.5, speech_volume_normalize: float = None, speech_name: str = 'speech', text_name: List[str] = ['text'])
- class espnet2.train.preprocessor.SLUPreprocessor(train: bool, token_type: str = None, token_list: Union[pathlib.Path, str, Iterable[str]] = None, transcript_token_list: Union[pathlib.Path, str, Iterable[str]] = None, bpemodel: Union[pathlib.Path, str, Iterable[str]] = None, text_cleaner: Collection[str] = None, g2p_type: str = None, unk_symbol: str = '<unk>', space_symbol: str = '<space>', non_linguistic_symbols: Union[pathlib.Path, str, Iterable[str]] = None, delimiter: str = None, rir_scp: str = None, rir_apply_prob: float = 1.0, noise_scp: str = None, noise_apply_prob: float = 1.0, noise_db_range: str = '3_10', short_noise_thres: float = 0.5, speech_volume_normalize: float = None, speech_name: str = 'speech', text_name: str = 'text')
- espnet2.train.preprocessor.detect_non_silence(x: numpy.ndarray, threshold: float = 0.01, frame_length: int = 1024, frame_shift: int = 512, window: str = 'boxcar') → numpy.ndarray
Power-based voice activity detection.
- Parameters
x – (Channel, Time)
>>> import numpy as np
>>> x = np.random.randn(1000)
>>> detect = detect_non_silence(x)
>>> assert x.shape == detect.shape
>>> assert detect.dtype == np.bool_
espnet2.train.__init__
espnet2.train.class_choices
- class espnet2.train.class_choices.ClassChoices(name: str, classes: Mapping[str, type], type_check: type = None, default: str = None, optional: bool = False)
Bases: object
Helper class to manage the options for interchangeable classes and their configuration.
Example:
>>> class A:
...     def __init__(self, foo=3):
...         pass
>>> class B:
...     def __init__(self, bar="aaaa"):
...         pass
>>> choices = ClassChoices("var", dict(a=A, b=B), default="a")
>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> choices.add_arguments(parser)
>>> args = parser.parse_args(["--var", "a", "--var_conf", "foo=4"])
>>> args.var
'a'
>>> args.var_conf
{'foo': 4}
>>> class_obj = choices.get_class(args.var)
>>> a_object = class_obj(**args.var_conf)
espnet2.train.collate_fn
- class espnet2.train.collate_fn.CommonCollateFn(float_pad_value: Union[float, int] = 0.0, int_pad_value: int = -32768, not_sequence: Collection[str] = ())
Bases: object
Functor class of common_collate_fn()
- espnet2.train.collate_fn.common_collate_fn(data: Collection[Tuple[str, Dict[str, numpy.ndarray]]], float_pad_value: Union[float, int] = 0.0, int_pad_value: int = -32768, not_sequence: Collection[str] = ()) → Tuple[List[str], Dict[str, torch.Tensor]]
Concatenate a list of ndarrays into a single padded array and convert it to a torch.Tensor.
Examples
>>> from espnet2.samplers.constant_batch_sampler import ConstantBatchSampler
>>> import espnet2.tasks.abs_task
>>> from espnet2.train.dataset import ESPnetDataset
>>> sampler = ConstantBatchSampler(...)
>>> dataset = ESPnetDataset(...)
>>> keys = next(iter(sampler))
>>> batch = [dataset[key] for key in keys]
>>> ids, batch = common_collate_fn(batch)
>>> model(**batch)
Note that the dict keys of the batch are propagated from those of the dataset as-is.
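A self-contained sketch of the padding behavior (not from the original docstring); the "speech" key is an arbitrary choice, and the "speech_lengths" entry illustrates the convention of appending a length tensor for each sequence input:
>>> import numpy as np
>>> from espnet2.train.collate_fn import common_collate_fn
>>> batch = [
...     ("utt1", {"speech": np.random.randn(1000).astype(np.float32)}),
...     ("utt2", {"speech": np.random.randn(800).astype(np.float32)}),
... ]
>>> ids, tensors = common_collate_fn(batch, float_pad_value=0.0)
>>> ids
['utt1', 'utt2']
>>> tuple(tensors["speech"].shape)  # padded to the longest utterance
(2, 1000)
>>> tensors["speech_lengths"].tolist()
[1000, 800]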
espnet2.train.reporter
Reporter module.
- class espnet2.train.reporter.Average(value: Union[float, int, complex, torch.Tensor, numpy.ndarray])
- class espnet2.train.reporter.Reporter(epoch: int = 0)
Bases: object
Reporter class.
Examples
>>> reporter = Reporter()
>>> with reporter.observe('train') as sub_reporter:
...     for batch in iterator:
...         stats = dict(loss=0.2)
...         sub_reporter.register(stats)
- check_early_stopping(patience: int, key1: str, key2: str, mode: str, epoch: int = None, logger=None) → bool
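A usage sketch (not from the original docstring), assuming validation stats are registered under ('valid', 'loss') and lower is better:
>>> for epoch in range(1, 101):
...     ...  # train, validate, and register stats for this epoch
...     if reporter.check_early_stopping(3, "valid", "loss", "min"):
...         break  # no improvement in valid/loss for 3 epochs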
- matplotlib_plot(output_dir: Union[str, pathlib.Path])
Plot stats using Matplotlib and save images.
- observe(key: str, epoch: int = None) → AbstractContextManager[espnet2.train.reporter.SubReporter]
- class espnet2.train.reporter.SubReporter(key: str, epoch: int, total_count: int)
Bases: object
This class is used in Reporter.
See the docstring of Reporter for the usage.
- class espnet2.train.reporter.WeightedAverage(value: Tuple[Union[float, int, complex, torch.Tensor, numpy.ndarray], Union[float, int, complex, torch.Tensor, numpy.ndarray]], weight: Union[float, int, complex, torch.Tensor, numpy.ndarray])
- espnet2.train.reporter.aggregate(values: Sequence[ReportedValue]) → Union[float, int, complex, torch.Tensor, numpy.ndarray]
espnet2.train.dataset
- class espnet2.train.dataset.AdapterForSoundScpReader(loader, dtype=None)
Bases: collections.abc.Mapping
- class espnet2.train.dataset.ESPnetDataset(path_name_type_list: Collection[Tuple[str, str, str]], preprocess: Callable[[str, Dict[str, numpy.ndarray]], Dict[str, numpy.ndarray]] = None, float_dtype: str = 'float32', int_dtype: str = 'long', max_cache_size: Union[float, int, str] = 0.0, max_cache_fd: int = 0)
Bases: espnet2.train.dataset.AbsDataset
PyTorch Dataset class for ESPnet.
Examples
>>> dataset = ESPnetDataset([('wav.scp', 'input', 'sound'),
...                          ('token_int', 'output', 'text_int')])
>>> uttid, data = dataset['uttid']
>>> data
{'input': per_utt_array, 'output': per_utt_array}
espnet2.train.trainer
Trainer module.
- class espnet2.train.trainer.Trainer
Bases: object
Trainer having an optimizer.
If you'd like to use multiple optimizers, inherit this class and override the methods as necessary, at least train_one_epoch():
>>> class TwoOptimizerTrainer(Trainer):
...     @classmethod
...     def add_arguments(cls, parser):
...         ...
...
...     @classmethod
...     def train_one_epoch(cls, model, optimizers, ...):
...         loss1 = model.model1(...)
...         loss1.backward()
...         optimizers[0].step()
...
...         loss2 = model.model2(...)
...         loss2.backward()
...         optimizers[1].step()
- classmethod add_arguments(parser: argparse.ArgumentParser)
Reserved for future development of another Trainer.
- classmethod build_options(args: argparse.Namespace) → espnet2.train.trainer.TrainerOptions
Build options consumed by train(), eval(), and plot_attention().
- classmethod plot_attention(model: torch.nn.modules.module.Module, output_dir: Optional[pathlib.Path], summary_writer, iterator: Iterable[Tuple[List[str], Dict[str, torch.Tensor]]], reporter: espnet2.train.reporter.SubReporter, options: espnet2.train.trainer.TrainerOptions) → None
- static resume(checkpoint: Union[str, pathlib.Path], model: torch.nn.modules.module.Module, reporter: espnet2.train.reporter.Reporter, optimizers: Sequence[torch.optim.optimizer.Optimizer], schedulers: Sequence[Optional[espnet2.schedulers.abs_scheduler.AbsScheduler]], scaler: Optional[torch.cuda.amp.grad_scaler.GradScaler], ngpu: int = 0)
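A sketch of resuming from a checkpoint (not from the original docstring); the toy Linear model and the path "exp/checkpoint.pth" are assumptions for illustration, and a real run would pass the task's AbsESPnetModel with its schedulers:
>>> import torch
>>> from espnet2.train.trainer import Trainer
>>> from espnet2.train.reporter import Reporter
>>> model = torch.nn.Linear(4, 2)  # toy stand-in for an AbsESPnetModel
>>> optimizer = torch.optim.Adam(model.parameters())
>>> reporter = Reporter()
>>> Trainer.resume(
...     checkpoint="exp/checkpoint.pth",  # assumed path from a previous run
...     model=model,
...     reporter=reporter,
...     optimizers=[optimizer],
...     schedulers=[None],
...     scaler=None,
...     ngpu=0,
... )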
- classmethod run(model: espnet2.train.abs_espnet_model.AbsESPnetModel, optimizers: Sequence[torch.optim.optimizer.Optimizer], schedulers: Sequence[Optional[espnet2.schedulers.abs_scheduler.AbsScheduler]], train_iter_factory: espnet2.iterators.abs_iter_factory.AbsIterFactory, valid_iter_factory: espnet2.iterators.abs_iter_factory.AbsIterFactory, plot_attention_iter_factory: Optional[espnet2.iterators.abs_iter_factory.AbsIterFactory], trainer_options, distributed_option: espnet2.train.distributed_utils.DistributedOption) → None
Perform training: this method runs the main training process.
- classmethod train_one_epoch(model: torch.nn.modules.module.Module, iterator: Iterable[Tuple[List[str], Dict[str, torch.Tensor]]], optimizers: Sequence[torch.optim.optimizer.Optimizer], schedulers: Sequence[Optional[espnet2.schedulers.abs_scheduler.AbsScheduler]], scaler: Optional[torch.cuda.amp.grad_scaler.GradScaler], reporter: espnet2.train.reporter.SubReporter, summary_writer, options: espnet2.train.trainer.TrainerOptions, distributed_option: espnet2.train.distributed_utils.DistributedOption) → bool
- classmethod validate_one_epoch(model: torch.nn.modules.module.Module, iterator: Iterable[Dict[str, torch.Tensor]], reporter: espnet2.train.reporter.SubReporter, options: espnet2.train.trainer.TrainerOptions, distributed_option: espnet2.train.distributed_utils.DistributedOption) → None
- class espnet2.train.trainer.TrainerOptions(ngpu: int, resume: bool, use_amp: bool, train_dtype: str, grad_noise: bool, accum_grad: int, grad_clip: float, grad_clip_type: float, log_interval: Optional[int], no_forward_run: bool, use_matplotlib: bool, use_tensorboard: bool, use_wandb: bool, output_dir: Union[pathlib.Path, str], max_epoch: int, seed: int, sharded_ddp: bool, patience: Optional[int], keep_nbest_models: Union[int, List[int]], nbest_averaging_interval: int, early_stopping_criterion: Sequence[str], best_model_criterion: Sequence[Sequence[str]], val_scheduler_criterion: Sequence[str], unused_parameters: bool, wandb_model_log_interval: int, create_graph_in_tensorboard: bool)
Bases: object
espnet2.train.gan_trainer
Trainer module for GAN-based training.
- class espnet2.train.gan_trainer.GANTrainer
Bases: espnet2.train.trainer.Trainer
Trainer for GAN-based training.
If you'd like to use this trainer, the model must inherit espnet2.train.abs_gan_espnet_model.AbsGANESPnetModel.
- classmethod add_arguments(parser: argparse.ArgumentParser)
Add additional arguments for GAN-trainer.
- classmethod build_options(args: argparse.Namespace) → espnet2.train.trainer.TrainerOptions
Build options consumed by train(), eval(), and plot_attention().
- classmethod train_one_epoch(model: torch.nn.modules.module.Module, iterator: Iterable[Tuple[List[str], Dict[str, torch.Tensor]]], optimizers: Sequence[torch.optim.optimizer.Optimizer], schedulers: Sequence[Optional[espnet2.schedulers.abs_scheduler.AbsScheduler]], scaler: Optional[torch.cuda.amp.grad_scaler.GradScaler], reporter: espnet2.train.reporter.SubReporter, summary_writer, options: espnet2.train.gan_trainer.GANTrainerOptions, distributed_option: espnet2.train.distributed_utils.DistributedOption) → bool
Train one epoch.
- classmethod validate_one_epoch(model: torch.nn.modules.module.Module, iterator: Iterable[Dict[str, torch.Tensor]], reporter: espnet2.train.reporter.SubReporter, options: espnet2.train.gan_trainer.GANTrainerOptions, distributed_option: espnet2.train.distributed_utils.DistributedOption) → None
Validate one epoch.
- class espnet2.train.gan_trainer.GANTrainerOptions(ngpu: int, resume: bool, use_amp: bool, train_dtype: str, grad_noise: bool, accum_grad: int, grad_clip: float, grad_clip_type: float, log_interval: Optional[int], no_forward_run: bool, use_matplotlib: bool, use_tensorboard: bool, use_wandb: bool, output_dir: Union[pathlib.Path, str], max_epoch: int, seed: int, sharded_ddp: bool, patience: Optional[int], keep_nbest_models: Union[int, List[int]], nbest_averaging_interval: int, early_stopping_criterion: Sequence[str], best_model_criterion: Sequence[Sequence[str]], val_scheduler_criterion: Sequence[str], unused_parameters: bool, wandb_model_log_interval: int, create_graph_in_tensorboard: bool, generator_first: bool)
Bases: espnet2.train.trainer.TrainerOptions
Trainer option dataclass for GANTrainer.
espnet2.train.abs_espnet_model
- class espnet2.train.abs_espnet_model.AbsESPnetModel
Bases: torch.nn.modules.module.Module, abc.ABC
The common abstract class among all tasks.
An "ESPnetModel" refers to a class that inherits torch.nn.Module, holds the DNN models it forwards to as member fields (a.k.a. the delegate pattern), and defines "loss", "stats", and "weight" for the task.
If you intend to implement a new task in ESPnet, the model must inherit this class. In other words, the only "mediator" objects between our training system and your task class are these three values: loss, stats, and weight.
Example
>>> from espnet2.tasks.abs_task import AbsTask
>>> class YourESPnetModel(AbsESPnetModel):
...     def forward(self, input, input_lengths):
...         ...
...         return loss, stats, weight
>>> class YourTask(AbsTask):
...     @classmethod
...     def build_model(cls, args: argparse.Namespace) -> YourESPnetModel:
...         ...
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- abstract forward(**batch) → Tuple[torch.Tensor, Dict[str, torch.Tensor], torch.Tensor]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- abstract collect_feats(**batch: torch.Tensor) → Dict[str, torch.Tensor]
espnet2.train.iterable_dataset
Iterable dataset module.
- class espnet2.train.iterable_dataset.IterableESPnetDataset(path_name_type_list: Collection[Tuple[str, str, str]], preprocess: Callable[[str, Dict[str, numpy.ndarray]], Dict[str, numpy.ndarray]] = None, float_dtype: str = 'float32', int_dtype: str = 'long', key_file: str = None)
Bases: torch.utils.data.dataset.IterableDataset
PyTorch Dataset class for ESPnet.
Examples
>>> dataset = IterableESPnetDataset([('wav.scp', 'input', 'sound'),
...                                  ('token_int', 'output', 'text_int')])
>>> for uid, data in dataset:
...     data
{'input': per_utt_array, 'output': per_utt_array}
espnet2.train.distributed_utils
- class espnet2.train.distributed_utils.DistributedOption(distributed: bool = False, dist_backend: str = 'nccl', dist_init_method: str = 'env://', dist_world_size: Optional[int] = None, dist_rank: Optional[int] = None, local_rank: Optional[int] = None, ngpu: int = 0, dist_master_addr: Optional[str] = None, dist_master_port: Optional[int] = None, dist_launcher: Optional[str] = None, multiprocessing_distributed: bool = True)
Bases: object
- dist_backend = 'nccl'
- dist_init_method = 'env://'
- dist_launcher = None
- dist_master_addr = None
- dist_master_port = None
- dist_rank = None
- dist_world_size = None
- distributed = False
- local_rank = None
- multiprocessing_distributed = True
- ngpu = 0
- espnet2.train.distributed_utils.free_port()
Find free port using bind().
There is some interval between finding the port and using it, and another process might claim the port in the meantime, so it is not guaranteed that the port is actually free.
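For illustration, the bind() trick works like the following minimal sketch (standard library only, not the module's implementation):
>>> import socket
>>> # Binding to port 0 asks the OS for any free port; read it back and
>>> # release the socket. Another process may still grab the port before
>>> # it is reused, which is exactly the caveat noted above.
>>> with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
...     sock.bind(("", 0))
...     port = sock.getsockname()[1]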
- espnet2.train.distributed_utils.get_local_rank(prior=None, launcher: str = None) → Optional[int]
- espnet2.train.distributed_utils.get_master_addr(prior=None, launcher: str = None) → Optional[str]
- espnet2.train.distributed_utils.get_node_rank(prior=None, launcher: str = None) → Optional[int]
Get node rank.
Used for "multiprocessing distributed" mode. In this case, the initial RANK equals the node ID, and the real rank is set to (nGPU * NodeID) + LOCAL_RANK in torch.distributed.
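A worked example of this arithmetic, assuming 4 GPUs per node: the process with LOCAL_RANK 3 on node 2 gets global rank 11.
>>> ngpu, node_id, local_rank = 4, 2, 3
>>> ngpu * node_id + local_rank
11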
espnet2.train.abs_gan_espnet_model
ESPnetModel abstract class for GAN-based training.
- class espnet2.train.abs_gan_espnet_model.AbsGANESPnetModel
Bases: espnet2.train.abs_espnet_model.AbsESPnetModel, torch.nn.modules.module.Module, abc.ABC
The common abstract class among all GAN-based tasks.
An "ESPnetModel" refers to a class that inherits torch.nn.Module and holds the DNN models it forwards to as member fields (a.k.a. the delegate pattern). Here, "forward" must accept the argument "forward_generator" and return a dict of "loss", "stats", "weight", and "optim_idx". "optim_idx" must be 0 for the generator and 1 for the discriminator.
Example
>>> from espnet2.tasks.abs_task import AbsTask
>>> class YourESPnetModel(AbsGANESPnetModel):
...     def forward(self, input, input_lengths, forward_generator=True):
...         ...
...         if forward_generator:
...             # return loss for the generator
...             # optim_idx 0 indicates the generator optimizer
...             return dict(loss=loss, stats=stats, weight=weight, optim_idx=0)
...         else:
...             # return loss for the discriminator
...             # optim_idx 1 indicates the discriminator optimizer
...             return dict(loss=loss, stats=stats, weight=weight, optim_idx=1)
>>> class YourTask(AbsTask):
...     @classmethod
...     def build_model(cls, args: argparse.Namespace) -> YourESPnetModel:
...         ...
Initializes internal Module state, shared by both nn.Module and ScriptModule.
- abstract forward(forward_generator: bool = True, **batch) → Dict[str, Union[torch.Tensor, Dict[str, torch.Tensor], int]]
Return the generator loss or the discriminator loss.
This method must have an argument "forward_generator" to switch between the generator loss calculation and the discriminator loss calculation. If forward_generator is true, return the generator loss with optim_idx 0. If forward_generator is false, return the discriminator loss with optim_idx 1.
- Parameters
forward_generator (bool) – Whether to return the generator loss or the discriminator loss. This argument must have a default value.
- Returns
loss (Tensor): Loss scalar tensor.
stats (Dict[str, float]): Statistics to be monitored.
weight (Tensor): Weight tensor to summarize losses.
optim_idx (int): Optimizer index (0 for G and 1 for D).
- Return type
Dict[str, Any]
- abstract collect_feats(**batch: torch.Tensor) → Dict[str, torch.Tensor]