vak.config.prep.PrepConfig
- class vak.config.prep.PrepConfig(data_dir, output_dir, dataset_type, input_type, audio_format=None, spect_format=None, spect_params=None, annot_file=None, annot_format=None, labelset=None, audio_dask_bag_kwargs=None, train_dur=None, val_dur=None, test_dur=None, train_set_durs=None, num_replicates=None)
Bases: object
Class that represents the [vak.prep] table of a configuration file.
- output_dir
Path to location where data sets should be saved. Default is None, in which case data sets are saved in the current working directory.
- Type:
- dataset_type
String name of the type of dataset, e.g., ‘frame_classification’. Dataset types are defined by machine learning tasks; e.g., a ‘frame_classification’ dataset would be used with a vak.models.FrameClassificationModel model. Valid dataset types are defined as vak.prep.prep.DATASET_TYPES.
- Type:
- spect_format
Format of files containing spectrograms as 2-d matrices. One of {‘mat’, ‘npy’}.
- Type:
- spect_params
Parameters for Short-Time Fourier Transform and post-processing of spectrograms. Instance of the vak.config.SpectParamsConfig class. Optional, default is None.
- Type:
vak.config.SpectParamsConfig, optional
- annot_format
Format of annotations. Any format that can be used with the crowsetta library is valid.
- Type:
- annot_file
Path to a single annotation file. Default is None. Used when a single file contains annotations for multiple audio files.
- Type:
- labelset
Set of str or int: the set of labels that correspond to annotated segments that a network should learn to segment and classify. Note that if there are segments that are not annotated, e.g. silent gaps between songbird syllables, then vak will assign a dummy label to those segments; you don’t have to give them a label here. The value for labelset is converted to a Python set using vak.config.converters.labelset_from_toml_value. See the help for that function for details on how to specify labelset.
- Type:
- audio_dask_bag_kwargs
Keyword arguments used when calling dask.bag.from_sequence inside vak.io.audio, where it is used to parallelize the conversion of audio files into spectrograms. This option should be specified in the config.toml file as an inline table, e.g., audio_dask_bag_kwargs = { npartitions = 20 }. Allows for finer-grained control when needed to process files of different sizes.
- Type:
- train_dur
Total duration of the training set, in seconds. When creating a learning curve, training subsets of shorter duration (specified by the ‘train_set_durs’ option in the LEARNCURVE section of a config.toml file) will be drawn from this set.
- Type:
- train_set_durs
Durations of datasets to use for a learning curve. Float values, durations in seconds of subsets taken from training data to create a learning curve, e.g. [5., 10., 15., 20.]. Default is None. Required if config file has a learncurve section.
- Type:
list, optional
- num_replicates
Number of replicates to train for each training set duration in a learning curve. Each replicate uses a different randomly drawn subset of the training data (but of the same duration). Default is None. Required if config file has a learncurve section.
- Type:
int, optional
- __init__(data_dir, output_dir, dataset_type, input_type, audio_format=None, spect_format=None, spect_params=None, annot_file=None, annot_format=None, labelset=None, audio_dask_bag_kwargs=None, train_dur=None, val_dur=None, test_dur=None, train_set_durs=None, num_replicates=None) -> None
Method generated by attrs for class PrepConfig.
Methods
- __init__(data_dir, output_dir, dataset_type, ...): Method generated by attrs for class PrepConfig.
- from_config_dict(config_dict): Return PrepConfig instance from a dict.
- is_valid_dataset_type(attribute, value)
- is_valid_input_type(attribute, value)
Attributes
- input_type
- classmethod from_config_dict(config_dict: dict) -> PrepConfig
Return PrepConfig instance from a dict.
The dict passed in should be the one found by loading a valid configuration toml file with vak.config.parse.from_toml_path(), and then using key prep, i.e., PrepConfig.from_config_dict(config_dict['prep']).
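The pattern described above, building a config object from the prep table of an already loaded dict, can be sketched without vak installed. The class below, PrepConfigSketch, is a hypothetical, trimmed-down stand-in written with the standard-library dataclasses module rather than attrs. It is not vak's implementation. The DATASET_TYPES set is an assumed placeholder for vak.prep.prep.DATASET_TYPES, and the real method may perform additional conversions.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed stand-in for vak.prep.prep.DATASET_TYPES; the real contents may differ.
DATASET_TYPES = {"frame_classification"}


@dataclass
class PrepConfigSketch:
    """Hypothetical, trimmed-down stand-in for PrepConfig (not vak's code)."""

    data_dir: str
    output_dir: str
    dataset_type: str
    input_type: str
    train_dur: Optional[float] = None

    def __post_init__(self):
        # Plays the role of a validator such as is_valid_dataset_type.
        if self.dataset_type not in DATASET_TYPES:
            raise ValueError(f"Invalid dataset_type: {self.dataset_type!r}")

    @classmethod
    def from_config_dict(cls, config_dict: dict) -> "PrepConfigSketch":
        # Keys of the 'prep' table become keyword arguments.
        return cls(**config_dict)


# Hypothetical dict, shaped like the result of loading a config file.
config_dict = {
    "prep": {
        "data_dir": "./data/bird1",
        "output_dir": "./prepared",
        "dataset_type": "frame_classification",
        "input_type": "spect",
        "train_dur": 50.0,
    }
}

prep_config = PrepConfigSketch.from_config_dict(config_dict["prep"])
print(prep_config.dataset_type)
```

Passing only the prep table, rather than the whole config dict, keeps the class unaware of the other tables in the file, which matches the call shown in the docstring.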