vak.prep.unit_dataset.unit_dataset.prep_unit_dataset

vak.prep.unit_dataset.unit_dataset.prep_unit_dataset(audio_format: str, output_dir: str, spect_params: dict, data_dir: list | None = None, annot_format: str | None = None, annot_file: str | Path | None = None, labelset: set | None = None, context_s: float = 0.005) → DataFrame

Prepare a dataset of units from sequences, e.g., all syllables segmented out of a dataset of birdsong.

Parameters:

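audio_format (str)
output_dir (str)
spect_params (dict)
data_dir (list | None, default None)
annot_format (str | None, default None)
annot_file (str | Path | None, default None)
labelset (set | None, default None)
context_s (float, default 0.005)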

Returns:

unit_df (pandas.DataFrame) – A DataFrame representing all the units in the dataset.
shape (tuple) – A tuple representing the shape of all spectrograms in the dataset. The spectrograms of all units are padded so that they are all as wide as the widest unit (i.e., the one with the longest duration).
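
The padding described for shape can be illustrated with a minimal sketch. This is not vak's implementation: the helper name pad_to_widest, the choice to pad on the right edge only, and the pad value of 0.0 are all assumptions made for illustration. It only shows how padding every spectrogram to the width of the widest one yields a single common shape.

```python
import numpy as np


def pad_to_widest(spects, pad_value=0.0):
    """Right-pad 2-D arrays (frequency bins x time bins) to a common width.

    Hypothetical helper, not part of vak. Padding side and pad value
    are assumptions.
    """
    # The widest spectrogram corresponds to the unit with the longest duration.
    max_width = max(s.shape[1] for s in spects)
    padded = [
        np.pad(
            s,
            ((0, 0), (0, max_width - s.shape[1])),  # pad the time axis only
            mode="constant",
            constant_values=pad_value,
        )
        for s in spects
    ]
    # After padding, every array has the same shape.
    return padded, padded[0].shape


# Three toy "spectrograms" with 4 frequency bins and different widths.
spects = [np.ones((4, 3)), np.ones((4, 5)), np.ones((4, 2))]
padded, shape = pad_to_widest(spects)
```

Here shape is the single shape shared by all padded arrays, which is the role the returned shape tuple plays in the description above.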