Spectrogram file format
File type
vak uses pre-computed files containing spectrograms.
It accepts two types of these files: .npz or .mat.
.npz is a NumPy format for a single file that can contain multiple arrays.
.mat is the Matlab data file format; many labs have existing codebases that generate spectrograms using Matlab.
To work with one of these formats,
specify either npz or mat in the [PREP] section of your .toml configuration file.
Note
vak loads .mat files with the function scipy.io.loadmat.
That function can only load v4 (Level 1.0), v6, and v7 to 7.2 matfiles, as stated in its documentation:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html
Version 7.3 of the matfile format uses an HDF5-based format,
which is not supported by scipy or vak.
(For more details, see this page in the Matlab documentation.)
If you are working with Matlab,
please either save your .mat files in a format that can be read by scipy.io.loadmat,
or convert your data to .npz files as described in How do I use my own spectrograms?.
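If you go the conversion route, the script can be short. Below is a minimal sketch that assumes each .mat file stores the spectrogram and its frequency and time vectors under the names s, f, and t. The helper mat_to_npz is hypothetical and is not part of vak. It relies on the fact that scipy.io.loadmat raises NotImplementedError when given a v7.3 (HDF5) matfile, and it flattens the vectors because loadmat returns Matlab vectors as 2-D arrays.

```python
from pathlib import Path

import numpy as np
import scipy.io


def mat_to_npz(mat_path, spect_key="s", freqbins_key="f", timebins_key="t"):
    """Convert a .mat spectrogram file into a .npz file in the same directory.

    Hypothetical helper for illustration, not part of vak. Assumes the .mat
    file holds the spectrogram, frequency bins, and time bins under the
    given variable names.
    """
    mat_path = Path(mat_path)
    try:
        mat = scipy.io.loadmat(mat_path)
    except NotImplementedError as err:
        # scipy.io.loadmat raises NotImplementedError for v7.3 (HDF5) matfiles
        raise ValueError(
            f"{mat_path} appears to be a v7.3 matfile; re-save it in an earlier format"
        ) from err

    spect = np.asarray(mat[spect_key])
    # loadmat returns vectors as 2-D arrays, e.g. shape (1, m); flatten to 1-D
    freqbins = np.asarray(mat[freqbins_key]).ravel()
    timebins = np.asarray(mat[timebins_key]).ravel()

    # bird1.wav.mat -> bird1.wav.npz, so the audio-based file name is preserved
    npz_path = mat_path.with_suffix(".npz")
    np.savez(npz_path, **{spect_key: spect, freqbins_key: freqbins, timebins_key: timebins})
    return npz_path
```

Because only the final suffix is swapped, a file named after the audio file it came from (for example bird1.wav.mat) keeps that name as bird1.wav.npz.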
Conventions
Regardless of whether they are .npz files or .mat files,
vak expects all spectrogram files to obey the following conventions.
Content
A spectrogram array file should contain at least three items:
The spectrogram, an m x n matrix
A vector of m frequency bins, where the value of each element is the frequency at the center of the bin
A vector of n time bins, where the value of each element is the time at the center of the bin
A fourth item is not required but is recommended:
A string path to the audio file from which the spectrogram was generated.
Other arrays can be in the file, but they will be ignored.
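As an illustration of this layout, the sketch below computes a spectrogram from an array of audio samples and saves all four items under vak's default key names. It assumes scipy.signal.spectrogram as the spectrogram function, which is an illustrative choice and not necessarily what vak prep itself uses, and the helper save_spect_file is hypothetical. scipy.signal.spectrogram returns frequencies as rows and segment times as columns, so s already has the required m x n shape.

```python
import numpy as np
from scipy.signal import spectrogram


def save_spect_file(audio, samplerate, audio_path, nperseg=512, noverlap=256):
    """Compute a spectrogram and save it as an .npz file with keys s, f, t, audio_path.

    Hypothetical helper; the spectrogram parameters are illustrative
    assumptions, not vak's defaults.
    """
    f, t, s = spectrogram(audio, fs=samplerate, nperseg=nperseg, noverlap=noverlap)
    # s is m x n: one row per frequency bin in f, one column per time bin in t
    assert s.shape == (f.size, t.size)

    audio_path = str(audio_path)
    # Name the file after its audio file, e.g. bird1.wav -> bird1.wav.npz
    spect_path = audio_path + ".npz"
    np.savez(spect_path, s=s, f=f, t=t, audio_path=audio_path)
    return spect_path
```

When loaded back with numpy.load, audio_path comes back as a 0-d string array; str(data["audio_path"]) recovers the path.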
Array naming
By convention, each item should be stored under a string key.
The defaults built into vak are ‘s’, ‘f’, ‘t’, and ‘audio_path’.
These defaults can be changed when preparing a dataset
by changing the corresponding options in the [SPECT_PARAMS] section
of a .toml configuration file.
If you are using Matlab to generate the spectrogram files,
then you will need to either save your workspace variables with the default names,
or tell vak what names you used by changing the [SPECT_PARAMS] options.
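To make the role of the key names concrete, here is a minimal sketch of a loader whose key names can be overridden, much as the [SPECT_PARAMS] options let vak look up non-default names. The function load_spect_file and its keyword arguments are hypothetical and are not vak's API; the sketch also checks that the spectrogram shape matches the lengths of the frequency and time vectors.

```python
import numpy as np


def load_spect_file(path, spect_key="s", freqbins_key="f", timebins_key="t"):
    """Load an .npz spectrogram file, looking up arrays by the given key names.

    Hypothetical helper for illustration, not part of vak.
    """
    with np.load(path) as data:
        missing = [k for k in (spect_key, freqbins_key, timebins_key) if k not in data.files]
        if missing:
            raise KeyError(f"{path} is missing arrays: {missing}")
        # Indexing an NpzFile reads the array fully into memory
        s, f, t = data[spect_key], data[freqbins_key], data[timebins_key]

    # The m x n spectrogram must match m frequency bins and n time bins
    if s.shape != (f.size, t.size):
        raise ValueError(f"expected spectrogram shape {(f.size, t.size)}, got {s.shape}")
    return s, f, t
```

A file saved with custom names such as spect, freqs, and times then loads with load_spect_file(path, spect_key="spect", freqbins_key="freqs", timebins_key="times"), while the default key names raise a KeyError.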
As noted above, the audio_path is not required,
but it is added by vak.prep when generating a dataset of spectrogram files from audio.
Spectrogram file naming
There are two valid ways to name spectrogram files.
The first is to give each spectrogram file the same name as the audio file it was created from,
with the spectrogram file extension appended.
E.g., if your audio file is bird1.wav, then the spectrogram file should be bird1.wav.npz.
The second way is to name the spectrogram file
by replacing the audio file extension with the array file extension,
e.g., the spectrogram from bird1.wav would be saved in bird1.npz.
The second way may be more intuitive,
while the first allows other .npz files with the same stem to live in the same directory;
e.g., day1/bird1.wav.npz and day1/bird1.ftr.npz can be stored side by side.
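The two conventions reduce to a simple path manipulation. This sketch uses pathlib to derive both valid names from an audio file path. The helper spect_filenames is hypothetical; vak's own file-matching logic is described on the File naming conventions page.

```python
from pathlib import Path


def spect_filenames(audio_path, ext=".npz"):
    """Return both valid spectrogram file names for an audio file.

    Hypothetical helper for illustration. `ext` is the array file
    extension, ".npz" or ".mat".
    """
    audio_path = Path(audio_path)
    # First convention: append the extension, bird1.wav -> bird1.wav.npz
    appended = audio_path.with_name(audio_path.name + ext)
    # Second convention: replace the extension, bird1.wav -> bird1.npz
    replaced = audio_path.with_suffix(ext)
    return appended, replaced
```

For day1/bird1.wav this gives day1/bird1.wav.npz and day1/bird1.npz. Under the second convention, a features file derived from day1/bird1.ftr would also map to day1/bird1.npz, which is why the first convention is the one that lets both files coexist.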
For more detail, please see the page
File naming conventions.
Example array files that meet this spectrogram file format specification
Please click on this link to download a .tar.gz archive containing
spectrogram files generated by a run of vak prep on audio data:
https://osf.io/9cedz/download
You can inspect the contents of the .npz array files by loading them with numpy.load.
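For a quick look at what one of these files contains, the sketch below opens an .npz file with numpy.load and reports the name, shape, and dtype of every array in it. The helper describe_npz is hypothetical, and no particular file names from the archive are assumed.

```python
import numpy as np


def describe_npz(path):
    """Return {array name: (shape, dtype name)} for every array in an .npz file.

    Hypothetical helper for illustration.
    """
    summary = {}
    with np.load(path) as data:
        # data.files lists the array names stored in the archive
        for key in data.files:
            arr = data[key]
            summary[key] = (arr.shape, str(arr.dtype))
    return summary
```

Calling describe_npz on one of the extracted files and printing the result should show s as a 2-D array, f and t as 1-D arrays whose lengths match the two dimensions of s, and audio_path as a 0-d string array.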
These files are provided to demonstrate the specification described here.
You may find them helpful as examples if you prefer to generate your own spectrograms
and need to write a script that saves them in array files vak can work with,
as described in How do I use my own spectrograms?.