Usage#

usage: cl [-h]
          {boxplot,b,test_and_duration,td,testwer,tw,statmd,s,st,logplot,l,traintimes,t,times,correlations,c,convert,cv,significance-tests,sign}
          ...

Sub-commands:#

boxplot (b)#

Produce violin plots for different wer_test.txt files. Each file corresponds to three violin plots (insertions, deletions and substitutions). Usage: python -m cl b -p <path-to-wer-txt1> [<path-to-wer-txt2> …]

cl boxplot [-h] --wer-paths PATHS [PATHS ...] [--out-path OUT_PATH]
           [--silent-ignore]

Named Arguments#

--wer-paths, -p

List of paths to wer_test.txt files.

--out-path, -o

Path where the output plot will be saved. If not provided, the figure is shown (plt.show()) without being saved.

--silent-ignore, -s

If provided, paths to wer_test.txt files that do not exist are silently ignored.

Default: False
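
The effect of --silent-ignore can be pictured with a short Python sketch. It is an illustration only, not the tool's own code: the helper name existing_wer_paths is hypothetical, and raising FileNotFoundError when the flag is absent is an assumption about what happens in that case.

```python
from pathlib import Path


def existing_wer_paths(paths, silent_ignore=False):
    """Return the paths that exist (hypothetical helper, not part of cl)."""
    kept = []
    for p in map(Path, paths):
        if p.is_file():
            kept.append(p)
        elif not silent_ignore:
            # Assumption: without --silent-ignore a missing file is an error.
            raise FileNotFoundError(f"No such wer_test.txt file: {p}")
    return kept
```

With silent_ignore=True the missing entries are simply dropped, so the plot would be built from whatever files remain.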

test_and_duration (td)#

Print WER scores on the test set with respect to the durations of the utterances.

cl test_and_duration [-h] [--suffix SUFFIX] [--test-csv TEST_CSV] [--cer]
                     exp_dir

Positional Arguments#

exp_dir: Path to an experiment directory containing the wer_test*.txt and cer_test*.txt files.

Named Arguments#

--suffix, -s

Suffix for the wer_test*.txt and cer_test*.txt files. E.g. if suffix='_segmented' then the files wer_test_segmented.txt and cer_test_segmented.txt are looked for.

Default: “”

--test-csv, -t

Path to the test*.csv file used for testing. If not provided and --suffix is provided, the test file looked for is test*suffix*.csv.

--cer, -c

Whether to check the cer_test*.txt file instead of the wer_test*.txt one.

Default: False
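
As a rough illustration of how --suffix and --cer select a results file, the following Python sketch builds the expected file name. The function name result_file and the example directory are hypothetical, and the sketch assumes the file sits directly inside exp_dir.

```python
from pathlib import Path


def result_file(exp_dir, suffix="", cer=False):
    """Build the path of the results file selected by --suffix and --cer."""
    # --cer switches from wer_test*.txt to cer_test*.txt.
    prefix = "cer" if cer else "wer"
    # Assumption: the file is located directly inside the experiment directory.
    return Path(exp_dir) / f"{prefix}_test{suffix}.txt"


# Hypothetical experiment directory, used only for illustration.
print(result_file("exps/exp1/1001", suffix="_segmented"))
print(result_file("exps/exp1/1001", suffix="_segmented", cer=True))
```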

testwer (tw)#

Print WER scores on the test set, optionally producing a bar plot.

cl testwer [-h] [--vad] [--forced-segmented] [--wer-suffix WER_SUFFIX]
           [--out-path OUT_PATH] [--cer]
           [--model-name-mappings MODEL_NAME_MAPPINGS]
           [--max-repetition-length MAX_REPETITION_LENGTH]
           [--save-modified SAVE_MODIFIED]
           [--out-kaldi-path [OUT_KALDI_PATH ...]] [--is-cer]
           [--error-threshold ERROR_THRESHOLD] [--print-stats] [--compare]
           [--out-log-path OUT_LOG_PATH] [--find-anomalies]
           [--remove-repetitions] [--to-kaldi-text] [--noreps-to-wer-txt]
           exps [exps ...] [textfile ...] [sb_test_paths ...] [paths ...]

Positional Arguments#

exps: E.g. /path/to/exps/ or /path/to/exps/exp1/1001/ /path/to/exps/exp2/2002 …
textfile: Should probably end with .kaldi
sb_test_paths: Path or paths to wer_test*.txt files.
paths: Path or paths to wer_test*.txt files.

Named Arguments#

--vad, -v

If true then we will read wer files named wer_test_vadded.txt

Default: False

--forced-segmented, -f

If true then we will read wer files named wer_test_forced_segmented.txt

Default: False

--wer-suffix, -s

How should the wer txt file be named? Overrides the ‘-v’ and ‘-f’ options.

--out-path, -o

Location of the output bar plot.

--cer

Use CER instead of WER.

Default: False

--model-name-mappings, -m

Path to a .json file containing a dictionary with the keys curriculum_mappings, transfer_mappings and subset_mappings. This remains to be documented.

--max-repetition-length

Default: 5

--save-modified, -sw

--out-kaldi-path, -okp

Path where the output file will be saved in kaldi’s text format (utt_id w1 w2…).

Default: []

--is-cer

Default: False

--error-threshold, -t

WER/CER (or other metric) threshold below which utterances are considered okay.

Default: 50.0

--print-stats, -p

Whether to print the stats or not.

Default: False

--compare, -c

If provided, you must supply pairs of paths to wer_test*.txt and cer_test*.txt files, which will be compared.

Default: False

--out-log-path, -ol

Not implemented.

--find-anomalies, -fa

If provided, we try to find anomalies in the provided wer_test*.txt or cer_test*.txt files, and compare them if the --compare option is also provided.

Default: False

--remove-repetitions, -rr

Remove repetitions from .kaldi files.

Default: False

--to-kaldi-text, -kt

If provided then the input {wer,cer}_test*.txt files will be converted to kaldi format (utt_id word1 word2 …).

Default: False

--noreps-to-wer-txt, -nr

If provided then a {wer,cer}_test_noreps.txt file will be created based on the input _noreps.kaldi files. It will contain the typical wer_test.txt statistics produced by SpeechBrain.

Default: False
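
The interplay of --vad, --forced-segmented and --wer-suffix described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the tool's implementation: the function name wer_file_name is hypothetical, the suffix is assumed to be appended as wer_test{suffix}.txt, and letting -v win over -f when both are given is a guess.

```python
def wer_file_name(vad=False, forced_segmented=False, wer_suffix=None):
    """Pick the WER file name implied by -v, -f and -s (hypothetical helper)."""
    if wer_suffix is not None:
        # --wer-suffix overrides the -v and -f options.
        suffix = wer_suffix
    elif vad:
        # Assumption: -v takes precedence if both -v and -f are given.
        suffix = "_vadded"
    elif forced_segmented:
        suffix = "_forced_segmented"
    else:
        suffix = ""
    return f"wer_test{suffix}.txt"
```

For example, wer_file_name(vad=True, wer_suffix="_noreps") ignores the -v flag and selects the file named by the explicit suffix.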

statmd (s)#

Prints a list of train/dev losses and metric values for the files provided. At the end of the list, it also prints the performance on the test set (if the test set has been decoded).

cl statmd [-h] [--metrics [{PER,WER,CER,p,w,c} ...]] [--no-show-losses]
          [-o OUTPUT_PATH] [--per] [--wer] [--cer]
          PATHS [PATHS ...]

Positional Arguments#

PATHS: Path(s) to log.txt files.

Named Arguments#

--metrics, -m

Possible choices: PER, WER, CER, p, w, c

Expected metric based on which we will extract the statistics. PER (p), WER (w), CER (c) are the only ones currently allowed.

Default: []

--no-show-losses, -nl

If provided then we will not print the train/valid losses.

Default: False

-o, --output-path

If a valid path is provided then the output will be saved to the corresponding file. Otherwise, the output will be simply printed in the console.

--per, -p

Use PER as the expected metric.

Default: False

--wer, -w

Use WER as the expected metric.

Default: False

--cer, -c

Use CER as the expected metric.

Default: False
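
Since metrics can be requested both through --metrics (with the short forms p, w, c) and through the --per, --wer and --cer flags, one plausible way to combine them is sketched below in Python. How the tool actually merges these options is not documented here, so the merging rule and the name resolve_metrics are assumptions.

```python
# Short forms accepted by --metrics, mapped to their full names.
ALIASES = {"p": "PER", "w": "WER", "c": "CER"}


def resolve_metrics(metrics=(), per=False, wer=False, cer=False):
    """Merge --metrics values with the --per/--wer/--cer flags (assumed rule)."""
    selected = [ALIASES.get(m, m) for m in metrics]
    for name, flag in (("PER", per), ("WER", wer), ("CER", cer)):
        if flag:
            selected.append(name)
    # Drop duplicates while keeping the order of first appearance.
    return list(dict.fromkeys(selected))
```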

st (significance-tests, sign)#

Undocumented

cl st [-h] [--out-dir OUT_DIR] ref hyps [hyps ...]

Positional Arguments#

ref: Path to the reference .trn file.
hyps: A sequence of paths to .trn files.

Named Arguments#

--out-dir, -o

Directory where the output files will be written (.unigram, .wilc, .mcn, .sign, .mapsswe).

Default: “”

logplot (l)#

Produce plots from the log.txt files written during training. The plots show how the train/valid losses and train/valid WER/CERs evolve over the training epochs.

cl logplot [-h] [--print-seed PRINT_SEED] [--plot-valid-results] [--barplot]
           [--model-name-mappings MODEL_NAME_MAPPINGS]
           [--metrics [{PER,WER,CER,p,w,c} ...]] [--no-show-losses]
           [-o OUTPUT_PATH] [--per] [--wer] [--cer]
           PATHS [PATHS ...]

Positional Arguments#

PATHS: Path(s) to log.txt files.

Named Arguments#

--print-seed, -ps

Whether to also include the seed information of the experiment’s directories.

Default: False

--plot-valid-results, -v

If provided then the validation set’s performance progress will be plotted for each model.

Default: False

--barplot, -b

If provided then instead of line plots we are going to plot grouped barplots (per epoch).

Default: False

--model-name-mappings, -nm

Path to a .json file containing a dictionary with the keys curriculum_mappings, transfer_mappings and subset_mappings. This remains to be documented.

--metrics, -m

Possible choices: PER, WER, CER, p, w, c

Expected metric based on which we will extract the statistics. PER (p), WER (w), CER (c) are the only ones currently allowed.

Default: []

--no-show-losses, -nl

If provided then we will not print the train/valid losses.

Default: False

-o, --output-path

If a valid path is provided then the output will be saved to the corresponding file. Otherwise, the output will be simply printed in the console.

--per, -p

Use PER as the expected metric.

Default: False

--wer, -w

Use WER as the expected metric.

Default: False

--cer, -c

Use CER as the expected metric.

Default: False

traintimes (t, times)#

Simple line plots showing the training time that each model requires for each epoch.

cl traintimes [-h] [--visualize] [--out-plot-path OUT_PLOT_PATH] [--silent]
              [--show-hours-per-model] [--train_csv_name TRAIN_CSV_NAME]
              [input ...]

Positional Arguments#

input: E.g. ‘./path/to/recipes///log.txt’ or ‘./path/to/log.txt’

Named Arguments#

--visualize, -v

If provided, we will also plot the train times per epoch for each model.

Default: False

--out-plot-path, -o

If provided, the output plot (assuming -v is also provided) will be saved there.

--silent, -s

If provided, the program won’t throw NoEpochsTrained errors.

Default: False

--show-hours-per-model, --hpm

If provided then we are going to show the hours that each model has seen during its training. You also need to provide the train_csv_name argument in case it’s not the default value.

Default: False

--train_csv_name, --csv

Filename of the .csv file used for training the SpeechBrain model.

Default: “train-complete_segmented.csv”

correlations (c)#

Calculate correlations between model pairs. The output will be a horizontal barplot where each bar corresponds to the correlation of a pair. You should provide at least 1 pair of models containing a wer_test{suffix}.txt file each.

cl correlations [-h] [--wer-suffix WER_SUFFIX] [--out-path OUT_PATH]
                [--model-name-mappings MODEL_NAME_MAPPINGS]
                exps [exps ...]

Positional Arguments#

exps: E.g. /path/to/exps/ or /path/to/exps/exp1/1001/ /path/to/exps/exp2/2002 …

Named Arguments#

--wer-suffix, -s: How should the wer txt file be named? Overrides the ‘-v’ and ‘-f’ options.
--out-path, -o: Location of the output bar plot.
--model-name-mappings, -m: Path to a .py file containing a dictionary with the keys curriculum_mappings, transfer_mappings and subset_mappings. This remains to be documented.
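
To make the idea of a per-pair correlation concrete, here is a minimal Python sketch. It assumes the correlation is taken between the per-utterance WER values of two models and that Pearson's coefficient is used; the documentation above specifies neither, and the function name pearson is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

Each bar of the plot would then correspond to one such value, computed over the utterances shared by the two models of a pair.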

convert (cv)#

Convert log files to the specified formats. E.g. the -trn option converts wer_test*.txt files to .trn files, i.e. lines of the form: w1 w2 w3 … w_n (utterance_id)

cl convert [-h] [--to-trn-format] [--out-dir OUT_DIR] paths [paths ...]

Positional Arguments#

paths: A sequence of paths to wer_test*.txt files containing speechbrain’s outputs.

Named Arguments#

--to-trn-format, -trn

If provided, will convert the wer_test*.txt to the trn format.

Default: False

--out-dir, -o

Directory where the references.trn and hypotheses.trn files will be saved. If not specified, then the .trn files will be saved in the same directory as the original files.
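
The trn line layout mentioned above (words followed by the utterance id in parentheses) can be illustrated with a small Python sketch. The helper names are hypothetical, and the input is taken to be a kaldi-style text line (utt_id w1 w2 …) rather than a real wer_test*.txt file, whose parsing is not shown here.

```python
def to_trn_line(utt_id, words):
    """Format one utterance as a trn line: w1 w2 ... w_n (utterance_id)."""
    return f"{' '.join(words)} ({utt_id})"


def kaldi_to_trn(line):
    """Convert a kaldi text line 'utt_id w1 w2 ...' to a trn line."""
    # The first token is the utterance id, the rest are the words.
    # An empty line raises ValueError because there is no id to unpack.
    utt_id, *words = line.split()
    return to_trn_line(utt_id, words)
```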