Long-Form-RNN-T-Datasets

This repo lists the publicly available datasets used in our paper "VADOI: VOICE-ACTIVITY-DETECTION OVERLAPPING INFERENCE FOR END-TO-END LONG-FORM SPEECH RECOGNITION".

LibriSpeech
V. Panayotov, G. Chen, D. Povey, and S. Khudanpur, “Librispeech: an ASR corpus based on public domain audio books,” in ICASSP, 2015, pp. 5206–5210.

Switchboard
J. J. Godfrey, E. C. Holliman, and J. McDaniel, “Switchboard: Telephone speech corpus for research and development,” in ICASSP. IEEE Computer Society, 1992, vol. 1, pp. 517–520.

Fisher
C. Cieri, D. Miller, and K. Walker, “The Fisher corpus: A resource for the next generations of speech-to-text,” in LREC, 2004, vol. 4, pp. 69–71.

Amazon Prime Video https://www.amazon.com/Amazon-Video/b?ie=UTF8&node=7613704011

TED Talks https://www.ted.com/talks

Sinclair Broadcast https://sbgi.net/

Farm Journal https://www.farmjournal.com/