arXiv: GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement

Name
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
Homepage
https://yiyibooks.cn/arxiv/2406.11546v1/index.html
Original URL
https://arxiv.org/abs/2406.11546
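Description
The rapid growth of dataset sizes has spurred the development of speech technology. Traditional speech models typically rely on large amounts of labeled training data, which is scarce for low-resource languages. This paper introduces GigaSpeech 2, a large-scale, multi-domain, multilingual speech recognition corpus ... ...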