Basic information - VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech

Name: VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech

Homepage: https://yiyibooks.cn/arxiv/2401.14321v4/index.html

Original URL: https://arxiv.org/pdf/2401.14321.pdf

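Description

Recent TTS models with a decoder-only Transformer architecture, such as SPEAR-TTS and VALL-E, achieve impressive naturalness and demonstrate zero-shot adaptation given a speech prompt. However, such decoder-only TTS models lack monotonic alignment constraints, which sometimes leads to hallucination issues such as mispronunciation, word skipping, and repetition. To address this limitation, we propose VALL-T, a generative Transducer model that introduces shifting relative position embeddings for the input phoneme sequence, explicitly indicating the monotonic generation process while maintaining the decoder-only Transformer architecture ...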
