Basic Information - The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation

Name: The Best of Both Worlds: Integrating Language Models and Diffusion Models for Video Generation

Home page: https://yiyibooks.cn/arxiv/2503.04606v2/index.html

Original URL: https://arxiv.org/abs/2503.04606

Description

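Recent advances in text-to-video (T2V) generation have been driven by two competing paradigms: autoregressive language models and diffusion models. However, each paradigm has intrinsic limitations: language models struggle with visual quality and error accumulation, while diffusion models lack semantic understanding and causal modeling. In this work, we propose LanDiff, a hybrid framework that synergizes the two paradigms through coarse-to-fine generation ...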
