Text-to-image customization takes a text prompt and images depicting a given subject as inputs, and aims to synthesize new images that align with both the text semantics and the subject's appearance. The task gives precise control over details that text alone cannot capture, underpins many real-world applications, and has drawn significant interest from academia and industry. Existing works follow the pseudo-word paradigm: they represent the given subject as a pseudo-word and combine it with the given text so that both jointly guide generation.

2025/11/06 arXiv:2408.09744v3 dirkashin
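The pseudo-word paradigm described in the first abstract can be illustrated with a toy sketch, assuming a textual-inversion-style setup: the subject is bound to a placeholder token (here the hypothetical `<S*>`) whose embedding is learned from the subject images, while ordinary words keep their frozen embeddings. The vocabulary, the 4-dimensional vectors, and `embed_prompt` are hypothetical stand-ins, not any paper's or library's actual API.

```python
from typing import Dict, List

# Toy sketch of the pseudo-word paradigm (textual-inversion style).
# The vocabulary, 4-dim vectors, and "<S*>" token are all hypothetical.
VOCAB: Dict[str, List[float]] = {
    "a": [0.1, 0.0, 0.0, 0.0],
    "photo": [0.0, 0.2, 0.0, 0.0],
    "of": [0.0, 0.0, 0.1, 0.0],
    "on": [0.0, 0.0, 0.0, 0.1],
    "the": [0.1, 0.1, 0.0, 0.0],
    "beach": [0.0, 0.3, 0.3, 0.0],
}


def embed_prompt(prompt: str, pseudo_words: Dict[str, List[float]]) -> List[List[float]]:
    """Map each token to an embedding; pseudo-words use their learned vectors."""
    sequence: List[List[float]] = []
    for token in prompt.split():
        if token in pseudo_words:
            sequence.append(pseudo_words[token])  # learned subject embedding
        elif token in VOCAB:
            sequence.append(VOCAB[token])  # ordinary frozen word embedding
        else:
            raise KeyError(f"unknown token: {token}")
    return sequence


# Stand-in for an embedding optimized on the subject's reference images.
learned = {"<S*>": [0.9, 0.8, 0.7, 0.6]}
seq = embed_prompt("a photo of <S*> on the beach", learned)
# In a real pipeline, seq would be fed through the text encoder,
# whose output then conditions the diffusion model.
```

The sketch only shows how the pseudo-word is spliced into the prompt alongside ordinary words; how its embedding is obtained (per-subject optimization or an image encoder) varies between methods.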

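Generating multiple distinct subjects remains a challenge for existing text-to-image diffusion models. Complex prompts often cause subject leakage, leading to inaccuracies in quantity, attributes, and visual features. Preventing leakage between subjects requires knowing the spatial location of each subject ...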

2025/11/05 arXiv:2505.21488v1 dirkashin
