Text-to-image customization, which takes given texts and images depicting given subjects as inputs, aims to synthesize new images that align with both text semantics and subject appearance. This task provides precise control over details that text alone cannot capture and is fundamental for various real-world applications, garnering significant interest from academia and industry. Existing works follow the pseudo-word paradigm, which involves representing given subjects as pseudo-words and combining them with given texts to collectively guide the generation.
生成多个不同的主题仍然是现有文本到图像扩散模型的挑战。复杂的提示往往会导致主题泄漏,导致数量、属性和视觉特征的不准确。防止受试者之间的泄漏需要了解每个受试者的空间位置 ...