[Reels] Imagine Yourself



BayesianBacteria 2024. 10. 15. 21:49

Link: Personalized text-to-image generation by Meta AI

the authors design a kind of improved IP-Adapter for “any-personality” generation.
- therefore, the model does not need to be trained for each new subject, unlike LoRA or DreamBooth.
- meanwhile, other “any-personality” generation models can overfit strongly to the reference image and show a copy-paste effect. (this is addressed by the synthetic paired dataset described below)

the limitation of existing personalization methods is the “copy-paste” effect: the generated image looks nearly identical to the given reference image.
- this means that the generated image does not follow the given prompt.
to resolve this issue, the authors propose a synthetic-data pipeline that collects several real and synthetic images for each identity.
- sadly, the details of the pipeline are not included (for example, how the synthetic “personalized images” are generated)

the text conditioning uses three text encoders, each with a different role:
- CLIP: a common embedding space between image and text.
- ByT5: a byte-level (character-level) T5 architecture for encoding characters. it might improve rendering of text inside images, for instance a sign that reads “moreh is cool”.
- UL2: an “improved T5” for comprehending long and intricate text prompts.
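
a minimal sketch of how outputs from several text encoders like the ones above could be merged into one conditioning sequence: project each encoder's tokens to a shared width, then concatenate along the token axis. this is a common pattern and an assumption here, not the paper's confirmed method. the function names, toy sizes, and weight values are all hypothetical.

```python
def project(tokens, weight):
    """Map each token vector (length d_in) to length d_out using a d_in x d_out matrix."""
    d_out = len(weight[0])
    return [
        [sum(x * weight[i][j] for i, x in enumerate(tok)) for j in range(d_out)]
        for tok in tokens
    ]


def combine_text_embeddings(encoder_outputs, projections):
    """Project every encoder's tokens to a shared width, then concatenate along the token axis."""
    combined = []
    for name, tokens in encoder_outputs.items():
        combined.extend(project(tokens, projections[name]))
    return combined


# Toy stand-ins for real encoder outputs (hypothetical sizes and values).
encoder_outputs = {
    "clip": [[1.0, 0.0]],                         # 1 token, width 2
    "byt5": [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]],   # 2 tokens, width 3
    "ul2": [[2.0]],                               # 1 token, width 1
}
# One projection per encoder; a real model would learn these (assumption).
projections = {
    "clip": [[1.0, 0.0], [0.0, 1.0]],              # 2 -> 2
    "byt5": [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # 3 -> 2
    "ul2": [[1.0, -1.0]],                          # 1 -> 2
}

context = combine_text_embeddings(encoder_outputs, projections)
print(len(context), "tokens of width", len(context[0]))
```

the resulting sequence is what the cross-attention layers would attend to, so each encoder only has to agree on the final width, not on the number of tokens.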

the approach is only applicable to models with cross-attention text conditioning.
- at least, judging from the architecture proposed in the paper
- applying it to an SD3-like architecture (without cross-attention) would require adjustments
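
to make the dependence on cross-attention concrete, here is a minimal sketch of IP-Adapter-style decoupled cross-attention: the image latents attend to the text tokens and to the identity (reference-image) tokens in two separate attention passes, and the results are summed. this illustrates the general IP-Adapter idea, not the exact architecture of the paper. learned query/key/value projections are left out for brevity, and `id_scale` and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention on plain lists of vectors."""
    scale = 1.0 / math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)])
    return out


def decoupled_cross_attention(latents, text_tokens, id_tokens, id_scale=1.0):
    """Sum of a text cross-attention branch and a separate identity branch.

    Simplification (assumption): tokens are used directly as keys and values;
    a real adapter would apply separate learned projections per branch.
    """
    text_out = attend(latents, text_tokens, text_tokens)
    id_out = attend(latents, id_tokens, id_tokens)
    return [
        [t + id_scale * i for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(text_out, id_out)
    ]


latents = [[1.0, 0.0], [0.0, 1.0]]   # two image-latent queries
text_tokens = [[1.0, 2.0]]           # a single text token
id_tokens = [[3.0, 4.0]]             # a single identity token

mixed = decoupled_cross_attention(latents, text_tokens, id_tokens, id_scale=0.5)
text_only = decoupled_cross_attention(latents, text_tokens, id_tokens, id_scale=0.0)
print(mixed)
print(text_only)
```

the identity branch only exists because there is a cross-attention layer to attach it to. a model without cross-attention has no such slot, which is why an SD3-like backbone would need a different injection point.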
