DreamBooth


DreamBooth is a deep learning method for fine-tuning existing text-to-image models, developed by researchers from Google Research and Boston University in 2022. It was originally developed using Google's own Imagen text-to-image model, but implementations can be applied to other text-to-image models. After training on three to five images of a subject, the fine-tuned model can generate personalised images of that subject.[1][2][3]

Demonstration of the use of DreamBooth to fine-tune the Stable Diffusion v1.5 diffusion model. Depicted here are algorithmically generated images of Jimmy Wales, founder of Wikipedia, performing bench press exercises at a fitness gym.

Technology

Pretrained text-to-image diffusion models can produce a diverse range of image types, but they lack the specificity required to generate images of lesser-known subjects and are limited in their ability to render known subjects in different situations and contexts.[1] DreamBooth fine-tunes such a model on a small set of images depicting a specific subject; three to five images are generally sufficient. Each image is paired with a text prompt that contains the name of the class the subject belongs to plus a unique identifier (for example, a photograph of a [Nissan R34 GTR] car, with car being the class). A class-specific prior preservation loss is applied to encourage the model to generate diverse instances of the subject, drawing on what the model has already learned about the class.[1] Pairs of low-resolution and high-resolution images taken from the set of input images are used to fine-tune the super-resolution components, allowing the minute details of the subject to be maintained.[1]
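
The prompt pairing and the prior preservation loss described above can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the function names (build_prompts, mse, dreambooth_loss), the prior_weight parameter, and the use of plain mean squared error over short lists of numbers are all assumptions made for the example. A real implementation would compute both terms on a diffusion model's denoising predictions, with the class term evaluated on images generated by the original model from the class-only prompt.

```python
from typing import Dict, Sequence


def build_prompts(identifier: str, class_noun: str) -> Dict[str, str]:
    """Build the two prompts used during fine-tuning (hypothetical helper).

    The instance prompt pairs the unique identifier with the class noun;
    the class prompt names only the class and is used for the prior term.
    """
    return {
        "instance": f"a photograph of a {identifier} {class_noun}",
        "class": f"a photograph of a {class_noun}",
    }


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(
    instance_pred: Sequence[float],
    instance_target: Sequence[float],
    class_pred: Sequence[float],
    class_target: Sequence[float],
    prior_weight: float = 1.0,  # assumed name for the weighting factor
) -> float:
    """Subject reconstruction term plus a weighted class-prior term."""
    subject_term = mse(instance_pred, instance_target)
    prior_term = mse(class_pred, class_target)
    return subject_term + prior_weight * prior_term


if __name__ == "__main__":
    prompts = build_prompts("[Nissan R34 GTR]", "car")
    print(prompts["instance"])
    print(prompts["class"])
    # Toy numbers standing in for model predictions and targets.
    print(dreambooth_loss([1.0, 2.0], [0.0, 2.0], [0.0, 0.0], [1.0, 1.0], 0.5))
```

Setting prior_weight to zero in this sketch reduces the objective to plain fine-tuning on the subject images, which is the situation the prior term is meant to guard against.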

Usage

DreamBooth can be used to fine-tune models such as Stable Diffusion, where it may alleviate a common shortcoming: Stable Diffusion's inability to adequately generate images of specific individual people.[4] Such use requires a large amount of VRAM, however, and was thus reported to be cost-prohibitive for hobbyist users.[4] Concerns have been raised regarding the ethics of using DreamBooth to train model checkpoints aimed at imitating the art styles of specific human artists. One such critic is Hollie Mengert, an illustrator for Disney and Penguin Random House, whose art style was trained into a checkpoint model via DreamBooth and shared online without her consent.[5]

References

  1. Ruiz, Nataniel; Li, Yuanzhen; Jampani, Varun; Pritch, Yael; Rubinstein, Michael; Aberman, Kfir (2022-08-25). "DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation". arXiv (Google Research, Boston University). doi:10.48550/arXiv.2208.12242. https://arxiv.org/abs/2208.12242. 
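  2. Yuki Yamashita (September 1, 2022). "愛犬の合成画像を生成できるAI 文章で指示するだけでコスプレ 米Googleが開発" [AI that can generate composite images of your pet dog: cosplay just by giving text instructions, developed by Google] (in Japanese). https://www.itmedia.co.jp/news/articles/2209/01/news041.html. "Developed by a research team from Google Research and Boston University in the US... it is a subject-driven text-to-image model that uses a few images of a subject and a text input to create new composite images into which the given subject is blended." 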
  3. Brendan Murphy (October 13, 2022). "AI image generation is advancing at astronomical speeds. Can we still tell if a picture is fake?". https://theconversation.com/ai-image-generation-is-advancing-at-astronomical-speeds-can-we-still-tell-if-a-picture-is-fake-191674. "Recently, Google has released Dream Booth, an alternative, more sophisticated method for injecting specific people, objects or even art styles into text-to-image AI systems." 
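  4. Ryo Shimizu (October 26, 2022). "まさに「世界変革」──この2カ月で画像生成AIに何が起きたのか?" [Truly a "world transformation": what happened to image-generation AI in the past two months?] (in Japanese). https://news.yahoo.co.jp/articles/9b10970e584f1a43e8cbb8e1b9d7b9d21bc88941. "Stable Diffusion is generally poor at producing photos of individuals or specific people, but a technique called 'Dreambooth', which learns from just a few photos of one's own pet or friends, was developed, and this too attracted attention. However, Dreambooth requires an enormous amount of GPU memory, and the drawback was said to be that it is practically impossible to run on GPUs that individual users can buy as a hobby." 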
  5. Andy Baio (November 1, 2022). "Invasive Diffusion: How one unwilling illustrator found herself turned into an AI model". https://waxy.org/2022/11/invasive-diffusion-how-one-unwilling-illustrator-found-herself-turned-into-an-ai-model/. 
