DreamBooth

Short description: Deep learning generation model

DreamBooth is a deep learning generation model used to personalize existing text-to-image models by fine-tuning. It was developed by researchers from Google Research and Boston University in 2022. Although it was originally developed using Google's own Imagen text-to-image model, DreamBooth implementations can be applied to other text-to-image models, allowing them to generate personalized outputs of a subject after training on three to five images of that subject.[1][2][3]

Technology

Pretrained text-to-image diffusion models can often produce a diverse range of image types, but they lack the specificity required to generate images of lesser-known subjects and are limited in their ability to render known subjects in different situations and contexts.[1] DreamBooth fine-tunes the full UNet component of the diffusion model on a few images (usually three to five) depicting a specific subject. Each image is paired with a text prompt that contains the name of the class the subject belongs to, plus a unique identifier (for example, "a photograph of a [Nissan R34 GTR] car", with "car" being the class). A class-specific prior preservation loss is applied to encourage the model to generate diverse instances of the subject based on what the model has already learned about the original class.[1] Pairs of low-resolution and high-resolution images taken from the set of input images are used to fine-tune the super-resolution components, allowing the minute details of the subject to be maintained.[1]
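
The prompt pairing and the prior preservation term can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt template, the "[V]" placeholder identifier, and the names build_prompts, mse, and dreambooth_loss are assumptions made for illustration, and a mean squared error over plain lists of numbers stands in for the denoising loss that a real implementation would compute on tensors produced by the UNet. The prior_weight parameter plays the role of the weighting factor applied to the prior preservation term.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PromptPair:
    instance_prompt: str  # identifier + class: describes the user's subject
    class_prompt: str     # class only: used for prior preservation samples


def build_prompts(identifier: str, class_noun: str) -> PromptPair:
    """Build both prompts. The template wording is an assumption."""
    return PromptPair(
        instance_prompt=f"a photograph of a {identifier} {class_noun}",
        class_prompt=f"a photograph of a {class_noun}",
    )


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error, a stand-in for the diffusion denoising loss."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def dreambooth_loss(
    instance_pred: Sequence[float],
    instance_target: Sequence[float],
    prior_pred: Sequence[float],
    prior_target: Sequence[float],
    prior_weight: float = 1.0,
) -> float:
    """Subject reconstruction term plus weighted prior preservation term.

    The instance term is computed on the few subject images (instance prompt).
    The prior term is computed on images the original model generated for the
    class prompt, which discourages the model from forgetting the class.
    """
    instance_term = mse(instance_pred, instance_target)
    prior_term = mse(prior_pred, prior_target)
    return instance_term + prior_weight * prior_term


if __name__ == "__main__":
    prompts = build_prompts("[V]", "car")  # "[V]" is a placeholder identifier
    print(prompts.instance_prompt)
    print(prompts.class_prompt)
    # Toy numbers only; they do not come from any real model.
    print(dreambooth_loss([1.0, 2.0], [1.0, 0.0], [0.0, 0.0], [1.0, 1.0], 0.5))
```

Setting prior_weight to zero in this sketch reduces the objective to plain fine-tuning on the subject images, which is the situation the prior preservation term is meant to guard against.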

Usage

DreamBooth can be used to fine-tune models such as Stable Diffusion, where it may alleviate a common shortcoming: Stable Diffusion's inability to adequately generate images of specific individual people.[4] Such a use case is quite VRAM intensive, however, and thus cost-prohibitive for hobbyist users.[4] The Stable Diffusion adaptation of DreamBooth in particular is released as a free and open-source project based on the technology outlined in the original paper published by Ruiz et al. in 2022.[5] Concerns have been raised regarding the ability of bad actors to utilise DreamBooth to generate misleading images for malicious purposes, and about the fact that its open-source nature allows anyone to utilise or even make improvements to the technology.[6] In addition, artists have expressed apprehension regarding the ethics of using DreamBooth to train model checkpoints that are specifically aimed at imitating art styles associated with human artists; one such critic is Hollie Mengert, an illustrator for Disney and Penguin Random House whose art style was trained into a checkpoint model via DreamBooth and shared online without her consent.[7][8]

References

  1. Ruiz, Nataniel; Li, Yuanzhen; Jampani, Varun; Pritch, Yael; Rubinstein, Michael; Aberman, Kfir (2022-08-25). "DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation". arXiv:2208.12242 [cs.CV].
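  2. Yuki Yamashita (September 1, 2022). "愛犬の合成画像を生成できるAI 文章で指示するだけでコスプレ 米Googleが開発" [AI that can generate composite images of your pet dog: cosplay just by giving text instructions, developed by Google] (in ja). https://www.itmedia.co.jp/news/articles/2209/01/news041.html. "Developed by a research team from Google Research and Boston University in the US... it is a subject-driven text-to-image model that uses a few images of a subject and text input to create new composite images into which the given subject blends."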
  3. Brendan Murphy (October 13, 2022). "AI image generation is advancing at astronomical speeds. Can we still tell if a picture is fake?". https://theconversation.com/ai-image-generation-is-advancing-at-astronomical-speeds-can-we-still-tell-if-a-picture-is-fake-191674. "Recently, Google has released Dream Booth, an alternative, more sophisticated method for injecting specific people, objects or even art styles into text-to-image AI systems." 
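  4. Ryo Shimizu (October 26, 2022). "まさに「世界変革」──この2カ月で画像生成AIに何が起きたのか?" [Truly a "world transformation": what has happened to image-generating AI in the past two months?] (in ja). https://news.yahoo.co.jp/articles/9b10970e584f1a43e8cbb8e1b9d7b9d21bc88941. "Stable Diffusion is generally poor at producing photos of individuals or specific people, but a technique called 'Dreambooth', which learns from just a few photos of one's own pet or friends, was developed and also drew attention. However, Dreambooth requires a huge amount of GPU memory, and the bottleneck was considered to be that it is effectively impossible to run on GPUs that individual users can buy for hobby use."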
  5. Benj Edwards (December 9, 2022). "AI image generation tech can now create life-wrecking deepfakes with ease". https://arstechnica.com/information-technology/2022/12/thanks-to-ai-its-probably-time-to-take-your-photos-off-the-internet/. "But not long after its announcement, someone adapted the Dreambooth technique to work with Stable Diffusion and released the code freely as an open source project." 
  6. Kevin Jiang (December 1, 2022). "These AI images look just like me. What does that mean for the future of deepfakes?". https://www.thestar.com/business/technology/2022/12/01/these-ai-images-look-just-like-me-what-does-that-mean-for-the-future-of-deepfakes.html. "For example, DreamBooth could be used to copy signatures or official signage to fake documents, create misleading photos or videos of politicians, manufacture revenge porn of individuals and more... A specific issue with DreamBooth and Stable Diffusion is that they’re open source, Gupta continued. Unlike centralized AI-generation models that can impose regulations and barriers to image creation, the decentralized models like DreamBooth mean anyone can access and improve on the technology." 
  7. Isabel Berwick; Sophia Smith (December 14, 2022). "Will AI replace human workers?". https://www.ft.com/content/24f07261-f95d-4bb3-8aa4-3799f1f75e52. "Illustrator Hollie Mengert, whose artwork was used to train an AI model without her consent, spoke publicly against the practice of training AI models on artists’ work without permission." 
  8. "Генеративные нейросети и этика: появилась модель, копирующая стиль конкретного художника" [Generative neural networks and ethics: a model has appeared that copies the style of a specific artist] (in ru). November 9, 2022. https://dtf.ru/life/1436360-generativnye-neyroseti-i-etika-poyavilas-model-kopiruyushchaya-stil-konkretnogo-hudozhnika. "Thus, quite recently the well-known artist and illustrator Hollie Mengert became a kind of dataset for a new neural network (without giving her consent)... 'First of all, it seemed tactless to me that my name appeared in this tool. I knew nothing about it and was not asked about it. And if I had been asked whether this could be done, I would not have agreed.'"
