Over the past few months, anyone following generative modeling will have noticed a steady stream of new developments and papers on text-to-image generation. Recent advances and successes in image generation have created a wave of interest among tech companies and machine learning practitioners. Contributing to this trend, the deepsense.ai team built a Christmas Card Creator by adapting and applying diffusion models.
The challenge
The goal of the project was to develop an application for internal use that would both add a touch of AI to the run-up to Christmas and serve as another exciting R&D experience.
The main challenges arose from the architecture, time constraints and UX design.
The main technical goal was to gain hands-on experience in adapting and applying diffusion models. The project had several key objectives:
Domain adaptation
Our projects are usually bespoke solutions for a particular domain. Here, the aim was to narrow the image generation space down to a single theme: Christmas. To do so, we had to pre-train the model and steer it toward producing only Christmas-card-like images.
Object embedding
We wanted to explore how new concepts and objects can be introduced into the “memory” of the network, experimenting not only with different image styles but also with content. We decided to make it possible to fine-tune the model on faces, so that everyone could generate a Christmas card with a personal touch. It turned out that just 20 images were enough for the model to learn a face and generate images with a strong resemblance to the originals.
Limited computational resources
Generative models typically need massive computational resources for both training and inference. We wanted to probe the lower end of those requirements and find out how to generate attractive images at minimal cost. As a result, we significantly reduced the memory requirements, from 16 GB to 10 GB of VRAM. The final implementation could train a personal model on a single GPU within 30 minutes and generate up to 10 images per minute.
Scaling
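The scaling setup can be pictured as a producer/consumer pattern: incoming generation requests go onto a shared queue, and whichever worker is free picks up the next one, which also spreads the load across the workers. The Python sketch that follows is a minimal single-process illustration under stated assumptions, not the production system: threads stand in for the distributed GPU workers, and `Task`, `generate_card` and `run_workers` are hypothetical names, with `generate_card` a placeholder for the actual diffusion-model call.

```python
import queue
import threading
from dataclasses import dataclass


@dataclass
class Task:
    """A hypothetical card-generation request."""
    task_id: int
    prompt: str


def generate_card(task: Task) -> str:
    # Hypothetical placeholder for the real diffusion-model call;
    # it only returns a fake file name for the "generated" image.
    return f"card_{task.task_id}.png"


def run_workers(tasks, num_workers=4):
    """Drain a shared task queue with a pool of worker threads."""
    task_queue = queue.Queue()
    results = {}
    results_lock = threading.Lock()

    def worker():
        while True:
            task = task_queue.get()  # blocks until a task (or sentinel) arrives
            try:
                if task is None:  # sentinel: this worker should stop
                    return
                output = generate_card(task)
                with results_lock:  # protect the shared results dict
                    results[task.task_id] = output
            finally:
                task_queue.task_done()  # always mark the item as processed

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Producer side: enqueue all requests, then one stop sentinel per worker.
    for task in tasks:
        task_queue.put(task)
    for _ in threads:
        task_queue.put(None)

    task_queue.join()  # wait until every queued item has been processed
    for t in threads:
        t.join()
    return results
```

Because idle workers simply take the next item from the same queue, no explicit scheduler is needed to keep them busy; a real deployment would more likely put a message broker between the web backend and separate GPU worker processes, but the flow of work is the same.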
Thanks to a queue mechanism, distributed workers and load balancing, the application could handle thousands of tasks concurrently. We stored thousands of generated images, and users could access them whenever they logged into the application. Access stayed smooth and fast thanks to client-side performance optimizations: deferred image loading, a limited number of rendered DOM nodes, and the removal of as many unnecessary re-renders as we could find.
Pleasant user experience
Finally, to encourage people to play with the card generator, we provided an attractive, easy-to-use interface: a React application that communicates with the backend through an API and serves as a nice-looking facade for complicated requests. The tool let users train personalized models and use them to generate Christmas cards, for example showing their own faces wearing glasses or Christmas hats in different scenes.
The effect
The application proved very popular with the deepsense.ai team: more than 100 custom models were trained and 9,500 cards were generated. In addition, some team members showed off their cards on social media, which generated about 30,000 organic views.