ChatGPT – what is the buzz all about?

March 10, 2023 / in Generative models / by Eryk Mazuś and Maciej Domagała

Over the last few months, ChatGPT has generated a great deal of excitement. Some have gone as far as to suggest it is a giant step in developing AI that will overtake humanity in many important areas, both in business and social life. Others view it more as a distraction on the path towards achieving human-level intelligence. How did ChatGPT generate such hype? In this article, we’ll try to explain.

How did we get here?

Recent advances in natural language processing can be viewed as a progression toward more flexible and general systems. We can see various ideas flowing through the field of NLP development. A few years ago, around 2013-14, the main approach to NLP tasks was to use word embeddings, which are vectors that represent the meaning of words. This was the standard approach in the vast majority of language-related tasks, such as text classification, in which embeddings were first obtained either through training or by downloading pre-trained vectors from public sources, and then fed into a task-specific architecture. This approach necessitated the creation of a task-specific, labeled dataset on the one hand, and a task-specific architecture of the model itself on the other. Not only did this require a significant amount of effort, but the performance of such an approach was limited by the representational capabilities of the input embeddings. Word embeddings were unable to capture the meaning of words based on context (words surrounding them) or the semantics of the entire text.

Figure 1: NLP Timeline

Since 2015, researchers have been experimenting with the idea of semi-supervised pre-training of LSTM [1] and Transformer-based language models on large corpora of text, followed by supervised fine-tuning for specific tasks on much smaller datasets. BERT [2] and GPT-1 [3] are two examples of such approaches. These methods eliminated the need for task-specific models, resulting in architectures that outperformed existing solutions to many difficult NLP tasks. Even though a task-specific dataset and fine-tuning were still required, this was a significant improvement.

The scarcity of large enough datasets for some tasks, the effort required to create them, and the lack of generalization of fine-tuned models outside the training distribution prompted the development of a new, human-like paradigm in which all that is required is a short natural language description of the task that the model is asked to perform, with an optional, tiny number of demonstrations added to the instruction. GPT-2 [4], GPT-3 [5], and other generative language models described in the following section represent this paradigm.

GPT: applications and architecture

GPT is an abbreviation of Generative Pre-trained Transformer. It is generative in the sense that it generates text given an input. It is pre-trained because it has already been trained on a large corpus of text. Finally, it is a neural network architecture based on the Transformer [6].

A GPT generates text in response to a text input, called a prompt. It is a simple but versatile framework, as many problems can be converted to text-to-text tasks. On the one hand, GPT can be asked to perform standard NLP tasks such as summarizing/classifying a text passage, answering questions about a given piece of text, or extracting named entities from it. On the other hand, due to its generative nature, GPT is an ideal tool for creative applications. It can create a story based on a brief premise, hold a conversation, or… write a blog post. Furthermore, if trained on a corpus of code, such a model could perform code generation, editing, and explanation tasks, such as generating Python docstrings, generating git commit messages, translating natural language to SQL queries, or even translating code from one programming language to another.

Modern language models, such as OpenAI’s GPT-3, Google’s LaMDA [7], and DeepMind’s Gopher [8], are essentially GPT implementations. They are much more powerful than the original GPT-1, mostly because of their size – the largest GPT-3 variant has 175 billion parameters – and because they were pre-trained on massive amounts of text; in the case of GPT-3, hundreds of billions of words.

Figure 2: Number of parameters and the release date of Transformer-based models. GPT-like models are highlighted in red. Source: [9]

The GPT and GPT-like models are actually autoregressive language models that predict the next word in a sequence. After predicting the next word, it is appended to the initial sequence and fed back into the model to predict the subsequent one. The procedure is repeated until the model outputs a stop token or reaches the user-specified maximum length of the output sequence.
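
To make this loop concrete, here is a minimal sketch of greedy autoregressive decoding. The `model` and `tokenizer` objects and their method names are illustrative placeholders, not a specific library’s API:

```python
# A minimal sketch of autoregressive text generation (names are illustrative).
def generate(model, tokenizer, prompt, max_new_tokens=50):
    tokens = tokenizer.encode(prompt)            # text -> list of token ids
    for _ in range(max_new_tokens):
        probs = model.next_token_probs(tokens)   # distribution over the vocabulary
        next_token = max(range(len(probs)), key=probs.__getitem__)  # greedy pick
        if next_token == tokenizer.stop_token_id:  # a stop token ends generation
            break
        tokens.append(next_token)                # feed the prediction back in
    return tokenizer.decode(tokens)
```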

From a technical standpoint, the model is a decoder-only variant of a Transformer model, consisting of a stack of Transformer blocks followed by a linear layer and a softmax that predict the probability of each word in the model’s vocabulary being the next token in the sequence. Each Transformer block is composed of a Multi-Head Causal Self-Attention layer, a linear layer, layer normalizations, and residual connections. This architecture can be thought of as a “general-purpose differentiable computer” that is both efficient (Transformers enable highly parallel computation) and optimizable (via backpropagation) [10].
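
As an illustration, a single pre-norm decoder block of this kind can be sketched in PyTorch as follows (dimensions and layout are simplified assumptions, not the exact GPT implementation):

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One pre-norm Transformer block: causal self-attention followed by an
    MLP, each wrapped in a residual connection. A simplified sketch."""
    def __init__(self, d_model: int = 768, n_heads: int = 12):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (batch, seq, d_model)
        seq_len = x.size(1)
        # boolean mask with True above the diagonal: each position may only
        # attend to itself and earlier positions (the "causal" part)
        causal_mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), 1
        )
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal_mask)
        x = x + attn_out                    # residual connection
        x = x + self.mlp(self.ln2(x))       # residual connection
        return x
```

A full model stacks many such blocks and tops them with the linear layer and softmax over the vocabulary mentioned above.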

Figure 3: Decoder architecture underpinning the GPT-like models. Source: [11]

ChatGPT

The research community recently took a few steps forward in the development of language models. GPT-family models are trained to complete the input text rather than follow the user’s instructions. To make the models generate more sensible outputs in response to user instructions, as well as to make them more truthful and less toxic, the authors opted to include human feedback in the training process. This technique, called Reinforcement Learning from Human Feedback (RLHF), is so interesting that we decided to devote a whole blog post to describing it in detail – feel free to read more about it here!

Figure 4: Evolution from the Transformer architecture to ChatGPT

The application of this technique has resulted in new iterations of the models, such as InstructGPT [12] and ChatGPT [13]. The latter attracted massive attention from the public, even outside the AI world. ChatGPT created a stir in the media, mostly because of its public availability and a web interface that allows everyone to use it directly [14].

With just a couple of commands, ChatGPT can prove its ability to interact with a human by producing a well-tailored resume, playing a game of chess, or writing compilable code. It also acts as an information distiller, providing a comprehensive yet concise summary of a given subject.

OpenAI recently enabled ChatGPT API access under the name gpt-3.5-turbo. It is a GPT-3.5 model optimized for chat that costs one-tenth the price of the best previously available model. More information on that can be found here.
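
For illustration, a minimal call to this endpoint with the `openai` Python package (using its early-2023, pre-1.0 interface) looks roughly like this; the API key is a placeholder:

```python
# A minimal sketch of calling the gpt-3.5-turbo chat endpoint.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize ChatGPT in one sentence."},
    ],
)
print(response["choices"][0]["message"]["content"])
```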

Future perspectives

Despite the fact that such developments are clearly ground-breaking, these models still seem to have a long way to go before becoming the standard for general NLP purposes. Current studies show that even though the model is impressive given its do-it-all ability, it underperforms compared to existing state-of-the-art solutions for specific NLP tasks. For instance, in a recently published paper by J. Kocon et al., ChatGPT yielded worse results than the current best models in all of the 25 NLP tasks tested in the publication [15]. Anyone who has used the model for a while will also notice its limitations, such as its lack of knowledge of recent events.

We are eager to observe further development in this area of AI. Ideas to make the model better and more versatile seem to be never-ending and the results are already looking very promising.

Bibliography

  1. Semi-supervised Sequence Learning, Andrew M. Dai, Quoc V. Le, 2015
  2. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, Jacob Devlin et al., 2018
  3. Improving Language Understanding by Generative Pre-Training, Alec Radford et al., 2018
  4. Language Models are Unsupervised Multitask Learners, Alec Radford et al., 2019
  5. Language Models are Few-Shot Learners, Tom B. Brown et al., 2020
  6. Attention Is All You Need, Ashish Vaswani et al., 2017
  7. LaMDA blogpost, Eli Collins, Zoubin Ghahramani, 2021
  8. Scaling Language Models: Methods, Analysis & Insights from Training Gopher, Jack W. Rae, 2022
  9. Transformer Models: an Introduction and Catalog, Xavier Amatriain, 2023
  10. https://twitter.com/karpathy/status/1582807367988654081
  11. GPT in 60 Lines of NumPy, Jay Mody, 2023
  12. Training language models to follow instructions with human feedback, Long Ouyang et al., 2022
  13. https://openai.com/blog/chatgpt/
  14. https://chat.openai.com/chat
  15. ChatGPT: Jack of all trades, master of none, Jan Kocon et al., 2023
How to leverage ChatGPT to boost marketing strategy?

February 26, 2023 / in Generative models / by Ewa Szkudlarek

The revolution in marketing is happening before our very eyes. The latest developments in the area of generative models mark a milestone where artificial intelligence and human expertise have come together like never before, and the use of AI in marketing is no longer just a buzzword. With ChatGPT and other large language models, marketers will be able to harness the power of AI in an easy way.

Since its launch in November 2022, the potential of using ChatGPT in business has been widely discussed. Marketing seems like the perfect area to test this technology, as many of its use cases are low-hanging fruit. In this article, we will explore the most effective ways to leverage ChatGPT in a marketing strategy that can quickly bring noticeable business value.

More human-like chatbots
The first association for most marketers is the use of ChatGPT to create chatbots that interact with customers more naturally, in real time. There are many possibilities, and we are not only talking about answering simple questions, but about complex conversations at the level of a virtual assistant. Such solutions will certainly also help to examine the individual needs of customers and precisely match the offer to their requirements, ensuring a more personalized experience.

Large language model technology offers the possibility of deep, advanced interaction, which can underpin a competitive advantage. It will not only broaden the use of intelligent chatbots in customer service, but could also be a great opportunity for a completely new approach to, for example, the range of medical or educational services offered.

Hyper-personalized customer service
Over the years, marketing has become more and more data-driven, but solutions based on large language models allow marketers to fully maximize the potential of data. By analyzing customers’ needs, behavior, and interactions with the brand, the company has a chance to fully respond to their interests. This can help increase customer loyalty and drive revenue growth.

ChatGPT and other large language models also support customer service automation and improve response times – they enable the customer to receive a response and an offer at a chosen time. Customers get the information they need quickly and efficiently, without having to wait for a human agent.

Winning content creation
Marketers can use ChatGPT to generate engaging content ideas, blog post outlines and up-to-date insights that are relevant to their brand. This can help streamline the content creation process and ensure consistency in messaging and brand voice. Given a few keywords related to an industry or niche, ChatGPT can provide a list of topics that can be used to create blog posts, social media updates, and other forms of content. This can help to establish the brand as a thought leader in its field and provide content that is perfectly suited to specific marketing objectives and target audiences.

Almost real-time optimization
The American merchant John Wanamaker, a pioneer of advertising, used to say that “half the money I spend on advertising is wasted; the problem is I don’t know which half”. 😉 A thorough analysis of advertising expenditure is the key to an effective marketing strategy, and a lot has changed since Wanamaker’s time. The possibilities of large language models broaden the approach to optimizing advertising. ChatGPT can help revise marketing campaigns by analyzing performance data, reviewing customer sentiment, and providing insights on areas for improvement. This can notably increase conversion rates, reduce customer acquisition costs, and improve overall ROI.

Market research
In a rapidly changing business reality, converting ChatGPT capabilities into the most useful business use cases can determine a competitive advantage. By using language models, marketers can easily analyze large amounts of text data to extract valuable insights about customers, competitors, and market trends. This information can be used to develop marketing strategy and tactics in areas such as product positioning, messaging, and channel selection.

On an everyday basis, ChatGPT can also help market researchers to gain a deeper understanding of customer feedback and social media conversations, which can provide proof of the actual image of the brand in the eyes of customers.

Sounds promising, but where to start?
The dynamic development of artificial intelligence requires marketers to have a deep understanding of new technologies in order to capture new opportunities to win customers’ attention. This sounds like simply stating the obvious! In practice, however, it turns out that a lack of technological know-how excludes many marketers from using AI. In order not to lag behind, it is worth supplementing marketing expertise with the knowledge of a technology partner such as deepsense.ai.

At deepsense.ai, the overriding goal of cooperation with clients is not to deliver the technology solution itself, but above all to provide the client with real business value. That’s why most of our projects start with ideation sessions and discovery workshops, where we introduce our clients to the possibilities of ChatGPT and other large language models and jointly analyze the most attractive use cases. Then deepsense.ai’s teams of AI engineers will perform end-to-end deployment and customization of selected AI solutions and put them on the fast track to delivering value. Close cooperation with deepsense.ai allows the marketing departments to maximize the potential of state-of-the-art technologies and focus on the industry-related aspects of building a competitive advantage.

How can we improve language models using reinforcement learning? ChatGPT case study

February 20, 2023 / in Generative models / by Kinga Prusinkiewicz

ChatGPT is a cutting-edge natural language processing model released in November 2022 by OpenAI. It is a variant of the GPT-3 model, specifically designed for chatbot and conversational AI applications. On the rising tide of ChatGPT, there are plenty of amazing examples of the chatbot’s accomplishments, one of which is presented in Figure 1.

Figure 1: Example usage of ChatGPT to analyze the worst-case time complexity of bubble sort in the specified style. Source: https://twitter.com/goodside/status/1598129631609380864

Introduction to GPT models

Let’s start with a short introduction to the GPT model family. The acronym refers to a series of models (GPT, GPT-2 and GPT-3, with the next generations expected soon) trained to process and generate human-like language, which have achieved impressive results in various language tasks such as translation, summarization, and question answering. GPT has been trained on a massive dataset of text, and it uses this training data to learn patterns and relationships in language. This allows it to understand and generate language in a way that is similar to how humans do. GPT is a powerful tool for developers looking to create articles, poetry, stories, news, reports and dialogue. It can be fine-tuned for specific tasks or domains, allowing it to become even more effective at handling specific types of language tasks.

What makes ChatGPT different from classic GPT models is its incorporation of human feedback during training using reinforcement learning. In this post we will dive into the details of RLHF (Reinforcement Learning from Human Feedback) and how we can use it to fine-tune language models. It is worth noting that the idea was previously used by the OpenAI team in InstructGPT – a sibling model which was trained to follow an instruction in a prompt and provide a detailed response.

What is reinforcement learning?

Reinforcement learning is a machine learning area which aims to train models to make a sequence of decisions. The agent learns by interacting with the (usually complex) environment. Each action is coupled with a reward (or penalty). The aim of the model is to learn which actions will maximize the total reward.

The typical reinforcement learning setup consists of a tuple of five elements:

  • State Space (\(S\)) – a set of possible states that an agent can visit.
  • Action Space (\(A\)) – a set of possible actions that an agent may take.
  • State Transition Probability (\(P\)) – describes the dynamics of the environment. It is also called the world model. For model-free reinforcement learning, it is not necessary to know the state transition probability.
  • Reward Function (\(R\)) – a reward (penalty) that an agent receives for a selected action made in a specific state.
  • Discount Factor (\(\gamma\)) – defines the present value of future rewards.

A reinforcement learning agent learns a policy (\(\pi\)), which defines the action that should be taken in the current state.

RL has a wide range of applications, including control systems and robotics. It is particularly useful for tasks that involve sequential decision-making or learning from experience, such as playing Go or Atari games.
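
The sketch below illustrates this setup with a toy corridor environment: an agent picks actions, receives rewards, and accumulates the discounted return it would aim to maximize. The environment and the random policy are purely illustrative:

```python
import random

class Corridor:
    """A toy environment: states 0..4, the agent starts at 0 and receives
    a reward of 1 only upon reaching state 4 (purely illustrative)."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action: -1 (move left) or +1 (move right)
        self.state = min(max(self.state + action, 0), 4)
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done

env, gamma = Corridor(), 0.9                 # gamma is the discount factor
state, ret, t, done = env.reset(), 0.0, 0, False
while not done and t < 100:
    action = random.choice([-1, 1])          # a random policy, for illustration
    state, reward, done = env.step(action)
    ret += (gamma ** t) * reward             # discounted return the agent maximizes
    t += 1
print(f"discounted return: {ret:.3f}")
```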

Reinforcement learning from human feedback

The history of incorporating human feedback into reinforcement learning is very long. There have been plenty of ideas on how we can integrate human-based samples into the agent training process, for example by adjusting the algorithm itself or with reward shaping. We would like to focus a little bit more on the approach presented in “Deep Reinforcement Learning from Human Preferences” published by DeepMind in 2017.

A typical reinforcement learning training loop involves an agent who interacts with the environment and changes states. Each interaction is connected to a reward. The whole process is presented in Figure 2. The reward function has a huge impact on agent performance: if poorly designed, it results in poor agent performance as well. In the paper, the authors propose to learn the reward function from human feedback, while the agent still trains the same way as in the classical reinforcement learning task.

Figure 2: Classic reinforcement learning training loop. Source: own elaboration.

The best way to understand the approach is to go through the example provided by the authors in the video.

Figure 3: Experiment video screenshot. The human coordinator selected the left agent, as its behavior is more similar to a backflip, which was the goal. Source: https://www.youtube.com/watch?v=oC7Cw3fu3gU

The task was to teach the agent how to do a backflip. Two trajectories delivered by the current policy were shown to a human, who decided which one did the better backflip (or at least made a better attempt at one). Based on this preference, the reward estimator is updated to grant the favored agent behavior a higher reward. Then the agent is trained in the classical reinforcement learning manner. The training loop is thus enriched with one additional step. The new loop is presented in Figure 4.

Figure 4: Reinforcement learning from human feedback training loop. Source: own elaboration.

To sum up, the new process consists of three steps:

  1. Generating a set of trajectories \(\{\tau^{1}, …, \tau^{n}\}\) with the learned policy. The parameters of the policy are learned via traditional reinforcement learning to maximize total reward. The policy can be learned using any suitable reinforcement learning algorithm.
  2. Selecting two segments \((\sigma^{1}, \sigma^{2})\) from the generated trajectories and letting the human compare them and rank which one did better. Human judgments are stored as a tuple \((\sigma^{1}, \sigma^{2}, \mu)\), where \(\mu\) is the distribution of which segment was preferred.
  3. Training the reward predictor using supervised learning techniques. To estimate the reward predictor, we need a way to express the preferred strategy, which can be achieved via the Bradley-Terry model. The simplest example of how this model works is ranking football teams in a competition: as the number of matches played might not be equal for all teams, we can introduce a model that compares the “strength” of teams to obtain the probability of one team beating another. We can do the same for trajectories:

$$
\widehat{P}[\sigma^{1} \succ \sigma^{2}] = \frac{\exp\left(\sum_{t}\widehat{r}(\sigma^{1}_{t}, a^{1}_{t})\right)}{\exp\left(\sum_{t}\widehat{r}(\sigma^{1}_{t}, a^{1}_{t})\right) + \exp\left(\sum_{t}\widehat{r}(\sigma^{2}_{t}, a^{2}_{t})\right)}
$$

Therefore we can write the loss function as:

$$
\text{loss}(\widehat{r}) = -\sum_{(\sigma^{1}, \sigma^{2}, \mu)} \left[ \mu(1)\log\widehat{P}[\sigma^{1} \succ \sigma^{2}] + \mu(2)\log\widehat{P}[\sigma^{2} \succ \sigma^{1}] \right]
$$
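
In code, this preference loss can be sketched as follows; for two compared segments the Bradley-Terry probability reduces to a sigmoid of the difference of summed rewards (tensor shapes and names are illustrative):

```python
import torch

def preference_loss(r_sum_1, r_sum_2, mu1, mu2):
    # Bradley-Terry probability that segment 1 beats segment 2:
    # exp(r1) / (exp(r1) + exp(r2)) == sigmoid(r1 - r2)
    p1 = torch.sigmoid(r_sum_1 - r_sum_2)
    p2 = 1.0 - p1
    # cross-entropy against the human preference distribution (mu1, mu2)
    return -(mu1 * torch.log(p1) + mu2 * torch.log(p2))

# e.g. the human strictly preferred segment 1: mu = (1, 0)
loss = preference_loss(torch.tensor(3.2), torch.tensor(2.5), mu1=1.0, mu2=0.0)
```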

Now, as we are equipped with reinforcement learning from human feedback knowledge, we can take a deep dive into the ChatGPT example.

ChatGPT/InstructGPT cases

ChatGPT and InstructGPT use reinforcement learning from human feedback in the model fine-tuning phase. We can split it into the three stages presented in Figure 5.

Figure 5: ChatGPT fine-tuning steps. Source: https://openai.com/blog/chatgpt/

Step 1

The first step involves fine-tuning GPT-3.5 using data delivered by humans playing the roles of both assistant and user. The trainers had access to model-written suggestions to help with composing responses. These dialogues were mixed with the InstructGPT dataset, which contains prompts and instructions written by users of earlier versions of InstructGPT submitted through the Playground. For InstructGPT itself, the data collection step was limited to obtaining and using the InstructGPT dataset and fine-tuning the GPT-3 model. This step is summarized in Figure 6.

Figure 6: Language model pretraining. Source: https://huggingface.co/blog/rlhf

The next steps remain the same for both ChatGPT and InstructGPT.

Step 2

The second step focuses on training the reward model. The language model from the first step is used to prepare samples of responses that are compared and ranked by humans to express their preferences. According to the InstructGPT paper, a labeler receives between 4 and 9 responses to rank. This means there are \(\binom{K}{2}\) comparisons, where \(K\) is the number of responses to compare; for example, ranking \(K = 9\) responses yields \(\binom{9}{2} = 36\) pairwise comparisons. Each set of comparisons is fed to a neural network that learns to evaluate generated responses in terms of human preferences.

Figure 7: Reward model training. Source: https://huggingface.co/blog/rlhf

Step 3

The last step utilizes the prepared elements in one reinforcement learning task to fine-tune the language model. Let’s formulate the task to fit reinforcement learning language:

  • The agent is represented by a language model.
  • The state space is the set of possible input token sequences.
  • The action space is all the tokens corresponding to the vocabulary of the language model.
  • The reward from the environment is delivered by the reward predictor trained in step 2.

The algorithm used in ChatGPT is PPO, which is short for Proximal Policy Optimization – a state-of-the-art technique in the reinforcement learning area. A Kullback–Leibler divergence term between the initial model and the current policy distributions is added to the PPO loss to prevent the policy from drifting substantially away from the initial model.
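
A minimal sketch of this idea, assuming per-token log-probabilities from the current policy and the frozen initial model are available, is shown below (the coefficient `beta` and the tensor shapes are illustrative assumptions):

```python
import torch

def shaped_reward(rm_score, policy_logprobs, init_logprobs, beta=0.02):
    # policy_logprobs / init_logprobs: log-probabilities of the generated
    # tokens under the current policy and the frozen initial model, shape (seq_len,)
    kl_estimate = (policy_logprobs - init_logprobs).sum()  # Monte Carlo KL estimate
    return rm_score - beta * kl_estimate                   # reward passed to PPO
```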

Figure 8: Fine-tuning with Reinforcement Learning. Source: https://huggingface.co/blog/rlhf

Summary

Using human feedback as the reward signal has several advantages. It allows the model to learn from real-world human preferences and expectations, making it more likely to generate responses that are natural and human-like. It also allows the model to learn more quickly and efficiently, since it can use the feedback it receives to fine-tune its output and avoid making the same mistakes in the future.

However, there are also some limitations to this approach. The feedback may be subjective and prone to bias, which could affect the model’s learning process. Additionally, it can be time-consuming and resource-intensive to collect and process large amounts of human feedback, especially if the model is generating a large number of responses.

Bibliography

  • “Deep reinforcement learning from human preferences” Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei, https://arxiv.org/abs/1706.03741
  • https://openai.com/blog/chatgpt/
  • https://openai.com/blog/instruction-following/
  • https://huggingface.co/blog/rlhf
  • https://openai.com/research/learning-from-human-preferences
  • https://wandb.ai/ayush-thakur/RLHF/reports/Understanding-Reinforcement-Learning-from-Human-Feedback-RLHF-Part-1–VmlldzoyODk5MTIx
The recent rise of diffusion-based models

September 5, 2022 / in Generative models / by Maciej Domagała

Every fan of generative modeling has been living an absolute dream for the last year and a half (at least!). The past few months have brought several developments and papers on text-to-image generation, each one arguably better than the last. We have observed a social media surge of spectacular, purely AI-generated images, such as this golden retriever answering tough questions on the campaign trail or a brain riding a rocketship to the moon.

Sources: https://openai.com/dall-e-2/ and https://imagen.research.google/

In this post, we will sum up the very recent history of solving the text-to-image generation problem and explain the latest developments regarding diffusion models, which are playing a huge role in the new, state-of-the-art architectures.

A short timeline of image generation and text-to-image solutions.

It all starts with DALL·E

In 2020 the OpenAI team published the GPT-3 model [1] – a huge do-it-all language model capable of machine translation, text generation, semantic analysis, and more. The model swiftly became regarded as the state of the art for language modeling solutions, and DALL·E [7] can be viewed as a natural expansion of the Transformer’s capabilities into the computer vision domain.

Autoregressive approach

The authors proposed an elegant two-stage approach:

  • train a discrete VAE model to compress images into image tokens,
  • concatenate the encoded text snippet with the image tokens and train the autoregressive transformer to learn the joint distribution over text and images.

The final version was trained on 250 million text-image pairs obtained from the Internet.

CLIP

During inference, the model is able to output a whole batch of generated images. But how can we estimate which images are best? Simultaneously with the publication of DALL·E, the OpenAI team presented CLIP [9], a solution for linking images and text. In a nutshell, CLIP offers a reliable way of pairing a text snippet with its image representation. Putting aside all of the technical aspects, the idea of training this type of model is fairly simple: take a text snippet and encode it, take an image and encode it, do that for a lot of examples (400 million (image, text) pairs), and train the model in a contrastive fashion.

Visualization of CLIP contrastive pre-training. Source: [9]
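
The contrastive objective can be sketched in PyTorch as below: for a batch of N matching (image, text) pairs, each image embedding should score highest against its own text embedding and vice versa. Embeddings are assumed to be L2-normalized, and the temperature is a learned scalar in the real model:

```python
import torch
import torch.nn.functional as F

def clip_loss(image_emb, text_emb, temperature=0.07):
    # (N, N) matrix of cosine similarities between all image/text pairs
    logits = image_emb @ text_emb.T / temperature
    targets = torch.arange(len(image_emb))        # i-th image matches i-th text
    loss_i = F.cross_entropy(logits, targets)     # image -> text direction
    loss_t = F.cross_entropy(logits.T, targets)   # text -> image direction
    return (loss_i + loss_t) / 2
```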

This kind of mapping allows us to estimate which of the generated images are the best match considering the text input. For anyone who would like to see the power of CLIP – feel free to check out my previous post on combining CLIP and evolutionary algorithms to generate images [deepsense.ai’s blogpost].

DALL·E attracted major attention from people both inside and outside the AI world; it gained lots of publicity and stirred a great deal of conversation. Even so, it only gets an honorable mention here, as the trends shifted quite quickly towards novel ideas.

All you need is diffusion

Sohl-Dickstein et al. [2] proposed a fresh idea on the subject of image generation – diffusion models.

Generative models. Source: [13]

The idea is inspired by non-equilibrium thermodynamics, although underneath it is packed with some interesting mathematical concepts. We can notice the already familiar encoder-decoder structure here, but the underlying idea is a bit different from what we can observe in traditional variational autoencoders. To understand the basics of this model, we need to describe the forward and reverse diffusion processes.

Forward image diffusion

This process can be described as gradually applying Gaussian noise to the image until it becomes entirely unrecognizable. This process is fixed in a stochastic sense – the noise application procedure can be formulated as the Markov chain of sequential diffusion steps. To untangle the difficult wording a little bit, we can neatly describe it with a few formulas. Assume that images have a certain starting distribution \(q\left(\bf{x}_{0}\right)\). We can sample just one image from this distribution – \(\bf{x}_{0}\). We want to perform a chain of diffusion steps \(\bf{x}_{0} \rightarrow \bf{x}_{1} \rightarrow … \rightarrow \bf{x}_{\it{T}}\), each step disintegrating the image more and more.

How exactly is the noise applied? It is formally defined by a noising schedule \(\{\beta_{t}\}^{T}_{t=1}\), where for every \(t = 1,…,T\) we have \(\beta_{t} \in (0,1)\). With such a schedule we can formally define the forward process as

$$
q\left(\mathbf{x}_{t} \mid \mathbf{x}_{t-1}\right)=\mathcal{N}\left(\sqrt{1-\beta_{t}} \mathbf{x}_{t-1}, \beta_{t} \mathbf{I}\right)
$$

There are just two more things worth mentioning:

  • As the number of noising steps increases \((T \to \infty)\), the final distribution \(q(\mathbf{x}_{T})\) approaches a very handy isotropic Gaussian distribution. This makes any future sampling from noised distribution efficient and easy.
  • Noising with a Gaussian kernel provides another benefit – there is no need to go step-by-step through the noising process to reach any intermediate latent state. Thanks to reparametrization, we can sample any latent state directly (a minimal code sketch of this follows the list):$$
    q\left(\mathbf{x}_{t} \mid \mathbf{x}_{0}\right)=\mathcal{N}\left(\sqrt{\bar{\alpha}_{t}} \mathbf{x}_{0},\left(1-\bar{\alpha}_{t}\right) \mathbf{I}\right) = \sqrt{\bar{\alpha}_{t}} \mathbf{x}_{0}+\sqrt{1-\bar{\alpha}_{t}} \cdot \epsilon,
    $$where \(\alpha_{t} := 1-\beta_{t}\), \(\bar{\alpha}_{t} := \prod_{k=0}^{t}\alpha_{k}\) and \(\epsilon \sim \mathcal{N}(0, \mathbf{I})\). Here \(\epsilon\) represents Gaussian noise – this formulation will be essential for training.
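
Here is the promised minimal sketch of sampling \(\mathbf{x}_{t}\) directly from \(\mathbf{x}_{0}\) using this reparametrization (the linear schedule and its values are illustrative assumptions):

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)       # an illustrative linear noising schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)   # \bar{alpha}_t

def q_sample(x0: torch.Tensor, t: torch.Tensor):
    """Sample x_t directly from x_0; returns the noised sample and the noise."""
    eps = torch.randn_like(x0)              # eps ~ N(0, I)
    a_bar = alpha_bars[t]
    x_t = a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * eps
    return x_t, eps
```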

Reverse image diffusion

We have a nicely defined forward process. One might ask – so what? Why can’t we just define a reverse process \(q\left(\mathbf{x}_{t-1} \mid \mathbf{x}_{t}\right)\) and trace back from the noise to the image? First of all, that would fail conceptually, as we want to have a neural network that learns how to deal with a problem – we shouldn’t provide it with a clear solution. And second of all, we cannot quite do that, as it would require marginalization over the entire data distribution. To get back to the starting distribution \(q(\bf{x}_{0})\) from the noised sample we would have to marginalize over all of the ways we could arrive at \(\mathbf{x}_{0}\) from the noise, including all of the latent states. That means calculating \(\int q(\mathbf{x}_{0:T})d\mathbf{x}_{1:T}\), which is intractable. So, if we cannot calculate it, surely we can… approximate it!

The core idea is to develop a reliable solution – in the form of a learnable network – that successfully approximates the reverse diffusion process. The first way to achieve that is by estimating the mean and covariance for denoising steps

$$
p_{\theta}\left(\mathbf{x}_{t-1} \mid \mathbf{x}_{t}\right)=\mathcal{N}(\mu_{\theta}(\mathbf{x}_{t}, t), \Sigma_{\theta}(\mathbf{x}_{t}, t) ).
$$

In a practical sense, \(\mu_{\theta}(\mathbf{x}_{t}, t)\) can be estimated via the neural network and \(\Sigma_{\theta}(\mathbf{x}_{t}, t)\) can be fixed to a certain constant related to the noising schedule, such as \(\beta_{t}\mathbf{I}\).

Forward and reverse diffusion processes. Source: [14]

Estimating \(\mu_{\theta}(\mathbf{x}_{t}, t)\) this way is possible, but Ho et al. [3] came up with a different way of training – a neural network \(\epsilon_{\theta}(\mathbf{x}_{t}, t)\) can be trained to predict the noise \(\epsilon\) from the earlier formulation of \(q\left(\mathbf{x}_{t} \mid \mathbf{x}_{0}\right)\).

As in Ho et al. [3], the training process consists of the following steps:

  1. Sample image \(\mathbf{x}_{0}\sim q(\bf{x}_{0})\),
  2. Choose a certain step in the diffusion process \(t \sim U(\{1,2,…,T\})\),
  3. Apply the noising \(\epsilon \sim \mathcal{N}(0,\mathbf{I})\),
  4. Try to estimate the noise \(\epsilon_{\theta}(\mathbf{x}_{t}, t)= \epsilon_{\theta}(\sqrt{\bar{\alpha}_{t}} \mathbf{x}_{0}+\sqrt{1-\bar{\alpha}_{t}} \cdot \epsilon, t)\),
  5. Learn the network by gradient descent on the loss \(\nabla_{\theta} \|\epsilon - \epsilon_{\theta}(\mathbf{x}_{t}, t)\|^{2}\).

In general, loss can be nicely presented as

$$
L_{\text{diffusion}}=\mathbb{E}_{t, \mathbf{x}_{0}, \epsilon}\left[\left\|\epsilon-\epsilon_{\theta}\left(\mathbf{x}_{t}, t\right)\right\|^{2}\right],
$$

where \(t, \mathbf{x}_0\) and \(\epsilon\) are described as in the steps above.
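
Reusing `q_sample` and the schedule from the sketch above, one training step implementing these five points can be sketched as follows (the `model` interface is an illustrative assumption):

```python
import torch

def training_step(model, optimizer, x0):
    t = torch.randint(0, T, (1,))            # step 2: pick a diffusion step
    x_t, eps = q_sample(x0, t)               # steps 1+3: noise the sampled image
    eps_pred = model(x_t, t)                 # step 4: estimate the noise
    loss = ((eps - eps_pred) ** 2).mean()    # step 5: the L_diffusion objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```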

All of the formulations, reparametrizations and derivations are a bit math-intensive, but there are already some great resources available for anyone who wants a deeper understanding of the subject. Most notably, Lilian Weng [13], Angus Turner [14] and Ayan Das [15] went through some deep derivations while maintaining an approachable tone – I highly recommend checking out these posts.

Guiding the diffusion

The above explains how we can perceive the diffusion model as generative. Once the model \(\epsilon_{\theta}(\mathbf{x}_{t}, t)\) is trained, we can use it to run the noise \(\mathbf{x}_{t}\) back to \(\mathbf{x}_{0}\). Given that it is straightforward to sample noise from an isotropic Gaussian distribution, we can obtain limitless image variations. We can also guide the image generation by feeding additional information to the network during the training process. Assuming that the images are labeled, the information about class \(y\) can be fed into a class-conditional diffusion model \(\epsilon_{\theta}(\mathbf{x}_{t}, t \mid y)\). One way of introducing guidance into the training process is to train a separate model that acts as a classifier of noisy images. At each denoising step, the classifier checks whether the image is being denoised in the right direction and contributes its own loss-function gradient to the overall loss of the diffusion model.

Ho & Salimans [5] proposed a way to feed the class information into the model without the need to train an additional classifier. During training, the model \(\epsilon_{\theta}(\mathbf{x}_{t}, t \mid y)\) is sometimes (with fixed probability) not shown the actual class \(y\); instead, the class label is replaced with the null label \(\emptyset\). The model thus learns to perform diffusion both with and without the guidance. For inference, the model performs two predictions, once given the class label \(\epsilon_{\theta}(\mathbf{x}_{t}, t \mid y)\) and once not \(\epsilon_{\theta}(\mathbf{x}_{t}, t \mid \emptyset)\). The final prediction of the model is moved away from \(\epsilon_{\theta}(\mathbf{x}_{t}, t \mid \emptyset)\) and towards \(\epsilon_{\theta}(\mathbf{x}_{t}, t \mid y)\) by scaling with the guidance scale \(s \geq 1\):

$$
\hat{\epsilon}_{\theta}\left(\mathbf{x}_{t}, t \mid y\right)=\epsilon_{\theta}\left(\mathbf{x}_{t}, t \mid \emptyset\right)+s \cdot\left(\epsilon_{\theta}\left(\mathbf{x}_{t}, t \mid y\right)-\epsilon_{\theta}\left(\mathbf{x}_{t}, t \mid \emptyset\right)\right)
$$

This kind of classifier-free guidance uses only the main model’s comprehension – an additional classifier is not needed – which yields better results according to Nichol et al. [6].
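
At inference, the guided prediction can be sketched as a simple extrapolation (function and argument names are illustrative):

```python
def guided_noise_prediction(model, x_t, t, y, null_label, s=3.0):
    eps_cond = model(x_t, t, y)              # prediction given the class label
    eps_uncond = model(x_t, t, null_label)   # prediction given the null label
    # extrapolate away from the unconditional and towards the conditional output
    return eps_uncond + s * (eps_cond - eps_uncond)
```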

Text-guided diffusion with GLIDE

Even though the paper describing GLIDE [6] architecture received the least publicity out of all the publications discussed in this post, it arguably presents the most novel and interesting ideas. It combines all of the concepts presented in the previous chapter nicely. We already know how diffusion models work and that we can use them to generate images. The two questions we would now like to answer are:

  • How can we use the textual information to guide the diffusion model?
  • How can we make sure that the quality of the model is good enough?

Architecture choice

Architecture can be boiled down to three main components:

  1. A UNet-based model responsible for the visual part of the diffusion learning,
  2. A transformer-based model responsible for creating text embedding from a snippet of text,
  3. An upsampling diffusion model responsible for enhancing output image resolution.

The first two work together in order to create a text-guided image output, while the last one is used to enlarge the image while preserving the quality.

The core of the model is the well-known UNet architecture, used for the diffusion in Dhariwal & Nichol [8]. The model, just like in its early versions, stacks residual layers with downsampling and upsampling convolutions. It also contains attention layers, which are crucial for simultaneous text processing. The model proposed by the authors has around 2.3 billion parameters and was trained on the same dataset as DALL·E.

The text used for guidance is encoded in tokens and fed into the Transformer model. The model used in GLIDE had roughly 1.2 billion parameters and was built from 24 residual blocks of width 2048. The output of the transformer has two purposes:

  • the final embedding token is used as class embedding \(y\) in \(\epsilon_{\theta}(\mathbf{x}_{t}, t \mid y)\),
  • the final layer of token embeddings is added to every attention layer of the model.

It is clear that a great deal of focus was put into making sure that the model receives enough text-related context to generate accurate images: the model is conditioned on the text snippet embedding, the encoded text is concatenated with the attention context, and classifier-free guidance is used during training.

As for the final component, the authors used the diffusion model to go from a low-resolution to a high-resolution image using an ImageNet upsampler.

GLIDE interpretation of ‘a corgi in a field’. Source: [6]

GLIDE incorporates a few notable achievements developed in recent years and sheds new light on the concept of text-guided image generation. Given that the DALL·E model was based on different structures, it is fair to say that the publication of GLIDE represents the dawn of the diffusion-based text-to-image generation era.

The next version – DALL·E 2

The OpenAI team doesn’t seem to get much rest, as in April they took the Internet by storm with DALL·E 2 [11]. It takes elements from both predecessors: it relies heavily on CLIP [9], but a large part of the solution revolves around the GLIDE [6] architecture. DALL·E 2 has two main underlying components, called the prior and the decoder, which are able to produce image output when stacked together. The entire mechanism was named unCLIP, which may already spoil the mystery of what exactly is going on under the hood.

Visualization of DALL·E 2 two-stage mechanism. Source: [11]

The prior

The first stage is meant to convert the caption – a text snippet such as a “corgi playing a flame-throwing trumpet” – into text embedding. We obtain it using a frozen CLIP model.

After text embedding comes the fun part – we now want to obtain an image embedding, similar to the one obtained via the CLIP model. We want it to encapsulate all the important information from the text embedding, as it will be used for image generation through diffusion. Well, isn’t that exactly what CLIP is for? If we want to find a respective image embedding for our input phrase, we could just look at what is close to our text embedding in the CLIP encoded space. One of the authors of DALL·E 2 [Aditya Ramesh, 2022] posted a nice explanation of why that solution fails and why the prior is needed: “An infinite number of images could be consistent with a given caption, so the outputs of the two encoders will not perfectly coincide. Hence, a separate prior model is needed to ‘translate’ the text embedding into an image embedding that could plausibly match it.”

On top of that, the authors empirically checked the importance of the prior in the network. Passing both the image embedding produced by the prior and the caption vastly outperforms generation using only the caption or the caption with the CLIP text embedding.

Samples generated conditioned on: caption, text embedding, and image embedding. Source: https://arxiv.org/pdf/2204.06125.pdf

The authors tested two model classes for the prior: an autoregressive model and a diffusion model. This post will cover only the diffusion prior, as it was deemed to perform better than the autoregressive one, especially from a computational point of view. For the training of the prior, a decoder-only Transformer model was chosen. It was trained on a sequence of several inputs:

  • encoded text,
  • CLIP text embedding,
  • embedding for the diffusion timestep,
  • noised image embedding,

with the goal of outputting an unnoised image embedding \(z_{i}\). As opposed to the training approach proposed by Ho et al. [3] covered in previous sections, predicting the unnoised image embedding directly turned out to be a better fit than predicting the noise. So, remembering the previous formula for diffusion loss in a guided model

$$
L_{\text{diffusion}}=\mathbb{E}_{t, \mathbf{x}_{0}, \epsilon}\left[\left\|\epsilon-\epsilon_{\theta}\left(\mathbf{x}_{t}, t\mid y\right)\right\|^{2}\right],
$$

we can present the prior diffusion loss as

$$
L_{\text{prior:diffusion}}=\mathbb{E}_{t}\left[\left\|z_{i}-f_{\theta}\left({z}_{i}^{t}, t \mid y\right)\right\|^{2}\right],
$$

where \(f_{\theta}\) stands for the prior model, \({z}_{i}^{t}\) is the noised image embedding, \(t\) is the timestamp and \(y\) is the caption used for guidance.

The decoder

We covered the prior part of the unCLIP, which was meant to produce a model that is able to encapsulate all of the important information from the text into a CLIP-like image embedding. Now we want to use that image embedding to generate an actual visual output. This is when the name unCLIP unfolds itself – we are walking back from the image embedding to the image, the reverse of what happens when the CLIP image encoder is trained.

As the saying goes: “After one diffusion model it is time for another diffusion model!” And this one we already know – it is GLIDE, although slightly modified. Only slightly, since the single major change is adding the additional CLIP image embedding (produced by the prior) to the vanilla GLIDE text encoder. After all, this is exactly what the prior was trained for – to provide information for the decoder. Guidance is used just as in regular GLIDE: to enable it, the CLIP embeddings are randomly set to \(\emptyset\) in 10% of cases and the text captions \(y\) are dropped in 50% of cases during training.

Another thing that did not change is the idea of upsampling after the image generation. The output is tossed into additional diffusion-based models. This time two upsampling models are used (instead of one in the original GLIDE), one taking the image from 64×64 to 256×256 and the other further enhancing resolution up to 1024×1024.

Imagen that we can do it better

The Google Brain team decided not to be late to the party, as less than two months after the publication of DALL·E 2 they presented the fruits of their own labor – Imagen (Saharia et al. [12]).

Overview of Imagen architecture. Source: [12]

Imagen architecture seems to be oddly simple in its structure. A pretrained textual model is used to create the embeddings that are diffused into an image. Next, the resolution is increased via super-resolution diffusion models – the steps we already know from DALL·E 2. A lot of novelties are scattered in different bits of the architecture – a few in the model itself and several in the training process. Together, they offer a slight upgrade when compared to other solutions. Given the large portion of knowledge already served, we can explain this model via differences with previously described models:

Use a pretrained transformer instead of training it from scratch. This is viewed as the core improvement compared to OpenAI’s work. For everything regarding text embeddings, the GLIDE authors used a new, specifically trained transformer model. The Imagen authors used a pretrained, frozen T5-XXL model [4]. The idea is that this model has vastly more context regarding language processing than a model trained only on image captions, and so is able to produce more valuable embeddings without the need to fine-tune it additionally.

Make the underlying neural network more efficient. An upgraded version of the neural network called Efficient U-Net was used as the backbone of the super-resolution diffusion models. It is said to be more memory-efficient and simpler than the previous version, and it converges faster as well. The changes were introduced mainly in the residual blocks and via additional scaling of the values inside the network. For anyone who enjoys digging deep into the details, the changes are well documented in Saharia et al. [12].

Use conditioning augmentation to enhance image fidelity. Since the solution can be viewed as a sequence of diffusion models, there is an argument to be made about enhancements in the areas where the models are linked. Ho et al. [10] presented a solution called conditioning augmentation. In simple terms, it is equivalent to applying various data augmentation techniques, such as a Gaussian blur, to a low-resolution image before it is fed into the super-resolution models.

There are a few other techniques deemed crucial to a low FID score and high image fidelity (such as dynamic thresholding) – these are explained in detail in the source paper [12]. The core of the approach is already covered in the previous chapters.

Some Imagen generations with captions. Source: [12]

Is it the best yet?

As of writing this text, Google’s Imagen is considered to be state-of-the-art as far as text-to-image generation is concerned. But why exactly is that? How can we evaluate the models and compare them to each other?

The authors of Imagen opted for two means of evaluation. One is considered to be the current standard for text-to-image modeling, namely establishing a Fréchet inception distance score on a COCO validation dataset. The authors report (unsurprisingly) that Imagen shows a state-of-the-art performance, its zero-shot FID outperforming all other models, even those specifically trained on COCO.

Comparison of several models. Source: https://arxiv.org/pdf/2205.11487.pdf

A far more intriguing means of evaluation is a brand new proposal from the authors called DrawBench – a comprehensive and challenging set of prompts that support the evaluation and comparison of text-to-image models (source). It consists of 200 prompts divided into 11 categories, collected from e.g. DALL·E or Reddit. A list of the prompts with categories can be found in [17]. The evaluation was performed by 275 unbiased (sic!) raters, 25 for each category. Each rater was shown two non-cherry-picked, random sets of images generated by two different models (e.g. Imagen and DALL·E 2) and had to respond to two questions:

  1. Which set of images is of higher quality?
  2. Which set of images better represents the text caption?

These two questions are meant to address the two most important characteristics of a good text-to-image model: the quality of the images produced (fidelity) and how well it reflects the input text prompt (alignment). Each rater had three choices – to claim that one of the models performs better, or to call it a tie. Once again, there can be only one winner. Interestingly, the GLIDE model seems to perform slightly better than DALL·E 2, at least based on this curated dataset.

Imagen vs other models. Source: [12]

As expected, a large portion of the publication is devoted to the comparison between the images produced by Imagen and GLIDE/DALL·E 2 – more can be found in Appendix E of [12].

The fun is far from over

As usual, with new architecture gaining recognition there is a surge of interesting publications and solutions emerging from the void. The pace of developments makes it nearly impossible to track every interesting publication. There are also a lot of interesting characteristics of the models to discover other than raw generative power, such as image inpainting, style transfer, and image editing.

Apart from the understandable excitement over a new era of generative models, there are some shortcomings embedded into the diffusion process structure, such as slow sampling speed compared to previous models [16].

Model comparison. Source: [16]

For anyone who likes to go deep into the minutiae of implementation, I highly recommend going through Phil Wang’s (@lucidrains on GitHub) repositories [20], which are a collaborative effort from many people to recreate the unpublished models in PyTorch.

For anyone who would like to admire some more examples of DALL·E 2’s generative power, I recommend checking the newly created subreddit with DALL·E 2 creations [18]. It is moderated by people with access to OpenAI’s Lab – feel free to join the waitlist [19] for the opportunity to play with the models yourself.

References

  1. Language Models are Few-Shot Learners Tom B. Brown et al. 2020
  2. Deep Unsupervised Learning using Nonequilibrium Thermodynamics Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli. 2015
  3. Denoising Diffusion Probabilistic Models Jonathan Ho, Ajay Jain, Pieter Abbeel. 2020
  4. How Much Knowledge Can You Pack Into the Parameters of a Language Model? Adam Roberts, Colin Raffel, Noam Shazeer. 2020
  5. Classifier-Free Diffusion Guidance Jonathan Ho, Tim Salimans. 2021
  6. GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models Alex Nichol et al. 2021
  7. Zero-Shot Text-to-Image Generation Aditya Ramesh et al. 2021
  8. Diffusion Models Beat GANs on Image Synthesis Prafulla Dhariwal, Alex Nichol. 2021
  9. Learning Transferable Visual Models From Natural Language Supervision Alec Radford et al. 2021
  10. Cascaded Diffusion Models for High Fidelity Image Generation Jonathan Ho et al. 2021
  11. Hierarchical Text-Conditional Image Generation with CLIP Latents Aditya Ramesh et al. 2022
  12. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding Chitwan Saharia et al. 2022
  13. What are Diffusion Models? Lilian Weng. 2021
  14. Diffusion Models as a kind of VAE Angus Turner. 2021
  15. An introduction to Diffusion Probabilistic Models Ayan Das. 2021
  16. Improving Diffusion Models as an Alternative To GANs, Part 1 Arash Vahdat, Karsten Kreis. 2022
  17. DrawBench prompts Google Brain team. 2022
  18. DALL·E 2 subreddit Reddit. 2022
  19. OpenAI’s waitlist OpenAI team. 2022
  20. Phil Wang’s repositories Phil Wang. 2022
