They learn the properties of a statistical distribution.
They can then be used to generate samples from that distribution.
Famous examples are style transfer and face generation.
The Basic Idea
A GAN is really a composition of two models: a generator and a discriminator.
The generator is trained to trick the discriminator.
The discriminator is trained to distinguish real data from fake.
The goals are opposed: adversarial.
Components
Suppose the data are generated from some distribution: \[ x \sim \text{Dist}_\text{Data}(\alpha \ldots) \]
The discriminator is a function mapping a sample to a binary target: \[ D : x \mapsto \{ 0, 1 \} \]
The generator \(G\) takes a latent vector \(z \sim U(0,1)^k\) and generates a fake data sample: \[G(z) = y \sim \text{Dist}_\text{Gen}(\alpha^\prime \ldots) \]
The training goal is to approximate the data generating distribution: \[ \text{Dist}_\text{Gen}(\alpha ^\prime \ldots) \approx \text{Dist}_\text{Data}(\alpha \ldots) \]
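As a minimal sketch of these interfaces (the latent dimension \(k\) and the batch size below are illustrative assumptions, not part of the formulation):

```python
import torch

k = 64           # latent dimensionality k (illustrative assumption)
batch_size = 32  # illustrative

# A batch of latent vectors z ~ U(0,1)^k, one row per sample.
z = torch.rand(batch_size, k)

# Conceptually:
#   G(z) -> a batch of fake samples y ~ Dist_Gen
#   D(x) -> a probability that each sample came from Dist_Data
```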
Construction
In principle, \(G\) and \(D\) are arbitrary probability transforms.
Neural networks are a common choice: they act as general function approximators.
It is desirable to let the structure of the data inform the network architecture (e.g. convolutional layers for image data), as in the sketch below.
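A minimal sketch of such networks, assuming PyTorch and fully connected layers; the layer widths, the latent dimension, and the 784-dimensional data size (a flattened 28×28 image) are illustrative assumptions:

```python
import torch.nn as nn

k = 64          # latent dimension (assumption, matching the sketch above)
data_dim = 784  # e.g. a flattened 28x28 image (assumption)

# Generator: maps a latent vector z in R^k to a fake sample in data space.
G = nn.Sequential(
    nn.Linear(k, 256),
    nn.ReLU(),
    nn.Linear(256, data_dim),
    nn.Sigmoid(),   # keeps outputs in (0, 1), matching normalised data
)

# Discriminator: maps a data sample to a probability of being real.
D = nn.Sequential(
    nn.Linear(data_dim, 256),
    nn.LeakyReLU(0.2),
    nn.Linear(256, 1),
    nn.Sigmoid(),   # output in (0, 1), interpreted as P(real)
)
```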
Loss Functions
The discriminator is acting as a binary classifier.
The natural loss function is binary cross-entropy: \[ L = -E_{x \sim \text{Dist}_\text{Data}} [\log(D(x))] - E_{z \sim U(0,1)^k}[\log(1 - D(G(z)))] \]
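In code, the two expectations become Monte Carlo averages over a batch; the sketch below assumes the `G`, `D`, and `z` from the earlier snippets, and the small `eps` guarding the logarithms is an implementation detail not in the formula:

```python
import torch

def combined_loss(D, G, x_real, z, eps=1e-8):
    """Monte Carlo estimate of L over a batch of real samples x_real
    and latent vectors z."""
    p_real = D(x_real)   # D(x): probability that a real sample is real
    p_fake = D(G(z))     # D(G(z)): probability that a generated sample is real
    return (-torch.log(p_real + eps).mean()
            - torch.log(1.0 - p_fake + eps).mean())
```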
Discriminator Loss
The discriminator aims to minimise the above loss.
The generator, meanwhile, is trying to produce samples that lie in the data distribution.
For labelled data \((x, y)\), with \(y = 1\) for real samples and \(y = 0\) for generated ones, this reduces to: \[ L_D(x, y) = -y\log(D(x)) - (1 - y) \log(1 - D(x))\]
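This is exactly binary cross-entropy, so in practice a library implementation is used; a sketch with PyTorch, assuming the `D` and `G` defined earlier:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(D, x, y):
    """Binary cross-entropy between D's predictions and the real/fake labels.
    x: batch of samples (real or generated); y: labels (1 = real, 0 = fake)."""
    return F.binary_cross_entropy(D(x), y)

# Usage sketch (shapes are illustrative):
#   loss_real = discriminator_loss(D, x_real, torch.ones(batch_size, 1))
#   loss_fake = discriminator_loss(D, G(z).detach(), torch.zeros(batch_size, 1))
```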
Generator Loss
The generator is aiming to trick the discriminator.
The natural objective is to flip the discriminator's classification loss.
It doesn't need to see the real data directly: that information is learned implicitly through the discriminator. \[ L_G(z) = -\log(D(G(z))) \]
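A matching sketch for the generator loss, again assuming the `D` and `G` defined earlier; note that \(-\log(D(G(z)))\) only needs the discriminator's verdict on the fakes, never the real data:

```python
import torch

def generator_loss(D, G, z, eps=1e-8):
    """Generator loss L_G(z) = -log(D(G(z))), averaged over the batch."""
    return -torch.log(D(G(z)) + eps).mean()
```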
Training
Training proceeds in a two-step fashion.
First, a batch of latent vectors is passed through the generator and \(G\) is updated to minimise \(L_G\).
Second, a batch of real and generated samples is presented to the discriminator and \(D\) is updated to minimise \(L_D\).
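A training-loop sketch in the order described above (update \(G\), then \(D\)), assuming the networks and loss functions from the earlier snippets; the optimiser, learning rate, batch size, step count, and the random placeholder standing in for real data are all illustrative assumptions:

```python
import torch

# Assumes G, D, k, data_dim, generator_loss, and discriminator_loss
# from the earlier sketches.
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
batch_size = 32

for step in range(1000):
    # Placeholder real batch; in practice this comes from the training set.
    x_real = torch.rand(batch_size, data_dim)

    # Step 1: update the generator to fool the discriminator.
    z = torch.rand(batch_size, k)          # z ~ U(0,1)^k
    opt_G.zero_grad()
    loss_G = generator_loss(D, G, z)
    loss_G.backward()
    opt_G.step()

    # Step 2: update the discriminator on real and (detached) fake samples.
    z = torch.rand(batch_size, k)
    fake = G(z).detach()                   # stop gradients flowing into G
    opt_D.zero_grad()
    loss_D = (discriminator_loss(D, x_real, torch.ones(batch_size, 1))
              + discriminator_loss(D, fake, torch.zeros(batch_size, 1)))
    loss_D.backward()
    opt_D.step()
```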