Relativistic f-divergences

My latest paper is out!

This will be my last paper done solo as I am now starting a PhD at the Montreal Institute for Learning Algorithms (MILA)! Do not worry, I will keep doing the same type of research at a similar pace (although it may slow down slightly in the first year). My thesis is titled Understanding, improving, and generalizing GANs.

This is mainly a theoretical paper with a few experimental findings. It provides a more mathematically grounded view of Relativistic GANs. However, it still leaves many questions unanswered (such as why RaGANs are better than RpGANs) and it raises new questions (such as why better estimators lead to worse results).

For most people, 10 pages of proofs might be overwhelming 🙀. Thus, Appendix A (p11) provides a one-page high-level explanation of the proof of the main theorem. This is great for people who still want to understand the math behind the theorem without going into extreme detail.

The contributions of the paper are the following:

– I prove that the objective functions of the discriminator in RGANs are divergences (Relativistic f-divergences).
– I devise a few variants of Relativistic f-divergences.
– I show that the Wasserstein distance is weaker than f-divergences, which are in turn weaker than relativistic f-divergences.
– I present the minimum-variance unbiased estimator (MVUE) of Relativistic paired GANs (RpGANs) and show that using it hinders the performance of the generator.
– I show that Relativistic average GANs (RaGANs) are only asymptotically unbiased, but that the finite-sample bias is small. Removing this bias does not improve the performance of the generator.
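To make the RpGAN/RaGAN distinction above concrete, here is a minimal numpy sketch of the two discriminator objectives with the logistic loss. This is my own illustrative code, not from the paper; the function names are hypothetical, and I assume the standard formulations: RpGAN compares each real critic score to a paired fake score, while RaGAN compares each score to the average score of the opposite sample.

```python
import numpy as np

def logistic(z):
    # Numerically stable -log(sigmoid(z)) = log(1 + exp(-z))
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(-z, 0.0)

def rpgan_d_loss(c_real, c_fake):
    # Relativistic paired GAN: each real score is compared to its paired fake score.
    return np.mean(logistic(c_real - c_fake))

def ragan_d_loss(c_real, c_fake):
    # Relativistic average GAN: each real score is compared to the *mean* fake
    # score, and each fake score to the mean real score (hence "average").
    return 0.5 * (np.mean(logistic(c_real - np.mean(c_fake)))
                  + np.mean(logistic(-(c_fake - np.mean(c_real)))))
```

When the critic cleanly separates real from fake (real scores far above fake scores), both losses approach 0; when it cannot tell them apart, both sit at log 2.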


Btw, once I gain access to the GPUs at MILA, I might run some extra analyses on larger images for this paper. I couldn’t train my neural networks as long as I did in my previous paper (20h+) because of the noise it creates. My girlfriend needs silence to be able to compose her music. Look her up if you like electronic/dubstep music: