Alternative losses for Relativistic GANs

Further investigation is needed, but I suspect some variants of Relativistic average GANs (RaGANs) might be more sensible than the ones I proposed in my paper. If you are using Relativistic GANs, you might also be interested in trying variant 3, which looks the most promising.

For simplicity, let’s assume we use the non-saturating loss and that the loss is symmetric, i.e., f1(-y) = f2(y). This holds for HingeGAN, LSGAN with -1/1 labels, and Standard GAN with a sigmoid activation.
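A minimal Python sketch of the three (f1, f2) pairs named above, written as functions of the critic difference y, with a quick numerical check of the symmetry f1(-y) = f2(y). The exact forms are the standard choices for these losses and are assumed here rather than taken from this post; `softplus`, `LOSSES` and `is_symmetric` are hypothetical names for illustration.

```python
import math


def softplus(y: float) -> float:
    """Numerically stable log(1 + exp(y))."""
    return max(y, 0.0) + math.log1p(math.exp(-abs(y)))


# Assumed standard (f1, f2) pairs. f1 is applied to "real minus fake",
# f2 to "fake minus real".
LOSSES = {
    # Standard GAN with sigmoid: f1(y) = -log(sigmoid(y)), f2(y) = -log(1 - sigmoid(y))
    "sgan": (lambda y: softplus(-y), lambda y: softplus(y)),
    # LSGAN with -1/1 labels
    "lsgan": (lambda y: (y - 1.0) ** 2, lambda y: (y + 1.0) ** 2),
    # HingeGAN
    "hinge": (lambda y: max(0.0, 1.0 - y), lambda y: max(0.0, 1.0 + y)),
}


def is_symmetric(f1, f2, points=(-3.0, -0.5, 0.0, 0.7, 2.5), tol=1e-12) -> bool:
    """Check f1(-y) == f2(y) on a few sample points."""
    return all(abs(f1(-y) - f2(y)) <= tol for y in points)


for name, (f1, f2) in LOSSES.items():
    print(name, is_symmetric(f1, f2))  # prints True for all three
```

Under this symmetry, every f2 term can be rewritten with f1 by flipping the sign of its argument, which is what makes the variants easy to compare.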

1) This is the RaGAN formula proposed in the paper.
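For reference, the RaGAN losses from the paper can be written as follows. Here C(x) is the critic output before the final activation, ℙ is the real data distribution and ℚ is the generator's distribution (this notation is assumed here, since the original formula image is not reproduced):

```latex
\begin{aligned}
L_D &= \mathbb{E}_{x_r \sim \mathbb{P}}\big[f_1\big(C(x_r) - \mathbb{E}_{x_f \sim \mathbb{Q}} C(x_f)\big)\big]
     + \mathbb{E}_{x_f \sim \mathbb{Q}}\big[f_2\big(C(x_f) - \mathbb{E}_{x_r \sim \mathbb{P}} C(x_r)\big)\big] \\
L_G &= \mathbb{E}_{x_f \sim \mathbb{Q}}\big[f_1\big(C(x_f) - \mathbb{E}_{x_r \sim \mathbb{P}} C(x_r)\big)\big]
     + \mathbb{E}_{x_r \sim \mathbb{P}}\big[f_2\big(C(x_r) - \mathbb{E}_{x_f \sim \mathbb{Q}} C(x_f)\big)\big]
\end{aligned}
```

Under the symmetry assumption f2(y) = f1(-y), the second discriminator term can equivalently be written as E_{x_f}[f1(E_{x_r} C(x_r) - C(x_f))].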

2) This variant works as well as the original RaGAN; I know because I once used it by mistake and it made no difference to the results. Its generator loss doesn’t make much sense, but as discussed in “GANs beyond divergence minimization”, the generator can minimize pretty much anything related to the estimated divergence (the loss function of the discriminator) and training will likely still work. GANs don’t actually minimize the divergence.

3) This variant is the most promising, but I have not had time to test it. It estimates the same divergence as variant 2, since it uses the same discriminator loss. The difference is that the generator now wants every fake sample to be a little better than the average of the real samples, which is more sensible.

6 thoughts on “Alternative losses for Relativistic GANs”

    1. Honestly, I don’t think it’s better: it’s not symmetric anymore, and both variants arise naturally from inequalities depending on which way you go. I will release my next paper soon; there are just a few annoying hiccups I have to figure out first.


  1. Also, regarding Relativistic Average GANs, is computing the average loss of the “other” (real/fake) samples in the minibatch worth the cost? Why not arbitrarily select a single opposing sample and use that to compute an (admittedly crude) estimate of the average loss of the “other” side?


    1. However you look at it, Relativistic “paired” GANs (the non-average kind; I’m just changing the name so it’s less ambiguous) don’t perform anywhere close to Relativistic average GANs. I still don’t know why this is the case. The idea you propose is equivalent to using Relativistic paired GANs.


      1. Interesting. It could simply be because there’s less variance in the gradients when you average them. Thank you for your answers.
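The difference between the average and the single-opponent idea discussed in this thread can be sketched in plain Python on made-up critic outputs. This is an illustration under stated assumptions, not an implementation from the paper: f1 is taken to be the Standard GAN choice -log(sigmoid(y)), f2(y) = f1(-y) by the symmetry assumption, and all function names are hypothetical.

```python
import math
import random
from statistics import fmean


def softplus(y: float) -> float:
    """Numerically stable log(1 + exp(y))."""
    return max(y, 0.0) + math.log1p(math.exp(-abs(y)))


def f1(y: float) -> float:
    # Assumed Standard GAN choice: f1(y) = -log(sigmoid(y)) = softplus(-y).
    return softplus(-y)


def f2(y: float) -> float:
    # Symmetry assumption from the post: f2(y) = f1(-y).
    return f1(-y)


def ragan_d_loss(c_real, c_fake):
    """Relativistic average D loss: compare each sample with the minibatch mean of the other side."""
    mean_real, mean_fake = fmean(c_real), fmean(c_fake)
    return (fmean(f1(cr - mean_fake) for cr in c_real)
            + fmean(f2(cf - mean_real) for cf in c_fake))


def single_opponent_d_loss(c_real, c_fake, rng):
    """The commenter's idea: replace each mean by ONE randomly drawn opposing critic output."""
    return (fmean(f1(cr - rng.choice(c_fake)) for cr in c_real)
            + fmean(f2(cf - rng.choice(c_real)) for cf in c_fake))


def paired_d_loss_all_pairs(c_real, c_fake):
    """Expected value of single_opponent_d_loss over the random draws (average over all pairs)."""
    return (fmean(f1(cr - cf) for cr in c_real for cf in c_fake)
            + fmean(f2(cf - cr) for cf in c_fake for cr in c_real))


if __name__ == "__main__":
    rng = random.Random(0)
    c_real = [rng.gauss(1.0, 1.0) for _ in range(64)]   # made-up critic outputs
    c_fake = [rng.gauss(-1.0, 1.0) for _ in range(64)]
    print("average:", ragan_d_loss(c_real, c_fake))
    print("paired (expected):", paired_d_loss_all_pairs(c_real, c_fake))
    print("single draw:", single_opponent_d_loss(c_real, c_fake, rng))
```

Under the symmetry assumption both terms of `paired_d_loss_all_pairs` equal the paired loss E[f1(C(x_r) - C(x_f))], so the single-opponent estimate targets (twice) the Relativistic paired objective rather than the average one, consistent with the reply above. Since f1 is convex for all three losses mentioned in the post, Jensen’s inequality gives paired ≥ average on the same minibatch, so the two are genuinely different objectives and not just noisier or cleaner estimates of the same quantity.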

