Further investigation is needed, but I suspect some variants of Relativistic average GANs (RaGANs) might be more sensible than the ones I proposed in my paper. If you are using Relativistic GANs, you might also be interested in trying variant 3, which is the most promising.
For simplicity, let’s assume we use the non-saturating loss and that we have symmetry, i.e., f1(-y) = f2(y). (This holds for HingeGAN, LSGAN with -1/1 labels, and the standard GAN with a sigmoid output.)
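For concreteness, these are the standard loss pairs from the paper, each of which satisfies f1(-y) = f2(y):

```latex
% Critic loss pairs satisfying f_1(-y) = f_2(y):
\begin{align*}
\text{HingeGAN:} \quad & f_1(y) = \max(0,\, 1 - y), & f_2(y) &= \max(0,\, 1 + y) \\
\text{LSGAN ($-1/1$ labels):} \quad & f_1(y) = (y - 1)^2, & f_2(y) &= (y + 1)^2 \\
\text{SGAN (sigmoid):} \quad & f_1(y) = -\log \sigma(y), & f_2(y) &= -\log \sigma(-y)
\end{align*}
```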
1) This is the RaGAN formulation proposed in the paper (written out after this list).
2) This variant works as well as the original RaGAN. I know this because I used it by mistake before, and it made no difference in the results. The generator loss doesn’t make much sense, but as discussed in “GANs beyond divergence minimization”, the generator can minimize pretty much anything related to the divergence estimated (the loss function of the discriminator) and it will likely still work. GANs don’t actually minimize the divergence.
3) This variant is the most promising, but I did not have the time to test it. It follows the same divergence as the one above, since it uses the same loss function for the discriminator. The difference is that the generator now wants every fake sample to be a little better than the average of the real samples, which is more sensible (a plausible formulation is sketched after this list).
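The three formulas above were images in the original post. For reference, variant 1 is the RaGAN objective from the paper (C is the critic, P the real distribution, Q the fake distribution); the variant-3 generator loss below it is a reconstruction from the description above, not the exact formula, so take it as a sketch:

```latex
\begin{align*}
L_D^{\text{RaGAN}} &= \mathbb{E}_{x_r \sim P}\big[ f_1\big( C(x_r) - \mathbb{E}_{x_f \sim Q} C(x_f) \big) \big]
                    + \mathbb{E}_{x_f \sim Q}\big[ f_2\big( C(x_f) - \mathbb{E}_{x_r \sim P} C(x_r) \big) \big] \\
L_G^{\text{RaGAN}} &= \mathbb{E}_{x_f \sim Q}\big[ f_1\big( C(x_f) - \mathbb{E}_{x_r \sim P} C(x_r) \big) \big]
                    + \mathbb{E}_{x_r \sim P}\big[ f_2\big( C(x_r) - \mathbb{E}_{x_f \sim Q} C(x_f) \big) \big] \\
L_G^{\text{variant 3}} &= \mathbb{E}_{x_f \sim Q}\big[ f_1\big( C(x_f) - \mathbb{E}_{x_r \sim P} C(x_r) \big) \big]
                    \quad \text{(same $L_D$ as variant 1)}
\end{align*}
```

Since f1 is decreasing in its argument, minimizing the variant-3 generator loss pushes each fake critic score above the average real critic score, which matches the description above.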
Hi! It’s been a couple of months. Did you have a chance to test option #3?
Honestly, I don’t think it’s better: it’s not symmetric anymore, and both variants arise naturally from inequalities depending on which direction you take them. I will release my next paper soon; there are just a few annoying hiccups right now that I have to figure out.
Also, regarding Relativistic average GANs, is computing the average loss of the “other” (real/fake) samples in the minibatch worth the cost? Why not arbitrarily select a single opposing sample and use that to compute an (admittedly crude) estimate of the average loss of the “other” side?
However you look at it, Relativistic “paired” GANs (the non-average kind; I’m just changing the name so it’s less ambiguous) don’t perform anywhere close to Relativistic average GANs. I still don’t know why this is the case. The idea you propose is equivalent to using Relativistic paired GANs; see the sketch below contrasting the two.
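To make the correspondence concrete, here is a minimal PyTorch-style sketch (my own illustration, not code from the paper), assuming the hinge loss and 1-D tensors real_scores and fake_scores of critic outputs. Pairing each real sample with the single fake sample at the same index, as in your proposal, is exactly the paired discriminator loss:

```python
import torch
import torch.nn.functional as F

def ragan_d_loss(real_scores: torch.Tensor, fake_scores: torch.Tensor) -> torch.Tensor:
    """Relativistic average (hinge): compare each sample against the
    mean critic score of the opposite side of the minibatch."""
    loss_real = F.relu(1.0 - (real_scores - fake_scores.mean())).mean()
    loss_fake = F.relu(1.0 + (fake_scores - real_scores.mean())).mean()
    return loss_real + loss_fake

def rpgan_d_loss(real_scores: torch.Tensor, fake_scores: torch.Tensor) -> torch.Tensor:
    """Relativistic paired (hinge): compare each real sample against a
    single fake sample (here, the one at the same index)."""
    return F.relu(1.0 - (real_scores - fake_scores)).mean()
```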
Interesting. It could simply be because there’s less variance in the gradients when you average them. Thank you for your answers.
I don’t think so, as I have been using the minimum-variance unbiased estimator (MVUE) of RpGAN, which takes longer to compute, and the results still don’t compete with RaGANs.
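Concretely, the MVUE here amounts to averaging the paired loss over all N × N real-fake pairs in the minibatch rather than N index-matched pairs, which is what makes it quadratic in the batch size. A minimal sketch, reusing the imports from the snippet above:

```python
def rpgan_d_loss_all_pairs(real_scores: torch.Tensor, fake_scores: torch.Tensor) -> torch.Tensor:
    """Hinge RpGAN discriminator loss averaged over every real-fake pair.
    Lower variance than index-matched pairing, but quadratic cost."""
    diffs = real_scores[:, None] - fake_scores[None, :]  # [N, N] pairwise differences
    return F.relu(1.0 - diffs).mean()
```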
Hi, I really like your work here. What do you think are the drawbacks of RaGANs that future researchers can focus on?
Sorry for the long delay, I just saw this post. RpGANs can solve mode collapse (https://arxiv.org/abs/2011.04926), but both RpGANs and RaGANs have the issue that they can get vanishing gradients, even if they are less likely to do so; a gradient penalty solves this problem. Can we devise a new relativistic variant that solves mode collapse without needing a gradient penalty, is more efficient than RpGAN (which needs O(N^2) real-fake pairs), and either cannot get vanishing gradients or has very informative gradients (see https://arxiv.org/abs/1902.05687; the gradient penalty causes the gradient to be informative)?