When reading this article on Frank-Wolfe's algorithm, I did not understand why Frank-Wolfe's algorithm was devoid of projection, while the gradient descent was not.
I think the problem is that I do not understand what it means for an operation to be without projection.
In linear algebra, I understand that a transformation is a projection if it is idempotent.
I also understand that it's good to have algorithms without projection (this helps reduce complexity) – but I do not know why that's the case.
My question is this: when is an operation without projection and why can we reduce the complexity of non-projection algorithms?