arXiv: Surrogate Gap Minimization Improves Sharpness-Aware Training