arXiv: Robust Preference Optimization with Provable Noise Tolerance for LLMs