arXiv: Discovering Preference Optimization Algorithms with and for Large Language Models