RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control (arXiv)