arXiv SEPT: Towards Efficient Scene Representation Learning for Motion Prediction