[2412.14058] Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models