Teach a VLM to Zoom and Pan

Use an existing VLM to generate a dataset, then train a model on that dataset to select which image regions to add to the input for further processing.
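One possible way to build such a dataset is sketched below. The note does not say how labels are produced, so this is an assumption: the image is split into a fixed grid of candidate regions, an existing VLM scores each one for a given question, and the top-scoring region becomes the training label. The names `RegionExample`, `grid_boxes`, `build_example`, and the `score_region` callable are hypothetical; `score_region` stands in for whatever call wraps the existing VLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# (left, top, right, bottom) in pixels; hypothetical convention for this sketch
Box = Tuple[int, int, int, int]


@dataclass
class RegionExample:
    """One training record for the region-selection model (hypothetical schema)."""
    image_id: str
    question: str
    candidates: List[Box]
    label: int  # index of the candidate the existing VLM scored highest


def grid_boxes(width: int, height: int, rows: int, cols: int) -> List[Box]:
    """Split an image into rows x cols non-overlapping candidate regions."""
    boxes = []
    for r in range(rows):
        for c in range(cols):
            boxes.append((
                c * width // cols,
                r * height // rows,
                (c + 1) * width // cols,
                (r + 1) * height // rows,
            ))
    return boxes


def build_example(
    image_id: str,
    question: str,
    width: int,
    height: int,
    score_region: Callable[[str, str, Box], float],
    rows: int = 2,
    cols: int = 2,
) -> RegionExample:
    """Label one image by asking the scorer to rate every candidate region.

    `score_region` is an assumed wrapper around the existing VLM that returns
    a higher number when the cropped region is more useful for the question.
    """
    candidates = grid_boxes(width, height, rows, cols)
    scores = [score_region(image_id, question, box) for box in candidates]
    # Ties resolve to the first candidate with the maximum score.
    label = max(range(len(candidates)), key=scores.__getitem__)
    return RegionExample(image_id, question, candidates, label)


def stub_scorer(image_id: str, question: str, box: Box) -> float:
    """Placeholder for the VLM call: prefers regions nearer the bottom-right."""
    return float(box[0] + box[1])


example = build_example("img-0", "What does the sign say?", 100, 80, stub_scorer)
```

A fixed grid keeps the sketch short; overlapping or multi-scale candidates would fit the same record shape, since the selector only needs the candidate list and the index of the chosen region.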