Train a small model that can query external sources
Sequence layout: input, query_token, query_embeddings, query_results_token, result_embeddings, …
Bootstrap from an existing LLM and an on-device nearest-neighbor index (embedding tables only)
Curriculum learning: page in different partitions of the data and distill from the large LLM
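
A minimal sketch of the retrieval step and the interleaved sequence, to make the layout concrete. Everything here is an assumption for illustration: the marker strings, the `EmbeddingIndex` class, the `build_sequence` helper, the cosine metric, and the brute-force search (a real on-device index would likely use an approximate method). The query embedding is passed in by hand; in the scheme above the small model would emit it.

```python
import math

# Hypothetical marker tokens, named after the layout in the notes.
QUERY_TOKEN = "<query_token>"
QUERY_RESULTS_TOKEN = "<query_results_token>"


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class EmbeddingIndex:
    """Brute-force nearest-neighbor index over an embedding table (assumed design)."""

    def __init__(self, table):
        # Only embedding vectors are stored, no raw text.
        self.table = list(table)

    def search(self, query, k=1):
        """Return the k rows most similar to `query`, best match first."""
        ranked = sorted(self.table, key=lambda row: cosine(query, row), reverse=True)
        return ranked[:k]


def build_sequence(input_items, query_embedding, index, k=1):
    """Lay out: input, query_token, query embedding, query_results_token, result embeddings."""
    results = index.search(query_embedding, k)
    return list(input_items) + [QUERY_TOKEN, query_embedding, QUERY_RESULTS_TOKEN] + results


# Toy 2-d embedding table and a hand-written query embedding.
index = EmbeddingIndex([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
sequence = build_sequence(["tok_a", "tok_b"], [0.9, 0.1], index, k=2)
print(sequence)
```

The retrieved rows are appended after the results marker so the model can condition on them for the rest of the sequence.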