Synthetic Data Generation

[2410.12896] A Survey on Data Synthesis and Augmentation for Large Language Models
GitHub - amazon-science/synthesizrr: Synthesizing realistic and diverse text-datasets from augmented LLMs