Synthetic Data Generation

[2410.12896] A Survey on Data Synthesis and Augmentation for Large Language Models
GitHub - amazon-science/synthesizrr: Synthesizing realistic and diverse text-datasets from augmented LLMs
GitHub - bespokelabsai/curator: Synthetic Data curation for post-training and structured data extraction