Pretrain on synthetic conversation data
-
Use a cheap LLM to turn Wikipedia articles into conversations between a human and a bot
-
Pretrain on that data with no post-training
-
Distill the cheap LLM into a tiny convo-conversion model for fast generation
-
Try on FineWeb
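
A rough sketch of the article-to-conversation step, to make the data format concrete. Everything named here is an assumption for illustration: the prompt wording, the "Human:"/"Bot:" turn markers, the record layout, and the `generate` callable, which stands in for whatever cheap LLM ends up being used (`fake_llm` is a canned stub, not a real model call). The same prompt/completion records could plausibly double as training pairs for the distilled converter.

```python
import json
import re
from typing import Callable, Dict, List

# Hypothetical prompt; the real wording would need tuning.
PROMPT_TEMPLATE = (
    "Rewrite the article below as a conversation between a human and a bot.\n"
    "Start every turn with 'Human:' or 'Bot:'.\n\n"
    "Article:\n{article}\n\nConversation:\n"
)

# A turn starts with one of the two assumed speaker markers.
TURN_RE = re.compile(r"^(Human|Bot):\s*(.*)$")


def build_prompt(article: str) -> str:
    """Fill the article text into the conversion prompt."""
    return PROMPT_TEMPLATE.format(article=article.strip())


def parse_turns(raw: str) -> List[Dict[str, str]]:
    """Split raw LLM output into speaker turns.

    Lines without a speaker marker are appended to the previous turn;
    unmarked lines before the first turn are dropped.
    """
    turns: List[Dict[str, str]] = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TURN_RE.match(line)
        if match:
            turns.append({"speaker": match.group(1), "text": match.group(2)})
        elif turns:
            turns[-1]["text"] = (turns[-1]["text"] + " " + line).strip()
    return turns


def to_pretraining_text(turns: List[Dict[str, str]]) -> str:
    """Serialize turns as one flat text document, one turn per line."""
    return "\n".join(f"{t['speaker']}: {t['text']}" for t in turns)


def convert_article(article: str, generate: Callable[[str], str]) -> Dict:
    """Run one article through the converter and return a record.

    'prompt' and 'completion' can also serve as a distillation pair.
    """
    prompt = build_prompt(article)
    turns = parse_turns(generate(prompt))
    return {
        "prompt": prompt,
        "completion": to_pretraining_text(turns),
        "turns": turns,
    }


def to_jsonl_line(record: Dict) -> str:
    """One JSON object per line, convenient for large corpora."""
    return json.dumps(record, ensure_ascii=False)


def fake_llm(prompt: str) -> str:
    """Stub generator so the sketch runs without any model."""
    return (
        "Human: What is Paris?\n"
        "Bot: Paris is the capital of France.\n"
        "It lies on the Seine."
    )
```

Calling `convert_article(article_text, fake_llm)` returns a record whose `completion` field is the flat conversation text to pretrain on. Swapping `fake_llm` for a real model call is the only change needed to run it on actual articles.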