• synthetic data
    • LLM-generated text queries / prompts from tool definitions
  • distill into embedding / encoder / cross-encoder models
    • SetFit
    • LoRA
    • transformer
  • or tiny decoder LLM
    • in-context examples with RAG
    • in-context history of previous actions
  • fast serving
    • trie prefix cache
    • structured generation
    • small custom tokenizer
    • [ ]
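
A minimal sketch of the "trie prefix cache" item, assuming it means reusing cached work (for example a KV-cache handle) for the longest already-seen prefix of a token sequence. The class name `TriePrefixCache`, its methods, and the string payloads are hypothetical and only for illustration; a real serving stack would store model state, not strings.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional, Sequence, Tuple


@dataclass
class _Node:
    # One trie node per token ID along a cached prefix.
    children: Dict[int, "_Node"] = field(default_factory=dict)
    value: Optional[Any] = None   # payload cached for the prefix ending here
    has_value: bool = False       # distinguishes "no entry" from a None payload


class TriePrefixCache:
    """Maps token-ID prefixes to cached payloads (hypothetical helper)."""

    def __init__(self) -> None:
        self._root = _Node()

    def insert(self, tokens: Sequence[int], value: Any) -> None:
        # Walk or create one node per token, then attach the payload.
        node = self._root
        for tok in tokens:
            node = node.children.setdefault(tok, _Node())
        node.value = value
        node.has_value = True

    def longest_prefix(self, tokens: Sequence[int]) -> Tuple[int, Optional[Any]]:
        # Return (matched_length, payload) for the longest cached prefix.
        node = self._root
        best_len, best_val = 0, node.value
        for i, tok in enumerate(tokens):
            node = node.children.get(tok)
            if node is None:
                break
            if node.has_value:
                best_len, best_val = i + 1, node.value
        return best_len, best_val


cache = TriePrefixCache()
cache.insert([1, 2, 3], "state-A")
cache.insert([1, 2, 3, 4, 5], "state-B")
hit_len, payload = cache.longest_prefix([1, 2, 3, 4, 9])
```

Here `hit_len` is the number of leading tokens whose cached payload can be reused, so only `tokens[hit_len:]` would need fresh computation. The sketch leaves out eviction and memory limits, which any real cache would need.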