- [2410.02749] Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
- [2410.02089] RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
Code Embedding Models
- voyage-code-3: more accurate code retrieval with lower dimensional, quantized embeddings – Voyage AI
Synthetic Tasks
- remove the function body, have the LLM implement it, validate outputs against the original function (sketch below)
- corrupt a function, have the LLM find and fix the mistakes (AST-based; sketch below)
- translate code into other languages
- commit ⇒ describe issue ⇒ implement commit
- make up tasks ⇒ write tests ⇒ implement code
- fill-in-the-middle tasks (sketch below)
- learn to optimize: best-of-N implementations to write a faster version of an existing function (sketch below)
- existing code ⇒ tests ⇒ new code
- parser-guided generation (reject decoding paths that can’t parse; sketch below)
- GitHub - tree-sitter/tree-sitter: An incremental parsing system for programming tools
- GitHub - lark-parser/lark: Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
- GitHub - amazon-science/incremental-parsing: Incremental Python parser for constrained generation of code by LLMs.
- structured generation (grammar-based): Structured Generation with LLMs
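
A minimal sketch of the first task (strip a function body, have the model reimplement it, validate against the original). `llm_complete` is a hypothetical model call, not a real API; validation here is just output comparison on a few sample inputs.

```python
import ast

def strip_body(source: str) -> str:
    """Replace the first function's body with `...`, keeping the signature."""
    tree = ast.parse(source)
    fn = next(n for n in tree.body if isinstance(n, ast.FunctionDef))
    fn.body = [ast.Expr(value=ast.Constant(value=...))]
    return ast.unparse(ast.fix_missing_locations(tree))

def validate(original_src: str, candidate_src: str, fn_name: str, sample_inputs) -> bool:
    """Run the reference and the candidate on sample inputs and compare outputs."""
    ref_ns, cand_ns = {}, {}
    exec(original_src, ref_ns)
    exec(candidate_src, cand_ns)
    return all(ref_ns[fn_name](*args) == cand_ns[fn_name](*args) for args in sample_inputs)

original = "def clamp(x, lo, hi):\n    return max(lo, min(x, hi))\n"
stub = strip_body(original)
# candidate = llm_complete(stub)   # hypothetical model call to fill in the body
candidate = original               # stand-in so the sketch runs end to end
print(stub)
print(validate(original, candidate, "clamp", [(5, 0, 3), (-1, 0, 3), (2, 0, 3)]))
```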
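
A rough sketch of the AST-based corruption task: mutate one node (here, `<` into `<=`) and keep the (corrupted, original) pair so the model learns to spot and repair the bug. A real pipeline would sample from many mutation types; this single transformer is illustrative only.

```python
import ast

class SwapComparison(ast.NodeTransformer):
    """Corrupt the first `<` comparison into `<=` (one of many possible AST mutations)."""
    def __init__(self):
        self.done = False

    def visit_Compare(self, node):
        self.generic_visit(node)
        if not self.done and isinstance(node.ops[0], ast.Lt):
            node.ops[0] = ast.LtE()
            self.done = True
        return node

source = """
def first_negative(xs):
    for i, x in enumerate(xs):
        if x < 0:
            return i
    return -1
"""

corrupted = ast.unparse(SwapComparison().visit(ast.parse(source)))
print(corrupted)   # training pair: (corrupted, source); the model must restore `<`
```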
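
A small sketch of fill-in-the-middle data construction in prefix/suffix/middle order. The sentinel token names are placeholders, not any particular model's vocabulary.

```python
import random

FIM_PREFIX, FIM_SUFFIX, FIM_MIDDLE = "<fim_prefix>", "<fim_suffix>", "<fim_middle>"

def make_fim_example(code: str, rng: random.Random) -> dict:
    """Cut the code at two random points and ask the model to produce the middle span."""
    i, j = sorted(rng.sample(range(len(code)), 2))
    prefix, middle, suffix = code[:i], code[i:j], code[j:]
    return {
        "prompt": f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}",
        "target": middle,
    }

example = make_fim_example("def add(a, b):\n    return a + b\n", random.Random(0))
print(example["prompt"])
print(repr(example["target"]))
```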
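
A sketch of the learn-to-optimize task: given N candidate rewrites, keep the ones that match the reference on test inputs and pick the fastest. Candidate generation by the model is stubbed out here; `pick_fastest` and the example functions are made up for illustration.

```python
import timeit

def pick_fastest(reference, candidates, test_inputs, repeats=200):
    """Return the fastest candidate that agrees with the reference on all test inputs."""
    correct = [
        c for c in candidates
        if all(c(*args) == reference(*args) for args in test_inputs)
    ]
    if not correct:
        return None
    return min(correct, key=lambda c: timeit.timeit(
        lambda: [c(*args) for args in test_inputs], number=repeats))

def sum_squares_ref(n):
    return sum(i * i for i in range(n))

def sum_squares_closed(n):
    return (n - 1) * n * (2 * n - 1) // 6

best = pick_fastest(sum_squares_ref, [sum_squares_ref, sum_squares_closed], [(10,), (1000,)])
print(best.__name__)
```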
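
The repos above do genuine incremental parsing during decoding; as a much coarser stand-in, this sketch just rejection-samples whole candidates and keeps the ones Python's own parser accepts. `sample_completions` is a hypothetical model call.

```python
import ast

def keep_parsable(candidates: list[str]) -> list[str]:
    """Filter generated code, keeping only candidates that parse as valid Python."""
    good = []
    for code in candidates:
        try:
            ast.parse(code)
            good.append(code)
        except SyntaxError:
            pass
    return good

# candidates = sample_completions(prompt, n=8)   # hypothetical model call
candidates = ["def f(x): return x + 1", "def f(x) return x + 1"]
print(keep_parsable(candidates))
```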