Code LLMs
- [2410.02749] Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
- [2410.02089] RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning
Code Embedding Models
- voyage-code-3: more accurate code retrieval with lower dimensional, quantized embeddings – Voyage AI
Synthetic Tasks
- remove a function body, have the model implement it, validate outputs against the original function
- corrupt function, have LLM find and fix mistakes (AST based)
- translate code into other languages
- take a commit => describe the underlying issue => re-implement the commit from that description
- make up tasks => write tests => implement code
- fill in the middle tasks
- learn to optimize: best-of-N sampling to write faster versions of existing functions
- existing code => tests => new code
- parser guided generation (reject paths that can’t parse)
- GitHub - tree-sitter/tree-sitter: An incremental parsing system for programming tools
- GitHub - lark-parser/lark: Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.
- GitHub - amazon-science/incremental-parsing: Incremental Python parser for constrained generation of code by LLMs.
- structured generation (grammar-based): Structured Generation with LLMs
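The first task above (strip a body, regenerate it, check against the original) can be sketched with the standard `ast` module. The stub format, function names, and equality-on-sample-inputs check are illustrative choices, not from any of the linked papers:

```python
import ast
import textwrap

def strip_body(source: str) -> str:
    """Replace every function body with `...`, leaving a stub for the model to fill in."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            node.body = [ast.Expr(ast.Constant(value=...))]
    return ast.unparse(tree)

def outputs_match(original_src: str, candidate_src: str, fn_name: str, inputs) -> bool:
    """Execute both versions and compare their outputs on sample inputs.
    (A real pipeline would sandbox this and add timeouts.)"""
    env_a, env_b = {}, {}
    exec(original_src, env_a)
    exec(candidate_src, env_b)
    return all(env_a[fn_name](*args) == env_b[fn_name](*args) for args in inputs)

original = textwrap.dedent("""\
    def add(a, b):
        return a + b
    """)

stub = strip_body(original)  # "def add(a, b):\n    ..."
```

Checking against concrete inputs rather than source text lets the model produce any equivalent implementation.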
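The AST-based corruption task can likewise be sketched with an `ast.NodeTransformer`. This toy mutator flips only the first `+` to `-`; an actual generator would sample from a catalog of mutation types (off-by-one bounds, swapped arguments, wrong comparison, etc.):

```python
import ast

class FlipAdd(ast.NodeTransformer):
    """Introduce a single plausible bug: turn the first `+` into `-`.
    The LLM's task is then to find and fix the mistake."""
    def __init__(self):
        self.done = False

    def visit_BinOp(self, node: ast.BinOp) -> ast.BinOp:
        self.generic_visit(node)
        if not self.done and isinstance(node.op, ast.Add):
            node.op = ast.Sub()
            self.done = True
        return node

src = (
    "def total(xs):\n"
    "    s = 0\n"
    "    for x in xs:\n"
    "        s = s + x\n"
    "    return s\n"
)
corrupted = ast.unparse(FlipAdd().visit(ast.parse(src)))
```

Mutating the AST instead of raw text guarantees the corrupted program still parses, so the bug is semantic rather than syntactic.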
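Fill-in-the-middle examples are usually built by cutting a document at two points and reordering the pieces with sentinel tokens (prefix-suffix-middle). The sentinel strings below are placeholders, not any particular tokenizer's special tokens:

```python
import random

SENTINELS = ("<PRE>", "<SUF>", "<MID>")  # placeholders; real models use reserved tokens

def make_fim_example(code: str, rng: random.Random) -> str:
    """Cut the code at two random points and emit a PSM-ordered training string:
    the model sees prefix and suffix, then learns to generate the middle."""
    i, j = sorted(rng.sample(range(len(code) + 1), 2))
    prefix, middle, suffix = code[:i], code[i:j], code[j:]
    pre, suf, mid = SENTINELS
    return f"{pre}{prefix}{suf}{suffix}{mid}{middle}"

example = make_fim_example("def add(a, b):\n    return a + b\n", random.Random(0))
```

Because the transform only reorders substrings, the original document is exactly recoverable from any example, which makes the data cheap to generate at scale.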
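A minimal form of parser-guided generation is rejection sampling over complete candidates. The sketch below filters with a full `ast.parse`; the linked tree-sitter and incremental-parsing projects go further by rejecting invalid *prefixes* during decoding, before a candidate is even finished:

```python
import ast

def parses(candidate: str) -> bool:
    """Accept only candidates that parse as Python.
    (Post-hoc filter; incremental parsers can prune paths mid-generation.)"""
    try:
        ast.parse(candidate)
        return True
    except SyntaxError:
        return False

candidates = [
    "def f(x):\n    return x + 1\n",  # valid
    "def f(x:\n    return x\n",       # broken signature, should be rejected
]
valid = [c for c in candidates if parses(c)]
```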