Tips for Productive ML Teams
Reduce Iteration Time
Iterate on Subsets
- Fewer Classes
- Less Data
- Smaller Models
- expand slowly from there
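A minimal sketch of subsetting, using only the stdlib: keep a few classes, then a fraction of their data (the dataset and `make_subset` helper here are hypothetical, not from any library).

```python
import random

def make_subset(samples, keep_classes, fraction, seed=0):
    """Filter to a few classes, then sample a fraction of what's left."""
    filtered = [(x, y) for x, y in samples if y in keep_classes]
    rng = random.Random(seed)
    k = max(1, int(len(filtered) * fraction))
    return rng.sample(filtered, k)

# Hypothetical 1000-sample, 10-class dataset.
data = [(f"example_{i}", i % 10) for i in range(1000)]

# Start tiny: 2 of 10 classes, 10% of their data -> 20 samples.
subset = make_subset(data, keep_classes={0, 1}, fraction=0.1)
print(len(subset))  # 20
```

Once the tiny version trains and evaluates cleanly, widen `keep_classes` and `fraction` step by step.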
Parallelize
- Pay for 8 GPUs to finish an experiment in one hour instead of running 1 GPU for 8 hours
Make it easy to deploy
- start with simple python services calling pytorch directly
- use ray serve
- avoid early optimizations like onnx, triton-inference-server
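To illustrate "simple python services": a minimal stdlib-only HTTP service sketch. The `predict` function is a placeholder standing in for a real PyTorch forward pass (or a Ray Serve deployment, per the note above).

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(text):
    # Placeholder for a real model call (e.g. a pytorch forward pass).
    return {"label": "positive" if "good" in text else "negative"}

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        result = predict(json.loads(body)["text"])
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep logs quiet for this demo
        pass

# Port 0 = let the OS pick a free port; serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Call the service once as a smoke test.
import urllib.request
port = server.server_address[1]
req = urllib.request.Request(
    f"http://127.0.0.1:{port}",
    data=json.dumps({"text": "good stuff"}).encode(),
    headers={"Content-Type": "application/json"},
)
response = json.loads(urllib.request.urlopen(req).read())
server.shutdown()
print(response)
```

Something this simple is usually enough until profiling proves you need onnx or triton-inference-server.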
Tune Open Models
- LoRA
- SetFit
- GLiNER
Cookiecutter Projects
Friction Points
- lack of data
- lack of evals
- slow to load data
- machine and env startup time
- recovery after crashes
- out of memory
- NaNs
- dependency issues
Sanity Checks
- overfit a few batches
- replicate results on public benchmarks
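The overfit-a-few-batches check, sketched with a toy pure-Python model so the idea is self-contained: if even a tiny model can't drive the loss on a handful of memorized points toward zero, the training loop itself is broken.

```python
# Tiny "model": fit y = w*x + b on a few points by gradient descent.
batch = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # generated from y = 2x + 1

w, b, lr = 0.0, 0.0, 0.1
losses = []
for step in range(1000):
    grad_w = grad_b = loss = 0.0
    for x, y in batch:
        err = (w * x + b) - y
        loss += err * err / len(batch)
        grad_w += 2 * err * x / len(batch)
        grad_b += 2 * err / len(batch)
    w -= lr * grad_w
    b -= lr * grad_b
    losses.append(loss)

# Loss should collapse to ~0; if it plateaus, suspect the loop, not the data.
print(f"first loss {losses[0]:.3f} -> last loss {losses[-1]:.2e}")
```

The same check applies unchanged to a real pytorch loop: overfit 2-3 batches before launching a full run.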
Automate
Experiment Job Queue
Bayesian Optimization
One Click Train Runs
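A toy sketch of an experiment job queue using the stdlib executor; `run_experiment` and the config grid are hypothetical stand-ins for real training jobs (in practice a tool like ray or prefect from the recommendations below would manage this).

```python
from concurrent.futures import ThreadPoolExecutor

def run_experiment(config):
    # Stand-in for a real training job; returns a fake "score".
    return {"config": config, "score": 1.0 / (1.0 + config["lr"])}

# Hypothetical grid of configs queued up; workers drain the queue.
configs = [{"lr": lr} for lr in (0.1, 0.01, 0.001)]

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(run_experiment, c) for c in configs]
    results = [f.result() for f in futures]

best = max(results, key=lambda r: r["score"])
print(best["config"])
```

Queueing experiments this way is also the natural place to plug in Bayesian optimization: have the optimizer propose the next config instead of a fixed grid.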
Set Baselines
Naive Baseline
Standard Architecture Baseline
Top Baseline
Human Baseline
- annotator agreement
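Two of these baselines sketched from scratch: a naive majority-class baseline, and Cohen's kappa as one standard measure of annotator agreement for the human baseline (labels here are made up for illustration).

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)

def cohens_kappa(a, b):
    """Chance-corrected agreement between two annotators."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

labels = ["pos", "pos", "pos", "neg"]   # hypothetical gold labels
other = ["pos", "pos", "neg", "neg"]    # hypothetical second annotator

print(majority_baseline_accuracy(labels))       # 0.75
print(cohens_kappa(labels, other))              # 0.5
```

A model that can't beat the majority baseline has learned nothing; a model near the annotator-agreement ceiling has little headroom left.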
Measure
Segment Your Data
- frequency
- cluster
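A sketch of frequency-based segmentation: aggregate accuracy can look fine while the long tail is broken. The rows and the head/tail cutoff here are hypothetical.

```python
from collections import defaultdict

def accuracy_by_segment(rows, segment_fn):
    """Per-segment accuracy; aggregate metrics hide weak segments."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[segment_fn(row)].append(row["pred"] == row["gold"])
    return {seg: sum(hits) / len(hits) for seg, hits in buckets.items()}

# Hypothetical eval rows with a per-sample frequency count.
rows = [
    {"gold": "A", "pred": "A", "freq": 900},
    {"gold": "B", "pred": "B", "freq": 850},
    {"gold": "C", "pred": "C", "freq": 12},
    {"gold": "D", "pred": "X", "freq": 3},
]

by_freq = accuracy_by_segment(
    rows, lambda r: "head" if r["freq"] >= 100 else "tail"
)
print(by_freq)  # head looks perfect; the tail shows where the model struggles
```

The same helper works with a cluster id as the segment function instead of a frequency bucket.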
Black box gold test set
Gamify
Internal Evals and Leaderboards
Share Learnings and Knowledge
Invest in Tooling - Especially Data
Recommendations
- uv
- ray
- skypilot
- modal
- duckdb
- pytorch
- prefect
- msgspec
Look at the Data
- analyze errors
- look for disagreements between different models
- look at samples with top losses
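The top-losses triage sketched in a few lines: sort eval samples by per-sample loss and inspect the worst first. High-loss samples are often mislabeled data or genuinely hard cases (the sample records here are made up).

```python
def top_losses(samples, k=3):
    """Samples sorted by loss, highest first."""
    return sorted(samples, key=lambda s: s["loss"], reverse=True)[:k]

# Hypothetical per-sample losses from an eval run.
samples = [
    {"id": 1, "loss": 0.02},
    {"id": 2, "loss": 4.10},
    {"id": 3, "loss": 0.31},
    {"id": 4, "loss": 2.75},
]

for s in top_losses(samples, k=2):
    print(s["id"], s["loss"])  # inspect these by hand first
```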
Build Feedback Loops
Ship Early and Often
- get production data as soon as possible
- ship the best available large model ASAP; plan on replacing or improving it later if it's too slow / expensive
Build Demos and POCs
- streamlit
- jupyter
- gradio
Logs Logs Logs
- track experiments
- track outputs
- index embeddings / predictions from multiple models
Data Data Data
- store raw data, don’t normalize anything ahead of time
- use uuidv7 as data ids
- datasets become time slices
- synthetic
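A from-scratch sketch of why uuidv7 ids make datasets time slices: the id's high bits are a millisecond timestamp, so sorting ids sorts by creation time, and "everything before id X" is a reproducible snapshot. This assumes no uuid7 helper is available in your Python version; the bit layout follows the UUIDv7 format (48-bit timestamp, version, random bits).

```python
import os
import time
import uuid

def uuid7(ts_ms=None):
    """Minimal UUIDv7 sketch: 48-bit unix-ms timestamp + random bits."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF
    rand_b = int.from_bytes(os.urandom(8), "big") & 0x3FFF_FFFF_FFFF_FFFF
    value = (ts_ms & 0xFFFF_FFFF_FFFF) << 80   # 48-bit ms timestamp
    value |= 0x7 << 76                          # version 7
    value |= rand_a << 64                       # 12 random bits
    value |= 0b10 << 62                         # RFC 4122 variant
    value |= rand_b                             # 62 random bits
    return uuid.UUID(int=value)

# Ids generated later sort after earlier ones
# (timestamps injected here to make the ordering deterministic).
ids = [uuid7(ts_ms=t) for t in (1_000, 2_000, 3_000)]
print(sorted(ids) == ids)  # True
```

With time-ordered ids, "the dataset as of last Tuesday" is just a range query, with no separate snapshotting machinery.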
Resist the urge to get off the beaten path
Don’t do research until you need to
- There’s a whole industry built around transformers; trying a new architecture means reinventing most of it yourself, including optimizer kernels, serving infrastructure, and fine-tuning libraries