Tips for Productive ML Teams
Reduce Iteration Time
Iterate on Subsets
- Fewer Classes
- Less Data
- Smaller Models
- expand slowly from there
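A minimal sketch of subsetting, using only the stdlib: keep a few classes, then a fraction of their data (the dataset and `make_subset` helper here are hypothetical, not from any library).

```python
import random

def make_subset(samples, keep_classes, fraction, seed=0):
    """Filter to a few classes, then sample a fraction of what's left."""
    filtered = [(x, y) for x, y in samples if y in keep_classes]
    rng = random.Random(seed)
    k = max(1, int(len(filtered) * fraction))
    return rng.sample(filtered, k)

# Hypothetical 1000-sample, 10-class dataset.
data = [(f"example_{i}", i % 10) for i in range(1000)]

# Start tiny: 2 of 10 classes, 10% of their data -> 20 samples.
subset = make_subset(data, keep_classes={0, 1}, fraction=0.1)
print(len(subset))  # 20
```

Once the tiny version trains and evaluates cleanly, widen `keep_classes` and `fraction` step by step.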
Parallelize
- Pay for 8 GPUs to finish an experiment in one hour instead of running 1 GPU for 8 hours
Make it easy to deploy
- start with simple python services calling pytorch directly
- use ray serve
- avoid early optimizations like onnx, triton-inference-server
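To illustrate "simple python services": a minimal stdlib-only HTTP service sketch. The `predict` function is a placeholder standing in for a real PyTorch forward pass (or a Ray Serve deployment, per the note above).

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

def predict(text):
    # Placeholder for a real model call (e.g. a pytorch forward pass).
    return {"label": "positive" if "good" in text else "negative"}

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        result = predict(json.loads(body)["text"])
        payload = json.dumps(result).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):  # keep logs quiet for this demo
        pass

# Port 0 = let the OS pick a free port; serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Call the service once as a smoke test.
import urllib.request
port = server.server_address[1]
req = urllib.request.Request(
    f"http://127.0.0.1:{port}",
    data=json.dumps({"text": "good stuff"}).encode(),
    headers={"Content-Type": "application/json"},
)
response = json.loads(urllib.request.urlopen(req).read())
server.shutdown()
print(response)
```

Something this simple is usually enough until profiling proves you need onnx or triton-inference-server.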
Tune Open Models
- LoRA
- SetFit
- GLiNER
Cookiecutter Projects
Friction Points
- lack of data
- lack of evals
- slow to load data
- machine and env startup time
- recovery after crashes
- out of memory
- NaNs
- dependency issues
Sanity Checks
- overfit a few batches
- replicate results on public benchmarks
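The overfit-a-few-batches check, sketched with a toy pure-Python model so the idea is self-contained: if even a tiny model can't drive the loss on a handful of memorized points toward zero, the training loop itself is broken.

```python
# Tiny "model": fit y = w*x + b on a few points by gradient descent.
batch = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]  # generated from y = 2x + 1

w, b, lr = 0.0, 0.0, 0.1
losses = []
for step in range(1000):
    grad_w = grad_b = loss = 0.0
    for x, y in batch:
        err = (w * x + b) - y
        loss += err * err / len(batch)
        grad_w += 2 * err * x / len(batch)
        grad_b += 2 * err / len(batch)
    w -= lr * grad_w
    b -= lr * grad_b
    losses.append(loss)

# Loss should collapse to ~0; if it plateaus, suspect the loop, not the data.
print(f"first loss {losses[0]:.3f} -> last loss {losses[-1]:.2e}")
```

The same check applies unchanged to a real pytorch loop: overfit 2-3 batches before launching a full run.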
Automate
Experiment Job Queue
Bayesian Optimization
One Click Train Runs
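A toy sketch of an experiment job queue using the stdlib executor; `run_experiment` and the config grid are hypothetical stand-ins for real training jobs (in practice a tool like ray or prefect from the recommendations below would manage this).

```python
from concurrent.futures import ThreadPoolExecutor

def run_experiment(config):
    # Stand-in for a real training job; returns a fake "score".
    return {"config": config, "score": 1.0 / (1.0 + config["lr"])}

# Hypothetical grid of configs queued up; workers drain the queue.
configs = [{"lr": lr} for lr in (0.1, 0.01, 0.001)]

with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(run_experiment, c) for c in configs]
    results = [f.result() for f in futures]

best = max(results, key=lambda r: r["score"])
print(best["config"])
```

Queueing experiments this way is also the natural place to plug in Bayesian optimization: have the optimizer propose the next config instead of a fixed grid.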
Set Baselines
Naive Baseline
Standard Architecture Baseline
Top Baseline
Human Baseline
- annotator agreement
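Two of these baselines sketched from scratch: a naive majority-class baseline, and Cohen's kappa as one standard measure of annotator agreement for the human baseline (labels here are made up for illustration).

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)

def cohens_kappa(a, b):
    """Chance-corrected agreement between two annotators."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

labels = ["pos", "pos", "pos", "neg"]   # hypothetical gold labels
other = ["pos", "pos", "neg", "neg"]    # hypothetical second annotator

print(majority_baseline_accuracy(labels))       # 0.75
print(cohens_kappa(labels, other))              # 0.5
```

A model that can't beat the majority baseline has learned nothing; a model near the annotator-agreement ceiling has little headroom left.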
Measure
Segment Your Data
- frequency
- cluster
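A sketch of frequency-based segmentation: aggregate accuracy can look fine while the long tail is broken. The rows and the head/tail cutoff here are hypothetical.

```python
from collections import defaultdict

def accuracy_by_segment(rows, segment_fn):
    """Per-segment accuracy; aggregate metrics hide weak segments."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[segment_fn(row)].append(row["pred"] == row["gold"])
    return {seg: sum(hits) / len(hits) for seg, hits in buckets.items()}

# Hypothetical eval rows with a per-sample frequency count.
rows = [
    {"gold": "A", "pred": "A", "freq": 900},
    {"gold": "B", "pred": "B", "freq": 850},
    {"gold": "C", "pred": "C", "freq": 12},
    {"gold": "D", "pred": "X", "freq": 3},
]

by_freq = accuracy_by_segment(
    rows, lambda r: "head" if r["freq"] >= 100 else "tail"
)
print(by_freq)  # head looks perfect; the tail shows where the model struggles
```

The same helper works with a cluster id as the segment function instead of a frequency bucket.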
Black box gold test set
Gamify
Internal Evals and Leaderboards
Share Learnings and Knowledge
Invest in Tooling - Especially Data
Recommendations
- uv
- ray
- skypilot
- modal
- duckdb
- pytorch
- prefect
- msgspec
Look at the Data
- analyze errors
- look for disagreements between different models
- look at samples with top losses
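The top-losses triage sketched in a few lines: sort eval samples by per-sample loss and inspect the worst first. High-loss samples are often mislabeled data or genuinely hard cases (the sample records here are made up).

```python
def top_losses(samples, k=3):
    """Samples sorted by loss, highest first."""
    return sorted(samples, key=lambda s: s["loss"], reverse=True)[:k]

# Hypothetical per-sample losses from an eval run.
samples = [
    {"id": 1, "loss": 0.02},
    {"id": 2, "loss": 4.10},
    {"id": 3, "loss": 0.31},
    {"id": 4, "loss": 2.75},
]

for s in top_losses(samples, k=2):
    print(s["id"], s["loss"])  # inspect these by hand first
```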
Build Feedback Loops
Ship Early and Often
- get production data as soon as possible
- ship the best available large model ASAP; plan on replacing or improving it later if it's too slow / expensive
Build Demos and POCs
- streamlit
- jupyter
- gradio
Logs Logs Logs
- track experiments
- track outputs
- index embeddings / predictions from multiple models
Data Data Data
- store raw data, don’t normalize anything ahead of time
- use uuidv7 as data ids
- datasets become time slices
- synthetic
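A from-scratch sketch of why uuidv7 ids make datasets time slices: the id's high bits are a millisecond timestamp, so sorting ids sorts by creation time, and "everything before id X" is a reproducible snapshot. This assumes no uuid7 helper is available in your Python version; the bit layout follows the UUIDv7 format (48-bit timestamp, version, random bits).

```python
import os
import time
import uuid

def uuid7(ts_ms=None):
    """Minimal UUIDv7 sketch: 48-bit unix-ms timestamp + random bits."""
    if ts_ms is None:
        ts_ms = time.time_ns() // 1_000_000
    rand_a = int.from_bytes(os.urandom(2), "big") & 0x0FFF
    rand_b = int.from_bytes(os.urandom(8), "big") & 0x3FFF_FFFF_FFFF_FFFF
    value = (ts_ms & 0xFFFF_FFFF_FFFF) << 80   # 48-bit ms timestamp
    value |= 0x7 << 76                          # version 7
    value |= rand_a << 64                       # 12 random bits
    value |= 0b10 << 62                         # RFC 4122 variant
    value |= rand_b                             # 62 random bits
    return uuid.UUID(int=value)

# Ids generated later sort after earlier ones
# (timestamps injected here to make the ordering deterministic).
ids = [uuid7(ts_ms=t) for t in (1_000, 2_000, 3_000)]
print(sorted(ids) == ids)  # True
```

With time-ordered ids, "the dataset as of last Tuesday" is just a range query, with no separate snapshotting machinery.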
Resist the urge to get off the beaten path
Don’t do research until you need to
- There’s a whole industry built around transformers; trying a new architecture means reinventing most of it yourself, including optimizer kernels, serving infrastructure, and fine-tuning libraries