Forward Mode Auto Diff
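Derivatives are computed alongside the values during the forward pass: each intermediate carries its derivative with respect to one chosen input.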
Cons
- Each input variable needs its own full forward pass to get its partials, so the cost grows with the number of inputs
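A minimal sketch of forward mode using dual numbers, to make the per-input cost concrete. The `Dual` class, the example function `f`, and the helper `forward_grad` are illustrative names, not from any library; only `+` and `*` are supported.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    val: float  # primal value
    dot: float  # derivative w.r.t. the currently seeded input

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)


def f(x, y):
    # Hypothetical example function: f(x, y) = x*y + x^2
    return x * y + x * x


def forward_grad(fn, inputs):
    """Run one full forward pass per input, seeding that input's dot with 1."""
    grads = []
    for i in range(len(inputs)):
        args = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(inputs)]
        grads.append(fn(*args).dot)
    return grads


grad = forward_grad(f, [3.0, 2.0])  # [df/dx, df/dy] at (3, 2)
```

Here `f` is evaluated twice, once per input, which is the cost noted above.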
Reverse Mode Auto Diff
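A forward pass records the graph, then gradients are propagated from the output back to the inputs, visiting nodes in reverse topological order. One backward pass gives the partials of a scalar output with respect to every input.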
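A minimal sketch of reverse mode on a scalar graph, assuming only `+` and `*`. The `Node` class and `backward` function are illustrative names, not a real library's API.

```python
class Node:
    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents  # tuple of (parent_node, local_gradient)
        self.grad = 0.0

    def __add__(self, other):
        return Node(self.val + other.val, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Node(self.val * other.val,
                    ((self, other.val), (other, self.val)))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node."""
    order, seen = [], set()

    def visit(node):
        # Post-order DFS gives a topological order (inputs first).
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    # Walk in reverse topological order so each node's grad is complete
    # before it is pushed to its parents (chain rule).
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


x, y = Node(3.0), Node(2.0)
z = x * y + x * x
backward(z)  # fills x.grad and y.grad in a single backward pass
```

Unlike forward mode, a single call to `backward` fills in the gradient for both `x` and `y`.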
Dynamic vs Static Graph
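A dynamic graph is rebuilt on every iteration as the code runs, so its shape can follow data-dependent control flow. A static graph is defined once up front and reused.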
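A toy sketch of the dynamic case, assuming a hypothetical `Tape` that just records operation names. A fresh tape is created on each iteration, and the recorded graph changes size with the loop count.

```python
class Tape:
    """Hypothetical recorder: logs each op as it executes."""

    def __init__(self):
        self.ops = []

    def mul(self, a, b):
        self.ops.append("mul")
        return a * b


def power(tape, x, n):
    # Ordinary Python control flow decides how many ops get recorded.
    result = 1.0
    for _ in range(n):
        result = tape.mul(result, x)
    return result


graph_sizes = []
for n in [1, 3, 2]:
    tape = Tape()  # graph is rebuilt from scratch on each iteration
    power(tape, 2.0, n)
    graph_sizes.append(len(tape.ops))
```

Each iteration records a different number of operations, which a graph fixed ahead of time could not do without dedicated control-flow nodes.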