Forward Mode Auto Diff

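Compute derivatives alongside values in the forward pass: each intermediate result carries its value and its derivative with respect to one chosen input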

Cons

  • a separate forward pass is needed for each input variable, so the full gradient of a function with n inputs costs n passes
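A minimal sketch of forward mode using dual numbers, to make the per-input cost concrete. The names `Dual`, `f`, and `forward_grad` are illustrative assumptions, not part of any library, and only `+` and `*` are supported.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative w.r.t. one chosen input (hypothetical helper)."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val, self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def f(x, y):
    # Example function chosen for illustration: f(x, y) = x*y + 3x
    return x * y + 3 * x


def forward_grad(func, inputs):
    """Run one forward pass per input, seeding that input's derivative with 1."""
    grads = []
    for i in range(len(inputs)):
        args = [Dual(v, 1.0 if j == i else 0.0) for j, v in enumerate(inputs)]
        grads.append(func(*args).dot)
    return grads


grad = forward_grad(f, [2.0, 5.0])  # two inputs, so two forward passes
```

Here `grad` holds the partials with respect to x and y; the loop in `forward_grad` is the "one pass per input" cost noted above.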

Reverse Mode Auto Diff

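Run the forward pass first and record intermediate values, then propagate derivatives from the output back to the inputs in reverse topological order. A single backward pass gives the derivative of one output with respect to all inputs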
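A minimal sketch of reverse mode, assuming a scalar output and only `+` and `*`. The class `Var` and the function `backward` are hypothetical names for illustration; real frameworks are considerably more involved.

```python
class Var:
    """A graph node that remembers its parents and the local derivative to each."""

    def __init__(self, val, parents=()):
        self.val = val
        self.parents = parents  # tuple of (parent Var, d(self)/d(parent))
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.val + other.val, ((self, 1.0), (other, 1.0)))

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.val * other.val, ((self, other.val), (other, self.val)))

    __rmul__ = __mul__


def backward(out):
    """Accumulate d(out)/d(node) into node.grad for every node reachable from out."""
    order, seen = [], set()

    def visit(node):
        # Depth-first post-order yields a topological order (parents before children).
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(out)
    out.grad = 1.0
    # Walk in reverse topological order so a node's grad is complete before it is used.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


x, y = Var(2.0), Var(5.0)
z = x * y + 3 * x
backward(z)  # one backward pass fills in both x.grad and y.grad
```

Unlike the forward-mode case, both partial derivatives come out of the single call to `backward`.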

Dynamic vs Static Graph

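Dynamic: the computation graph is rebuilt on each iteration as the code runs, so ordinary control flow can change its shape. Static: the graph is defined once up front and reused on every iteration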
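A toy sketch of the dynamic case, assuming a graph can be approximated by a list of recorded operation names. `Traced` and `step` are hypothetical names; the point is only that each call records a fresh graph whose shape depends on ordinary Python control flow.

```python
class Traced:
    """A value that appends every operation applied to it onto a shared tape."""

    def __init__(self, val, tape):
        self.val = val
        self.tape = tape

    def __mul__(self, other):
        other_val = other.val if isinstance(other, Traced) else other
        self.tape.append("mul")  # record the op as it executes
        return Traced(self.val * other_val, self.tape)


def step(x0, n_repeats):
    tape = []                    # a fresh graph is built for this iteration
    x = Traced(x0, tape)
    for _ in range(n_repeats):   # a plain Python loop decides the graph's size
        x = x * 2.0
    return x.val, tape


val_a, graph_a = step(1.0, 2)
val_b, graph_b = step(1.0, 3)
```

The two calls record graphs of different lengths. A static-graph system would instead fix the structure before any iteration runs.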