Review of Architectures
- Network in Network
  - 1x1 convs: per-pixel fully connected layers over the channel dim (sketch after this list)
- ResNet
  - residual block: output = F(x) + x via a skip connection
  - bottleneck block: 1x1 reduce -> 3x3 -> 1x1 expand (sketch after this list)
- SENet
  - channel-wise attention (global): global average pool -> small MLP -> per-channel gates (sketch after this list)
- MobileNetV2
  - inverted residual (MBConv): 1x1 expand -> 3x3 depthwise -> 1x1 project, skip between the narrow ends (sketch after this list)
  - depthwise convolutions
    - 3x3 convs
    - Winograd convolution: fast algorithm that cuts multiplications for small (3x3) convs
- EfficientNet
  - poor computational efficiency on GPUs
  - reducing activation size reduces latency, since memory movement and large intermediates dominate (see the arithmetic sketch after this list)
  - wider models have more compute per activation
  - depth-first execution of kernels avoids materializing large intermediate activations
    - can't be used for global ops like SE attention, since the global pool needs the whole feature map
- ConvNeXt
  - ResNeXt
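A minimal sketch of the Network in Network idea above: a 1x1 ("pointwise") convolution is a per-pixel linear layer over channels, with no spatial mixing. PyTorch; the channel and spatial sizes are arbitrary examples, not from any paper.

```python
import torch
import torch.nn as nn

# A 1x1 conv mixes channels independently at each spatial position,
# i.e. it is a per-pixel fully connected layer across the channel dim.
pointwise = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=1)

x = torch.randn(1, 64, 56, 56)
y = pointwise(x)
print(y.shape)  # torch.Size([1, 128, 56, 56]) -- channels remapped, spatial dims untouched
```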
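A sketch of the ResNet residual/bottleneck block (PyTorch; the class name, channel sizes, and BN/ReLU placement follow the common recipe and are my assumptions, not taken from this paper):

```python
import torch
import torch.nn as nn

class BottleneckBlock(nn.Module):
    """ResNet-style bottleneck: 1x1 reduce -> 3x3 -> 1x1 expand, with a skip connection."""
    def __init__(self, channels, bottleneck=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, bottleneck, 1, bias=False),                # 1x1 reduce channels
            nn.BatchNorm2d(bottleneck),
            nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, bottleneck, 3, padding=1, bias=False),   # 3x3 on the narrow tensor
            nn.BatchNorm2d(bottleneck),
            nn.ReLU(inplace=True),
            nn.Conv2d(bottleneck, channels, 1, bias=False),                # 1x1 expand back
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # residual block: output = F(x) + x
        return self.relu(self.body(x) + x)

x = torch.randn(1, 256, 28, 28)
print(BottleneckBlock(256)(x).shape)  # torch.Size([1, 256, 28, 28])
```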
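A sketch of SENet's channel-wise attention (PyTorch; the class name and reduction ratio are assumptions). The global average pool is what makes this a "global" op: it needs the whole feature map, which is why depth-first/tiled execution can't cover it.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: global average pool -> tiny MLP -> per-channel gates."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))            # squeeze: global spatial average -> (N, C)
        w = self.fc(s).view(n, c, 1, 1)   # excite: per-channel weights in (0, 1)
        return x * w                      # reweight the feature maps channel-wise

x = torch.randn(2, 64, 32, 32)
print(SEBlock(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```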
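A sketch of the MobileNetV2 inverted residual (MBConv) block (PyTorch; names and the expansion factor are assumptions). The point that matters for the kernel discussion below is that the intermediate activation is `expand` times wider than the block's input/output.

```python
import torch
import torch.nn as nn

class InvertedResidual(nn.Module):
    """MobileNetV2 MBConv: 1x1 expand -> 3x3 depthwise -> 1x1 project, skip on the narrow ends."""
    def __init__(self, channels, expand=6):
        super().__init__()
        hidden = channels * expand
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),        # 1x1 expand (wide intermediate)
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1,
                      groups=hidden, bias=False),              # 3x3 depthwise: one filter per channel
            nn.BatchNorm2d(hidden),
            nn.ReLU6(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),        # 1x1 project back down
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        # inverted residual: the skip connects the narrow input/output,
        # while the expand-times-wider tensor only exists inside the block
        return x + self.block(x)

x = torch.randn(1, 32, 56, 56)
print(InvertedResidual(32)(x).shape)  # torch.Size([1, 32, 56, 56])
```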
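A rough back-of-the-envelope for the EfficientNet-on-GPU bullets: FLOPs per byte of activation moved, for a dense vs a depthwise 3x3 conv. The sizes and fp16 assumption are illustrative, not numbers from the paper.

```python
# Back-of-the-envelope arithmetic intensity for one 3x3 conv on a 56x56 feature map
# (illustrative sizes; fp16 activations assumed at 2 bytes/element).

def conv_stats(h, w, c_in, c_out, k=3, depthwise=False, bytes_per_elem=2):
    """Return (MFLOPs, activation MB moved, FLOPs per activation byte)."""
    if depthwise:
        c_out = c_in
        flops = 2 * h * w * c_in * k * k              # one k x k filter per channel
    else:
        flops = 2 * h * w * c_in * c_out * k * k      # dense conv mixes all channels
    act_bytes = (h * w * c_in + h * w * c_out) * bytes_per_elem  # read input + write output
    return flops / 1e6, act_bytes / 1e6, flops / act_bytes

for name, dw in [("dense 3x3", False), ("depthwise 3x3", True)]:
    mflops, mb, fpb = conv_stats(56, 56, 128, 128, depthwise=dw)
    print(f"{name:14s} {mflops:8.1f} MFLOPs  {mb:5.2f} MB activations  {fpb:7.1f} FLOPs/byte")

# dense 3x3        924.8 MFLOPs   1.61 MB activations    576.0 FLOPs/byte
# depthwise 3x3      7.2 MFLOPs   1.61 MB activations      4.5 FLOPs/byte
```

The depthwise layer moves the same activations but does ~128x less math, so on a GPU it is bound by memory movement rather than compute; this is why shrinking activations, or fusing layers so intermediates never reach DRAM, is what cuts latency.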
Contributions
- fused GPU kernels for MBConv and FusedMBConv blocks, exploiting temporal locality and reducing workspace size to avoid spilling to DRAM
  - instead of materializing the full intermediate activations, the fused kernel computes the block in a depth-first fashion, eliminating the large intermediates (sketch below)
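A Python-level illustration of the depth-first idea only, not the actual fused CUDA kernels: process one row band (plus a one-row halo for the 3x3 depthwise conv) through expand -> depthwise -> project before moving to the next band, so the wide expanded tensor only ever exists at tile size. Function and variable names are mine.

```python
import torch
import torch.nn.functional as F

def mbconv_depth_first(x, w_expand, w_dw, w_project, tile_h=8):
    """Tiled ("depth-first") evaluation of expand -> depthwise 3x3 -> project.

    Only a (tile_h + halo) x W slice of the wide expanded tensor is ever live,
    instead of the full H x W x hidden intermediate."""
    _, _, H, _ = x.shape
    out_rows = []
    for top in range(0, H, tile_h):
        lo = max(top - 1, 0)                      # one halo row above for the 3x3 conv
        hi = min(top + tile_h + 1, H)             # one halo row below
        tile = x[:, :, lo:hi, :]
        t = F.relu(F.conv2d(tile, w_expand))                                # 1x1 expand
        t = F.relu(F.conv2d(t, w_dw, padding=1, groups=w_dw.shape[0]))      # 3x3 depthwise
        t = F.conv2d(t, w_project)                                          # 1x1 project
        keep = min(tile_h, H - top)
        out_rows.append(t[:, :, top - lo: top - lo + keep, :])              # drop halo rows
    return x + torch.cat(out_rows, dim=2)         # residual add on the narrow tensor

# Quick check against the straightforward (fully materialized) evaluation.
C, hidden, H, W = 16, 96, 32, 32
x = torch.randn(1, C, H, W)
w_e = torch.randn(hidden, C, 1, 1)
w_d = torch.randn(hidden, 1, 3, 3)
w_p = torch.randn(C, hidden, 1, 1)
ref = x + F.conv2d(F.relu(F.conv2d(F.relu(F.conv2d(x, w_e)), w_d, padding=1, groups=hidden)), w_p)
print(torch.allclose(mbconv_depth_first(x, w_e, w_d, w_p), ref, rtol=1e-4, atol=1e-4))  # True
```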