michal.i/o
softmax
Oct 16, 2024
Log Softmax
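Here is a minimal sketch in plain Python. It assumes the input is a non-empty 1-D list of finite float logits, and the function name `log_softmax` is illustrative, not taken from any library. Taking `log` of an ordinary softmax can overflow in `exp` or underflow to `log(0)`. For that reason log-softmax is usually computed directly as x_i - m - log(sum_j exp(x_j - m)), with m = max(x).

```python
import math


def log_softmax(x: list[float]) -> list[float]:
    """Numerically stable log-softmax of a 1-D list of logits (assumed non-empty and finite)."""
    m = max(x)  # shift by the max so the largest exponent is exp(0) = 1
    # log of the softmax denominator: m + log(sum_j exp(x_j - m))
    log_norm = m + math.log(sum(math.exp(v - m) for v in x))
    return [v - log_norm for v in x]


# A naive math.log(softmax(...)) would overflow in math.exp(1000.0); this does not.
print(log_softmax([1000.0, 1000.0]))  # both entries are approximately -log(2), about -0.6931
```

Frameworks ship their own fused versions, for example `torch.nn.functional.log_softmax` in PyTorch. This sketch only shows the arithmetic.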
Log Sum Exp
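The max-shift trick also gives a stable log-sum-exp on its own. It rests on the identity log(sum_i exp(x_i)) = m + log(sum_i exp(x_i - m)), which holds for any constant m. Choosing m = max(x) keeps every exponent at or below 0. Below is a plain-Python sketch under the same list-of-floats assumption. `logsumexp` is a hypothetical helper name.

```python
import math


def logsumexp(x: list[float]) -> float:
    """Stable log(sum_i exp(x_i)) for a non-empty list of floats."""
    m = max(x)
    if math.isinf(m):
        # All entries are -inf, or the max is +inf: x_i - m would give nan,
        # while m itself is already the correct answer.
        return m
    return m + math.log(sum(math.exp(v - m) for v in x))


print(logsumexp([1000.0, 1000.0]))  # about 1000.6931, i.e. 1000 + log(2)
# The naive math.log(sum(math.exp(v) for v in [1000.0, 1000.0])) raises OverflowError.
```

PyTorch exposes the same operation as `torch.logsumexp`, and SciPy as `scipy.special.logsumexp`.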
Online Softmax
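Here is a sketch of the online (single-pass) softmax normalizer. It is plain Python, it assumes finite float logits, and all function names are illustrative. The usual approach makes one pass to find the max and a second to sum the exponentials. The online version instead keeps a running max m and a running sum d, and rescales d by exp(m_old - m_new) whenever the max grows. Two (m, d) states can be merged with the same rescaling, so chunks can be processed independently. Tiled attention kernels such as FlashAttention build on this property.

```python
import math


def softmax_state(x: list[float]) -> tuple[float, float]:
    """One streaming pass: returns (m, d) with m = max(x) and d = sum_j exp(x_j - m)."""
    m, d = float("-inf"), 0.0
    for v in x:
        m_new = max(m, v)
        # Rescale the old sum to the new max, then add the new element's term.
        d = d * math.exp(m - m_new) + math.exp(v - m_new)
        m = m_new
    return m, d


def merge_states(a: tuple[float, float], b: tuple[float, float]) -> tuple[float, float]:
    """Combine the (m, d) states of two chunks, as if they had been seen in one pass."""
    (ma, da), (mb, db) = a, b
    m = max(ma, mb)
    return m, da * math.exp(ma - m) + db * math.exp(mb - m)


def online_softmax(x: list[float]) -> list[float]:
    m, d = softmax_state(x)  # the normalizer needs only one pass over x
    return [math.exp(v - m) / d for v in x]  # writing the outputs is a second pass


x = [3.0, 1.0, 4.0, 1.0, 5.0]
# Splitting x into two chunks and merging gives the same (m, d) as a single pass.
print(merge_states(softmax_state(x[:2]), softmax_state(x[2:])))
print(softmax_state(x))
```

Producing the normalized outputs still takes a pass over x. What the online form removes is the separate max-finding pass. `merge_states` adds the ability to combine per-chunk statistics.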