michal.i/o

    Tag: torch


    1 item with this tag.

    • Jan 21, 2025

      2024-08-19

      • ssms
      • distillation
      • label-noise
      • selfsup
      • torch
      • quantization
      • python
      • uv
      • packaging

    Created with Quartz v4.4.0 © 2025