## Self-supervised Learning for Images
| No. | Model Name | Title | Links | Pub. | Organization | Release Time |
|---|---|---|---|---|---|---|
| 1 | iGPT | Generative Pretraining from Pixels | paper code | ICML 2020 | OpenAI | 17 June 2020 |
| 2 | MST | MST: Masked Self-Supervised Transformer for Visual Representation | paper | NeurIPS 2021 | Chinese Academy of Sciences | 10 June 2021 |
| 3 | BEiT | BEiT: BERT Pre-Training of Image Transformers | paper code | ICLR 2022 | Microsoft Research | 15 June 2021 |
| 4 | MAE | Masked Autoencoders Are Scalable Vision Learners | paper code | CVPR 2022 | Meta | 11 Nov 2021 |
| 5 | iBOT | iBOT: Image BERT Pre-Training with Online Tokenizer | paper code | ICLR 2022 | ByteDance | 15 Nov 2021 |
| 6 | SimMIM | SimMIM: A Simple Framework for Masked Image Modeling | paper code | arXiv | MSRA | 18 Nov 2021 |
| 7 | PeCo | PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers | paper | arXiv | University of Science and Technology of China | 24 Nov 2021 |
| 8 | MaskFeat | Masked Feature Prediction for Self-Supervised Visual Pre-Training | paper | arXiv | Meta | 16 Dec 2021 |
| 9 | SplitMask | Are Large-scale Datasets Necessary for Self-Supervised Pre-training? | paper | arXiv | Meta | 20 Dec 2021 |
| 10 | ADIOS | Adversarial Masking for Self-Supervised Learning | paper | ICML 2022 | University of Oxford | 31 Jan 2022 |
| 11 | CAE | Context Autoencoder for Self-Supervised Representation Learning | paper | arXiv | Peking University | 7 Feb 2022 |
| 12 | CIM | Corrupted Image Modeling for Self-Supervised Visual Pre-Training | paper code | arXiv | Microsoft | 7 Feb 2022 |
| 13 | ConvMAE | ConvMAE: Masked Convolution Meets Masked Autoencoders | paper code | arXiv | Shanghai AI Laboratory | 19 May 2022 |
| 14 | Uniform Masking | Uniform Masking: Enabling MAE Pre-training for Pyramid-based Vision Transformers with Locality | paper code | arXiv | Nanjing University of Science and Technology | 20 May 2022 |
| 15 | LoMaR | Efficient self-supervised learning with local masked reconstruction | paper code | arXiv | KAUST | 1 Jun 2022 |
| 16 | M3AE | Multimodal Masked Autoencoders Learn Transferable Representations | paper | arXiv | UC Berkeley | 31 May 2022 |
| 17 | HiViT | HiViT: Hierarchical Vision Transformer Meets Masked Image Modeling | paper | arXiv | University of Chinese Academy of Sciences | 30 May 2022 |
| 18 | GreenMIM | Green Hierarchical Vision Transformer for Masked Image Modeling | paper code | arXiv | The University of Tokyo | 26 May 2022 |
| 19 | A^2MIM | Architecture-Agnostic Masked Image Modeling – From ViT back to CNN | paper | arXiv | AI Lab, Westlake University | 1 Jun 2022 |
| 20 | MixMIM | MixMIM: Mixed and Masked Image Modeling for Efficient Visual Representation Learning | paper code | arXiv | SenseTime Research | 28 May 2022 |
| 21 | SemMAE | SemMAE: Semantic-Guided Masking for Learning Masked Autoencoders | paper | arXiv | Chinese Academy of Sciences | 21 Jun 2022 |
| 22 | Voxel-MAE | Voxel-MAE: Masked Autoencoders for Pre-training Large-scale Point Clouds | paper code | arXiv | Peking University | 20 Jun 2022 |
| 23 | BootMAE | Bootstrapped Masked Autoencoders for Vision BERT Pretraining | paper code | ECCV 2022 | University of Science and Technology of China | 14 Jul 2022 |
| 24 | OmniMAE | OmniMAE: Single Model Masked Pretraining on Images and Videos | paper code | arXiv | Meta AI | 16 Jun 2022 |
| 25 | SatMAE | SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery | paper | arXiv | Stanford University | 17 Jul 2022 |
| 26 | CMAE | Contrastive Masked Autoencoders are Stronger Vision Learners | paper | arXiv | University of Science and Technology | 27 Jul 2022 |
| 27 | BEiT v2 | BEIT V2: Masked Image Modeling with Vector-Quantized Visual Tokenizers | paper | arXiv | University of Chinese Academy of Sciences | 12 Aug 2022 |
| 28 | BEiT v3 | Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks | paper | arXiv | Microsoft Corporation | 22 Aug 2022 |
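Most of the masked-image-modeling methods above share one core preprocessing step: split an image into patch tokens, hide a large random subset, and train the model to reconstruct the hidden part. Below is a minimal NumPy sketch of MAE-style random masking; the function name and shapes are illustrative, not taken from any of the listed codebases.

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, seed=0):
    """MAE-style random masking (sketch).

    patches: (num_patches, dim) array of patch embeddings.
    Returns the visible patches, their indices, and a binary mask
    (0 = visible, 1 = masked) used for the reconstruction loss.
    """
    rng = np.random.default_rng(seed)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    noise = rng.random(n)               # one random score per patch
    ids_shuffle = np.argsort(noise)     # patches with lowest noise are kept
    ids_keep = ids_shuffle[:n_keep]
    mask = np.ones(n, dtype=np.int64)
    mask[ids_keep] = 0                  # mark the kept patches visible
    return patches[ids_keep], ids_keep, mask

# Example: 196 patches (a 14x14 grid) of dimension 8, 75% masked
patches = np.arange(196 * 8, dtype=np.float32).reshape(196, 8)
visible, ids_keep, mask = random_masking(patches)
print(visible.shape)   # (49, 8): only 25% of patches reach the encoder
print(int(mask.sum())) # 147 masked patches
```

The high mask ratio (75% in MAE, often higher elsewhere) is what makes these encoders cheap to pre-train: the encoder only ever sees the visible quarter of the tokens.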
## Self-supervised Learning for Videos
| No. | Model Name | Title | Links | Pub. | Organization | Release Time |
|---|---|---|---|---|---|---|
| 1 | VideoMAE | VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training | paper code | arXiv | Tencent AI Lab | 23 Mar 2022 |
| 2 | MAE in Video | Masked Autoencoders As Spatiotemporal Learners | paper | arXiv | Meta | 18 May 2022 |
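The video variants extend the same recipe to spatiotemporal patches. VideoMAE, for instance, uses "tube" masking: one spatial mask is sampled and repeated across all frames, so a masked location stays hidden over time and cannot be trivially recovered from neighboring frames. A hedged NumPy sketch of that idea (names and shapes are illustrative):

```python
import numpy as np

def tube_masking(num_frames, patches_per_frame, mask_ratio=0.9, seed=0):
    """Tube masking (sketch): sample one spatial mask and repeat it
    across every frame, so each patch location is masked for the
    whole clip. Returns a (num_frames, patches_per_frame) mask with
    0 = visible, 1 = masked."""
    rng = np.random.default_rng(seed)
    n_keep = int(patches_per_frame * (1 - mask_ratio))
    ids = rng.permutation(patches_per_frame)
    spatial_mask = np.ones(patches_per_frame, dtype=np.int64)
    spatial_mask[ids[:n_keep]] = 0          # keep a small visible subset
    return np.tile(spatial_mask, (num_frames, 1))

mask = tube_masking(num_frames=8, patches_per_frame=196, mask_ratio=0.9)
print(mask.shape)                   # (8, 196)
print(bool((mask[0] == mask[5]).all()))  # True: same mask in every frame
```

Video pre-training tolerates much higher mask ratios (90% here) than images, since adjacent frames are highly redundant.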
## Self-supervised Learning for Audio
| No. | Model Name | Title | Links | Pub. | Organization | Release Time |
|---|---|---|---|---|---|---|
| 1 | AudioMAE | Masked Autoencoders that Listen | paper code | arXiv | Meta AI | 13 Jul 2022 |
## Surveys on Self-supervised Learning
| No. | Title | Links | Pub. | Organization | Release Time |
|---|---|---|---|---|---|
| 1 | A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond | paper | arXiv | KAIST | 30 Jul 2022 |