This repo is my implementation of the mode-connectivity result in [1]. The paper posits that any two local minima in the loss landscape of a neural network can be connected by a curve that stays in a valley of the landscape, i.e. along which the loss remains low. Apart from the interesting fact that this is possible, the result can also be used for quick ensemble sampling for uncertainty quantification.
Assume that we have a class of models $f_w$ parametrized by weights $w \in \mathbb{R}^d$, and that two sets of trained weights $w_1$ and $w_2$ are given. In the paper, the path between them is parametrized as a chain of two straight line segments that both connect to a third, trainable parameter set $\theta$:

$$\phi_\theta(t) = \begin{cases} 2\left(t\theta + (0.5 - t)w_1\right), & 0 \le t \le 0.5 \\ 2\left((t - 0.5)w_2 + (1 - t)\theta\right), & 0.5 \le t \le 1 \end{cases}$$

Or, as I have done in this implementation, the path is parametrized as a quadratic Bezier curve connecting the start and end points:

$$\phi_\theta(t) = (1 - t)^2 w_1 + 2t(1 - t)\theta + t^2 w_2, \qquad t \in [0, 1]$$

Given the two end-point models parametrized by $w_1$ and $w_2$, the curve parameter $\theta$ is found by minimizing the expected loss along the curve,

$$\ell(\theta) = \int_0^1 \mathcal{L}\left(\phi_\theta(t)\right)\,dt = \mathbb{E}_{t \sim U(0,1)}\left[\mathcal{L}\left(\phi_\theta(t)\right)\right],$$

where $\mathcal{L}$ is the ordinary training loss evaluated at the weights $\phi_\theta(t)$. The loss is minimized by first sampling $t \sim U(0, 1)$ and then taking a stochastic gradient step on $\mathcal{L}(\phi_\theta(t))$ with respect to $\theta$.
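The Bezier parametrization and one stochastic training step can be sketched in NumPy as follows. This is a toy illustration with a quadratic stand-in loss and hand-derived gradient, not the repo's actual training code (which lives in `curve_model.py` and `train.py`):

```python
import numpy as np

def bezier_point(t, w1, w2, theta):
    """Quadratic Bezier curve through parameter space: phi_theta(t)."""
    return (1 - t) ** 2 * w1 + 2 * t * (1 - t) * theta + t ** 2 * w2

# Toy quadratic "loss" standing in for the network's training loss L(w).
def loss(w):
    return float(np.sum(w ** 2))

rng = np.random.default_rng(0)
w1, w2 = rng.normal(size=5), rng.normal(size=5)
theta = 0.5 * (w1 + w2)  # initialize the control point on the straight segment

# One stochastic training step: sample t ~ U(0, 1) and take a gradient step on
# L(phi_theta(t)) with respect to theta. For the toy loss the chain rule gives
# dL/dtheta = dL/dphi * dphi/dtheta = (2 * phi) * 2t(1 - t).
t = rng.uniform()
phi = bezier_point(t, w1, w2, theta)
theta = theta - 0.1 * (2 * phi) * (2 * t * (1 - t))
loss_after = loss(bezier_point(t, w1, w2, theta))
```

Note that the end points $w_1$ and $w_2$ never move: the gradient only flows into $\theta$, so the curve always starts and ends at the two trained models.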
The following results are created with CIFAR_experiment.ipynb, but can also be reproduced using the standard settings for modeconnectivity.py as described below. CIFAR10 data has been used for this particular experiment. The model is a fairly standard convolutional neural network with three convolutional layers and two linear layers, ReLU activations and 50% dropout. See models.py for further details.
The code first trains the start and end models, and next trains the curve parameter $\theta$ with the two end points held fixed.
The loss landscape, projected onto the plane spanned by the Bezier curve, is plotted below. Note that the landscape is squeezed, so the two axes are not on the same scale.
It can indeed be seen that the curve lies in a valley as posited in the paper.
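The projection used for such a plot can be sketched as follows: build an orthonormal basis for the plane through $w_1$, $w_2$ and $\theta$, then evaluate the loss on a grid in that plane. This is a NumPy sketch with random stand-in parameter vectors and a toy loss, not the repo's plotting code in `curve_plots.py`:

```python
import numpy as np

# Hypothetical flattened parameter vectors standing in for the two trained
# end points and the trained Bezier control point.
rng = np.random.default_rng(0)
w1, w2, theta = rng.normal(size=10), rng.normal(size=10), rng.normal(size=10)

# Orthonormal basis (Gram-Schmidt) for the plane through w1, w2 and theta.
u = w2 - w1
u_hat = u / np.linalg.norm(u)
v = (theta - w1) - np.dot(theta - w1, u_hat) * u_hat
v_hat = v / np.linalg.norm(v)

def to_plane(w):
    """2-D coordinates of a parameter vector projected onto the plane."""
    d = w - w1
    return np.dot(d, u_hat), np.dot(d, v_hat)

def from_plane(x, y):
    """Parameter vector corresponding to plane coordinates (x, y)."""
    return w1 + x * u_hat + y * v_hat

# Evaluate a toy loss on a grid over the plane; the real code would instead
# load from_plane(x, y) into the network and compute the loss on a data batch.
def loss(w):
    return float(np.sum(w ** 2))

xs = np.linspace(-1, np.linalg.norm(u) + 1, 20)
ys = np.linspace(-1, np.linalg.norm(v) + 1, 20)
grid = np.array([[loss(from_plane(x, y)) for x in xs] for y in ys])
```

Since the quadratic Bezier curve is a combination of exactly $w_1$, $w_2$ and $\theta$, the whole curve lies in this plane, so the projection loses no information about the curve itself.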
Given that the loss is low along the entire curve, every parameter set sampled from it corresponds to a well-performing model.
Finally, the question is: what if we use parameter sets sampled along the curve as an ensemble? In the normal setup for a classification task, the model would predict class probabilities from the logits $f_w(x)$ via a softmax.
Instead, in ensemble prediction we use $S$ parameter sets sampled along the curve, $w_s = \phi_\theta(t_s)$ with $t_s \sim U(0, 1)$, and average over the outputs:

$$\hat{p}(y \mid x) = \frac{1}{S} \sum_{s=1}^{S} \mathrm{softmax}\left(f_{w_s}(x)\right)$$
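A minimal NumPy sketch of this averaging, using a hypothetical linear map as a stand-in for the network (the repo's real models are in `models.py`):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical stand-in for the network: logits as a linear map of the input,
# with the flat "parameters" reshaped into a 4x3 weight matrix.
def logits(w, x):
    return x @ w.reshape(4, 3)

def bezier_point(t, w1, w2, theta):
    """Quadratic Bezier curve through parameter space."""
    return (1 - t) ** 2 * w1 + 2 * t * (1 - t) * theta + t ** 2 * w2

rng = np.random.default_rng(0)
w1, w2, theta = rng.normal(size=12), rng.normal(size=12), rng.normal(size=12)
x = rng.normal(size=(5, 4))  # batch of 5 inputs

# Ensemble prediction: average the softmax outputs over S = 10 parameter
# sets sampled along the curve.
ts = rng.uniform(size=10)
probs = np.mean(
    [softmax(logits(bezier_point(t, w1, w2, theta), x)) for t in ts], axis=0
)
```

Averaging the probabilities (rather than the logits) keeps the result a valid probability distribution over the classes.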
The idea is that the parameter samples along the curve represent diverse models of comparable quality, so their disagreement carries information about predictive uncertainty.
This results in a more robust model. Below is a table with the evaluation of the cross-entropy loss, the Expected Calibration Error (ECE), the accuracy, and the Area Under the Receiver Operating Characteristic curve (AUROC) for the start model, the end model, and the ensemble of models along the curve.
As can be seen, all metrics except the ECE improve when using the ensemble of models. The ECE is probably higher because the ensemble inevitably makes more conservative predictions than the start and end models, in the sense that it is less confident about its predictions.
| Model | Cross Entropy | Expected Calibration Error | Accuracy | AUROC |
|---|---|---|---|---|
| Start model | 0.587726 | 0.0231762 | 0.8025 | 0.977849 |
| End model | 0.607342 | 0.0216215 | 0.7918 | 0.976107 |
| Ensemble | 0.544668 | 0.0373571 | 0.8158 | 0.98081 |
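For reference, the ECE reported above can be computed with the standard binning estimator sketched below. This is one common definition; the exact implementation used here lives in `curve_eval.py` and may differ in details such as the number of bins:

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE: the average gap |accuracy - confidence| over equally
    spaced confidence bins, weighted by the fraction of samples per bin."""
    conf = probs.max(axis=1)                      # predicted confidence
    pred = probs.argmax(axis=1)                   # predicted class
    correct = (pred == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece
```

A perfectly calibrated, fully confident classifier scores 0, while a fully confident classifier that is always wrong scores 1.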
Create a virtual environment and install the packages:

```shell
python -m pip install -e .
```
Then the code can be run with the standard settings with:

```shell
cd modeconnectivity
python3 modeconnectivity.py
```
```
mode-connectivity/
├── LICENSE
├── README.md
├── requirements.txt
├── experiments/
│   └── results_notebook_CIFAR/
│       ├── curve_model/
│       ├── end_model/
│       ├── figures/
│       ├── logs/
│       ├── models/
│       └── start_model/
└── modeconnectivity/
    ├── CIFAR_experiment.ipynb
    ├── curve_eval.py
    ├── curve_model.py
    ├── curve_plots.py
    ├── function_experiment.ipynb
    ├── modeconnectivity.py
    ├── models.py
    ├── scheduler.py
    └── train.py
```
- `LICENSE`: License for the project.
- `README.md`: Project overview, results, and usage instructions.
- `requirements.txt`: Python dependencies.
- `experiments/`: Saved outputs from runs (models, logs, plots, and artifacts).
- `modeconnectivity/`: Core source code for training, curve optimization, evaluation, and plotting.
- `CIFAR_experiment.ipynb`: Runs the main experiment and makes the plots for this README.
- `curve_model.py`: The `Curve` class implements the parameter reparametrization.
- `curve_eval.py`: Evaluation utilities for models along the curve.
- `curve_plots.py`: Plotting utilities for landscapes and curve metrics.
- `function_experiment.ipynb`: Experiment showing that for small models, the result does not hold.
- `modeconnectivity.py`: Main script to train the endpoint models and fit the curve model.
- `models.py`: Model architectures used in the experiments.
- `scheduler.py`: Learning-rate scheduling logic and optimizer definition.
- `train.py`: Standard model training routines.
[1] Timur Garipov, Pavel Izmailov, Dmitrii Podoprikhin, Dmitry Vetrov, and Andrew Gordon Wilson. 2018. Loss surfaces, mode connectivity, and fast ensembling of DNNs. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (NIPS'18). Curran Associates Inc., Red Hook, NY, USA, 8803–8812. https://arxiv.org/pdf/1802.10026
