A Closer Look at Multimodal Representation Collapse

¹Fujitsu Research of Europe   ²University of Surrey

Modality collapse results from noisy features of one modality getting entangled with predictive features of another through polysemantic neurons in the fusion head. Freeing up rank bottlenecks lets such features be denoised along independent dimensions, so that they contribute to loss reduction without affecting the other modality.

Abstract

We aim to develop a fundamental understanding of modality collapse, a recently observed empirical phenomenon wherein models trained for multimodal fusion tend to rely only on a subset of the modalities, ignoring the rest. We show that modality collapse happens when noisy features from one modality are entangled, via a shared set of neurons in the fusion head, with predictive features from another, effectively masking out positive contributions from the predictive features of the former modality and leading to its collapse. We further prove that cross-modal knowledge distillation implicitly disentangles such representations by freeing up rank bottlenecks in the student encoder, denoising the fusion-head outputs without negatively impacting the predictive features from either modality. Based on the above findings, we propose an algorithm that prevents modality collapse through explicit basis reallocation, with applications in dealing with missing modalities. Extensive experiments on multiple multimodal benchmarks validate our theoretical claims.
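The entanglement mechanism described above can be illustrated with a toy linear example. This is a minimal sketch under our own assumptions, not the paper's algorithm or experimental setup: the function name `readout_mse`, the two scalar Gaussian features, and the hand-picked fusion weights are all hypothetical. A single "polysemantic" neuron carries the sum of a predictive feature and a noisy feature, and a linear readout is fitted on top. A second neuron is then added to remove the rank bottleneck.

```python
import numpy as np


def readout_mse(fusion_weights, n_samples=5000, seed=0):
    """Fit a linear readout on a toy fusion head and return its MSE.

    Hypothetical setup: one predictive feature (which is also the target)
    and one independent noisy feature, mixed by `fusion_weights` of shape
    (n_neurons, 2).
    """
    rng = np.random.default_rng(seed)
    predictive = rng.standard_normal(n_samples)  # feature that determines the target
    noisy = rng.standard_normal(n_samples)       # uninformative feature from the other modality
    features = np.stack([predictive, noisy], axis=1)  # shape (n_samples, 2)

    hidden = features @ fusion_weights.T  # fusion-head activations, shape (n_samples, n_neurons)
    target = predictive

    # Least-squares linear readout from the hidden activations to the target.
    coef, *_ = np.linalg.lstsq(hidden, target, rcond=None)
    residual = target - hidden @ coef
    return float(np.mean(residual ** 2))


# Rank-1 bottleneck: a single neuron responds to both features.
bottlenecked = np.array([[1.0, 1.0]])
# Rank 2: a second neuron gives the noise its own independent direction.
full_rank = np.array([[1.0, 1.0], [1.0, -1.0]])

mse_bottleneck = readout_mse(bottlenecked)
mse_full = readout_mse(full_rank)
print(f"rank-1 fusion head MSE: {mse_bottleneck:.3f}")
print(f"rank-2 fusion head MSE: {mse_full:.3e}")
```

In this toy setting, the rank-1 head cannot separate signal from noise, so the readout loses roughly half of the target variance. With the extra dimension, the readout can cancel the noise exactly and the error drops to numerical zero. The example only conveys the intuition behind rank bottlenecks; it says nothing about how the effect arises during training.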

Code Release Statement

Unfortunately, we are unable to open-source the code at the moment due to internal business reasons. We may release it in the future, but this is not guaranteed, which is why we did not promise a code release as part of our submission. If we do release it, we will update our project page accordingly.

BibTeX


@inproceedings{chaudhuri2024mmcollapse,
  title={A Closer Look at Multimodal Representation Collapse},
  author={Abhra Chaudhuri and Anjan Dutta and Tu Bui and Serban Georgescu},
  booktitle={International Conference on Machine Learning},
  year={2025}
}

Last updated: 04 September 2025