
DrML: Diagnosing and Rectifying Vision Models using Language

The traditional process of diagnosing model behaviors in deployment settings involves labor-intensive data acquisition and annotation. Our proposed method, DrML, can discover high-error data slices, identify influential attributes, and rectify undesirable model behaviors without requiring any visual data. Through a combination of theoretical explanation and empirical verification, we present conditions under which classifiers trained on embeddings from one modality can be applied equivalently to embeddings from another modality.

Yuhui Zhang, Jeff Z HaoChen, Mars (Shih-Cheng) Huang, Kuan-Chieh Wang, James Zou, Serena Yeung
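The claim that a classifier trained on embeddings from one modality can be reused on another can be illustrated with a toy sketch. Its assumptions are ours, not the paper's: synthetic embeddings in a shared space, a modality gap modeled as one constant offset, per-modality mean-centering to remove that offset, and a nearest-centroid classifier standing in for whatever classifier DrML trains. It does not describe DrML's procedure or the exact conditions the paper derives.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_per_class = 16, 200

# Toy shared embedding space with two class prototypes (synthetic, not real features).
prototypes = rng.normal(size=(2, dim))
# Assumption: the modality gap is a single constant offset added to every image embedding.
gap = 3.0 * rng.normal(size=dim)


def sample(offset):
    """Draw labeled toy embeddings around each prototype, shifted by `offset`."""
    X = np.concatenate([p + 0.5 * rng.normal(size=(n_per_class, dim)) for p in prototypes])
    y = np.repeat([0, 1], n_per_class)
    return X + offset, y


text_X, text_y = sample(np.zeros(dim))  # stands in for text embeddings
image_X, image_y = sample(gap)          # stands in for image embeddings


def center(X):
    """Per-modality mean-centering; cancels a constant offset (assumed gap model)."""
    return X - X.mean(axis=0)


def fit_nearest_centroid(X, y):
    """Hypothetical stand-in classifier: one mean vector per class."""
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])


def predict(centroids, X):
    """Assign each row of X to the class with the closest centroid."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)


# Train on text-like embeddings only, then evaluate on image-like embeddings.
acc_raw = (predict(fit_nearest_centroid(text_X, text_y), image_X) == image_y).mean()
centroids = fit_nearest_centroid(center(text_X), text_y)
acc_centered = (predict(centroids, center(image_X)) == image_y).mean()
print(f"without centering: {acc_raw:.2f}, with centering: {acc_centered:.2f}")
```

Under these assumptions, centering cancels the constant offset, so centroids learned from the text-like embeddings classify the image-like embeddings well; the uncentered accuracy depends on how the offset happens to align with the class boundary.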


Adapting pre-trained vision transformers from 2D to 3D through weight inflation improves medical image segmentation

In this work, we use a simple yet effective weight inflation strategy to adapt pre-trained Transformers from 2D to 3D, retaining the benefits of both transfer learning and depth information. We further investigate the effectiveness of transfer from different pre-training sources and objectives. Our approach achieves state-of-the-art performance across a broad range of 3D medical image datasets.

Yuhui Zhang, Mars (Shih-Cheng) Huang, Zhengping Zhou, Matthew P. Lungren, Serena Yeung
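Weight inflation can be sketched for the patch-embedding layer of a vision transformer. This sketch assumes the common I3D-style rule (copy the 2D kernel to every depth slice and divide by the number of slices); the paper may use a different rule or also adapt other components such as position embeddings, and the function name `inflate_patch_embedding` is hypothetical. The final check confirms the property the rule is designed for: a 3D patch made of identical slices receives the same embedding as the 2D patch.

```python
import numpy as np


def inflate_patch_embedding(w2d, depth):
    """Hypothetical helper: inflate a 2D kernel (out, in, kh, kw) to 3D (out, in, depth, kh, kw).

    Assumed I3D-style rule: repeat the 2D kernel along depth and divide by `depth`,
    so a volume whose slices are identical yields the same response as one 2D slice.
    """
    w3d = np.repeat(w2d[:, :, None, :, :], depth, axis=2)
    return w3d / depth


rng = np.random.default_rng(0)
out_ch, in_ch, patch, depth = 8, 3, 4, 2
w2d = rng.normal(size=(out_ch, in_ch, patch, patch))  # stands in for pre-trained 2D weights
w3d = inflate_patch_embedding(w2d, depth)

# One 2D patch, and a 3D patch built by stacking that same slice `depth` times.
slice_2d = rng.normal(size=(in_ch, patch, patch))
patch_3d = np.repeat(slice_2d[:, None, :, :], depth, axis=1)  # (in, depth, kh, kw)

# For a single non-overlapping patch, the patch embedding is a full contraction with the kernel.
emb_2d = np.einsum("oihw,ihw->o", w2d, slice_2d)
emb_3d = np.einsum("oidhw,idhw->o", w3d, patch_3d)
print(np.allclose(emb_2d, emb_3d))  # True
```

The 1/depth rescaling keeps activation magnitudes close to what the pre-trained layers expect; without it, the inflated layer would scale its output by the number of slices.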


GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-Efficient Medical Image Recognition

The purpose of this work is to develop label-efficient multimodal medical imaging representations by leveraging radiology reports. Specifically, we propose an attention-based framework (GLoRIA) for learning global and local representations by contrasting image sub-regions and words in the paired report. In addition, we propose methods to leverage the learned representations for various downstream medical image recognition tasks with limited labels.

Mars (Shih-Cheng) Huang, Liyue Shen, Matthew P. Lungren, Serena Yeung
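The contrast between image sub-regions and report words can be sketched as follows, based on our reading of GLoRIA's local objective: each word attends over image sub-regions, the attention-weighted region context is compared with the word by cosine similarity, and per-word scores are pooled with a log-sum-exp. The temperature values, the toy features, and the one-directional loss without a global term are simplifying assumptions, and the function names are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def local_similarity(regions, words, tau_attn=4.0, tau_agg=5.0):
    """Hypothetical image-report score from region features (R, D) and word features (W, D).

    Temperatures are illustrative assumptions, not values from the paper.
    """
    regions = regions / np.linalg.norm(regions, axis=1, keepdims=True)
    words = words / np.linalg.norm(words, axis=1, keepdims=True)
    attn = softmax(tau_attn * words @ regions.T, axis=1)  # (W, R): each word attends over regions
    context = attn @ regions                              # (W, D): attention-weighted region context
    cos = (context * words).sum(axis=1) / np.linalg.norm(context, axis=1)
    # Log-sum-exp pooling over words, scaled back by the aggregation temperature.
    return np.log(np.exp(tau_agg * cos).sum()) / tau_agg


def local_contrastive_loss(region_batch, word_batch):
    """Image-to-report InfoNCE over a batch where pair i is the matched pair (simplified)."""
    n = len(region_batch)
    scores = np.array([[local_similarity(region_batch[i], word_batch[j]) for j in range(n)]
                       for i in range(n)])
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))


# Toy usage: 4 images with 6 region features each, and noisy "matched" word features.
rng = np.random.default_rng(0)
images = [rng.normal(size=(6, 32)) for _ in range(4)]
reports = [r + 0.1 * rng.normal(size=r.shape) for r in images]
print(local_contrastive_loss(images, reports))
```

Attending from words to regions lets each word pick out the sub-region it describes, which is what allows the learned features to localize findings without region-level labels.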
