[ICASSP'25] MEDIAN: Adaptive Intermediate-grained Aggregation Network for Composed Image Retrieval

Qinlei Huang1, Zhiwei Chen1, Zixu Li1, Chunxiao Wang2, Xuemeng Song3,
Yupeng Hu1*, Liqiang Nie4
1School of Software, Shandong University
2Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Qilu University of Technology (Shandong Academy of Sciences)
3School of Computer Science and Technology, Shandong University
4School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)

*Corresponding author.

Abstract


Multi-grained Features


An example of the multi-grained features.


Framework: Adaptive interMEDiate-graIned Aggregation Network (MEDIAN)


Overall architecture of our proposed MEDIAN: 1) Intermediate Granularity Extractor, 2) Target-guided Semantic Aligning, and 3) Multi-Grained Composition. For ease of representation, we abbreviate the intermediate-grained feature as the median feature in the figure.
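
To make the data flow in the caption concrete, here is a minimal sketch of how intermediate-grained ("median") features could be pooled from local features and fused with global and local features to rank candidate images. It is an illustration under stated assumptions, not the MEDIAN implementation: attention pooling with `group_queries`, the equal `weights`, the mean-based fusion, and every function name are hypothetical choices made for this example. Target-guided Semantic Aligning is a training-side component and is omitted here.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mean_vec(vectors):
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def extract_intermediate(local_tokens, group_queries):
    """Assumed stand-in for an intermediate granularity extractor:
    each hypothetical query vector attends over all local tokens and
    returns one pooled (intermediate-grained) feature."""
    dim = len(local_tokens[0])
    pooled = []
    for q in group_queries:
        attn = softmax([dot(q, t) for t in local_tokens])
        pooled.append([sum(w * t[d] for w, t in zip(attn, local_tokens))
                       for d in range(dim)])
    return pooled

def compose_query(global_feat, local_tokens, group_queries,
                  weights=(1 / 3, 1 / 3, 1 / 3)):
    """Assumed multi-grained composition: a weighted sum of the
    normalized global, intermediate, and local summaries."""
    median_feats = extract_intermediate(local_tokens, group_queries)
    grains = [global_feat, mean_vec(median_feats), mean_vec(local_tokens)]
    grains = [l2_normalize(g) for g in grains]
    fused = [sum(w * g[d] for w, g in zip(weights, grains))
             for d in range(len(global_feat))]
    return l2_normalize(fused)

def rank_candidates(query, candidates):
    """Return candidate indices sorted by cosine similarity, best first."""
    sims = [dot(query, l2_normalize(c)) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: sims[i], reverse=True)

if __name__ == "__main__":
    # Toy 2-D features standing in for encoder outputs.
    query = compose_query(
        global_feat=[1.0, 0.0],
        local_tokens=[[1.0, 0.0], [0.0, 1.0]],
        group_queries=[[1.0, 0.0]],
    )
    print(rank_candidates(query, [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]))  # [1, 0, 2]
```

In this toy setup, the composed query leans toward the first axis because both the global feature and the attention-pooled feature do, so the candidate aligned with that axis ranks first.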


Experiment


Attention visualization on the CIRR dataset.


Qualitative examples of MEDIAN on the CIRR dataset. Ground-truth images are marked with colored boxes.

BibTeX


@inproceedings{MEDIAN,
  title={MEDIAN: Adaptive Intermediate-grained Aggregation Network for Composed Image Retrieval},
  author={Huang, Qinlei and Chen, Zhiwei and Li, Zixu and Wang, Chunxiao and Song, Xuemeng and Hu, Yupeng and Nie, Liqiang},
  booktitle={Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing},
  pages={1--5},
  year={2025},
  organization={IEEE}
}