News

Generalist models, capable of handling multiple modalities and tasks simultaneously, are currently one of the hottest research topics. However, due to interference between different tasks during the ...
Visual grounding involves the identification and localization of image regions given textual descriptions. To reduce the manual labeling effort on region-text pairs, unsupervised visual grounding aims ...