News
Generalist models, capable of handling multiple modalities and tasks simultaneously, are currently one of the hottest research topics. However, due to interference between different tasks during the ...
Visual grounding involves the identification and localization of image regions given textual descriptions. To reduce the manual labeling effort on region-text pairs, unsupervised visual grounding aims ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results