News

This study proposes HitFusion, a network for fusing infrared and visible images that uses a cross-feature transformer module and is compatible with high-level vision tasks. Firstly ...
We present X-Decoder, a generalized decoding model that can predict pixel-level segmentation and language tokens seamlessly. X-Decoder takes as input two types of queries: (i) generic non-semantic ...