Terminus Group has, for the first time, explored two parallel modules, a dynamic local enhancement module (DLE) and a novel unary co-occurrence excitation module (UCE), alongside the multi-head self-attention mechanism to enhance the transformer.
Combined in this parallel design, the new architecture compensates for multi-head self-attention's degraded performance on convolutional features when global correlations are not exhibited, while capturing more informative local patches and actively searching for local co-occurrence between patches within images.
What’s the issue?
It has been unclear whether the power of transformer architectures can complement existing convolutional neural networks. A few recent attempts have combined convolution with transformer designs through a range of structures arranged in series. However, multi-head self-attention applied to convolutional features is mainly sensitive to global correlations, and its performance degrades when these correlations are not exhibited.
The newly built, parallel-designed Dynamic Unary Convolution in Transformer can instead focus on global correlations while capturing more informative local patches and actively searching for local co-occurrence between patches within images.
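To make the parallel layout concrete, below is a minimal PyTorch-style sketch. It only illustrates the overall structure, a multi-head self-attention branch running in parallel with a local-enhancement branch and a co-occurrence-excitation branch whose outputs are fused by summation. The branch internals (a depthwise convolution and a pooled channel gate), the class name `ParallelDUCBlock`, and parameters such as `grid` are assumptions made for illustration, not the authors' implementation.

```python
# Illustrative sketch only: branch internals are stand-ins, not the paper's modules.
import torch
import torch.nn as nn

class ParallelDUCBlock(nn.Module):
    """Hypothetical parallel block: self-attention + local branch + co-occurrence branch."""
    def __init__(self, dim: int, num_heads: int = 8, grid: int = 14):
        super().__init__()
        self.grid = grid  # assumed square patch grid (grid x grid tokens)
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Stand-in for the dynamic local enhancement (DLE) branch:
        # a depthwise conv over the 2-D patch grid to capture local patches.
        self.local = nn.Sequential(
            nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim),
            nn.GELU(),
        )
        # Stand-in for the unary co-occurrence excitation (UCE) branch:
        # a gating signal from pooled channel statistics, applied to every patch.
        self.excite = nn.Sequential(
            nn.Linear(dim, dim // 4),
            nn.GELU(),
            nn.Linear(dim // 4, dim),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_patches, dim), with num_patches == grid * grid
        b, n, d = x.shape
        h = self.norm(x)
        # Global branch: multi-head self-attention over all patches.
        attn_out, _ = self.attn(h, h, h)
        # Local branch: reshape tokens to a 2-D grid, apply the depthwise conv.
        feat2d = h.transpose(1, 2).reshape(b, d, self.grid, self.grid)
        local_out = self.local(feat2d).reshape(b, d, n).transpose(1, 2)
        # Co-occurrence branch: pooled statistics gate every patch feature.
        gate = self.excite(h.mean(dim=1)).unsqueeze(1)
        cooc_out = h * gate
        # Parallel fusion: the three complementary branches are summed.
        return x + attn_out + local_out + cooc_out

# Usage: tokens from a 14x14 patch grid with 384-dim embeddings.
block = ParallelDUCBlock(dim=384, num_heads=8, grid=14)
tokens = torch.randn(2, 14 * 14, 384)
out = block(tokens)  # (2, 196, 384)
```

The key design point carried over from the text is the parallel arrangement: the global attention branch is not replaced but complemented by branches that respond to local structure, so the block remains useful when global correlations are weak.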
Essential computer vision tasks such as image-based classification, segmentation, retrieval, and density estimation stand to benefit from this work.
For the original paper, please refer to:
Dynamic Unary Convolution in Transformers | IEEE Journals & Magazine | IEEE Xplore