Latest Results
The latest content available from Springer
- Application of CLIP for efficient zero-shot learning (July 26, 2024)
Abstract Zero-shot learning (ZSL) addresses the challenging task of recognizing classes absent during training. Existing methodologies focus on knowledge transfer from known to unknown categories by formulating a correlation between visual and semantic spaces. However, these methods are constrained by the limited discriminability of visual features and the incompleteness of semantic representations. To alleviate these limitations, we propose a novel Collaborative learning Framework for Zero-Shot Learning (CFZSL), which integrates the CLIP architecture into a fundamental zero-shot learner. Specifically, the foundational zero-shot learning model extracts visual features through a set of CNNs and maps them to a domain-specific semantic space. Simultaneously, the CLIP image encoder extracts visual features containing universal semantics. In this way, the CFZSL framework obtains visual features that are discriminative for both domain-specific and domain-agnostic semantics. Additionally, a more comprehensive semantic space is explored by combining the latent feature space learned by CLIP with the domain-specific semantic space. Notably, we leverage only the pre-trained parameters of the CLIP model, mitigating the high training cost and potential overfitting associated with fine-tuning. Our proposed framework, characterized by its simple structure, is trained exclusively with classification and triplet loss functions. Extensive experimental results on three widely recognized benchmark datasets (AwA2, CUB, and SUN) affirm the effectiveness and superiority of our proposed approach.
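The training objective described in the abstract (a classification loss plus a triplet loss over fused domain-specific and CLIP features, with CLIP kept frozen) can be sketched in toy form. This is a minimal numpy sketch: the function names, the concatenation-based fusion, and the prototype-similarity classifier are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: pull the anchor toward the positive
    and push it at least `margin` farther from the negative."""
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)
    return float(np.maximum(d_pos - d_neg + margin, 0.0).mean())

def cross_entropy(logits, labels):
    """Softmax cross-entropy over class logits (numerically stable)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

def cfzsl_objective(cnn_sem, clip_feats, class_protos, labels,
                    anchor, positive, negative, lam=1.0):
    """Combined objective: classify the fused (domain-specific + frozen
    CLIP) features against class prototypes, plus a triplet term that
    shapes the embedding space. `lam` balances the two losses."""
    fused = np.concatenate([cnn_sem, clip_feats], axis=1)   # (B, D1+D2)
    logits = fused @ class_protos.T                          # similarity to prototypes
    return cross_entropy(logits, labels) + lam * triplet_loss(anchor, positive, negative)
```

In actual training the CLIP branch would stay frozen while the CNN branch and prototypes receive gradients; here everything is plain numpy purely to show how the two loss terms combine.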
- SAM-guided contrast based self-training for source-free cross-domain semantic segmentation (July 26, 2024)
Abstract Traditional domain adaptive semantic segmentation methods typically assume access to source domain data during training, a paradigm known as source-access domain adaptation for semantic segmentation (SASS). To address data privacy concerns in real-world applications, source-free domain adaptation for semantic segmentation (SFSS) has recently been studied, eliminating the need for direct access to source data. Most SFSS methods primarily utilize pseudo-labels to regularize the model in either the label space or the feature space. Inspired by the segment anything model (SAM), we propose SAM-guided contrast-based pseudo-label learning for SFSS in this work. Unlike previous methods that rely heavily on noisy pseudo-labels, we leverage the class-agnostic segmentation masks generated by SAM as prior knowledge to construct positive and negative sample pairs. This approach allows us to directly shape the feature space using contrastive learning. This design ensures the reliable construction of contrastive samples and exploits both intra-class and intra-instance diversity. Our framework is built upon a vanilla teacher–student network architecture for online pseudo-label learning. Consequently, the SFSS model can be jointly regularized in both the feature and label spaces in an end-to-end manner. Extensive experiments demonstrate that our method achieves competitive performance on two challenging SFSS tasks.
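The core idea, using SAM's class-agnostic masks to define which pixel features count as positives and negatives for contrastive learning, can be sketched as follows. This is a simplified numpy illustration under assumed shapes (flattened pixel embeddings and one segment id per pixel); the prototype pooling and InfoNCE-style loss are common choices, not necessarily the paper's exact design.

```python
import numpy as np

def segment_prototypes(features, mask_ids):
    """Average-pool pixel embeddings within each SAM segment.
    features: (N, D) pixel embeddings; mask_ids: (N,) segment id per pixel."""
    return {sid: features[mask_ids == sid].mean(axis=0)
            for sid in np.unique(mask_ids)}

def mask_guided_contrastive_loss(features, mask_ids, temperature=0.1):
    """InfoNCE-style loss: each pixel is pulled toward its own segment's
    prototype (positive) and pushed from other segments' prototypes
    (negatives), shaping the feature space without class labels."""
    protos = segment_prototypes(features, mask_ids)
    ids = sorted(protos)
    P = np.stack([protos[i] for i in ids])                   # (S, D)
    P = P / np.linalg.norm(P, axis=1, keepdims=True)
    F = features / np.linalg.norm(features, axis=1, keepdims=True)
    sims = F @ P.T / temperature                             # (N, S)
    pos_col = np.array([ids.index(s) for s in mask_ids])     # positive index per pixel
    z = sims - sims.max(axis=1, keepdims=True)               # stable log-softmax
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-logp[np.arange(len(F)), pos_col].mean())
```

Because the masks come from SAM rather than from noisy pseudo-labels, the pairing is reliable even when the segmentation model's own predictions are not.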
- CMLCNet: medical image segmentation network based on convolution capsule encoder and multi-scale local co-occurrence (July 26, 2024)
Abstract Medical images have low contrast and blurred boundaries between different tissues or between tissues and lesions. Because labeling medical images is laborious and requires expert knowledge, labeled data are expensive or simply unavailable. UNet has achieved great success in the field of medical image segmentation. However, the pooling layers used in downsampling tend to discard important information such as location, and the locality of the convolution operation makes it difficult to learn global and long-range semantic interactions. The usual remedies are collecting more data or enhancing the training data through augmentation methods; however, large medical datasets are hard to obtain, and augmentation may increase the training burden. In this work, we propose a 2D medical image segmentation network with a convolutional capsule encoder and a multi-scale local co-occurrence module. To extract more local detail and contextual information, the capsule encoder is introduced to learn the target location and the relationship between parts and the whole. Multi-scale features are fused by a new attention mechanism, which captures global information to selectively emphasize the salient features useful for a specific task and suppress background noise; this mechanism also preserves information that would otherwise be discarded by the network's pooling layers. In addition, a multi-scale local co-occurrence algorithm is proposed, so that the context of, and dependencies between, different regions in an image can be better learned. Experimental results on the Liver, ISIC, and BraTS2019 datasets show that our network is superior to UNet and other previous medical image segmentation networks under the same experimental conditions.
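The attention-based multi-scale fusion described above (capture a global descriptor per scale, then selectively weight the scales) can be sketched in a minimal form. This numpy sketch assumes the feature maps have already been resized to a common resolution; the global-average-pooling descriptor and softmax-over-scales weighting are one plausible reading of the mechanism, not CMLCNet's exact design.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multiscale_attention_fuse(feature_maps, proj):
    """Fuse feature maps from several scales with attention weights.
    feature_maps: list of (H, W, C) arrays at the same resolution.
    proj: (C,) scoring vector (a hypothetical learned parameter).
    Global average pooling gives one descriptor per scale; a softmax
    over the scale scores yields the fusion weights."""
    descs = np.stack([fm.mean(axis=(0, 1)) for fm in feature_maps])  # (S, C)
    weights = softmax(descs @ proj)                                  # (S,)
    fused = sum(w * fm for w, fm in zip(weights, feature_maps))      # (H, W, C)
    return fused, weights
```

Because the weights are computed from global descriptors, scales whose content matters for the task can dominate the fusion while noisy background scales are down-weighted.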
- RA-RevGAN: region-aware reversible adversarial example generation network for privacy-preserving applications (July 26, 2024)
Abstract The rise of online sharing platforms has provided people with diverse and convenient ways to share images. However, a substantial amount of sensitive user information is contained within these images, which can be easily captured by malicious neural networks. To ensure the secure utilization of authorized protected data, reversible adversarial attack techniques have emerged. Existing algorithms for generating adversarial examples do not strike a good balance between visibility and attack capability. Additionally, the network oscillations generated during the training process affect the quality of the final examples. To address these shortcomings, we propose a novel reversible adversarial network based on generative adversarial networks (RA-RevGAN). In this paper, the generator is used for noise generation to map features into perturbations of the image, while the region selection module confines these perturbations to specific areas that significantly affect classification. Furthermore, a robust attack mechanism is integrated into the discriminator to stabilize the network’s training by optimizing convergence speed and minimizing time cost. Extensive experiments have demonstrated that the proposed method ensures a high image generation rate, excellent attack capability, and superior visual quality while maintaining high classification accuracy in image restoration.
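The region-selection step described above, confining the generated perturbation to areas that matter for classification while bounding its magnitude, can be illustrated in isolation. This is a minimal numpy sketch with an assumed binary region mask and an assumed L-infinity bound; the function name and epsilon value are illustrative, not taken from the paper.

```python
import numpy as np

def apply_region_perturbation(image, perturbation, region_mask, epsilon=8 / 255):
    """Confine an adversarial perturbation to a selected region and bound
    its magnitude, keeping the example visually close to the original.
    image: (H, W, C) in [0, 1]; perturbation: (H, W, C);
    region_mask: (H, W) binary mask of classification-relevant areas."""
    delta = np.clip(perturbation, -epsilon, epsilon)   # L-inf bound
    delta = delta * region_mask[..., None]             # zero outside the region
    return np.clip(image + delta, 0.0, 1.0)            # stay a valid image
```

Pixels outside the mask are untouched, which is what makes the perturbation both less visible and, together with the stored perturbation, recoverable for the reversible setting.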
- Multimedia Systems (July 26, 2024)