Unlocking AI's Potential: Zero-Shot and Few-Shot Learning for Voice and Image Recognition
DOI: https://doi.org/10.62019/kcsndc55
Abstract
Zero-shot and few-shot learning paradigms have emerged as promising solutions to a key shortcoming of traditional deep models: their need for large volumes of labeled training data. This paper presents an experimental study of zero-shot learning (ZSL) and few-shot learning (FSL) methods for voice and image recognition tasks. We analyze and compare several state-of-the-art architectures, including CLIP, Whisper, and prototypical networks, on benchmark datasets including ESC-50 for audio classification and mini-ImageNet for image recognition. The experiments are designed to evaluate how well models generalize when classes are unseen or sparsely represented during training. Our results show that multimodal models pre-trained on large-scale datasets perform strongly in zero-shot scenarios, whereas metric-based few-shot approaches achieve higher accuracy when only a small amount of supervision is available. We also discuss cross-modal transfer, examining how representations learned in one modality (e.g., voice) can be used in another (e.g., images). The findings highlight critical trade-offs among model complexity, data efficiency, and recognition accuracy, which offer practical guidance for deploying lightweight and scalable AI systems in resource-limited settings. The paper advances the concept of generalized recognition systems and provides a basis for future research in low-resource learning environments. The main contributions of this work are a comparative study of ZSL and FSL methods, an analysis of cross-modal transfer, and the identification of key trade-offs that lay the foundation for future research in generalized and adaptive AI.
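As an illustration of the metric-based few-shot idea mentioned in the abstract, the following is a minimal sketch of nearest-prototype classification in the spirit of prototypical networks. It is not the authors' implementation: the function names, the class labels, and the two-dimensional embeddings are all hypothetical, and in practice the embeddings would come from a trained encoder rather than being written by hand.

```python
from collections import defaultdict
from math import dist


def class_prototypes(support):
    """Average the support embeddings of each class into one prototype."""
    grouped = defaultdict(list)
    for embedding, label in support:
        grouped[label].append(embedding)
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(query, prototypes):
    """Return the label whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: dist(query, prototypes[label]))


# Toy 2-way 2-shot episode; embeddings and labels are made up for illustration.
support = [
    ([0.0, 0.0], "dog_bark"),
    ([0.2, 0.0], "dog_bark"),
    ([1.0, 1.0], "rain"),
    ([1.0, 0.8], "rain"),
]
prototypes = class_prototypes(support)
prediction = classify([0.9, 0.7], prototypes)
```

Only the class means need to be stored after the support set is embedded, so adding a new class requires a few labeled examples and no retraining of the classifier head.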
License
Copyright (c) 2025 Adnan Ali, Shoaib Farooq, Muhammad Zeeshan Shafi, Muhammad Talha Tahir Bajwa, Jamil Ur Rehman, Hanifullah

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
