Publications
For the complete list, please see my Google Scholar Profile.
2025
- ArxivA Comprehensive Survey on Knowledge DistillationAmir M Mansourian, Rozhan Ahmadi, Masoud Ghafouri, Amir Mohammad Babaei, Elaheh Badali Golezani, Zeynab Yasamani Ghamchi, Vida Ramezanian, Alireza Taherian, Kimia Dinashi, Amirali Miri, and Shohreh KasaeiarXiv preprint arXiv:2503.12067, 2025
Deep Neural Networks (DNNs) have achieved notable performance in the fields of computer vision and natural language processing with various applications in both academia and industry. However, with recent advancements in DNNs and transformer models with a tremendous number of parameters, deploying these large models on edge devices causes serious issues such as high runtime and memory consumption. This is especially concerning with the recent large-scale foundation models, Vision-Language Models (VLMs), and Large Language Models (LLMs). Knowledge Distillation (KD) is one of the prominent techniques proposed to address the aforementioned problems using a teacher-student architecture. More specifically, a lightweight student model is trained using additional knowledge from a cumbersome teacher model. In this work, a comprehensive survey of knowledge distillation methods is proposed. This includes reviewing KD from different aspects: distillation sources, distillation schemes, distillation algorithms, distillation by modalities, applications of distillation, and comparison among existing methods. In contrast to most existing surveys, which are either outdated or simply update former surveys, this work proposes a comprehensive survey with a new point of view and representation structure that categorizes and investigates the most recent methods in knowledge distillation. This survey considers various critically important subcategories, including KD for diffusion models, 3D inputs, foundational models, transformers, and LLMs. Furthermore, existing challenges in KD and possible future research directions are discussed.
@article{mansourian2025comprehensive, title = {A Comprehensive Survey on Knowledge Distillation}, author = {Mansourian, Amir M and Ahmadi, Rozhan and Ghafouri, Masoud and Babaei, Amir Mohammad and Golezani, Elaheh Badali and Ghamchi, Zeynab Yasamani and Ramezanian, Vida and Taherian, Alireza and Dinashi, Kimia and Miri, Amirali and Kasaei, Shohreh}, year = {2025}, journal = {arXiv preprint arXiv:2503.12067} }
- Improving Weakly-supervised Video Instance Segmentation by Leveraging Spatio-temporal ConsistencyFarnoosh Arefi, Amir M Mansourian, and Shohreh KasaeiIEEE Access, 2025
The performance of Video Instance Segmentation (VIS) methods has improved significantly with the advent of transformer networks. However, these networks often face challenges in training due to the high annotation cost. To address this, unsupervised and weakly-supervised methods have been developed to reduce the dependency on annotations. This work introduces a novel weakly-supervised method called Eigen-Cluster VIS that, without requiring any mask annotations, achieves competitive accuracy compared to other VIS approaches. This method is based on two key innovations: a Temporal Eigenvalue Loss (TEL) and a clip-level Quality Cluster Coefficient (QCC). The TEL ensures temporal coherence by leveraging the eigenvalues of the Laplacian matrix derived from graph adjacency matrices. By minimizing the mean absolute error between the eigenvalues of adjacent frames, this loss function promotes smooth transitions and stable segmentation boundaries over time, reducing temporal discontinuities and improving overall segmentation quality. The QCC employs the K-means method to ensure the quality of spatio-temporal clusters without relying on ground truth masks. Using the Davies-Bouldin score, the QCC provides an unsupervised measure of feature discrimination, allowing the model to self-evaluate and adapt to varying object distributions, enhancing robustness during the testing phase. These enhancements are computationally efficient and straightforward, offering significant performance gains without additional annotated data. The proposed Eigen-Cluster VIS method is evaluated on the YouTube-Video Instance Segmentation (YouTube-VIS) 2019/2021 and Occluded Video Instance Segmentation (OVIS) datasets, demonstrating that it effectively narrows the performance gap between the fully-supervised and weakly-supervised VIS approaches.
@article{arefi2025eigen, title = {Improving Weakly-supervised Video Instance Segmentation by Leveraging Spatio-temporal Consistency}, author = {Arefi, Farnoosh and Mansourian, Amir M and Kasaei, Shohreh}, year = {2025}, journal = {IEEE Access}, }
- AICSD: Adaptive Inter-Class Similarity Distillation for Semantic SegmentationAmir M Mansourian, Rozhan Ahmadi, and Shohreh KasaeiMultimedia Tools and Applications, 2025
In recent years, deep neural networks have achieved remarkable accuracy in computer vision tasks. With inference time being a crucial factor, particularly in dense prediction tasks such as semantic segmentation, knowledge distillation has emerged as a successful technique for improving the accuracy of lightweight student networks. The existing methods often neglect the information in channels and among different classes. To overcome these limitations, this paper proposes a novel method called Inter-Class Similarity Distillation (ICSD) for the purpose of knowledge distillation. The proposed method transfers high-order relations from the teacher network to the student network by independently computing intra-class distributions for each class from network outputs. This is followed by calculating inter-class similarity matrices for distillation using KL divergence between distributions of each pair of classes. To further improve the effectiveness of the proposed method, an Adaptive Loss Weighting (ALW) training strategy is proposed. Unlike existing methods, the ALW strategy gradually reduces the influence of the teacher network towards the end of the training process to account for errors in teacher’s predictions. Extensive experiments conducted on two well-known datasets for semantic segmentation, Cityscapes, Pascal VOC 2012, and COCO, validate the effectiveness of the proposed method in terms of mIoU and pixel accuracy. The proposed method outperforms most of existing knowledge distillation methods as demonstrated by both quantitative and qualitative evaluations.
@article{mansourian2025aicsd, title = {AICSD: Adaptive Inter-Class Similarity Distillation for Semantic Segmentation}, author = {Mansourian, Amir M and Ahmadi, Rozhan and Kasaei, Shohreh}, journal = {Multimedia Tools and Applications}, year = {2025} }
2024
- SoccerNet game state reconstruction: End-to-end athlete tracking and identification on a minimapVladimir Somers, Victor Joos, Anthony Cioppa, Silvio Giancola, Seyed Abolfazl Ghasemzadeh, Floriane Magera, Baptiste Standaert, Amir M Mansourian, Xin Zhou, Shohreh Kasaei, and othersIn IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
Tracking and identifying athletes on the pitch holds acentral role in collecting essential insights from the game,such as estimating the total distance covered by players orunderstanding team tactics. This tracking and identification process is crucial for reconstructing the game state,defined by the athletes’ positions and identities on a 2Dtop-view of the pitch, (i.e. a minimap). However, reconstructing the game state from videos captured by a singlecamera is challenging. It requires understanding the position of the athletes and the viewpoint of the camera to localize and identify players within the field. In this work,we formalize the task of Game State Reconstruction and introduce SoccerNet-GSR, a novel Game State Reconstruction dataset focusing on football videos. SoccerNet-GSRis composed of 200 video sequences of 30 seconds, annotated with 9.37 million line points for pitch localization andcamera calibration, as well as over 2.36 million athlete positions on the pitch with their respective role, team, and jersey number. Furthermore, we introduce GS-HOTA, a novelmetric to evaluate game state reconstruction methods. Finally, we propose and release an end-to-end baseline forgame state reconstruction, bootstrapping the research onthis task. Our experiments show that GSR is a challengingnovel task, which opens the field for future research.
@inproceedings{somers2024soccernet, title = {SoccerNet game state reconstruction: End-to-end athlete tracking and identification on a minimap}, author = {Somers, Vladimir and Joos, Victor and Cioppa, Anthony and Giancola, Silvio and Ghasemzadeh, Seyed Abolfazl and Magera, Floriane and Standaert, Baptiste and Mansourian, Amir M and Zhou, Xin and Kasaei, Shohreh and others}, booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition}, year = {2024} }
- ArXivAttention-guided Feature Distillation for Semantic SegmentationAmir M Mansourian, Arya Jalali, Rozhan Ahmadi, and Shohreh KasaeiarXiv preprint arXiv:2403.05451, 2024
In contrast to existing complex methodologies commonly employed for distilling knowledge from a teacher to a student, the pro-posed method showcases the efficacy of a simple yet powerful method for utilizing refined feature maps to transfer attention. The proposed method has proven to be effective in distilling rich information, outperforming existing methods in semantic segmentation as a dense prediction task. The proposed Attention-guided Feature Distillation (AttnFD) method, employs the Convolutional Block Attention Module (CBAM), which refines feature maps by taking into account both channel-specific and spatial information content. By only using the Mean Squared Error (MSE) loss function between the refined feature maps of the teacher and the student,AttnFD demonstrates outstanding performance in semantic segmentation, achieving state-of-the-art results in terms of mean Intersection over Union (mIoU) on the PascalVoc 2012 and Cityscapes datasets.
@article{mansourian2024attention, title = {Attention-guided Feature Distillation for Semantic Segmentation}, author = {Mansourian, Amir M and Jalali, Arya and Ahmadi, Rozhan and Kasaei, Shohreh}, journal = {arXiv preprint arXiv:2403.05451}, year = {2024} }
- Deep Spectral Improvement for Unsupervised Image Instance SegmentationFarnoosh Arefi, Amir M Mansourian, and Shohreh KasaeiPlos One, 2024
Recently, there has been growing interest in deep spectral methods for image localization and segmentation, influenced by traditional spectral segmentation approaches. These methods reframe the image decomposition process as a graph partitioning task by extracting features using self-supervised learning and utilizing the Laplacian of the affinity matrix to obtain eigensegments. However, instance segmentation has received less attention compared to other tasks within the context of deep spectral methods. This paper addresses the fact that not all channels of the feature map extracted from a selfsupervised backbone contain sufficient information for instance segmentation purposes. In fact, some channels are noisy and hinder the accuracy of the task. To overcome this issue, this paper proposes two channel reduction modules: Noise Channel Reduction (NCR) and Deviation-based Channel Reduction (DCR). The NCR retains channels with lower entropy, as they are less likely to be noisy, while DCR prunes channels with low standard deviation, as they lack sufficient information for effective instance segmentation. Furthermore, the paper demonstrates that the dot product, commonly used in deep spectral methods, is not su itable for instance segmentation due to its sensitivity to feature map values, potentially leading to incorrect instance segments. To address this issue, a new similarity metric called Bray-Curtis over Chebyshev (BoC) is proposed. It takes into account the distribution of features in addition to their values, providing a more robust similarity measure for instance segmentation. Quantitative and qualitative results on the Youtube-VIS2019 dataset highlight the improvements achieved by the proposed channel reduction methods and the use of BoC instead of the conventional dot product for creating the affinity matrix. These improvements are observed in terms of mean Intersection over Union (mIoU) and extracted instance segments, demonstrating enhanced instance segmentation performance.
@article{arefi2024deep, title = {Deep Spectral Improvement for Unsupervised Image Instance Segmentation}, author = {Arefi, Farnoosh and Mansourian, Amir M and Kasaei, Shohreh}, year = {2024}, journal = {Plos One} }
- Rethinking RAFT for Efficient Optical FlowNavid Eslami, Farnoosh Arefi, Amir M Mansourian, and Shohreh KasaeiInternational Conference on Machine Vision and Image Processing (MVIP), 2024
Despite significant progress in deep learning-based optical flow methods, accurately estimating large displacements and repetitive patterns remains a challenge. The limitations of local features and similarity search patterns used in these algorithms contribute to this issue. Additionally, some existing methods suffer from slow runtime and excessive graphic memory consumption. To address these problems, this paper proposes a novel approach based on the RAFT framework. The proposed Attention-based Feature Localization (AFL) approach incorporates the attention mechanism to handle global feature extraction and address repetitive patterns. It introduces an operator for matching pixels with corresponding counterparts in the second frame and assigning accurate flow values. Furthermore, an Amorphous Lookup Operator (ALO) is proposed to enhance convergence speed and improve RAFTs ability to handle large displacements by reducing data redundancy in its search operator and expanding the search space for similarity extraction. The proposed method, Efficient RAFT (Ef-RAFT),achieves significant improvements of 10% on the Sintel dataset and 5% on the KITTI dataset over RAFT. Remarkably, these enhancements are attained with a modest 33% reduction in speed and a mere 13% increase in memory usage.
@article{eslami2024rethinking, title = {Rethinking RAFT for Efficient Optical Flow}, author = {Eslami, Navid and Arefi, Farnoosh and Mansourian, Amir M and Kasaei, Shohreh}, year = {2024}, journal = {International Conference on Machine Vision and Image Processing (MVIP)} }
2023
- Multi-task Learning for Joint Re-identification, Team Affiliation, and Role Classification for Sports Visual TrackingAmir M Mansourian, Vladimir Somers, Christophe De Vleeschouwer, and Shohreh KasaeiIn Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports, 2023
Effective tracking and re-identification of players is essential foranalyzing soccer videos. But, it is a challenging task due to the non-linear motion of players, the similarity in appearance of playersfrom the same team, and frequent occlusions. Therefore, the abilityto extract meaningful embeddings to represent players is crucialin developing an effective tracking and re-identification system.In this paper, a multi-purpose part-based person representationmethod, called PRTreID, is proposed that performs three tasks ofrole classification, team affiliation, and re-identification, simultane-ously. In contrast to available literature, a single network is trainedwith multi-task supervision to solve all three tasks, jointly. The pro-posed joint method is computationally efficient due to the sharedbackbone. Also, the multi-task learning leads to richer and morediscriminative representations, as demonstrated by both quanti-tative and qualitative results. To demonstrate the effectiveness ofPRTreID, it is integrated with a state-of-the-art tracking method,using a part-based post-processing module to handle long-termtracking. The proposed tracking method, outperforms all existingtracking methods on the challenging SoccerNet tracking dataset.
@inproceedings{mansourian2023multi, title = {Multi-task Learning for Joint Re-identification, Team Affiliation, and Role Classification for Sports Visual Tracking}, author = {Mansourian, Amir M and Somers, Vladimir and De Vleeschouwer, Christophe and Kasaei, Shohreh}, booktitle = {Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports}, pages = {103--112}, year = {2023} }
- An Efficient Knowledge Distillation Architecture for Real-time Semantic SegmentationAmir M Mansourian, Nader Karimi, and Shohreh KasaeiAUT Journal of Modeling and Simulation, 2023
In recent years, Convolutional Neural Networks (CNNs) have made significant strides in the field of segmentation, particularly in semantic segmentation where both accuracy and efficiency are crucial. However, despite their high accuracy, these deep networks are not practical for real-time use due to their low inference speed. This issue has prompted researchers to explore various techniques to improve the efficiency of CNNs. One such technique is knowledge distillation, which involves transferring knowledge from a larger, cumbersome (teacher) model to a smaller, more compact (student) model. This paper proposes a simple yet efficient approach to address the issue of low inference speed in CNNs using knowledge distillation. The proposed method involves distilling knowledge from the feature maps of the teacher model to guide the learning of the student model. The approach uses a straightforward technique known as pixel-wise distillation to transfer the feature maps of the last convolution layer of the teacher model to the student model. Additionally, a pair-wise distillation technique is used to transfer pair-wise similarities of the intermediate layers. To validate the effectiveness of the proposed method, extensive experiments were conducted on the PascalVoc 2012 dataset using a state-of-the art DeepLabV3+ segmentation network with different backbone architectures. The results showed that the proposed method achieved a balanced mean Intersection over Union (mIoU) and training time.
@article{mansourian2023anefficient, author = {Mansourian, Amir M and Karimi, Nader and Kasaei, Shohreh}, title = {An Efficient Knowledge Distillation Architecture for Real-time Semantic Segmentation}, journal = {AUT Journal of Modeling and Simulation}, year = {2023}, }