Publications
For the complete list, please see my Google Scholar Profile.
2024
- SoccerNet game state reconstruction: End-to-end athlete tracking and identification on a minimap. Vladimir Somers, Victor Joos, Anthony Cioppa, Silvio Giancola, Seyed Abolfazl Ghasemzadeh, Floriane Magera, Baptiste Standaert, Amir M Mansourian, Xin Zhou, Shohreh Kasaei, and others. In IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2024
Tracking and identifying athletes on the pitch holds a central role in collecting essential insights from the game, such as estimating the total distance covered by players or understanding team tactics. This tracking and identification process is crucial for reconstructing the game state, defined by the athletes’ positions and identities on a 2D top-view of the pitch (i.e., a minimap). However, reconstructing the game state from videos captured by a single camera is challenging. It requires understanding the position of the athletes and the viewpoint of the camera to localize and identify players within the field. In this work, we formalize the task of Game State Reconstruction and introduce SoccerNet-GSR, a novel Game State Reconstruction dataset focusing on football videos. SoccerNet-GSR is composed of 200 video sequences of 30 seconds, annotated with 9.37 million line points for pitch localization and camera calibration, as well as over 2.36 million athlete positions on the pitch with their respective role, team, and jersey number. Furthermore, we introduce GS-HOTA, a novel metric to evaluate game state reconstruction methods. Finally, we propose and release an end-to-end baseline for game state reconstruction, bootstrapping the research on this task. Our experiments show that GSR is a challenging novel task, which opens the field for future research.
@inproceedings{somers2024soccernet, title = {SoccerNet game state reconstruction: End-to-end athlete tracking and identification on a minimap}, author = {Somers, Vladimir and Joos, Victor and Cioppa, Anthony and Giancola, Silvio and Ghasemzadeh, Seyed Abolfazl and Magera, Floriane and Standaert, Baptiste and Mansourian, Amir M and Zhou, Xin and Kasaei, Shohreh and others}, booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)}, year = {2024} }
- Attention-guided Feature Distillation for Semantic Segmentation. Amir M Mansourian, Arya Jalali, Rozhan Ahmadi, and Shohreh Kasaei. arXiv preprint arXiv:2403.05451, 2024
In contrast to existing complex methodologies commonly employed for distilling knowledge from a teacher to a student, the proposed method showcases the efficacy of a simple yet powerful method for utilizing refined feature maps to transfer attention. The proposed method has proven to be effective in distilling rich information, outperforming existing methods in semantic segmentation as a dense prediction task. The proposed Attention-guided Feature Distillation (AttnFD) method employs the Convolutional Block Attention Module (CBAM), which refines feature maps by taking into account both channel-specific and spatial information content. By only using the Mean Squared Error (MSE) loss function between the refined feature maps of the teacher and the student, AttnFD demonstrates outstanding performance in semantic segmentation, achieving state-of-the-art results in terms of mean Intersection over Union (mIoU) on the Pascal VOC 2012 and Cityscapes datasets.
@article{mansourian2024attention, title = {Attention-guided Feature Distillation for Semantic Segmentation}, author = {Mansourian, Amir M and Jalali, Arya and Ahmadi, Rozhan and Kasaei, Shohreh}, journal = {arXiv preprint arXiv:2403.05451}, year = {2024} }
- Deep Spectral Improvement for Unsupervised Image Instance Segmentation. Farnoosh Arefi, Amir M Mansourian, and Shohreh Kasaei. Plos One, 2024
Recently, there has been growing interest in deep spectral methods for image localization and segmentation, influenced by traditional spectral segmentation approaches. These methods reframe the image decomposition process as a graph partitioning task by extracting features using self-supervised learning and utilizing the Laplacian of the affinity matrix to obtain eigensegments. However, instance segmentation has received less attention compared to other tasks within the context of deep spectral methods. This paper addresses the fact that not all channels of the feature map extracted from a self-supervised backbone contain sufficient information for instance segmentation purposes. In fact, some channels are noisy and hinder the accuracy of the task. To overcome this issue, this paper proposes two channel reduction modules: Noise Channel Reduction (NCR) and Deviation-based Channel Reduction (DCR). The NCR retains channels with lower entropy, as they are less likely to be noisy, while DCR prunes channels with low standard deviation, as they lack sufficient information for effective instance segmentation. Furthermore, the paper demonstrates that the dot product, commonly used in deep spectral methods, is not suitable for instance segmentation due to its sensitivity to feature map values, potentially leading to incorrect instance segments. To address this issue, a new similarity metric called Bray-Curtis over Chebyshev (BoC) is proposed. It takes into account the distribution of features in addition to their values, providing a more robust similarity measure for instance segmentation. Quantitative and qualitative results on the Youtube-VIS2019 dataset highlight the improvements achieved by the proposed channel reduction methods and the use of BoC instead of the conventional dot product for creating the affinity matrix. These improvements are observed in terms of mean Intersection over Union (mIoU) and extracted instance segments, demonstrating enhanced instance segmentation performance.
@article{arefi2024deep, title = {Deep Spectral Improvement for Unsupervised Image Instance Segmentation}, author = {Arefi, Farnoosh and Mansourian, Amir M and Kasaei, Shohreh}, year = {2024}, journal = {Plos One} }
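The two channel reduction ideas in the abstract above (keep low-entropy channels, prune low-deviation channels) can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation: the function names, the histogram-based entropy estimate, the number of bins, and the `min_std` and `keep` parameters are all assumptions made for illustration, and each channel is represented as a flat list of floats.

```python
import math
from statistics import pstdev


def channel_entropy(values, bins=8):
    """Shannon entropy (in bits) of a histogram over one channel's values.

    Assumption: entropy is estimated from an equal-width histogram.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant channel occupies a single bin
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / (hi - lo) * bins), bins - 1)
        counts[idx] += 1
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts if c)


def deviation_based_reduction(channels, min_std):
    """Return indices of channels whose standard deviation is at least min_std."""
    return [i for i, ch in enumerate(channels) if pstdev(ch) >= min_std]


def noise_channel_reduction(channels, keep):
    """Return indices of the `keep` channels with the lowest entropy."""
    ranked = sorted(range(len(channels)),
                    key=lambda i: channel_entropy(channels[i]))
    return sorted(ranked[:keep])


# Toy feature map with three channels (hypothetical values).
channels = [
    [0.5] * 8,                    # constant: zero deviation, zero entropy
    [0, 1, 0, 1, 0, 1, 0, 1],     # two-valued
    [0, 1, 2, 3, 4, 5, 6, 7],     # spread over all bins
]
kept_by_std = deviation_based_reduction(channels, min_std=0.1)
kept_by_entropy = noise_channel_reduction(channels, keep=2)
```

In this toy example the constant channel survives the entropy criterion but is removed by the deviation criterion, which shows why the two filters are complementary rather than redundant.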
- Rethinking RAFT for Efficient Optical Flow. Navid Eslami, Farnoosh Arefi, Amir M Mansourian, and Shohreh Kasaei. International Conference on Machine Vision and Image Processing (MVIP), 2024
Despite significant progress in deep learning-based optical flow methods, accurately estimating large displacements and repetitive patterns remains a challenge. The limitations of local features and similarity search patterns used in these algorithms contribute to this issue. Additionally, some existing methods suffer from slow runtime and excessive graphic memory consumption. To address these problems, this paper proposes a novel approach based on the RAFT framework. The proposed Attention-based Feature Localization (AFL) approach incorporates the attention mechanism to handle global feature extraction and address repetitive patterns. It introduces an operator for matching pixels with corresponding counterparts in the second frame and assigning accurate flow values. Furthermore, an Amorphous Lookup Operator (ALO) is proposed to enhance convergence speed and improve RAFT's ability to handle large displacements by reducing data redundancy in its search operator and expanding the search space for similarity extraction. The proposed method, Efficient RAFT (Ef-RAFT), achieves significant improvements of 10% on the Sintel dataset and 5% on the KITTI dataset over RAFT. Remarkably, these enhancements are attained with a modest 33% reduction in speed and a mere 13% increase in memory usage.
@article{eslami2024rethinking, title = {Rethinking RAFT for Efficient Optical Flow}, author = {Eslami, Navid and Arefi, Farnoosh and Mansourian, Amir M and Kasaei, Shohreh}, year = {2024}, journal = {International Conference on Machine Vision and Image Processing (MVIP)} }
2023
- Multi-task Learning for Joint Re-identification, Team Affiliation, and Role Classification for Sports Visual Tracking. Amir M Mansourian, Vladimir Somers, Christophe De Vleeschouwer, and Shohreh Kasaei. In Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports, 2023
Effective tracking and re-identification of players is essential for analyzing soccer videos. However, it is a challenging task due to the non-linear motion of players, the similarity in appearance of players from the same team, and frequent occlusions. Therefore, the ability to extract meaningful embeddings to represent players is crucial in developing an effective tracking and re-identification system. In this paper, a multi-purpose part-based person representation method, called PRTreID, is proposed that performs three tasks of role classification, team affiliation, and re-identification simultaneously. In contrast to available literature, a single network is trained with multi-task supervision to solve all three tasks jointly. The proposed joint method is computationally efficient due to the shared backbone. Also, the multi-task learning leads to richer and more discriminative representations, as demonstrated by both quantitative and qualitative results. To demonstrate the effectiveness of PRTreID, it is integrated with a state-of-the-art tracking method, using a part-based post-processing module to handle long-term tracking. The proposed tracking method outperforms all existing tracking methods on the challenging SoccerNet tracking dataset.
@inproceedings{mansourian2023multi, title = {Multi-task Learning for Joint Re-identification, Team Affiliation, and Role Classification for Sports Visual Tracking}, author = {Mansourian, Amir M and Somers, Vladimir and De Vleeschouwer, Christophe and Kasaei, Shohreh}, booktitle = {Proceedings of the 6th International Workshop on Multimedia Content Analysis in Sports}, pages = {103--112}, year = {2023} }
- SoccerNet 2023 Challenges Results. Anthony Cioppa, Silvio Giancola, Vladimir Somers, Floriane Magera, Xin Zhou, Hassan Mkhallati, Adrien Deliege, Jan Held, Carlos Hinojosa, Amir M Mansourian, and others. Sports Engineering, 2023
The SoccerNet 2023 challenges were the third annual video understanding challenges organized by the SoccerNet team. For this third edition, the challenges were composed of seven vision-based tasks split into three main themes. The first theme, broadcast video understanding, is composed of three high-level tasks related to describing events occurring in the video broadcasts: (1) action spotting, focusing on retrieving all timestamps related to global actions in soccer, (2) ball action spotting, focusing on retrieving all timestamps related to the soccer ball change of state, and (3) dense video captioning, focusing on describing the broadcast with natural language and anchored timestamps. The second theme, field understanding, relates to the single task of (4) camera calibration, focusing on retrieving the intrinsic and extrinsic camera parameters from images. The third and last theme, player understanding, is composed of three low-level tasks related to extracting information about the players: (5) re-identification, focusing on retrieving the same players across multiple views, (6) multiple object tracking, focusing on tracking players and the ball through unedited video streams, and (7) jersey number recognition, focusing on recognizing the jersey number of players from tracklets. Compared to the previous editions of the SoccerNet challenges, tasks (2-3-7) are novel, including new annotations and data, task (4) was enhanced with more data and annotations, and task (6) now focuses on end-to-end approaches.
@article{cioppa2023soccernet, title = {SoccerNet 2023 Challenges Results}, author = {Cioppa, Anthony and Giancola, Silvio and Somers, Vladimir and Magera, Floriane and Zhou, Xin and Mkhallati, Hassan and Deliege, Adrien and Held, Jan and Hinojosa, Carlos and Mansourian, Amir M and others}, journal = {Sports Engineering}, year = {2023} }
- AICSD: Adaptive Inter-Class Similarity Distillation for Semantic Segmentation. Amir M Mansourian, Rozhan Ahmadi, and Shohreh Kasaei. arXiv preprint arXiv:2308.04243, 2023
In recent years, deep neural networks have achieved remarkable accuracy in computer vision tasks. With inference time being a crucial factor, particularly in dense prediction tasks such as semantic segmentation, knowledge distillation has emerged as a successful technique for improving the accuracy of lightweight student networks. The existing methods often neglect the information in channels and among different classes. To overcome these limitations, this paper proposes a novel method called Inter-Class Similarity Distillation (ICSD) for the purpose of knowledge distillation. The proposed method transfers high-order relations from the teacher network to the student network by independently computing intra-class distributions for each class from network outputs. This is followed by calculating inter-class similarity matrices for distillation using KL divergence between distributions of each pair of classes. To further improve the effectiveness of the proposed method, an Adaptive Loss Weighting (ALW) training strategy is proposed. Unlike existing methods, the ALW strategy gradually reduces the influence of the teacher network towards the end of the training process to account for errors in the teacher’s predictions. Extensive experiments conducted on two well-known datasets for semantic segmentation, Cityscapes and Pascal VOC 2012, validate the effectiveness of the proposed method in terms of mIoU and pixel accuracy. The proposed method outperforms most existing knowledge distillation methods, as demonstrated by both quantitative and qualitative evaluations.
@article{mansourian2023aicsd, title = {AICSD: Adaptive Inter-Class Similarity Distillation for Semantic Segmentation}, author = {Mansourian, Amir M and Ahmadi, Rozhan and Kasaei, Shohreh}, journal = {arXiv preprint arXiv:2308.04243}, year = {2023} }
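The inter-class similarity idea in the AICSD abstract above can be sketched in a few lines of standard-library Python. This is a rough illustration under stated assumptions, not the paper's implementation: each class's logits are given as a flat list over pixels, the intra-class distribution is assumed to be a softmax over those pixels, the teacher and student matrices are compared with a mean squared difference, and the linear decay in `distillation_weight` is only one possible way to reduce the teacher's influence over training. All function names are hypothetical.

```python
import math


def spatial_softmax(logits):
    """Turn one class's per-pixel logits into a distribution over pixels."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def inter_class_matrix(class_logits):
    """Pairwise KL divergences between the per-class pixel distributions."""
    dists = [spatial_softmax(ch) for ch in class_logits]
    return [[kl_divergence(p, q) for q in dists] for p in dists]


def icsd_loss(teacher_logits, student_logits):
    """Assumed comparison: mean squared difference of the two matrices."""
    t = inter_class_matrix(teacher_logits)
    s = inter_class_matrix(student_logits)
    n = len(t)
    return sum((t[i][j] - s[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)


def distillation_weight(epoch, total_epochs):
    """Assumed linear schedule: teacher influence decays to zero by the end."""
    return max(0.0, 1.0 - epoch / total_epochs)


# Toy example: two classes, three pixels (hypothetical values).
teacher = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
student = [[1.0, 1.0, 1.0], [0.0, 2.0, 0.0]]
loss_same = icsd_loss(teacher, teacher)
loss_diff = icsd_loss(teacher, student)
```

In a training loop, the distillation term would typically be scaled by `distillation_weight(epoch, total_epochs)` before being added to the task loss, so the student relies less on the teacher in later epochs.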
- An Efficient Knowledge Distillation Architecture for Real-time Semantic Segmentation. Amir M Mansourian, Nader Karimi, and Shohreh Kasaei. AUT Journal of Modeling and Simulation, 2023
In recent years, Convolutional Neural Networks (CNNs) have made significant strides in the field of segmentation, particularly in semantic segmentation where both accuracy and efficiency are crucial. However, despite their high accuracy, these deep networks are not practical for real-time use due to their low inference speed. This issue has prompted researchers to explore various techniques to improve the efficiency of CNNs. One such technique is knowledge distillation, which involves transferring knowledge from a larger, cumbersome (teacher) model to a smaller, more compact (student) model. This paper proposes a simple yet efficient approach to address the issue of low inference speed in CNNs using knowledge distillation. The proposed method involves distilling knowledge from the feature maps of the teacher model to guide the learning of the student model. The approach uses a straightforward technique known as pixel-wise distillation to transfer the feature maps of the last convolution layer of the teacher model to the student model. Additionally, a pair-wise distillation technique is used to transfer pair-wise similarities of the intermediate layers. To validate the effectiveness of the proposed method, extensive experiments were conducted on the Pascal VOC 2012 dataset using a state-of-the-art DeepLabV3+ segmentation network with different backbone architectures. The results showed that the proposed method achieved a balanced mean Intersection over Union (mIoU) and training time.
@article{mansourian2023anefficient, author = {Mansourian, Amir M and Karimi, Nader and Kasaei, Shohreh}, title = {An Efficient Knowledge Distillation Architecture for Real-time Semantic Segmentation}, journal = {AUT Journal of Modeling and Simulation}, year = {2023}, }