Comparative Analysis of CNN Architectures in Siamese Networks with Test-Time Augmentation for Trademark Image Similarity Detection
DOI: https://doi.org/10.15294/sji.v11i4.13811
Keywords: Trademark, CNN, Siamese neural network, Test-time augmentation, Data augmentation
Abstract
Purpose: This study aims to enhance the detection of trademark image similarity by conducting a comparative analysis of various Convolutional Neural Network (CNN) architectures within Siamese networks, integrated with test-time augmentation techniques. Existing methods often face challenges in accurately capturing subtle visual similarities between trademarks due to limitations in feature extraction and generalization capabilities. The research seeks to identify the most effective CNN architecture for this task and to assess the impact of test-time augmentation on model performance.
Methods: The study implements Siamese networks utilizing three distinct CNN architectures: VGG16, VGG19, and ResNet50. Each network is trained on a dataset of trademark images to learn deep feature representations that can discriminate between similar and dissimilar trademarks. During the evaluation phase, test-time augmentation (TTA) is applied to enhance model robustness by averaging predictions over multiple augmented versions of the input images. TTA includes transformations such as random rotations (up to 40%), width and height shifts (up to 20%), random shear transformations, zooming (up to 20%), horizontal and vertical flips, and random brightness adjustments.
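The prediction-averaging step of TTA described in the Methods can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`random_augment`, `embed`, `toy_score`, `tta_similarity`), the number of augmented copies, the brightness range, and the wrap-around shift are all assumptions made for illustration. The scoring function is a toy stand-in for a trained Siamese network, and rotation, shear, and zoom are omitted for brevity.

```python
import numpy as np


def random_augment(image, rng):
    """Randomly flip, shift, and re-brighten an HxWxC float image in [0, 1]."""
    out = image
    if rng.random() < 0.5:
        out = out[:, ::-1, :]  # horizontal flip
    if rng.random() < 0.5:
        out = out[::-1, :, :]  # vertical flip
    # Height and width shifts of up to 20% of the image size.
    # np.roll wraps pixels around the border, a simplification chosen here.
    h, w = out.shape[:2]
    max_dy, max_dx = int(0.2 * h), int(0.2 * w)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))
    out = np.roll(out, shift=(dy, dx), axis=(0, 1))
    # Brightness scaling; the 0.8 to 1.2 range is an assumed value.
    return np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)


def embed(image):
    """Toy shared encoder: per-channel mean and standard deviation."""
    return np.concatenate([image.mean(axis=(0, 1)), image.std(axis=(0, 1))])


def toy_score(img_a, img_b):
    """Hypothetical stand-in for a Siamese network: both inputs pass through
    the same encoder, and the embedding distance is mapped to (0, 1]."""
    distance = np.linalg.norm(embed(img_a) - embed(img_b))
    return float(np.exp(-distance))


def tta_similarity(score_fn, img_a, img_b, n_aug=8, seed=0):
    """Average the similarity score over the original pair and n_aug augmented pairs."""
    rng = np.random.default_rng(seed)
    scores = [score_fn(img_a, img_b)]  # unaugmented pair
    for _ in range(n_aug):
        scores.append(score_fn(random_augment(img_a, rng), random_augment(img_b, rng)))
    return float(np.mean(scores))
```

In a real pipeline, `toy_score` would be replaced by the forward pass of the trained VGG16, VGG19, or ResNet50 Siamese model, and the averaged score would be thresholded to decide whether two trademarks are similar.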
Result: Experimental findings show that the VGG19-based Siamese network achieves the highest accuracy at 98.82%, outperforming the VGG16-based network (97.07%) and the ResNet50-based network (50.00%). Applying TTA improved performance across all models, with the VGG19 model showing the largest improvement. The very low accuracy of ResNet50 is attributed to its misclassification of original trademark images as close forgeries, probably due to overfitting or a limited ability to generalize over very fine visual features.
Novelty: This study presents a comparative analysis of CNN architectures, namely VGG16, VGG19, and ResNet50, in Siamese networks for trademark image similarity detection.