Siamese Neural Networks in a nutshell

5 min read · Jun 27, 2021

What are Siamese Networks?

A Siamese neural network (sometimes called a twin neural network) is an artificial neural network that contains two or more identical subnetworks. ‘Identical’ here means that they have the same configuration with the same parameters and weights. Parameter updates are mirrored across the subnetworks, so in practice we train a single set of weights and reuse it for every subnetwork. These networks are used to find the similarity of the inputs by comparing their feature vectors, which makes them useful in many applications.

Traditionally, a neural network learns to predict a fixed set of classes. This poses a problem when we need to add or remove classes. In this case, we have to update the neural network and retrain it on the whole dataset. Also, deep neural networks need a large volume of data to train on. SNNs, on the other hand, learn a similarity function. Thus, we can train one to decide whether two images are the same, and then classify examples of new classes by comparing them with reference examples, without training the network again.

For a network to be called “Siamese”, not only must the architecture of the subnetworks be identical, but the weights must be shared among them as well. The main idea behind Siamese networks is that they learn useful data descriptors that can then be used to compare the inputs of the respective subnetworks. The inputs can be anything from numerical data or image data to sequential data such as sentences or time signals.
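To make the weight sharing concrete, here is a minimal sketch in Python with NumPy. The one-layer `encode` function, its random weights, and the input vectors are all hypothetical stand-ins for a real trained subnetwork. The point is only that both inputs pass through the same parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy encoder: one dense layer with tanh,
# mapping 4 input features to a 3-dimensional feature vector.
W = rng.normal(size=(4, 3))
b = np.zeros(3)

def encode(x):
    """Map an input to its feature vector using the single shared weight set."""
    return np.tanh(x @ W + b)

def siamese_distance(x1, x2):
    """Pass both inputs through the same encoder and compare the feature vectors."""
    return float(np.linalg.norm(encode(x1) - encode(x2)))

a = np.array([1.0, 0.5, -0.2, 0.3])
c = np.array([-1.0, 2.0, 0.7, -0.4])

print(siamese_distance(a, a))  # 0.0, identical inputs give identical feature vectors
print(siamese_distance(a, c))  # some positive distance
```

Because there is only one `W` and one `b`, any update to them changes both "twins" at once, which is what mirrored parameter updates amount to in code.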

Key features of Siamese Network

A Siamese network takes two different inputs and passes them through two identical subnetworks with the same architecture, parameters, and weights.
The two subnetworks are mirror images of each other, just like Siamese twins. Hence, any change to one subnetwork's architecture, parameters, or weights is also applied to the other subnetwork.
Each subnetwork outputs an encoding, and the two encodings are used to calculate the difference between the two inputs.
The Siamese network’s objective is to classify whether the two inputs are the same or different using a similarity score. The network can be trained with binary cross-entropy, contrastive loss, or triplet loss, which are common loss functions for the general distance metric learning approach.
A Siamese network is a one-shot classifier that uses discriminative features to generalize to unfamiliar categories from an unknown distribution.
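The three loss functions named above can be sketched in plain Python as follows. This is an illustration under assumptions, not a reference implementation: the margin of 1.0 is a hypothetical default, and the label convention (1 for a similar pair, 0 for a dissimilar pair) is one of several in use.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def contrastive_loss(d, y, margin=1.0):
    """Pull similar pairs (y=1) together; push dissimilar pairs (y=0) past the margin."""
    return y * d ** 2 + (1 - y) * max(0.0, margin - d) ** 2

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Ask the anchor to be closer to the positive than to the negative by a margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def binary_cross_entropy(score, y, eps=1e-12):
    """Loss for a sigmoid similarity score in (0, 1) against a 0/1 pair label."""
    score = min(max(score, eps), 1 - eps)  # avoid log(0)
    return -(y * math.log(score) + (1 - y) * math.log(1 - score))

print(contrastive_loss(0.5, 0))                 # 0.25, dissimilar pair inside the margin
print(triplet_loss([0, 0], [0, 0], [3, 4]))     # 0.0, negative is already far enough away
```

Contrastive loss and binary cross-entropy work on pairs, while triplet loss needs an anchor, a positive, and a negative example at the same time.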

Working of Siamese Neural Network

Training the Siamese Neural Network
Load the dataset containing the different classes.
Create positive and negative data pairs. A positive pair is one where both inputs are similar (for example, from the same class), and a negative pair is one where the two inputs are dissimilar.
Build the convolutional neural network, which outputs the feature encoding using a fully connected layer. This is the sister CNN through which we will pass the two inputs. The sister CNNs should have the same architecture, hyperparameters, and weights.
Build the differencing layer to calculate the Euclidean distance between the encodings output by the two sister CNNs.
The final layer is a fully connected layer with a single node that uses the sigmoid activation function to output the similarity score.
Compile the model using binary cross-entropy as the loss.
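The pair-creation step above can be sketched as follows. The helper `make_pairs`, the sample names, and the choice to balance positives and negatives are assumptions made for illustration; the label 1 marks a positive pair and 0 a negative pair.

```python
import itertools
import random

def make_pairs(samples, labels, seed=0):
    """Build (pair, label) tuples: 1 for same-class pairs, 0 for different-class pairs."""
    rng = random.Random(seed)
    positives, negatives = [], []
    for i, j in itertools.combinations(range(len(samples)), 2):
        pair = (samples[i], samples[j])
        if labels[i] == labels[j]:
            positives.append((pair, 1))
        else:
            negatives.append((pair, 0))
    # Assumption: keep the two kinds of pairs balanced by subsampling the larger set.
    k = min(len(positives), len(negatives))
    pairs = rng.sample(positives, k) + rng.sample(negatives, k)
    rng.shuffle(pairs)
    return pairs

# Hypothetical stand-ins for images and their class labels.
samples = ["cat_1", "cat_2", "dog_1", "dog_2", "bird_1", "bird_2"]
labels = ["cat", "cat", "dog", "dog", "bird", "bird"]

pairs = make_pairs(samples, labels)
print(len(pairs))  # 6: three positive pairs and three negative pairs
```

In a real pipeline the strings would be image arrays, and each pair would be fed to the two inputs of the model together with its 0/1 label.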
Testing the Siamese Neural Network
Send two inputs to the trained model to output the similarity score.
As the last layer uses the sigmoid activation function, it outputs a value in the range 0 to 1. A similarity score close to 1 implies that the two inputs are similar. A similarity score close to 0 implies that the two inputs are dissimilar. A reasonable default is to use a similarity cutoff threshold of 0.5.
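As a rough sketch of this scoring path, the snippet below turns the Euclidean distance between two encodings into a sigmoid score and applies the 0.5 cutoff. The weight `w` and bias `b` of the single output node are hypothetical values chosen by hand; in a trained model they would be learned. The weight is negative so that a small distance maps to a score near 1.

```python
import math

def similarity_score(emb1, emb2, w=-2.0, b=2.0):
    """Euclidean distance followed by a single sigmoid node (w and b are hypothetical)."""
    d = math.dist(emb1, emb2)
    return 1.0 / (1.0 + math.exp(-(w * d + b)))

def is_same(emb1, emb2, threshold=0.5):
    """Apply the cutoff threshold to the similarity score."""
    return similarity_score(emb1, emb2) >= threshold

# Hypothetical encodings produced by the sister networks.
print(is_same([0.1, 0.2], [0.1, 0.2]))  # True, distance 0 gives a score of about 0.88
print(is_same([0.0, 0.0], [3.0, 4.0]))  # False, distance 5 gives a score near 0
```

The threshold is a parameter here so that it can be moved away from 0.5 if false matches and missed matches have different costs.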

Pros and Cons of Siamese Networks

The main advantages of Siamese networks are:

More robust to class imbalance: With one-shot learning, a few images per class are sufficient for a Siamese network to recognize those classes in the future.
Works well in an ensemble with a classifier: Because its learning mechanism is somewhat different from classification, simply averaging its output with a classifier's can do much better than averaging two correlated supervised models.
Learning from semantic similarity: A Siamese network focuses on learning embeddings that place the same classes or concepts close together. Hence, it can learn semantic similarity.

The downsides of Siamese networks can be:

Needs more training time than normal networks: Since Siamese networks learn from pairs of examples, and the number of possible pairs grows quadratically with the dataset size, training is slower than the normal classification type of learning (pointwise learning).
Doesn’t output probabilities: Since training involves pairwise learning, the network won’t output the probability of each class for a prediction, but rather the distance from each class.
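To get a feel for the quadratic growth behind the training-time downside, this small sketch counts the unordered pairs that can be formed from n samples, which is n(n-1)/2. The dataset sizes are arbitrary examples.

```python
import math

def num_pairs(n):
    """Number of distinct unordered pairs among n samples: n * (n - 1) / 2."""
    return math.comb(n, 2)

for n in (10, 100, 1000):
    print(n, num_pairs(n))
# 10 45
# 100 4950
# 1000 499500
```

Going from 100 to 1000 samples multiplies the data by 10 but the number of possible pairs by roughly 100, which is why pairs are usually subsampled rather than enumerated in full.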

Applications Of Siamese Networks

Siamese networks have wide-ranging applications. Here are a few of them:

One-shot learning. In this learning scenario, a new training dataset is presented to the trained (classification) network, with only one sample per class. Afterward, the classification performance on this new dataset is tested on a separate testing dataset. As siamese networks first learn discriminative features for a large specific dataset, they can be used to generalize this knowledge to entirely new classes and distributions as well.
Pedestrian tracking for video surveillance (Leal-Taixé, Laura, Cristian Canton-Ferrer, and Konrad Schindler. “Learning by tracking: Siamese CNN for robust target association.” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. 2016.). In this work, a siamese CNN network is combined with the size and position features of image patches to track multiple persons in the field-of-view of the camera by detecting their position in each video frame, learning the associations between multiple frames, and computing the trajectories.
Cosegmentation (Mukherjee, Prerana, Brejesh Lall, and Snehith Lattupally. “Object segmentation using deep Siamese network.” arXiv preprint arXiv:1803.02555 (2018).).
Matching resumes to jobs (Maheshwary, Saket, and Hemant Misra. “Matching Resumes to Jobs via Deep Siamese Network.” Companion of The Web Conference 2018 on The Web Conference 2018. International World Wide Web Conferences Steering Committee, 2018.). In this exotic application, the network tries to find matching job postings for applicants. To do this, a trained Siamese CNN extracts deep contextual information from both the postings and the resumes and computes their semantic similarity. The hypothesis is that matching resume-posting pairs will rank higher on the similarity scale than non-matching ones.
Face recognition system for the employees in an organization.
Drug discovery, where data is very scarce and one-shot learning is applied.
Offline signature verification systems built with one-shot learning, which are very useful for banks, government bodies, and private institutions.
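Several of the applications above rely on the same one-shot pattern: keep one reference encoding per class and assign a new input to the nearest one. Here is a minimal sketch of that pattern. The names and two-dimensional encodings are made up for illustration; in practice they would come from the trained subnetwork.

```python
import math

def one_shot_classify(query_emb, support):
    """Return the class whose single reference encoding is closest to the query."""
    distances = {name: math.dist(query_emb, emb) for name, emb in support.items()}
    return min(distances, key=distances.get), distances

# Hypothetical support set: one encoding per known class.
support = {"alice": [0.9, 0.1], "bob": [0.1, 0.8]}

label, distances = one_shot_classify([0.8, 0.2], support)
print(label)  # alice

# Adding a new class only needs one more reference encoding, no retraining.
support["carol"] = [-0.7, -0.6]
new_label, _ = one_shot_classify([-0.6, -0.5], support)
print(new_label)  # carol
```

Note that the result is a set of distances rather than class probabilities, which matches the downside listed earlier.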

Conclusion

The Siamese network, inspired by Siamese twins, is a one-shot classifier that differentiates between similar and dissimilar images. It can be applied even when you don’t know all of your classes at training time and have limited training data. It is based on a metric learning approach that learns the relative distance between its inputs using binary cross-entropy, contrastive loss, or triplet loss.