Introducing the Visual Similarity Engine, a New SentryPage Feature
Why We Need This Type of Engine
There is one thing we need in order to convert more website visitors: credibility.
Credibility shows customers that we are safe and trustworthy. Without it, it is nearly impossible to generate more leads, sell more products, or attract more visitors, and almost every company struggles with website credibility. Most visitors leave a web page in 15 seconds or less. Web pages with a compelling value proposition hold visitors’ attention longer.
Scientists from Microsoft Research analyzed page-visit duration for 205,873 web pages (with more than 10,000 visits to each page). They found that visitor time-on-site follows a Weibull distribution.
The Weibull distribution is widely used in reliability engineering to analyze and predict the time-to-failure of components. Let's say we replace a spare part in a piece of equipment. A Weibull analysis predicts when we will have to replace that specific part again. On its own, that does not sound all that helpful.
Replace “component failure” with “visitors leaving web pages,” however, and it becomes very helpful.
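To make the analogy concrete, here is a minimal sketch of the Weibull survival and hazard functions applied to page visits. The shape and scale values are hypothetical, picked only for illustration; they are not figures from the Microsoft Research study.

```python
import math

def weibull_survival(t, shape, scale):
    """Probability that a visitor is still on the page after t seconds."""
    return math.exp(-((t / scale) ** shape))

def weibull_hazard(t, shape, scale):
    """Instantaneous rate at which visitors leave at time t (t > 0)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

# Hypothetical parameters (assumption, not measured data).
SHAPE, SCALE = 0.7, 30.0

for seconds in (5, 15, 60):
    still_here = weibull_survival(seconds, SHAPE, SCALE)
    leave_rate = weibull_hazard(seconds, SHAPE, SCALE)
    print(f"t={seconds:>2}s  still on page: {still_here:.3f}  leave rate: {leave_rate:.4f}")
```

With a shape parameter below 1, the leave rate falls over time: under that assumption, the first seconds of a visit are when a visitor is most likely to go.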
One reason your website can lose credibility is a drastic change that you don’t know about, such as a blank page, a defacement, or an internal error.
To address these problems, we are introducing the newest SentryPage feature, the Visual Similarity Engine. This feature relies on machine learning to inform you when drastic changes occur on your website.
But How Does It Work?
Before we can create an alert and send it to you when drastic changes are detected on your website, we need a tool that detects changes accurately. Comparing raw images without any processing in between, however, takes a lot of time. Hence, we reduce the noise in each image so that processing is fast yet still accurate enough to assess whether a change is dangerous.
We use a convolutional autoencoder as our algorithm, that is, an autoencoder with convolution layers. In simple terms, autoencoders help reduce the noise in data. By compressing input data, encoding it, and then reconstructing it as an output, autoencoders let you reduce dimensionality and focus only on areas of real value. A convolutional autoencoder is a more sophisticated variant that is especially suited to reducing noise in image data. In layman's terms, it is an autoencoder with extra steps that minimize the loss of information when we compress the data. Each layer consists of a convolutional layer, a ReLU step, and a max pooling layer (for the detailed technical aspects, refer to LINK).
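As a rough illustration of the three steps named above, the sketch below runs one convolution, one ReLU step, and one max pooling step over a tiny grayscale image in plain Python. It is not the SentryPage model: the kernel is hand-picked rather than learned, the function names are hypothetical, and a real convolutional autoencoder would also have a decoder and be trained with a deep learning library.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def relu(feature_map):
    """Replace negative activations with zero."""
    return [[max(0, value) for value in row] for row in feature_map]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    height = len(feature_map) // size * size
    width = len(feature_map[0]) // size * size
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, width, size)
        ]
        for r in range(0, height, size)
    ]

def encode(image, kernel):
    """One encoder stage: convolution, then ReLU, then max pooling."""
    return max_pool(relu(conv2d(image, kernel)))

# A 5x5 image with a vertical edge, and a hand-picked edge-detecting kernel.
image = [[0, 0, 1, 1, 1] for _ in range(5)]
kernel = [[-1, 1], [-1, 1]]
print(encode(image, kernel))
```

The 5x5 input shrinks to a 2x2 feature map that still records where the edge is, which is the sense in which the encoder compresses an image while keeping the areas of real value.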
To measure the similarity between images, we provide three methods: Mean Squared Error (MSE), Structural Similarity (SSIM), and Cosine Similarity. If the score from those methods crosses a threshold, the engine follows up by creating and delivering an alert to the user.
For example, consider the MTI (Ministry of Trade and Industry, Singapore) website, one of the Singaporean government agency websites, where our system detected drastic changes, marked with a green box in the picture. Unfortunately, the changes came from other parties and would greatly affect the credibility of the organization. Our system creates an alert so that the user can notice and fix the problem as soon as possible.
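The three scores can be sketched in a few lines of plain Python. This is a simplified illustration rather than the engine's actual code: inputs are assumed to be equal-length lists of grayscale values in the 0 to 255 range, the SSIM here is computed once over the whole image (the standard formulation averages it over local windows), and the threshold values and function names are hypothetical.

```python
import math

def mse(a, b):
    """Mean squared error: 0 for identical inputs, larger means more different."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def cosine_similarity(a, b):
    """Cosine of the angle between the two vectors: 1 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # undefined for an all-zero vector; treated as dissimilar
    return dot / (norm_a * norm_b)

def global_ssim(a, b, dynamic_range=255):
    """Single-window SSIM over the whole image (simplified variant)."""
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) ** 2 for x in a) / n
    var_b = sum((y - mu_b) ** 2 for y in b) / n
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / n
    c1 = (0.01 * dynamic_range) ** 2  # stabilizing constants from the SSIM definition
    c2 = (0.03 * dynamic_range) ** 2
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    )

def is_drastic_change(baseline, current,
                      mse_max=500.0, ssim_min=0.8, cosine_min=0.9):
    """Flag a change when any score crosses its (hypothetical) threshold."""
    return (
        mse(baseline, current) > mse_max
        or global_ssim(baseline, current) < ssim_min
        or cosine_similarity(baseline, current) < cosine_min
    )

baseline = [10, 200, 30, 220, 15, 240]   # toy "known good" page
blank_page = [255] * 6                   # toy all-white page
print(is_drastic_change(baseline, baseline))
print(is_drastic_change(baseline, blank_page))
```

Note that the scores point in opposite directions: a high MSE signals a change, while for SSIM and cosine similarity it is a low score that does, so each method needs its own threshold.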