Detecting Landmarks with Google Cloud Vision API

Whilst doing some research for a product, I wanted to see whether we could use a service for image recognition. I’d heard of ‘Machine learning as a service’ (MLaaS) but hadn’t actually experimented with any of these services.

Google Cloud or Clarifai

I found two services for image recognition: Google Cloud Vision and Clarifai. Both services have a set of pre-trained models which you can use via their APIs.

Google lists the following models (descriptions quoted from Google):

Label: Detect broad sets of categories within an image, ranging from modes of transportation to animals.
Explicit Content: Detect explicit content like adult content or violent content within an image.
Logo: Detect popular product logos within an image.
Landmark: Detect popular natural and man-made structures within an image.
Optical Character Recognition: Detect and extract text within an image, with support for a broad range of languages, along with support for automatic language identification.
Face: Detect multiple faces within an image, along with the associated key facial attributes like emotional state or wearing headwear.
Image Attributes: Detect general attributes of the image, such as dominant color.
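Each of these models corresponds to a feature type in the Vision API's REST `images:annotate` method (`POST https://vision.googleapis.com/v1/images:annotate`). The sketch below builds a request body for a local image. The helper name `buildAnnotateRequest` is hypothetical, and the mapping from Google's marketing names to feature types (for example, Explicit Content to `SAFE_SEARCH_DETECTION`) is my own assumption, so it's worth checking against the current API reference. Authentication is left out, since it depends on whether you use an API key or OAuth credentials.

```javascript
// Hypothetical mapping from the model names above to Vision API feature types
// (an assumption based on the public docs, not an official table).
const FEATURE_TYPES = {
  'Label': 'LABEL_DETECTION',
  'Explicit Content': 'SAFE_SEARCH_DETECTION',
  'Logo': 'LOGO_DETECTION',
  'Landmark': 'LANDMARK_DETECTION',
  'Optical Character Recognition': 'TEXT_DETECTION',
  'Face': 'FACE_DETECTION',
  'Image Attributes': 'IMAGE_PROPERTIES',
};

// Hypothetical helper: builds the JSON body for images:annotate.
function buildAnnotateRequest(imageBytes, modelNames, maxResults = 5) {
  const features = modelNames.map((name) => {
    const type = FEATURE_TYPES[name];
    if (!type) {
      throw new Error(`Unknown model name: ${name}`);
    }
    return { type, maxResults };
  });

  return {
    requests: [
      {
        // The API expects the raw image bytes as a base64 string
        image: { content: Buffer.from(imageBytes).toString('base64') },
        features,
      },
    ],
  };
}

// Usage: request landmark detection only, for a tiny fake "image"
const body = buildAnnotateRequest(Buffer.from('not really a PNG'), ['Landmark'], 3);
console.log(JSON.stringify(body));
```

Several feature types can go in one request, so a single call could ask for landmarks and labels together.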

Clarifai lists the following Public Models (descriptions quoted from Clarifai):

General: Contains a wide range of tags across many different topics. In most cases, tags returned from the general model will sufficiently recognize what’s inside your image.
Food: Analyzes images and videos and returns probability scores on the likelihood that the image contains a recognized food ingredient and dish.
Travel: Analyzes images and returns probability scores on the likelihood that the image contains a recognized travel related category.
NSFW: Analyzes images and videos and returns probability scores on the likelihood that the image contains pornography.
Weddings: Knows all about weddings including brides, grooms, dresses, flowers, etc.
Color: Retrieves the dominant colors present in your images.
Plus the following listed in ‘Alpha’: Face Detection, Apparel & Celebrity.

Clarifai also lets you train your own model, which sounds interesting. I couldn’t see an equivalent feature in Google Cloud Vision.

My specific use case was detecting landmarks in images. As Google Cloud Vision has a dedicated landmark model, I decided to try it first.

Set up on Google Cloud

Google have a demo on their product page where you can drag and drop an image to test the service quickly. However, I wanted to try out the API itself.

As this was the first time I’d used the Google Cloud Platform for anything, I had to set up a few things. To use the API from a local application, you first have to install the Google Cloud SDK for your OS and then a client library of your choice (C#, Go, Java, Node.js, PHP, Python or Ruby).

Once that was set up, I used the Node.js sample code from Google’s documentation for my landmark detection test.

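// Imports the Google Cloud client library
// (this sample uses the early 0.x API of @google-cloud/vision;
// later releases use `new ImageAnnotatorClient()` instead of `Vision()`)
const Vision = require('@google-cloud/vision');

// Instantiates a client
const vision = Vision();

// The path to the local image file; replace with your own image
const fileName = '/path/to/image.png';

// Performs landmark detection on the local file
vision.detectLandmarks(fileName)
  .then((results) => {
    const landmarks = results[0];

    console.log('Landmarks:');
    landmarks.forEach((landmark) => console.log(landmark));
  })
  .catch((err) => {
    // Reports auth, quota or file errors instead of an unhandled rejection
    console.error('ERROR:', err);
  });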
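Later versions of the `@google-cloud/vision` library return the full annotation response instead (for example, `const [result] = await client.landmarkDetection(fileName)` on an `ImageAnnotatorClient`). Each entry in `result.landmarkAnnotations` carries a `description`, a confidence `score` and `locations` with a latitude and longitude. The sketch below condenses such a response into a short summary. The helper name `summariseLandmarks` is hypothetical, and the sample response is made up to show the shape, not real API output.

```javascript
// Hypothetical helper: condenses a landmark detection response into
// { name, score, latitude, longitude } records, best match first.
// Assumes the AnnotateImageResponse shape (landmarkAnnotations array).
function summariseLandmarks(response) {
  const annotations = response?.landmarkAnnotations ?? [];

  return annotations
    .map((annotation) => {
      // A landmark may list several locations; use the first one if present
      const latLng = annotation.locations?.[0]?.latLng ?? {};
      return {
        name: annotation.description,
        score: annotation.score ?? 0,
        latitude: latLng.latitude,
        longitude: latLng.longitude,
      };
    })
    .sort((a, b) => b.score - a.score);
}

// Made-up response for illustration only (not real API output)
const fakeResponse = {
  landmarkAnnotations: [
    { description: 'Example Tower', score: 0.42, locations: [] },
    {
      description: 'Example Gate',
      score: 0.91,
      locations: [{ latLng: { latitude: 52.5, longitude: 13.4 } }],
    },
  ],
};

console.log(summariseLandmarks(fakeResponse));
// A response without landmarkAnnotations yields an empty array
console.log(summariseLandmarks({}));
```

Sorting by score makes it easy to keep only the best match, or to drop matches below a confidence threshold you choose.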

A Quick Test

For the test itself, I took four screengrabs from a corporate video about Berlin. I wanted to use images which Google hadn’t already seen, but which still contained Berlin landmarks. You can see the images below, along with whether Google correctly identified each landmark.

From left to right (and top to bottom), the results were: “Reichstag Building”, “Checkpoint Charlie”, “No result” and “Brandenburg Gate”.
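To repeat a test like this over several screengrabs, a small runner could tabulate the top match per image and print "No result" when no landmarks come back, as happened with the third image. In the sketch below, `detect` is an assumption: in practice it would wrap a call such as `client.landmarkDetection(file)` and return the `landmarkAnnotations` array. Here it is a fake so the example runs offline, and the file names are hypothetical.

```javascript
// Hypothetical batch runner: detect(file) must resolve to an array of
// annotations shaped like { description, score } (order not assumed).
async function tabulateLandmarks(files, detect) {
  const rows = [];
  for (const file of files) {
    // Runs one request at a time to keep request volume low
    const annotations = (await detect(file)) ?? [];
    // Keep the highest-scoring annotation, or null if there are none
    const best = annotations.reduce(
      (top, a) => ((top === null || (a.score ?? 0) > (top.score ?? 0)) ? a : top),
      null,
    );
    rows.push({ file, result: best ? best.description : 'No result' });
  }
  return rows;
}

// Fake detector for illustration only; returns canned, made-up annotations
const fakeDetect = async (file) =>
  file === 'grab3.png' ? [] : [{ description: `Landmark in ${file}`, score: 0.8 }];

tabulateLandmarks(['grab1.png', 'grab2.png', 'grab3.png', 'grab4.png'], fakeDetect)
  .then((rows) => rows.forEach((row) => console.log(`${row.file}: ${row.result}`)));
```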

Google Cloud Vision API Test Results

I was very impressed by this (admittedly brief) test. When I get the time, I’m going to take a look at Clarifai.