Automatically Tag and Caption Images in SharePoint

I was recently asked whether SharePoint could meet a set of requirements that included auto-tagging images uploaded to SharePoint. The requirements came from a team that needs to manage many pictures for hundreds of locations. By leveraging Azure Cognitive Services, I was able to slap a solution together in a fairly short amount of time.

Setting up the Computer Vision API in Azure

Start by logging in to your Azure portal and searching for Cognitive Services.

cognitive services search.PNG

On the Cognitive Services blade, click the Add button, which opens the “AI + Machine Learning” blade, where you can search for “Computer Vision”.

Adding Computer Vision

 

After you select Computer Vision, you can provide your service name, subscription, location, etc., as shown below.

Computer Vision Create 2

 

Your new service will have some basic information for getting started: the keys that your service will use, links to additional information and tutorials, etc.

Computer Vision Quick Start

That’s it for configuring the service. You’ll need to copy a key, which is required for the API calls. To do so, click the Keys link under section 1, “Grab your keys”. You’ll also want to copy the endpoint in section 2. We’ll need both later when setting up the service in Flow.

Computer Vision Keys
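
To make it concrete what the key and endpoint are for, here is a minimal Python sketch of the kind of request that ends up being sent to the service. It is not part of the Flow solution, and several details are assumptions: the `v3.2` API version in the path (your resource may expose a different one), the `build_analyze_request` helper name, and the example host. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_analyze_request(endpoint: str, key: str, image_bytes: bytes) -> Request:
    """Build (but do not send) a POST request to the Analyze Image operation.

    Assumption: the resource exposes the v3.2 REST path; adjust if yours differs.
    """
    # Ask for the caption ("Description") and the tag list ("Tags").
    query = urlencode({"visualFeatures": "Description,Tags"}, safe=",")
    url = f"{endpoint.rstrip('/')}/vision/v3.2/analyze?{query}"
    return Request(
        url,
        data=image_bytes,
        method="POST",
        headers={
            # The key copied from the Keys page authenticates the call.
            "Ocp-Apim-Subscription-Key": key,
            # Raw image bytes are sent as the request body.
            "Content-Type": "application/octet-stream",
        },
    )


if __name__ == "__main__":
    # Hypothetical endpoint and key, for illustration only.
    req = build_analyze_request(
        "https://example.cognitiveservices.azure.com/", "YOUR-KEY", b"image-bytes"
    )
    print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` with a real key, endpoint, and image would return the JSON analysis. In this post, Flow’s Analyze Image action takes care of that call for us.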

Set up your Library

Now that our service is set up, we will need a library. We’re going to use a simple document library with two text fields. The first is a single line of text for the caption; I’m naming this field “Description.” The second is a multi-line text field to store our tags; I named it “Image Tags.” Ideally, we’d use a managed metadata field or Enterprise Keywords, but Microsoft Flow doesn’t currently support creating new terms, and for this solution it isn’t feasible to know the tags ahead of time in order to pre-populate the terms.

doclib

Create your Flow

Let’s take a look at the Flow at a high level. We start by listening for files being created in or uploaded to our library. Get file metadata will get your Item Id. Next, Get File Properties will get you the columns that we’re going to populate. Analyze Image will call our Computer Vision API. Compose will get our tags from the previous action, and the last step will populate our item with the caption and tags provided by the Computer Vision API.

flow-overview

Now, let’s look at the important parts.  Before we can use the Analyze Image action, we have to set it up.

analyze image

If it’s your first time using the service, you’ll need to do some basic setup. For the Connection Name, I entered the name of my Azure Resource Group for simplicity. The key you copied earlier goes in the Account Key box, and the endpoint you copied goes in the Site URL box.

Computer Vision API Connection

When you hit the Create button, it’ll switch to the following screen.  Here you’ll be able to select a source.  We’ll choose Image Content and that will give you the Image Content field.  For the Image Content, we’ll provide “File Content” which comes from the very first step in the Flow.

Analyze Image 2

The next step is the Compose step which we’ll gloss over because all it’s doing is taking the “Tag Names” output from the Analyze Image step.
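
For readers curious what “Tag Names” corresponds to, here is a rough Python sketch that pulls tag names out of an analysis result. It assumes the JSON has a top-level `tags` array of objects with `name` and `confidence` fields, which is how the REST API returns tags. The `tag_names` helper, the optional confidence cut-off, and the sample values are made up for illustration; the Flow in this post does no filtering.

```python
def tag_names(analysis: dict, min_confidence: float = 0.0) -> list[str]:
    """Return tag names from an analysis result, optionally dropping weak ones."""
    return [
        tag["name"]
        for tag in analysis.get("tags", [])
        if tag.get("confidence", 0.0) >= min_confidence
    ]


# Made-up sample shaped like an Analyze Image response.
sample = {
    "tags": [
        {"name": "person", "confidence": 0.99},
        {"name": "skateboard", "confidence": 0.95},
        {"name": "outdoor", "confidence": 0.40},
    ]
}

if __name__ == "__main__":
    print(tag_names(sample))       # all three tag names
    print(tag_names(sample, 0.5))  # only the tags at or above 0.5
```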

The final step is to update the file properties. We’re going to feed the “Captions Caption Text” to our Description field, and we’ll give the output from the Compose step to the Image Tags field.

Computer Vision Update Properties
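
The mapping in this last step can be sketched in Python as well. This is only an illustration under assumptions: the analysis JSON is assumed to carry captions under `description.captions`, the `build_field_values` helper is hypothetical, and the dictionary keys are the display names of the two columns (SharePoint’s internal field names may differ). Joining the tags with commas is also a choice made for the example.

```python
def build_field_values(analysis: dict, tags: list[str]) -> dict[str, str]:
    """Map an analysis result onto the two library columns used in this post."""
    captions = analysis.get("description", {}).get("captions", [])
    # Use the highest-confidence caption, or an empty string if none came back.
    caption = (
        max(captions, key=lambda c: c.get("confidence", 0.0))["text"]
        if captions
        else ""
    )
    return {"Description": caption, "Image Tags": ", ".join(tags)}


# Made-up sample; the confidence value is illustrative.
sample_analysis = {
    "description": {
        "captions": [{"text": "a man riding a skateboard", "confidence": 0.9}]
    }
}

if __name__ == "__main__":
    print(build_field_values(sample_analysis, ["man", "skateboard"]))
```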

Once the Flow is saved, and a few images have been uploaded, you’ll have a series of auto-tagged images.

tagged lib

Conclusion

The screenshot above shows an image of a skateboarder. The Computer Vision API generated the caption “a man riding a skateboard” and a series of tags in the Image Tags field. The caption and tags are searchable; however, they are not filterable.