Some years ago, we ran a project with a UK retailer to collect video diaries from their customers detailing their shopping experience and general day-to-day life over the Christmas period. The problem was, not many of them had webcams, so we had to buy and mail out webcams to participants so that they could record their diaries!
These days, it’s easier to collect video data, as most people carry a high-quality video camera with them all the time: on their phone or tablet. We are no longer limited to footage of people sat at their PCs in their spare rooms, but can reach them while they’re out and about living their daily lives. Further, we can collect video feedback from consumers in the many markets where fixed-line infrastructure has been leapfrogged, leaving mobile as the only point of internet access for the majority.
Yes, it’s certainly easier than ever to collect video. Great! But then you find yourself with a few hours of video to process, search, make sense of, analyse and present. On top of that, simply hosting and processing video is a data-hungry exercise. So, what to do? A number of providers offer solutions, and there are several factors to consider when choosing between them. In this blog, I will guide you through a few of them.
Considerations when Choosing a Video Analytics Insight Provider
Video Processing and Storage
Different providers offer different pricing models, which are often based on the number of videos and their length. If you have a video question in a survey, you would expect a large number of very short videos, whereas video focus groups produce a much smaller number of longer videos. Consider this when evaluating pricing models. Also note that some providers whose pricing is geared towards large numbers of very short videos cap video length at around 2 minutes.
The usual step between collecting and analysing your videos is transcribing them, i.e. writing down what the participant is saying, possibly with annotations for tone of voice and gestures. There are two ways of doing this: 1) machine-based transcription and 2) human transcription.
Machine transcription has the advantages of being very fast to turn around and cheaper, and it arguably raises fewer data privacy concerns, as no ‘real person’ sees your participants’ videos. On the downside, despite constant advances in machine learning and natural language processing, it is still not especially accurate, and quality varies with audio levels, participants’ accents, the number of people talking (possibly over each other) and background noise.
Human transcription may be handled either by a dedicated team or via crowd-sourced solutions such as Amazon’s Mechanical Turk, which uses a dispersed, casual workforce to complete tasks that humans accomplish more easily than computers. This produces higher-quality transcriptions than a machine, but takes a little longer to turn around, and it also means that your participants’ data is passed to another third party that you know nothing about. Ensure that all the necessary GDPR requirements are met before proceeding.
Either way, language is also a consideration: not all providers offer services in every language, so check which languages each provider supports.
Searching and Editing Videos
Once your videos are processed and transcribed, you will want to be able to search and filter on key terms. This is where you will really benefit from using a service to manage your video content; without one, you can neither identify patterns and themes across your responses nor easily find clips to extract for presentation purposes. Just check that the tools under consideration allow you to set up your own search terms and filters to find relevant content.
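For readers curious about what sits behind a search box, here is a minimal Python sketch of keyword search over transcripts. It assumes each transcript has been broken into timestamped segments, using a hypothetical `Segment` structure made up for this example (each provider stores transcripts in its own format). The search returns every segment mentioning any of your terms, along with its video and time range, which is what lets you jump straight to the relevant moment or spot a theme recurring across participants.

```python
import re
from dataclasses import dataclass


@dataclass
class Segment:
    """One stretch of a transcript (hypothetical format): what was said and when, in seconds."""
    video_id: str
    start: float
    end: float
    text: str


def search_transcripts(segments, terms):
    """Return the segments whose text contains any search term as a whole word, ignoring case."""
    if not terms:
        return []
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b", re.IGNORECASE
    )
    return [s for s in segments if pattern.search(s.text)]


# Made-up example transcripts from two video diaries.
segments = [
    Segment("diary_01", 0.0, 6.5, "I went to the shop early to avoid the queues."),
    Segment("diary_01", 6.5, 12.0, "The Christmas displays were lovely."),
    Segment("diary_02", 0.0, 5.0, "Queues at the checkout were awful today."),
]

for hit in search_transcripts(segments, ["queues", "checkout"]):
    print(f"{hit.video_id} {hit.start:.1f}-{hit.end:.1f}s: {hit.text}")
```

Matching whole words case-insensitively means a search for ‘queues’ also finds ‘Queues’ at the start of a sentence; commercial tools may go further, with stemming, synonyms and filters on participant attributes.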
Once you have found the content you want, you can select part of the video to extract as a clip. Providers use various interfaces for this, from sliders that set a start and end time before ‘snipping’ the video at those points, to selecting a segment of text in the transcript and having the corresponding video segment clipped to match.
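The second approach, clipping by selecting transcript text, works because the transcript carries timings. As a rough sketch, assuming word-level timestamps in a hypothetical `Word` format, the selected words can be mapped to start and end times, with a little padding so the clip doesn’t cut off mid-word. The `ffmpeg` command is only an illustration of how those times could drive an actual cut with a widely used open-source tool; it is built here but not run, and the file names are made up.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """A single transcribed word with its timing in seconds (hypothetical format)."""
    text: str
    start: float
    end: float


def clip_bounds(words, first, last, padding=0.25):
    """Map a selected run of transcript words (indices first..last, inclusive)
    to start/end times in the video, padded slightly on either side."""
    if not 0 <= first <= last < len(words):
        raise ValueError("selection is outside the transcript")
    start = max(0.0, words[first].start - padding)  # never seek before 0 seconds
    end = words[last].end + padding
    return start, end


def ffmpeg_cut_command(src, dst, start, end):
    """Build (but do not run) an ffmpeg command that cuts src between start and end seconds."""
    return ["ffmpeg", "-i", src, "-ss", f"{start:.2f}", "-to", f"{end:.2f}", dst]


# Made-up word timings for one line of a video diary.
words = [
    Word("the", 10.0, 10.2), Word("queues", 10.2, 10.7), Word("were", 10.7, 10.9),
    Word("awful", 10.9, 11.5), Word("today", 11.5, 12.0),
]

# The researcher highlights "queues were awful" in the transcript (words 1 to 3).
start, end = clip_bounds(words, 1, 3)
print(start, end)
print(" ".join(ffmpeg_cut_command("diary_02.mp4", "clip_queues.mp4", start, end)))
```

Re-encoding like this, rather than copying the streams, is slower but generally lets the cut land where the words are rather than on the nearest keyframe.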
Once you’ve created clips, some providers let you put them together in a show reel. Some of these offer the show reel as a continuous video, while others allow you to add break ‘slides’ to provide context and notes: a fantastic time saver if you need to present regularly and at scale.
As well as searching the text of videos, another feature to look out for is automatic facial emotion and sentiment coding. Emotional coding uses a tool to scan participants’ facial expressions in videos, identify which of the seven claimed basic human emotions (anger, contempt, fear, disgust, happiness, sadness and surprise) is being shown, and tag segments of the video accordingly. Sentiment coding analyses the transcribed text rather than the visuals to calculate an overall sentiment for the video in a similar way. The major providers tend to offer interactive charts of sentiment/emotion, which can be drilled into to examine key terms by sentiment.
It has to be said that whilst emotional coding and sentiment coding give you a starting point for your analysis, the results still need careful checking, as nuanced verbal communication is notoriously difficult for even a human to read, never mind a machine!
Video Analytics Insight Providers: A Brief Summary
So, having looked at the considerations for selecting a video analytics provider, who should you choose? Who are the key players in video analytics for market research? Here is a brief summary of five providers leading the way:
Voxpopme use human transcription for the accuracy it provides. Once videos are transcribed, they can be analysed for sentiment and the results presented in interactive charts. Voxpopme claim they can handle over 80 different languages. They make it easy to compile clips into impressive show reels, with break slides and text captions.
Plotto build their platform primarily around collecting and analysing video in survey responses, though they can handle longer videos on request. They offer their own survey tool with integrated video; alternatively, you can integrate their video widget into your own surveys, then edit, organise and analyse the responses in their platform using facial and sentiment coding. Plotto offer machine-based transcription as standard and human transcription on request.
Living Lens (livinglens.tv)
As well as facial/emotional coding, Living Lens also offer tonal and object recognition. Tonal recognition complements sentiment analysis by indicating how something was said and deriving sentiment from that, whilst object recognition algorithms aim to identify objects in your videos to provide extra context. They also offer an appealing interface for creating and manipulating visualisations based on these analyses.
Big Sofa (bigsofatech.com)
Big Sofa offer video technology developed from a market research agency background. Their platform allows you to upload pre-collected video, and they can build APIs to do this on request. Transcription and translation are done by humans, so a wide range of languages is offered. They have integrated language analytics and are in the process of building object classification, emotion and engagement analytics.
Watch Me Think (watchmethink.com)
Watch Me Think promote themselves as an agency specialising in video. As such, their ‘curation’, transcription, translation and analysis services are human- rather than machine-powered. Rather than you uploading your own video, they have a panel of participants coached to provide the best-quality video responses for clients, in terms of both technical quality and depth of answer. Their video analytics platform offers emotion and meaning analysis, as well as the ability to search, edit and create playlists from videos.