IQ insight: Using data science to measure public opinion
Author / contact: Damla Arifoglu
How does sentiment analysis work?
Overview
Social media is on the rise. Just think about how many times you have checked your social media today. In 2020, 3.6 billion people worldwide used social media, and that number is predicted to rise to almost 4.41 billion by 2025. Around 6,000 tweets are sent every second, amounting to roughly 500 million tweets per day. Social media is not only a place to connect with friends, but also an ocean of information hidden in unstructured data. Identifying trends and patterns in social media, and exploiting that data using analytics, has been of interest to many researchers and data scientists all over the world. Predicting stock prices, mining political opinion and measuring customer satisfaction are a few examples of use cases where data analytics has been applied to Twitter data. Sentiment can also be useful in measuring risk.
Sentiment Analysis
The use cases above work by measuring public sentiment using a piece of software called a sentiment analyser. Sentiment can be defined as an attitude, opinion, idea or feeling about something, such as an event, topic or person. Sentiment analysis is the process of identifying the opinions in a piece of text and calculating the strength of the feeling by using Natural Language Processing (NLP) and Machine Learning (ML) techniques. A sentiment analyser assigns a score to a given text, usually a value between -1 and 1, indicating whether the writer’s attitude or ‘sentiment’ towards a particular topic is negative, neutral or positive.
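To make the idea of a score between -1 and 1 concrete, here is a minimal Python sketch that turns such a score into a negative, neutral or positive label. The function name label_sentiment and the width of the neutral band (0.05) are assumptions chosen for illustration; a real project would tune the cut-off to its own data.

```python
def label_sentiment(score: float, neutral_band: float = 0.05) -> str:
    """Map a sentiment score in [-1, 1] to a coarse label.

    neutral_band is an illustrative assumption: scores whose absolute
    value is below it are treated as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie between -1 and 1")
    if score >= neutral_band:
        return "positive"
    if score <= -neutral_band:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for s in (0.84, 0.0, -0.6):
        print(s, label_sentiment(s))
```

Widening the neutral band makes the analyser more cautious about calling a text positive or negative; narrowing it does the opposite.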
Examples
Let’s see how a sentiment analyser works by looking at the example below, which expresses a positive sentiment towards an insurance company.
Our flight from London to New York got cancelled. Did this bum us out? No way, we have insurance. They covered taxis, a great hotel, toiletries, gourmet meals; turned lemons to lemonade.
The off-the-shelf sentiment analyser tools Vader and TextBlob assign sentiment scores of 0.84 and 0.47 to this tweet, respectively, both indicating that it is a positive tweet. Let’s see why the assigned scores differ by exploring how sentiment analysers do their job.
Vader works by using a dictionary of words, in which each word has a score between -4 and 4 indicating how positive or negative it is. In the example above, the word “cancelled” is rated -1, while the word “great” is rated 3.1 in the Vader dictionary. The scores of the words in a text are then combined (summed and normalised) to assign a sentiment score between -1 and 1 to the whole text. However, analysers like Vader suffer from drawbacks. (i) Human experts construct the dictionary by rating each word, which is a tedious and time-consuming task. (ii) Although Vader has 7,520 words in its dictionary, some words, especially domain-specific ones, might be missing. Thus, Vader might fail to assign the correct label to some texts.
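The dictionary approach can be sketched in a few lines of Python. This is a simplified illustration, not Vader itself: the toy dictionary holds only the two ratings quoted above, and the names TOY_LEXICON and lexicon_score are hypothetical. Rather than a plain average, the sketch sums the word ratings and squashes the total into the range -1 to 1 with x / sqrt(x² + 15), the normalisation used in the open-source Vader code. The real tool also applies rules for negation, punctuation and intensifiers, which are left out here.

```python
import math
import re

# Toy dictionary for illustration only; it contains just the two
# ratings quoted in the text. The real Vader dictionary is far larger.
TOY_LEXICON = {"cancelled": -1.0, "great": 3.1}


def lexicon_score(text: str, lexicon=TOY_LEXICON, alpha: float = 15.0) -> float:
    """Sum the ratings of known words and normalise the total to (-1, 1)."""
    words = re.findall(r"[a-z']+", text.lower())
    # Words missing from the dictionary contribute nothing to the total.
    total = sum(lexicon.get(word, 0.0) for word in words)
    return total / math.sqrt(total * total + alpha)


if __name__ == "__main__":
    tweet = (
        "Our flight from London to New York got cancelled. "
        "They covered taxis, a great hotel, toiletries, gourmet meals."
    )
    print(round(lexicon_score(tweet), 2))
```

Because words missing from the dictionary score zero, a text made up entirely of unknown, domain-specific words comes out as exactly neutral, which illustrates drawback (ii).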
TextBlob, on the other hand, works in a different way. Relying on a training dataset, it first teaches a classifier what negative and positive tweets look like. A training dataset is a collection of tweets and their corresponding labels, such as negative or positive. For example, the classifier learns that when words such as “great”, “happy” and “like” occur in a piece of text, it is an indication of positive sentiment, whilst when words such as “hate”, “dislike” and “upset” occur, it is an expression of negative sentiment. The challenge with analysers like TextBlob is that they need a training dataset to teach a classifier what negative and positive tweets look like, and a training set might fail to cover all the words that indicate a given sentiment.
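The training-based approach can be illustrated with a small classifier written from scratch. The sketch below uses Naive Bayes with add-one smoothing, which is one common choice for text classification; it is not presented as TextBlob's actual implementation. The four training sentences, their labels and the function names are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count how often each word appears under each label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest Naive Bayes log-probability."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_logp = None, -math.inf
    for label in label_counts:
        logp = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                logp += math.log((word_counts[label][word] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Made-up training data for illustration only.
TRAINING = [
    ("great service, really happy", "positive"),
    ("I like this insurer", "positive"),
    ("I hate waiting, very upset", "negative"),
    ("dislike the slow claims process", "negative"),
]
model = train(TRAINING)

if __name__ == "__main__":
    print(classify("happy with the great hotel", model))
```

A sentence built only from words absent from the training data gives the classifier nothing to go on, which is the coverage problem described above.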
Next blog
In this blog we have explored sentiment analysis and the different approaches of two off-the-shelf sentiment analysers, Vader and TextBlob. While these analysers can be good as a quick solution for a sentiment analysis use case, they sometimes fail to produce reliable results in certain scenarios. In the next sentiment analysis blog we will look at how Kennedys have exposed some of these reliability issues and share our thoughts on how to overcome them.