Which words are positive and which are negative? Sentimenti talks about its research

The analysis of emotions in Sentimenti is based on research into the emotions that Polish language users associate with specific words, phrases and longer texts. Each successive research stage lets us keep improving our tool. What research have we conducted so far?

Thousands of words and research participants

The core of the project is collecting evaluations of over 30 thousand words, phrases and texts from over 20 thousand Poles. Due to the scale of the project, we decided to carry out the research using two methods: CAPI (Computer Assisted Personal Interview) and CAWI (Computer Assisted Web Interview). This let us combine the benefits of both approaches: the CAPI research allowed us to maintain strict control and high reliability, while the CAWI research allowed us to reach such a large, representative group of Poles efficiently. Finally, with both CAPI and CAWI data at our disposal, we can directly compare the assessments collected through the two methods and check whether research conducted online is as reliable as research conducted in a laboratory environment.

EmoTool – our research tool

In both studies we used our proprietary EmoTool application, in which respondents indicated the emotions they associate with particular words. The application includes two basic evaluation panels: the emotional dimension panel and the emotional category panel. The emotional dimension panel (left side) is used to determine the emotional overtone in the most general sense, i.e. the direction and strength of emotions. The emotional category panel (right side) is used to determine which basic emotions respondents associate with a given word.

Combination of CAPI and CAWI methodology

What data did we collect? A total of 560 participants came to the laboratory, and more than 20 thousand unique respondents took part in the online research. CAPI participants evaluated nearly 3,000 words, while CAWI participants evaluated over 30,000 words (including all the words from CAPI). The result is the largest database of emotionally tagged words in Poland and one of the largest in the world. Each word has been evaluated at least 50 times, which lets us estimate how particular words are perceived in the Polish population. At the same time, the respondents form a representative group of Poles, which further increases the credibility of the results. The collected demographic data will make it possible to tune SentiTool for more specific applications, profiled for selected demographic groups. In other words, we will be able to approximate not only the average reception of a given text, but also how it may vary depending on the age, beliefs or education of the recipient.
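To make the idea of demographic profiling more concrete, here is a minimal sketch of how ratings for a single word could be averaged separately for each demographic group. The field names (`valence`, `age_group`), the rating scale and all the numbers are invented for illustration; this is not the actual EmoTool data format or SentiTool code.

```python
from collections import defaultdict
from statistics import mean


def mean_by_group(ratings, group_key):
    """Average the 'valence' field separately for each value of group_key."""
    groups = defaultdict(list)
    for rating in ratings:
        groups[rating[group_key]].append(rating["valence"])
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical ratings of one word on an assumed -3..3 scale (invented numbers).
ratings_for_word = [
    {"valence": 2, "age_group": "18-29"},
    {"valence": 3, "age_group": "18-29"},
    {"valence": 1, "age_group": "60+"},
    {"valence": 1, "age_group": "60+"},
]

by_age = mean_by_group(ratings_for_word, "age_group")
for group, value in sorted(by_age.items()):
    print(f"{group}: {value}")
```

The same function could be called with a different key, such as education, as long as each rating carries that field.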

One of the most important conclusions from the research to date is that the CAWI and CAPI results have proved surprisingly consistent. This means that both methods yield qualitatively similar assessments. This matters to us, because it shows that conducting research on such a large scale does not mean sacrificing the quality of the collected data. In other words, we have shown that questions about emotions can be asked online as effectively as in the laboratory. What is more, with this method we can examine emotions related not only to text but also to other material chosen by the client, such as emojis, logos, graphics, recordings or video clips.
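As a rough illustration of what such a consistency check might look like, the sketch below averages the ratings of each word within each method and then correlates the two sets of averages over the words rated with both methods. Pearson correlation is our choice for the example, not necessarily the measure used in the study, and the words, the rating scale and the numbers are all invented.

```python
from math import sqrt
from statistics import mean


def mean_rating_per_word(ratings):
    """Average individual ratings per word: {word: [r1, r2, ...]} -> {word: mean}."""
    return {word: mean(values) for word, values in ratings.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def method_agreement(capi, cawi):
    """Correlate per-word mean ratings over the words rated with both methods."""
    capi_means = mean_rating_per_word(capi)
    cawi_means = mean_rating_per_word(cawi)
    shared = sorted(set(capi_means) & set(cawi_means))
    return pearson([capi_means[w] for w in shared],
                   [cawi_means[w] for w in shared])


# Hypothetical valence ratings on an assumed -3..3 scale (invented numbers).
capi = {"joy": [3, 2, 3], "war": [-3, -3, -2], "dinner": [1, 2, 1]}
cawi = {"joy": [2, 3, 3], "war": [-3, -2, -3], "dinner": [2, 2, 1], "gift": [2, 2, 3]}

r = method_agreement(capi, cawi)
print(f"Agreement between methods: r = {r:.2f}")
```

A value of r close to 1 would indicate that words rated as positive in the laboratory are also rated as positive online, and to a similar degree.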

The most positive and negative words

Among the words considered by the respondents to be the most positive were:

beautiful, loving, sunny, cheerful, tender, gift, delight, joy, warmth and care.

In turn, the words with the worst associations were:

terrorist, beatings, war, paedophile, genocide, violence, aggression, pervert, murder and cheating.

These results are quite intuitive and thus confirm the reliability of the data collected by Sentimenti. The next comparison leads to less obvious conclusions. It sometimes seems that negative statements are more emotional, more striking, simply stronger. Yet the ten words rated highest on the strength scale are almost all positive:

to love, pleasure, kiss, orgasm, mother, aggressive, in love, dinner, pass and euphoria.

The results of the research gathered so far through the EmoTool application are directly applicable to SentiTool, our automatic text analysis tool. Although the Sentimenti project can already boast several successes (for example, the SentiStock analyses), we are still working on improving our tools. We have recently completed another stage of research, in which participants evaluated not only words but also phrases and whole texts. The data collected in this study will allow us to take a broader context into account when analysing text and thus increase the effectiveness and accuracy of emotion detection.

This text was written together with Monika Riegel, PhD, and Małgorzata Wierzba, PhD, from LOBI.