Outstanding results in detecting cyberbullying
Meet our client
Industry: TMT & Other
How we did it
With the growing importance and popularity of social media, applying natural language processing techniques to this domain has become an increasingly popular topic among NLP researchers. As part of an R&D project aimed at advancing Polish NLP, deepsense.ai experts introduced TrelBERT, a language model for Polish social media that has achieved outstanding results in detecting cyberbullying.
The challenge
The goal of the project was to build a solution capable of determining whether a given social media post is harmful. In this task, the posts under analysis were tweets.
The solution
The typical approach to training modern NLP models is transfer learning: one large model is first trained to capture general knowledge about a language and can then be further trained to perform various specific tasks. The starting point for the experiments was HerBERT, a model previously published by Allegro, the company behind the largest e-commerce platform in Poland. HerBERT already knew the general rules of Polish, its vocabulary, and the meanings of words in different contexts, which it learned by "studying" texts from various sources: CommonCrawl, the National Corpus of Polish, movie subtitles, and Wikipedia. What it did not know was the language of social media, which differs significantly from the texts in those sources. This is what deepsense.ai's team taught it.
The model created by deepsense.ai builds on HerBERT and was further trained on almost 100 million messages from Polish Twitter. The team named it TrelBERT, because "trel" is a Polish word for the sound a bird makes (a nod to "tweet").
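The text does not say which objective was used to continue HerBERT's training on tweets. A common choice for BERT-style models is masked language modeling (MLM): some tokens are hidden and the model learns to predict them from context, which is how it absorbs vocabulary and usage specific to a new domain. The Python sketch below shows the masking scheme described for the original BERT (about 15% of positions chosen, then split 80/10/10), assuming TrelBERT's training followed something similar. It masks whole words for readability, whereas real models mask subword token IDs; the `<mask>` symbol, the `mask_tokens` name, and the example vocabulary are placeholders.

```python
import random
from typing import List, Optional, Sequence, Tuple

# Placeholder mask symbol (assumption for illustration, not TrelBERT's actual config).
MASK_TOKEN = "<mask>"


def mask_tokens(
    tokens: Sequence[str],
    vocab: Sequence[str],
    mask_prob: float = 0.15,
    seed: Optional[int] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """BERT-style masking for masked language modeling.

    Assumption: TrelBERT's continued training used MLM-style masking like this;
    the source does not state the objective.

    Each position is chosen as a prediction target with probability mask_prob.
    A chosen token is replaced by MASK_TOKEN 80% of the time, by a random
    vocabulary token 10% of the time, and left unchanged 10% of the time.
    labels[i] holds the original token to predict at chosen positions, else None.
    """
    rng = random.Random(seed)
    corrupted = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # not a prediction target; keep token and leave label empty
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)
        # else: keep the original token unchanged
    return corrupted, labels


# Example "tweet": "but today the weather in Warsaw is beautiful"
tweet = "ale dzisiaj piękna pogoda w Warszawie".split()
corrupted, labels = mask_tokens(tweet, vocab=["kot", "pies", "dom"], mask_prob=0.3, seed=42)
print(corrupted)
print(labels)
```

During training, the model would receive `corrupted` as input and be scored only on positions where `labels` is not `None`.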
To evaluate TrelBERT, the team used KLEJ, a benchmark of Polish NLP tasks analogous to the well-known English GLUE benchmark. One task was of particular interest: cyberbullying detection, in which a solution must determine whether a given tweet is harmful. Thanks to its training on Twitter texts, the model outperformed all other competitors on this task and currently holds the top spot on the KLEJ leaderboard (the "CBD" column, which reports cyberbullying detection results).
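Cyberbullying detection as described here is a binary classification task (harmful vs. not harmful). Classifiers like this are commonly scored with precision, recall, and F1 on the harmful class. The short Python sketch below shows how those numbers follow from gold labels and model predictions; the `binary_scores` name and the toy labels are hypothetical and are not taken from KLEJ's or TrelBERT's evaluation code.

```python
from typing import Sequence


def binary_scores(gold: Sequence[int], pred: Sequence[int]) -> dict:
    """Precision, recall and F1 for the positive class (1 = harmful, 0 = not harmful)."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == 1 and p == 1)  # harmful and flagged
    fp = sum(1 for g, p in pairs if g == 0 and p == 1)  # harmless but flagged
    fn = sum(1 for g, p in pairs if g == 1 and p == 0)  # harmful but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy example: 8 tweets, 3 of them harmful (hypothetical labels).
gold = [1, 0, 0, 1, 0, 1, 0, 0]
pred = [1, 0, 1, 1, 0, 0, 0, 0]
print(binary_scores(gold, pred))
# 2 hits, 1 false alarm, 1 miss: precision, recall and F1 are all 2/3
```

Precision answers "of the tweets flagged, how many were really harmful?", while recall answers "of the harmful tweets, how many were flagged?"; F1 is their harmonic mean.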
Apart from submitting the model's results to KLEJ, deepsense.ai made TrelBERT publicly available on Hugging Face, the popular model repository, where you can try the model and read more about it.
The effect
According to estimates, over 10% of Polish Twitter content is harmful. deepsense.ai's solution detects over 70% of harmful tweets with a precision of up to 75%, surpassing other existing solutions by a large margin.
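To make these figures concrete, the Python sketch below plugs them into a hypothetical sample of 10,000 tweets. Because the text gives bounds ("over 10%", "over 70%", "up to 75%"), the values used are illustrative assumptions rather than measured results, and the `expected_outcomes` name is hypothetical.

```python
def expected_outcomes(n_tweets: int, harmful_rate: float,
                      recall: float, precision: float) -> dict:
    """Translate a harmful-content rate plus recall/precision into expected counts.

    recall    = harmful tweets caught / all harmful tweets
    precision = harmful tweets caught / all tweets flagged
    """
    harmful = n_tweets * harmful_rate
    caught = harmful * recall        # harmful tweets the model flags (true positives)
    flagged = caught / precision     # every tweet the model flags
    false_alarms = flagged - caught  # harmless tweets flagged by mistake
    missed = harmful - caught        # harmful tweets the model lets through
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "harmful": harmful,
        "caught": caught,
        "flagged": flagged,
        "false_alarms": false_alarms,
        "missed": missed,
        "f1": f1,
    }


if __name__ == "__main__":
    # Illustrative values based on the bounds quoted above (assumptions, not exact results).
    result = expected_outcomes(10_000, harmful_rate=0.10, recall=0.70, precision=0.75)
    for name, value in result.items():
        print(f"{name:>12}: {value:,.2f}")
```

Under these assumptions, roughly 933 of 10,000 tweets would be flagged, about 700 of them correctly, while about 233 harmless tweets would be flagged by mistake and about 300 harmful ones would slip through (F1 of roughly 0.72).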
The model can also be fine-tuned for other tasks involving social media data.