Amid the growing proliferation of online abuse and sexual violence targeting women, a new study published by Springer Nature reports that Queensland University of Technology (QUT) researchers have developed a sophisticated, accurate algorithm to detect misogynistic posts on Twitter and help drum them out of the Twittersphere.
The team, including Associate Professor Richi Nayak, Professor Nicolas Suzor and research fellow Dr Md Abul Bashar, is a collaboration between QUT’s faculties of Science and Engineering and Law and the Digital Media Research Centre. It mined a dataset of 1 million tweets, then narrowed the set by searching for tweets containing one of three abusive keywords: whore, slut and rape.
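To give a concrete sense of that filtering step, here is a minimal Python sketch; it is not the QUT team's actual code, and the file path and the "text" column name are assumptions for illustration.

```python
# Illustrative sketch of the keyword-filtering step, not the QUT team's
# actual code. The file path and the "text" column name are assumptions.
import pandas as pd

KEYWORDS = ("whore", "slut", "rape")

tweets = pd.read_csv("tweets.csv")            # hypothetical ~1M-tweet dump
pattern = "|".join(KEYWORDS)                  # regex: whore|slut|rape
candidates = tweets[tweets["text"].str.contains(pattern, case=False, na=False)]
print(f"{len(candidates)} candidate tweets kept for annotation")
```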
“At the moment, the onus is on the user to report the abuse they receive. We hope our machine-learning solution can be adopted by social media platforms to automatically identify and report this content to protect women and other user groups online. The key challenge in misogynistic tweet detection is understanding the context of a tweet. The complex and noisy nature of tweets makes it difficult,” said Associate Professor Nayak.
Asserting that teaching a machine to understand natural language is one of the more complicated ends of data science, Nayak added, “We developed a text mining system where the algorithm learns the language as it goes, first by developing a base-level understanding then augmenting that knowledge with both tweet-specific and abusive language.”
The team implemented a deep-learning algorithm called Long Short-Term Memory (LSTM) with transfer learning, which means the machine could look back at its previous understanding of terminology and adjust the model as it went, developing its contextual and semantic understanding over time.
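The published architecture is not reproduced here, but a minimal PyTorch sketch shows the general shape of an LSTM tweet classifier; the layer sizes and the idea of initialising the embeddings from a pretrained general-language model (the transfer-learning step) are illustrative assumptions, not the team's design.

```python
# Minimal PyTorch sketch of an LSTM tweet classifier. Layer sizes and the
# use of pretrained embeddings are illustrative assumptions, not the
# published architecture.
import torch.nn as nn

class TweetLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        # In a transfer-learning setup, these embeddings could be initialised
        # from a model pretrained on general text, then fine-tuned on
        # tweet-specific and abusive language.
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, 2)   # misogynistic vs. not

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)         # (batch, seq, embed_dim)
        _, (hidden, _) = self.lstm(embedded)         # final hidden state
        return self.classifier(hidden[-1])           # (batch, 2) logits
```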
While the system started with a base dictionary and built its vocabulary from there, context and intent had to be carefully monitored by the research team to ensure that the algorithm could differentiate between abuse, sarcasm and friendly use of aggressive terminology.
“Take the phrase ‘get back to the kitchen’ as an example: devoid of the context of structural inequality, a machine’s literal interpretation could miss the misogynistic meaning. But seen with an understanding of what constitutes abusive or misogynistic language, it can be identified as a misogynistic tweet,” said Associate Professor Nayak.
The research team’s model identifies misogynistic content with 75 percent accuracy, outperforming other methods that investigate similar aspects of social media language. Other methods based on word distribution or occurrence patterns identify abusive or misogynistic terminology, but the presence of a word by itself doesn’t necessarily correlate with intent.
Once the team had refined the 1 million tweets down to 5,000, those tweets were categorised as misogynistic or not based on context and intent, then fed into the machine-learning classifier, which used the labelled samples to build its classification model.
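A hedged sketch of that supervised step, continuing the model sketch above: the labelled tweets drive a standard cross-entropy training loop. The `train_loader` and the epoch count are hypothetical stand-ins, not the team's pipeline.

```python
# Hedged sketch of the supervised training step: labelled tweets
# (1 = misogynistic, 0 = not) drive a standard cross-entropy loop.
# TweetLSTM comes from the sketch above; train_loader is a hypothetical
# DataLoader yielding (token_ids, labels) batches, not the team's pipeline.
import torch
import torch.nn as nn

model = TweetLSTM(vocab_size=20_000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):                        # illustrative epoch count
    for token_ids, labels in train_loader:
        optimizer.zero_grad()
        logits = model(token_ids)
        loss = loss_fn(logits, labels)
        loss.backward()
        optimizer.step()
```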
The QUT team added that there is no shortage of misogynistic data to work with, but labelling it was labour-intensive. Associate Professor Nayak and the team hope the research can translate into platform-level policy that would see Twitter, for example, remove any tweets identified by the algorithm as misogynistic.
According to the researchers, this modelling could also be expanded upon and used in other contexts in the future, such as identifying racism, homophobia or abuse toward people with disabilities. They added that the end goal is to take the model to social media platforms and trial it in place: making this content easier to identify and remove would help create a safer online space for all users.