Online grooming, as a core element of the broader fight against Child Sexual Abuse and Exploitation (CSAE), has become a critical concern within the EU’s digital safety agenda. As online platforms expand, they offer perpetrators increasing opportunities to exploit vulnerable users, particularly children. To tackle this issue, the CESAGRAM platform monitors online content in public spaces by combining the data-gathering capabilities of modern Web and social media crawlers with a set of Artificial Intelligence (AI)-driven tools that analyze linguistic patterns, detect grooming activities, and provide early warnings of potential grooming incidents.
At the core of the CESAGRAM platform are the crawlers, which collect data from social media platforms of interest to end users, such as Twitch and YouTube, as well as from the open Web. The crawlers capture both the textual online content and accompanying metadata, such as usernames, timestamps, and other contextual information, from publicly available accounts and relevant publicly accessible webpages. This approach enables comprehensive tracking of potential grooming activities across platforms while strictly adhering to applicable national, EU, and international law, research ethics standards, and data protection regulations: the platform ensures data minimization and proper pseudonymization and applies well-rounded privacy-by-design techniques, thus reducing the risks associated with personal data handling.
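To make the pseudonymization step concrete, the sketch below shows one way a crawled message and its metadata could be stored with the raw username replaced by a salted hash. This is a minimal illustration, not the platform’s actual data model: the record fields, the salt handling, and the digest truncation are all assumptions.

```python
import hashlib
from dataclasses import dataclass

# Illustrative salt only; a real deployment would use a secret, managed value.
SALT = "example-salt"

def pseudonymize(username: str, salt: str = SALT) -> str:
    """Replace a raw username with a truncated salted SHA-256 digest
    (supports data minimization: the raw identifier is never stored)."""
    return hashlib.sha256((salt + username).encode("utf-8")).hexdigest()[:16]

@dataclass
class CrawledMessage:
    platform: str    # e.g. "twitch" or "youtube"
    author_pid: str  # pseudonymized author identifier
    timestamp: str   # capture time, e.g. ISO-8601
    text: str        # publicly posted message content

def make_record(platform: str, username: str, timestamp: str, text: str) -> CrawledMessage:
    """Build a storable record from crawled data, pseudonymizing the author."""
    return CrawledMessage(platform, pseudonymize(username), timestamp, text)
```

The same username always maps to the same pseudonym, so analyses across messages (risk assessment, authorship linking) remain possible without retaining the original identifier.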
One of the main AI-based linguistic analysis tools of CESAGRAM is Named Entity Recognition (NER). This tool automatically identifies key entities within text (e.g., people, locations, organizations, and groups) that Law Enforcement Agencies (LEAs) can exploit by focusing on specific instances relevant to potential grooming activities. For instance, repeated references to a specific location, or pressure to move a conversation from a public forum to private online spaces (e.g., by suggesting private messaging apps), may indicate activities that require LEAs’ attention. In parallel with NER, CESAGRAM performs Sentiment and Emotion Analysis on online discussions from social media and Web forums. Sentiment analysis classifies messages as positive, negative, or neutral based on their content, while emotion analysis complements this functionality by categorizing messages into one of the six basic human emotions: anger, fear, happiness, sadness, surprise, and disgust, highlighting emotional cues that could indicate danger. Identifying relevant sentiment and specific emotions early is critical because grooming activities, even at their early stages, often elicit strong emotional reactions, such as repeated expressions of fear or sadness from victims, or happiness and anger on the perpetrators’ side.
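The emotion and sentiment labels described above can be illustrated with a deliberately simple lexicon-based sketch. The platform’s actual components are presumably trained classifiers; the word lists below are invented for illustration, and only the six emotion labels and the three sentiment labels come from the description.

```python
# Toy lexicon sketch; the real CESAGRAM models are trained classifiers.
EMOTION_LEXICON = {
    "anger":     {"angry", "furious", "hate"},
    "fear":      {"scared", "afraid", "worried"},
    "happiness": {"happy", "glad", "excited"},
    "sadness":   {"sad", "crying", "lonely"},
    "surprise":  {"surprised", "shocked", "unexpected"},
    "disgust":   {"disgusting", "gross", "awful"},
}
NEGATIVE_EMOTIONS = {"anger", "fear", "sadness", "disgust"}

def classify_emotion(text: str):
    """Return the best-matching of the six basic emotions, or None."""
    tokens = set(text.lower().split())
    scores = {emo: len(tokens & words) for emo, words in EMOTION_LEXICON.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def classify_sentiment(text: str) -> str:
    """Map the detected emotion to a positive/negative/neutral label."""
    emotion = classify_emotion(text)
    if emotion is None:
        return "neutral"
    return "negative" if emotion in NEGATIVE_EMOTIONS else "positive"
```

For example, a message containing repeated fear vocabulary would be labeled with the emotion "fear" and the sentiment "negative", the kind of emotional cue the text identifies as an early danger signal.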
Furthermore, CESAGRAM is equipped with a Taxonomy Classification tool that categorizes messages into one of six predefined grooming-related categories, as identified in the existing literature: initiate contact, rapport building, risk assessment, sexualized content, enticements and control, and arrange in-person meeting. This classification not only maps text onto known grooming tactics, making it easier for law enforcement to focus on high-risk content, but also enables the system to perform a Risk Assessment of the user accounts involved in online discussions of interest, assigning each account a risk level: low, moderate, or high. When the risk level of a user account escalates to moderate or high, the system automatically generates early warning notifications, allowing for timely action and ensuring that high-risk interactions receive immediate attention.

Finally, CESAGRAM includes an advanced Authorship Analysis module capable of identifying user accounts that are likely operated by the same physical entity (i.e., person). Offenders frequently create multiple profiles to approach victims or to avoid detection by the authorities. By analyzing writing style, language patterns, and other textual clues, the system can link user accounts that are likely operated by the same individual. All the AI tools are integrated into the CESAGRAM platform through a user-friendly interface that provides a comprehensive view of the collected data, together with insights from the various analysis components, early warning alerts, and interactive visuals that increase user engagement.
This enables end users to effectively monitor online spaces and respond to grooming threats in real time, thereby serving the project’s ultimate goal: to provide end users with a complete and effective solution for responding accurately to any online grooming incident, and to offer the scientific community a basis for the future development of innovative responses in this field of research.
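The authorship analysis module described earlier can be illustrated with a minimal stylometric sketch: comparing two accounts’ character trigram profiles. The real module presumably uses far richer writing-style and language-pattern features; the trigram representation and the similarity threshold here are illustrative assumptions only.

```python
def char_ngrams(text: str, n: int = 3) -> set:
    """Set of character n-grams; a crude proxy for writing-style features."""
    t = text.lower()
    return {t[i:i + n] for i in range(len(t) - n + 1)}

def style_similarity(messages_a: list, messages_b: list) -> float:
    """Jaccard similarity of two accounts' aggregate trigram profiles (0..1)."""
    a = set().union(*(char_ngrams(m) for m in messages_a))
    b = set().union(*(char_ngrams(m) for m in messages_b))
    return len(a & b) / len(a | b) if (a | b) else 0.0

def likely_same_author(messages_a: list, messages_b: list,
                       threshold: float = 0.6) -> bool:
    # Threshold is an illustrative assumption; in practice it would be calibrated.
    return style_similarity(messages_a, messages_b) >= threshold
```

Two profiles with similar spelling quirks and phrasing share many trigrams and score high, while stylistically different writers score low, which is the intuition behind linking multiple offender profiles to one individual.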