Social chat moderation program
social-chat-moderation
Domain: social-safety
Type: mixed
Description
A working chat-moderation program covers a platform's user-to-user messaging surfaces with a policy, an enforcement workflow, and a transparency mechanism that survives regulator review. It has three components: a chat-moderation policy that defines which scanning, classifier matching, or human review applies to each surface, with explicit carve-outs for end-to-end encrypted threads where the platform genuinely cannot see content; an enforcement workflow that triggers on signals (slurs, CSAM hash matches, grooming patterns, threat keywords) and offers graduated actions ranging from warning to message block to account suspension to NCMEC referral; and a transparency surface (Statement of Reasons under DSA Article 17, takedown ledger for the UK OSA) that makes enforcement auditable.
Regulators have historically treated private-message moderation differently from public-content moderation, but the differences are narrowing. In private messaging the audience is smaller and the harms (grooming, sextortion, harassment) are higher-velocity, and regulators have narrowed the privacy framing of private messages where the platform is in a position to detect and act. DSA Article 28 enhanced minor-protection obligations apply to platforms reaching minors, regardless of whether the surface is public or DM. UK OSA Part 3 illegal-content and child-safety duties cover chat services with UK users under a proportionality test. KOSA, in the dominant Senate version, sets a duty of care to mitigate harms to minors that explicitly contemplates chat surfaces. The combined effect is that for platforms with under-13 or under-18 user segments, chat moderation moves from recommended to expected.
The encryption carve-out is the operationally hard piece. On a genuinely end-to-end encrypted thread the platform cannot scan content, so the program has to lean on metadata signals (sender age, contact-graph anomaly, reported messages), client-side classifiers running on the device before encryption, and a published policy that explains where content scanning does and does not apply. The NCMEC referral procedure for CSAM hash matches sits outside the encryption carve-out in most regimes, because the matching is on hashes of known content rather than on message plaintext, and the reporting obligation is statutory.
Evidence formats that hold up include the chat-moderation policy with carve-outs documented, the classifier and vendor pipeline diagram showing what runs on which surface, the enforcement-action ledger, the Statement of Reasons export, and the NCMEC referral procedure with reviewer training records attached.
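The enforcement workflow and the encryption carve-out described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the check names, signal names, and the signal-to-action mapping are all hypothetical placeholders that a real chat-moderation policy would define, and the treatment of hash matching on encrypted threads simply follows the description above.

```python
from enum import IntEnum
from typing import Iterable, List


class Action(IntEnum):
    """Graduated actions, ordered from least to most severe."""
    NONE = 0
    WARNING = 1
    MESSAGE_BLOCK = 2
    ACCOUNT_SUSPENSION = 3
    NCMEC_REFERRAL = 4


# Hypothetical mapping from detection signal to action.
# A real policy document would set and justify these choices.
SIGNAL_ACTIONS = {
    "slur": Action.WARNING,
    "threat_keyword": Action.MESSAGE_BLOCK,
    "grooming_pattern": Action.ACCOUNT_SUSPENSION,
    "csam_hash_match": Action.NCMEC_REFERRAL,
}


def checks_for_thread(is_e2ee: bool) -> List[str]:
    """Return the (hypothetical) checks the policy applies to a thread.

    On an end-to-end encrypted thread the server cannot read plaintext,
    so server-side classification is excluded and the program falls back
    to metadata signals and on-device classifiers.
    """
    if is_e2ee:
        return ["metadata_signals", "client_side_classifier", "known_hash_match"]
    return [
        "metadata_signals",
        "server_side_classifier",
        "known_hash_match",
        "human_review",
    ]


def decide_action(signals: Iterable[str]) -> Action:
    """Pick the most severe action among the signals that fired.

    Unknown signals map to Action.NONE rather than raising, so a new
    classifier label cannot trigger enforcement by accident.
    """
    return max(
        (SIGNAL_ACTIONS.get(signal, Action.NONE) for signal in signals),
        default=Action.NONE,
    )
```

Taking the maximum over an ordered enum keeps the escalation rule explicit and easy to audit; a production workflow would likely also weigh account history and route borderline cases to human review.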
Required by (4 regulations)
- DSA
Art 16-17 (notice-and-action + statement of reasons), Art 28 (minor-protection systems on platforms reaching minors).
DSA Art. 16, 17, 28
- UK OSA
Part 3 illegal-content + child-safety duties — chat services with UK users must implement proportionate moderation systems.
OSA 2023 Part 3
- KOSA
(if enacted) §3 platform duty of care to mitigate harms to minors on chat surfaces, including bullying, sexual exploitation, and content patterns that harm mental health.
KOSA §3
- China Content Review
China's content-review rules require moderation of user-generated and social chat content.
Administrative Measures for the Administration of Online Publishing Services (NPPA and MIIT Order No. 5, effective March 10, 2016); culture-side rules on online cultural products and online-game content self-examination and recordkeeping; NPPA content-review notices
Fulfilled by (4)
- community-sift-two-hat · full · medium effort · $$$
Microsoft / Two Hat Community Sift: real-time chat classification with platform-level risk tagging.
- activefence · full · medium effort · $$$
Trust + safety platform with chat-pattern detection, grooming-signal classifiers, and threat-intel feeds.
- hive · partial · low effort · $$
Hive's text / image moderation APIs cover chat triage; pair with a workflow tool for enforcement actions.
- In-house build · high effort
Build a chat-moderation pipeline: classifier (managed or custom), enforcement workflow, NCMEC integration, SoR export. Typically takes 3-6 months of engineering effort.
Magist does not accept payment from vendors.
Evidence formats
- chat moderation policy doc
- classifier / vendor pipeline diagram
- enforcement action ledger
- Statement of Reasons (SoR) export
- NCMEC referral procedure
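As an illustration of how the enforcement action ledger and the Statement of Reasons export might relate, the sketch below records one ledger entry per enforcement action and projects entries onto an SoR record. All field names here are assumptions chosen for the example; they do not follow any official DSA schema or vendor format, and a real export would need to match whatever structure the platform's legal team and the regulator expect.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable


@dataclass(frozen=True)
class LedgerEntry:
    """One row of a hypothetical enforcement action ledger."""
    entry_id: str
    account_id: str
    surface: str              # e.g. "dm" or "group_chat"
    signal: str               # what triggered the workflow
    action: str               # what the platform did
    policy_clause: str        # which policy section was applied
    automated_detection: bool
    decided_at: str           # ISO 8601 timestamp, UTC


def record_action(entry_id, account_id, surface, signal, action,
                  policy_clause, automated_detection) -> LedgerEntry:
    """Create a ledger entry stamped with the current UTC time."""
    return LedgerEntry(
        entry_id=entry_id,
        account_id=account_id,
        surface=surface,
        signal=signal,
        action=action,
        policy_clause=policy_clause,
        automated_detection=automated_detection,
        decided_at=datetime.now(timezone.utc).isoformat(),
    )


def to_statement_of_reasons(entry: LedgerEntry) -> dict:
    """Project a ledger entry onto a hypothetical SoR record.

    The account identifier is deliberately left out of the export.
    """
    return {
        "reference": entry.entry_id,
        "restriction": entry.action,
        "ground": entry.policy_clause,
        "facts": f"Signal '{entry.signal}' detected on surface '{entry.surface}'",
        "automated_detection": entry.automated_detection,
        "decided_at": entry.decided_at,
    }


def export_sor(entries: Iterable[LedgerEntry]) -> str:
    """Serialize SoR records as JSON for an auditor or regulator."""
    records = [to_statement_of_reasons(entry) for entry in entries]
    return json.dumps(records, indent=2, sort_keys=True)
```

Deriving the export from the ledger, rather than maintaining the two separately, means every Statement of Reasons traces back to a recorded enforcement action.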