Review and moderation, your way. Online safety dashboard, queues, routing and automatic enforcement rules, and integrations.
Open Source Agent Alignment: Make your agents follow rules. One line of code to enforce, trace, and improve.
🛡️ Programmable Guardrails for LLM Applications in Java. A framework-agnostic toolkit for input/output validation, PII masking, and jailbreak detection. The Java alternative to NVIDIA NeMo Guardrails.
A JavaScript-based content safety system designed to detect and filter sensitive media in real-time, ensuring platform compliance and user protection.
An intelligent task management assistant built with .NET, Next.js, Microsoft Agent Framework, AG-UI protocol, and Azure OpenAI, demonstrating Clean Architecture and autonomous AI agent capabilities
Step-by-step tutorial on Azure AI Content Safety, the prebuilt AI service that filters content sent to users to safeguard them from risky or undesirable outcomes.
Real-time NSFW & harmful content detection as a service
Transform uncertainty into absolute confidence.
🔍 Benchmark jailbreak resilience in LLMs with JailBench for clear insights and improved model defenses.
Benchmark LLM jailbreak resilience across providers with standardized tests, adversarial mode, rich analytics, and a clean Web UI.
AI application firewall for LLM-powered apps — multi-layered detection (heuristic, ML classifier, semantic, LLM-judge) against prompt injection, jailbreaks, and data leakage - inferwall.com
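The heuristic layer in a multi-layered firewall like the one above can be sketched generically. The patterns, scoring, and threshold below are illustrative assumptions for demonstration only, not inferwall's actual rules or API:

```python
import re

# Illustrative prompt-injection signatures (assumed examples;
# a real firewall layer uses a much larger, tuned rule set).
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now (in )?developer mode",
    r"reveal (your )?(system|hidden) prompt",
]

def heuristic_score(prompt: str) -> float:
    """Return a 0.0-1.0 suspicion score based on how many patterns hit."""
    text = prompt.lower()
    hits = sum(1 for p in INJECTION_PATTERNS if re.search(p, text))
    return min(1.0, hits / len(INJECTION_PATTERNS))

def is_suspicious(prompt: str, threshold: float = 0.3) -> bool:
    """Flag the prompt if the heuristic score crosses the threshold."""
    return heuristic_score(prompt) >= threshold
```

In a layered design, a cheap pass like this runs first; only prompts it cannot confidently clear are escalated to the ML classifier, semantic, or LLM-judge layers.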
Technical presentations with hands-on demos
Arabic Content Moderator — scan text for toxicity, hate speech, spam. Dialect-aware. Fully offline.
Production-Grade LLM Alignment Engine (TruthProbe + ADT)
A Chrome extension that uses Claude AI to protect users under 18 from inappropriate content by analyzing webpage content in real-time.
Content moderation (text and image) in a social network demo
Pre-Publish Security Gate - Scan and redact sensitive information before sharing
Responsible AI toolkit for LLM applications: PII/PHI redaction, prompt injection detection, bias scoring, content safety filters, and output validation. Framework-agnostic Python library with FastAPI demo.
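The PII redaction such a toolkit performs can be sketched with a minimal stand-alone example. The pattern set and placeholder format below are assumptions for illustration, not the library's actual detectors:

```python
import re

# Illustrative PII patterns (assumed; production toolkits use
# validated detectors, not bare regexes).
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each detected PII span with a [LABEL] placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

A framework-agnostic library exposes roughly this shape of function so it can sit in front of any LLM call, regardless of the web framework hosting it.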
The open-source safety stack for AI agents. Policy engine, content scanner, approval workflows, audit trails. 924+ tests. MIT licensed.
Douyin video review detection | competitor report analysis tool | Douyin video risk control | Douyin risk control | video optimization | competitor reporting | video monitoring | video detection