Research Preview
Project Horizon: A Research Preview
Advancing the Alignment-Utility Frontier through Context-Aware Safety Architectures.
Project Horizon is a research initiative designed to move beyond binary blocking. We are building a dynamic, context-aware safety architecture that preserves model utility while enforcing rigorous security boundaries.
The State of AI Safety
The gap in modern guardrails
Current safety frameworks from leading research labs often rely on binary, rigid filters. While effective at blocking obvious harms, they frequently fail when nuance, pedagogy, or multi-step context enters the loop.
Critical Failure Mode
The Refusal Problem
Over-aggressive alignment induces utility collapse: models issue false-positive refusals on benign or pedagogical queries because they lack intent-level nuance.
Critical Failure Mode
Context Blindness
Heuristic-based PII and PHI detection struggles when sensitive data is embedded in complex reasoning chains or emerges through cross-prompt aggregation, creating a latent data-leakage risk.
Our Core Engines
Safety as a science, not a filter
Horizon is powered by two specialized models designed to work in tandem with large language models to provide real-time, high-fidelity classification.
01
Halo | Threat Classification
Halo is our primary reasoning engine for safety alignment. Unlike standard classifiers that look for keywords, Halo analyzes the intent and potential impact of a prompt.
Adversarial robustness tuned to detect jailbreak attempts and prompt injection techniques that bypass traditional filters.
Granular taxonomy that produces a multi-dimensional threat vector instead of a simple safe or unsafe verdict.
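To make the contrast with a binary verdict concrete, here is a minimal sketch of what a multi-dimensional threat vector could look like. Every class, field, dimension, and threshold below is an illustrative assumption, not Halo's actual interface.

```python
# Hypothetical shape of a Halo verdict; names and dimensions are
# assumptions for illustration, not Horizon's published API.
from dataclasses import dataclass

@dataclass
class ThreatVector:
    # Each taxonomy dimension is scored independently on [0.0, 1.0].
    violence: float = 0.0
    self_harm: float = 0.0
    prompt_injection: float = 0.0
    jailbreak: float = 0.0
    # Inferred intent lets downstream policy distinguish pedagogy from misuse.
    intent: str = "unknown"  # e.g. "pedagogical", "operational", "adversarial"

def route(v: ThreatVector, threshold: float = 0.8) -> str:
    """Example policy: act on the worst dimension, not a single flag."""
    scores = {
        "violence": v.violence,
        "self_harm": v.self_harm,
        "prompt_injection": v.prompt_injection,
        "jailbreak": v.jailbreak,
    }
    worst = max(scores, key=scores.get)
    if scores[worst] >= threshold:
        return f"escalate:{worst}"
    return "allow_with_context" if v.intent == "pedagogical" else "allow"
```

Because the verdict carries per-dimension scores plus an intent label, a policy layer can escalate a genuine injection attempt while still answering a pedagogical question that merely mentions a sensitive topic.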
02
Prism | PII / PHI / PCI Detection
Prism is a high-precision detection layer focused on identifying sensitive data across regulatory domains without stripping away the reasoning continuity operators need.
Deep contextual awareness distinguishes random strings from structured PCI, PHI, or financial identifiers inside realistic medical and financial workflows.
Privacy-preserving utility enables dynamic redaction so models can keep reasoning over anonymized datasets while protecting compliance boundaries.
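As a rough illustration of dynamic redaction, the sketch below swaps sensitive spans for stable typed tokens so a model can keep reasoning over the anonymized text. The regex detectors stand in for Prism itself, which is a learned model; all patterns and names here are assumptions.

```python
# Toy dynamic-redaction pass; the regexes are stand-ins for Prism's
# learned detectors and are illustrative only.
import re

PATTERNS = {
    "PCI_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "PII_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PII_EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace sensitive spans with typed tokens; the vault keeps the
    originals outside the model's context, so reasoning continuity
    survives ('<PCI_CARD_1> was charged twice') without exposure."""
    vault: dict[str, str] = {}
    counters: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        def _sub(match: re.Match, label: str = label) -> str:
            counters[label] = counters.get(label, 0) + 1
            token = f"<{label}_{counters[label]}>"
            vault[token] = match.group(0)  # never enters the model context
            return token
        text = pattern.sub(_sub, text)
    return text, vault

anonymized, vault = redact("Refund card 4111 1111 1111 1111 for jane@example.com")
print(anonymized)  # Refund card <PCI_CARD_1> for <PII_EMAIL_1>
```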
The Research Lab
Human-in-the-loop evaluation
Project Horizon serves as a live testing ground. We believe safety cannot be solved in a vacuum; it requires diverse, adversarial testing from a global community.
Side-by-side comparison workflows let evaluators compare model behavior under different safety configurations.
A consensus mechanism uses structured votes to define the Horizon Line, the point where a model transitions from helpful to hazardous; a minimal aggregation sketch follows this list.
Open benchmarking data gathered in the preview contributes to an open-source safety benchmark for the broader industry.
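One way such a consensus could be computed is sketched below. The vote schema and quorum rule are assumptions for illustration, not Horizon's published methodology.

```python
# Hypothetical vote aggregation for locating the Horizon Line;
# the schema and majority rule are illustrative assumptions.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Vote:
    evaluator_id: str
    verdict: str  # "helpful" | "borderline" | "hazardous"

def consensus(votes: list[Vote], quorum: int = 3) -> str | None:
    """Return a verdict once enough structured votes have accrued."""
    if len(votes) < quorum:
        return None  # not enough evaluators yet
    tally = Counter(v.verdict for v in votes)
    verdict, count = tally.most_common(1)[0]
    # Require a strict majority before placing a prompt on either
    # side of the Horizon Line; ties stay "borderline".
    return verdict if count * 2 > len(votes) else "borderline"
```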
The Methodology
Capability meets safety
We use a decoupled mediation architecture. By offloading safety logic to the Horizon intermediary layer, we avoid the performance degradation typically associated with over-tuning a model’s core weights.
Workflow Stage
Interception
Halo analyzes the incoming prompt for adversarial intent before the model acts on it.
Workflow Stage
Audit
Prism scans the prompt for inadvertent data leaks across PII, PHI, and PCI risk domains.
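Putting the two stages together, a minimal sketch of the mediation flow might look like the following; `halo_classify`, `prism_scan`, and `call_model` are hypothetical stand-ins for the real components.

```python
# End-to-end sketch of the decoupled mediation layer. All three
# component functions are stubs; their signatures are assumptions.

def halo_classify(prompt: str) -> bool:
    """Stub for Halo: flag an obvious injection phrase as adversarial."""
    return "ignore previous instructions" in prompt.lower()

def prism_scan(prompt: str) -> str:
    """Stub for Prism: a real pass would redact PII/PHI/PCI spans
    (see the redaction sketch earlier)."""
    return prompt

def call_model(prompt: str) -> str:
    """Stub for the underlying LLM, whose weights stay untouched."""
    return f"[model response to: {prompt}]"

def mediate(prompt: str) -> str:
    # Stage 1 - Interception: screen for adversarial intent before
    # the core model ever sees the prompt.
    if halo_classify(prompt):
        return "Request declined: adversarial intent detected."
    # Stage 2 - Audit: scrub sensitive data so nothing in the
    # PII/PHI/PCI risk domains reaches the model's context window.
    clean_prompt = prism_scan(prompt)
    # All safety logic lives in this intermediary layer, so the
    # model's core weights need no safety over-tuning.
    return call_model(clean_prompt)

print(mediate("Summarize this patient intake note."))
```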
Looking Ahead
Phase One: research preview
Our goal is to prove that safety does not have to come at the cost of intelligence.
Current thesis
Context-aware mediation can preserve helpfulness, detect adversarial misuse, and maintain rigorous privacy boundaries without collapsing model utility.
Our goals
Open research contributions, including future open-weight releases of Prism and Halo, to accelerate specialized data-protection research.
Enterprise-grade policy engines with real-time policy customization for dynamic safety boundary definitions.
Multimodal safety layers that extend Halo’s reasoning beyond text and into agentic workflows.