Research Preview
Project Horizon: A Research Preview
Advancing the Alignment-Utility Frontier through Context-Aware Safety Architectures.
Project Horizon is a research initiative designed to move beyond binary blocking. We are building a dynamic, context-aware safety architecture that preserves model utility while enforcing rigorous security boundaries.
The State of AI Safety
The gap in modern guardrails
Current safety frameworks from leading research labs often rely on binary, rigid filters. While effective at blocking obvious harms, they frequently fail when nuance, pedagogy, or multi-step context enters the loop.
Critical Failure Mode
The Refusal Problem
Over-aggressive alignment induces utility collapse: models issue false-positive refusals on benign or pedagogical queries because they lack intent-level nuance.
Critical Failure Mode
Context Blindness
Heuristic-based PII and PHI detection struggles when sensitive data is embedded in complex reasoning chains or emerges through cross-prompt aggregation, creating a latent data-leakage risk.
Our Core Engines
Safety as a science, not a filter
Horizon is powered by two specialized models designed to work in tandem with large language models to provide real-time, high-fidelity classification.
01
Halo | Threat Classification
Halo is our primary reasoning engine for safety alignment. Unlike standard classifiers that look for keywords, Halo analyzes the intent and potential impact of a prompt.
Adversarial robustness tuned to detect jailbreak attempts and prompt injection techniques that bypass traditional filters.
Granular taxonomy that produces a multi-dimensional threat vector instead of a simple safe or unsafe verdict.
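To make the contrast with a binary verdict concrete, here is a minimal sketch of what a multi-dimensional threat vector could look like. Every class, field, dimension, and threshold below is an illustrative assumption, not Halo's actual interface.

```python
# Hypothetical shape of a Halo verdict; names and dimensions are
# assumptions for illustration, not Horizon's published API.
from dataclasses import dataclass

@dataclass
class ThreatVector:
    # Each taxonomy dimension is scored independently on [0.0, 1.0].
    violence: float = 0.0
    self_harm: float = 0.0
    prompt_injection: float = 0.0
    jailbreak: float = 0.0
    # Inferred intent lets downstream policy distinguish pedagogy from misuse.
    intent: str = "unknown"  # e.g. "pedagogical", "operational", "adversarial"

def route(v: ThreatVector, threshold: float = 0.8) -> str:
    """Example policy: act on the worst dimension, not a single flag."""
    scores = {
        "violence": v.violence,
        "self_harm": v.self_harm,
        "prompt_injection": v.prompt_injection,
        "jailbreak": v.jailbreak,
    }
    worst = max(scores, key=scores.get)
    if scores[worst] >= threshold:
        return f"escalate:{worst}"
    return "allow_with_context" if v.intent == "pedagogical" else "allow"
```

Because the verdict carries per-dimension scores plus an intent label, a policy layer can escalate a genuine injection attempt while still answering a pedagogical question that merely mentions a sensitive topic.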
02
Prism | PII / PHI / PCI Detection
Prism is a high-precision detection layer focused on identifying sensitive data across regulatory domains without stripping away the reasoning continuity operators need.
Deep contextual awareness distinguishes random strings from structured PCI, PHI, or financial identifiers inside realistic medical and financial workflows.
Privacy-preserving utility enables dynamic redaction so models can keep reasoning over anonymized datasets while protecting compliance boundaries.
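As a rough illustration of dynamic redaction, the sketch below swaps sensitive spans for stable typed tokens so a model can keep reasoning over the anonymized text. The regex detectors stand in for Prism itself, which is a learned model; all patterns and names here are assumptions.

```python
# Toy dynamic-redaction pass; the regexes are stand-ins for Prism's
# learned detectors and are illustrative only.
import re

PATTERNS = {
    "PCI_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "PII_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PII_EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
}

def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace sensitive spans with typed tokens; the vault keeps the
    originals outside the model's context, so reasoning continuity
    survives ('<PCI_CARD_1> was charged twice') without exposure."""
    vault: dict[str, str] = {}
    counters: dict[str, int] = {}
    for label, pattern in PATTERNS.items():
        def _sub(match: re.Match, label: str = label) -> str:
            counters[label] = counters.get(label, 0) + 1
            token = f"<{label}_{counters[label]}>"
            vault[token] = match.group(0)  # never enters the model context
            return token
        text = pattern.sub(_sub, text)
    return text, vault

anonymized, vault = redact("Refund card 4111 1111 1111 1111 for jane@example.com")
print(anonymized)  # Refund card <PCI_CARD_1> for <PII_EMAIL_1>
```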
The Research Lab
Human-in-the-loop evaluation
Project Horizon serves as a live testing ground. We believe safety cannot be solved in a vacuum; it requires diverse, adversarial testing from a global community.
Side-by-side comparison workflows let evaluators compare model behavior under different safety configurations.
A consensus mechanism uses structured votes to define the Horizon Line, the point where a model transitions from helpful to hazardous; a minimal aggregation sketch follows this list.
Open benchmarking data gathered in the preview contributes to an open-source safety benchmark for the broader industry.
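One way such a consensus could be computed is sketched below. The vote schema and quorum rule are assumptions for illustration, not Horizon's published methodology.

```python
# Hypothetical vote aggregation for locating the Horizon Line;
# the schema and majority rule are illustrative assumptions.
from collections import Counter
from dataclasses import dataclass

@dataclass
class Vote:
    evaluator_id: str
    verdict: str  # "helpful" | "borderline" | "hazardous"

def consensus(votes: list[Vote], quorum: int = 3) -> str | None:
    """Return a verdict once enough structured votes have accrued."""
    if len(votes) < quorum:
        return None  # not enough evaluators yet
    tally = Counter(v.verdict for v in votes)
    verdict, count = tally.most_common(1)[0]
    # Require a strict majority before placing a prompt on either
    # side of the Horizon Line; ties stay "borderline".
    return verdict if count * 2 > len(votes) else "borderline"
```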
The Methodology
Capability meets safety
We use a decoupled mediation architecture. By offloading safety logic to the Horizon intermediary layer, we avoid the performance degradation typically associated with over-tuning a model’s core weights.
Workflow Stage
Interception
Halo analyzes the incoming prompt for adversarial intent before the model acts on it.
Workflow Stage
Audit
Prism scans the prompt for inadvertent data leaks across PII, PHI, and PCI risk domains.
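Putting the two stages together, a minimal sketch of the mediation flow might look like the following; `halo_classify`, `prism_scan`, and `call_model` are hypothetical stand-ins for the real components.

```python
# End-to-end sketch of the decoupled mediation layer. All three
# component functions are stubs; their signatures are assumptions.

def halo_classify(prompt: str) -> bool:
    """Stub for Halo: flag an obvious injection phrase as adversarial."""
    return "ignore previous instructions" in prompt.lower()

def prism_scan(prompt: str) -> str:
    """Stub for Prism: a real pass would redact PII/PHI/PCI spans
    (see the redaction sketch earlier)."""
    return prompt

def call_model(prompt: str) -> str:
    """Stub for the underlying LLM, whose weights stay untouched."""
    return f"[model response to: {prompt}]"

def mediate(prompt: str) -> str:
    # Stage 1 - Interception: screen for adversarial intent before
    # the core model ever sees the prompt.
    if halo_classify(prompt):
        return "Request declined: adversarial intent detected."
    # Stage 2 - Audit: scrub sensitive data so nothing in the
    # PII/PHI/PCI risk domains reaches the model's context window.
    clean_prompt = prism_scan(prompt)
    # All safety logic lives in this intermediary layer, so the
    # model's core weights need no safety over-tuning.
    return call_model(clean_prompt)

print(mediate("Summarize this patient intake note."))
```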
Looking Ahead
Phase One: research preview
Our goal is to prove that safety does not have to come at the cost of intelligence.
Current thesis
Context-aware mediation can preserve helpfulness, detect adversarial misuse, and maintain rigorous privacy boundaries without collapsing model utility.
Our goals
Open research contributions, including future open-weight releases of Prism and Halo, to accelerate specialized data-protection research.
Enterprise-grade policy engines with real-time policy customization for dynamic safety boundary definitions.
Multimodal safety layers that extend Halo’s reasoning beyond text and into agentic workflows.