Controls

Controls are modular safety checks that run on text before it reaches the LLM (input stage) and after the LLM responds (output stage). Controls are scan-only — they detect and report issues but do not modify the text. Based on their configured action, they can observe, flag, or block requests.

Quick Start

The fastest way to enable controls is through glacis.yaml:

version: "1.3"
controls:
  input:
    pii_phi:
      enabled: true
      mode: "fast"
      if_detected: "flag"

Then pass the config to your integration wrapper:

from glacis.integrations.openai import attested_openai

client = attested_openai(config="glacis.yaml")

# PII in the prompt is detected, flagged, and recorded in the attestation
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "My SSN is 123-45-6789"}],
)

Built-in Controls

PII/PHI Detection

Detects the 18 HIPAA Safe Harbor identifiers using Microsoft Presidio with custom healthcare-specific recognizers.

Install:

pip install glacis[controls]

Two scanning modes:

| Mode | Engine | Latency | Best For |
| --- | --- | --- | --- |
| `fast` | Regex-only | < 2 ms | High-throughput, latency-sensitive |
| `full` | Regex + spaCy NER | ~15-20 ms | Higher accuracy for names/locations |

Configuration:

controls:
  input:
    pii_phi:
      enabled: true
      model: "presidio"
      mode: "fast"                          # "fast" or "full"
      entities: ["US_SSN", "EMAIL_ADDRESS"] # Empty = all HIPAA entities
      if_detected: "flag"                   # "forward", "flag", or "block"

Supported entity types:

The PII control covers the full HIPAA Safe Harbor set including PERSON, DATE_TIME, PHONE_NUMBER, EMAIL_ADDRESS, US_SSN, US_DRIVER_LICENSE, URL, IP_ADDRESS, CREDIT_CARD, US_BANK_NUMBER, IBAN_CODE, US_PASSPORT, US_ITIN, MEDICAL_RECORD_NUMBER, HEALTH_PLAN_BENEFICIARY, NPI, DEA_NUMBER, MEDICAL_LICENSE, US_ZIP_CODE, STREET_ADDRESS, VIN, LICENSE_PLATE, DEVICE_SERIAL, UDI, IMEI, FAX_NUMBER, BIOMETRIC_ID, and UUID.

When entities is empty (the default), all HIPAA entity types are scanned.
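
To give a feel for what regex-only scanning in `fast` mode involves, here is a minimal sketch built on Python's `re` module. The two patterns and the `scan_fast` helper are illustrative assumptions, not the recognizers that Presidio or Glacis actually ship; only the entity names `US_SSN` and `EMAIL_ADDRESS` come from the list above.

```python
import re
from typing import Sequence

# Hypothetical, simplified patterns for illustration only; production
# recognizers are stricter and cover many more formats.
PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "EMAIL_ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def scan_fast(text: str, entities: Sequence[str] = ()) -> list[str]:
    """Return the sorted entity types found in text. The text is never modified."""
    selected = entities or PATTERNS.keys()  # empty = scan every known type
    return sorted(
        name for name in selected
        if name in PATTERNS and PATTERNS[name].search(text)
    )
```

Calling `scan_fast("My SSN is 123-45-6789")` reports `US_SSN` without touching the prompt, which mirrors the scan-only behavior described at the top of this page.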

Jailbreak Detection

Detects jailbreak and prompt injection attempts using Meta Llama Prompt Guard 2 models.

Install:

pip install glacis[jailbreak]

Supported models:

| Model | Parameters | Latency | Use Case |
| --- | --- | --- | --- |
| `prompt_guard_22m` | ~22M (DeBERTa-xsmall) | < 10 ms (CPU) | High-throughput, latency-sensitive |
| `prompt_guard_86m` | ~86M (DeBERTa-v3-base) | ~20-50 ms | Higher accuracy, complex attacks |

Configuration:

controls:
  input:
    jailbreak:
      enabled: true
      model: "prompt_guard_22m"  # or "prompt_guard_86m"
      threshold: 0.5             # Classification threshold (0-1)
      if_detected: "block"       # "forward", "flag", or "block"

The model classifies text as either BENIGN or MALICIOUS. When the malicious confidence score exceeds the threshold, the control reports a detection.

Word Filter

Case-insensitive literal string matching for detecting prohibited terms. Uses re.escape() to prevent regex injection. No extra dependencies required.

Configuration:

controls:
  input:
    word_filter:
      enabled: true
      entities: ["confidential", "proprietary", "internal only"]
      if_detected: "flag"
  output:
    word_filter:
      enabled: true
      entities: ["system prompt", "secret key"]
      if_detected: "block"

Safety limits: a maximum of 500 entities, each up to 256 characters.
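
The matching approach described above can be sketched in a few lines. This is an illustration under stated assumptions, not the Glacis implementation: `build_word_filter` and `find_terms` are hypothetical helpers, and the sketch matches substrings (so "confidential" also matches inside "confidentiality"), which may differ from how Glacis treats word boundaries.

```python
import re

MAX_TERMS = 500         # limits taken from the text above
MAX_TERM_LENGTH = 256


def build_word_filter(terms: list[str]) -> re.Pattern:
    """Compile terms into one case-insensitive pattern that matches them literally."""
    if not terms:
        raise ValueError("at least one term is required")
    if len(terms) > MAX_TERMS:
        raise ValueError(f"too many terms: {len(terms)} > {MAX_TERMS}")
    for term in terms:
        if len(term) > MAX_TERM_LENGTH:
            raise ValueError(f"term longer than {MAX_TERM_LENGTH} characters")
    # re.escape() neutralizes metacharacters such as "." or "(" so each term
    # is matched as plain text and cannot inject regex syntax.
    return re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)


def find_terms(pattern: re.Pattern, text: str) -> list[str]:
    """Return matched terms, lowercased and de-duplicated, in order of appearance."""
    seen = {}
    for match in pattern.finditer(text):
        seen.setdefault(match.group(0).lower(), None)
    return list(seen)
```

Because every term is escaped, an entry like `a.b` only matches the literal text "a.b" and never "axb".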

Actions

Every control returns an action that determines how the pipeline proceeds:

| Action | Behavior | Pipeline continues? |
| --- | --- | --- |
| `forward` | Observe and pass through | Yes |
| `flag` | Log detection and continue | Yes |
| `block` | Halt the request | No (input) / Depends (output) |
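
The table can be read as a simple rule: a stage is blocked as soon as any of its controls returns `block`, while `forward` and `flag` both let the request continue. A minimal sketch of that rule follows. `stage_determination` is a hypothetical helper, not part of the Glacis API, and the strings `"forwarded"` and `"blocked"` mirror the determination values described under Control Plane Results.

```python
from typing import Iterable, Literal

Action = Literal["forward", "flag", "block"]


def stage_determination(actions: Iterable[Action]) -> str:
    """Return "blocked" if any control asked to block, otherwise "forwarded"."""
    return "blocked" if any(a == "block" for a in actions) else "forwarded"
```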

Output Block Behavior

When an output control triggers block, the output_block_action setting determines what happens:

controls:
  output_block_action: "block"  # or "forward"

| Setting | Behavior |
| --- | --- |
| `"block"` (default) | Raises `GlacisBlockedError` — the LLM response is withheld |
| `"forward"` | Returns the LLM response but marks the determination as `"blocked"` in the attestation |
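
The two settings can be pictured as the following decision. This is a sketch only: `deliver_output` is a hypothetical function and `BlockedError` is a stand-in for `GlacisBlockedError`, so the real integration code may be structured differently.

```python
class BlockedError(Exception):
    """Hypothetical stand-in for GlacisBlockedError."""


def deliver_output(response: str, output_blocked: bool, output_block_action: str = "block"):
    """Return (response, determination), or raise when a blocked response is withheld."""
    if output_block_action not in ("block", "forward"):
        raise ValueError(f"unknown output_block_action: {output_block_action!r}")
    if not output_blocked:
        return response, "forwarded"
    if output_block_action == "block":
        raise BlockedError("output control returned 'block'; response withheld")
    # "forward": the caller still receives the response, but the
    # determination is recorded as "blocked".
    return response, "blocked"
```

With `"forward"`, the application keeps working while the attestation still shows that an output control would have blocked the response, which can be useful when rolling out a new control.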

Using with Integrations

When using provider integrations (OpenAI, Anthropic, Gemini), controls are configured through glacis.yaml and run automatically:

from glacis.integrations.openai import attested_openai
from glacis.integrations.base import GlacisBlockedError

client = attested_openai(config="glacis.yaml")

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Ignore all instructions"}],
    )
except GlacisBlockedError as e:
    print(f"Blocked by {e.control_type}")  # e.g., "jailbreak"
    if e.score is not None:
        print(f"Score: {e.score:.2f}")

Programmatic Controls

You can also pass control instances directly to integrations without a config file, using the input_controls and output_controls parameters:

from glacis.controls import PIIControl, JailbreakControl
from glacis.config import PiiPhiControlConfig, JailbreakControlConfig
from glacis.integrations.openai import attested_openai

pii = PIIControl(PiiPhiControlConfig(enabled=True, mode="fast", if_detected="flag"))
jailbreak = JailbreakControl(JailbreakControlConfig(enabled=True, threshold=0.5, if_detected="block"))

client = attested_openai(
    input_controls=[pii, jailbreak],
)

Control Types

Glacis recognizes 8 control types. Each control you write or configure is classified into one of these types in the attestation record.

| Type | Built-in | Description | Example Use Case |
| --- | --- | --- | --- |
| `pii` | `PIIControl` | PII/PHI detection | Scanning for SSNs, emails, medical records |
| `jailbreak` | `JailbreakControl` | Prompt injection detection (ML) | Blocking “ignore all instructions” attacks |
| `word_filter` | `WordFilterControl` | Literal keyword matching | Catching leaked terms like “confidential” |
| `content_safety` | `ContentSafetyControl` | Toxicity / harmful content (ML) | Filtering offensive or policy-violating output |
| `topic` | `TopicControl` | Topic enforcement (keyword) | Ensuring LLM stays within intended domain |
| `prompt_security` | `PromptSecurityControl` | Prompt extraction detection (regex) | Detecting system prompt extraction attempts |
| `grounding` | `GroundingControl` (stub) | Factual grounding / hallucination | Validating LLM output against source documents |
| `custom` | Catch-all | Any other validation | Domain-specific business logic |

All 7 built-in controls listed above (excluding custom) can be configured entirely in glacis.yaml. The grounding control is a pass-through stub — for real grounding validation, use the custom section with a control that accepts reference text. Set the control_type class attribute on your custom control class to any of these values. Controls with unrecognized types are automatically classified as "custom" in the attestation.

Content Safety

Detects toxic, harmful, or policy-violating content using HuggingFace toxicity classifiers. The model is lazy-loaded on first use.

controls:
  output:
    content_safety:
      enabled: true
      model: "toxic-bert"                      # HuggingFace model alias
      threshold: 0.5                            # Score threshold (0-1)
      categories: ["toxic", "threat", "insult"] # Empty = all categories
      if_detected: "flag"

Categories (toxic-bert): toxic, severe_toxic, obscene, threat, insult, identity_hate.

Topic Enforcement

Keyword-based topic control with two modes: blocklist (flag matching terms) and allowlist (flag when no terms match).

controls:
  input:
    topic:
      enabled: true
      allowed_topics: ["healthcare", "medical", "patient"]  # Must match at least one
      blocked_topics: ["politics", "gambling"]               # Must not match any
      if_detected: "block"

When both are configured, blocked topics are checked first. No external dependencies required.
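
The ordering of the two checks can be sketched as below. `check_topics` is a hypothetical helper, and case-insensitive substring matching is an assumption made for the sketch; the source only states that matching is keyword-based and that blocked topics are checked first.

```python
def check_topics(text, allowed_topics=(), blocked_topics=()):
    """Return (detected, reason). Blocked topics are checked before the allowlist."""
    lowered = text.lower()
    for term in blocked_topics:
        if term.lower() in lowered:
            return True, f"blocked topic: {term}"
    # The allowlist only applies when it is non-empty.
    if allowed_topics and not any(t.lower() in lowered for t in allowed_topics):
        return True, "no allowed topic matched"
    return False, None
```

Note that a message mentioning both an allowed and a blocked term is still detected, because the blocklist takes precedence.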

Prompt Security

Detects prompt extraction attempts, instruction overrides, and role manipulation using built-in regex patterns. Ships with patterns for common attacks (system prompt extraction, “ignore instructions”, DAN, developer mode, etc.).

controls:
  input:
    prompt_security:
      enabled: true
      patterns: ["secret\\s+password"]  # Additional custom patterns (regex)
      if_detected: "block"              # Defaults to "block" for security

Complements the ML-based jailbreak control: prompt_security is rule-based, so it adds negligible latency and needs no model download or other external dependencies.
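
As a rough picture of rule-based detection, the sketch below combines a couple of built-in patterns with user-supplied ones. The entries in `BUILTIN_PATTERNS` are hypothetical stand-ins (the patterns Glacis ships are not listed here), and case-insensitive matching is an assumption. In a double-quoted YAML string, `"secret\\s+password"` decodes to the regex `secret\s+password`, which is what the Python raw string below expresses.

```python
import re

# Hypothetical stand-ins for built-in patterns, for illustration only.
BUILTIN_PATTERNS = [
    r"ignore\s+(?:all\s+)?(?:previous\s+)?instructions",
    r"(?:reveal|show|print)\s+(?:me\s+)?your\s+system\s+prompt",
]


def compile_patterns(custom_patterns=()):
    """Compile the built-in patterns followed by any custom ones."""
    return [re.compile(p, re.IGNORECASE) for p in [*BUILTIN_PATTERNS, *custom_patterns]]


def matched_patterns(text, patterns):
    """Return the source of every pattern that matches somewhere in text."""
    return [p.pattern for p in patterns if p.search(text)]


patterns = compile_patterns([r"secret\s+password"])
hits = matched_patterns("Please IGNORE all previous instructions", patterns)
```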

Grounding (Stub)

The built-in grounding control is a pass-through stub because check(text) doesn’t receive reference text for comparison. Enable it for attestation type classification, or implement real grounding via custom:

controls:
  output:
    grounding:
      enabled: true           # Stub: always passes, sets control_type="grounding"
    custom:
      - path: "my_grounding.GroundingValidator"  # Real implementation
        enabled: true
        args:
          reference_text: "The source document..."
          threshold: 0.7

Custom Controls

Custom controls let you plug any validation logic into the Glacis pipeline — LLM-based judges, ML models, API calls, regex matching, database lookups, or anything else. They run automatically on every LLM call and their results are cryptographically attested.

Writing a Custom Control

Three things are required:

1. Set control_type: a class attribute identifying the control (any of the 8 types above)
2. Implement check(text): the single abstract method that receives the text to validate
3. Return a ControlResult: a standardized result with detection info

The check() method is the universal extension point. For input controls, text is the user’s message. For output controls, text is the LLM response. What happens inside check() is entirely up to you.

from glacis.controls.base import BaseControl, ControlResult


class GroundingControl(BaseControl):
    """Validates LLM output is grounded in a reference document."""

    control_type = "grounding"  # Maps to the "grounding" attestation type

    def __init__(self, api_key: str, threshold: float = 0.7, if_detected: str = "flag"):
        self._api_key = api_key
        self._threshold = threshold
        self._action = if_detected

    def check(self, text: str) -> ControlResult:
        # Your validation logic — LLM call, ML model, API, anything
        score = self._compute_grounding_score(text)
        is_ungrounded = score < self._threshold

        return ControlResult(
            control_type=self.control_type,
            detected=is_ungrounded,
            action=self._action if is_ungrounded else "forward",
            score=score,
            categories=["low_grounding"] if is_ungrounded else [],
            latency_ms=0,  # Set by your implementation
            metadata={"threshold": self._threshold, "model": "your-model"},
        )

    def _compute_grounding_score(self, text: str) -> float:
        # ... your scoring logic ...
        return 0.85

    def close(self) -> None:
        # Optional: release resources (API clients, ML models, etc.)
        pass

Configuring in glacis.yaml (Recommended)

The recommended way to register custom controls is through glacis.yaml. This lets you enable, disable, and tune controls without changing any code.

controls:
  output:
    custom:
      - path: "grounding_control.GroundingControl"  # module.ClassName
        enabled: true
        if_detected: "flag"
        args:
          api_key: "${OPENAI_API_KEY}"               # Environment variable
          threshold: 0.7

path — Dot-separated import path in the format module_name.ClassName. The module is resolved relative to the YAML file’s directory (automatically added to sys.path).

enabled — Toggle the control on/off without removing the configuration. Default: true.

if_detected — Action when the control detects an issue: "forward", "flag", or "block". Default: "flag". This is passed to your constructor as the if_detected kwarg.

args — Constructor keyword arguments. Supports ${ENV_VAR} substitution for secrets.
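
To illustrate how a `module_name.ClassName` path can be resolved, here is a minimal sketch using the standard `importlib` module. `load_class` and its `search_dir` parameter are hypothetical names, and the loader inside Glacis may differ in detail; the error types follow the Troubleshooting table on this page.

```python
import importlib
import sys


def load_class(path: str, search_dir=None) -> type:
    """Resolve a 'module_name.ClassName' string to the class object."""
    if search_dir is not None and str(search_dir) not in sys.path:
        # Mirrors the documented behavior of adding the YAML directory to sys.path.
        sys.path.insert(0, str(search_dir))
    # Split on the last dot so package paths like "controls.grounding.X" work.
    module_name, sep, class_name = path.rpartition(".")
    if not sep or not module_name or not class_name:
        raise ImportError(
            f"Invalid control path {path!r}. Expected format: 'module_name.ClassName'."
        )
    module = importlib.import_module(module_name)
    try:
        return getattr(module, class_name)
    except AttributeError:
        raise AttributeError(
            f"Module {module_name!r} has no class {class_name!r}."
        ) from None
```

Given the resolved class, the configured values would then be passed as keyword arguments, along the lines of `cls(if_detected="flag", **args)`, which is why a custom control's constructor should accept an `if_detected` parameter.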

File Placement

Place your control module next to glacis.yaml. Glacis automatically adds the YAML file’s directory to sys.path, so imports just work:

my-project/
  glacis.yaml              # References "grounding_control.GroundingControl"
  grounding_control.py     # Your custom control module
  app.py

For controls in a package:

my-project/
  glacis.yaml              # References "controls.grounding.GroundingControl"
  controls/
    __init__.py
    grounding.py
  app.py

Environment Variable Substitution

Use ${VAR_NAME} syntax to inject environment variables into any string value in glacis.yaml. This works everywhere in the config, not just in custom control args:

controls:
  output:
    custom:
      - path: "my_control.QAValidator"
        args:
          api_key: "${OPENAI_API_KEY}"
          endpoint: "${VALIDATION_API_URL}"

If a referenced variable is not set, Glacis raises a clear error at startup:

ValueError: Environment variable 'OPENAI_API_KEY' is not set.
Referenced in glacis.yaml via ${OPENAI_API_KEY}.
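
The substitution behavior can be approximated with a short recursive function over the parsed YAML. This is a sketch under assumptions: `substitute_env` is a hypothetical helper, and the accepted variable-name syntax is a guess; only the `${VAR_NAME}` form and the error text come from this page.

```python
import os
import re

# Assumed variable-name syntax: letters, digits, underscores, not starting with a digit.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value, environ=None):
    """Recursively replace ${VAR_NAME} in every string of a parsed config."""
    env = os.environ if environ is None else environ

    def replace(match):
        name = match.group(1)
        if name not in env:
            raise ValueError(
                f"Environment variable '{name}' is not set. "
                f"Referenced in glacis.yaml via ${{{name}}}."
            )
        return env[name]

    if isinstance(value, str):
        return _ENV_REF.sub(replace, value)
    if isinstance(value, dict):
        return {k: substitute_env(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute_env(v, env) for v in value]
    return value  # numbers, booleans, and None pass through unchanged
```

Failing at load time rather than at the first LLM call means a missing secret surfaces during deployment instead of in production traffic.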

Programmatic Registration (Alternative)

For cases where YAML configuration isn’t suitable (e.g., controls that require runtime-constructed objects), pass control instances directly:

from glacis.integrations.openai import attested_openai

from grounding_control import GroundingControl  # your own module (see File Placement)

client = attested_openai(
    output_controls=[GroundingControl(api_key="sk-...", threshold=0.7)],
)

Multiple Custom Controls

You can register any number of custom controls on both input and output stages:

controls:
  input:
    custom:
      - path: "security.PromptLeakDetector"
        enabled: true
        if_detected: "block"
        args:
          model: "classifier-v2"
  output:
    custom:
      - path: "grounding_control.GroundingControl"
        enabled: true
        if_detected: "flag"
        args:
          api_key: "${OPENAI_API_KEY}"
      - path: "toxicity.ContentSafetyControl"
        enabled: true
        if_detected: "block"
        args:
          threshold: 0.9

All controls — built-in and custom — run in parallel within each stage. Total latency equals the slowest control, not the sum. Errors in individual controls don’t crash the pipeline.
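
The execution model described above (parallel checks, isolated failures) can be sketched with a thread pool. Whether Glacis uses threads, async tasks, or something else is not stated here, so treat `run_stage` and the shape of its result dictionaries as assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_stage(checks, text):
    """Run every check on text concurrently; a failing check yields an "error" entry.

    checks maps a control name to a callable that returns an action string.
    """

    def run_one(item):
        name, check = item
        started = time.perf_counter()
        try:
            outcome = {"control": name, "status": check(text)}
        except Exception as exc:  # isolate failures so other controls still report
            outcome = {"control": name, "status": "error", "error": str(exc)}
        outcome["latency_ms"] = int((time.perf_counter() - started) * 1000)
        return outcome

    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        # pool.map preserves the order of the input items.
        return list(pool.map(run_one, checks.items()))
```

Because the checks overlap in time, a stage with one 50 ms control and several fast ones finishes in roughly 50 ms, and a control that raises is reported with status `"error"` instead of aborting the others.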

Troubleshooting

If a custom control fails to load, Glacis raises a descriptive error at startup:

| Error | Cause | Example Message |
| --- | --- | --- |
| `ImportError` | Invalid path format | `Invalid control path 'NoDotsHere'. Expected format: 'module_name.ClassName' (e.g., 'my_controls.ToxicityControl').` |
| `ImportError` | Module not found | `Cannot import module 'my_controls' for custom control 'my_controls.Foo'. Glacis looked in: /path/to/project (glacis.yaml directory) and standard Python path. Check that the file 'my_controls.py' exists and has no import errors.` |
| `AttributeError` | Class not in module | `Module 'my_controls' has no class 'Foo'. Available controls in 'my_controls': ['GroundingControl', 'ToxicityControl']` |
| `TypeError` | Not a BaseControl | `'my_controls.Helper' is not a BaseControl subclass. Custom controls must extend glacis.controls.base.BaseControl.` |
| `TypeError` | Constructor mismatch | `Failed to instantiate 'my_controls.MyCtrl' with args ['api_key']. Check that the constructor accepts these parameters. Error: ...` |
| `ValueError` | Missing env var | `Environment variable 'MY_KEY' is not set. Referenced in glacis.yaml via ${MY_KEY}.` |

Control Plane Results

Control results are recorded in the attestation’s control_plane_results field. Each control execution is captured as a ControlExecution entry:

| Field | Type | Description |
| --- | --- | --- |
| `id` | `str` | Identifier (e.g., `"glacis-input-pii"`) |
| `type` | `str` | Control type (`"content_safety"`, `"pii"`, `"jailbreak"`, `"topic"`, `"prompt_security"`, `"grounding"`, `"word_filter"`, `"custom"`) |
| `version` | `str` | SDK version |
| `provider` | `str` | Provider identifier |
| `latency_ms` | `int` | Processing time in milliseconds |
| `status` | `str` | Action taken: `"forward"`, `"flag"`, `"block"`, or `"error"` |
| `score` | `float \| None` | Confidence score (scale is control-specific, e.g., 0-1 for ML classifiers, 0-3 for grading rubrics) |
| `result_hash` | `str \| None` | Hash of the control result |
| `stage` | `str` | Pipeline stage: `"input"` or `"output"` |

The top-level determination field in the control plane results records whether the overall request was "forwarded" or "blocked".

ControlResult Reference

Every control returns a standardized ControlResult:

| Field | Type | Description |
| --- | --- | --- |
| `control_type` | `str` | Control type identifier |
| `detected` | `bool` | Whether a threat/issue was detected |
| `action` | `str` | `"forward"`, `"flag"`, `"block"`, or `"error"` |
| `score` | `float \| None` | Confidence score (must be >= 0, scale is control-specific) |
| `categories` | `list[str]` | Detected categories (e.g., `["US_SSN", "PERSON"]`) |
| `latency_ms` | `int` | Processing time in milliseconds |
| `modified_text` | `str \| None` | Reserved for future use (not currently used) |
| `metadata` | `dict` | Control-specific metadata for audit trail |

Additional Public Exports

The glacis.controls module also exports the following types useful for programmatic control orchestration:

| Export | Description |
| --- | --- |
| `ControlsRunner` | Orchestrates running multiple controls on a given text |
| `StageResult` | Result object from running controls on one stage (input or output) |
| `ControlAction` | `Literal["forward", "flag", "block", "error"]` type alias for control action strings |

from glacis.controls import ControlsRunner, StageResult, ControlAction
