ADR-007: Schema and Validation
Status
- Proposed by: Ryan on 2026-01-07
- Accepted on: 2026-01-07
Context
Users need a way to define and enforce the expected shape of their configuration. This includes:
- Documenting what keys exist and what they mean
- Specifying types, allowed values, ranges, and patterns
- Providing example values and descriptions
- Validating configs at runtime with helpful error messages
Key constraints:
- The schema format must be language-agnostic (not Python-specific, not JS-specific)
- Must work with multiple input formats (YAML, JSON)
- Should be simple and intuitive to write
- Must interact sensibly with lazy resolution (ADR-005)
Alternatives Considered
Alternative 1: JSON Schema
Use the industry-standard JSON Schema format.
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "type": "object",
  "properties": {
    "database": {
      "type": "object",
      "properties": {
        "host": { "type": "string", "description": "Database hostname" },
        "port": { "type": "integer", "minimum": 1, "maximum": 65535 }
      },
      "required": ["host"]
    }
  }
}
- Pros: Industry standard, extensive tooling, editor support, language-agnostic
- Cons: Verbose, JSON syntax is tedious to write by hand, advanced features are complex
- Status: Under consideration
Alternative 2: YAML Schema (JSON Schema in YAML)
JSON Schema but written in YAML for readability.
type: object
properties:
  database:
    type: object
    description: Database connection settings
    properties:
      host:
        type: string
        description: Database hostname
        examples: ["localhost", "db.example.com"]
      port:
        type: integer
        minimum: 1
        maximum: 65535
        default: 5432
    required: [host]
- Pros: Same power as JSON Schema, more readable, language-agnostic
- Cons: Still verbose for simple cases, learning curve for JSON Schema concepts
- Status: Under consideration
Alternative 3: Inline Schema (schema in config file)
Define schema inline within the config file itself using special syntax.
# @schema: { type: string, description: "Database hostname" }
host: localhost
# @schema: { type: integer, min: 1, max: 65535 }
port: 5432
- Pros: Schema lives with the config, no separate file
- Cons: Clutters config files, awkward for nested structures, non-standard
- Rejected: Mixes concerns, makes configs harder to read
Alternative 4: Schema-as-Config (holoconf-native format)
A custom schema format designed specifically for holoconf, optimized for the common case.
# schema.holoconf.yaml
database:
  _description: Database connection settings
  host:
    _type: string
    _required: true
    _description: Database hostname
    _examples: [localhost, db.example.com]
  port:
    _type: integer
    _range: [1, 65535]
    _default: 5432
  password:
    _type: string
    _description: Database password (typically from SSM)
- Pros: Mirrors config structure exactly, easy to understand, minimal syntax
- Cons: Yet another schema format, no existing tooling
- Status: Under consideration
Alternative 5: Language-Native Types (codegen)
Generate schemas from language-native type definitions.
# Python
@holoconf.schema
class DatabaseConfig:
    host: str
    port: int = 5432
    password: str | None = None
- Pros: Type-safe access in each language, IDE autocomplete
- Cons: Language-specific, requires codegen step, schema not portable
- Rejected: Violates language-agnostic constraint
Open Questions (Proposal Phase)
All resolved - see Decision section.
Next Steps (Proposal Phase)
- [ ] Prototype JSON Schema validation in holoconf-core (using a Rust JSON Schema library)
- [ ] Implement $ref resolution for schema composition
- [ ] Test two-phase validation with real-world configs
- [ ] Design error message format for validation failures
Decision
JSON Schema (YAML-serialized) with Two-Phase Validation
- Schema format: JSON Schema (Draft 2020-12), written in YAML or JSON
- Schema composition: Support $ref for splitting schemas across files
- Validation timing: Two-phase (structural after merge, type/value after resolution)
- additionalProperties: Use JSON Schema default behavior (permissive unless explicitly set to false)
- Interpolation in schemas: Not explicitly supported or documented, but not disabled (schemas are parsed like any YAML file)
- Language-native type generation: Out of scope for core; may be added as language-specific tooling later
Design
Schema Format
Schemas use standard JSON Schema, but can be written in any format holoconf supports (YAML, JSON):
# schema.yaml
type: object
required: [database, api]
properties:
  database:
    type: object
    description: Database connection settings
    required: [host]
    properties:
      host:
        type: string
        description: Database hostname
        examples: ["localhost", "db.example.com"]
      port:
        type: integer
        minimum: 1
        maximum: 65535
        default: 5432
      password:
        type: string
        description: Database password (typically from SSM resolver)
  api:
    type: object
    properties:
      timeout:
        type: number
        minimum: 0
        description: Request timeout in seconds
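As a rough illustration of what the schema above enforces, here is a minimal Python sketch that checks a config against the same schema written as dicts. It hand-rolls only the keywords this schema uses (type, required, properties, minimum, maximum); check and TYPE_CHECKS are hypothetical names, not holoconf APIs, and a real implementation would rely on a complete Draft 2020-12 validator rather than this subset.

```python
from typing import Any

# schema.yaml from above as Python dicts. description, examples and default
# are omitted: they are annotations and never make validation fail.
SCHEMA: dict[str, Any] = {
    "type": "object",
    "required": ["database", "api"],
    "properties": {
        "database": {
            "type": "object",
            "required": ["host"],
            "properties": {
                "host": {"type": "string"},
                "port": {"type": "integer", "minimum": 1, "maximum": 65535},
                "password": {"type": "string"},
            },
        },
        "api": {
            "type": "object",
            "properties": {"timeout": {"type": "number", "minimum": 0}},
        },
    },
}

# Python checks for the JSON Schema types used above. bool is excluded from
# integer/number because Python treats True and False as ints.
TYPE_CHECKS = {
    "object": lambda v: isinstance(v, dict),
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


def check(value: Any, schema: dict[str, Any], path: str = "") -> list[str]:
    """Collect errors for a small JSON Schema subset (not a full validator)."""
    where = path or "<root>"
    expected = schema.get("type")
    if expected and not TYPE_CHECKS[expected](value):
        # Remaining keywords are meaningless once the type is wrong.
        return [f"{where}: expected {expected}, got {type(value).__name__}"]
    errors: list[str] = []
    if "minimum" in schema and value < schema["minimum"]:
        errors.append(f"{where}: {value} is below minimum {schema['minimum']}")
    if "maximum" in schema and value > schema["maximum"]:
        errors.append(f"{where}: {value} is above maximum {schema['maximum']}")
    if isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                errors.append(f"{where}: missing required key '{key}'")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                child = f"{path}.{key}" if path else key
                errors.extend(check(value[key], sub, child))
    return errors


config = {"database": {"host": "localhost", "port": 70000}, "api": {"timeout": 2.5}}
for error in check(config, SCHEMA):
    print(error)  # database.port: 70000 is above maximum 65535
```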
Schema Composition with $ref
Large schemas can be split across files using JSON Schema's $ref:
# schema.yaml (main schema)
type: object
properties:
  networking:
    $ref: "./schemas/networking.yaml"
  application:
    $ref: "./schemas/application.yaml"
  database:
    $ref: "./schemas/database.yaml"

# schemas/networking.yaml
type: object
required: [vpc_id]
properties:
  vpc_id:
    type: string
    pattern: "^vpc-[a-f0-9]+$"
  subnets:
    type: array
    items:
      type: string
      pattern: "^subnet-[a-f0-9]+$"

# schemas/database.yaml
type: object
required: [host]
properties:
  host:
    type: string
  port:
    type: integer
    default: 5432
Refs are resolved relative to the schema file's location.
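A minimal sketch of file-level $ref resolution follows. It assumes JSON-serialized schema files so that only the Python standard library is needed (YAML files would need a parser such as PyYAML), and it assumes the standard JSON Schema rule that a relative ref is resolved against the file that contains it. load_schema is a hypothetical helper: it only inlines whole-file refs and raises on reference cycles.

```python
import json
import os
import tempfile
from typing import Any


def load_schema(path: str, _chain: tuple = ()) -> Any:
    """Load a JSON schema file, inlining every whole-file {"$ref": "..."}.

    Each relative ref is resolved against the directory of the file that
    contains it, so nested files can use refs relative to themselves.
    """
    path = os.path.abspath(path)
    if path in _chain:
        raise ValueError("circular $ref: " + " -> ".join(_chain + (path,)))
    with open(path, encoding="utf-8") as f:
        schema = json.load(f)
    base_dir = os.path.dirname(path)

    def inline(node: Any) -> Any:
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and not ref.startswith("#"):
                # Sketch limitation: keywords next to $ref are dropped, and
                # fragments such as "other.json#/$defs/x" are not supported.
                return load_schema(os.path.join(base_dir, ref), _chain + (path,))
            return {key: inline(value) for key, value in node.items()}
        if isinstance(node, list):
            return [inline(item) for item in node]
        return node

    return inline(schema)


# Usage: schemas/database.json refers to "./host.json", which lives next to
# it in schemas/, not next to the top-level schema.json.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "schemas"))
    files = {
        "schema.json": {"type": "object",
                        "properties": {"database": {"$ref": "./schemas/database.json"}}},
        "schemas/database.json": {"type": "object", "required": ["host"],
                                  "properties": {"host": {"$ref": "./host.json"}}},
        "schemas/host.json": {"type": "string"},
    }
    for name, content in files.items():
        with open(os.path.join(root, name), "w", encoding="utf-8") as f:
            json.dump(content, f)
    resolved = load_schema(os.path.join(root, "schema.json"))

print(resolved)
```

The cycle check tracks only the chain of files currently being loaded, so the same file may still be referenced from several independent places.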
Two-Phase Validation
Validation happens in two phases to accommodate lazy resolution:
┌──────────────────────────────────────────────────────────────────┐
│  Config.load("base.yaml", "env.yaml", schema="schema.yaml")      │
│                                                                  │
│  1. Parse all config files                                       │
│  2. Merge configs (ADR-004)                                      │
│  3. PHASE 1: Structural validation                               │
│     - Required keys present (after merge)                        │
│     - Nesting structure correct                                  │
│     - No unknown keys (if additionalProperties: false)           │
│     - Interpolations like ${...} are allowed (not yet resolved)  │
│                                                                  │
│  Returns Config object (or raises StructuralValidationError)     │
└──────────────────────────────────────────────────────────────────┘
                                 │
                                 ▼
┌──────────────────────────────────────────────────────────────────┐
│  config.database.port (access triggers resolution)               │
│                                                                  │
│  1. Resolve interpolation: ${env:DB_PORT} -> "5432"              │
│  2. PHASE 2: Type/value validation                               │
│     - Type check (is it an integer?)                             │
│     - Constraints (minimum: 1, maximum: 65535)                   │
│     - Pattern matching (if specified)                            │
│                                                                  │
│  Returns resolved value (or raises TypeValidationError)          │
└──────────────────────────────────────────────────────────────────┘
Phase 1 (Structural) - Runs after merge, before any resolution:
- Validates required keys are present
- Validates object/array nesting matches schema
- Validates additionalProperties constraints
- Treats ${...} interpolations as valid placeholders (any type)
Phase 2 (Type/Value) - Runs after each value is resolved:
- Validates resolved value matches declared type
- Validates constraints (minimum, maximum, pattern, enum, etc.)
- Runs automatically when accessing values with lazy resolution
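The Phase 1 rules can be sketched as a walk over the merged, still-unresolved tree. In this hypothetical Python sketch (structural_errors and is_placeholder are illustrative names, not holoconf APIs), any string containing ${...} is accepted wherever a value is expected, including where the schema expects an object, because it may resolve to any type. Leaf types and ranges are skipped entirely since they belong to Phase 2, and array nesting is not covered.

```python
import re
from typing import Any

# Any string still containing ${...} is an unresolved interpolation.
INTERPOLATION = re.compile(r"\$\{[^}]+\}")


def is_placeholder(value: Any) -> bool:
    return isinstance(value, str) and INTERPOLATION.search(value) is not None


def structural_errors(value: Any, schema: dict, path: str = "") -> list:
    """Phase 1 only: required keys, object nesting, additionalProperties: false."""
    where = path or "<root>"
    if is_placeholder(value):
        return []  # may resolve to any type; Phase 2 checks it after resolution
    if schema.get("type") != "object":
        return []  # leaf types and constraints are Phase 2 concerns
    if not isinstance(value, dict):
        return [f"{where}: expected an object, got {type(value).__name__}"]
    errors = [f"{where}: missing required key '{key}'"
              for key in schema.get("required", []) if key not in value]
    properties = schema.get("properties", {})
    if schema.get("additionalProperties") is False:
        errors += [f"{where}: unknown key '{key}'" for key in value if key not in properties]
    for key, sub in properties.items():
        if key in value:
            errors += structural_errors(value[key], sub, f"{path}.{key}" if path else key)
    return errors


schema = {
    "type": "object",
    "required": ["database"],
    "additionalProperties": False,
    "properties": {
        "database": {
            "type": "object",
            "required": ["host"],
            "properties": {"host": {"type": "string"}, "port": {"type": "integer"}},
        },
    },
}
# Merged but unresolved config: both database values are still placeholders.
merged = {"database": {"host": "${env:DB_HOST}", "port": "${env:DB_PORT}"}, "debug": True}
print(structural_errors(merged, schema))  # ["<root>: unknown key 'debug'"]
```

Only the undeclared debug key is reported: port is still the string "${env:DB_PORT}", but its integer type is not checked until Phase 2.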
API Surface
# Load with schema (structural validation on load)
config = Config.load(
"base.yaml", "environment.yaml",
schema="schema.yaml"
)
# Access triggers resolution + type validation
port = config.database.port # Validates integer, range
# Validate entire config explicitly (resolves all + validates all)
await config.resolve_all()
config.validate() # Re-runs full validation on resolved values
# Load without schema, validate later
config = Config.load("config.yaml")
config.validate(schema="schema.yaml") # Both phases at once
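The lazy access path above (resolve on attribute access, then run Phase 2) can be approximated with a small wrapper. LazyConfig, the resolver callable, and the message text are assumptions for illustration only; this sketch defines its own TypeValidationError, and holoconf's actual Config class and resolver interface are not specified here. It handles only values that are a single ${...} expression and performs no type coercion on resolved values.

```python
import re
from typing import Any, Callable, Optional

# Only whole-value interpolations such as "${env:DB_PORT}" are handled here.
WHOLE_INTERPOLATION = re.compile(r"^\$\{([^}]+)\}$")

# bool is excluded from integer/number because Python treats True as an int.
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "number": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
}


class TypeValidationError(ValueError):
    """Raised when a resolved value violates its schema (Phase 2)."""


def _check(path: str, value: Any, schema: dict, source: Optional[str]) -> None:
    """Phase 2: type first, then minimum/maximum, pattern and enum."""
    expected = schema.get("type")
    numeric = isinstance(value, (int, float)) and not isinstance(value, bool)
    if expected in TYPE_CHECKS and not TYPE_CHECKS[expected](value):
        problem = f"expected {expected}, got {type(value).__name__} ({value!r})"
    elif numeric and "minimum" in schema and value < schema["minimum"]:
        problem = f"{value!r} is below minimum {schema['minimum']}"
    elif numeric and "maximum" in schema and value > schema["maximum"]:
        problem = f"{value!r} is above maximum {schema['maximum']}"
    elif isinstance(value, str) and "pattern" in schema and not re.search(schema["pattern"], value):
        problem = f"{value!r} does not match pattern {schema['pattern']!r}"
    elif "enum" in schema and value not in schema["enum"]:
        problem = f"{value!r} is not one of {schema['enum']!r}"
    else:
        return
    origin = f" (resolved from {source})" if source else ""
    raise TypeValidationError(f"{path}: {problem}{origin}")


class LazyConfig:
    """Hypothetical sketch: attribute access resolves a leaf, then validates it."""

    def __init__(self, data: dict, schema: dict, resolve: Callable[[str], Any], path: str = ""):
        self._data = data
        self._schema = schema
        self._resolve = resolve
        self._path = path

    def __getattr__(self, key: str) -> Any:
        if key.startswith("_") or key not in self._data:
            raise AttributeError(key)
        path = f"{self._path}.{key}" if self._path else key
        sub = self._schema.get("properties", {}).get(key, {})
        raw = self._data[key]
        if isinstance(raw, dict):
            return LazyConfig(raw, sub, self._resolve, path)  # nested section
        match = WHOLE_INTERPOLATION.match(raw) if isinstance(raw, str) else None
        value = self._resolve(match.group(1)) if match else raw  # no coercion
        _check(path, value, sub, raw if match else None)
        return value


# Stand-in resolver: maps the text inside ${...} to a value.
resolver = {"env:DB_PORT": "not-a-number"}.__getitem__
config = LazyConfig(
    {"database": {"host": "localhost", "port": "${env:DB_PORT}"}},
    {"properties": {"database": {"properties": {
        "host": {"type": "string"},
        "port": {"type": "integer", "minimum": 1, "maximum": 65535},
    }}}},
    resolver,
)
print(config.database.host)  # localhost
try:
    config.database.port  # resolution happens here, then validation fails
except TypeValidationError as err:
    # database.port: expected integer, got str ('not-a-number') (resolved from ${env:DB_PORT})
    print(err)
```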
Error Messages
Validation errors include path and context:
StructuralValidationError: Missing required key
  Path: database.host
  Schema: schema.yaml#/properties/database/required
  Help: Add 'host' key to database section

TypeValidationError: Invalid type after resolution
  Path: database.port
  Expected: integer
  Got: string ("not-a-number")
  Resolved from: ${env:DB_PORT}
  Schema: schema.yaml#/properties/database/properties/port
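The Schema: field in these messages is a JSON Pointer (RFC 6901) into the schema file, built from the config path. A hypothetical Python sketch that produces both messages above follows; schema_pointer, missing_required and invalid_type are illustrative names, and the sketch assumes a config path is a dot-separated list of object keys (no array indices). The top-level Help wording is an invention of this sketch.

```python
import json
from typing import Any

# JSON Schema names for the Python types a parsed config can contain.
JSON_TYPE_NAMES = {dict: "object", list: "array", str: "string", bool: "boolean",
                   int: "integer", float: "number", type(None): "null"}


def schema_pointer(schema_file: str, path: str, keyword: str = "") -> str:
    """Map a dotted config path to a JSON Pointer into the schema file."""
    parts = []
    for key in filter(None, path.split(".")):
        escaped = key.replace("~", "~0").replace("/", "~1")  # RFC 6901 escaping
        parts += ["properties", escaped]
    if keyword:
        parts.append(keyword)
    return schema_file + "#" + "".join("/" + part for part in parts)


def missing_required(schema_file: str, path: str) -> str:
    # The pointer targets the parent's "required" list, not the missing key.
    parent, _, key = path.rpartition(".")
    where = f"{parent} section" if parent else "the top level"
    return "\n".join([
        "StructuralValidationError: Missing required key",
        f"  Path: {path}",
        f"  Schema: {schema_pointer(schema_file, parent, 'required')}",
        f"  Help: Add '{key}' key to {where}",
    ])


def invalid_type(schema_file: str, path: str, expected: str, value: Any,
                 source: str = "") -> str:
    got = JSON_TYPE_NAMES.get(type(value), type(value).__name__)
    lines = [
        "TypeValidationError: Invalid type after resolution",
        f"  Path: {path}",
        f"  Expected: {expected}",
        f"  Got: {got} ({json.dumps(value)})",
    ]
    if source:
        lines.append(f"  Resolved from: {source}")
    lines.append(f"  Schema: {schema_pointer(schema_file, path)}")
    return "\n".join(lines)


print(missing_required("schema.yaml", "database.host"))
print()
print(invalid_type("schema.yaml", "database.port", "integer", "not-a-number", "${env:DB_PORT}"))
```

Escaping "~" and "/" keeps the pointer valid even for config keys that contain those characters.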
Rationale
- JSON Schema is an industry standard with existing tooling, documentation, and developer familiarity
- YAML serialization makes schemas readable and consistent with config file format
- Two-phase validation accommodates lazy resolution while still catching structural errors early
- Schema composition via $ref enables teams to share and reuse schema definitions
- Keeping additionalProperties permissive by default avoids surprising users with strict validation they didn't opt into
Trade-offs Accepted
- JSON Schema verbosity in exchange for standard format with existing tooling
- Two-phase validation adds complexity in exchange for correct handling of lazy resolution
- No language-native type generation in core in exchange for keeping core language-agnostic
Migration
N/A - This is a new feature.
Consequences
- Positive: Clear config documentation, early error detection, consistent validation across languages
- Negative: Additional file to maintain (schema), learning curve for JSON Schema
- Neutral: Schema validation is optional - configs work without schemas