ADR-013: Testing Architecture¶
Status¶
- Proposed by: Ryan on 2026-01-07
- Accepted on: 2026-01-07
Context¶
Holoconf is a multi-language configuration library with a Rust core and bindings for Python, JavaScript, and Go. We need a testing strategy that:
- Ensures feature parity across all language implementations
- Expresses tests in domain language (configuration, resolvers, schemas, etc.)
- Allows writing tests once and running them against all implementations
- Separates test logic from implementation details
This is critical because:
- Bugs in one binding but not another are hard to catch without shared tests
- Configuration semantics must be identical across languages
- Maintaining separate test suites per language leads to divergence
Alternatives Considered¶
Alternative 1: Language-Specific Test Suites¶
Each language binding has its own independent test suite written idiomatically.
- Pros: Idiomatic tests, easy to write, no shared infrastructure
- Cons: No feature parity guarantee, duplicate effort, tests diverge over time
Alternative 2: Rust-Only Testing¶
Test only the Rust core extensively; assume bindings are thin wrappers.
- Pros: Simple, fast, single test suite
- Cons: Binding bugs undetected, FFI edge cases missed, no validation that bindings work correctly
Alternative 3: Cucumber/Gherkin¶
Use Gherkin syntax for test definitions with Cucumber runners per language.
- Pros: Human-readable "Given/When/Then" format, established tooling
- Cons: Step definition mapping overhead (sketched below), natural-language parsing required, Gherkin's strength (stakeholder readability) isn't valuable when our audience is developers, and it adds a dependency across all languages
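To illustrate the mapping overhead: with Gherkin, every binding needs its own step-definition layer before a single scenario can run. A minimal sketch using Python's behave library (the step wording is hypothetical; the Config.loads/get calls mirror the driver example later in this ADR):

```python
# Hypothetical step definitions one binding would need; Rust, JS, and Go
# would each have to maintain an equivalent, parallel layer.
from behave import given, when, then
from holoconf import Config

@given('a config containing "{yaml_text}"')
def step_given_config(context, yaml_text):
    context.config = Config.loads(yaml_text)

@when('"{path}" is accessed')
def step_when_access(context, path):
    context.result = context.config.get(path)

@then('the value is "{expected}"')
def step_then_value(context, expected):
    assert context.result == expected
```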
Alternative 4: Four-Tier Testing Architecture with YAML¶
A layered architecture separating test definitions from execution, using YAML for definitions.
┌────────────────────────────────────────────────────────┐
│                    Test Definitions                    │
│        (YAML - what to test, expected results)         │
├────────────────────────────────────────────────────────┤
│                       DSL Layer                        │
│  (Domain operations: given_config, when_resolved...)   │
├──────────┬──────────┬──────────┬──────────┬────────────┤
│   Rust   │  Python  │    JS    │    Go    │   Future   │
│  Driver  │  Driver  │  Driver  │  Driver  │   Driver   │
├──────────┼──────────┼──────────┼──────────┼────────────┤
│   Rust   │  Python  │    JS    │    Go    │   Future   │
│ Library  │ Bindings │ Bindings │ Bindings │  Bindings  │
└──────────┴──────────┴──────────┴──────────┴────────────┘
- Pros: Feature parity enforced, write once and run everywhere, domain-focused tests, YAML matches our config domain
- Cons: Upfront infrastructure investment, shared test format design needed
Open Questions (Proposal Phase)¶
All resolved - see Decision section.
Next Steps (Proposal Phase)¶
- [ ] Define the DSL operations vocabulary
- [ ] Implement Rust driver as reference
- [ ] Implement one binding driver (Python) to validate architecture
- [ ] Create initial acceptance test suite
Decision¶
Four-Tier Testing Architecture with YAML test definitions, covering acceptance tests, unit tests, and performance benchmarks.
Why YAML over Gherkin¶
- Config files are already YAML - Test definitions like `given: config: |` feel natural and consistent with the domain
- No step definition mapping - YAML is already structured; drivers consume it directly without parsing natural language
- Universal parsing - Every language has robust YAML support; Cucumber support varies
- Developer audience - Gherkin's strength is stakeholder readability; our users are developers comfortable with structured data
Design¶
Tier 1: Test Definitions (Universal)¶
Tests are defined in YAML that describes:
- Setup (config files, environment, resolvers)
- Actions (load, access, resolve, validate)
- Assertions (expected values, errors, behaviors)
# tests/acceptance/resolvers/env_resolver.yaml
suite: env_resolver
description: Environment variable resolver behavior
tests:
  - name: resolves_environment_variable
    given:
      env:
        PORT: "8080"
      config: |
        port: ${env:PORT}
    when:
      access: port
    then:
      value: "8080"
  - name: uses_default_when_missing
    given:
      config: |
        port: ${env:UNDEFINED_VAR,default=3000}
    when:
      access: port
    then:
      value: "3000"
  - name: errors_when_missing_no_default
    given:
      config: |
        port: ${env:UNDEFINED_VAR}
    when:
      access: port
    then:
      error:
        type: ResolverError
        message_contains: "Environment variable not found"
Tier 2: DSL Layer (Per-Language)¶
The DSL provides domain operations in each language. These are high-level, readable operations that map to the test definition vocabulary.
Python DSL Example:
# holoconf_test/dsl.py
from typing import Any, Callable, Self

class ConfigTestDSL:
    def given_env(self, env: dict[str, str]) -> Self: ...
    def given_config(self, yaml_content: str) -> Self: ...
    def given_config_file(self, path: str) -> Self: ...
    def given_schema(self, schema: dict) -> Self: ...
    def given_resolver(self, name: str, resolver: Callable) -> Self: ...

    def when_load(self) -> Self: ...
    def when_access(self, path: str) -> Self: ...
    def when_resolve_all(self) -> Self: ...
    def when_validate(self) -> Self: ...
    def when_export(self, format: str) -> Self: ...

    def then_value_equals(self, expected: Any) -> Self: ...
    def then_value_is_type(self, expected_type: type) -> Self: ...
    def then_error(self, error_type: str, message_contains: str = None) -> Self: ...
    def then_config_equals(self, expected: dict) -> Self: ...
Rust DSL Example:
// holoconf_test/src/dsl.rs
impl ConfigTestDSL {
    pub fn given_env(mut self, env: HashMap<String, String>) -> Self { ... }
    pub fn given_config(mut self, yaml: &str) -> Self { ... }
    pub fn given_schema(mut self, schema: Value) -> Self { ... }

    pub fn when_load(mut self) -> Self { ... }
    pub fn when_access(mut self, path: &str) -> Self { ... }

    pub fn then_value_equals<T: PartialEq>(self, expected: T) -> Self { ... }
    pub fn then_error(self, error_type: &str) -> Self { ... }
}
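For orientation, the first YAML test above (resolves_environment_variable) would translate into a chain of these DSL calls. A minimal sketch, assuming the DSL is constructed with a Tier 3 driver as in the Test Runner section below:

```python
# Hypothetical fluent usage mirroring the first YAML test definition
dsl = ConfigTestDSL(driver)  # driver: a per-language Tier 3 driver
(dsl.given_env({"PORT": "8080"})
    .given_config("port: ${env:PORT}\n")
    .when_load()
    .when_access("port")
    .then_value_equals("8080"))
```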
Tier 3: Drivers (Per-Language)¶
Drivers translate DSL operations into actual library calls. They handle language-specific details like error handling, type conversion, and FFI boundaries.
Drivers are hand-written for each language binding. This is preferred over auto-generation because:
- Only 4 languages to support
- Driver logic involves language-specific idioms
- Auto-generation would require a meta-schema, adding complexity
Python Driver:
# holoconf_test/driver.py
from holoconf import Config

class HoloconfDriver:
    def __init__(self):
        self._config = None
        self._last_result = None
        self._last_error = None

    def load_config(self, yaml_content: str, **options):
        """Actually calls holoconf.Config.load()"""
        try:
            self._config = Config.loads(yaml_content, **options)
        except Exception as e:
            self._last_error = e

    def access_path(self, path: str):
        """Actually accesses config.path.to.value"""
        try:
            self._last_result = self._config.get(path)
        except Exception as e:
            self._last_error = e

    def get_last_result(self): return self._last_result
    def get_last_error(self): return self._last_error
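To show how the tiers connect, here is a minimal sketch of a DSL assertion built on the driver's last-error state; the method body is an assumption for illustration, not the actual implementation:

```python
# Hypothetical then_error implementation delegating to the driver (Tier 3)
class ConfigTestDSL:
    def __init__(self, driver: HoloconfDriver):
        self._driver = driver

    def then_error(self, error_type: str, message_contains: str | None = None) -> "ConfigTestDSL":
        error = self._driver.get_last_error()
        assert error is not None, "expected an error, but the operation succeeded"
        assert type(error).__name__ == error_type
        if message_contains is not None:
            assert message_contains in str(error)
        return self
```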
Tier 4: System Under Test¶
The actual holoconf libraries - Rust core and language bindings.
Test Categories¶
Acceptance Tests (Cross-Language)¶
High-level tests that verify documented behavior from a user's perspective. Run against all implementations to ensure feature parity.
Characteristics:
- Written in YAML, executed via the four-tier architecture
- Test observable behavior, not implementation details
- One test definition runs against Rust, Python, JS, and Go
- If it's in the documentation, there's an acceptance test for it
What they cover:
- Resolver behavior (env, self, file, http)
- Merging semantics
- Schema validation
- Type coercion
- Error messages and help text
- Interpolation syntax
- API surface (load, access, export)
Location: tests/acceptance/
tests/acceptance/
├── resolvers/
│   ├── env_resolver.yaml
│   ├── self_resolver.yaml
│   ├── file_resolver.yaml
│   └── http_resolver.yaml
├── merging/
│   ├── deep_merge.yaml
│   └── override_semantics.yaml
├── schema/
│   ├── type_validation.yaml
│   └── coercion.yaml
├── interpolation/
│   ├── syntax.yaml
│   ├── escaping.yaml
│   └── nesting.yaml
└── errors/
    ├── resolver_errors.yaml
    └── validation_errors.yaml
Unit Tests (Language-Specific)¶
Low-level tests for implementation details, internal functions, and edge cases not visible through the public API.
Characteristics:
- Written idiomatically in each language's test framework
- Test internal implementation details
- May test private/internal APIs
- Not required to be cross-language
What they cover:
- Parser internals and edge cases
- Memory management (Rust)
- FFI boundary correctness (bindings)
- Language-specific type conversions
- Internal helper functions
Location:
- Rust: crates/holoconf-core/src/**/*_test.rs or crates/holoconf-core/tests/
- Python: bindings/python/tests/unit/
- JS: bindings/js/tests/unit/
- Go: bindings/go/**/*_test.go
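For example, a binding-level unit test might exercise a Python-specific type conversion directly, without going through the YAML layer. A hedged sketch (the file path and the int-conversion behavior are assumptions; the Config.loads/get calls mirror the driver example above):

```python
# bindings/python/tests/unit/test_type_conversion.py (illustrative)
from holoconf import Config

def test_integer_scalar_surfaces_as_python_int():
    # FFI boundary check: a YAML integer should cross into Python as a native int
    config = Config.loads("retries: 3\n")
    assert isinstance(config.get("retries"), int)
    assert config.get("retries") == 3
```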
Performance Tests¶
Benchmark tests that verify performance characteristics. These use the same four-tier architecture with performance-specific assertions.
Characteristics:
- Written in YAML like acceptance tests
- Use then: performance: assertions instead of value assertions
- Run separately from functional tests (longer execution time)
- Establish baselines and detect regressions
What they cover:
- Config loading time (small, medium, large files)
- Resolution latency (single value, bulk resolution)
- Memory usage during resolution
- Concurrent access performance
Location: tests/performance/
# tests/performance/loading.yaml
suite: config_loading_performance
tests:
  - name: small_config_loads_quickly
    given:
      config_file: fixtures/small_config.yaml  # ~50 keys
    when:
      load: {}
    then:
      performance:
        max_duration_ms: 10
        max_memory_mb: 5
  - name: large_config_loads_reasonably
    given:
      config_file: fixtures/large_config.yaml  # ~10,000 keys
    when:
      load: {}
    then:
      performance:
        max_duration_ms: 500
        max_memory_mb: 50
  - name: bulk_resolution_scales_linearly
    given:
      config_file: fixtures/many_resolvers.yaml  # 1000 env vars
      env_from_file: fixtures/env_vars.json
    when:
      resolve_all: {}
    then:
      performance:
        max_duration_ms: 200
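One way a runner could evaluate a then: performance: block, sketched with Python's standard time and tracemalloc modules (the helper name and calling convention are assumptions; note that tracemalloc only tracks Python-side allocations, so FFI-heavy drivers would need a different memory probe):

```python
import time
import tracemalloc

def check_performance(action, max_duration_ms=None, max_memory_mb=None):
    """Run `action` once and assert it stays within the declared budgets."""
    tracemalloc.start()
    start = time.perf_counter()
    action()  # e.g. lambda: driver.load_config(config_text)
    duration_ms = (time.perf_counter() - start) * 1000
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    if max_duration_ms is not None:
        assert duration_ms <= max_duration_ms, f"took {duration_ms:.1f} ms (limit {max_duration_ms} ms)"
    if max_memory_mb is not None:
        peak_mb = peak_bytes / (1024 * 1024)
        assert peak_mb <= max_memory_mb, f"peaked at {peak_mb:.1f} MB (limit {max_memory_mb} MB)"
```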
Test Runner¶
A universal test runner loads YAML test definitions and executes them against each driver:
# tools/test_runner.py
def run_tests(driver: str, test_files: list[str], category: str = "acceptance"):
    """
    Run tests against a specific driver.

    Usage:
        python tools/test_runner.py --driver python tests/acceptance/**/*.yaml
        python tools/test_runner.py --driver rust tests/performance/**/*.yaml
    """
    driver_impl = load_driver(driver)  # Python, Rust, JS, Go
    for test_file in test_files:
        suite = load_test_suite(test_file)
        for test in suite.tests:
            dsl = ConfigTestDSL(driver_impl)
            execute_test(dsl, test)
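A hedged sketch of what execute_test might do, assuming each parsed test is a mapping of the given/when/then keys from Tier 1 (the dispatch logic is illustrative and covers only the keys used in the examples above):

```python
def execute_test(dsl, test):
    # Translate the YAML vocabulary into DSL calls (Tier 2)
    given = test.get("given", {})
    if "env" in given:
        dsl.given_env(given["env"])
    if "config" in given:
        dsl.given_config(given["config"])

    when = test.get("when", {})
    dsl.when_load()
    if "access" in when:
        dsl.when_access(when["access"])

    then = test.get("then", {})
    if "value" in then:
        dsl.then_value_equals(then["value"])
    if "error" in then:
        dsl.then_error(then["error"]["type"], then["error"].get("message_contains"))
```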
Coverage Requirements¶
| Category | Coverage Target | Rationale |
|---|---|---|
| Acceptance Tests | 100% of documented features | Feature parity guarantee |
| Unit Tests (Core) | 90% line coverage | Critical path coverage |
| Unit Tests (Bindings) | 80% line coverage | FFI edge cases |
| Performance Tests | Key operations baselined | Regression detection |
CI Integration¶
# .github/workflows/test.yml
jobs:
  acceptance-tests:
    strategy:
      matrix:
        driver: [rust, python, js, go]
    steps:
      - run: python tools/test_runner.py --driver ${{ matrix.driver }} tests/acceptance/**/*.yaml

  unit-tests-rust:
    steps:
      - run: cargo test --workspace

  unit-tests-bindings:
    strategy:
      matrix:
        binding: [python, js, go]
    steps:
      - run: cd bindings/${{ matrix.binding }} && make test

  performance-tests:
    # Run on main branch only, or on-demand
    if: github.ref == 'refs/heads/main'
    strategy:
      matrix:
        driver: [rust, python]  # Primary implementations
    steps:
      - run: python tools/test_runner.py --driver ${{ matrix.driver }} tests/performance/**/*.yaml
Rationale¶
- Four-tier separation keeps test logic independent of implementation
- YAML over Gherkin because our domain is already YAML-based and our audience is developers
- Acceptance + Unit separation allows high-level feature parity tests while still testing implementation details
- Performance in same architecture reuses infrastructure while keeping concerns separate
- Hand-written drivers are simpler than auto-generation for 4 languages
Trade-offs Accepted¶
- Upfront infrastructure investment to build test framework, in exchange for long-term maintainability and feature parity
- YAML less readable than Gherkin for non-developers, in exchange for simpler implementation and domain consistency
- Acceptance tests may not cover all edge cases, addressed by language-specific unit tests
- Performance tests in YAML have limited expressiveness, in exchange for consistent tooling
Migration¶
N/A - This is the initial architecture decision.
Consequences¶
- Positive: Feature parity enforced across all bindings
- Positive: Tests written once, run everywhere
- Positive: Adding new language bindings just requires writing a driver
- Positive: Clear separation between "what behavior" (acceptance) and "how implemented" (unit)
- Negative: Initial setup requires building test infrastructure
- Neutral: Test failures clearly indicate which binding has issues