SEOS · /proto · 24 JUN 2026

We ran every screen past a senior craft bar. Here is the honest state.

All 64 authenticated screens, seven lenses, 76 AI agents, and one adversarial pass whose only job was to kill false positives. 346 findings. The bones are real engineering taste, not template slop. But two systemic defects each break a promise the product makes about itself: accessible in both themes, and built for any industry. This is exactly what we found, and the plan to clear it.


64 screens audited · 346 findings logged · 9 P0 blockers, verified · 76 AI agents run · 7 review lenses · 2 systemic roots

The job

One audit agent walked each of the 64 authenticated project screens against the product spec and a senior craft standard, scoring seven lenses: accessibility, performance, responsive behaviour, theming, visual craft, information architecture and copy, and a primary one this round, industry-neutrality. A separate analysis traced how hard-wired the app is to its six seeded engineering domains.

We did not eyeball 64 screens. Each screen's findings were then re-checked against the actual source by a different, skeptical agent, which discarded roughly forty percent of the first-pass high-severity findings as false positives or already-handled. So the blockers below are verified against real code, not first-pass guesses.

The verdict

Structurally sound and built with real taste: one cyan accent, monospaced tabular numbers, hairline rules, a proper per-theme token system, density without clutter. This does not read as a generic template under the hood. The damage is concentrated, and the weakest lens by far, accessibility, is lifted across dozens of screens by just three shared fixes.

Accessibility · 2.19 / 4 · weakest lens

Industry-neutral · 2.66 / 4 · systemic

IA & copy · 2.88 / 4

Theming · 3.20 / 4

Responsive · 3.22 / 4

Visual craft · 3.47 / 4

Performance · 3.97 / 4 · excellent

Every one of the 346 first-pass findings was then re-checked against source across two adversarial passes: 328 were confirmed (9 P0, 53 P1, 145 P2, 121 P3) and 18 were dropped as false positives or already handled. Nothing here is directional. Anti-slop verdict: fail as shipped, but the failure comes from a small set of fixable tells (repeated eyebrow labels on about eighteen screens, one glassmorphism legend, a few generic empty states, run-on copy, and the hard-coded domain data below), not from pervasive slop.

Two systemic defects

These are not 64 separate problems. Both are centralised, and both have centralised fixes. Each one breaks a promise SyntheraOS makes about itself.

DEFECT

Semantic colours used as text on their own tinted backgrounds

Across roughly 28 screens, the helpers that colour status and score values paint the text in the raw --danger / --warning / --success hue on top of the soft tint of that same hue. Contrast lands far below the AA minimum of 4.5:1 for normal text, and is catastrophic in light mode (computed from the token values at around 1.9 to 2.5:1), on exactly the signals an accountable engineer most needs to read: verification verdicts, change-approval status, risk bands, severity pills, the headline quality number. Light mode is effectively a second-class theme, which the spec forbids.
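For readers checking figures like these: WCAG 2.x defines the contrast ratio as (L1 + 0.05) / (L2 + 0.05), where L1 and L2 are the relative luminances of the lighter and darker colour. A minimal TypeScript sketch of that arithmetic, assuming plain #rrggbb inputs; the function names are ours, and the greys used to exercise it are generic test colours, not the app's token values.

```typescript
// WCAG 2.x: linearise one 8-bit sRGB channel.
function channelToLinear(c8: number): number {
  const c = c8 / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Parse "#rrggbb" (leading # optional) into [r, g, b] channel values.
function parseHex(hex: string): [number, number, number] {
  const m = /^#?([0-9a-f]{6})$/i.exec(hex);
  if (!m) throw new Error(`expected #rrggbb, got ${hex}`);
  const n = parseInt(m[1], 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

// Relative luminance: weighted sum of the linearised channels.
function relativeLuminance(hex: string): number {
  const [r, g, b] = parseHex(hex).map(channelToLinear);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio (L1 + 0.05) / (L2 + 0.05) with L1 the lighter colour; ranges 1..21.
function contrastRatio(fg: string, bg: string): number {
  const a = relativeLuminance(fg);
  const b = relativeLuminance(bg);
  const lighter = Math.max(a, b);
  const darker = Math.min(a, b);
  return (lighter + 0.05) / (darker + 0.05);
}

// AA threshold: 4.5:1 for normal text, 3:1 for large text.
function meetsAA(fg: string, bg: string, largeText = false): boolean {
  return contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
}
```

Run over every text-token and background-token pair per theme, a check like this is one cheap way to keep light mode from regressing.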

The fix is already in the codebase. The AA-verified, per-theme -ink tokens exist for exactly this. Make every colour helper return the -ink token for text and icons, and reserve the raw hue for fills, borders and dots. A token swap, not a redesign. Seven of the nine verified P0s are this single bug.
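A sketch of what "return the -ink token" could look like in one shared helper. The Tailwind-style class names (text-danger-ink, bg-danger-soft) are inferred from the token names in this report, and toneClasses, scoreTone and pillClassName, including the score thresholds, are hypothetical; the real helpers may be shaped differently.

```typescript
// Semantic tones handled by the shared colour helpers.
type Tone = "danger" | "warning" | "success";

interface ToneClasses {
  text: string;   // text and icons: always the AA-verified -ink token
  fill: string;   // soft tinted background
  border: string; // raw hue is acceptable on non-text marks
  dot: string;    // status dots: raw hue
}

// Hypothetical helper: one place decides which token each role gets.
function toneClasses(tone: Tone): ToneClasses {
  return {
    text: `text-${tone}-ink`,
    fill: `bg-${tone}-soft`,
    border: `border-${tone}`,
    dot: `bg-${tone}`,
  };
}

// Hypothetical score-to-tone mapping; thresholds are illustrative only.
function scoreTone(score: number): Tone {
  if (score >= 80) return "success";
  if (score >= 50) return "warning";
  return "danger";
}

// Example consumer: a severity pill pairs the soft fill with -ink text.
function pillClassName(tone: Tone): string {
  const c = toneClasses(tone);
  return [c.fill, c.text, "rounded px-2"].join(" ");
}
```

Because every screen asks the helper rather than composing class names itself, the swap happens in one function instead of on 28 screens.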

WCAG AA · ~28 screens · score-bar.tsx · gps/stage-detail · traceability-matrix-table · add lint rule: no text-{danger,warning,success} on *-soft
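The lint rule could start as a plain check over class lists. This is a minimal sketch, not an ESLint plugin: findRawTextOnSoft and lintSource are hypothetical names, and the scan only sees static className string literals, so classes assembled by helpers or template strings would need a rule wired into the real lint pipeline.

```typescript
// Raw semantic hues that must not be used as a text colour.
const RAW_TEXT_TONES = new Set(["text-danger", "text-warning", "text-success"]);
// Any soft tint background, e.g. bg-danger-soft.
const SOFT_BG = /^bg-[\w-]+-soft$/;

// Return the offending raw-hue text classes if the list also sets a soft background.
function findRawTextOnSoft(className: string): string[] {
  const tokens = className.split(/\s+/).filter((t) => t.length > 0);
  const onSoft = tokens.some((t) => SOFT_BG.test(t));
  return onSoft ? tokens.filter((t) => RAW_TEXT_TONES.has(t)) : [];
}

interface Violation {
  line: number; // 1-based line number in the scanned source
  classes: string[];
}

// Scan static className="..." literals line by line.
function lintSource(source: string): Violation[] {
  const violations: Violation[] = [];
  const attr = /className="([^"]*)"/g;
  source.split("\n").forEach((text, i) => {
    attr.lastIndex = 0; // reset the global regex for each line
    let m: RegExpExecArray | null;
    while ((m = attr.exec(text)) !== null) {
      const classes = findRawTextOnSoft(m[1]);
      if (classes.length > 0) violations.push({ line: i + 1, classes });
    }
  });
  return violations;
}
```

Note that text-danger-ink passes: only the exact raw-hue classes are flagged, and only when a soft tint sits underneath.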

DEFECT

Hard-coded desalination data appears on every project

The product's whole premise is cited, source-grounded output. Three places break that by emitting fixed desalination content regardless of the actual project. The AI assistant brief renders a verbatim desalination power single-point-of-failure closer (risk RSK-011) for every project. The risk-prediction engine branches UAV versus desalination and silently defaults any unrecognised industry to desalination. The completeness checker feeds desalination checklists to non-water projects.

Why it matters: for a domain expert opening a software, pharma or automotive project, this is confidently-wrong output, and it destroys the trust the product is built on. The good news: the core ontology (requirements, risks, verification, cost, schedule) is already neutral. This is a shell fix, detailed in the next section.

trust + any-industry · assistant/page.tsx:312 · risk-prediction.ts:285 · completeness.ts:510

From six domains to any industry

The finding is encouraging: the core ontology is neutral. Only a thin shell couples the app to its six seeded domains (water/desalination, energy, marine, aerospace, infrastructure, "other"). The fix is a preset registry, a custom path, and data-driven logic in place of the hard-coded branches. The six stay as presets, so no one is locked out of them.

PLAN / 01

A preset registry, not a closed enum

Turn the domain type from a fixed six-member union into a string id carrying its own vocabulary, standards and sample seeds. Ship the six as presets over a neutral core, and add a first-class "describe your industry" card in the new-project wizard with sensible neutral defaults.

EFFORT · L
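A minimal sketch of the registry shape, assuming a preset carries an open string id plus its vocabulary, standards and seed ids. DomainPreset is the name this report uses, but every field, function and id below is illustrative, and the seeded presets are left empty rather than guessing at their real standards.

```typescript
// Open string id instead of a fixed six-member union (field names hypothetical).
interface DomainPreset {
  id: string;
  label: string;
  vocabulary: Record<string, string>; // neutral term -> industry term
  standards: string[];
  seeds: string[]; // sample seed identifiers
}

const registry = new Map<string, DomainPreset>();

function registerPreset(preset: DomainPreset): void {
  if (registry.has(preset.id)) throw new Error(`duplicate preset id: ${preset.id}`);
  registry.set(preset.id, preset);
}

// A preset with neutral defaults; also backs the "describe your industry" path.
function neutralPreset(id: string, label: string, standards: string[] = []): DomainPreset {
  return { id, label, vocabulary: {}, standards, seeds: [] };
}

// Unknown or missing ids resolve to a neutral preset, never to a seeded sector.
function resolvePreset(id: string | undefined): DomainPreset {
  const found = id ? registry.get(id) : undefined;
  return found ?? neutralPreset("custom", "Custom industry");
}

// The six seeded domains become ordinary presets (contents left empty here).
const SEEDED: Array<[string, string]> = [
  ["water", "Water / desalination"],
  ["energy", "Energy"],
  ["marine", "Marine"],
  ["aerospace", "Aerospace"],
  ["infrastructure", "Infrastructure"],
  ["other", "Other"],
];
for (const [id, label] of SEEDED) registerPreset(neutralPreset(id, label));
```

Adding an industry is then one registerPreset call, which is the property PLAN / 04 sets out to prove.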

PLAN / 02

De-binary the AI engines

The assistant brief reads from the project's own risk gaps. Risk prediction reads structural signals in the graph (single-source components, missing failover links, unlinked compliance, long-lead items, unverified requirements) with sector libraries as optional packs. Never silently fall back to desalination.

EFFORT · L
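A sketch of the structural-signal approach over a deliberately simplified project graph. The graph shape, the 26-week long-lead threshold and the RiskPack type are assumptions for illustration; the point is that the engine contains no industry branch, and sector packs only add signals when a project opts in.

```typescript
// Hypothetical, simplified project graph.
interface Component {
  id: string;
  suppliers: string[];
  leadTimeWeeks: number;
  hasFailover: boolean;
}
interface Requirement { id: string; verifiedBy: string[] }
interface ComplianceItem { id: string; linkedRequirementIds: string[] }
interface ProjectGraph {
  components: Component[];
  requirements: Requirement[];
  compliance: ComplianceItem[];
}

interface Signal { kind: string; subjectId: string }

// An optional sector pack contributes extra signals; none is applied by default.
type RiskPack = (g: ProjectGraph) => Signal[];

const LONG_LEAD_WEEKS = 26; // illustrative threshold, not from the product

// Industry-agnostic signals read purely from graph structure.
function structuralSignals(g: ProjectGraph): Signal[] {
  const out: Signal[] = [];
  for (const c of g.components) {
    if (c.suppliers.length === 1) out.push({ kind: "single-source", subjectId: c.id });
    if (!c.hasFailover) out.push({ kind: "no-failover", subjectId: c.id });
    if (c.leadTimeWeeks >= LONG_LEAD_WEEKS) out.push({ kind: "long-lead", subjectId: c.id });
  }
  for (const r of g.requirements) {
    if (r.verifiedBy.length === 0) out.push({ kind: "unverified", subjectId: r.id });
  }
  for (const item of g.compliance) {
    if (item.linkedRequirementIds.length === 0) out.push({ kind: "unlinked-compliance", subjectId: item.id });
  }
  return out;
}

// No fallback branch: with no packs, only structural signals are returned.
function predictRisks(g: ProjectGraph, packs: RiskPack[] = []): Signal[] {
  return packs.reduce<Signal[]>((acc, pack) => acc.concat(pack(g)), structuralSignals(g));
}
```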

PLAN / 03

Neutralise the vocabulary

Make verification stages project-driven, so a software project shows unit / integration / UAT and a pharma project shows IQ / OQ / PQ with no code change. Drive readiness-gate titles, digital-twin lifecycle copy and role names from per-industry label maps with neutral fallbacks.

EFFORT · L
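A sketch of project-driven stages and label maps with neutral fallbacks. The Project fields, the LabelKey names and the neutral wording are hypothetical, as is the pharma "Release gate" label; the real data model will differ.

```typescript
// Hypothetical keys for copy that used to be hard-coded per sector.
type LabelKey = "readinessGate" | "twinLifecycle" | "ownerRole";

// Hypothetical project record: stages and labels are data, not code.
interface Project {
  id: string;
  verificationStages?: string[];
  labels?: Partial<Record<LabelKey, string>>;
}

// Neutral fallbacks (illustrative wording).
const NEUTRAL_STAGES = ["Review", "Test", "Acceptance"];
const NEUTRAL_LABELS: Record<LabelKey, string> = {
  readinessGate: "Readiness gate",
  twinLifecycle: "Lifecycle stage",
  ownerRole: "Owner",
};

function stagesFor(project: Project): string[] {
  const stages = project.verificationStages;
  return stages && stages.length > 0 ? stages : NEUTRAL_STAGES;
}

function labelFor(project: Project, key: LabelKey): string {
  return project.labels?.[key] ?? NEUTRAL_LABELS[key];
}

// Same code, different projects.
const software: Project = { id: "sw", verificationStages: ["Unit", "Integration", "UAT"] };
const pharma: Project = {
  id: "rx",
  verificationStages: ["IQ", "OQ", "PQ"],
  labels: { readinessGate: "Release gate" },
};
```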

PLAN / 04

Prove it holds

Widen the standards catalogue, add a software seed and a pharma seed to prove that adding an industry equals adding one more preset, and add a regression check that an unrecognised industry never renders desalination data anywhere.

EFFORT · M
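A minimal sketch of the regression check, assuming each AI path can be called as a function that returns rendered text. The markers come from this report (the desalination copy and risk RSK-011); the Renderer type and the stub renderers in the test are hypothetical stand-ins, not the real assistant, risk-prediction or completeness code.

```typescript
// Markers of the fixed desalination content named in this report.
const DESAL_MARKERS: RegExp[] = [/desalinat/i, /\bRSK-011\b/];

// Which markers appear in one rendered output?
function findLeaks(rendered: string, markers: RegExp[] = DESAL_MARKERS): string[] {
  return markers.filter((re) => re.test(rendered)).map((re) => re.source);
}

interface TestProject {
  name: string;
  industry: string;
}

// Anything that turns a project into user-facing text (hypothetical signature).
type Renderer = (project: TestProject) => string;

// Render every path for an industry no preset recognises; report any leak.
function checkUnrecognisedIndustry(renderers: Record<string, Renderer>): string[] {
  const project: TestProject = { name: "Regression probe", industry: "not-a-known-industry" };
  const failures: string[] = [];
  for (const name of Object.keys(renderers)) {
    const leaks = findLeaks(renderers[name](project));
    if (leaks.length > 0) failures.push(`${name}: ${leaks.join(", ")}`);
  }
  return failures;
}
```

An empty result is the pass condition; any entry names the path and the marker that leaked.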

The order of work

Sequenced on purpose. The shared roots land first, so the per-screen work does not fight itself, and the largest accessibility and trust gains come from the cheapest, lowest-risk changes.

The three P0 roots

The -ink token swap across the colour helpers (about 28 screens); one global :focus-visible rule (no keyboard focus indicator renders anywhere today, so this one rule fixes about 20 screens at once); and stripping the hard-coded desalination data from the AI paths. Mostly mechanical, low risk.

P0 · NOW

The DomainPreset registry

The industry-neutrality root from the section above. More invasive, but it is the single change that makes the product genuinely sector-agnostic.

Remaining accessibility and trust

Pair colour-only state with an icon or label plus ARIA; give form controls accessible names; recompute a risk's derived score in the same action that edits it; make the graph, tree and Gantt operable by assistive tech; apply the scroll pattern to tables that clip; add the missing empty states; and fix the open-issues count.
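For the derived-score item, a sketch of recomputing in the same action, assuming for illustration that the derived score is likelihood times impact on 1 to 5 scales. The actual formula in the app may differ, and editRisk and deriveScore are hypothetical names.

```typescript
interface Risk {
  id: string;
  likelihood: number; // assumed 1..5 scale
  impact: number;     // assumed 1..5 scale
  score: number;      // derived, never edited directly
}

type RiskInputs = Pick<Risk, "likelihood" | "impact">;

// Assumed derivation, for illustration only.
function deriveScore(inputs: RiskInputs): number {
  return inputs.likelihood * inputs.impact;
}

// One action applies the edit and recomputes the derived score together,
// so the stored score cannot lag behind the inputs it is derived from.
function editRisk(risk: Risk, patch: Partial<RiskInputs>): Risk {
  const next: Risk = { ...risk, ...patch };
  return { ...next, score: deriveScore(next) };
}
```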

Polish

Migrate the hard-pixel font sizes to the rem scale so text scales properly at 200% zoom, and remove the eyebrow labels and the graph-legend glass effect: the visual tells that read as machine-made.
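A sketch of the mechanical part of that migration, assuming a 16px root size and plain "font-size: Npx" declarations. Sizes in rem follow the user's root font-size setting, which fixed px values ignore; snapping the results onto the app's own rem scale would be a separate step not shown here.

```typescript
const ROOT_PX = 16; // assumed root font size (the common browser default)

// Convert a pixel size to rem, rounded to 4 decimal places.
function pxToRem(px: number): string {
  const rem = Math.round((px / ROOT_PX) * 10000) / 10000;
  return `${rem}rem`;
}

// Rewrite every "font-size: <n>px" declaration; other px values are left untouched.
function migrateFontSizes(css: string): string {
  return css.replace(
    /font-size:\s*(\d+(?:\.\d+)?)px/g,
    (_match: string, n: string) => `font-size: ${pxToRem(parseFloat(n))}`,
  );
}
```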

How we ran it

A deterministic, verifiable operation, not a long manual read. Seventy-six agents across the run, every claim checked by a different agent than the one that made it.

METHOD / 01

One agent per screen

Each of the 64 screens got a dedicated reviewer scoring all seven lenses against the spec and a senior craft bar, returning structured findings with exact file and line locations.

METHOD / 02

Adversarial verify, every finding

A second skeptical agent re-opened the real file for every finding, P0 through P3, and tried to refute each one. 18 of 346 were discarded as false positive or already-handled; the remaining 328 are confirmed against the code.

METHOD / 03

Coupling traced in parallel

A dedicated analysis read the intake, generator, intelligence and seed layers to map exactly where the app is hard-wired to a sector, and how to neutralise each point.

METHOD / 04

Deterministic synthesis

Scores, counts and the systemic patterns were aggregated in code, not guessed, then a high-effort lead reviewer wrote the cross-screen verdict and the remediation order.
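A sketch of the kind of deterministic aggregation described here, assuming each finding carries a severity and a confirmed flag and each screen gets a 0 to 4 score per lens. The types and names are illustrative, not the audit's actual pipeline; rounding to two decimals mirrors how the lens scores are reported above.

```typescript
type Severity = "P0" | "P1" | "P2" | "P3";
interface Finding { screen: string; severity: Severity; confirmed: boolean }
interface LensScore { screen: string; lens: string; score: number } // 0..4

// Count only findings that survived adversarial verification.
function severityCounts(findings: Finding[]): Record<Severity, number> {
  const counts: Record<Severity, number> = { P0: 0, P1: 0, P2: 0, P3: 0 };
  for (const f of findings) if (f.confirmed) counts[f.severity] += 1;
  return counts;
}

// Mean score per lens across screens, rounded to two decimals.
function meanByLens(scores: LensScore[]): Map<string, number> {
  const sums = new Map<string, { total: number; n: number }>();
  for (const s of scores) {
    const acc = sums.get(s.lens) ?? { total: 0, n: 0 };
    acc.total += s.score;
    acc.n += 1;
    sums.set(s.lens, acc);
  }
  const means = new Map<string, number>();
  sums.forEach((acc, lens) => means.set(lens, Math.round((acc.total / acc.n) * 100) / 100));
  return means;
}
```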

One honest flag

An engineering OS should tell you the truth, so here is the one caveat about this report itself.

▲ This is a source-level audit, not a runtime pass

Every finding, all 346, was checked against the actual code, so nothing here is directional. The one honest caveat is the method itself: this is a static read of the source, not a runtime test. Contrast figures such as the ~1.9 to 2.5:1 light-mode range are computed from the design-token values, not measured live in a browser, and the keyboard and screen-reader findings are read from the markup rather than driven through assistive tech. The next rung, if you want measured proof, is a Playwright and axe pass on the live build, with real per-theme contrast and a keyboard walk of the graph, tree and timeline. The complete per-screen record, every finding with its location, impact and fix, is saved alongside this report.