21 KiB
Architectural Decision Record: Real Obsidian End-to-End Test Runner
Status
Proposed / Spike Implemented
Release
Not yet. Planned after the serviceFeature refactoring branch is reviewed.
Context
The current end-to-end tests run through Vitest browser mode and a mocked Obsidian environment in test/harness. This has been useful for exercising synchronisation flows without launching Obsidian, but it is no longer a reliable final signal for plug-in behaviour.
The main issues are:
- The harness reimplements a large part of the Obsidian API surface, including vault files, workspace events, settings, and lifecycle behaviour. This mock can drift from real Obsidian behaviour without failing.
- The tests run inside a browser-style environment, while the desktop plug-in runs inside Obsidian's Electron environment with its own application lifecycle, storage paths, command registry, and event ordering.
- Several high-value regressions are about integration boundaries: boot-up sequence timing, real vault file reflection, Obsidian command registration, settings persistence, restart prompts, and file watcher behaviour. These are precisely the areas where a mock harness gives weak confidence.
- Maintaining the harness competes with maintaining the plug-in. Adding behaviour to the plug-in often requires teaching the mock another Obsidian detail before the actual regression can be tested.
The current harness should therefore stop being treated as the primary E2E layer.
Decision
Introduce a new E2E layer that launches real Obsidian with temporary vaults and the built Self-hosted LiveSync plug-in installed into those vaults.
The long-term test pyramid should be:
- Unit tests for deterministic operations and serviceFeature boundaries.
- Integration tests for CouchDB, Object Storage, P2P services, database operations, and replication protocols.
- Real Obsidian E2E tests for boot-up sequence, vault reflection, command registration, settings dialogues, restart scheduling, and user-visible workflows.
The existing test/harness should be demoted to a transitional compatibility layer. It may remain temporarily while the real Obsidian runner reaches parity for critical flows, but new high-level E2E coverage should target the real runner.
Non-Goals
- Do not replace unit or integration tests with slow UI tests.
- Do not keep extending the Obsidian mock to cover new Obsidian APIs unless a short-term compatibility bridge is required.
- Do not require real Obsidian E2E for every pull request initially. The first CI integration should be opt-in or nightly until stability is proven.
- Do not test every setting dialogue through UI clicks if the behaviour is already covered by unit or integration tests. Use UI automation only for workflows whose risk is in real Obsidian integration.
Proposed Architecture
Runner
Create a dedicated runner under test/e2e-obsidian/.
The runner should:
- Create one or more temporary vault directories.
- Build the plug-in once with
npm run buildor a narrower production build command. - Install
main.js,manifest.json, andstyles.csswhen present into.obsidian/plugins/obsidian-livesync/. - Prepare
.obsidian/community-plugins.jsonand.obsidian/plugins/obsidian-livesync/data.jsonas needed. - Launch Obsidian against the temporary vault.
- Wait until the plug-in reports readiness through a deterministic probe.
- Drive assertions through a narrow control channel rather than fragile visual selectors wherever possible.
- Dispose of Obsidian and temporary vaults after each scenario.
Obsidian Launch
The preferred desktop target is the installed Obsidian application. The launch mechanism should be platform-specific but hidden behind a small adapter:
- Linux: launch the Obsidian executable with a vault path or Obsidian URI, depending on what is most reliable. If an AppImage is used and FUSE is not available, extract it with
--appimage-extractand launch the extractedsquashfs-root/obsidianbinary. - macOS: launch the app bundle through
openor the executable inside the bundle. - Windows: launch the installed executable or the registered application protocol.
The first implementation can support Linux only if that is the local and CI target. Cross-platform support can be added after the runner contract is stable.
In headless Linux environments, launch through xvfb-run, pass Electron flags such as --no-sandbox and --disable-gpu, and isolate HOME, XDG_CONFIG_HOME, and --user-data-dir per temporary vault.
Control Channel
The runner needs a stable way to observe readiness and issue test commands. Prefer a test-only plug-in bridge compiled only in test builds or enabled only by an environment variable.
Possible bridge options:
- The official Obsidian CLI, using the installed
obsidian-clihelper to open vaults, reload the plug-in, runeval, and call developer commands. - A local HTTP/WebSocket bridge bound to
127.0.0.1with a random port and token. - A file-based bridge in the vault, where Obsidian writes status files and consumes command files.
- A DevTools protocol bridge if Obsidian exposes a stable debugging port in the test environment.
The first implementation uses Obsidian's CLI for orchestration and readiness checks. The CLI handles vault opening through obsidian://open?path=..., enables community plug-ins through app.plugins.setEnable(true), reloads Self-hosted LiveSync through plugin:reload id=obsidian-livesync, and verifies that app.plugins.plugins['obsidian-livesync'] is loaded.
This keeps E2E-only behaviour out of the production plug-in bundle. The runner should not require Self-hosted LiveSync to write marker files or expose a test server merely to prove that Obsidian loaded it.
The DevTools protocol remains useful for diagnostics. Obsidian's CLI exposes developer commands such as dev:cdp, dev:errors, and dev:console, so the runner should prefer the CLI path first and fall back to direct DevTools attachment only if the CLI cannot provide the required signal.
Test Data and Services
Keep the existing Docker scripts for CouchDB, MinIO, and P2P services. The real Obsidian runner should reuse these service fixtures instead of creating another service orchestration stack.
Each test should use unique database names, bucket prefixes, vault names, and P2P room IDs. This prevents tests from depending on cleanup and makes interrupted runs less harmful.
Migration Plan
Phase 0: Discovery
- Confirm how Obsidian can be launched reliably on the local development environment.
- Confirm whether Obsidian accepts a vault path directly, requires an Obsidian URI, or needs a pre-existing vault registry.
- Identify where Obsidian stores per-user state in the test environment and decide how to isolate it.
- Decide whether the first bridge is file-based or HTTP/WebSocket.
Initial discovery on Linux ARM64 found that:
Obsidian-1.12.7-arm64.AppImagerequireslibfuse.so.2for direct AppImage execution.- Extracting the AppImage with
--appimage-extractworks without FUSE. - Launching the extracted
squashfs-root/obsidianbinary underxvfb-runwith isolated user data stays alive for the smoke timeout. - No missing shared libraries were reported by
lddfor the extracted binary in the tested environment. - Obsidian's CLI is disabled unless the global
obsidian.jsoncontainscli: true. - Passing only
.obsidian/community-plugins.jsonis not enough to load community plug-ins on Obsidian 1.12. The runner also has to enable the global community plug-in switch throughapp.plugins.setEnable(true). - The reliable launch sequence is: start Obsidian, send
obsidian://open?path=...throughobsidian-cli, wait until the vault-side CLI exposes the plug-in catalogue, enable community plug-ins, reload Self-hosted LiveSync, and verify plug-in readiness throughobsidian-cli eval.
Phase 1: Smoke Runner
- Add
test/e2e-obsidian/runnerutilities for temporary vault creation, plug-in installation, launch, readiness wait, and cleanup. - Add one smoke test:
- launch Obsidian with an empty vault,
- load Self-hosted LiveSync,
- wait for the boot-up sequence to become ready,
- read the plug-in version or status through the control channel,
- close Obsidian cleanly.
- Add an npm script such as
test:e2e:obsidian.
Current implementation status:
- Added
test/e2e-obsidian/runnerhelpers for Obsidian discovery, CLI discovery, temporary vault creation, plug-in installation, process launch, CLI execution, and readiness polling. - Added
test:e2e:obsidian:discover,test:e2e:obsidian:cli-help,test:e2e:obsidian:smoke,test:e2e:obsidian:vault-reflection,test:e2e:obsidian:couchdb-upload,test:e2e:obsidian:minio-upload,test:e2e:obsidian:startup-scan,test:e2e:obsidian:two-vault-sync,test:e2e:obsidian:hidden-file-snippet-sync,test:e2e:obsidian:customisation-sync,test:e2e:obsidian:setting-markdown-export,test:e2e:obsidian:local-suite,test:e2e:obsidian:local-suite:services, andtest:e2e:obsidian:install-appimage. - Added
startObsidianLiveSyncSession()so future workflows can reuse the launch, trusted temporary vault state, vault open, community plug-in reload, and readiness sequence without duplicating smoke runner code. - Added CouchDB runner utilities that reuse
.test.env/process environment values, create unique temporary databases, query uploaded documents directly, and clean up the database unlessE2E_OBSIDIAN_KEEP_COUCHDB=trueis set. - Added a manual AppImage installer that downloads Obsidian
1.12.7forarm64orx86_64, stores it under_testdata/obsidian, and extracts it for FUSE-free execution. - Confirmed the smoke runner on Linux ARM64 with the extracted Obsidian
1.12.7AppImage,xvfb-run, and the built Self-hosted LiveSync bundle. - Confirmed the runner can enable the Obsidian CLI through isolated
obsidian.jsonstate, pre-seed the temporary Chromium local storage so the generated vault ID is trusted for community plug-ins, open the temporary vault throughobsidian-cli, reload Self-hosted LiveSync, and verify readiness throughobsidian-cli eval. - Removed the first test-only ready-marker bridge from the plug-in bundle. The current runner observes readiness from outside the plug-in through Obsidian's own CLI, so normal user vaults do not receive E2E marker files.
Current verification:
npm run tsc-checkpasses.npm run buildpasses with existing Svelte warnings.npm run test:e2e:obsidian:discoverfinds_testdata/obsidian/squashfs-root/obsidianwhen the extracted AppImage is present.E2E_OBSIDIAN_SMOKE_TIMEOUT_MS=1000 npm run test:e2e:obsidian:smokepasses locally.npm run test:e2e:obsidian:vault-reflectioncreates a note through Obsidian's vault API, verifies the reflected file on disk, and reads it back through Obsidian.npm run test:e2e:obsidian:couchdb-uploadconfigures a unique CouchDB database, creates a note through Obsidian, commits it into the local database, runs one-shot synchronisation, and verifies that CouchDB contains the metadata document and all referenced chunk documents.npm run test:e2e:obsidian:minio-uploadconfigures a unique Object Storage bucket prefix, creates a note through Obsidian, runs one-shot Journal Sync, and verifies through the AWS SDK that objects were written to the S3-compatible bucket.npm run test:e2e:obsidian:startup-scanverifies that a file written while Obsidian is stopped is picked up during the next real Obsidian boot and uploaded to CouchDB after one-shot synchronisation.npm run test:e2e:obsidian:two-vault-syncverifies two-vault note synchronisation: creation, update, rename, deletion, per-device target-filter differences, and a separate encrypted round-trip with Path Obfuscation enabled. The experimental Markdown conflict automatic merge check is available withE2E_OBSIDIAN_INCLUDE_MARKDOWN_CONFLICT=truebut is not part of the default local suite.npm run test:e2e:obsidian:hidden-file-snippet-syncverifies hidden file synchronisation as a two-vault round-trip: creation, deletion, automatic JSON conflict merging with the merged result propagated by a second synchronisation, manual JSON Resolve dialogue application through Obsidian's UI, and per-device target-pattern differences.npm run test:e2e:obsidian:customisation-syncverifies a two-vault Customisation Sync workflow: scan a real snippet CSS file, config JSON file, and sample plug-in fixture into per-file Customisation Sync data, synchronise them through CouchDB, apply them on the second vault, assert the resulting.obsidianfiles, propagate a snippet update, and verify deletion of the source-vault snippet sync data without confusing it with the target vault's own applied copy.npm run test:e2e:obsidian:setting-markdown-exportverifies that setting Markdown export creates a vault file and omits credentials when credential export is disabled.npm run test:e2e:obsidian:install-appimagereuses the existing AppImage and extracted binary when they are already present.npm run test:e2e:obsidian:local-suiteruns the local verification sequence for the real Obsidian runner after CouchDB and MinIO have been started.npm run test:e2e:obsidian:local-suite:servicesstops leftover CouchDB and MinIO fixtures, starts fresh fixtures, runs the local suite, and stops the fixtures again.npm run test:e2e:obsidian:local-suite:serviceshas been verified locally with real Obsidian, CouchDB, and MinIO. The run completed discovery, smoke, vault reflection, CouchDB upload, Object Storage upload, startup scan, two-vault synchronisation, Hidden File Sync, Customisation Sync, and setting Markdown export. The build step still emits existing Svelte warnings.
Known limits:
- The smoke runner currently proves only one-vault launch and plug-in load readiness. Broader workflows are covered by separate real Obsidian scripts, including CouchDB upload, startup scan, two-vault note synchronisation, Hidden File Sync, Customisation Sync, and setting Markdown export.
- Cross-platform support is still discovery-level. The working path has been validated on Linux ARM64. macOS and Windows should be validated in their own environments as follow-up work.
- CI wiring is intentionally not implemented. The runner depends on a licensed desktop application and is treated as a local verification tool.
Phase 2: First Real Workflow
- Add a one-vault local workflow:
- configure a temporary CouchDB database,
- create a note in the real vault,
- wait for metadata and chunks to be stored,
- restart Obsidian,
- verify that the plug-in loads and the note remains consistent.
This validates real boot-up, settings persistence, vault file access, database writes, and restart-sensitive state.
Current implementation status:
- Added a pre-CouchDB workflow that creates a note through Obsidian's vault API, confirms the note is reflected as a real vault file, and reads the same note back through Obsidian. This covers the vault reflection part of the Phase 2 path before remote database setup is introduced.
- Added a first CouchDB-backed upload workflow, modelled after the CLI Deno tests: reuse the standard CouchDB environment variables, create a unique remote database, apply CouchDB settings through the plug-in's setting service, commit the note through the real Obsidian vault path, run one-shot synchronisation, and assert that remote metadata and chunks exist.
- Added an Object Storage-backed upload workflow against MinIO to exercise Journal Sync and the AWS SDK path from real Obsidian.
- Added Obsidian-specific workflows for boot-time vault scanning, two-vault note synchronisation, hidden
.obsidian/snippetsfile round-tripping, hidden JSON conflict resolution, Customisation Sync application for snippets, config JSON files, and plug-in fixtures, per-device target-filter differences, and setting Markdown export. These scenarios assert against CouchDB documents, vault files, or real Obsidian UI outcomes instead of internal service state.
Phase 3: Two-Vault Synchronisation
- Launch two Obsidian instances with two temporary vaults.
- Configure both against the same temporary remote database.
- Create, modify, rename, and delete notes in one vault.
- Verify reflection in the other vault.
- Cover encrypted and non-encrypted configurations separately.
Current implementation status:
test:e2e:obsidian:two-vault-synccovers creation, update, rename, deletion, and per-device target-filter behaviour for a non-encrypted CouchDB configuration. Markdown conflict automatic merging remains an optional check because it needs a dedicated, less timing-sensitive fixture.- The same script creates a separate temporary CouchDB database and temporary vault pair for an encrypted two-vault round-trip with Path Obfuscation enabled.
Phase 4: Harness Retirement
- Mark
test/harnessas deprecated in documentation. - Stop adding new tests to
test/suiteunless they are explicitly transitional. - Do not mechanically port
test/suiteinto real Obsidian E2E. Scenarios that can already be exercised and asserted through the CLI test layer should stay there or move to lower-level integration tests. - Prioritise real Obsidian coverage for behaviours that the CLI cannot prove well, especially RedFlag flag-file recovery flows, Fast Setup (Simple Fetch), boot-up sequencing, restart-sensitive initial synchronisation, and user-visible recovery dialogues.
- Remove the harness only after the new runner covers the critical boot-up and synchronisation workflows.
Current implementation status:
test/harnessis now documented as a transitional compatibility layer.- New broad E2E work should target
test/e2e-obsidian/when real Obsidian behaviour is the risk being tested. - The next high-value scenarios are RedFlag variants and Fast Setup (Simple Fetch) variants, not a line-by-line migration of
test/suite.
Local Verification Strategy
Real Obsidian E2E is a local verification layer. It should not be wired into the default CI gate.
- Keep the scripts individually runnable for focused local debugging.
- Provide
test:e2e:obsidian:local-suitefor a broader local pass after the CouchDB and MinIO fixtures have been started. - Provide
test:e2e:obsidian:local-suite:servicesfor a broader local pass that manages the CouchDB and MinIO fixtures itself. - Use
OBSIDIAN_BINARYwhen testing against an installed desktop application. - Use
test:e2e:obsidian:install-appimageon Linux when a local AppImage copy is needed, and reuse the extracted_testdata/obsidian/squashfs-rootdirectory between local runs. - Capture Obsidian logs, plug-in logs, vault snapshots, and service logs manually when investigating failures.
- Fail fast on launch failures, readiness timeouts, and cleanup failures with clear diagnostics.
Risks and Mitigations
- Obsidian licensing and installation: Keep the runner local-first and capable of using
OBSIDIAN_BINARY. - Flakiness from UI timing: Prefer a control channel and service-level probes over visual selectors.
- Multiple instances: Obsidian may not support multiple independent instances cleanly on all platforms. Start with one-instance smoke tests, then validate two-instance behaviour on Linux before expanding scope.
- State leakage: Isolate vault directories, Obsidian user data, remote database names, and bridge tokens per test.
- Security of E2E controls: Keep readiness and control outside the production plug-in bundle. Prefer Obsidian CLI probes over E2E-only plug-in code.
- Runtime cost: Keep real Obsidian E2E out of the default PR gate. Use focused scripts or the local suite when a change touches real Obsidian integration.
Open Questions
- Which launch mechanism is most reliable for Obsidian on each supported desktop platform?
- Can two Obsidian instances run with isolated user data at the same time?
- Do future scenarios need a richer control channel than Obsidian CLI, or can CLI
evaland developer commands cover the required workflows? - Should any future E2E-only plug-in code live in a separate test build, or should the production bundle remain free of E2E controls?
- Which RedFlag and Fast Setup (Simple Fetch) variants should be added first?
Initial Implementation Checklist
- Add an Obsidian launch discovery script that prints the detected executable, version, and launch mode.
- Add temporary vault and plug-in installation helpers.
- Add CLI-based plug-in readiness polling.
- Add
test:e2e:obsidian:smokefor one-vault plug-in load. - Document required local environment variables, especially
OBSIDIAN_BINARY. - Port one CouchDB-backed workflow after the smoke test is stable.
- Mark
test/harnessas transitional and block new broad E2E work from targeting it. - Add the local suite script for broader local verification.
Consequences
- Real Obsidian E2E becomes the source of truth for plug-in lifecycle and vault integration.
- Unit and integration tests remain the primary fast feedback loops.
- The old browser harness can be deleted once the new runner covers the critical workflows.
- The project will gain slower but higher-confidence tests for the behaviours most likely to differ between mocks and Obsidian itself.