refine

Overhaul ObjectStorage Replicator
2026-06-22 22:24:06 +00:00 · 2026-06-23 01:17:09 +09:00 · 2026-06-22 13:38:31 +09:00
4 changed files with 61 additions and 2 deletions
@@ -18,7 +18,7 @@ Original title: Synchronise without CouchDB

 ### Methods and implementations

-Ordinarily, local pouchDB and the remote CouchDB are synchronised by sending each missing document through several conversations in their replication protocol. However, to achieve this plan, we cannot rely on CouchDB and its protocols. This limitation is so harsh. However, Overcoming this means gaining new possibilities. After some trials, It was concluded that synchronisation could be completed even if the actions that could be performed were limited to uploading, downloading and retrieving the list. This means we can use any old-fashioned WebDAV server, and Sophisticated “Object storages” such as Self-hosted MinIO, S3, and R2 or any we like. This is realised by sharing and complementing the differences of the journal by each client. Therefore, The focus is therefore on how to identify which are the differences and send them without dynamic communication.
+Ordinarily, the local PouchDB and the remote CouchDB are synchronised by sending each missing document through several exchanges in their replication protocol. To achieve this plan, however, we cannot rely on CouchDB and its protocols. This is a harsh limitation, but overcoming it opens up new possibilities. After some trials, we concluded that synchronisation can be completed even if the only available actions are uploading, downloading, and retrieving the list. This means we can use any old-fashioned WebDAV server, or sophisticated 'object storages' such as self-hosted MinIO, S3, and R2, or any other we like. This is realised by having each client share the differences in its journal and fill in what the others are missing. The focus is therefore on how to identify the differences and send them without dynamic communication.

 All clients manage their data in PouchDB. As is probably already known, PouchDB keeps its own journal.

@@ -0,0 +1,49 @@
+## The design document of the Journal Replicator 2nd Edition
+
+### Goal
+- Build a robust and memory-efficient replication foundation that decouples the physical storage layer by leveraging the Web Streams API.
+- Maintain strict compliance with the data consistency and replication protocols of CouchDB/PouchDB.
+- Support 'Connection Strings' to easily extend compatibility to various object storages (e.g., S3 and MinIO).
+
+### Motivation
+- The original Journal Replicator used a custom queue mechanism called `Trench` to manage backpressure, which had limitations regarding memory efficiency when dealing with a massive number of files.
+- The storage operation logic was tightly coupled with `JournalSyncAbstract`, making it difficult to swap out the physical storage layer (e.g., S3 and WebDAV).
+- The transfer of revision trees (`_revisions`) conforming to PouchDB's replication protocol was implicitly managed. There was a need for a stricter, more deterministic application of document histories.
+
+### Methods and implementations
+
+#### Pipeline Construction using Web Streams API
+We replaced `Trench` with standard Web Streams APIs (`ReadableStream`, `TransformStream`, and `WritableStream`) to build the sending and receiving pipelines.
+- **Sending Pipeline**: Reads documents from the PouchDB changes stream, passes them through a compression `TransformStream`, and pipes them to an upload `WritableStream`. This enables automatic backpressure, keeping memory consumption stable even during large-scale synchronisation.
+- **Receiving Pipeline**: Processes storage file listing, downloading/decompression, and bulk application to PouchDB in a streamlined manner.
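The sending pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the replicator's actual code: `changeSource`, `packTransform`, `uploadSink`, and `sendAll` are hypothetical names, the "compression" stage only serialises each document to JSON bytes as a stand-in, and the upload is simulated with an in-memory array.

```typescript
// Hypothetical shape of a change entry; the real replicator's types may differ.
type ChangedDoc = { _id: string; _rev: string; data: string };

// Source: pull-based, so a document is produced only when downstream asks for it.
function changeSource(docs: ChangedDoc[]): ReadableStream<ChangedDoc> {
    let index = 0;
    return new ReadableStream<ChangedDoc>(
        {
            pull(controller) {
                if (index < docs.length) {
                    controller.enqueue(docs[index++]);
                } else {
                    controller.close();
                }
            },
        },
        { highWaterMark: 1 } // buffer at most one document ahead of the consumer
    );
}

// Transform: stand-in for the compression stage (serialises a document to bytes).
function packTransform(): TransformStream<ChangedDoc, Uint8Array> {
    const encoder = new TextEncoder();
    return new TransformStream<ChangedDoc, Uint8Array>({
        transform(doc, controller) {
            controller.enqueue(encoder.encode(JSON.stringify(doc)));
        },
    });
}

// Sink: stand-in for the upload stage. Because write() returns a promise,
// upstream stages are held back until the simulated upload has finished.
function uploadSink(uploaded: Uint8Array[]): WritableStream<Uint8Array> {
    return new WritableStream<Uint8Array>({
        async write(chunk) {
            await new Promise((resolve) => setTimeout(resolve, 1)); // simulated latency
            uploaded.push(chunk);
        },
    });
}

// Wires the three stages together and resolves once everything is "uploaded".
async function sendAll(docs: ChangedDoc[]): Promise<Uint8Array[]> {
    const uploaded: Uint8Array[] = [];
    await changeSource(docs).pipeThrough(packTransform()).pipeTo(uploadSink(uploaded));
    return uploaded;
}
```

The point of the sketch is that no stage manages a queue by hand: the slow sink limits how quickly the source is pulled, which is the role `Trench` previously played.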
+
+#### Decoupling the Physical Layer via IJournalStorage
+To detach the storage operations from the core synchronisation logic (`JournalSyncCore`), we introduced the `IJournalStorage` interface.
+When adding new backend storages in the future (e.g., R2 and WebDAV), developers only need to add an Adapter that implements this interface, without modifying the core replicator.
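As an illustration of this decoupling, the sketch below shows what such an interface and one adapter might look like. The method names and signatures are assumptions derived from the three operations the design relies on (upload, download, and list); they are not taken from the actual `IJournalStorage` definition, so the interface is named `IJournalStorageSketch` here. `InMemoryStorage` and `findUnseenJournals` are likewise hypothetical.

```typescript
// Hypothetical interface: names and signatures are assumptions, not the real API.
interface IJournalStorageSketch {
    upload(key: string, body: Uint8Array): Promise<void>;
    download(key: string): Promise<Uint8Array | undefined>;
    list(prefix: string): Promise<string[]>;
}

// A trivial adapter that keeps files in memory, for illustration and tests.
class InMemoryStorage implements IJournalStorageSketch {
    private files = new Map<string, Uint8Array>();

    async upload(key: string, body: Uint8Array): Promise<void> {
        this.files.set(key, body);
    }

    async download(key: string): Promise<Uint8Array | undefined> {
        return this.files.get(key);
    }

    async list(prefix: string): Promise<string[]> {
        return Array.from(this.files.keys())
            .filter((key) => key.startsWith(prefix))
            .sort();
    }
}

// Core-side logic depends only on the interface, never on a concrete backend.
async function findUnseenJournals(
    storage: IJournalStorageSketch,
    prefix: string,
    seen: Set<string>
): Promise<string[]> {
    const keys = await storage.list(prefix);
    return keys.filter((key) => !seen.has(key));
}
```

Swapping in another backend would then mean writing another class that satisfies the same interface, with `findUnseenJournals` left unchanged.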
+
+#### Strict Application of PouchDB Replication Protocols
+To synchronise precisely according to the CouchDB/PouchDB protocol, the following steps were optimised:
+1. **Transferring History**: Using `bulkGet({ revs: true })`, the replicator transfers not only the latest revision of a document but its entire history tree (`_revisions`) alongside the deletion flag (`_deleted`).
+2. **Applying History**: On the receiving end, the replicator uses `revsDiff` to identify which incoming revisions are missing locally. It then applies them using `bulkDocs(saveDocs, { new_edits: false })`.
+With `new_edits: false` specified, PouchDB integrates the received history exactly as it is, without treating the incoming revisions as new local edits. This prevents unexpected conflicts and redundant branching of the revision tree.
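The two steps above can be sketched as follows. To stay self-contained, the code does not import PouchDB; `ReplicationDb` is a reduced structural type that mirrors the shapes of PouchDB's `bulkGet`, `revsDiff`, and `bulkDocs` only as far as this example needs. `collectForTransfer` and `applyReceived` are hypothetical helper names, not functions from the replicator.

```typescript
// A document in transit, carrying its revision history and deletion flag.
type TransferDoc = {
    _id: string;
    _rev: string;
    _deleted?: boolean;
    _revisions?: { start: number; ids: string[] };
    [key: string]: unknown;
};

// Reduced structural view of the database calls used below (an assumption for
// this sketch; a PouchDB instance provides methods of these shapes).
interface ReplicationDb {
    bulkGet(opts: { docs: { id: string; rev: string }[]; revs: boolean }): Promise<{
        results: { id: string; docs: ({ ok: TransferDoc } | { error: unknown })[] }[];
    }>;
    revsDiff(diff: Record<string, string[]>): Promise<Record<string, { missing?: string[] }>>;
    bulkDocs(docs: TransferDoc[], opts: { new_edits: boolean }): Promise<unknown>;
}

// Step 1 (sending side): fetch the wanted revisions together with their history.
async function collectForTransfer(
    db: ReplicationDb,
    wanted: { id: string; rev: string }[]
): Promise<TransferDoc[]> {
    const { results } = await db.bulkGet({ docs: wanted, revs: true });
    const out: TransferDoc[] = [];
    for (const result of results) {
        for (const entry of result.docs) {
            if ("ok" in entry) out.push(entry.ok); // skip entries that report an error
        }
    }
    return out;
}

// Step 2 (receiving side): keep only revisions that are missing locally, then
// store them as-is. Returns the number of documents written.
async function applyReceived(db: ReplicationDb, incoming: TransferDoc[]): Promise<number> {
    const request: Record<string, string[]> = {};
    for (const doc of incoming) {
        const revs = request[doc._id] ?? [];
        revs.push(doc._rev);
        request[doc._id] = revs;
    }
    const diff = await db.revsDiff(request);
    const saveDocs = incoming.filter((doc) => {
        const missing = diff[doc._id]?.missing ?? [];
        return missing.some((rev) => rev === doc._rev);
    });
    if (saveDocs.length > 0) {
        await db.bulkDocs(saveDocs, { new_edits: false });
    }
    return saveDocs.length;
}
```

Filtering through `revsDiff` before calling `bulkDocs` means revisions that are already present locally are never rewritten.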
+
+#### Connection String Support
+To seamlessly connect to various physical storages, we introduced Connection Strings (e.g., `s3://accessKey:secretKey@endpoint/bucket/prefix?region=auto`).
+The connection string acts as a user-friendly configuration. Each Storage Adapter exposes `isCompatible` and `parseConnectionString` methods to verify whether it can handle the connection string; if it can, dynamic configuration overrides are applied to establish the connection.
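A connection string of the form shown above could be parsed along the following lines. This is only a sketch: the method names `isCompatible` and `parseConnectionString` come from the text, but their signatures, the `S3LikeSettings` shape, and the decision to treat the first path segment as the bucket and the remainder as the prefix are assumptions made for illustration.

```typescript
// Hypothetical result shape; the real adapter's settings object may differ.
type S3LikeSettings = {
    accessKey: string;
    secretKey: string;
    endpoint: string;
    bucket: string;
    prefix: string;
    region: string;
};

// Returns true when this (hypothetical) adapter recognises the scheme.
function isCompatible(connectionString: string): boolean {
    return connectionString.startsWith("s3://");
}

// Splits the connection string into its parts using the standard URL parser.
function parseConnectionString(connectionString: string): S3LikeSettings {
    if (!isCompatible(connectionString)) {
        throw new Error("Unsupported connection string scheme");
    }
    const url = new URL(connectionString);
    // Assumption: first path segment is the bucket, the rest is the key prefix.
    const [bucket = "", ...rest] = url.pathname.split("/").filter((part) => part !== "");
    return {
        // Credentials are percent-encoded inside a URL, so decode them.
        accessKey: decodeURIComponent(url.username),
        secretKey: decodeURIComponent(url.password),
        endpoint: url.host,
        bucket,
        prefix: rest.join("/"),
        region: url.searchParams.get("region") ?? "",
    };
}
```

Keys that contain characters such as `/` or `@` would need to be percent-encoded in the connection string for this parsing approach to work.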
+
+### Performance and Speed Characteristics
+
+By migrating from the previous `Trench` architecture to the Web Streams API and strict PouchDB protocol compliance, the replication speed characteristics have changed in the following ways:
+
+1. **Consistent Throughput via Backpressure**:
+   The `Trench` mechanism occasionally loaded too many items into memory or stalled during massive transfers. The Web Streams API applies automatic backpressure across the pipeline (Read `changes` -> Compress -> Upload). While peak burst speeds might appear slightly smoothed out, the **sustained throughput is far more stable**, preventing out-of-memory crashes on mobile devices and keeping network utilisation optimal.
+
+2. **Faster Receive-Side Application (`new_edits: false`)**:
+   In the previous version, incoming documents were sometimes evaluated as new local edits. By utilising PouchDB's `bulkDocs(saveDocs, { new_edits: false })` alongside the proper `_revisions` tree, we bypass unnecessary conflict generation and local revision hashing. This drastically **speeds up the document insertion process** on the receiving end.
+
+3. **Optimised Network Traffic**:
+   Because conflicts are resolved deterministically and revision trees are replicated exactly as they exist, the system avoids generating 'echoes' (redundant synchronisations triggered by a device misunderstanding a history tree). This reduces unnecessary background traffic significantly.
+
+### Consideration and Conclusion
+The Journal Replicator 2nd Edition achieves robust and scalable storage synchronisation through enhanced memory efficiency (via Web Streams), decoupled extensibility (via IJournalStorage), and strict protocol compliance (via `new_edits: false`).
+Moving forward, this foundation will make it much easier to officially support a wider variety of backend storages.
@@ -3,6 +3,16 @@ Since 19th July, 2025 (beta1 in 0.25.0-beta1, 13th July, 2025)

 The head note of 0.25 is now in [updates_old.md](https://github.com/vrtmrz/obsidian-livesync/blob/main/updates_old.md). 0.25 has received a lot of updates, and thankfully compatibility has been kept, so we do not need breaking changes! In other words, once it is stable enough, the next version will be v1.0.0. At least, that is my hope.

+
+## Unreleased
+
+### Improved
+
+- Overhauled the Object Storage (e.g., MinIO and S3) replication engine ('Journal Replicator 2nd Edition'). It now leverages the standard Web Streams API for a resilient, backpressure-aware architecture, reducing the memory footprint on large vaults.
+- Decoupled the physical storage logic to make it easier to add new storage backends in the future.
+- Stricter compliance with CouchDB's replication protocol (proper `_revisions` transfers with `new_edits: false`) when using Object Storage.
+- Introduced Connection String support for setup configuration.
+
 ## 0.25.77

 19th June, 2026
Author	SHA1	Message	Date
vorotamoroz	877d1b09f4	refine	2026-06-23 01:17:09 +09:00
vorotamoroz	9004c194b3	Overhaul ObjectStorage Replicator	2026-06-22 13:38:31 +09:00