Experiment with PDF generation

Signed-off-by: David Young <davidy@funkypenguin.co.nz>
2025-12-29 17:41:44 +00:00 · 2022-08-19 16:40:53 +12:00
parent c051e0bdad
commit abf9309cb1
317 changed files with 124 additions and 546 deletions
--- a/docs/recipes/paperless-ng.md
+++ b/docs/recipes/paperless-ng.md
@@ -0,0 +1,182 @@
+---
+title: Run paperless-ngx under Docker
+description: Easily index, search, view, and archive all of your scanned dead-tree documents with Paperless NGX, under Docker, now using the linuxserver image since the fork from paperless-ng to paperless-ngx!
+---
+
+# Paperless NGX
+
+Paper is a nightmare. Environmental issues aside, there’s no excuse for it in the 21st century. It takes up space, collects dust, doesn’t support any form of a search feature, indexing is tedious, it’s heavy and prone to damage & loss. [^1] Paperless NGX will OCR, index, and store data about your documents so they are easy to search and view, unlike that hulking metal file cabinet you have in your office.
+
+![Paperless-ngx Screenshot](../images/paperless-ngx.png){ loading=lazy }
+
+!!! question "What's this fork 🍴 thing about, and is it Paperless, Paperless-NG, or Paperless-NGX?"
+    It's now Paperless-NGX. Paperless-ngx is a fork of paperless-ng, which itself was a fork of paperless. As I understand it, the original "forker" of paperless to paperless-ng has "gone dark" and [stopped communicating](https://github.com/jonaswinkler/paperless-ng/issues/1599), so while all are hopeful that he's OK and just busy or distracted, the [community formed paperless-ngx](https://github.com/jonaswinkler/paperless-ng/issues/1632) to carry on development under a shared responsibility model. To save some typing, though, we'll just call it "Paperless", although you'll note below that we're using the linuxserver paperless-ngx image. (Also, if you use the automated tooling in the Premix Repo, Ansible *really* doesn't like the hyphen!)
+
+--8<-- "recipe-standard-ingredients.md"
+
+## Preparation
+
+### Setup data locations
+
+We'll need a folder to store a docker-compose configuration file and an associated environment file. If you're following my filesystem layout, create `/var/data/config/paperless` (*for the config*). We'll also need to create `/var/data/paperless` and a few subdirectories (*for the documents and their metadata*), `/var/data/runtime/paperless/pgdata` (*for the PostgreSQL data files*), and a directory for the database backups to reside in.
+
+```bash
+# -p creates any missing parent directories (e.g. /var/data/runtime/paperless)
+mkdir -p /var/data/config/paperless
+mkdir -p /var/data/paperless/{consume,data,export,media,database-dump}
+mkdir -p /var/data/runtime/paperless/pgdata
+```
+
+### Create environment
+
+To stay consistent with the other recipes, we'll create a file to store environment variables in. There's more than one service in this stack, but we'll only create one environment file, which will be used by the web server (more on this later).
+
+```bash
+# Quoting 'EOF' stops the shell from expanding any $ characters in your values
+cat << 'EOF' > /var/data/config/paperless/paperless.env
+PAPERLESS_TIME_ZONE=<timezone>
+PAPERLESS_ADMIN_USER=<admin_user>
+PAPERLESS_ADMIN_PASSWORD=<admin_password>
+PAPERLESS_ADMIN_MAIL=<admin_email>
+PAPERLESS_REDIS=redis://broker:6379
+PAPERLESS_DBHOST=db
+PAPERLESS_TIKA_ENABLED=1
+PAPERLESS_TIKA_GOTENBERG_ENDPOINT=http://gotenberg:3000
+PAPERLESS_TIKA_ENDPOINT=http://tika:9998
+EOF
+```
+
+You'll need to replace some of the text in the snippet above:
+
+* `<timezone>` - Replace with an entry from [the timezone database](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) (e.g. `America/New_York`)
+* `<admin_user>` - Username of the superuser account that will be created on first run. Without this and the `<admin_password>`, you won't be able to log into Paperless.
+* `<admin_password>` - Password of the superuser account above.
+* `<admin_email>` - Email address of the superuser account above.
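+
+If you'd rather not edit the placeholders by hand, a small shell function can substitute them in one pass. This is a minimal sketch rather than part of the recipe: `fill_paperless_env` is a hypothetical helper name, it assumes GNU `sed` (for in-place `-i`), and it assumes your values contain no `|`, `&` or `\` characters, which `sed` would interpret specially.
+
+```bash
+# Hypothetical helper: replace the four placeholders in a paperless.env file.
+# Usage: fill_paperless_env <env_file> <timezone> <admin_user> <admin_password> <admin_email>
+fill_paperless_env() {
+  local env_file="$1" tz="$2" user="$3" password="$4" email="$5"
+
+  # Assumes GNU sed; "|" is the delimiter, so values must not contain it
+  sed -i \
+    -e "s|<timezone>|${tz}|" \
+    -e "s|<admin_user>|${user}|" \
+    -e "s|<admin_password>|${password}|" \
+    -e "s|<admin_email>|${email}|" \
+    "$env_file"
+
+  # Return non-zero if any <placeholder> is still present in the file
+  if grep -q '<[a-z_]*>' "$env_file"; then
+    echo "Unreplaced placeholders remain in ${env_file}" >&2
+    return 1
+  fi
+}
+```
+
+For example, `fill_paperless_env /var/data/config/paperless/paperless.env America/New_York admin "$(openssl rand -hex 16)" admin@example.com` fills in the file with a randomly generated admin password, which you can then read back out of `paperless.env`. The user name and email here are illustrative values only; swap in your own.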
+
+### Setup Docker Swarm
+
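+Create a docker swarm config file in docker-compose syntax (v3), something like the following example. Since the `env_file` entry refers to `paperless.env` by a relative path, save this file in `/var/data/config/paperless/` alongside it: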
+
+--8<-- "premix-cta.md"
+
+```yaml
+version: "3.2"
+services:
+
+  broker:
+    image: redis:6.0
+    networks:
+      - internal
+
+  webserver:
+    image: linuxserver/paperless-ngx
+    env_file: paperless.env
+    volumes:
+      - /var/data/paperless/data:/usr/src/paperless/data
+      - /var/data/paperless/media:/usr/src/paperless/media
+      - /var/data/paperless/export:/usr/src/paperless/export
+      - /var/data/paperless/consume:/usr/src/paperless/consume
+    deploy:
+      replicas: 1
+      labels:
+        # traefik
+        - traefik.enable=true
+        - traefik.docker.network=traefik_public
+
+        # traefikv1
+        - traefik.frontend.rule=Host:paperless.example.com
+        - traefik.port=8000
+        - traefik.frontend.auth.forward.address=http://traefik-forward-auth:4181
+        - traefik.frontend.auth.forward.authResponseHeaders=X-Forwarded-User
+        - traefik.frontend.auth.forward.trustForwardHeader=true
+
+        # traefikv2
+        - "traefik.http.routers.paperless.rule=Host(`paperless.example.com`)"
+        - "traefik.http.routers.paperless.entrypoints=https"
+        - "traefik.http.services.paperless.loadbalancer.server.port=8000"
+        - "traefik.http.routers.paperless.middlewares=forward-auth"
+    networks:
+      - internal
+      - traefik_public
+
+  gotenberg:
+    image: thecodingmachine/gotenberg
+    environment:
+      DISABLE_GOOGLE_CHROME: 1
+    networks:
+      - internal
+
+  tika:
+    image: apache/tika
+    networks:
+      - internal
+
+  db:
+    image: postgres:13
+    volumes:
+      - /var/data/runtime/paperless/pgdata:/var/lib/postgresql/data
+    environment:
+      POSTGRES_DB: paperless
+      POSTGRES_USER: paperless
+      POSTGRES_PASSWORD: paperless
+    networks:
+      - internal
+
+  db-backup:
+    image: postgres:13
+    volumes:
+      - /var/data/paperless/database-dump:/dump
+      - /etc/localtime:/etc/localtime:ro
+    environment:
+      PGHOST: db
+      PGDATABASE: paperless
+      PGUSER: paperless
+      PGPASSWORD: paperless
+      BACKUP_NUM_KEEP: 7
+      BACKUP_FREQUENCY: 1d
+    entrypoint: |
+      bash -c 'bash -s <<EOF
+      trap "break;exit" SIGHUP SIGINT SIGTERM
+      sleep 2m
+      while /bin/true; do
+        pg_dump -Fc > /dump/dump_\`date +%d-%m-%Y"_"%H_%M_%S\`.psql
+        (ls -t /dump/dump*.psql|head -n $$BACKUP_NUM_KEEP;ls /dump/dump*.psql)|sort|uniq -u|xargs -r rm --
+        sleep $$BACKUP_FREQUENCY
+      done
+      EOF'
+    networks:
+      - internal
+
+networks:
+  traefik_public:
+    external: true
+  internal:
+    driver: overlay
+    ipam:
+      config:
+        - subnet: 172.16.58.0/24
+
+```
+
+You'll notice that there are several items under "services" in this stack. Let's take a look at what each one does:
+
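+* broker - Redis server that Paperless uses as the message broker for its background task queue
+* webserver - The UI that you will use to add and view documents, edit document metadata, and configure the application settings. This container also runs the document consumer and performs OCR on scanned documents.
+* gotenberg - Tool that converts MS Office documents, HTML, Markdown and other document types to PDF
+* tika - Apache Tika, which extracts text and metadata from Office documents and emails so that Paperless can index them (OCR of scanned images is handled by the webserver container, not by Tika)
+* db - PostgreSQL database engine to store metadata for all the documents. [^2]
+* db-backup - Service that dumps the PostgreSQL database to a file in `/var/data/paperless/database-dump` every `BACKUP_FREQUENCY` (once per day here), keeping only the newest `BACKUP_NUM_KEEP` (7) dumps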
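+
+The retention step in `db-backup` is the least obvious line in the stack, so here's a standalone sketch that runs the same `ls | head | sort | uniq -u` pipeline against dummy files in a throwaway directory (no database required). It assumes GNU coreutils (for `touch -d`) and uses a `BACKUP_NUM_KEEP` of 3 rather than 7 to keep the demo short:
+
+```bash
+#!/usr/bin/env bash
+set -eu
+
+BACKUP_NUM_KEEP=3
+dump_dir="$(mktemp -d)"
+
+# Create five fake dumps, each with a later modification time than the last
+for i in 1 2 3 4 5; do
+  touch -d "2022-08-0${i} 00:00:00" "${dump_dir}/dump_0${i}.psql"
+done
+
+# List the newest N dumps, then every dump. After sorting, a file that
+# appears only once (uniq -u) is not among the newest N, so delete it.
+(ls -t "${dump_dir}"/dump*.psql | head -n "${BACKUP_NUM_KEEP}"; ls "${dump_dir}"/dump*.psql) \
+  | sort | uniq -u | xargs -r rm --
+
+# Only the three newest dumps (dump_03 to dump_05) should remain
+ls "${dump_dir}"
+```
+
+In the stack itself, the compose file writes `$$BACKUP_NUM_KEEP` rather than `$BACKUP_NUM_KEEP` because Compose treats a single `$` as its own variable interpolation; `$$` passes a literal `$` through to the container's shell.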
+
+## Serving
+
+Launch the paperless stack by running `docker stack deploy paperless -c <path-to-docker-compose.yml>`. You can then log in with the username and password that you specified in the environment variables file above.
+
+Head over to the [Paperless documentation](https://paperless-ng.readthedocs.io/en/latest) to see how to configure and use the application, then revel in the fact that you can now search all your scanned documents to your heart's content.
+
+[^1]: Taken directly from [Paperless documentation](https://paperless-ng.readthedocs.io/en/latest)
+[^2]: This particular stack configuration was chosen because it includes a "real" database in PostgreSQL versus the more lightweight SQLite database. After all, if you go to the trouble of scanning and importing a pile of documents, you want to know the database is robust enough to keep your data safe.
+
+--8<-- "recipe-footer.md"