mirror of https://github.com/funkypenguin/geek-cookbook/ synced 2025-12-29 17:41:44 +00:00

Experiment with PDF generation

Signed-off-by: David Young <davidy@funkypenguin.co.nz>
David Young
2022-08-19 16:40:53 +12:00
parent c051e0bdad
commit abf9309cb1
317 changed files with 124 additions and 546 deletions

---
title: Run paperless-ngx under Docker
description: Easily index, search, view, and archive all of your scanned dead-tree documents with Paperless NGX under Docker, now using the linuxserver image following the fork from paperless-ng to paperless-ngx!
---
# Paperless NGX
Paper is a nightmare. Environmental issues aside, there's no excuse for it in the 21st century. It takes up space, collects dust, doesn't support any form of a search feature, indexing is tedious, it's heavy and prone to damage & loss. [^1] Paperless NGX will OCR, index, and store data about your documents so they are easy to search and view, unlike that hulking metal file cabinet you have in your office.
![Paperless-ngx Screenshot](../images/paperless-ngx.png){ loading=lazy }
!!! question "What's this fork 🍴 thing about, and is it Paperless, Paperless-NG, or Paperless-NGX?"
    It's now Paperless-NGX. Paperless-ngx is a fork of paperless-ng, which itself was a fork of paperless. As I understand it, the original "forker" of paperless to paperless-ng has "gone dark" and [stopped communicating](https://github.com/jonaswinkler/paperless-ng/issues/1599), so while everyone hopes he's OK and just busy or distracted, the [community formed paperless-ngx](https://github.com/jonaswinkler/paperless-ng/issues/1632) to carry on development under a shared-responsibility model. To save some typing, though, we'll just call it "Paperless", although you'll note below that we're using the linuxserver paperless-ngx image. (Also, if you use the automated tooling in the Premix Repo, Ansible *really* doesn't like the hyphen!)

--8<-- "recipe-standard-ingredients.md"
## Preparation
### Setup data locations
We'll need a folder to store a docker-compose configuration file and an associated environment file. If you're following my filesystem layout, create `/var/data/config/paperless` (*for the config*). We'll also need `/var/data/paperless` and a few subdirectories (*for the documents and their metadata*), a directory under `/var/data/runtime/paperless` for PostgreSQL's data files, and a directory for the database backups to reside in.
```bash
# -p also creates any missing parents (e.g. /var/data/runtime/paperless)
mkdir -p /var/data/config/paperless
mkdir -p /var/data/paperless/consume
mkdir -p /var/data/paperless/data
mkdir -p /var/data/paperless/export
mkdir -p /var/data/paperless/media
mkdir -p /var/data/paperless/database-dump
mkdir -p /var/data/runtime/paperless/pgdata
```
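Docker Swarm won't create a missing bind-mount source directory for you (unlike plain `docker run -v`), so a task whose host path doesn't exist simply fails to start. A minimal pre-flight sketch, assuming the directory layout above; `check_paperless_dirs` and its optional root-prefix argument are hypothetical helpers, not part of Paperless or Docker:

```shell
#!/usr/bin/env bash
# Report any directory the paperless stack expects that doesn't exist yet.
# Optional first argument: a prefix prepended to every path (default: none),
# handy for pointing the check at a scratch copy of the layout.
# Returns 0 if everything is present, 1 otherwise.
check_paperless_dirs() {
  local root="${1:-}"
  local missing=0
  local dir
  for dir in \
    /var/data/config/paperless \
    /var/data/paperless/consume \
    /var/data/paperless/data \
    /var/data/paperless/export \
    /var/data/paperless/media \
    /var/data/paperless/database-dump \
    /var/data/runtime/paperless/pgdata; do
    if [ ! -d "${root}${dir}" ]; then
      echo "missing: ${root}${dir}"
      missing=1
    fi
  done
  return "$missing"
}
```

Run `check_paperless_dirs && echo "ready to deploy"` on the swarm node before deploying; any `missing:` line tells you which `mkdir` to re-run.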
### Create environment
To stay consistent with the other recipes, we'll create a file to store environment variables in. There's more than one service in this stack, but we'll only create one environment file, used by the webserver; the other services set their environment variables directly in the stack file.
```bash
cat << EOF > /var/data/config/paperless/paperless.env
PAPERLESS_TIME_ZONE=<timezone>
PAPERLESS_ADMIN_USER=<admin_user>
PAPERLESS_ADMIN_PASSWORD=<admin_password>
PAPERLESS_ADMIN_MAIL=<admin_email>
PAPERLESS_REDIS=redis://broker:6379
PAPERLESS_DBHOST=db
PAPERLESS_TIKA_ENABLED=1
PAPERLESS_TIKA_GOTENBERG_ENDPOINT=http://gotenberg:3000
PAPERLESS_TIKA_ENDPOINT=http://tika:9998
EOF
```
You'll need to replace some of the text in the snippet above:

* `<timezone>` - Replace with an entry from [the timezone database](https://en.wikipedia.org/wiki/List_of_tz_database_time_zones) (e.g. `America/New_York`).
* `<admin_user>` - Username of the superuser account that will be created on first run. Without this and `<admin_password>`, you won't be able to log in to Paperless.
* `<admin_password>` - Password of the superuser account above.
* `<admin_email>` - Email address of the superuser account above.
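If you'd rather not edit the file by hand, the placeholders can be filled in with `sed`, followed by a quick sanity check. This is a minimal sketch, assuming GNU `sed` (for `sed -i`, as on most Linux Docker hosts); `fill_paperless_env` is a hypothetical helper, and it expects each non-comment line to be an uppercase `KEY=value` pair, the form Docker's `env_file` reads:

```shell
#!/usr/bin/env bash
# Fill the placeholders in a paperless.env file in place, then sanity-check it.
# Arguments: env file, timezone, admin user, admin password, admin e-mail.
# Values must not contain "|", "&" or "\" (they are special to sed here).
fill_paperless_env() {
  local file="$1"
  sed -i \
    -e "s|<timezone>|$2|" \
    -e "s|<admin_user>|$3|" \
    -e "s|<admin_password>|$4|" \
    -e "s|<admin_email>|$5|" \
    "$file"

  # A leftover <placeholder> means a value was not substituted.
  if grep -q '<[a-z_]*>' "$file"; then
    echo "unfilled placeholder in $file" >&2
    return 1
  fi
  # Every non-blank, non-comment line should look like KEY=value.
  if grep -Ev '^[[:space:]]*(#|$)' "$file" | grep -qv '^[A-Z_][A-Z0-9_]*='; then
    echo "line not in KEY=value form in $file" >&2
    return 1
  fi
}
```

For example, `fill_paperless_env /var/data/config/paperless/paperless.env America/New_York admin 'a-long-password' admin@example.com`, where every value shown is illustrative and should be replaced with your own.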
### Setup Docker Swarm
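Create a Docker Swarm config file in docker-compose syntax (v3) in `/var/data/config/paperless/`, next to `paperless.env` (the stack refers to it by a relative path), something like the following example: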
--8<-- "premix-cta.md"
```yaml
version: "3.2"

services:
  broker:
    image: redis:6.0
    networks:
      - internal

  webserver:
    image: linuxserver/paperless-ngx
    env_file: paperless.env
    volumes:
      - /var/data/paperless/data:/usr/src/paperless/data
      - /var/data/paperless/media:/usr/src/paperless/media
      - /var/data/paperless/export:/usr/src/paperless/export
      - /var/data/paperless/consume:/usr/src/paperless/consume
    deploy:
      replicas: 1
      labels:
        # traefik
        - traefik.enable=true
        - traefik.docker.network=traefik_public
        # traefikv1
        - traefik.frontend.rule=Host:paperless.example.com
        - traefik.port=8000
        - traefik.frontend.auth.forward.address=http://traefik-forward-auth:4181
        - traefik.frontend.auth.forward.authResponseHeaders=X-Forwarded-User
        - traefik.frontend.auth.forward.trustForwardHeader=true
        # traefikv2
        - "traefik.http.routers.paperless.rule=Host(`paperless.example.com`)"
        - "traefik.http.routers.paperless.entrypoints=https"
        - "traefik.http.services.paperless.loadbalancer.server.port=8000"
        - "traefik.http.routers.paperless.middlewares=forward-auth"
    networks:
      - internal
      - traefik_public

  gotenberg:
    image: thecodingmachine/gotenberg
    environment:
      DISABLE_GOOGLE_CHROME: 1
    networks:
      - internal

  tika:
    image: apache/tika
    networks:
      - internal

  db:
    image: postgres:13
    volumes:
      - /var/data/runtime/paperless/pgdata:/var/lib/postgresql/data
    environment:
      POSTGRES_DB: paperless
      POSTGRES_USER: paperless
      POSTGRES_PASSWORD: paperless
    networks:
      - internal

  db-backup:
    image: postgres:13
    volumes:
      - /var/data/paperless/database-dump:/dump
      - /etc/localtime:/etc/localtime:ro
    environment:
      PGHOST: db
      PGDATABASE: paperless
      PGUSER: paperless
      PGPASSWORD: paperless
      BACKUP_NUM_KEEP: 7
      BACKUP_FREQUENCY: 1d
    # Dump the database every BACKUP_FREQUENCY, keeping the newest BACKUP_NUM_KEEP
    # dumps ("$$" is how a literal "$" is written in a compose file)
    entrypoint: |
      bash -c 'bash -s <<EOF
      trap "break;exit" SIGHUP SIGINT SIGTERM
      sleep 2m
      while /bin/true; do
        pg_dump -Fc > /dump/dump_\`date +%d-%m-%Y"_"%H_%M_%S\`.psql
        (ls -t /dump/dump*.psql|head -n $$BACKUP_NUM_KEEP;ls /dump/dump*.psql)|sort|uniq -u|xargs -r rm --
        sleep $$BACKUP_FREQUENCY
      done
      EOF'
    networks:
      - internal

networks:
  traefik_public:
    external: true
  internal:
    driver: overlay
    ipam:
      config:
        - subnet: 172.16.58.0/24
```
You'll notice that there are several items under "services" in this stack. Let's take a look at what each one does:

* broker - Redis, which Paperless uses as the message broker for its background task queue (consuming, OCRing and indexing documents)
* webserver - The UI that you will use to add and view documents, edit document metadata, and configure the application settings.
* gotenberg - Tool that converts MS Office documents, HTML, Markdown and other document types to PDF
* tika - Apache Tika, which parses Office documents and other non-PDF formats to extract their text and metadata (OCR of scanned images is done by the webserver itself)
* db - PostgreSQL database engine that stores metadata for all the documents. [^2]
* db-backup - Service that dumps the PostgreSQL database to a backup file on disk once per day, keeping the seven most recent dumps
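The rotation line in the `db-backup` entrypoint is dense. It lists the dump files newest-first (`ls -t`) and keeps the first `BACKUP_NUM_KEEP` of them, then appends the full list; `sort | uniq -u` keeps only names that appear exactly once, i.e. the older dumps, which are then deleted. The same technique as a standalone function, as a sketch: `prune_dumps` is hypothetical, it assumes GNU `xargs` (for `-r`), and it assumes filenames without whitespace, which holds for the `dump_<date>.psql` names the service writes:

```shell
#!/usr/bin/env bash
# Delete all but the newest N dump files in a directory, using the same
# "ls -t | head" plus "sort | uniq -u" trick as the db-backup service.
# Arguments: dump directory, number of dumps to keep.
prune_dumps() {
  local dir="$1" keep="$2"
  # Newest-first list truncated to $keep, followed by the full list:
  # any name printed only once is NOT among the newest $keep, so delete it.
  (ls -t "$dir"/dump*.psql 2>/dev/null | head -n "$keep"
   ls "$dir"/dump*.psql 2>/dev/null) \
    | sort | uniq -u | xargs -r rm --
}
```

Sorting by modification time rather than by name matters here: the dump names put the day first (`%d-%m-%Y`), so an alphabetical sort would not be chronological.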
## Serving
Launch the paperless stack by running `docker stack deploy paperless -c <path-to-docker-compose.yml>`. You can then log in with the admin username and password you specified in the environment file above.
Head over to the [Paperless documentation](https://paperless-ng.readthedocs.io/en/latest) to learn how to configure and use the application, then revel in the fact that you can now search all your scanned documents to your heart's content.
[^1]: Taken directly from [Paperless documentation](https://paperless-ng.readthedocs.io/en/latest)
[^2]: This particular stack configuration was chosen because it includes a "real" database in PostgreSQL versus the more lightweight SQLite database. After all, if you go to the trouble of scanning and importing a pile of documents, you want to know the database is robust enough to keep your data safe.
--8<-- "recipe-footer.md"