System Architecture
Overview of the collabrains.eu infrastructure architecture and how components interact.
High-Level Architecture
┌─────────────────────────────────────────────────────────────┐
│ Internet / Users │
└────────────────────┬────────────────────────────────────────┘
│
┌──────────▼──────────┐
│ Hetzner VPS │
│ (FSN1, 8-core, 16GB)│
└──────────┬──────────┘
│
┌────────────▼────────────┐
│ Traefik v3 Reverse │
│ Proxy (80/443/8080) │
│ + Let's Encrypt SSL │
└───┬──────────┬──────┬───┘
│ │ │
┌─────▼──┐ ┌────▼─┐ ┌▼─────────┐
│Services│ │DBs │ │Monitoring│
│(Docker)│ │(PG) │ │(Prom/ │
│ │ │ │ │ Grafana) │
└────────┘ └──────┘ └──────────┘
Component Layers
1. Reverse Proxy Layer
Traefik v3
- Entry points: 80 (HTTP), 443 (HTTPS), 8080 (admin)
- Routes traffic to backend services
- Automatic SSL via Let's Encrypt ACME
- Middleware: gzip compression, HTTPS redirect
- Load balancing (not needed for a single server)
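A minimal sketch of a Traefik v3 static configuration matching the entry points above. The file path, resolver name, and contact email are assumptions for illustration, not the live config.

```shell
# Write a hypothetical traefik.yml to a throwaway location.
mkdir -p /tmp/traefik-demo
cat > /tmp/traefik-demo/traefik.yml <<'EOF'
entryPoints:
  web:
    address: ":80"
    http:
      redirections:            # HTTPS-redirect middleware at the entry point
        entryPoint:
          to: websecure
          scheme: https
  websecure:
    address: ":443"
  traefik:
    address: ":8080"           # admin / dashboard
certificatesResolvers:
  letsencrypt:
    acme:
      email: admin@collabrains.eu     # assumed contact address
      storage: /letsencrypt/acme.json
      httpChallenge:
        entryPoint: web
EOF
echo "wrote /tmp/traefik-demo/traefik.yml"
```

With this shape, the HTTP→HTTPS redirect happens once at the entry point instead of per-router.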
2. Application Layer
Docker Containers (managed by systemd)
| Service | Port | Stack | Purpose |
|---|---|---|---|
| Immich | 3001 | Node.js + PostgreSQL | Photo management |
| n8n | 5678 | Node.js + PostgreSQL + Redis | Workflow automation |
| Grist | 8089 | Node.js + PostgreSQL + Redis | Spreadsheets |
| Authentik | 9000 | Django + PostgreSQL | SSO / Identity |
| Jitsi Meet | 8000 | Jitsi stack + UDP 10000 | Video conferencing |
| Jellyfin | 8096 | .NET | Media streaming |
| Paperless-NGX | 8000 | Django + PostgreSQL + Redis | Document management |
| Grafana | 3000 | Grafana + PostgreSQL | Monitoring UI |
| Actual Budget | 3002 | Node.js | Finance tracking |
| Open WebUI + Ollama | 8111 / 11434 | Python | LLM interface |
| Pocket ID | 3000 | Node.js + PostgreSQL | OIDC provider |
| Coolify | 8000 | Node.js | Infrastructure management |
3. Data Layer
PostgreSQL (v14 & v16), multiple instances per deployment pattern:
- Authentik: 1 instance
- Immich: 1 instance (+ VectorChord)
- n8n: 1 instance
- Grist: 1 instance
- Grafana: 1 instance
- Paperless: shared/separate instance
- Pocket ID: 1 instance
- Jellyfin: 1 instance (optional)
Redis (v6 & v7.4):
- n8n: Redis 6 (queues, sessions)
- Grist: Redis 7.4 (session store)
- Paperless: Redis 7.4 (Celery tasks)
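A hedged example of how one of these per-service Redis instances could be declared in compose, with the password supplied via environment (service, volume, and network names are illustrative, not the live config):

```shell
# Hypothetical compose fragment for the n8n queue Redis.
cat > /tmp/redis-n8n.yml <<'EOF'
services:
  redis-n8n:
    image: redis:6-alpine
    # requirepass keeps the instance from being open on the Docker network
    command: ["redis-server", "--requirepass", "${REDIS_PASSWORD}"]
    volumes:
      - redis-n8n-data:/data
    networks:
      - n8n-net
volumes:
  redis-n8n-data:
networks:
  n8n-net:
EOF
echo "wrote /tmp/redis-n8n.yml"
```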
Storage
- Docker volumes for persistence
- /data/coolify/services/ for configs
- /backups/ for daily backups
4. Monitoring Layer
Prometheus scrapes metrics from:
- Node Exporter (system metrics)
- cAdvisor (container metrics)
- Services (if they expose /metrics)
Grafana visualizes Prometheus data, with dashboards for:
- System CPU, RAM, disk
- Container stats
- Service-specific metrics
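The scrape targets above map naturally onto a small `prometheus.yml`. This is a sketch: the job names and container hostnames are assumptions, and the ports are the exporters' defaults (9100 for Node Exporter, 8080 for cAdvisor).

```shell
cat > /tmp/prometheus.yml <<'EOF'
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: node            # Node Exporter: system metrics
    static_configs:
      - targets: ["node-exporter:9100"]
  - job_name: cadvisor        # cAdvisor: container metrics
    static_configs:
      - targets: ["cadvisor:8080"]
  - job_name: traefik         # Traefik exposes /metrics when enabled
    static_configs:
      - targets: ["traefik:8080"]
EOF
echo "wrote /tmp/prometheus.yml"
```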
Network Topology
Docker Networks
Each service group has its own Docker network:
Service Group 1 (Immich)
├─ immich-server
├─ immich-microservices
└─ postgres-IMMICH_ID
(network: IMMICH_NET)
Service Group 2 (n8n)
├─ n8n-main
├─ n8n-worker
├─ n8n-task-runner
├─ postgres-N8N_ID
└─ redis-N8N_ID
(network: N8N_NET)
Service Group 3 (Grist)
├─ grist
├─ postgres-GRIST_ID
└─ redis-GRIST_ID
(network: GRIST_NET)
...etc (each service isolated)
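The per-group isolation above could be established with one bridge network per service group. The snippet prints (and saves for review) the commands rather than executing them; the network names are placeholders, since the actual IDs are generated by the deployment tooling.

```shell
# Emit one "docker network create" command per service group.
for net in immich-net n8n-net grist-net; do
  echo "docker network create --driver bridge $net"
done | tee /tmp/net-create-cmds.txt
```

Containers attached only to their own bridge network cannot reach another group's database or cache, which is what gives each group its isolation.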
External Access
User → HTTP (80) / HTTPS (443) → Traefik
↓
Service Docker Network
↓
Application Container
Data Flow Examples
Upload Photo to Immich
User Browser
↓
HTTPS (Traefik)
↓
immich-server:3001
↓
PostgreSQL (with VectorChord)
↓
Docker Volume (immich-media)
Run Workflow in n8n
Webhook / Trigger
↓
n8n-main (receives event)
↓
Redis queue
↓
n8n-worker (executes)
↓
PostgreSQL (stores execution log)
↓
External APIs / Services
Document Processing in Paperless
PDF Upload / Email
↓
paperless-consumer
↓
Redis task queue
↓
Tesseract OCR
↓
PostgreSQL (metadata)
↓
Docker Volume (documents)
Resource Allocation
CPU
- 8-core AMD EPYC
- No CPU limits per container (shared)
- Burstable workloads: Immich AI, Paperless OCR
Memory
- 16GB RAM
- 8GB swap (for OOM protection)
- Per-container limits can be set in docker-compose.yml
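As a hedged illustration of the per-container limits mentioned above, a compose fragment could cap a memory-hungry service like this (the 4g/6g values are examples, not the deployed settings):

```shell
cat > /tmp/immich-limits.yml <<'EOF'
services:
  immich-server:
    image: ghcr.io/immich-app/immich-server:release
    mem_limit: 4g        # hard cap; the container is OOM-killed above this
    memswap_limit: 6g    # cap including swap usage
EOF
echo "wrote /tmp/immich-limits.yml"
```

On a shared 16GB host, a cap like this trades occasional OOM kills of one service for protection of all the others.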
Storage
- Root partition: / (OS + Docker)
- Data partition: /data/coolify/ (services)
- Backup: /backups/ (daily snapshots)
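A quick disk-pressure check for the partitions listed above. It skips any mount point that does not exist locally, so the sketch stays portable; the 80% threshold is an assumption.

```shell
for mnt in / /data /backups; do
  if df -P "$mnt" >/dev/null 2>&1; then
    # Field 5 of POSIX df output is "Use%"; strip the % sign.
    pct=$(df -P "$mnt" | awk 'NR==2 {gsub("%","",$5); print $5}')
    [ "$pct" -ge 80 ] && echo "WARN: $mnt at ${pct}%" || echo "OK: $mnt at ${pct}%"
  else
    echo "SKIP: $mnt not mounted"
  fi
done | tee /tmp/disk-report.txt
```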
High Availability Considerations
Current Setup: Single-server (no HA)
For HA, this would require:
- Multiple servers
- Shared storage (NAS/SAN)
- A load balancer in front of multiple proxy instances
- PostgreSQL replication
- Redis clustering
Current limitations:
- Single point of failure (server hardware)
- Mitigation: daily backups, Hetzner hardware support
Security Architecture
Perimeter Security
- UFW firewall: Only ports 22, 80, 443 open
- SSH key-based auth (no password login)
- Fail2ban (optional)
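The firewall baseline above can be sketched as a short script. The rules are printed (and saved) for review rather than applied, since `ufw` must run as root on the host:

```shell
{
  for rule in "default deny incoming" "default allow outgoing" \
              "allow 22/tcp" "allow 80/tcp" "allow 443/tcp"; do
    echo "ufw $rule"
  done
  echo "ufw --force enable"
} | tee /tmp/ufw-rules.txt
```

Note that Jitsi's media port (UDP 10000) would also need an allow rule if video conferencing is routed directly rather than through a TURN relay.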
Application Security
- Authentik SSO for identity
- HTTPS everywhere (Let's Encrypt)
- Service isolation via Docker networks
Data Security
- Database passwords in .env files
- Redis passwords in docker-compose.yml
- Secrets stored in environment variables
- Daily encrypted backups (location: /backups/)
Scaling Considerations
Vertical Scaling (current server)
- Upgrade RAM (16GB → 32GB)
- Add swap (8GB → 16GB)
- Increase CPU resources
Horizontal Scaling (future)
- Multiple PostgreSQL instances (with replication)
- Redis cluster
- Separate monitoring server
- Dedicated backup server
- Load balancer (HAProxy/Nginx)
Dependencies Graph
Traefik (reverse proxy)
├─ Immich
├─ n8n
│ ├─ PostgreSQL
│ └─ Redis
├─ Grist
│ ├─ PostgreSQL
│ └─ Redis
├─ Authentik
│ └─ PostgreSQL
├─ Paperless
│ ├─ PostgreSQL
│ └─ Redis
├─ Jellyfin
├─ Jitsi
├─ Grafana
│ └─ PostgreSQL
├─ Actual Budget
├─ Open WebUI + Ollama
└─ Pocket ID
└─ PostgreSQL
Monitoring Stack
├─ Prometheus
│ ├─ Node Exporter
│ ├─ cAdvisor
│ └─ [service metrics]
└─ Grafana
└─ PostgreSQL
Backup Architecture
Services
↓ (Daily 2 AM UTC)
backup-collabrains.sh
↓
├─ pg_dump (PostgreSQL databases)
├─ tar.gz (Docker volumes)
└─ tar.gz (configs)
↓
/backups/YYYY-MM-DD/
↓ (30-day retention)
Auto-cleanup (mtime +30)
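The flow above can be exercised with a minimal sketch against throwaway paths under /tmp. The real backup-collabrains.sh also dumps each PostgreSQL database; that step is only hinted at in a comment, and the container/database names in it are hypothetical.

```shell
BASE=/tmp/backup-demo
CONFIGS="$BASE/configs"
DEST="$BASE/backups/$(date +%F)"      # dated directory, as in /backups/YYYY-MM-DD/
mkdir -p "$CONFIGS" "$DEST"
echo "example config" > "$CONFIGS/app.env"

# pg_dump step (hypothetical container and db names; one per service):
#   docker exec postgres-n8n pg_dump -U n8n n8n > "$DEST/n8n.sql"

# Archive configs (the real script also archives Docker volumes).
tar -czf "$DEST/configs.tar.gz" -C "$BASE" configs

# 30-day retention: remove dated backup directories older than 30 days.
find "$BASE/backups" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +

ls "$DEST"
```

The `find -mtime +30` line matches the auto-cleanup step in the diagram; on a fresh run it deletes nothing.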
Disaster Recovery RTO/RPO
| Component | RTO | RPO |
|---|---|---|
| Single database | 30 min | 24 hours |
| Single service | 5 min | 24 hours |
| Entire server | 4 hours | 24 hours |
| Server + off-site backup | 1 day | 7 days |
(RTO = Recovery Time Objective, RPO = Recovery Point Objective)
Future Architecture Improvements
- Distributed backups: Off-site backup to external storage
- Monitoring alerts: Slack/email notifications
- Service mesh: Better inter-service communication
- Caching layer: Redis for frequently accessed data
- CDN: For static assets (photos, media)
- Database replication: Read replicas for scaling