System Architecture

Overview of the collabrains.eu infrastructure architecture and how components interact.

High-Level Architecture

┌─────────────────────────────────────────────────────────────┐
│                       Internet / Users                      │
└────────────────────┬────────────────────────────────────────┘
                     │
          ┌──────────▼──────────┐
          │    Hetzner VPS      │
          │ (FSN1, 8-core, 16GB)│
          └──────────┬──────────┘
                     │
        ┌────────────▼────────────┐
        │  Traefik v3 Reverse     │
        │  Proxy (80/443/8080)    │
        │  + Let's Encrypt SSL    │
        └───┬──────────┬──────┬───┘
            │          │      │
      ┌─────▼──┐  ┌────▼─┐  ┌▼─────────┐
      │Services│  │DBs   │  │Monitoring│
      │(Docker)│  │(PG)  │  │(Prom/    │
      │        │  │      │  │ Grafana) │
      └────────┘  └──────┘  └──────────┘

Component Layers

1. Reverse Proxy Layer

Traefik v3

  • Entry points: 80 (HTTP), 443 (HTTPS), 8080 (admin dashboard)
  • Routes traffic to backend services
  • Automatic SSL via Let's Encrypt ACME
  • Middleware: gzip compression, HTTPS redirect
  • Load balancing available (not needed on a single server)
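
A minimal sketch of how these entry points and the ACME setup might be expressed as Traefik v3 static configuration flags. The flag names are real Traefik v3 options; the mounted paths, ACME email, and resolver name ("le") are placeholders, not values from the actual deployment.

  # Traefik v3 entry points + Let's Encrypt resolver as CLI flags
  # (paths, email, and resolver name are illustrative)
  docker run -d --name traefik \
    -p 80:80 -p 443:443 -p 8080:8080 \
    -v /var/run/docker.sock:/var/run/docker.sock:ro \
    -v /data/traefik/acme:/acme \
    traefik:v3.0 \
    --providers.docker=true \
    --entrypoints.web.address=:80 \
    --entrypoints.websecure.address=:443 \
    --certificatesresolvers.le.acme.email=admin@example.com \
    --certificatesresolvers.le.acme.storage=/acme/acme.json \
    --certificatesresolvers.le.acme.tlschallenge=true \
    --api.insecure=true   # serves the dashboard on :8080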

2. Application Layer

Docker Containers (managed by systemd)

Service               Port           Stack                          Purpose
Immich                3001           Node.js + PostgreSQL           Photo management
n8n                   5678           Node.js + PostgreSQL + Redis   Workflow automation
Grist                 8089           Node.js + PostgreSQL + Redis   Spreadsheets
Authentik             9000           Django + PostgreSQL            SSO / Identity
Jitsi Meet            8000           Jitsi stack + UDP 10000        Video conferencing
Jellyfin              8096           .NET                           Media streaming
Paperless-NGX         8000           Django + PostgreSQL + Redis    Document management
Grafana               3000           Grafana + PostgreSQL           Monitoring UI
Actual Budget         3002           Node.js                        Finance tracking
Open WebUI + Ollama   8111 / 11434   Python                         LLM interface
Pocket ID             3000           Node.js + PostgreSQL           OIDC provider
Coolify               8000           Node.js                        Infrastructure management
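
As a hedged example of how one row of this table is wired to the proxy layer, the sketch below attaches Immich to Traefik with Docker labels. The label keys are standard Traefik v3 Docker-provider labels; the hostname, router name, and network are hypothetical.

  # Expose immich-server (internal port 3001) through Traefik;
  # "photos.collabrains.eu" is a hypothetical hostname
  docker run -d --name immich-server \
    --network IMMICH_NET \
    --label "traefik.enable=true" \
    --label "traefik.http.routers.immich.rule=Host(\`photos.collabrains.eu\`)" \
    --label "traefik.http.routers.immich.entrypoints=websecure" \
    --label "traefik.http.routers.immich.tls.certresolver=le" \
    --label "traefik.http.services.immich.loadbalancer.server.port=3001" \
    ghcr.io/immich-app/immich-server:release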

3. Data Layer

PostgreSQL (v14 & v16) - multiple instances, one per service deployment:

  • Authentik: 1 instance
  • Immich: 1 instance (+ VectorChord)
  • n8n: 1 instance
  • Grist: 1 instance
  • Grafana: 1 instance
  • Paperless: shared/separate instance
  • Pocket ID: 1 instance
  • Jellyfin: 1 instance (optional)

Redis (v6 & v7.4)

  • n8n: Redis 6 (queues, sessions)
  • Grist: Redis 7.4 (session store)
  • Paperless: Redis 7.4 (Celery tasks)

Storage

  • Docker volumes for persistence
  • /data/coolify/services/ for configs
  • /backups/ for daily backups
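
A few commands for poking at this layer, assuming the postgres-<SERVICE_ID> / redis-<SERVICE_ID> naming pattern used elsewhere in this document; the exact container, user, and database names are deployment-specific placeholders.

  # List database containers and persistent volumes
  docker ps --format '{{.Names}}\t{{.Image}}' | grep -E 'postgres|redis'
  docker volume ls --format '{{.Name}}'

  # Open a psql shell in one PostgreSQL instance
  # (container, user, and database names are placeholders)
  docker exec -it postgres-IMMICH_ID psql -U immich -d immich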

4. Monitoring Layer

Prometheus - scrapes metrics from:

  • Node Exporter (system metrics)
  • cAdvisor (container metrics)
  • Services (if they expose /metrics)

Grafana - visualizes Prometheus data, with dashboards for:

  • System CPU, RAM, disk
  • Container stats
  • Service-specific metrics
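
Two quick checks that the scrape pipeline is healthy. The ports are the common defaults (Node Exporter on 9100, Prometheus on 9090) and jq is assumed to be installed; verify both against the actual setup.

  # Confirm Node Exporter is serving metrics
  curl -s localhost:9100/metrics | head -n 5

  # Ask Prometheus which targets it scrapes and whether they are up
  curl -s localhost:9090/api/v1/targets | \
    jq -r '.data.activeTargets[] | "\(.labels.job) \(.health)"'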

Network Topology

Docker Networks

Each service group has its own Docker network:

Service Group 1 (Immich)
├─ immich-server
├─ immich-microservices
└─ postgres-IMMICH_ID
   (network: IMMICH_NET)

Service Group 2 (n8n)
├─ n8n-main
├─ n8n-worker
├─ n8n-task-runner
├─ postgres-N8N_ID
└─ redis-N8N_ID
   (network: N8N_NET)

Service Group 3 (Grist)
├─ grist
├─ postgres-GRIST_ID
└─ redis-GRIST_ID
   (network: GRIST_NET)

...and so on (each remaining service group is isolated on its own network)
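
At the Docker level, the isolation sketched above might look like the following; the network and container names are the placeholders from the tree, not real deployment names.

  # Create the per-service network and attach only that group's containers
  docker network create IMMICH_NET
  docker network connect IMMICH_NET immich-server
  docker network connect IMMICH_NET postgres-IMMICH_ID

  # Verify membership; containers on N8N_NET cannot reach these
  docker network inspect IMMICH_NET \
    --format '{{range .Containers}}{{.Name}} {{end}}'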

External Access

User → HTTP (80) / HTTPS (443) → Traefik
                                    ↓
                           Service Docker Network
                                    ↓
                          Application Container

Data Flow Examples

Upload Photo to Immich

User Browser
    ↓
HTTPS (Traefik)
    ↓
immich-server:3001
    ↓
PostgreSQL (with VectorChord)
    ↓
Docker Volume (immich-media)

Run Workflow in n8n

Webhook / Trigger
    ↓
n8n-main (receives event)
    ↓
Redis queue
    ↓
n8n-worker (executes)
    ↓
PostgreSQL (stores execution log)
    ↓
External APIs / Services
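
The trigger step is often an HTTP webhook. The sketch below posts to a hypothetical workflow path (n8n serves production webhooks under /webhook/); the hostname and payload are invented for illustration.

  # Fire a hypothetical n8n webhook trigger
  curl -X POST https://n8n.collabrains.eu/webhook/invoice-intake \
    -H 'Content-Type: application/json' \
    -d '{"invoice_id": 42}'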

Document Processing in Paperless

PDF Upload / Email
    ↓
paperless-consumer
    ↓
Redis task queue
    ↓
Tesseract OCR
    ↓
PostgreSQL (metadata)
    ↓
Docker Volume (documents)

Resource Allocation

CPU

  • 8-core AMD EPYC
  • No CPU limits per container (shared)
  • Burstable workloads: Immich AI, Paperless OCR

Memory

  • 16GB RAM
  • 8GB swap (for OOM protection)
  • Per-container limits can be set in docker-compose.yml
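
As a sketch of the last point, a cap can also be applied at runtime with docker update (the compose-file equivalent is mem_limit or deploy.resources.limits); the container name and values here are illustrative.

  # Cap a memory-hungry container at 2 GB RAM + 1 GB swap
  docker update --memory 2g --memory-swap 3g immich-server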

Storage

  • Root partition: / (OS + Docker)
  • Data partition: /data/coolify/ (services)
  • Backup: /backups/ (daily snapshots)

High Availability Considerations

Current Setup: Single-server (no HA)

For HA, the setup would require:

  • Multiple servers
  • Shared storage (NAS/SAN)
  • Load balancer (instead of Traefik)
  • PostgreSQL replication
  • Redis clustering

Current limitations:

  • Single point of failure (server hardware)
  • Mitigation: daily backups, Hetzner hardware support

Security Architecture

Perimeter Security

  • UFW firewall: Only ports 22, 80, 443 open
  • SSH key-based auth (no password login)
  • Fail2ban (optional)
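
A sketch of the rule set described above; apply with care over an SSH session, since enabling UFW without the port-22 rule in place locks you out.

  ufw default deny incoming
  ufw default allow outgoing
  ufw allow 22/tcp    # SSH (key-based auth only)
  ufw allow 80/tcp    # HTTP (Traefik)
  ufw allow 443/tcp   # HTTPS (Traefik)
  ufw enable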

Application Security

  • Authentik SSO for identity
  • HTTPS everywhere (Let's Encrypt)
  • Service isolation via Docker networks

Data Security

  • Database passwords in .env files
  • Redis passwords in docker-compose.yml
  • Secrets stored in environment variables
  • Daily encrypted backups (location: /backups/)
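
A minimal sketch of generating and protecting one such secret; the variable name is illustrative and the .env path depends on the service directory.

  # Generate a random database password; keep the file owner-readable only
  echo "POSTGRES_PASSWORD=$(openssl rand -hex 32)" >> .env
  chmod 600 .env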

Scaling Considerations

Vertical Scaling (current server)

  • Upgrade RAM (16GB → 32GB)
  • Add swap (8GB → 16GB)
  • Increase CPU resources

Horizontal Scaling (future)

  • Multiple PostgreSQL instances (with replication)
  • Redis cluster
  • Separate monitoring server
  • Dedicated backup server
  • Load balancer (HAProxy/Nginx)

Dependencies Graph

Traefik (reverse proxy)
  ├─ Immich
  ├─ n8n
  │  ├─ PostgreSQL
  │  └─ Redis
  ├─ Grist
  │  ├─ PostgreSQL
  │  └─ Redis
  ├─ Authentik
  │  └─ PostgreSQL
  ├─ Paperless
  │  ├─ PostgreSQL
  │  └─ Redis
  ├─ Jellyfin
  ├─ Jitsi
  ├─ Grafana
  │  └─ PostgreSQL
  ├─ Actual Budget
  ├─ Open WebUI + Ollama
  └─ Pocket ID
     └─ PostgreSQL

Monitoring Stack
  ├─ Prometheus
  │  ├─ Node Exporter
  │  ├─ cAdvisor
  │  └─ [service metrics]
  └─ Grafana
     └─ PostgreSQL

Backup Architecture

Services
    ↓ (Daily 2 AM UTC)
backup-collabrains.sh
    ↓
  ├─ pg_dump (PostgreSQL databases)
  ├─ tar.gz (Docker volumes)
  └─ tar.gz (configs)
    ↓
/backups/YYYY-MM-DD/
    ↓ (30-day retention)
Auto-cleanup (mtime +30)
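
A hedged reconstruction of what backup-collabrains.sh plausibly does, assembled only from the flow above. Container names, dump commands, and archive paths are placeholders (pg_dumpall stands in for the per-database pg_dump calls); only the /backups/YYYY-MM-DD layout and the mtime +30 cleanup come from this document.

  #!/usr/bin/env bash
  set -euo pipefail

  DEST="/backups/$(date +%F)"   # /backups/YYYY-MM-DD/
  mkdir -p "$DEST"

  # Dump each PostgreSQL instance (container names are placeholders)
  for c in postgres-IMMICH_ID postgres-N8N_ID postgres-GRIST_ID; do
    docker exec "$c" pg_dumpall -U postgres | gzip > "$DEST/$c.sql.gz"
  done

  # Archive Docker volumes and service configs
  tar czf "$DEST/volumes.tar.gz" -C /var/lib/docker/volumes .
  tar czf "$DEST/configs.tar.gz" -C /data/coolify/services .

  # 30-day retention: remove dated directories older than 30 days
  find /backups -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +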

Disaster Recovery RTO/RPO

Component                  RTO       RPO
Single database            30 min    24 hours
Single service             5 min     24 hours
Entire server              4 hours   24 hours
Server + off-site backup   1 day     7 days

(RTO = Recovery Time Objective, RPO = Recovery Point Objective)

Future Architecture Improvements

  1. Distributed backups: Off-site backup to external storage
  2. Monitoring alerts: Slack/email notifications
  3. Service mesh: Better inter-service communication
  4. Caching layer: Redis for frequently accessed data
  5. CDN: For static assets (photos, media)
  6. Database replication: Read replicas for scaling