System Architecture
Overview of the collabrains.eu infrastructure architecture and how components interact.
High-Level Architecture
┌─────────────────────────────────────────────────────────────┐
│ Internet / Users │
└────────────────────┬────────────────────────────────────────┘
│
┌──────────▼──────────┐
│ Hetzner VPS │
│ (FSN1, 8-core, 16GB)│
└──────────┬──────────┘
│
┌────────────▼────────────┐
│ Traefik v3 Reverse │
│ Proxy (80/443/8080) │
│ + Let's Encrypt SSL │
└───┬──────────┬──────┬───┘
│ │ │
┌─────▼──┐ ┌────▼─┐ ┌▼─────────┐
│Services│ │DBs │ │Monitoring│
│(Docker)│ │(PG) │ │(Prom/ │
│ │ │ │ │ Grafana) │
└────────┘ └──────┘ └──────────┘
Component Layers
1. Reverse Proxy Layer
Traefik v3
- Entry points: 80 (HTTP), 443 (HTTPS), 8080 (admin)
- Routes traffic to backend services
- Automatic SSL via Let's Encrypt ACME
- Middleware: gzip compression, HTTPS redirect
- Load balancing (not needed for a single server)
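A minimal sketch of a Traefik v3 static configuration matching the entry points above. The file path, resolver name, and contact email are assumptions for illustration, not the live config.

```shell
# Write a hypothetical traefik.yml to a throwaway location.
mkdir -p /tmp/traefik-demo
cat > /tmp/traefik-demo/traefik.yml <<'EOF'
entryPoints:
  web:
    address: ":80"
    http:
      redirections:            # HTTPS-redirect middleware at the entry point
        entryPoint:
          to: websecure
          scheme: https
  websecure:
    address: ":443"
  traefik:
    address: ":8080"           # admin / dashboard
certificatesResolvers:
  letsencrypt:
    acme:
      email: admin@collabrains.eu     # assumed contact address
      storage: /letsencrypt/acme.json
      httpChallenge:
        entryPoint: web
EOF
echo "wrote /tmp/traefik-demo/traefik.yml"
```

With this shape, the HTTP→HTTPS redirect happens once at the entry point instead of per-router.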
2. Application Layer
Docker Containers (managed by systemd)
| Service | Port | Stack | Purpose |
|---|---|---|---|
| Immich | 3001 | Node.js + PostgreSQL | Photo management |
| n8n | 5678 | Node.js + PostgreSQL + Redis | Workflow automation |
| Grist | 8089 | Node.js + PostgreSQL + Redis | Spreadsheets |
| Authentik | 9000 | Django + PostgreSQL | SSO / Identity |
| Jitsi Meet | 8000 | Jitsi stack + UDP 10000 | Video conferencing |
| Jellyfin | 8096 | .NET | Media streaming |
| Paperless-NGX | 8000 | Django + PostgreSQL + Redis | Document management |
| Grafana | 3000 | Grafana + PostgreSQL | Monitoring UI |
| Actual Budget | 3002 | Node.js | Finance tracking |
| Open WebUI + Ollama | 8111 / 11434 | Python | LLM interface |
| Pocket ID | 3000 | Node.js + PostgreSQL | OIDC provider |
| Coolify | 8000 | Node.js | Infrastructure management |
3. Data Layer
PostgreSQL (v14 & v16), multiple instances per deployment pattern:
- Authentik: 1 instance
- Immich: 1 instance (+ VectorChord)
- n8n: 1 instance
- Grist: 1 instance
- Grafana: 1 instance
- Paperless: shared/separate instance
- Pocket ID: 1 instance
- Jellyfin: 1 instance (optional)
Redis (v6 & v7.4):
- n8n: Redis 6 (queues, sessions)
- Grist: Redis 7.4 (session store)
- Paperless: Redis 7.4 (Celery tasks)
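A hedged example of how one of these per-service Redis instances could be declared in compose, with the password supplied via environment (service, volume, and network names are illustrative, not the live config):

```shell
# Hypothetical compose fragment for the n8n queue Redis.
cat > /tmp/redis-n8n.yml <<'EOF'
services:
  redis-n8n:
    image: redis:6-alpine
    # requirepass keeps the instance from being open on the Docker network
    command: ["redis-server", "--requirepass", "${REDIS_PASSWORD}"]
    volumes:
      - redis-n8n-data:/data
    networks:
      - n8n-net
volumes:
  redis-n8n-data:
networks:
  n8n-net:
EOF
echo "wrote /tmp/redis-n8n.yml"
```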
Storage
- Docker volumes for persistence
- /data/coolify/services/ for configs
- /backups/ for daily backups
4. Monitoring Layer
Prometheus scrapes metrics from:
- Node Exporter (system metrics)
- cAdvisor (container metrics)
- Services (if they expose /metrics)
Grafana visualizes Prometheus data, with dashboards for:
- System CPU, RAM, disk
- Container stats
- Service-specific metrics
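The scrape targets above map naturally onto a small `prometheus.yml`. This is a sketch: the job names and container hostnames are assumptions, and the ports are the exporters' defaults (9100 for Node Exporter, 8080 for cAdvisor).

```shell
cat > /tmp/prometheus.yml <<'EOF'
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: node            # Node Exporter: system metrics
    static_configs:
      - targets: ["node-exporter:9100"]
  - job_name: cadvisor        # cAdvisor: container metrics
    static_configs:
      - targets: ["cadvisor:8080"]
  - job_name: traefik         # Traefik exposes /metrics when enabled
    static_configs:
      - targets: ["traefik:8080"]
EOF
echo "wrote /tmp/prometheus.yml"
```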
Network Topology
Docker Networks
Each service group has its own Docker network:
Service Group 1 (Immich)
├─ immich-server
├─ immich-microservices
└─ postgres-IMMICH_ID
(network: IMMICH_NET)
Service Group 2 (n8n)
├─ n8n-main
├─ n8n-worker
├─ n8n-task-runner
├─ postgres-N8N_ID
└─ redis-N8N_ID
(network: N8N_NET)
Service Group 3 (Grist)
├─ grist
├─ postgres-GRIST_ID
└─ redis-GRIST_ID
(network: GRIST_NET)
...etc (each service isolated)
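The per-group isolation above could be established with one bridge network per service group. The snippet prints (and saves for review) the commands rather than executing them; the network names are placeholders, since the actual IDs are generated by the deployment tooling.

```shell
# Emit one "docker network create" command per service group.
for net in immich-net n8n-net grist-net; do
  echo "docker network create --driver bridge $net"
done | tee /tmp/net-create-cmds.txt
```

Containers attached only to their own bridge network cannot reach another group's database or cache, which is what gives each group its isolation.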
External Access
User → HTTP (80) / HTTPS (443) → Traefik
↓
Service Docker Network
↓
Application Container
Data Flow Examples
Upload Photo to Immich
User Browser
↓
HTTPS (Traefik)
↓
immich-server:3001
↓
PostgreSQL (with VectorChord)
↓
Docker Volume (immich-media)
Run Workflow in n8n
Webhook / Trigger
↓
n8n-main (receives event)
↓
Redis queue
↓
n8n-worker (executes)
↓
PostgreSQL (stores execution log)
↓
External APIs / Services
Document Processing in Paperless
PDF Upload / Email
↓
paperless-consumer
↓
Redis task queue
↓
Tesseract OCR
↓
PostgreSQL (metadata)
↓
Docker Volume (documents)
Resource Allocation
CPU
- 8-core AMD EPYC
- No CPU limits per container (shared)
- Burstable workloads: Immich AI, Paperless OCR
Memory
- 16GB RAM
- 8GB swap (for OOM protection)
- Per-container limits can be set in docker-compose.yml
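As a hedged illustration of the per-container limits mentioned above, a compose fragment could cap a memory-hungry service like this (the 4g/6g values are examples, not the deployed settings):

```shell
cat > /tmp/immich-limits.yml <<'EOF'
services:
  immich-server:
    image: ghcr.io/immich-app/immich-server:release
    mem_limit: 4g        # hard cap; the container is OOM-killed above this
    memswap_limit: 6g    # cap including swap usage
EOF
echo "wrote /tmp/immich-limits.yml"
```

On a shared 16GB host, a cap like this trades occasional OOM kills of one service for protection of all the others.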
Storage
- Root partition: / (OS + Docker)
- Data partition: /data/coolify/ (services)
- Backup: /backups/ (daily snapshots)
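A quick disk-pressure check for the partitions listed above. It skips any mount point that does not exist locally, so the sketch stays portable; the 80% threshold is an assumption.

```shell
for mnt in / /data /backups; do
  if df -P "$mnt" >/dev/null 2>&1; then
    # Field 5 of POSIX df output is "Use%"; strip the % sign.
    pct=$(df -P "$mnt" | awk 'NR==2 {gsub("%","",$5); print $5}')
    [ "$pct" -ge 80 ] && echo "WARN: $mnt at ${pct}%" || echo "OK: $mnt at ${pct}%"
  else
    echo "SKIP: $mnt not mounted"
  fi
done | tee /tmp/disk-report.txt
```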
High Availability Considerations
Current Setup: Single-server (no HA)
For HA, this would require:
- Multiple servers
- Shared storage (NAS/SAN)
- A load balancer in front of multiple proxy instances
- PostgreSQL replication
- Redis clustering
Current limitations:
- Single point of failure (server hardware)
- Mitigation: daily backups, Hetzner hardware support
Security Architecture
Perimeter Security
- UFW firewall: Only ports 22, 80, 443 open
- SSH key-based auth (no password login)
- Fail2ban (optional)
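The firewall baseline above can be sketched as a short script. The rules are printed (and saved) for review rather than applied, since `ufw` must run as root on the host:

```shell
{
  for rule in "default deny incoming" "default allow outgoing" \
              "allow 22/tcp" "allow 80/tcp" "allow 443/tcp"; do
    echo "ufw $rule"
  done
  echo "ufw --force enable"
} | tee /tmp/ufw-rules.txt
```

Note that Jitsi's media port (UDP 10000) would also need an allow rule if video conferencing is routed directly rather than through a TURN relay.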
Application Security
- Authentik SSO for identity
- HTTPS everywhere (Let's Encrypt)
- Service isolation via Docker networks
Data Security
- Database passwords in .env files
- Redis passwords in docker-compose.yml
- Secrets stored in environment variables
- Daily encrypted backups (location: /backups/)
Scaling Considerations
Vertical Scaling (current server)
- Upgrade RAM (16GB → 32GB)
- Add swap (8GB → 16GB)
- Increase CPU resources
Horizontal Scaling (future)
- Multiple PostgreSQL instances (with replication)
- Redis cluster
- Separate monitoring server
- Dedicated backup server
- Load balancer (HAProxy/Nginx)
Dependencies Graph
Traefik (reverse proxy)
├─ Immich
├─ n8n
│ ├─ PostgreSQL
│ └─ Redis
├─ Grist
│ ├─ PostgreSQL
│ └─ Redis
├─ Authentik
│ └─ PostgreSQL
├─ Paperless
│ ├─ PostgreSQL
│ └─ Redis
├─ Jellyfin
├─ Jitsi
├─ Grafana
│ └─ PostgreSQL
├─ Actual Budget
├─ Open WebUI + Ollama
└─ Pocket ID
└─ PostgreSQL
Monitoring Stack
├─ Prometheus
│ ├─ Node Exporter
│ ├─ cAdvisor
│ └─ [service metrics]
└─ Grafana
└─ PostgreSQL
Backup Architecture
Services
↓ (Daily 2 AM UTC)
backup-collabrains.sh
↓
├─ pg_dump (PostgreSQL databases)
├─ tar.gz (Docker volumes)
└─ tar.gz (configs)
↓
/backups/YYYY-MM-DD/
↓ (30-day retention)
Auto-cleanup (mtime +30)
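The flow above can be exercised with a minimal sketch against throwaway paths under /tmp. The real backup-collabrains.sh also dumps each PostgreSQL database; that step is only hinted at in a comment, and the container/database names in it are hypothetical.

```shell
BASE=/tmp/backup-demo
CONFIGS="$BASE/configs"
DEST="$BASE/backups/$(date +%F)"      # dated directory, as in /backups/YYYY-MM-DD/
mkdir -p "$CONFIGS" "$DEST"
echo "example config" > "$CONFIGS/app.env"

# pg_dump step (hypothetical container and db names; one per service):
#   docker exec postgres-n8n pg_dump -U n8n n8n > "$DEST/n8n.sql"

# Archive configs (the real script also archives Docker volumes).
tar -czf "$DEST/configs.tar.gz" -C "$BASE" configs

# 30-day retention: remove dated backup directories older than 30 days.
find "$BASE/backups" -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +

ls "$DEST"
```

The `find -mtime +30` line matches the auto-cleanup step in the diagram; on a fresh run it deletes nothing.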
Disaster Recovery RTO/RPO
| Component | RTO | RPO |
|---|---|---|
| Single database | 30 min | 24 hours |
| Single service | 5 min | 24 hours |
| Entire server | 4 hours | 24 hours |
| Server + off-site backup | 1 day | 7 days |
(RTO = Recovery Time Objective, RPO = Recovery Point Objective)
Future Architecture Improvements
- Distributed backups: Off-site backup to external storage
- Monitoring alerts: Slack/email notifications
- Service mesh: Better inter-service communication
- Caching layer: Redis for frequently accessed data
- CDN: For static assets (photos, media)
- Database replication: Read replicas for scaling