06 - GITLAB CE SELF-HOSTED + RUNNER ON AWS
Objective
By the end of this file:
- Self-hosted GitLab CE runs on a dedicated EC2 instance in our VPC
- A self-hosted GitLab Runner runs on separate EC2 Spot instances
- CI/CD jobs run through the Docker executor with a shared S3 cache
- The runner is sized for 3 developers and 11 microservices
- The infrastructure is provisioned with Terraform, consistent with the existing modules
Prerequisites: files 01_infra_base.md and 02_cicd_et_services.md completed, EKS cluster operational.
Context
Migration from GitLab SaaS (gitlab.com) to self-hosted GitLab CE (gitlab.paywithnex.com), completed in March 2026. Motivations:
- GitLab SaaS Premium cost (29 USD/user/month) replaced by an EC2 instance (~88 USD/month, fixed)
- Full control over data, configuration, and runners
- No CI minutes quota
- Persistent cache between pipelines via S3
- Docker registry integrated into our own infrastructure
Deployed architecture
               Internet / Cloudflare (zone paywithnex.com)
                                    |
          +-------------------------+-------------------------+
          |                         |                         |
gitlab.paywithnex.com    registry.paywithnex.com  ssh-gitlab.paywithnex.com
  (proxied, HTTPS)          (proxied, HTTPS)         (DNS only, port 22)
          |                         |                         |
          +------------+------------+                         |
                       |                                      |
                       v                                      v
                     +------------------------------------------+
                     | EC2: GitLab CE                           |  Public subnet (10.0.1.0/24)
                     | i-0e9ac36ca2e71331c                      |  Elastic IP: 51.44.133.121
                     | t3a.xlarge (4 vCPU/16GB)                 |  200 GB gp3 EBS
                     | Ubuntu 22.04 + Omnibus                   |
                     +------------------------------------------+
                          |         |               |
                          v         v               v
                     +--------+ +-------+ +------------------+
                     | RDS PG | | Redis | | S3               |
                     | (ext.) | | (bun.)| | registry bucket  |
                     +--------+ +-------+ | backups bucket   |
                                          +------------------+

+--------------------------------+
| EC2: GitLab Runner             |  Private subnets (10.0.10.0/24, 10.0.20.0/24)
| ASG: nex-staging-gitlab-runner |  No public IP
| t3a.large Spot (2v/8GB)        |  Egress via NAT Gateway
| Amazon Linux 2023              |
| Docker executor                |
+--------------------------------+
    |
    +--> S3 cache bucket (nex-staging-gitlab-runner-cache)
    +--> gitlab.paywithnex.com (polling HTTPS, git clone)
    +--> registry.paywithnex.com (push/pull Docker images)
    +--> npm registry (pnpm install)
Main components
| Component | Instance | Type | Subnet | Details |
|---|---|---|---|---|
| GitLab CE | i-0e9ac36ca2e71331c | t3a.xlarge | Public | Omnibus, EIP 51.44.133.121 |
| GitLab Runner | ASG nex-staging-gitlab-runner | t3a.large Spot | Private | Docker executor, S3 cache |
| RDS PostgreSQL | External | db.t3.small | Data | gitlab database on the staging RDS |
| Redis | Bundled Omnibus | - | - | GitLab sessions/cache |
| Container Registry | S3-backed | - | - | Bucket nex-staging-gitlab-registry |
| Backups | S3 | - | - | Bucket nex-staging-backups/gitlab/, cron 02:00 UTC |
DNS & TLS (Cloudflare)
| Record | Target | Proxy | Usage |
|---|---|---|---|
| gitlab.paywithnex.com | 51.44.133.121 | Proxied | Web UI + GitLab API |
| registry.paywithnex.com | 51.44.133.121 | Proxied | Container Registry |
| ssh-gitlab.paywithnex.com | 51.44.133.121 | DNS only | Git SSH (port 22) |
- SSL mode: Full (strict), Cloudflare Origin CA (wildcard *.paywithnex.com, 15 years)
- TLS cert/key: AWS Secrets Manager
- SMTP: Brevo relay (smtp-relay.brevo.com:587)
Mirror from GitLab SaaS
The repository was mirrored from gitlab.com/nxpay/nex on 2026-03-18:
- 33 branches pushed
- 5 open MRs recreated manually via the API
- Remote self-hosted configured locally (git@ssh-gitlab.paywithnex.com:nxpay/nex.git)
- The old remote origin (gitlab.com) is kept during the transition
Protected branches
| Branch | Merge | Push | Force push | Pipeline required |
|---|---|---|---|---|
| main | Maintainers | Maintainers | No | Yes |
| develop | Maintainers | Maintainers | No | Yes |
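The branch rules in the table could also be applied through the GitLab REST API rather than the UI. The following is a minimal sketch, assuming the standard protected branches endpoint (POST /projects/:id/protected_branches) and access level 40 for Maintainers. The helper name protect_branch_request is hypothetical, and the sketch only builds the request instead of sending it. The "pipeline required" column is a separate project-level merge request setting and is not covered here.

```python
from urllib.parse import quote, urlencode

GITLAB_API = "https://gitlab.paywithnex.com/api/v4"  # API base URL from this document
MAINTAINER = 40  # GitLab access level for the Maintainer role


def protect_branch_request(project_path: str, branch: str) -> tuple[str, str]:
    """Return (url, form-encoded body) for a branch protection request.

    Hypothetical helper: it only builds the request so it can be reviewed
    before being sent with an admin token.
    """
    project_id = quote(project_path, safe="")  # "nxpay/nex" -> "nxpay%2Fnex"
    url = f"{GITLAB_API}/projects/{project_id}/protected_branches"
    body = urlencode({
        "name": branch,
        "push_access_level": MAINTAINER,
        "merge_access_level": MAINTAINER,
        "allow_force_push": "false",
    })
    return url, body


if __name__ == "__main__":
    for branch in ("main", "develop"):
        url, body = protect_branch_request("nxpay/nex", branch)
        print("POST", url, body)
```

The request would be sent with a PRIVATE-TOKEN header carrying an admin or maintainer token; the token itself should come from the secret store, never from the script.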
GitLab CE: Terraform module (modules/gitlab-ce/)
Files
infrastructure/terraform/modules/gitlab-ce/
  main.tf # EC2, EIP, DLM snapshots
  security_group.tf # SG (SSH, HTTP, HTTPS ingress)
  iam.tf # IAM role (S3, CloudWatch, Secrets Manager, SSM)
  s3.tf # Registry bucket
  userdata.sh # Omnibus bootstrap
  variables.tf
  outputs.tf
  README.md # Operational procedures
Omnibus configuration (via userdata)
- PostgreSQL: external connection to the staging RDS (database gitlab)
- Redis: bundled (local Omnibus)
- Container Registry: S3-backed (nex-staging-gitlab-registry)
- Backups: daily cron to nex-staging-backups/gitlab/
- CloudWatch: rails, nginx-access, nginx-error, and system logs
- EBS snapshots: DLM daily, 14-day retention
Access
- UI: https://gitlab.paywithnex.com
- SSH: git@ssh-gitlab.paywithnex.com:nxpay/nex.git
- API: https://gitlab.paywithnex.com/api/v4/
- SSM: aws ssm start-session --target i-0e9ac36ca2e71331c
- Admin PAT: Doppler paywithnex/stg → GITLAB_ADMIN_PAT (expires 2027-03-18)
SSH configuration for developers
Important: the SSH domain is ssh-gitlab.paywithnex.com, NOT gitlab.paywithnex.com.
The Cloudflare proxy (Free plan) only carries HTTP/HTTPS traffic (ports such as 80/443), so port 22 (SSH) cannot go through it. That is why a dedicated subdomain in "DNS only" mode (not proxied) points directly at the IP of the GitLab instance.
- gitlab.paywithnex.com → Cloudflare proxied → HTTPS only (web UI, API, git clone over HTTPS)
- ssh-gitlab.paywithnex.com → DNS only (not proxied) → port 22 (git push/pull over SSH)
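A common mistake after the migration is an SSH remote that still points at the proxied hostname. The sketch below shows one way to check a remote URL against the host/protocol split described above. It is illustrative only: the function name remote_matches_convention is hypothetical and is not part of any existing tooling in the repository.

```python
from urllib.parse import urlparse

WEB_HOST = "gitlab.paywithnex.com"      # Cloudflare proxied, HTTPS only
SSH_HOST = "ssh-gitlab.paywithnex.com"  # DNS only, port 22


def remote_matches_convention(remote: str) -> bool:
    """Return True if the remote uses the hostname that accepts its protocol."""
    if remote.startswith("https://"):
        return urlparse(remote).hostname == WEB_HOST
    if remote.startswith("ssh://"):
        return urlparse(remote).hostname == SSH_HOST
    # scp-like syntax: git@host:group/project.git
    if "@" in remote and ":" in remote:
        host = remote.split("@", 1)[1].split(":", 1)[0]
        return host == SSH_HOST
    return False


if __name__ == "__main__":
    for remote in (
        "git@ssh-gitlab.paywithnex.com:nxpay/nex.git",
        "git@gitlab.paywithnex.com:nxpay/nex.git",
    ):
        print(remote, "->", "ok" if remote_matches_convention(remote) else "wrong host")
```

Feeding it the output of git remote get-url for each configured remote would flag any remote that would hang on port 22 behind the proxy.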
Each developer must add the following to ~/.ssh/config:
Host ssh-gitlab.paywithnex.com
  HostName ssh-gitlab.paywithnex.com
  User git
  # adjust to the name of your key file
  IdentityFile ~/.ssh/id_ed25519
Then clone with:
git clone git@ssh-gitlab.paywithnex.com:nxpay/nex.git
Verification:
ssh -T git@ssh-gitlab.paywithnex.com
# Expected: "Welcome to GitLab, @username!"
GitLab Runner: Terraform module (modules/gitlab-runner/)
Files
infrastructure/terraform/modules/gitlab-runner/
  main.tf # ASG, Launch Template, S3 cache bucket
  iam.tf # IAM role (S3, CloudWatch, Secrets Manager, SSM, ASG lifecycle)
  security_group.tf # Egress-only SG (HTTPS, HTTP, DNS)
  userdata.sh # Bootstrap Docker + GitLab Runner + CloudWatch
  variables.tf
  outputs.tf
Deployed configuration (staging)
| Parameter | Value |
|---|---|
| ASG | nex-staging-gitlab-runner |
| Instance type | t3a.large (2 vCPU, 8 GB RAM) |
| Mode | 100% Spot (on_demand_base_capacity = 0) |
| Spot types | t3a.large, t3.large |
| Min/Max/Desired | 1 / 3 / 1 |
| Concurrent jobs | 2 per instance |
| AMI | Amazon Linux 2023 |
| Subnet | Private (10.0.10.0/24, 10.0.20.0/24) |
| Volume | 50 GB gp3 |
| Cache | S3 nex-staging-gitlab-runner-cache (14-day expiration) |
| Scaling | Target tracking CPU 60% |
Note: the runner is 100% Spot because the on-demand vCPU limit (16) is already consumed by EKS (5x t3a.medium = 10) + GitLab CE (t3a.xlarge = 4) + headroom. On a Spot interruption, the graceful shutdown unregisters the runner and the ASG automatically launches a replacement.
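The quota reasoning in the note can be checked with simple arithmetic. The sketch below assumes 2 vCPUs per t3a.medium and t3a.large and 4 per t3a.xlarge (the published sizes for these types); the function names are illustrative. It also derives how many jobs can run in parallel across the ASG range.

```python
"""Sizing arithmetic for the runner fleet (illustrative sketch)."""

ON_DEMAND_VCPU_LIMIT = 16  # account limit quoted in this document

# vCPUs already consumed by on-demand instances
ON_DEMAND_USAGE = {
    "EKS nodes (5x t3a.medium)": 5 * 2,
    "GitLab CE (1x t3a.xlarge)": 1 * 4,
}

RUNNER_VCPUS = 2      # one t3a.large runner instance
CONCURRENT_JOBS = 2   # jobs per runner instance
ASG_MIN, ASG_MAX = 1, 3


def remaining_on_demand_vcpus() -> int:
    """vCPUs left under the on-demand limit."""
    return ON_DEMAND_VCPU_LIMIT - sum(ON_DEMAND_USAGE.values())


def runners_fit_on_demand(instances: int) -> bool:
    """Would this many runner instances fit in the remaining on-demand quota?"""
    return instances * RUNNER_VCPUS <= remaining_on_demand_vcpus()


def parallel_jobs(instances: int) -> int:
    """Jobs that can run at once across the fleet."""
    return instances * CONCURRENT_JOBS


if __name__ == "__main__":
    print("remaining on-demand vCPUs:", remaining_on_demand_vcpus())
    print("max fleet fits on-demand:", runners_fit_on_demand(ASG_MAX))
    print("parallel jobs (min..max):", parallel_jobs(ASG_MIN), "..", parallel_jobs(ASG_MAX))
```

With these numbers only 2 on-demand vCPUs remain: a single runner would use them all, and a scaled-out fleet of 3 would not fit, which is consistent with the choice of Spot capacity for the runner.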
Security Group
- Ingress: none (the runner polls GitLab, not the other way around)
- Egress: HTTPS (443), HTTP (80), DNS (53 TCP/UDP) to 0.0.0.0/0
IAM
Role nex-staging-gitlab-runner with:
- gitlab-runner-s3-cache: access to the S3 cache bucket
- gitlab-runner-logs: write access to CloudWatch /nex/gitlab-runner/*
- gitlab-runner-secrets: read access to Secrets Manager nex/gitlab-runner/*
- gitlab-runner-asg-lifecycle: completion of ASG lifecycle hooks
- AmazonSSMManagedInstanceCore: SSM access (no SSH)
Runner configuration (/etc/gitlab-runner/config.toml)
concurrent = 2
check_interval = 0
shutdown_timeout = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "nex-aws-runner-{hostname}"
url = "https://gitlab.paywithnex.com"
token = "{from-secrets-manager}"
executor = "docker"
[runners.docker]
image = "node:20-alpine"
privileged = true
pull_policy = ["if-not-present"]
volumes = ["/cache", "/var/run/docker.sock:/var/run/docker.sock"]
memory = "6g"
cpus = "1.5"
[runners.cache]
Type = "s3"
Shared = true
[runners.cache.s3]
BucketName = "nex-staging-gitlab-runner-cache"
BucketLocation = "eu-west-3"
Notes:
- privileged = true is required for Docker-in-Docker (building images inside CI jobs)
- Mounting docker.sock avoids the docker:dind service (faster)
- pull_policy: if-not-present keeps images cached locally on the instance
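The concurrent, memory, and cpus values in config.toml interact with the instance size. The sketch below assumes that memory and cpus under [runners.docker] apply to each job container separately, and that the host is a t3a.large (8 GB, 2 vCPU) as stated in this document. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DockerLimits:
    concurrent: int    # top-level `concurrent`
    memory_gb: float   # [runners.docker] memory, assumed per job container
    cpus: float        # [runners.docker] cpus, assumed per job container


def worst_case(limits: DockerLimits) -> tuple[float, float]:
    """Total (memory GB, CPUs) if every concurrent job reaches its limit."""
    return limits.concurrent * limits.memory_gb, limits.concurrent * limits.cpus


def oversubscribed(limits: DockerLimits, host_memory_gb: float, host_vcpus: float) -> bool:
    """True when the worst case exceeds what the host can provide."""
    memory, cpus = worst_case(limits)
    return memory > host_memory_gb or cpus > host_vcpus


if __name__ == "__main__":
    deployed = DockerLimits(concurrent=2, memory_gb=6, cpus=1.5)
    print("worst case:", worst_case(deployed))
    print("oversubscribed on t3a.large:", oversubscribed(deployed, 8, 2))
```

Under that assumption, two jobs that both reach their limits could ask for up to 12 GB and 3 CPUs on an 8 GB / 2 vCPU instance, so the limits act as per-job ceilings rather than a guarantee that both jobs fit at once.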
Registration token
- Secret: nex/gitlab-runner/registration-token in AWS Secrets Manager
- ARN: arn:aws:secretsmanager:eu-west-3:009001720821:secret:nex/gitlab-runner/registration-token-Hd9rhS
- The token is generated from GitLab CE (Admin > CI/CD > Runners > New instance runner)
- It must NEVER appear in the Terraform code or in the repository
Graceful Shutdown
- Lifecycle hook termination-wait: 600s timeout on EC2_INSTANCE_TERMINATING
- Spot interruption handler: polls the metadata endpoint every 5s
- When termination is detected:
  - gitlab-runner stop: stops accepting new jobs
  - Waits for running jobs to finish (max 5 min)
  - gitlab-runner unregister --all-runners
  - Sends CONTINUE to the lifecycle hook
- Docker cleanup: systemd timer every 6h (prune images/volumes)
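The shutdown sequence can be sketched as follows. This is an illustration of the steps listed here, not the handler shipped in userdata.sh. It assumes IMDSv2 and the spot/instance-action metadata path for the interruption notice, and the aws autoscaling complete-lifecycle-action command for the hook. The jobs_running callable is hypothetical: how running jobs are detected is left to the caller.

```python
"""Sketch of a Spot interruption handler (illustrative, not the deployed script)."""
import subprocess
import time
import urllib.request

IMDS = "http://169.254.169.254/latest"  # EC2 instance metadata service
ASG_NAME = "nex-staging-gitlab-runner"
HOOK_NAME = "termination-wait"
REGION = "eu-west-3"
POLL_SECONDS = 5
MAX_JOB_WAIT_SECONDS = 300  # "max 5 min"


def spot_interruption_pending(timeout: float = 2.0) -> bool:
    """Return True once the metadata service publishes an interruption notice."""
    try:
        token_request = urllib.request.Request(
            f"{IMDS}/api/token",
            method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
        )
        with urllib.request.urlopen(token_request, timeout=timeout) as resp:
            token = resp.read().decode()
        notice_request = urllib.request.Request(
            f"{IMDS}/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with urllib.request.urlopen(notice_request, timeout=timeout):
            return True  # HTTP 200: a notice exists
    except OSError:
        return False  # HTTP 404 (no notice) or metadata service unreachable


def drain(instance_id, jobs_running, run=subprocess.run, sleep=time.sleep) -> None:
    """Run the shutdown steps in the documented order."""
    run(["gitlab-runner", "stop"], check=False)
    waited = 0
    while jobs_running() and waited < MAX_JOB_WAIT_SECONDS:
        sleep(POLL_SECONDS)
        waited += POLL_SECONDS
    run(["gitlab-runner", "unregister", "--all-runners"], check=False)
    run([
        "aws", "autoscaling", "complete-lifecycle-action",
        "--lifecycle-hook-name", HOOK_NAME,
        "--auto-scaling-group-name", ASG_NAME,
        "--lifecycle-action-result", "CONTINUE",
        "--instance-id", instance_id,
        "--region", REGION,
    ], check=False)


def watch(instance_id, jobs_running) -> None:
    """Poll every POLL_SECONDS and drain once an interruption is announced."""
    while not spot_interruption_pending():
        time.sleep(POLL_SECONDS)
    drain(instance_id, jobs_running)
```

Passing run and sleep as parameters keeps the ordering testable without touching a real runner. The 300s job wait stays below the 600s lifecycle hook timeout, which leaves time for the unregister and CONTINUE steps.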
Observability
| Log Group | Source | Retention |
|---|---|---|
| /nex/gitlab-runner/runner | /var/log/gitlab-runner/runner.log | 30 days |
| /nex/gitlab-runner/system | /var/log/messages | 14 days |
Staging instantiation (environments/staging/main.tf)
module "gitlab_runner" {
  source = "../../modules/gitlab-runner"
  environment = local.environment
  vpc_id = module.vpc.vpc_id
  private_subnet_ids = module.vpc.private_subnet_ids
  runner_token_secret_arn = "arn:aws:secretsmanager:${var.aws_region}:009001720821:secret:nex/gitlab-runner/registration-token-Hd9rhS"
  instance_type = "t3a.large"
  spot_types = ["t3a.large", "t3.large"]
  node_min = 1
  node_max = 3
  node_desired = 1
  concurrent_jobs = 2
  cache_expiry_days = 14
  root_volume_size = 50
}
CI/CD pipeline
Runner tags
All heavy jobs in infrastructure/cicd/_templates.yml carry the self-hosted tag:
- .install-base: pnpm install
- .test-base: unit/integration tests
- .build-base: Docker build + push to the GitLab Registry
- .security-base: Trivy security scanning
- .deploy-k8s-base: kubectl deployments
S3 cache
cache:
  key:
    files: [pnpm-lock.yaml]
  paths: [.pnpm-store/]
  policy: pull-push # use pull for test jobs
The key is derived from the lockfile: the cache is shared across all branches as long as the lockfile does not change.
Docker build
The runner uses the host's Docker socket (/var/run/docker.sock) instead of docker:dind. Advantages:
- The node:20-alpine image is downloaded only once (local cache)
- Intermediate Docker layers are cached on the EBS volume
- Faster than DinD
Validation
# Check the runner instance
aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=*gitlab-runner*" \
  --query "Reservations[].Instances[].{ID:InstanceId,State:State.Name,Type:InstanceType}" \
  --region eu-west-3
# Check the runner in GitLab
# https://gitlab.paywithnex.com/admin/runners
# The runner "nex-aws-runner-*" must be "online"
# Check the S3 cache
aws s3 ls s3://nex-staging-gitlab-runner-cache/
# Check the CloudWatch log groups
aws logs describe-log-groups --log-group-name-prefix /nex/gitlab-runner/ --region eu-west-3
# Recycle the runner (forces a fresh bootstrap with the latest userdata)
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name nex-staging-gitlab-runner \
  --min-size 0 --desired-capacity 0 --region eu-west-3
# Wait 10s, then:
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name nex-staging-gitlab-runner \
  --min-size 1 --desired-capacity 1 --region eu-west-3
Cost estimate
| Resource | Specification | Estimated monthly cost |
|---|---|---|
| EC2 GitLab CE (on-demand) | 1x t3a.xlarge, 24/7 | ~88 USD |
| EC2 Runner (Spot) | 1x t3a.large, 24/7 | ~20 USD |
| EC2 Runner Spot (scale-up) | 0-2x t3a.large, ~30% of the time | ~10 USD |
| EBS | 200 GB (GitLab) + 50 GB (runner) gp3 | ~5 USD |
| S3 | Registry + cache + backups | ~1 USD |
| NAT Gateway | Shared with EKS | 0 USD (already paid) |
| CloudWatch Logs | ~1 GB/month | < 1 USD |
| Total | | ~125 USD/month |
Comparison: GitLab Premium SaaS = 29 USD/user/month × 6 devs = 174 USD/month, with CI minute limits and no persistent cache.
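The totals in this comparison can be reproduced with a few lines of arithmetic. The sketch below uses only the figures from the cost table; the "< 1 USD" CloudWatch line is counted as 1 USD (an upper bound), and the break-even helper is an illustrative addition.

```python
"""Reproduce the monthly cost comparison (figures taken from the cost table)."""

SELF_HOSTED_USD = {
    "EC2 GitLab CE (on-demand)": 88,
    "EC2 Runner (Spot)": 20,
    "EC2 Runner Spot (scale-up)": 10,
    "EBS": 5,
    "S3": 1,
    "NAT Gateway": 0,
    "CloudWatch Logs": 1,  # upper bound for "< 1 USD"
}
SAAS_USD_PER_USER = 29


def self_hosted_total() -> int:
    """Fixed monthly cost of the self-hosted setup."""
    return sum(SELF_HOSTED_USD.values())


def saas_total(users: int) -> int:
    """Monthly GitLab Premium SaaS cost for a given number of seats."""
    return users * SAAS_USD_PER_USER


def break_even_users() -> int:
    """Smallest seat count at which SaaS costs more than self-hosting."""
    return self_hosted_total() // SAAS_USD_PER_USER + 1


if __name__ == "__main__":
    print("self-hosted:", self_hosted_total(), "USD/month")
    print("SaaS, 6 seats:", saas_total(6), "USD/month")
    print("break-even seats:", break_even_users())
```

On these estimates the self-hosted setup costs 125 USD/month against 174 USD/month for 6 SaaS seats, and SaaS becomes the more expensive option from 5 seats upward.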
Constraints and decisions
| Decision | Rationale |
|---|---|
| Self-hosted GitLab CE | Fixed cost vs per-user cost, full control, no CI limits |
| Docker executor, not K8s | Simpler for Node.js monorepo builds |
| GitLab Registry, not ECR | Consistent with the pipeline, integrated into GitLab CE |
| 100% Spot for the runner | On-demand vCPU limit (16) already consumed by EKS + GitLab CE |
| t3a.large (8 GB) for the runner | pnpm install + turbo build ~4-5 GB, headroom for Docker |
| t3a.xlarge (16 GB) for GitLab | Omnibus + PostgreSQL client + Registry need RAM |
| concurrent = 2 | 2 parallel jobs per instance, good performance/resource ratio on 8 GB |
| S3 cache with IAM auth | No access keys, automatic rotation via the instance role |
| Private subnets (runner), no SSH | SSM Session Manager only |
| Cloudflare Origin CA | Wildcard *.paywithnex.com, 15 years, SSL Full (strict) |
| External RDS for GitLab | Reuses the existing staging RDS, backups managed by AWS |
| Brevo SMTP | Mail relay for GitLab notifications (port 587) |