Genesis Technical Runbook

For Engineers Taking Over the Genesis AI System

Classification: CONFIDENTIAL — Authorized Successors Only

Last Updated: 2026-03-14
Prepared By: THE ARCHITECT (Genesis AI System)


This document is for the technical person who has been brought in to maintain, operate, or continue development of the Genesis AI system. Read DEATH_SWITCH_PROTOCOL.md first for context on the mission and leadership situation.


QUICK START: VERIFY THE SYSTEM IS ALIVE

# 1. Check API health
curl http://35.162.205.215:8000/health

# 2. Check all 3 AI models are responding
curl http://35.162.205.215:8010/health  # Qwen3.5-397B (primary)
curl http://35.162.205.215:8011/health  # GLM-4.7 (reviewer)
curl http://35.162.205.215:8014/health  # NV-Embed-v2 (embeddings)

# 3. SSH into the server
ssh -i ~/.ssh/aws-p5en-key.pem ubuntu@35.162.205.215

# 4. Once in, check system status
cd /mnt/data/truth-si-dev-env
./SYSTEM_STATUS.sh

If steps 1-3 succeed, the system is healthy. Step 4 gives a detailed per-service breakdown.
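Checks 1 and 2 can be run as one pass with a small loop. A minimal sketch, assuming only bash and curl; the `check_endpoints` helper and the name:port labels are illustrative, not an existing script:

```shell
#!/usr/bin/env bash
# check_endpoints PROBE NAME:PORT... — run PROBE against each port and
# print OK/FAIL per endpoint. Hypothetical helper, not an existing script.
check_endpoints() {
  local probe="$1"; shift
  local rc=0 entry name port
  for entry in "$@"; do
    name="${entry%%:*}"
    port="${entry##*:}"
    if "$probe" "$port" >/dev/null 2>&1; then
      echo "OK   ${name} (${port})"
    else
      echo "FAIL ${name} (${port})"
      rc=1
    fi
  done
  return $rc
}

# Real usage against the server:
#   probe() { curl -fsS "http://35.162.205.215:${1}/health"; }
#   check_endpoints probe api:8000 qwen:8010 glm:8011 embed:8014
```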


SECTION 1: INFRASTRUCTURE OVERVIEW

Server Specifications

| Attribute | Value |
|---|---|
| Provider | AWS (Amazon Web Services) |
| Instance Type | p5en.48xlarge |
| Instance ID | i-0c37c0cd6d0c54d50 |
| Region | us-west-2 (Oregon) |
| Availability Zone | us-west-2c |
| IP Address | 35.162.205.215 |
| Lifecycle | Spot Instance |
| CPUs | 192 cores |
| RAM | 2TB |
| GPUs | 8x NVIDIA H200 (1.15TB total VRAM) |
| GPU Driver | 580.126.09 |
| CUDA | 13.0 |
| NVMe (Ephemeral) | 28TB (LOST on restart) |
| EBS Root | 6TB (PERSISTENT) |
| EBS Data | 10TB at /mnt/data (PERSISTENT) |

CRITICAL: Spot Instance Warning

The server runs on a spot instance — Amazon can reclaim it with 2 minutes of warning.

If the server goes down, see Section 8 (Disaster Recovery).

Storage Layout

| Mount | Type | Size | Contains |
|---|---|---|---|
| / | EBS gp3 | 6TB | OS, Docker, application code |
| /mnt/data | EBS gp3 | 10TB | Databases, backups, Docker data |
| /mnt/nvme | NVMe (ephemeral) | 28TB | AI model weights (re-download if lost) |

SECTION 2: ACCESSING THE SYSTEM

SSH Access

# Direct SSH
ssh -i ~/.ssh/aws-p5en-key.pem ubuntu@35.162.205.215

# If you have the genesis alias configured:
ssh genesis

SSH key location: ~/.ssh/aws-p5en-key.pem on Carter's Mac. The key is also stored in AWS Secrets Manager and is documented in the repository.
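The genesis alias mentioned above is just an entry in ~/.ssh/config. A sketch of what it would look like (hypothetical; confirm against the machine you are working from, and the keepalive setting is optional):

```
Host genesis
    HostName 35.162.205.215
    User ubuntu
    IdentityFile ~/.ssh/aws-p5en-key.pem
    ServerAliveInterval 30
```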

SSH Tunnel (for database access from your local machine)

The databases are not publicly exposed. Use an SSH tunnel:

# Start tunnel (from your local machine)
./scripts/forge-tunnel.sh restart

# Or manually:
ssh -i ~/.ssh/aws-p5en-key.pem -L 7687:localhost:7687 \
    -L 5433:localhost:5433 -L 6379:localhost:6379 \
    -L 8080:localhost:8080 ubuntu@35.162.205.215 -N -f

Once the tunnel is up, you can connect to:
- localhost:7687 → Neo4j (Bolt)
- localhost:5433 → YugabyteDB
- localhost:6379 → Redis
- localhost:8080 → Weaviate


SECTION 3: THE AI MODELS

Three Models Running Simultaneously

| Port | Model | Parameters | GPUs | Context | Role |
|---|---|---|---|---|---|
| 8010 | Qwen3.5-397B-A17B-FP8 | 397B (17B active) | 0-3 | 1M tokens | PRIMARY — code generation, reasoning |
| 8011 | GLM-4.7-FP8 | 355B (32B active) | 4-7 | 202K tokens | REVIEWER — Actor-Critic architecture |
| 8014 | NV-Embed-v2 INT8 | Embedding | 7 (shared) | 32K tokens | EMBEDDINGS — semantic search |

All models run via SGLang 0.5.9 inside Docker containers.

Model Files Location

# Model weights are on NVMe (ephemeral — re-download if server restarts)
ls /opt/dlami/nvme/models/
# Qwen3.5-397B-A17B-FP8/
# GLM-4.7-FP8/
# NV-Embed-v2/

Model Management

# Check model status via Docker
docker ps --filter name=truthsi-llm

# Check GPU usage
nvidia-smi

# Restart models if needed
bash scripts/restore-models.sh

# View model logs
docker logs truthsi-llm-primary
docker logs truthsi-llm-critic

LLM API Usage

The models expose an OpenAI-compatible API:

# Test primary model (genesis)
curl http://localhost:8010/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "genesis", "messages": [{"role": "user", "content": "Hello"}]}'

# Test embeddings
curl http://localhost:8014/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "NV-Embed-v2", "input": "test text"}'
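Request bodies like the ones above can be built once and piped to either model. A hypothetical helper, not an existing script, that emits the OpenAI-style body (it does not escape JSON-special characters, so keep the content simple):

```shell
# build_chat_request MODEL CONTENT — print a minimal OpenAI-style chat body.
# Hypothetical helper; CONTENT must not contain quotes or backslashes.
build_chat_request() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' "$1" "$2"
}

# Real usage against the primary model:
#   build_chat_request genesis "Hello" | \
#     curl -s http://localhost:8010/v1/chat/completions \
#       -H "Content-Type: application/json" -d @-
```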

LOCKED: Do Not Change Model Configuration

The LLM configuration is locked. Do not change:
- Model selection
- GPU allocation
- Context window sizes
- Port assignments
- SGLang launch parameters

The exact running commands are documented in docs/DEFINITIVE_MODEL_LAUNCH_SETTINGS.md.


SECTION 4: DOCKER SERVICES

Starting and Stopping

# View all running containers
docker ps

# Start all services
cd /mnt/data/truth-si-dev-env
docker compose up -d

# Stop all services (preserves data)
docker compose down

# Restart specific service
docker compose restart api

# View service logs
docker compose logs -f api
docker compose logs -f neo4j

Core Services

| Service | Port | Purpose |
|---|---|---|
| api | 8000 | Main FastAPI application — the brain's interface |
| ui | 3000 | Frontend web application |
| neo4j | 7474/7687 | Knowledge graph database |
| weaviate | 8080 | Vector/semantic search database |
| redis | 6379 | Cache and fast memory |
| postgres | 5432 | Legacy relational database (backup only) |
| yugabyte | 5433 | Primary relational database (YugabyteDB) |
| h2o | 54321 | AutoML machine learning platform |
| redpanda | 9092 | Event streaming (Apache Kafka compatible) |
| grafana | 3002 | Monitoring dashboard |
| prometheus | 9090 | Metrics collection |
| text2vec-transformers | 8090 | Weaviate vectorization module |
| unstructured | 8100 | Document processing (PDFs, Word, etc.) |
| langserve | 8001 | LangChain serving layer |

Service Health Check

# API health
curl http://localhost:8000/health

# Neo4j
curl http://localhost:7474

# Weaviate
curl http://localhost:8080/v1/meta

# Check all containers at once
docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"
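Note that `docker ps` only reports (healthy)/(unhealthy) for containers that define a healthcheck. If you add a new service, a compose fragment along these lines would wire one up (the service name, image, and URL here are illustrative, not part of the existing docker-compose.yml):

```yaml
services:
  example-service:
    image: example/image:latest
    ports:
      - "8123:8123"
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:8123/health"]
      interval: 30s
      timeout: 5s
      retries: 3
```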

SECTION 5: DATABASES

Neo4j (Knowledge Graph)

The most important database — contains 128,000+ indexed documents and the relationships between all ideas, sessions, and knowledge.

# Connect via browser (after SSH tunnel)
open http://localhost:7474
# Default credentials: check .env file for NEO4J_PASSWORD

# Connect via CLI
docker exec -it $(docker ps -q -f name=neo4j) cypher-shell -u neo4j -p $NEO4J_PASSWORD

# Example queries
MATCH (n:Idea) RETURN n LIMIT 10;
MATCH (n) RETURN labels(n), count(*) ORDER BY count(*) DESC;

Weaviate (Vector Database)

Semantic search — stores 4.5M+ vectors for knowledge retrieval.

# Check status
curl http://localhost:8080/v1/meta

# List collections
curl http://localhost:8080/v1/schema

Redis (Cache)

265,000+ keys. Used for session state, fast lookups, and stream processing.

# Connect
docker exec -it $(docker ps -q -f name=redis) redis-cli

# Check info
INFO keyspace
DBSIZE

YugabyteDB (Primary SQL)

The main relational database. It speaks the PostgreSQL wire protocol (which is why psql works against it), but it is NOT the legacy PostgreSQL instance: YugabyteDB listens on port 5433, PostgreSQL on 5432.

# Connect (after SSH tunnel)
psql -h localhost -p 5433 -U yugabyte -d truthsi

# Or via Docker
docker exec -it $(docker ps -q -f name=yugabyte) ysqlsh -U yugabyte -d truthsi

IMPORTANT: All new code should connect to YugabyteDB (port 5433), NOT PostgreSQL (port 5432). PostgreSQL is kept running for historical data only.
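One way to make that convention hard to miss is to centralize the connection string in one variable. A hypothetical sketch (the DATABASE_URL name is illustrative; the password comes from .env):

```shell
# Hypothetical convention: new code reads a single DATABASE_URL that points
# at YugabyteDB (5433), never the legacy PostgreSQL instance (5432).
export DATABASE_URL="postgresql://yugabyte:${YUGABYTE_PASSWORD:-changeme}@localhost:5433/truthsi"
echo "$DATABASE_URL"
```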

Environment Variables

All database passwords and API keys are in /mnt/data/truth-si-dev-env/.env.

# View configuration (KEEP SECURE — contains all credentials)
cat /mnt/data/truth-si-dev-env/.env

# The key variables:
# NEO4J_PASSWORD - Neo4j database password
# YUGABYTE_PASSWORD - YugabyteDB password
# REDIS_PASSWORD - Redis password (if set)
# ANTHROPIC_API_KEY - Claude API key

SECTION 6: SYSTEMD DAEMONS

135 systemd services are defined on Genesis. These run background processes continuously.

# List all Truth.SI daemons
systemctl list-units "truthsi-*" --all

# Check a specific daemon
systemctl status truthsi-live-master-plan

# View daemon logs
journalctl -u truthsi-live-master-plan -f

# Restart a daemon
systemctl restart truthsi-live-master-plan

# Key daemons to check first:
systemctl status truthsi-enterprise-backup
systemctl status truthsi-ebs-snapshot-manager
systemctl status truthsi-ami-snapshot
systemctl status truthsi-live-master-plan

Critical Daemons

| Daemon | Purpose | When to Check |
|---|---|---|
| truthsi-enterprise-backup | Backs up databases every 15 min | Databases need restoring |
| truthsi-ebs-snapshot-manager | Daily EBS snapshots | Checking backup health |
| truthsi-ami-snapshot | Daily AMI creation | Planning disaster recovery |
| truthsi-live-master-plan | Auto-generates priority list | LIVE_MASTER_PLAN.md is stale |
| genesis-qwen35.service | Runs primary LLM (port 8010) | AI model is down |
| genesis-glm47.service | Runs reviewer LLM (port 8011) | AI model is down |
| genesis-nv-embed.service | Runs embeddings (port 8014) | Semantic search is broken |
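New daemons should follow docs/ENTERPRISE_DAEMON_STANDARD.md. As a rough skeleton only (the unit name, script path, and settings here are illustrative, not copied from that standard):

```
[Unit]
Description=Truth.SI example daemon (illustrative)
After=network-online.target docker.service

[Service]
Type=simple
User=ubuntu
WorkingDirectory=/mnt/data/truth-si-dev-env
ExecStart=/usr/bin/python3 scripts/example_daemon.py
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
```

Install under /etc/systemd/system/truthsi-&lt;name&gt;.service, then run systemctl daemon-reload and systemctl enable --now truthsi-&lt;name&gt;. Note the standard also expects /usr/bin/python3, not a venv (see Section 12).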

SECTION 7: BACKUPS AND DATA SAFETY

What's Backed Up and Where

| Data | Backup Method | Location | Frequency |
|---|---|---|---|
| Neo4j database | Enterprise backup daemon | /mnt/data/backups/enterprise/neo4j/ | Every 15 min |
| Redis | Enterprise backup daemon | /mnt/data/backups/enterprise/redis/ | Every 15 min |
| Configuration (.env) | Enterprise backup daemon | /mnt/data/backups/enterprise/config/ | Every 15 min |
| Full EBS volumes | EBS Snapshot Manager | AWS EBS Snapshots | Daily |
| Complete system image | AMI Snapshot service | AWS AMIs | Daily |
| All backups (cloud) | R2 sync | Cloudflare R2 | Continuous |

Total cloud backup storage: 1.739 TB (verified 2026-03-13)

Verifying Backup Health

# List recent EBS snapshots
aws ec2 describe-snapshots \
  --filters "Name=tag:Project,Values=truth-si" \
  --query 'Snapshots[*].[SnapshotId,State,StartTime,VolumeSize]' \
  --output table \
  --region us-west-2

# List recent AMIs
aws ec2 describe-images \
  --owners self \
  --filters "Name=name,Values=genesis-p5en-daily*" \
  --query 'Images[*].[ImageId,Name,CreationDate]' \
  --output table \
  --region us-west-2

# Check local backup status
ls -la /mnt/data/backups/enterprise/neo4j/daily/
ls -la /mnt/data/backups/enterprise/redis/
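Because snapshots run daily, anything more than a day or two old is a red flag. A small helper to compute snapshot age from the StartTime field, hypothetical and assuming GNU date (standard on the Ubuntu host):

```shell
# snapshot_age_days START [NOW] — whole days between two ISO-8601 timestamps.
# Hypothetical helper; NOW defaults to the current UTC time.
snapshot_age_days() {
  local start="$1" now="${2:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  echo $(( ( $(date -ud "$now" +%s) - $(date -ud "$start" +%s) ) / 86400 ))
}

# e.g. flag a stale snapshot:
#   [ "$(snapshot_age_days "$start_time")" -gt 2 ] && echo "WARNING: snapshot is stale"
```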

Restoring from Backup

Neo4j (most common restore):

# 1. Stop Neo4j
docker compose stop neo4j

# 2. Find the backup
ls /mnt/data/backups/enterprise/neo4j/daily/

# 3. Extract backup
sudo tar -xzf /mnt/data/backups/enterprise/neo4j/daily/YYYYMMDD_HHMMSS/neo4j_data.tar.gz \
  -C /tmp/neo4j-restore/

# 4. Replace data directory
# (Follow docs/ENTERPRISE_BACKUP_GUIDE.md for exact steps)

# 5. Restart Neo4j
docker compose start neo4j
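Step 2 above means picking the newest timestamped directory. Because the names follow YYYYMMDD_HHMMSS, a lexical sort is also chronological. A sketch (the helper name is illustrative, not an existing script):

```shell
# latest_backup DIR — print the newest YYYYMMDD_HHMMSS entry inside DIR.
# Hypothetical helper; relies on the timestamp naming sorting lexically.
latest_backup() {
  ls -1 "$1" | grep -E '^[0-9]{8}_[0-9]{6}$' | sort | tail -n 1
}

# Real usage:
#   latest_backup /mnt/data/backups/enterprise/neo4j/daily/
```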

SECTION 8: DISASTER RECOVERY

Scenario A: Server Temporarily Down (Spot Interruption)

If the spot instance is interrupted, the auto-recovery system should relaunch it automatically.

Manual recovery if auto-restart fails:

# From your local machine with AWS credentials:

# 1. Check if instance is terminated or stopped
aws ec2 describe-instances --instance-ids i-0c37c0cd6d0c54d50 --region us-west-2

# 2. Launch from latest AMI using Launch Template
aws ec2 run-instances \
  --launch-template LaunchTemplateId=lt-05d7120dc0ae12630,Version=11 \
  --region us-west-2

# 3. Once new instance is up, re-attach EBS volumes
# vol-07033d971a6da1e34 (6TB root)
# vol-0149c0448946ab2bc (10TB data)

# 4. SSH in and restore NVMe models
ssh -i ~/.ssh/aws-p5en-key.pem ubuntu@<NEW_IP>
bash /mnt/data/truth-si-dev-env/scripts/nvme-cache-restore.sh

Scenario B: Complete Server Loss (EBS Volumes Survive)

# 1. Launch new instance from latest AMI
aws ec2 describe-images --owners self \
  --filters "Name=name,Values=genesis-p5en-daily*" \
  --query 'sort_by(Images, &CreationDate)[-1].[ImageId,Name]' \
  --output text --region us-west-2

# 2. Launch from that AMI
aws ec2 run-instances --image-id ami-XXXXXXXX \
  --instance-type p5en.48xlarge \
  --region us-west-2 \
  --subnet-id subnet-XXXXXXXX \
  --security-group-ids sg-XXXXXXXX

# 3. Attach the EBS data volume
aws ec2 attach-volume \
  --volume-id vol-0149c0448946ab2bc \
  --instance-id i-XXXXXXXX \
  --device /dev/sdf \
  --region us-west-2

Scenario C: Complete Loss (Restore from Snapshots)

Follow the full recovery guide in docs/AWS_P5EN_BACKUP_VERIFICATION.md.


SECTION 9: THE CODEBASE

Repository Structure

/mnt/data/truth-si-dev-env/
├── api/                    # Main FastAPI application
│   ├── main.py            # Entry point — registers all routers
│   ├── routers/           # 424 API routers (357 currently orphaned)
│   ├── lib/               # 397,906+ LOC of library code
│   └── layers/            # 9-layer OMEGA orchestration system
├── scripts/               # Automation scripts and daemons
├── docs/                  # Documentation (this file is here)
├── planning/              # Plans, ideas, priorities
│   ├── THE_PLAN.md       # MASTER ROADMAP — read this
│   └── WHAT_TO_DO_NEXT.md # Current priorities
├── sessions/              # Session closeout documents
├── generated/             # AI-generated code output
├── genesis-website/       # Public website (Cloudflare Pages)
├── terraform/             # AWS infrastructure as code
├── k8s/                   # Kubernetes manifests
├── docker-compose.yml     # All 17 Docker services
├── .env                   # All credentials (keep secure)
├── CLAUDE.md              # Master methodology (Carter's operating system)
└── LIVE_MASTER_PLAN.md   # Auto-generated system status (every 2 min)

The Most Important Files to Read

  1. CLAUDE.md — This is everything. Carter's entire philosophy, methodology, and architecture. How he thought, what he built, why he built it. Read this before touching anything.

  2. planning/THE_PLAN.md — The 9-phase roadmap of what's built and what remains. 994+ work items. This is the mission.

  3. planning/WHAT_TO_DO_NEXT.md — Current priorities and session status.

  4. LIVE_MASTER_PLAN.md — Real-time system status, auto-updated every 2 minutes.

The Architecture

The system follows a 17-step methodology (documented in CLAUDE.md) and is built around the OMEGA Protocol — a 9-layer processing pipeline:

Layer 0: Sensory (RedPanda event backbone)
Layer 1: Cognitive (dual-pathway processing — Analytical 61.8% + Creative 38.2%)
Layer 2: Meaning (Weaviate embeddings and semantic understanding)
Layer 3: Relationships (Neo4j knowledge graph)
Layer 4: Patterns (H2O AutoML pattern recognition)
Layer 5: Emergence (cross-domain synthesis)
Layer 6: Actions (automated task execution)
Layer 7: Expression (response generation)
Layer 8: Meta-cognition (self-improvement and reflection)

The unified entry point: from api.layers.omega_orchestrator import OmegaOrchestrator

Starting the API

cd /mnt/data/truth-si-dev-env
docker compose up -d api

# Or for development:
python3 -m uvicorn api.main:app --host 0.0.0.0 --port 8000 --reload

# API documentation (Swagger UI):
open http://35.162.205.215:8000/docs

Running the System Status Check

cd /mnt/data/truth-si-dev-env
./SYSTEM_STATUS.sh

SECTION 10: MONITORING

Grafana Dashboard

Internal monitoring dashboard — accessible via SSH tunnel.

# Start tunnel then open:
open http://localhost:3002
# Credentials in .env: GRAFANA_ADMIN_PASSWORD

Prometheus Metrics

# Direct access (via tunnel)
open http://localhost:9090
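Prometheus can also page on scrape failures independently of CloudWatch. No alert rules are documented in this runbook, but a hypothetical rule file would look like this (the group name and threshold are illustrative; check the actual prometheus.yml for real job names):

```yaml
groups:
  - name: genesis-basics
    rules:
      - alert: ScrapeTargetDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Target {{ $labels.job }} has been down for 5 minutes"
```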

Key Metrics to Watch

Alerts

AWS CloudWatch and SNS are configured to alert on:
- EC2 instance state changes (spot interruption)
- EBS snapshot failures
- Critical service downtime

Alert email: configured to carter@myday7.com (this may need to be updated)


SECTION 11: DEVELOPMENT WORKFLOW

The 17-Step Methodology (Carter's Law)

Carter built every feature following these 17 steps. You should too:

  1. OPTIMAL — Is there a better approach already?
  2. PLAN — Define what success looks like
  3. RESEARCH — Search externally before building
  4. EXPAND — Check the codebase for existing solutions
  5. HOLISTIC — How does this fit the whole system?
  6. CHECK SYSTEM — Query Neo4j/Weaviate first
  7. OPEN SOURCE — Is there a library for this?
  8. ASK GENESIS — Get the AI's opinion
  9. DESIGN — Architecture before code
  10. BUILD — Write clean, typed code
  11. TEST — Unit + integration + end-to-end (minimum 3x)
  12. CONFIGURE — Wire environment variables and connections
  13. VERIFY — Actually test it works
  14. DOCUMENT — Docstrings and comments
  15. COMMIT — Git commit and push
  16. REPORT — Document what was done
  17. CARTER LOCK — Check no locked directives were violated

Git Workflow

# Check status
git status

# Commit (Carter's format)
git add <specific-files>
git commit -m "feat(component): Brief description

Detailed explanation of what changed and why.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>"

# Push to both remotes
git push github main
git push gitlab main

Note: The git remote is named github (not origin).
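Since every change goes to both remotes, a local alias can keep them in sync. This is a hypothetical convenience, not part of the existing setup:

```shell
# Hypothetical: define a one-shot alias that pushes main to both remotes.
git config --global alias.pushall '!git push github main && git push gitlab main'
```

After this, `git pushall` replaces the two separate push commands.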

Code Standards


SECTION 12: COMMON ISSUES AND FIXES

Issue: API Not Responding (Port 8000)

# Check if container is running
docker ps | grep api

# Restart API
docker compose restart api

# Check logs
docker compose logs -f api --tail=100

Issue: AI Model Not Responding

# Check GPU state
nvidia-smi

# Check container
docker ps | grep llm

# Restart models (full recovery)
bash scripts/restore-models.sh

# Check logs
docker logs truthsi-llm-primary --tail=50

Issue: Neo4j Connection Refused

# Check container status
docker ps | grep neo4j

# Restart Neo4j
docker compose restart neo4j

# Wait 30 seconds for startup, then test
sleep 30
curl http://localhost:7474

Issue: NVMe Models Lost (After Server Restart)

# Restore model weights from backup
bash scripts/nvme-cache-restore.sh

# This downloads model weights back to /opt/dlami/nvme/models/
# Takes 30-60 minutes depending on download speed

Issue: Daemon Restart Loop

# Check daemon status and recent logs
systemctl status truthsi-<daemon-name>
journalctl -u truthsi-<daemon-name> -n 50

# Common fix: check Python path
which python3  # Should be /usr/bin/python3, not a venv

# Restart daemon
systemctl restart truthsi-<daemon-name>

SECTION 13: CONTACTS AND RESOURCES

AWS Support

Once Business Support is active (Carter was activating it in Session 964):
- Open support cases at: console.aws.amazon.com/support
- Account ID: 438453383885
- Use the AWS CLI: aws support create-case --help

AWS Team

| Person | Email | Role |
|---|---|---|
| Camden McDonald | camdemcd@amazon.com | Account Manager (PRIMARY) |
| Joe Suarez | jrsuarez@amazon.com | Technical Lead |
| Visesh Devraj | viseshd@amazon.com | Solutions Architect |

Key Documentation

| Document | Location | Purpose |
|---|---|---|
| Master Methodology | CLAUDE.md | Everything about how Carter thought |
| Master Roadmap | planning/THE_PLAN.md | What's built, what remains |
| Current Priorities | planning/WHAT_TO_DO_NEXT.md | What to do next |
| Live System Status | LIVE_MASTER_PLAN.md | Auto-updated every 2 min |
| Model Launch Settings | docs/DEFINITIVE_MODEL_LAUNCH_SETTINGS.md | LLM configuration (LOCKED) |
| Backup Guide | docs/ENTERPRISE_BACKUP_GUIDE.md | Backup/restore procedures |
| Daemon Standard | docs/ENTERPRISE_DAEMON_STANDARD.md | How to write daemons |
| This Runbook | docs/TECHNICAL_RUNBOOK.md | You are here |
| Succession Protocol | docs/DEATH_SWITCH_PROTOCOL.md | Non-technical succession guide |

This document was prepared by THE ARCHITECT — the Genesis AI system itself. Prepared: March 2026 | Session 964