my-nixos-infrastructure

NixOS, Infrastructure as Code, System Configuration

NixOS Infrastructure as Code - Replicable Multi-System Infrastructure

Situation

Managing multiple NixOS systems (desktop, laptop, servers, WSL) required a systematic approach to ensure complete reproducibility, instant recovery, and consistent environments across all machines. Traditional system management approaches led to configuration drift, "works on my machine" issues, and time-consuming manual setup when deploying to new systems or recovering from failures.

The challenge was compounded by the need to support different hardware configurations (GPUs, monitors, input devices) and different use cases (personal, work, server), to maintain backward compatibility during refactoring, and to keep documentation synchronized with code changes. Without Infrastructure as Code, system configurations would diverge, making it impractical to replicate a system exactly or recover quickly from failures.

Additionally, as documentation grew to 100+ files, finding relevant information and keeping it accurate became increasingly difficult. NixOS commands are verbose and often had to be run from a specific directory, which made common operations tedious and error-prone. Installation involved many manual steps, each one an opportunity for errors and inconsistencies.

Task

The goal was to engineer a comprehensive Infrastructure as Code repository implementing declarative, version-controlled system management for multiple NixOS systems. The solution needed to:

  • Define the entire operating system as code, from kernel modules to applications, ensuring complete reproducibility
  • Enable instant recovery and consistent environments across all machines
  • Implement profile system for multi-system management without code duplication
  • Create unified automation tooling for common operations
  • Build comprehensive installation automation eliminating manual steps
  • Establish security-first architecture with centralized defaults
  • Create documentation infrastructure with Router/Catalog system for efficient navigation

Success criteria included: identical system rebuilds across machines, 9+ profiles from single codebase with minimal duplication, unified command interface for common operations, automated installation process, consistent security posture across all systems, and efficient documentation navigation.

Action

Infrastructure as Code Architecture

Implemented Infrastructure as Code with the entire operating system defined declaratively:

  • 161 Nix Configuration Files: Defining complete system state from kernel modules to applications
  • Version-Controlled in Git: All configurations tracked in a Git repository, so every change is reviewable and revertible
  • Flake-Based Configuration: flake.lock pins every flake input, so dependencies resolve identically across machines and over time
  • Modular Organization: System modules (system/*/*.nix) for hardware/security/services, user modules (user/*/*.nix) for applications/environments

Profile System for Multi-System Management

Created profile system enabling 9+ different system configurations from single codebase:

  • Centralized Defaults: lib/defaults.nix providing common defaults for all profiles
  • Profile-Specific Overrides: Profile configs override only what's different, minimizing duplication
  • Recursive Merging: lib/flake-base.nix handling recursive merging of defaults and profile-specific configs
  • Flake Lock Reproducibility: Ensures identical system rebuilds across machines and time
  • Result: 9+ profiles (desktop, laptop, servers, WSL) from single codebase with minimal duplication
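
The merge behaviour described above can be illustrated outside Nix. The following is a minimal Python sketch of recursive merging, not the contents of lib/flake-base.nix; the option names and the `DEFAULTS` and `LAPTOP` values are hypothetical. It is similar in spirit to nixpkgs' `lib.recursiveUpdate`, though the source does not say which function the repository uses.

```python
from copy import deepcopy


def recursive_merge(defaults: dict, overrides: dict) -> dict:
    """Return a new dict: defaults, with overrides applied key by key.

    Nested dicts are merged recursively; any other value in `overrides`
    replaces the default outright. Neither input is modified.
    """
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = recursive_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical shared defaults and one profile that overrides two values.
DEFAULTS = {
    "networking": {"firewall": {"enable": True}, "hostName": "nixos"},
    "gpu": "none",
}
LAPTOP = {"networking": {"hostName": "laptop"}, "gpu": "intel"}

config = recursive_merge(DEFAULTS, LAPTOP)
```

The profile states only its two differences, and the firewall default survives because the merge descends into `networking` instead of replacing it wholesale.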

Unified Automation Tooling

Created phoenix wrapper script providing unified interface for common operations:

  • System-Wide Command: Installed as system package available from any directory
  • Unified Interface: phoenix sync, phoenix update, phoenix upgrade, phoenix gc
  • Eliminates Complexity: No need to remember complex Nix commands or be in specific directories
  • Error Reduction: Unified interface reduces errors from incorrect command usage
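
To make the wrapper idea concrete, here is a rough Python sketch of a subcommand dispatcher. The real phoenix script is not shown in this write-up, so the mapping from subcommands to Nix commands, the `FLAKE_DIR` path, the default profile name, and the 30-day garbage-collection window are all assumptions made for illustration.

```python
import subprocess

FLAKE_DIR = "/home/user/.dotfiles"  # hypothetical location of the flake repository


def plan(subcommand: str, flake_dir: str = FLAKE_DIR, profile: str = "desktop") -> list:
    """Return the list of commands (argv lists) a subcommand expands to.

    The expansion below is an assumed mapping, not the real phoenix behaviour.
    """
    rebuild = ["sudo", "nixos-rebuild", "switch", "--flake", f"{flake_dir}#{profile}"]
    update = ["nix", "flake", "update"]  # meant to run with cwd=flake_dir
    gc = ["nix-collect-garbage", "--delete-older-than", "30d"]
    plans = {
        "sync": [rebuild],
        "update": [update],
        "upgrade": [update, rebuild],
        "gc": [gc],
    }
    if subcommand not in plans:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    return plans[subcommand]


def run(subcommand: str, flake_dir: str = FLAKE_DIR, profile: str = "desktop") -> None:
    """Execute each planned command from inside the flake directory."""
    for argv in plan(subcommand, flake_dir, profile):
        subprocess.run(argv, cwd=flake_dir, check=True)
```

Because `run` always sets the working directory itself, the caller can invoke it from anywhere, which is the property the wrapper is described as providing.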

Comprehensive Installation Automation

Built comprehensive install.sh script (1000+ lines) automating entire installation process:

  • Interactive and Silent Modes: Supports both interactive setup and automated deployment
  • Profile Validation: Validates profile selection before proceeding
  • Flake File Copying: Handles flake file copying to correct locations
  • Hardware Detection: Automatically detects hardware configuration
  • SSH Key Generation: Generates SSH keys for LUKS unlock
  • Docker Container Management: Handles Docker container setup and configuration
  • Boot Mode Detection: Detects boot mode (UEFI/BIOS) and configures appropriately
  • System Rebuild: Automates complete system rebuild process
  • Result: Removes the manual steps from the installation process, and with them the main source of installation errors
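
Two of the checks listed above are simple enough to sketch. The real install.sh is a shell script and is not reproduced here; the Python below only illustrates the logic, and the `profiles/<name>/` directory layout is an assumption. Boot mode detection relies on the fact that Linux exposes `/sys/firmware/efi` only when the system was booted via UEFI.

```python
from pathlib import Path


def detect_boot_mode(sysfs_root: Path = Path("/sys")) -> str:
    """Return "UEFI" if the kernel exposes EFI firmware data, else "BIOS"."""
    return "UEFI" if (sysfs_root / "firmware" / "efi").is_dir() else "BIOS"


def validate_profile(name: str, profiles_dir: Path) -> None:
    """Raise ValueError unless `name` is a directory under `profiles_dir`.

    Assumes one directory per profile, which is a hypothetical layout.
    """
    available = sorted(p.name for p in profiles_dir.iterdir() if p.is_dir())
    if name not in available:
        raise ValueError(
            f"unknown profile {name!r}; available: {', '.join(available) or 'none'}"
        )
```

Validating the profile before any file is copied means a typo fails fast instead of partway through a rebuild.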

Security-First Architecture

Implemented security-first infrastructure with consistent security configurations:

  • Centralized Security Defaults: lib/defaults.nix with secure defaults (firewall enabled, SSH keys required)
  • Security Modules: 17 security modules in system/security/ directory
  • Explicit Overrides Required: A profile can weaken a security default only by disabling it explicitly in its own config, so the change is visible in review
  • Declarative Security: Security settings are declared in code and version-controlled
  • Consistent Security Posture: Every profile inherits the same baseline, so systems differ only where a profile says so
  • Security Features: Firewall, LUKS encryption, SSH, fail2ban, polkit, application sandboxing

Documentation Router/Catalog System

Created Router/Catalog system for efficient documentation navigation:

  • Auto-Generated Indexes: Python script extracts YAML frontmatter metadata and generates compact Router table
  • Router-First Retrieval Protocol: Enables efficient context scoping for AI agents and humans
  • 103 Markdown Documentation Files: Comprehensive infrastructure knowledge management
  • Prevents Documentation Drift: Indexes are regenerated from the files' own metadata, so they stay synchronized with the documentation
  • Efficient Navigation: Enables finding relevant information without loading all documentation
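
The index generation step can be sketched as follows. This is a simplified stand-in for the project's Python script, which is not shown here: it parses only flat `key: value` frontmatter without a YAML library, and the `title` and `summary` field names and the table columns are assumptions.

```python
def read_frontmatter(text: str) -> dict:
    """Parse flat `key: value` pairs between leading `---` delimiters.

    Returns {} when the document has no frontmatter or it is never closed.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return meta
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}


def build_router(docs: dict) -> str:
    """Build a compact table from {path: file_text}, one row per document
    that carries frontmatter, sorted by path."""
    rows = ["| Path | Title | Summary |", "| --- | --- | --- |"]
    for path in sorted(docs):
        meta = read_frontmatter(docs[path])
        if meta:
            rows.append(
                f"| {path} | {meta.get('title', '')} | {meta.get('summary', '')} |"
            )
    return "\n".join(rows)


# Hypothetical documents, keyed by path.
docs = {
    "docs/security/luks.md": "---\ntitle: LUKS unlock\nsummary: Remote unlock over SSH\n---\n# Body",
    "docs/notes.md": "No frontmatter here",
}
router = build_router(docs)
```

A reader or an AI agent can scan the resulting table first and open only the files whose summary matches the task, which is the idea behind router-first retrieval.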

Theming System Integration

Implemented theming across different window managers:

  • Stylix Integration: For Sway/Hyprland/XMonad with system-wide theming
  • Plasma6 Theming: Separate theming via environment isolation
  • Theme Containment: Ensures no conflicts between different theming systems
  • Dynamic Theme Switching: Via Stylix refresh
  • 55+ Base16 Themes: Supported for consistent visual experience

Technical Implementation Details

Observability/Operations:

  • Maintenance script logs to maintenance.log with rotation (max 3 files, 1MB threshold)
  • Systemd service logs for backup operations
  • NixOS system generations provide rollback capability and change history
  • Git history tracks all configuration changes
  • Documentation Router/Catalog provides visibility into system structure
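
The log rotation policy mentioned above (max 3 files, 1MB threshold) could look roughly like the Python sketch below. The maintenance script itself is not shown in this write-up, so this is only an illustration; it assumes that "3 files" counts the active log plus two rotated copies, and that 1MB means 1,000,000 bytes.

```python
from pathlib import Path

MAX_BYTES = 1_000_000  # assumed meaning of the 1MB threshold
MAX_FILES = 3          # assumed to include the active log


def rotate_if_needed(log: Path, max_bytes: int = MAX_BYTES, max_files: int = MAX_FILES) -> bool:
    """Rotate log -> log.1 -> log.2 ... once the log reaches max_bytes.

    Returns True if a rotation happened. The oldest copy is deleted so that
    at most `max_files` files exist afterwards.
    """
    if max_files < 2:
        raise ValueError("max_files must be at least 2")
    if not log.exists() or log.stat().st_size < max_bytes:
        return False
    backups = max_files - 1
    oldest = log.with_name(f"{log.name}.{backups}")
    if oldest.exists():
        oldest.unlink()
    # Shift remaining copies up by one, starting from the oldest.
    for i in range(backups - 1, 0, -1):
        src = log.with_name(f"{log.name}.{i}")
        if src.exists():
            src.rename(log.with_name(f"{log.name}.{i + 1}"))
    log.rename(log.with_name(f"{log.name}.1"))
    log.touch()  # start a fresh, empty active log
    return True
```

Calling `rotate_if_needed(Path("maintenance.log"))` at the start of each maintenance run would keep total log size bounded at roughly three times the threshold.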

Infrastructure Scale: 161 Nix files, 9+ profiles, 79 automation scripts, 103 documentation files, 579 Git commits

Result

Infrastructure as Code Achievement

161 Nix configuration files define the entire operating system and are version-controlled in a Git repository, ensuring complete reproducibility and instant system recovery across 9+ profiles. The declarative approach enables rebuilding any system from scratch with an identical configuration, eliminating "works on my machine" issues.

Multi-System Management Success

Profile system enables 9+ different system configurations (desktop, laptop, servers, WSL) from single codebase with minimal duplication. Flake lock ensures identical systems across machines and time, demonstrating true Infrastructure as Code reproducibility.

Automation Infrastructure Impact

The unified phoenix command interface replaces complex Nix commands, and the comprehensive install.sh script (1000+ lines) automates the installation process. 79 automation scripts for installation, synchronization, and maintenance reduce operational overhead and manual errors.

Security Architecture Excellence

17 security modules with centralized defaults (firewall, LUKS encryption, SSH, fail2ban, polkit, application sandboxing) ensure a consistent security posture across all systems. Requiring explicit overrides prevents accidental security degradation.

Documentation Infrastructure Success

103 Markdown documentation files with the Router/Catalog system enable efficient navigation. Auto-generated indexes prevent documentation drift, and the Router-first retrieval protocol enables efficient context scoping for AI agents, demonstrating systematic knowledge management.

Technical Stack

NixOS, Nix Flakes, Home Manager, systemd, Git, Python (documentation generator), Stylix, Restic, LUKS, Firejail, Polkit, fail2ban

Conclusion

The NixOS Infrastructure as Code project applies replicable and resilient systems engineering at the operating system level. By defining the entire operating system declaratively and version-controlling all configurations, it enables instant system recovery and complete reproducibility across 9+ profiles. The holistic "full-stack" view of the infrastructure spans from kernel modules to applications, demonstrating deep understanding of system architecture. The security-first approach with centralized defaults ensures a consistent security posture, while the documentation Router/Catalog system showcases systematic knowledge management. Investing in this infrastructure up front reduces technical debt and accelerates future growth, turning system management from a manual, error-prone process into automated, replicable infrastructure.