noahfarshad/ansible-windows-postdeploy

Ansible — Windows Post-Deploy Automation

Production Ansible playbook and role library that configures Windows VMs after Aria provisions them. It replicates the legacy vRO Windows workflow execution chain as idempotent Ansible roles.

This is the layer that runs after Aria clones the VM, Sysprep renames and regenerates the SID, and the vRO addDataDisksOnDeploy subscription attaches any additional disks. By the time this playbook runs, the VM is on the network with an IP from BlueCat and reachable from the control node.


Directory Layout

Ansible_Windows_PostDeploy/
├── README.md                                This file
├── production/                              Production playbook + 10 roles (in-scope)
│   ├── README.md                            Authoritative technical reference
│   ├── production/
│   │   ├── windows_postdeploy.yml           Master playbook
│   │   ├── requirements.yml                 Galaxy collection dependencies
│   │   └── group_vars/
│   │       └── windows.yml                  SSH/WinRM connection settings
│   ├── inventories/
│   │   └── Prod/
│   │       ├── hosts                        Live inventory (see note below)
│   │       ├── esxp-tw01-vc01/              TX vCenter inventory tree
│   │       ├── esxp-vw01-vc01/              VA vCenter inventory tree
│   │       └── host_vars/                   Per-host overrides
│   └── roles/                               The 10 production roles
│       ├── set_timezone/
│       ├── kms_activation/
│       ├── set_execution_policy/
│       ├── enable_rdp_firewall/
│       ├── security_hardening/
│       ├── registry_build_info/
│       ├── enable_rdp/
│       ├── firewall_local_policy/
│       ├── fleet_agent/
│       └── join_domain_prod/
└── legacy/                                  Older work preserved for reference
    ├── README.md                            Scope + maintenance status
    ├── devroles/roles/                      125 unmaintained roles
    ├── playbooks/                           Dev playbooks + inventories
    └── ...

The production tree is what the blueprint's Cloud_Ansible_Windows resource executes. The legacy tree is older tooling from prior projects — preserved for history and occasional reference but not part of the current VM deployment pipeline.


How This Fits Into the Deployment Pipeline

┌──────────────────────────────────────────────────────────────────┐
│  1. Blueprint deploys VM via vCenter clone                       │
│     (Cloud.vSphere.Machine)                                      │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│  2. Guest customization (aria-windows-postdeploy)                │
│     Sysprep regenerates SID, sets hostname, joins WORKGROUP      │
│     (NO domain join yet — Ansible handles that)                  │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│  3. compute.provision.post fires                                 │
│     vRO subscription → addDataDisksOnDeploy action               │
│     Attaches additional data disks via vCenter API               │
│     BLOCKING — Aria waits for this to complete                   │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│  4. Cloud_Ansible_Windows resource runs                          │
│     Aria SSHes to ansible.example.com                            │
│     Executes: ansible-playbook                                   │
│       /home/ansible/production/ansible/playbooks/                │
│       production/windows_postdeploy.yml                          │
│                                                                  │
│     Master playbook runs 10 roles in pre-domain-join order       │
│     All configuration happens BEFORE domain join (step 10)       │
│     GPO takes effect after join and may block later changes      │
└──────────────────────────────────────────────────────────────────┘
                              ↓
┌──────────────────────────────────────────────────────────────────┐
│  5. Deployment marked complete in Aria                           │
│     User sees the VM in their catalog with IP, gateway, name     │
└──────────────────────────────────────────────────────────────────┘

Master Playbook — windows_postdeploy.yml (v2.0.0)

Located at production/production/windows_postdeploy.yml. Orchestrates 10 roles in two phases.

Phase 1 — Pre-domain-join configuration (steps 1–9)

All changes that GPO might block must run before domain join. The playbook targets hosts: "{{ target_hosts | default('all') }}", so a run can be scoped either with -e target_hosts=<pattern> or with --limit on the command line. The default connection is SSH with a PowerShell shell (the target state); a WinRM NTLM/HTTPS fallback can be enabled through group_vars or extra vars.

Phase 2 — Domain join (step 10)

Runs join_domain_prod last. The role triggers a reboot on successful join. After this point GPO takes effect, and any subsequent Ansible run against the VM has to work within those constraints.

Conditional domain join

The domain join step is gated by when: join_domain_enabled | default(true) | bool. To skip:

ansible-playbook ... -e "join_domain_enabled=false"

This maps to the blueprint input joinDomain=WORKGROUP: when a user requests WORKGROUP, Aria passes join_domain_enabled=false as an extra var and the playbook runs only the pre-join roles (steps 1–9).
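The two-phase structure and the conditional join can be pictured with a minimal playbook sketch. This is an illustration under assumptions, not the contents of the real windows_postdeploy.yml: the play name, the use of the roles: keyword, and tag names equal to role names are all assumed (the tag names are inferred from the --tags examples in Usage).

```yaml
# Sketch only: the real windows_postdeploy.yml may differ in layout and options.
- name: Windows post-deploy configuration
  hosts: "{{ target_hosts | default('all') }}"
  gather_facts: true
  roles:
    # Phase 1: everything GPO might later block
    - { role: set_timezone, tags: [set_timezone] }
    - { role: kms_activation, tags: [kms_activation] }
    # ... remaining pre-join roles, in the documented order ...
    - { role: fleet_agent, tags: [fleet_agent] }

    # Phase 2: domain join, skipped when join_domain_enabled=false
    - role: join_domain_prod
      tags: [join_domain_prod]
      when: join_domain_enabled | default(true) | bool
```

A when: placed on a role entry is applied to every task inside that role, so a single condition is enough to skip the whole join.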


The 10 Roles

Every role follows the desired_state dispatcher pattern:

roles/<name>/
├── defaults/main.yml           Default variables
├── tasks/
│   ├── main.yml                Dispatcher: include_tasks is_present.yml when desired_state == 'present'
│   ├── is_present.yml          Apply configuration (idempotent)
│   └── is_absent.yml           (fleet_agent only — uninstall path)
└── meta/, handlers/, vars/, files/   Role-standard directories

This matches the NGAS ansible-engineering-3.0.4 convention. Every role is idempotent — safe to re-run on a configured VM with no side effects.
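A dispatcher of this kind might look like the following sketch. The default of 'present' when desired_state is unset is an assumption; the real roles may set it in defaults/main.yml instead.

```yaml
# roles/<name>/tasks/main.yml (illustrative, not copied from the repo)
- name: Apply configuration when desired_state is present
  ansible.builtin.include_tasks: is_present.yml
  when: (desired_state | default('present')) == 'present'

# Per the layout above, only fleet_agent ships an is_absent.yml
- name: Remove configuration when desired_state is absent
  ansible.builtin.include_tasks: is_absent.yml
  when: (desired_state | default('present')) == 'absent'
```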

1. set_timezone

  • Purpose: Set system timezone (default: UTC).
  • vRO equivalent: tzutil /s UTC.
  • Modules: ansible.windows.win_shell (read), community.windows.win_timezone (write).
  • Default: win_timezone: "UTC".
  • Idempotency: Reads [System.TimeZoneInfo]::Local.Id first; only writes when the current value differs.
  • Why first: Timestamps in subsequent steps (registry build info, KMS logs) must use the right timezone.
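The read-then-write flow described above could be sketched as follows. The register name tz_current is hypothetical; win_timezone is the default variable named above.

```yaml
# Illustrative read-then-write pattern; task names and tz_current are assumptions.
- name: Read the current timezone ID
  ansible.windows.win_shell: "[System.TimeZoneInfo]::Local.Id"
  register: tz_current
  changed_when: false   # a read never counts as a change

- name: Set the timezone only when it differs
  community.windows.win_timezone:
    timezone: "{{ win_timezone }}"
  when: (tz_current.stdout | trim) != win_timezone
```

community.windows.win_timezone reports changed only when the value actually differs, so the explicit read mainly serves logging and check-mode clarity.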

2. kms_activation

  • Purpose: License the Windows OS via the corporate KMS server.
  • vRO equivalent: slmgr /ipk <GVLK>, slmgr /skms 192.0.2.10:1688, slmgr /ato.
  • Modules: ansible.windows.win_shell (all three slmgr calls).
  • OS detection: Reads Win32_OperatingSystem.Caption and maps it to one of three GVLKs:

| OS | GVLK |
| --- | --- |
| Windows Server 2025 Datacenter | D764K-2NDRG-47T6Q-P8T8W-YP6DY |
| Windows Server 2022 Datacenter | WX4NM-KYWYW-QJJR4-XV3QB-6VM33 |
| Windows Server 2019 Datacenter | WMDGN-G9PQG-XVVXX-R3X43-63DFG |

  • KMS server: 192.0.2.10:1688 (kms.example.com).
  • Idempotency: Checks slmgr /dli for "License Status: Licensed" and the current KMS server before re-applying. Skips cleanly on already-activated hosts.
  • Adding a new OS version: add the key to kms_keys in defaults/main.yml and a matching detection branch in tasks/is_present.yml (see Common Modifications).
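A check-then-activate sequence along these lines might look like the sketch below. The variable kms_gvlk (the key selected by OS detection) and the register name kms_status are hypothetical; kms_server is the default named under Common Modifications. The sketch checks only the license status, not the configured KMS host.

```yaml
# Illustrative only; the role's actual detection and checks may be more thorough.
- name: Read the current license status
  ansible.windows.win_shell: cscript //Nologo C:\Windows\System32\slmgr.vbs /dli
  register: kms_status
  changed_when: false

- name: Install the GVLK, point at the KMS host, and activate
  ansible.windows.win_shell: |
    cscript //Nologo C:\Windows\System32\slmgr.vbs /ipk {{ kms_gvlk }}
    cscript //Nologo C:\Windows\System32\slmgr.vbs /skms {{ kms_server }}
    cscript //Nologo C:\Windows\System32\slmgr.vbs /ato
  when: "'License Status: Licensed' not in kms_status.stdout"
```

Running slmgr.vbs through cscript keeps the output on stdout rather than in a dialog box, which is what lets the status check work over a remote session.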

3. set_execution_policy

  • Purpose: Set PowerShell execution policy to RemoteSigned at the LocalMachine scope so signed scripts can run.
  • vRO equivalent: Set-ExecutionPolicy RemoteSigned.
  • Modules: ansible.windows.win_shell.
  • Default: win_execution_policy: "RemoteSigned".
  • Idempotency: Reads Get-ExecutionPolicy -Scope LocalMachine and only writes if different.
  • Why before hardening: The hardening scripts (apply<OS>LocalPolicy.ps1) are signed PowerShell and require this policy to run.

4. enable_rdp_firewall

  • Purpose: Enable the "Remote Desktop" firewall rule group (inbound TCP 3389).
  • vRO equivalent: Enable-NetFirewallRule -DisplayGroup 'Remote Desktop'.
  • Modules: ansible.windows.win_shell (check), community.windows.win_firewall_rule (enable).
  • Idempotency: Enumerates rules in the "Remote Desktop" group, skips if all already enabled.
  • Why before security_hardening: The hardening script locks down firewall defaults, so RDP needs to be whitelisted first or admins lose access after reboot.
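The enumerate-then-enable check could be sketched as below. For brevity this sketch uses the PowerShell cmdlet named as the vRO equivalent for the write step, whereas the role itself is documented as using community.windows.win_firewall_rule; the register name rdp_rules_disabled is hypothetical.

```yaml
# Illustrative only; swaps in Enable-NetFirewallRule for the role's win_firewall_rule call.
- name: Count Remote Desktop rules that are not enabled
  ansible.windows.win_shell: |
    @(Get-NetFirewallRule -DisplayGroup 'Remote Desktop' |
      Where-Object { $_.Enabled -ne 'True' }).Count
  register: rdp_rules_disabled
  changed_when: false

- name: Enable the Remote Desktop rule group
  ansible.windows.win_shell: Enable-NetFirewallRule -DisplayGroup 'Remote Desktop'
  when: (rdp_rules_disabled.stdout | trim | int) > 0
```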

5. security_hardening

  • Purpose: Apply the Essential Coach OS-specific local policy hardening scripts.
  • vRO equivalent: runs apply2025LocalPolicy.ps1, apply2022LocalPolicy.ps1, or apply2019LocalPolicy.ps1 depending on OS.
  • Modules: ansible.windows.win_shell, ansible.windows.win_stat, ansible.windows.win_reg_stat.
  • Script paths:

| OS | Path |
| --- | --- |
| 2025 | C:\ProgramData\Essential Coach\apply2025LocalPolicy.ps1 |
| 2022 | C:\ProgramData\Essential Coach\apply2022LocalPolicy.ps1 |
| 2019 | C:\windows\temp\hardening\apply2019LocalPolicy.ps1 |

  • Idempotency: Writes a registry marker at HKLM:\System\Essential Coach\Hardening\Applied_<OS> on success. Checks for this marker before re-running the script.
  • Assumption: The hardening scripts are already present on the template; this role does not install them.
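The marker-based idempotency can be illustrated with the sketch below, hard-coded to the 2022 script for readability. How the marker is written is an assumption: the sketch uses ansible.windows.win_regedit with a DWORD value of 1, while the role may write it from PowerShell or use a different value type.

```yaml
# Illustrative only; marker value and type (dword 1) are assumptions.
- name: Check for the hardening marker
  ansible.windows.win_reg_stat:
    path: HKLM:\System\Essential Coach\Hardening
    name: Applied_2022
  register: hardening_marker

- name: Run the hardening script when no marker exists
  ansible.windows.win_shell: |
    & 'C:\ProgramData\Essential Coach\apply2022LocalPolicy.ps1'
  when: not hardening_marker.exists

- name: Record that hardening was applied
  ansible.windows.win_regedit:
    path: HKLM:\System\Essential Coach\Hardening
    name: Applied_2022
    data: 1
    type: dword
  when: not hardening_marker.exists
```

Because the marker task runs only after the script task succeeds, a failed hardening run leaves no marker and is retried on the next playbook run.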

6. registry_build_info

  • Purpose: Stamp deployment metadata into the registry for later audit and reconciliation.
  • vRO equivalent: writes to HKLM:\System\Essential Coach\Build Info.
  • Modules: ansible.windows.win_regedit, ansible.windows.win_reg_stat.
  • Registry values written:

| Name | Content |
| --- | --- |
| TicketNumber | ServiceDesk ticket from blueprint input ciocTicket |
| CreatedDate | ISO 8601 UTC timestamp (first write only) |
| CreatedBy | Requester identity from Aria |
| SystemCode, Environment | From blueprint inputs |

  • Idempotency: First write populates CreatedDate; subsequent runs preserve it. Other fields can be updated (e.g., if the ticket changes).
  • How the data arrives: Blueprint customProperties flow through to Ansible extra vars, which map to role defaults.
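The "first write only" behavior for CreatedDate could be sketched as follows. The variable names build_ticket_number and build_created_by come from the Usage examples; the register name and the choice to take the timestamp from the control node via now() are assumptions.

```yaml
# Illustrative only; the timestamp source and string value types are assumptions.
- name: Check whether CreatedDate already exists
  ansible.windows.win_reg_stat:
    path: HKLM:\System\Essential Coach\Build Info
    name: CreatedDate
  register: created_date

- name: Write CreatedDate on the first run only
  ansible.windows.win_regedit:
    path: HKLM:\System\Essential Coach\Build Info
    name: CreatedDate
    data: "{{ now(utc=true, fmt='%Y-%m-%dT%H:%M:%SZ') }}"
    type: string
  when: not created_date.exists

- name: Write the fields that may be updated on later runs
  ansible.windows.win_regedit:
    path: HKLM:\System\Essential Coach\Build Info
    name: "{{ item.name }}"
    data: "{{ item.value }}"
    type: string
  loop:
    - { name: TicketNumber, value: "{{ build_ticket_number }}" }
    - { name: CreatedBy, value: "{{ build_created_by }}" }
```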

7. enable_rdp

  • Purpose: Enable the Remote Desktop service and Network Level Authentication.
  • vRO equivalent: sets fDenyTSConnections = 0, UserAuthentication = 1.
  • Modules: ansible.windows.win_regedit, ansible.windows.win_reg_stat.
  • Registry writes:

| Path | Value | Result |
| --- | --- | --- |
| HKLM:\System\CurrentControlSet\Control\Terminal Server\fDenyTSConnections | 0 | RDP enabled |
| HKLM:\System\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp\UserAuthentication | 1 | NLA required |

  • Idempotency: Both values are read before write.
  • Why separate from enable_rdp_firewall: The firewall rule (step 4) opens the port; this role (step 7) enables the service. Both are needed; either alone is insufficient.
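The two registry writes might be expressed as in the sketch below, which is not the role's actual task file. ansible.windows.win_regedit only reports a change when the stored value differs, so the tasks are safe to re-run.

```yaml
# Illustrative only; the role also reads both values first with win_reg_stat.
- name: Allow Remote Desktop connections
  ansible.windows.win_regedit:
    path: HKLM:\System\CurrentControlSet\Control\Terminal Server
    name: fDenyTSConnections
    data: 0
    type: dword

- name: Require Network Level Authentication
  ansible.windows.win_regedit:
    path: HKLM:\System\CurrentControlSet\Control\Terminal Server\WinStations\RDP-Tcp
    name: UserAuthentication
    data: 1
    type: dword
```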

8. firewall_local_policy

  • Purpose: Allow local firewall rules to coexist with GPO-pushed rules.
  • vRO equivalent: Set-NetFirewallProfile -AllowLocalFirewallRules True on Domain, Private, and Public profiles.
  • Modules: ansible.windows.win_shell.
  • Profiles affected: Domain, Private, Public (all three set to True).
  • Idempotency: Enumerates the three profiles; skips if all already True.
  • Why this matters: Without this, the first GPO refresh after domain join can wipe out every local firewall rule added in earlier steps (including the RDP rule from step 4). This role is the "belt-and-suspenders" that ensures local rules survive GPO.
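A sketch of the profile check and write, assuming the role shells out to the same cmdlets named above. The register name fw_profiles_pending is hypothetical.

```yaml
# Illustrative only; a freshly built VM typically reports NotConfigured, which also counts as pending.
- name: Count profiles that do not yet allow local firewall rules
  ansible.windows.win_shell: |
    @(Get-NetFirewallProfile -Name Domain,Private,Public |
      Where-Object { $_.AllowLocalFirewallRules -ne 'True' }).Count
  register: fw_profiles_pending
  changed_when: false

- name: Allow local firewall rules on all three profiles
  ansible.windows.win_shell: Set-NetFirewallProfile -Name Domain,Private,Public -AllowLocalFirewallRules True
  when: (fw_profiles_pending.stdout | trim | int) > 0
```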

9. fleet_agent

  • Purpose: Install the Fleet osquery agent for endpoint telemetry.
  • Source: internal Essential Coach repository at https://repo.example.com/files/fleet/.
  • Modules: ansible.windows.win_uri (version fetch), win_get_url (MSI download), win_package (install), win_file.
  • Hostname exclusion (important): Skips VMs whose hostname matches a prefix in fleet_exclude_prefixes, which defaults to:

fleet_exclude_prefixes:
  - "AUTHD-"
  - "AUTHP-"

Domain controllers (AUTH-prefixed servers) don't run Fleet because the agent has compatibility issues with AD-specific services.

  • Idempotency: Checks the MSI product ID against installed packages before re-downloading or re-installing. Also supports desired_state: absent for clean uninstall; it is the only role in the library with an is_absent.yml path.
  • What happens on AUTH servers: The role logs the skip and sets the fact fleet_should_install: false. All other steps (download, install, verify) are gated on this fact.
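One way to compute such a gating fact is sketched below. The fact name fleet_should_install and the list fleet_exclude_prefixes come from the text above; the use of inventory_hostname_short, the case-insensitive match, and the regex construction are assumptions about how the comparison might be done.

```yaml
# Illustrative only; the role may derive the hostname and do the comparison differently.
- name: Decide whether Fleet should be installed on this host
  ansible.builtin.set_fact:
    fleet_should_install: >-
      {{ (fleet_exclude_prefixes | length == 0)
         or (inventory_hostname_short is not match(
               '(' ~ (fleet_exclude_prefixes | map('regex_escape') | join('|')) ~ ')',
               ignorecase=true)) }}

- name: Log the skip on excluded hosts
  ansible.builtin.debug:
    msg: "Skipping Fleet install on {{ inventory_hostname_short }} (excluded prefix)"
  when: not (fleet_should_install | bool)
```

The match test anchors at the start of the string, which gives prefix semantics. The explicit empty-list guard matters because an empty alternation would otherwise match every hostname.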

10. join_domain_prod

  • Purpose: Join the VM to Active Directory. Runs last.
  • Modules: microsoft.ad.membership (from the microsoft.ad collection, requires >= 1.3.0).
  • Defaults:

domain: "corp.example.com"
domainadmin_user: "svc-domain-join"
domain_ou_path: "OU=Build,OU=Patching,OU=Servers,OU=Corporate,DC=corp,DC=essential-coach,DC=com"
short_hostname: "{{ inventory_hostname_short }}"

  • Password: reads {{ domain_password }} from the vault (/home/ansible/vault.yml).
  • Idempotency: Reads current Win32_ComputerSystem.PartOfDomain and domain name first. Only joins if the VM is not already in the target domain. Handles the case where a VM is already in a different domain (forces rejoin).
  • Reboot: reboot: true on success. microsoft.ad.membership handles the full reboot + reconnect sequence automatically.
  • Why last: After domain join, GPO applies and may lock down firewall, execution policy, registry, and service configuration. All 9 prior roles must complete first.
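With the defaults above, the join task might look like the following sketch. The module parameters are those of microsoft.ad.membership; passing the service account as a UPN (user@domain) is an assumption, and the role may pass it in another form.

```yaml
# Illustrative only; the account format and task name are assumptions.
- name: Join the VM to the domain and reboot
  microsoft.ad.membership:
    dns_domain_name: "{{ domain }}"
    hostname: "{{ short_hostname }}"
    domain_admin_user: "{{ domainadmin_user }}@{{ domain }}"
    domain_admin_password: "{{ domain_password }}"
    domain_ou_path: "{{ domain_ou_path }}"
    state: domain
    reboot: true
```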


Execution Order Quick Reference

| # | Role | vRO Equivalent | Why This Position |
| --- | --- | --- | --- |
| 1 | set_timezone | tzutil /s UTC | Foundation for timestamps in later steps |
| 2 | kms_activation | slmgr /ipk /skms /ato | License OS before other activation-dependent steps |
| 3 | set_execution_policy | Set-ExecutionPolicy RemoteSigned | Required for hardening scripts to run |
| 4 | enable_rdp_firewall | Enable-NetFirewallRule 'Remote Desktop' | Whitelist RDP before hardening locks things down |
| 5 | security_hardening | apply<OS>LocalPolicy.ps1 | Applies local policy; must come after firewall allow |
| 6 | registry_build_info | Write HKLM:\System\Essential Coach\Build Info | Audit stamp; write before GPO can intervene |
| 7 | enable_rdp | fDenyTSConnections=0, NLA=1 | Enable the service (firewall port was opened in step 4) |
| 8 | firewall_local_policy | AllowLocalFirewallRules True | Protect local rules from GPO wipeout |
| 9 | fleet_agent | (new; no vRO equivalent) | Endpoint agent install, skips AUTH-prefixed VMs |
| 10 | join_domain_prod | Add-Computer | MUST BE LAST; GPO takes over after this |

Collections and Dependencies

Required Ansible Collections

From production/production/requirements.yml:

collections:
  - name: ansible.windows
    version: ">=2.1.0"
  - name: community.windows
    version: ">=2.0.0"
  - name: microsoft.ad
    version: ">=1.3.0"
  - name: community.vmware
    version: ">=3.0.0"

Install on the control node:

ansible-galaxy collection install -r requirements.yml --force

Required Python Packages (control node)

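# For SSH connection to Windows targets (target state)
# Note: ansible_connection: ssh uses the control node's OpenSSH client;
# paramiko is only required if the paramiko connection plugin is used
pip3 install paramiko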

# For WinRM connection (interim fallback)
pip3 install pywinrm requests-ntlm

Connection Architecture

The playbook supports two connection paths to Windows targets.

SSH with PowerShell shell (target state)

group_vars/windows.yml configures SSH as the default, requiring Win32-OpenSSH on the template image:

ansible_connection: ssh
ansible_shell_type: powershell
ansible_ssh_common_args: '-o StrictHostKeyChecking=no'

WinRM NTLM/HTTPS (interim fallback)

Commented section in group_vars/windows.yml or override via extra vars:

ansible-playbook ... -e "ansible_connection=winrm ansible_port=5986 \
  ansible_winrm_transport=ntlm ansible_winrm_server_cert_validation=ignore"

Used when a template doesn't yet have Win32-OpenSSH installed.

Authentication before domain join

Roles 1–9 run with the local Administrator account (passed through from Aria as windows_local_admin_password). Role 10 (domain join) uses that same local admin to authenticate to the VM, but supplies the domain service account (svc-domain-join) as the domain admin doing the join.

After reboot, the VM is domain-joined and further Ansible runs would need a domain account. By design, nothing runs against the VM after step 10 in the initial deploy.


Vault

Credentials are stored in /home/ansible/vault.yml (encrypted with ansible-vault).

Required vault variables

| Variable | Purpose |
| --- | --- |
| vault_windows_user | Local admin username (pre-domain-join auth) |
| vault_windows_password | Local admin password |
| vault_domain_password | svc-domain-join password for the domain join |
| vault_vsphere_password | srvc-vro password (only if vSphere operations are needed) |
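The vault variables are typically mapped onto the names that Ansible and the roles consume. The mapping below is a sketch under assumptions: the README does not show where this wiring lives, and the placement in group_vars/windows.yml is hypothetical. ansible_user and ansible_password are the standard Ansible connection variables; domain_password is the name the join_domain_prod role reads.

```yaml
# group_vars/windows.yml (hypothetical wiring; the actual mapping is not shown in this README)
ansible_user: "{{ vault_windows_user }}"
ansible_password: "{{ vault_windows_password }}"

# Consumed by join_domain_prod
domain_password: "{{ vault_domain_password }}"
```

The vault file itself would be loaded at run time, for example with -e @/home/ansible/vault.yml together with --ask-vault-pass or a vault password file.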

Rotation

ansible-vault edit /home/ansible/vault.yml

The local admin password must stay in sync with whatever the Aria Sysprep customization spec (aria-windows-postdeploy) sets during clone. If the Sysprep spec is updated to change the local admin password, vault_windows_password must be updated to match.


Usage

Full post-deploy against a single VM

ansible-playbook -i inventories/Prod/esxp-tw01-vc01/inventory \
  production/windows_postdeploy.yml \
  --limit newvm01 \
  -e "build_ticket_number=INC12345 build_created_by='John Doe'"

Dry run (check mode)

ansible-playbook -i inventories/Prod/esxp-tw01-vc01/inventory \
  production/windows_postdeploy.yml \
  --limit newvm01 --check

Skip domain join (config only)

ansible-playbook ... --limit newvm01 -e "join_domain_enabled=false"

Run specific step only (via tags)

ansible-playbook ... --limit newvm01 --tags kms_activation
ansible-playbook ... --limit newvm01 --tags security_hardening
ansible-playbook ... --limit newvm01 --tags "kms_activation,set_timezone"

Force WinRM (if SSH unavailable)

ansible-playbook ... --limit newvm01 \
  -e "ansible_connection=winrm ansible_port=5986 \
  ansible_winrm_transport=ntlm ansible_winrm_server_cert_validation=ignore"

Environment Reference

| Resource | TX | VA |
| --- | --- | --- |
| vCenter | vcenter-tx01.example.com | vcenter-va01.example.com |
| Aria | vra.example.com (v8.18.1) | same |
| Ansible Control Node | ansible.example.com | same |
| KMS Server | kms.example.com (192.0.2.10:1688) | same |
| AD Domain | corp.example.com | same |
| AD OU | OU=Build,OU=Patching,OU=Servers,OU=Corporate,DC=corp,DC=essential-coach,DC=com | same |
| DNS | 192.0.2.53, 192.0.2.153 | same |

Common Modifications

Adding a new OS version (e.g., Windows Server 2026)

Two roles need OS detection updates, across four files:

  1. kms_activation/defaults/main.yml — add the GVLK to kms_keys
  2. kms_activation/tasks/is_present.yml — add the elseif branch in the detection block
  3. security_hardening/defaults/main.yml — add the hardening script path to hardening_scripts
  4. security_hardening/tasks/is_present.yml — add the elseif branch

Changing the KMS server

kms_activation/defaults/main.yml → kms_server: "new.server.fqdn:1688". No code changes needed.

Changing the domain

join_domain_prod/defaults/main.yml → domain, domainadmin_user, domain_ou_path. The vault variable vault_domain_password must also be updated to the corresponding service account's password.

Adding a new Fleet exclusion

fleet_agent/defaults/main.yml → append to fleet_exclude_prefixes:

fleet_exclude_prefixes:
  - "AUTHD-"
  - "AUTHP-"
  - "NEWP-"

Adding a new role to the playbook

  1. Create roles/<new_role>/ following the defaults/, tasks/main.yml (dispatcher), tasks/is_present.yml (logic) pattern
  2. Add the role to production/windows_postdeploy.yml in the correct position relative to the domain join
  3. Update this README's execution order table

Troubleshooting

"No packages available" errors on fleet_agent

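The internal repo at repo.example.com isn't reachable. Check:

  1. Network path to the repo, both from the control node and from the target VM (win_uri and win_get_url execute on the target)
  2. Repo service status
  3. fleet_version_url returns a valid version string (curl -I from the control node confirms reachability; curl -s shows the returned string)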
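A quick probe from the target's side can be run as a throwaway task, sketched below. fleet_version_url is the role variable named above; the register name is hypothetical.

```yaml
# Illustrative troubleshooting tasks; run against the affected VM with --limit.
- name: Fetch the Fleet version string from the target's point of view
  ansible.windows.win_uri:
    url: "{{ fleet_version_url }}"
    return_content: true
  register: fleet_version_probe

- name: Show what the repo returned
  ansible.builtin.debug:
    msg: "HTTP {{ fleet_version_probe.status_code }}: {{ fleet_version_probe.content | trim }}"
```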

KMS activation fails with 0xC004F074

The KMS server isn't responding, or the KMS host's count hasn't reached the activation threshold (at least 25 machines for Windows client editions, at least 5 for Windows Server editions). Check:

  1. KMS server reachability: Test-NetConnection 192.0.2.10 -Port 1688 from target
  2. KMS count on server: slmgr /dlv on the KMS host
  3. Network path: TCP 1688 must be open between the target and the KMS host
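The reachability check can also be run through Ansible rather than an interactive session. A minimal sketch using ansible.windows.win_wait_for, which attempts the TCP connection from the target itself (the 15-second timeout is an arbitrary choice):

```yaml
# Illustrative troubleshooting task; fails if the port cannot be reached within the timeout.
- name: Confirm the target can reach the KMS host on TCP 1688
  ansible.windows.win_wait_for:
    host: 192.0.2.10
    port: 1688
    timeout: 15
```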

Domain join times out

svc-domain-join can't reach a domain controller, or credentials are wrong. Check:

  1. DNS from target VM → DC: Resolve-DnsName corp.example.com
  2. vault_domain_password matches current AD password for svc-domain-join
  3. The service account hasn't been locked out (AD audit log)
  4. Target OU exists and service account has "Create Computer Objects" right

Ansible can't connect to target VM

First, determine which connection type is being attempted:

ansible-playbook ... --limit newvm01 -vvv 2>&1 | head -30

Look for ssh: connect to host... port 22 or winrm:. Fix per connection type:

  • SSH: verify Win32-OpenSSH is running on target (Get-Service sshd)
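  • WinRM: verify an HTTPS listener is configured on target (winrm enumerate winrm/config/listener) and port 5986 is open; a plain winrm quickconfig only creates the HTTP listener on port 5985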

Playbook runs but skips the VM

The inventory file doesn't include the hostname, or the hostname case doesn't match. Inventory is case-sensitive when used with --limit.

GPO overwrites a configuration after domain join

Expected behavior — that's why domain join is last. If this happens:

  1. Confirm the specific GPO setting that overrides
  2. Either modify the blueprint/playbook to accept the GPO setting, or
  3. Exempt the new VM OU from the blocking policy (AD team action)

fleet_agent role takes 10+ minutes

MSI download is slow. The file is ~60MB from an internal repo. Not a bug, but if consistent across many VMs, check the repo server's throughput.

Registry build info shows CreatedBy = Ansible Automation instead of the requester

Blueprint isn't passing build_created_by through customProperties. Check the blueprint's customProperties.requestedBy = ${env.requestedBy} — that's the hook point. If the blueprint is right, check the Ansible extra-vars wiring in Cloud_Ansible_Windows in the blueprint.


Live Inventory Note

inventories/Prod/hosts contains real deployed test hostnames (AUTHD-TEST9xx, ESXD-TEST9xx, APPSD-TEST9x, etc.). This is a live working file used by Ansible during deployment, not a template. It's included here intact because this is the customer's internal repo — the operator's current working state is part of the handoff.

To reuse this as a template on a fresh control node, replace the hostnames with your own or regenerate via the vSphere dynamic inventory script (see Ansible_Inventory_Generators/).


Version Summary

| Artifact | Version | Last Major Change |
| --- | --- | --- |
| windows_postdeploy.yml | 2.0.0 | Dispatcher pattern across all roles; SSH-primary with WinRM fallback |
| kms_activation | | OS-version-aware GVLK selection (2019/2022/2025) |
| join_domain_prod | | microsoft.ad.membership module (replaced legacy win_domain_membership) |
| fleet_agent | | Fleet osquery install with hostname-prefix exclusion for AD controllers |

Contact

Original author: Noah Farshad (noah@essential.coach)
Engagement: VMware / Aria Automation reference implementation
