How the Proxmox Integration Works

The hypervisor module authenticates against the Proxmox VE REST API using a dedicated API token scoped with PVEAuditor and PVEVMAdmin roles. When you ask a question about your virtualization environment, the system's intent classifier routes your request to the Proxmox collector, which translates natural language into the appropriate API calls.

The collector maintains a persistent connection pool to each Proxmox node in your cluster. It does not cache stale data. Every query you run hits the live API, and the response is assembled in real time. This is the same design principle behind every module: the AI layer generates dashboards and analysis from current infrastructure state, never from memory or training data.

Authentication and Access Control

API tokens are stored in an encrypted credential vault and are never exposed in conversation output. The token's role-based permissions in Proxmox determine what the platform can read and modify. For read-only deployments, a PVEAuditor role is sufficient. For environments where you want AI-assisted provisioning, the token needs PVEVMAdmin on the target resource pools. The system never escalates beyond the permissions you configure.
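Proxmox VE expects API tokens in an `Authorization` header of the form `PVEAPIToken=<user>@<realm>!<token-id>=<secret>`. A minimal sketch of how a collector might build that header (the user, realm, and token values below are placeholders, not real credentials):

```python
# Build the Authorization header Proxmox VE expects for API tokens.
# Header value format: PVEAPIToken=<user>@<realm>!<token-id>=<secret>
def pve_auth_header(user: str, realm: str, token_id: str, secret: str) -> dict:
    return {"Authorization": f"PVEAPIToken={user}@{realm}!{token_id}={secret}"}

# Placeholder credentials -- never hard-code real secrets:
headers = pve_auth_header("svc-virp", "pam", "readonly", "xxxx-xxxx")

# A read-only collector would then issue requests such as:
#   requests.get("https://pve-node01:8006/api2/json/nodes",
#                headers=headers, verify=True)
```

Because the token carries its own role assignments in Proxmox, rotating or downgrading access is a server-side change; the collector never needs to know which permissions it holds.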

VM and Container Queries

The most common interaction pattern is querying the current state of your virtual machines and LXC containers. You can ask in plain language, and the system builds structured output from the `/api2/json/nodes/{node}/qemu` and `/api2/json/nodes/{node}/lxc` endpoints.

virp
```
> show me all running VMs on pve-node01

[REST API] GET /api2/json/nodes/pve-node01/qemu — 200 OK (38ms)

Running VMs on pve-node01:

VMID  NAME              STATUS   CPU   MEM          DISK    UPTIME
100   dc01-win2022      running  2/4   4.1/8 GB     32 GB   14d 6h
101   app-server-prod   running  3/8   12.4/16 GB   120 GB  14d 6h
102   db-postgres-01    running  4/8   28.1/32 GB   500 GB  14d 6h
103   docker-host-01    running  6/8   13.8/16 GB   200 GB  7d 12h
105   monitoring-stack  running  1/2   1.9/4 GB     40 GB   14d 6h

5 VMs running · 16/30 vCPUs allocated · 60.3/76 GB RAM in use
```

The tabular output is generated inline by the AI layer, not by a pre-built template. The system reads the raw JSON from the Proxmox API, identifies the fields that are relevant to your query, and assembles a structured response. If you ask a follow-up like "which one is using the most memory," it re-reads the data and answers contextually.
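The transformation from raw API JSON to table rows can be sketched as a small filter over the response. This is an illustration, not the platform's actual code; the field names (`vmid`, `name`, `status`, `mem`, `maxmem`) match the Proxmox response shape, and the sample payload is invented:

```python
# Sketch: turn the JSON from /api2/json/nodes/{node}/qemu into table rows.
def running_vm_rows(vms: list[dict]) -> list[tuple]:
    rows = []
    for vm in sorted(vms, key=lambda v: v["vmid"]):
        if vm.get("status") != "running":
            continue  # skip stopped guests, as the query asked for running VMs
        rows.append((vm["vmid"], vm["name"],
                     round(vm["mem"] / 2**30, 1),      # GiB in use
                     round(vm["maxmem"] / 2**30, 1)))  # GiB allocated
    return rows

sample = [
    {"vmid": 101, "name": "app-server-prod", "status": "running",
     "mem": 13314398618, "maxmem": 17179869184},
    {"vmid": 103, "name": "dev-sandbox", "status": "stopped",
     "mem": 0, "maxmem": 4294967296},
]
print(running_vm_rows(sample))  # only the running VM appears
```

In practice the AI layer decides which columns to keep per query, so the same response can yield a memory-focused table one moment and an uptime-focused one the next.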

LXC Container Management

Lightweight containers are a core part of most Proxmox environments, and the system handles them as first-class objects. You can query, start, stop, and inspect LXC containers with the same conversational interface used for full VMs.

virp
```
> list all containers across the cluster with their resource usage

[REST API] GET /api2/json/cluster/resources?type=lxc — 200 OK (52ms)

LXC Containers (cluster-wide):

CTID  NAME            NODE        STATUS   CPU    MEM          DISK
200   nginx-proxy     pve-node01  running  0.3/2  512/2048 MB  8 GB
201   pihole-primary  pve-node01  running  0.1/1  128/512 MB   4 GB
202   unbound-dns     pve-node02  running  0.1/1  96/512 MB    4 GB
203   dev-sandbox     pve-node02  stopped  -/-    -/4096 MB    20 GB
204   wazuh-agent-ct  pve-node03  running  0.4/2  384/1024 MB  12 GB

4 running · 1 stopped · 3 nodes
```

Resource Monitoring: CPU, Memory, and Disk

Beyond listing VMs, the system pulls real-time resource utilization from the Proxmox RRD (round-robin database) API endpoints. This gives you CPU load averages, memory pressure, disk I/O rates, and network throughput without opening a browser or SSH session.

virp
```
> how are my nodes doing on resources?

[REST API] GET /api2/json/cluster/resources?type=node — 200 OK (41ms)

Cluster Node Resource Summary:

NODE        STATUS  CPU       RAM               STORAGE (local-zfs)
pve-node01  online  34% (4c)  60.3/76 GB (79%)  1.2/3.6 TB (33%)
pve-node02  online  22% (4c)  38.7/64 GB (60%)  0.9/3.6 TB (25%)
pve-node03  online  71% (4c)  58.2/64 GB (91%)  2.8/3.6 TB (78%)

[WARNING] pve-node03 RAM at 91% — 5.8 GB free. Consider migrating workloads.
          pve-node03 storage trending toward 85% within ~30 days at current growth rate.
```

The warning is not hard-coded. The AI layer computes threshold crossings from the live numbers and raises alerts when utilization exceeds configurable limits (default: 85% RAM, 90% storage). The storage growth projection uses the delta between current usage and the value from the previous polling interval to estimate linear fill rate.
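The two calculations behind that warning can be written down in a few lines. This is a sketch under stated assumptions: the thresholds mirror the documented defaults, and the projection uses a simple two-sample linear estimate, which is one plausible reading of "delta between polling intervals":

```python
# Threshold check: flag RAM once usage crosses the configured limit
# (default 85%, per the text above).
def ram_alert(used_gb: float, total_gb: float, limit: float = 0.85) -> bool:
    return used_gb / total_gb > limit

# Linear fill-rate projection: estimate days until usage reaches
# `threshold` of capacity, from two consecutive samples.
def days_until(used: float, prev_used: float, capacity: float,
               threshold: float, interval_days: float = 1.0):
    growth = (used - prev_used) / interval_days  # units per day
    if growth <= 0:
        return None                              # flat or shrinking: no ETA
    return (capacity * threshold - used) / growth

print(ram_alert(58.2, 64))  # True: 91% exceeds the 85% default
print(round(days_until(2.8, 2.788, 3.6, 0.9)))  # ~37 days to the 90% mark
```

A single-interval delta is noisy; a production implementation would likely smooth over several RRD samples before projecting, but the arithmetic is the same.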

ZFS Storage Health

Most production Proxmox deployments use ZFS for data integrity and snapshots. The platform queries ZFS pool status through the Proxmox API's storage and disk endpoints, surfacing pool health, fragmentation, scrub status, and any degraded vdevs.

virp
```
> check ZFS health on all nodes

[REST API] GET /api2/json/nodes/pve-node01/disks/zfs — 200 OK
[REST API] GET /api2/json/nodes/pve-node02/disks/zfs — 200 OK
[REST API] GET /api2/json/nodes/pve-node03/disks/zfs — 200 OK

ZFS Pool Status:

NODE        POOL         STATE     SIZE    ALLOC   FRAG  LAST SCRUB
pve-node01  rpool        ONLINE    3.6 TB  1.2 TB  12%   2026-02-20 02:00 (clean)
pve-node01  backup-pool  ONLINE    7.2 TB  4.1 TB  8%    2026-02-19 03:00 (clean)
pve-node02  rpool        ONLINE    3.6 TB  0.9 TB  6%    2026-02-20 02:00 (clean)
pve-node03  rpool        DEGRADED  3.6 TB  2.8 TB  31%   2026-02-18 02:00 (errors)

[CRITICAL] pve-node03/rpool is DEGRADED. 1 vdev shows read errors.
Recommended: Check physical disk health with `zpool status rpool` on pve-node03.
High fragmentation (31%) may impact write performance. Consider rebalancing.
```

ZFS degradation is one of the highest-severity conditions the system can surface. When a pool enters a degraded state, the platform flags it immediately and provides the specific CLI command you would need to investigate further on the host. The system does not attempt to run host-level commands on its own — it operates exclusively through the Proxmox REST API and presents actionable guidance for anything that requires direct shell access.
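The classification step can be illustrated against the shape of the `/disks/zfs` response, which reports each pool's `name`, `health`, and `frag` among other fields. The severity ranking and the 25% fragmentation cutoff below are assumptions for the sketch, not documented values:

```python
# Sketch: classify pool health from a /api2/json/nodes/{node}/disks/zfs
# response. Health strings follow ZFS pool states; the severity map and
# fragmentation limit are illustrative assumptions.
SEVERITY = {"ONLINE": "ok", "DEGRADED": "critical", "FAULTED": "critical"}

def zfs_alerts(node: str, pools: list[dict], frag_limit: int = 25) -> list[str]:
    alerts = []
    for p in pools:
        state = p.get("health", "UNKNOWN")
        if SEVERITY.get(state) == "critical":
            alerts.append(f"[CRITICAL] {node}/{p['name']} is {state}")
        if p.get("frag", 0) > frag_limit:
            alerts.append(f"[WARNING] {node}/{p['name']} fragmentation {p['frag']}%")
    return alerts

print(zfs_alerts("pve-node03", [{"name": "rpool", "health": "DEGRADED", "frag": 31}]))
```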

Cluster Management and HA Status

For multi-node Proxmox clusters, the system monitors cluster quorum, corosync ring status, and HA (High Availability) group configuration through the `/api2/json/cluster/ha` and `/api2/json/cluster/status` endpoints.

virp
```
> what's the cluster HA status?

[REST API] GET /api2/json/cluster/ha/status/current — 200 OK (29ms)
[REST API] GET /api2/json/cluster/status — 200 OK (31ms)

Cluster: tli-prod-cluster
Quorum: OK (3/3 nodes, votes: 3)
Corosync: healthy, ring0 operational

HA-Managed Resources:

SID     STATE    NODE        GROUP      MAX_RESTART  MAX_RELOCATE
vm:100  started  pve-node01  ha-group1  3            1
vm:101  started  pve-node01  ha-group1  3            1
vm:102  started  pve-node02  ha-group2  3            1
ct:200  started  pve-node01  ha-group1  3            1

4 HA resources active · 0 fenced · 0 in error state
```

If a node loses quorum or an HA resource enters an error state, the system reports it with full context: which resource, which node, and what the last known state transition was. This eliminates the need to check the Proxmox web UI or parse `ha-manager status` output manually.
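The quorum summary line can be derived from `/api2/json/cluster/status`, which returns a list mixing one `cluster` entry (carrying `quorate`) with per-node entries (carrying `online`). A sketch of that reduction, with an invented sample payload:

```python
# Sketch: derive a quorum summary from the /api2/json/cluster/status
# response list. "quorate" and "online" are fields the endpoint returns.
def quorum_summary(status: list[dict]) -> str:
    cluster = next(i for i in status if i["type"] == "cluster")
    nodes = [i for i in status if i["type"] == "node"]
    online = sum(1 for n in nodes if n.get("online") == 1)
    state = "OK" if cluster.get("quorate") == 1 else "LOST"
    return f"Quorum: {state} ({online}/{len(nodes)} nodes online)"

sample = [
    {"type": "cluster", "name": "tli-prod-cluster", "quorate": 1, "nodes": 3},
    {"type": "node", "name": "pve-node01", "online": 1},
    {"type": "node", "name": "pve-node02", "online": 1},
    {"type": "node", "name": "pve-node03", "online": 1},
]
print(quorum_summary(sample))  # Quorum: OK (3/3 nodes online)
```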

Backup Monitoring

The platform queries the Proxmox backup job configuration and recent task logs to give you a consolidated view of backup health. It pulls schedules from `/api2/json/cluster/backup` and execution history from `/api2/json/nodes/{node}/tasks`, filtered to the `vzdump` task type.

virp
```
> did all backups complete successfully this week?

[REST API] GET /api2/json/cluster/backup — 200 OK
[REST API] GET /api2/json/nodes/pve-node01/tasks?typefilter=vzdump&limit=50 — 200 OK
[REST API] GET /api2/json/nodes/pve-node02/tasks?typefilter=vzdump&limit=50 — 200 OK
[REST API] GET /api2/json/nodes/pve-node03/tasks?typefilter=vzdump&limit=50 — 200 OK

Backup Summary (Feb 17 – Feb 23):

JOB ID     SCHEDULE     STORAGE      LAST RUN             STATUS
backup-01  daily 01:00  backup-pool  2026-02-23 01:14:32  OK (7/7 VMs)
backup-02  daily 02:00  nfs-backup   2026-02-23 02:47:10  WARN (6/7 VMs)
backup-03  weekly Sun   backup-pool  2026-02-16 03:00:00  OK (all CTs)

[WARNING] backup-02: VM 103 (docker-host-01) failed on Feb 21.
          Error: "vzdump: backup size 201 GB exceeds storage limit"

6 of 7 successful runs this week. 1 failure needs attention.
```

The system does not just report pass/fail. It pulls the task log for failed jobs and extracts the error message, so you can understand the root cause without digging through the Proxmox task viewer. In the example above, the AI identifies that the backup target ran out of space for a specific VM and surfaces the exact error string from the vzdump log.
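Task entries from the node task list carry a `type`, a `upid`, and a `status` field, where any value other than `"OK"` is the error text. A sketch of how failed vzdump runs might be filtered out of that list (sample entries and UPIDs are invented):

```python
# Sketch: pick out failed vzdump runs from a /api2/json/nodes/{node}/tasks
# response. A status other than "OK" carries the error message itself.
def failed_backups(tasks: list[dict]) -> list[tuple[str, str]]:
    return [(t["upid"], t["status"])
            for t in tasks
            if t.get("type") == "vzdump" and t.get("status") not in (None, "OK")]

sample = [
    {"upid": "UPID:0001", "type": "vzdump", "status": "OK"},
    {"upid": "UPID:0002", "type": "vzdump",
     "status": "backup size 201 GB exceeds storage limit"},
]
print(failed_backups(sample))
```

From there the AI layer only has to surface the status string; the root cause is already in the task record, not buried in a separate log fetch.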

VM Provisioning and Lifecycle Management

When the API token has write permissions, the platform can provision new VMs and containers through natural language. The system translates your request into the appropriate API calls, but critically, it presents the full configuration for your approval before executing anything.

virp
```
> create a new Ubuntu 22.04 VM on pve-node02, 4 cores, 8GB RAM, 80GB disk

I'll provision the following VM on pve-node02:

  VMID:     106 (next available)
  Name:     ubuntu-2204-new
  Template: ubuntu-22.04-cloud (local:iso)
  CPU:      4 cores, type=host
  Memory:   8192 MB
  Disk:     80 GB on local-zfs
  Network:  vmbr0, DHCP
  Start:    no (manual start after review)

[APPROVAL REQUIRED] This is a write operation.
Type "approve" to proceed or "cancel" to abort.
```

Every write operation — VM creation, resource modification, start, stop, migration — goes through the approval gate. The system never executes destructive or state-changing API calls without explicit confirmation. This is not a configurable setting; it is hard-coded into the architecture.
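The gate pattern itself is simple enough to sketch: wrap every state-changing call so it fires only after an explicit "approve". This is an illustration of the pattern, not the platform's code; the `confirm` callback stands in for the conversational layer:

```python
from typing import Callable

# Sketch of an approval gate: the write closure runs only when the user
# answers "approve"; anything else aborts before the API is touched.
def gated_write(description: str, execute: Callable[[], str],
                confirm: Callable[[str], str]) -> str:
    answer = confirm(f'[APPROVAL REQUIRED] {description}. Type "approve" to proceed.')
    if answer.strip().lower() != "approve":
        return "cancelled"
    return execute()

# A cancelled request never reaches the API:
print(gated_write("Create VM 106", lambda: "created", lambda _: "cancel"))   # cancelled
print(gated_write("Create VM 106", lambda: "created", lambda _: "approve"))  # created
```

Keeping the gate outside the API client, rather than as a flag on each call, is what makes "not configurable" enforceable: there is no code path from intent to execution that skips it.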

AI Capacity Planning and Rebalancing

One of the more valuable features of AI-powered server management is capacity analysis. The system examines resource allocation across your cluster and identifies imbalances, over-provisioned VMs, and nodes approaching capacity limits.

virp
```
> analyze cluster capacity and suggest rebalancing

[REST API] Querying resource allocation across 3 nodes, 9 VMs, 5 CTs...

Capacity Analysis:

NODE        vCPU RATIO    RAM COMMIT  STORAGE
pve-node01  16/16 (100%)  60/76 GB    33% disk
pve-node02  12/16 (75%)   39/64 GB    25% disk
pve-node03  14/16 (88%)   58/64 GB    78% disk

Recommendations:

1. Migrate VM 103 (docker-host-01, 8 vCPU/16 GB) from pve-node01 to
   pve-node02. This frees 8 vCPU and 16 GB on node01, and node02 still
   has headroom (20/16 vCPU, 55/64 GB RAM).
2. VM 102 (db-postgres-01) is allocated 32 GB RAM but averaging 28.1 GB
   usage. Current headroom is 12%. No action needed yet, but monitor for growth.
3. pve-node03 storage at 78%. At current growth rate (~12 GB/day), it will
   reach 90% in approximately 36 days. Plan storage expansion or migrate
   disk-heavy workloads.

No critical imbalances detected. 1 migration suggested for optimization.
```

These recommendations are computed from the live resource data, not from a pre-defined rule engine. The AI layer compares actual utilization against allocated resources, identifies which migrations would improve overall balance, and validates that the destination node has sufficient headroom before suggesting the move. It will never recommend a migration that would push the target node past its capacity thresholds.
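The headroom validation step can be sketched as a predicate over the destination node's post-migration load. The 90% RAM ceiling below is an assumption for illustration (the text describes the thresholds as configurable), and treating vCPU overcommit as acceptable mirrors common Proxmox practice rather than a documented rule:

```python
# Sketch: accept a migration only if the destination node stays under the
# RAM ceiling after the move. The 0.9 limit is an assumed default.
def migration_ok(vm_ram_gb: float, dest_used_gb: float, dest_total_gb: float,
                 ram_limit: float = 0.9) -> bool:
    return (dest_used_gb + vm_ram_gb) / dest_total_gb <= ram_limit

# Moving a 16 GB VM to a node at 39/64 GB lands at 55/64 GB (86%): allowed.
print(migration_ok(16, 39, 64))  # True
# The same VM onto a node already at 45/64 GB would hit 95%: rejected.
print(migration_ok(16, 45, 64))  # False
```

Running this check before emitting a recommendation is what prevents the system from suggesting a migration that merely relocates the capacity problem.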

Inline Dashboard Generation

A key architectural feature of the platform is that dashboards are not pre-built panels. When you ask for a resource overview, cluster summary, or backup report, the AI constructs the output format dynamically based on what data the API returned and what your query was asking for. A question about "memory usage across VMs" produces a memory-focused table. A question about "backup health" produces a schedule-and-status view. The same underlying API data is presented differently depending on context.

This means there are no dashboard pages to maintain, no widget configurations to update when you add a new node, and no stale panels showing data from a VM that was decommissioned last month. The output always reflects the current state of the cluster because it is generated on the fly from the current API response.

What the System Does Not Do

The platform operates exclusively through the Proxmox REST API. It does not SSH into your nodes. It does not install agents on guest VMs. It does not require network access to anything other than the Proxmox API endpoint (typically port 8006). If the Proxmox API does not expose a particular data point, the system will tell you that the information is not available through the API rather than guessing or fabricating a response. This constraint is fundamental to the anti-fabrication architecture that governs all modules.