LLM Agents¶

rimae/scan includes two optional AI-powered agents that automate configuration discovery tasks. Both use the Anthropic API, with Claude as the reasoning engine. They run as background tasks and produce staged results for human review before any configuration is activated.

Configuration Requirements¶

Both agents require an Anthropic API key:

ANTHROPIC_API_KEY=sk-ant-...

Set this in /etc/rimae-scan/rimae-scan.conf or via the environment. Without this key, agents will skip their LLM-powered discovery steps and rely only on deterministic URL template matching and API lookups.

Note: LLM calls are a fallback mechanism. The agents first attempt deterministic lookups (known URL templates, public APIs) before using the Anthropic API. Many OS versions and applications will be fully resolved without any LLM calls.
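
The gating described above can be sketched roughly as follows. This is a minimal illustration, assuming the key is read from the process environment; the names `llm_enabled` and `resolve` are hypothetical and not part of rimae/scan.

```python
import os
from typing import Mapping, Optional


def llm_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when a non-empty ANTHROPIC_API_KEY is present.

    Hypothetical helper: the real check in rimae/scan may differ.
    """
    env = os.environ if env is None else env
    return bool(env.get("ANTHROPIC_API_KEY", "").strip())


def resolve(deterministic_result: Optional[str],
            env: Optional[Mapping[str, str]] = None) -> str:
    """Report which path a lookup would take."""
    if deterministic_result is not None:
        return "deterministic"   # resolved without any LLM call
    if llm_enabled(env):
        return "llm_fallback"    # key present, so the fallback may run
    return "not_found"           # no key: the LLM step is skipped
```

Passing a plain dictionary as `env` makes the behavior easy to check without touching the real environment.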

OS Version Onboarding Agent¶

The version onboarding agent automates the setup of OS version tracking configurations. When a new operating system version is detected in your infrastructure (e.g., a new Ubuntu LTS release), the agent runs a 7-stage pipeline to discover advisory feeds, CPE strings, OVAL definitions, and dependency chains.

7-Stage Pipeline¶

Stage 1: Identity Resolution¶

Queries the endoflife.date API for release metadata:

Release date and end-of-life date
Codename (e.g., jammy for Ubuntu 22.04)
LTS status
For Proxmox: resolves the underlying Debian base version and codename

Supported distributions: Ubuntu, Debian, AlmaLinux, Proxmox, ESXi, RHEL, CentOS, Rocky Linux

Confidence output: verified (API returned data) or not_found
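
As a rough sketch of this stage, the lookup could be split into a fetch step and a pure mapping step. The URL pattern and JSON field names (`releaseDate`, `eol`, `codename`, `lts`) reflect the public endoflife.date JSON API as I understand it and should be checked against its documentation; the function names and the output keys are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def fetch_cycle(product: str, cycle: str) -> Optional[Dict[str, Any]]:
    """Fetch one release cycle from endoflife.date; None on any failure."""
    url = f"https://endoflife.date/api/{product}/{cycle}.json"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.load(resp)
    except (OSError, ValueError):
        # Network errors, HTTP errors, and malformed JSON all count as a miss.
        return None


def to_identity(record: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Map an API record to identity fields plus a confidence level."""
    if not record:
        return {"confidence": "not_found"}
    return {
        "release_date": record.get("releaseDate"),
        "eol_date": record.get("eol"),
        "codename": record.get("codename"),
        "lts": bool(record.get("lts", False)),
        "confidence": "verified",  # the API returned data
    }
```

A caller would chain the two, for example `to_identity(fetch_cycle("ubuntu", "22.04"))`.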

Stage 2: Advisory Feed Discovery¶

Locates the machine-readable vulnerability advisory feed for the distribution:

Checks known URL templates for the distro family:
  Ubuntu: USN RSS feed (https://ubuntu.com/security/notices.rss)
  Debian: DSA feed (https://www.debian.org/security/dsa)
  AlmaLinux: Errata feed (https://errata.almalinux.org/)
  Proxmox: Security advisories page
  ESXi: Broadcom/VMware security advisories (VMSA)
Validates each candidate URL with a HEAD request to confirm reachability
If no known template works, falls back to the Anthropic API to reason about alternative URLs

Confidence output: verified (known URL confirmed reachable), inferred (LLM-suggested URL confirmed reachable), or not_found
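
The template-first, LLM-second flow of this stage might look like the sketch below. It is an illustration under stated assumptions: `head_ok` and `discover_feed` are hypothetical names, and the LLM is represented by an injected callable rather than a real Anthropic API call.

```python
import urllib.request
from typing import Callable, Iterable, Optional, Tuple


def head_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request succeeds with a non-error status."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, ValueError):
        return False


def discover_feed(
    templates: Iterable[str],
    llm_suggest: Optional[Callable[[], Iterable[str]]] = None,
    probe: Callable[[str], bool] = head_ok,
) -> Tuple[Optional[str], str]:
    """Try known URLs first; consult the LLM only if none is reachable."""
    for url in templates:
        if probe(url):
            return url, "verified"
    if llm_suggest is not None:      # None models a missing API key
        for url in llm_suggest():
            if probe(url):
                return url, "inferred"
    return None, "not_found"
```

Injecting `probe` keeps the function testable offline; the default performs a real HEAD request.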

Stage 3: NVD CPE Resolution¶

Queries the NVD CPE Dictionary API to find the matching CPE 2.3 string:

cpe:2.3:o:canonical:ubuntu_linux:22.04:*:*:*:*:*:*:*

Vendor/product mappings: Ubuntu (Canonical), Debian, AlmaLinux, Proxmox, ESXi (VMware), RHEL (Red Hat)

Confidence output: verified (CPE found in NVD) or not_found
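
A small sketch of how the CPE 2.3 string and the lookup URL could be built. Only the Ubuntu vendor/product pair comes from the example above; the Debian and RHEL pairs are my assumptions and should be verified against the NVD dictionary. The endpoint and the `cpeMatchString` parameter are those of the NVD CPE API 2.0 as I understand it, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# distro -> (vendor, product); entries other than ubuntu are assumptions.
CPE_NAMES = {
    "ubuntu": ("canonical", "ubuntu_linux"),
    "debian": ("debian", "debian_linux"),
    "rhel": ("redhat", "enterprise_linux"),
}


def build_cpe(distro: str, version: str) -> str:
    """Build an operating-system CPE 2.3 string with wildcards after the version."""
    vendor, product = CPE_NAMES[distro]
    return f"cpe:2.3:o:{vendor}:{product}:{version}:*:*:*:*:*:*:*"


def nvd_query_url(cpe: str) -> str:
    """Build the NVD CPE API query URL for a match string."""
    base = "https://services.nvd.nist.gov/rest/json/cpes/2.0"
    return f"{base}?{urlencode({'cpeMatchString': cpe})}"
```

A non-empty result from the query would correspond to `verified`, and an empty one to `not_found`.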

Stage 4: OVAL Feed Discovery¶

Checks known OVAL definition repositories per distribution:

Ubuntu: https://security-metadata.canonical.com/oval/com.ubuntu.{codename}.usn.oval.xml.bz2
Debian: https://www.debian.org/security/oval/oval-definitions-{codename}.xml.bz2
AlmaLinux: https://repo.almalinux.org/almalinux/{major}/security/oval/org.almalinux.alma-{major}.xml.bz2

Confidence output: verified (URL reachable), inferred (URL constructed but not confirmed), or not_applicable (no OVAL feed for this distro)
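
The URL patterns above lend themselves to simple template expansion. The following is a minimal sketch using the three templates listed; the dictionary and function names are hypothetical.

```python
from typing import Optional

OVAL_TEMPLATES = {
    "ubuntu": "https://security-metadata.canonical.com/oval/com.ubuntu.{codename}.usn.oval.xml.bz2",
    "debian": "https://www.debian.org/security/oval/oval-definitions-{codename}.xml.bz2",
    "almalinux": "https://repo.almalinux.org/almalinux/{major}/security/oval/org.almalinux.alma-{major}.xml.bz2",
}


def oval_url(distro: str, codename: str = "", major: str = "") -> Optional[str]:
    """Expand the OVAL template for a distro, or None if it has no template."""
    template = OVAL_TEMPLATES.get(distro)
    if template is None:
        return None  # corresponds to not_applicable
    # str.format ignores keyword arguments a template does not use.
    return template.format(codename=codename, major=major)
```

A constructed URL would still be `inferred` until a reachability check confirms it.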

Stage 5: Dependency Chain Resolution¶

For distributions built on top of another distribution (currently Proxmox, which is based on Debian), resolves the upstream advisory sources:

Locates the Debian DSA feed for the base version
Checks the Debian OVAL feed for the base codename
Records these as secondary advisory sources

Proxmox-to-Debian version mapping:

Proxmox	Debian Version	Debian Codename
8.x	12	bookworm
7.x	11	bullseye
6.x	10	buster
5.x	9	stretch

Confidence output: verified, inferred, or not_applicable
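
The version mapping in the table can be expressed as a lookup keyed on the Proxmox major version. This is a sketch only; `debian_base` is a hypothetical name.

```python
from typing import Optional, Tuple

# Proxmox major version -> (Debian version, Debian codename), per the table above.
PROXMOX_TO_DEBIAN = {
    "8": ("12", "bookworm"),
    "7": ("11", "bullseye"),
    "6": ("10", "buster"),
    "5": ("9", "stretch"),
}


def debian_base(proxmox_version: str) -> Optional[Tuple[str, str]]:
    """Return the Debian base for a Proxmox version such as '8.1', or None."""
    major = proxmox_version.strip().split(".", 1)[0]
    return PROXMOX_TO_DEBIAN.get(major)
```

The returned codename is what the Debian DSA and OVAL lookups for the base version would use.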

Stage 6: Confidence Scoring¶

Aggregates per-field confidence scores into a summary report:

{
  "identity": "verified",
  "advisory_feed": "verified",
  "cpe": "verified",
  "oval": "inferred",
  "dependency_chain": "not_applicable"
}

Confidence levels:

Level	Meaning
`verified`	Data confirmed from authoritative source or URL reachability check
`inferred`	Data suggested by LLM or constructed from patterns but not fully verified
`not_found`	No data could be discovered
`not_applicable`	This field does not apply to this distribution
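
One way to picture the aggregation is a function that collects each stage's level into the report shape shown above and lists the fields a reviewer should look at first. The names are hypothetical, and treating a missing stage result as `not_found` is an assumption of this sketch.

```python
from typing import Dict, List

FIELDS = ("identity", "advisory_feed", "cpe", "oval", "dependency_chain")
LEVELS = {"verified", "inferred", "not_found", "not_applicable"}


def summarize(stage_results: Dict[str, str]) -> Dict[str, str]:
    """Build the per-field confidence report from individual stage results."""
    report = {}
    for name in FIELDS:
        level = stage_results.get(name, "not_found")  # assumption: missing = not_found
        if level not in LEVELS:
            raise ValueError(f"unknown confidence level: {level!r}")
        report[name] = level
    return report


def needs_attention(report: Dict[str, str]) -> List[str]:
    """Fields that were guessed or not discovered at all."""
    return [name for name, level in report.items()
            if level in ("inferred", "not_found")]
```

For the example report above, `needs_attention` would single out the `oval` field.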

Stage 7: Staged Commit¶

Writes the discovered configuration to the database:

Creates an OsVersionConfig record with enabled=false and auto_discovered=true
Stores the full resolution report (all stage results) in the resolution_report JSON field
Creates an informational alert for admin review

Warning: Auto-discovered configurations are never automatically activated. An admin must review and approve each discovery before it begins affecting vulnerability correlation.
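
The staged-by-default behavior can be illustrated with a small data class. This is not the actual OsVersionConfig model; the class name, the `distro` and `version` fields, and the helper functions are hypothetical, while `enabled`, `auto_discovered`, and `resolution_report` follow the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class StagedOsVersionConfig:
    distro: str
    version: str
    resolution_report: Dict[str, Any] = field(default_factory=dict)
    enabled: bool = False          # never active until an admin approves
    auto_discovered: bool = True   # marks the record as agent-created


def stage(distro: str, version: str, report: Dict[str, Any]) -> StagedOsVersionConfig:
    """Create a disabled, auto-discovered record carrying the full report."""
    return StagedOsVersionConfig(distro, version, resolution_report=dict(report))


def approve(cfg: StagedOsVersionConfig) -> StagedOsVersionConfig:
    """Explicit admin action: the only place where enabled becomes True."""
    cfg.enabled = True
    return cfg
```

Keeping activation in a separate, explicit step mirrors the review requirement stated in the warning.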

Triggering the Agent¶

Via the API:

# Discover all new OS versions from Wazuh inventory
curl -X POST -H "Authorization: Bearer $TOKEN" \
  https://rimae-scan.example.com/api/config/os-versions/discover

# Re-run the agent for a specific OS version
curl -X POST -H "Authorization: Bearer $TOKEN" \
  https://rimae-scan.example.com/api/config/os-versions/<config-id>/re-run-agent

Both endpoints return HTTP 202 Accepted and queue a background task.
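
For scripted use, the same calls can be made from Python's standard library. The sketch below assumes the example host shown in the curl commands and a bearer token supplied by the caller; the function names are hypothetical.

```python
import urllib.request

BASE_URL = "https://rimae-scan.example.com"  # example host from the curl commands


def build_trigger(token: str, config_id: str = "") -> urllib.request.Request:
    """Build the POST request for discovery, or for a re-run if config_id is given."""
    path = "/api/config/os-versions/discover"
    if config_id:
        path = f"/api/config/os-versions/{config_id}/re-run-agent"
    return urllib.request.Request(
        BASE_URL + path,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def trigger(req: urllib.request.Request) -> bool:
    """Send the request; True means the task was queued (HTTP 202)."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status == 202
```

Because the task runs in the background, a `True` result only confirms that the work was queued, not that discovery has finished.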

Via the UI:

The Discover New Versions button on the OS Versions settings page triggers the discovery scan
The Re-Run Agent button on an individual version entry re-runs the pipeline for that version

Reviewing Staged Discoveries¶

Staged discoveries appear in the Staged tab on the OS Versions settings page. For each staged version:

Review the resolution report to check confidence levels across all stages
Edit any fields that need correction (advisory URLs, CPE strings, etc.)
Approve to enable the version for active tracking, or leave it staged

# List staged discoveries
curl -H "Authorization: Bearer $TOKEN" \
  https://rimae-scan.example.com/api/config/os-versions/staged

# Approve a staged discovery
curl -X PATCH -H "Authorization: Bearer $TOKEN" \
  https://rimae-scan.example.com/api/config/os-versions/<config-id>/approve

Source Discovery Agent¶

The source discovery agent automates the process of finding vulnerability advisory sources for tracked applications (Nginx, PostgreSQL, Redis, etc.).

Discovery Strategies¶

The agent uses four strategies in sequence, stopping when a viable source is found:

Strategy 1: Well-Known Vendor Patterns¶

Checks a curated database of known advisory URLs for popular software:

Application	Advisory Source
Nginx	`https://nginx.org/en/security_advisories.html`
Apache HTTPD	`https://httpd.apache.org/security/vulnerabilities_24.html`
PostgreSQL	`https://www.postgresql.org/support/security/`
Redis	`https://github.com/redis/redis/security/advisories`
OpenSSH	`https://www.openssh.com/security.html`
OpenSSL	`https://www.openssl.org/news/vulnerabilities.html`

Each URL is validated with a HEAD request.

Strategy 2: GitHub Security Advisories¶

If the application has a linked GitHub repository, checks:

https://github.com/{repo}/security/advisories -- GitHub Security Advisories (GHSA)
https://github.com/{repo}/releases.atom -- Release Atom feed for version tracking
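
Both URLs follow directly from the repository slug, as this short sketch shows. The function name and the returned keys are hypothetical.

```python
from typing import Dict


def github_sources(repo: str) -> Dict[str, str]:
    """Build the advisory and release-feed URLs for an 'owner/name' slug."""
    repo = repo.strip().strip("/")
    if repo.count("/") != 1:
        raise ValueError("expected a slug of the form 'owner/name'")
    base = f"https://github.com/{repo}"
    return {
        "advisories": f"{base}/security/advisories",
        "releases_feed": f"{base}/releases.atom",
    }
```

For example, `github_sources("redis/redis")["advisories"]` yields the Redis advisory URL listed under Strategy 1.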

Strategy 3: NVD CPE Search¶

Searches the NVD CPE Dictionary API by application name to find matching CPE entries. Returns a link to NVD vulnerability search results for the matched CPE.

Strategy 4: LLM Fallback¶

If no reachable source has been found, queries the Anthropic API to suggest 1--3 advisory feed URLs. Each suggestion is validated for reachability before being included as a candidate.
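
A sketch of the post-processing this strategy implies: extract URLs from the model's reply, cap them at three, and keep only those that pass a reachability check. The LLM call and the probe are injected callables, so no Anthropic SDK call is shown here; the prompt wording, the regular expression, and the function name are assumptions of this example.

```python
import re
from typing import Callable, Dict, List

URL_RE = re.compile(r"https?://[^\s\"'<>)\]]+")


def llm_candidates(
    app_name: str,
    ask_llm: Callable[[str], str],
    probe: Callable[[str], bool],
    limit: int = 3,
) -> List[Dict[str, object]]:
    """Ask for advisory URLs, then keep only unique, reachable suggestions."""
    # Only the application name is placed in the prompt.
    prompt = (f"Suggest up to {limit} security advisory feed URLs "
              f"for {app_name}. Reply with URLs only.")
    unique: List[str] = []
    for url in URL_RE.findall(ask_llm(prompt)):
        url = url.rstrip(".,;")       # drop trailing punctuation
        if url not in unique:
            unique.append(url)
    return [
        {"url": url, "reachable": True, "confidence": "inferred"}
        for url in unique[:limit]
        if probe(url)
    ]
```

Suggestions that fail the probe are dropped rather than returned as unreachable candidates, which matches the validation step described above.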

Candidate Selection¶

Once the strategy sequence ends, the agent selects the best of the collected candidates based on:

Confidence level -- verified > inferred > not_found
Reachability -- Reachable URLs are preferred over unreachable ones
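
The two criteria can be combined into a single sort key, with confidence compared first and reachability used as the tie-breaker. This is a minimal sketch over candidate dictionaries shaped like those in the result structure; `best_candidate` and `CONFIDENCE_RANK` are hypothetical names.

```python
from typing import Any, Dict, List, Optional

# Lower rank is better: verified > inferred > not_found.
CONFIDENCE_RANK = {"verified": 0, "inferred": 1, "not_found": 2}


def best_candidate(candidates: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Pick the candidate with the best confidence, preferring reachable URLs."""
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda c: (
            CONFIDENCE_RANK.get(c.get("confidence"), 3),  # unknown levels sort last
            not c.get("reachable", False),                # False (reachable) sorts first
        ),
    )
```

When two candidates tie on both criteria, `min` keeps the one that appears first, so earlier strategies win ties.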

Result Structure¶

The agent returns a DiscoveryResult containing:

{
  "app_name": "Nginx",
  "app_slug": "nginx",
  "candidates": [
    {
      "url": "https://nginx.org/en/security_advisories.html",
      "source_type": "vendor_advisory",
      "reachable": true,
      "confidence": "verified",
      "description": "Nginx official security advisories"
    }
  ],
  "best_candidate": {
    "url": "https://nginx.org/en/security_advisories.html",
    "source_type": "vendor_advisory",
    "reachable": true,
    "confidence": "verified",
    "description": "Nginx official security advisories"
  }
}

Cost and Privacy Considerations¶

LLM calls are made only when deterministic methods fail to find a result
Only distribution/application names and version numbers are sent to the Anthropic API -- no asset data, hostnames, or vulnerability details
Each LLM call uses a small token budget (256 tokens for onboarding, 512 for source discovery)
Leave ANTHROPIC_API_KEY empty or unset to disable all LLM functionality

Settings Reference -- Managing OS version configs
Settings Reference -- Managing application configs
API Reference -- OS version API endpoints
Troubleshooting -- Agent troubleshooting