LLM Agents

rimae/scan includes two optional AI-powered agents that automate configuration discovery tasks. Both use the Anthropic API, with Claude as the reasoning engine. They run as background tasks and stage their results for human review before any configuration is activated.

Configuration Requirements

Both agents require an Anthropic API key:

ANTHROPIC_API_KEY=sk-ant-...

Set this in /etc/rimae-scan/rimae-scan.conf or via the environment. Without this key, agents will skip their LLM-powered discovery steps and rely only on deterministic URL template matching and API lookups.

Note: LLM calls are a fallback mechanism. The agents first attempt deterministic lookups (known URL templates, public APIs) before using the Anthropic API. Many OS versions and applications will be fully resolved without any LLM calls.
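
The deterministic-first behaviour described in this note could be sketched roughly as follows. This is an illustration only: the function names `llm_enabled` and `resolve` are hypothetical, and only the `ANTHROPIC_API_KEY` variable comes from this page.

```python
import os
from typing import Callable, Mapping, Optional


def llm_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when a non-empty ANTHROPIC_API_KEY is present."""
    env = os.environ if env is None else env
    return bool(env.get("ANTHROPIC_API_KEY", "").strip())


def resolve(
    deterministic: Callable[[], Optional[str]],
    llm_fallback: Callable[[], Optional[str]],
    env: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Try the deterministic lookup first; use the LLM only as a fallback."""
    result = deterministic()
    if result is not None:
        return result
    if llm_enabled(env):
        return llm_fallback()
    return None  # no key configured: the LLM step is skipped
```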


OS Version Onboarding Agent

The version onboarding agent automates the setup of OS version tracking configurations. When a new operating system version is detected in your infrastructure (e.g., a new Ubuntu LTS release), the agent runs a 7-stage pipeline to discover advisory feeds, CPE strings, OVAL definitions, and dependency chains.

7-Stage Pipeline

Stage 1: Identity Resolution

Queries the endoflife.date API for release metadata:

  • Release date and end-of-life date
  • Codename (e.g., jammy for Ubuntu 22.04)
  • LTS status
  • For Proxmox: resolves the underlying Debian base version and codename

Supported distributions: Ubuntu, Debian, AlmaLinux, Proxmox, ESXi, RHEL, CentOS, Rocky Linux

Confidence output: verified (API returned data) or not_found
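
As a rough sketch of how a release record might be turned into the Stage 1 output, the helper below maps an already-decoded endoflife.date cycle object to the fields listed above. The function name and the output keys are hypothetical, and the input field names (`releaseDate`, `eol`, `codename`, `lts`) are assumed from the public endoflife.date API rather than taken from rimae/scan.

```python
from typing import Any, Dict, Optional


def parse_identity(payload: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Map one endoflife.date cycle record to a Stage 1 identity result."""
    if not payload:
        # Nothing came back for this product/version.
        return {"confidence": "not_found"}
    return {
        "release_date": payload.get("releaseDate"),
        "eol_date": payload.get("eol"),
        "codename": payload.get("codename"),
        # Assumed: "lts" may be a boolean or a date string; both count as LTS.
        "lts": bool(payload.get("lts", False)),
        "confidence": "verified",
    }
```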

Stage 2: Advisory Feed Discovery

Locates the machine-readable vulnerability advisory feed for the distribution:

  1. Checks known URL templates for the distro family:
       • Ubuntu: USN RSS feed (https://ubuntu.com/security/notices.rss)
       • Debian: DSA feed (https://www.debian.org/security/dsa)
       • AlmaLinux: Errata feed (https://errata.almalinux.org/)
       • Proxmox: Security advisories page
       • ESXi: Broadcom/VMware security advisories (VMSA)

  2. Validates each candidate URL with a HEAD request to confirm reachability

  3. If no known template works, falls back to the Anthropic API to reason about alternative URLs

Confidence output: verified (URL confirmed reachable), inferred (LLM suggestion confirmed), or not_found
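
The template-then-fallback flow of this stage can be sketched as below. It is a minimal illustration using only the Python standard library; `head_ok` and `discover_feed` are hypothetical names, and the reachability check is injectable so the logic can be exercised without network access.

```python
import urllib.request
from typing import Callable, Iterable, Optional, Tuple


def head_ok(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to url ends in a 2xx/3xx response."""
    try:
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, ValueError):
        # URLError/HTTPError and timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def discover_feed(
    known: Iterable[str],
    suggested: Iterable[str],
    is_reachable: Callable[[str], bool] = head_ok,
) -> Tuple[Optional[str], str]:
    """Return (url, confidence), preferring known templates over suggestions."""
    for url in known:
        if is_reachable(url):
            return url, "verified"
    for url in suggested:  # e.g. URLs proposed by the LLM fallback
        if is_reachable(url):
            return url, "inferred"
    return None, "not_found"
```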

Stage 3: NVD CPE Resolution

Queries the NVD CPE Dictionary API to find the matching CPE 2.3 string:

cpe:2.3:o:canonical:ubuntu_linux:22.04:*:*:*:*:*:*:*

Vendor/product mappings: Ubuntu (Canonical), Debian, AlmaLinux, Proxmox, ESXi (VMware), RHEL (Red Hat)

Confidence output: verified (CPE found in NVD) or not_found
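
For illustration, an operating-system CPE 2.3 string like the one above can be assembled from a vendor, product, and version, then turned into a lookup URL. The helper names are hypothetical, and the endpoint and `cpeMatchString` parameter are assumed from the public NVD CPE API 2.0; the agent's actual query may differ.

```python
from urllib.parse import urlencode

# Assumed public endpoint of the NVD CPE API (version 2.0).
NVD_CPE_API = "https://services.nvd.nist.gov/rest/json/cpes/2.0"


def build_os_cpe(vendor: str, product: str, version: str) -> str:
    """Build a CPE 2.3 string for an OS ("o" part), wildcarding the rest."""
    return f"cpe:2.3:o:{vendor}:{product}:{version}:*:*:*:*:*:*:*"


def cpe_query_url(cpe: str) -> str:
    """Build a CPE Dictionary query URL for the given CPE match string."""
    return f"{NVD_CPE_API}?{urlencode({'cpeMatchString': cpe})}"


cpe = build_os_cpe("canonical", "ubuntu_linux", "22.04")
url = cpe_query_url(cpe)
```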

Stage 4: OVAL Feed Discovery

Checks known OVAL definition repositories per distribution:

  • Ubuntu: https://security-metadata.canonical.com/oval/com.ubuntu.{codename}.usn.oval.xml.bz2
  • Debian: https://www.debian.org/security/oval/oval-definitions-{codename}.xml.bz2
  • AlmaLinux: https://repo.almalinux.org/almalinux/{major}/security/oval/org.almalinux.alma-{major}.xml.bz2

Confidence output: verified (URL reachable), inferred (URL constructed but not confirmed), or not_applicable (no OVAL feed for this distro)
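
The templates above lend themselves to a simple lookup. The sketch below fills in the placeholders and returns `None` for distributions without a known OVAL feed, which would correspond to not_applicable. The names `OVAL_TEMPLATES` and `oval_url` are hypothetical.

```python
from typing import Optional

# URL templates copied from the list above.
OVAL_TEMPLATES = {
    "ubuntu": "https://security-metadata.canonical.com/oval/"
              "com.ubuntu.{codename}.usn.oval.xml.bz2",
    "debian": "https://www.debian.org/security/oval/"
              "oval-definitions-{codename}.xml.bz2",
    "almalinux": "https://repo.almalinux.org/almalinux/{major}/security/oval/"
                 "org.almalinux.alma-{major}.xml.bz2",
}


def oval_url(distro: str, codename: str = "", major: str = "") -> Optional[str]:
    """Return the OVAL feed URL for a distro, or None if no template exists."""
    template = OVAL_TEMPLATES.get(distro.lower())
    if template is None:
        return None
    return template.format(codename=codename, major=major)
```

A URL built this way would still need a reachability check before it could be reported as verified rather than inferred.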

Stage 5: Dependency Chain Resolution

For distributions based on another (currently Proxmox on Debian), resolves the upstream advisory sources:

  • Locates the Debian DSA feed for the base version
  • Checks the Debian OVAL feed for the base codename
  • Records these as secondary advisory sources

Proxmox-to-Debian version mapping:

Proxmox   Debian Version   Debian Codename
8.x       12               bookworm
7.x       11               bullseye
6.x       10               buster
5.x       9                stretch

Confidence output: verified, inferred, or not_applicable
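
The version mapping above could be expressed as a small lookup keyed on the Proxmox major version. This is a sketch; `PROXMOX_TO_DEBIAN` and `debian_base` are hypothetical names.

```python
from typing import Optional, Tuple

# Proxmox major version -> (Debian version, Debian codename), per the table above.
PROXMOX_TO_DEBIAN = {
    "8": ("12", "bookworm"),
    "7": ("11", "bullseye"),
    "6": ("10", "buster"),
    "5": ("9", "stretch"),
}


def debian_base(proxmox_version: str) -> Optional[Tuple[str, str]]:
    """Return the Debian base for a Proxmox version such as "8.1", or None."""
    major = proxmox_version.strip().split(".", 1)[0]
    return PROXMOX_TO_DEBIAN.get(major)
```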

Stage 6: Confidence Scoring

Aggregates per-field confidence scores into a summary report:

{
  "identity": "verified",
  "advisory_feed": "verified",
  "cpe": "verified",
  "oval": "inferred",
  "dependency_chain": "not_applicable"
}

Confidence levels:

Level            Meaning
verified         Data confirmed from authoritative source or URL reachability check
inferred         Data suggested by LLM or constructed from patterns but not fully verified
not_found        No data could be discovered
not_applicable   This field does not apply to this distribution
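
One way a reviewer-facing summary might be derived from such a report is sketched below: it counts fields per level and lists the fields that are inferred or not_found. The `summarize` function and its output shape are hypothetical and not part of rimae/scan.

```python
from collections import Counter
from typing import Any, Dict

LEVELS = ("verified", "inferred", "not_found", "not_applicable")


def summarize(report: Dict[str, str]) -> Dict[str, Any]:
    """Count fields per confidence level and flag those needing attention."""
    unknown = set(report.values()) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown confidence level(s): {sorted(unknown)}")
    counts = Counter(report.values())
    needs_review = sorted(
        name for name, level in report.items()
        if level in ("inferred", "not_found")
    )
    return {
        "counts": {level: counts.get(level, 0) for level in LEVELS},
        "needs_review": needs_review,
    }


summary = summarize({
    "identity": "verified",
    "advisory_feed": "verified",
    "cpe": "verified",
    "oval": "inferred",
    "dependency_chain": "not_applicable",
})
```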

Stage 7: Staged Commit

Writes the discovered configuration to the database:

  • Creates an OsVersionConfig record with enabled=false and auto_discovered=true
  • Stores the full resolution report (all stage results) in the resolution_report JSON field
  • Creates an informational alert for admin review

Warning: Auto-discovered configurations are never automatically activated. An admin must review and approve each discovery before it begins affecting vulnerability correlation.
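
The staging rule can be illustrated with a plain data class whose defaults mirror the flags above. Only `enabled`, `auto_discovered`, and `resolution_report` come from this page; the class name, the `distro` and `version` fields, and the `stage` helper are hypothetical, and the real record is a database model rather than a dataclass.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class StagedOsVersionConfig:
    distro: str                      # hypothetical field
    version: str                     # hypothetical field
    resolution_report: Dict[str, Any] = field(default_factory=dict)
    enabled: bool = False            # never activated automatically
    auto_discovered: bool = True     # marks the record as agent-created


def stage(distro: str, version: str, report: Dict[str, Any]) -> StagedOsVersionConfig:
    """Create a disabled, auto-discovered record carrying the full report."""
    return StagedOsVersionConfig(distro, version, dict(report))
```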

Triggering the Agent

Via the API:

# Discover all new OS versions from Wazuh inventory
curl -X POST -H "Authorization: Bearer $TOKEN" \
  https://rimae-scan.example.com/api/config/os-versions/discover

# Re-run the agent for a specific OS version
curl -X POST -H "Authorization: Bearer $TOKEN" \
  https://rimae-scan.example.com/api/config/os-versions/<config-id>/re-run-agent

Both endpoints return HTTP 202 Accepted and queue a background task.
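
For scripted use, the same two calls could be built with Python's standard library. This is a sketch: the helper names are hypothetical, and the paths and bearer-token header are taken from the curl examples above.

```python
import urllib.request


def build_request(base_url: str, path: str, token: str,
                  method: str = "POST") -> urllib.request.Request:
    """Build an authenticated request without sending it."""
    url = f"{base_url.rstrip('/')}{path}"
    return urllib.request.Request(
        url, method=method, headers={"Authorization": f"Bearer {token}"}
    )


def discover_request(base_url: str, token: str) -> urllib.request.Request:
    """Request that queues discovery of all new OS versions."""
    return build_request(base_url, "/api/config/os-versions/discover", token)


def rerun_request(base_url: str, token: str, config_id: str) -> urllib.request.Request:
    """Request that re-runs the agent for one OS version config."""
    path = f"/api/config/os-versions/{config_id}/re-run-agent"
    return build_request(base_url, path, token)
```

Passing either object to `urllib.request.urlopen` would send it; going by the text above, a successful call should answer with status 202.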

Via the UI:

  • Discover New Versions button on the OS Versions settings page triggers the discovery scan
  • Re-Run Agent button on individual version entries re-runs the pipeline for that version

Reviewing Staged Discoveries

Staged discoveries appear in the Staged tab on the OS Versions settings page. For each staged version:

  1. Review the resolution report to check confidence levels across all stages
  2. Edit any fields that need correction (advisory URLs, CPE strings, etc.)
  3. Approve to enable the version for active tracking, or leave it staged

Via the API:

# List staged discoveries
curl -H "Authorization: Bearer $TOKEN" \
  https://rimae-scan.example.com/api/config/os-versions/staged

# Approve a staged discovery
curl -X PATCH -H "Authorization: Bearer $TOKEN" \
  https://rimae-scan.example.com/api/config/os-versions/<config-id>/approve

Source Discovery Agent

The source discovery agent automates the process of finding vulnerability advisory sources for tracked applications (Nginx, PostgreSQL, Redis, etc.).

Discovery Strategies

The agent uses four strategies in sequence, stopping when a viable source is found:
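
The "in sequence, stopping early" behaviour might look roughly like this. It assumes each strategy is a callable returning a list of candidate dictionaries with a `reachable` flag, and that "viable" means reachable; both are assumptions made for illustration, as is the name `run_strategies`.

```python
from typing import Any, Callable, Dict, Iterable, List

Candidate = Dict[str, Any]
Strategy = Callable[[str], List[Candidate]]


def run_strategies(app_name: str, strategies: Iterable[Strategy]) -> List[Candidate]:
    """Run strategies in order, stopping once one yields a reachable source."""
    collected: List[Candidate] = []
    for strategy in strategies:
        candidates = strategy(app_name)
        collected.extend(candidates)
        if any(c.get("reachable") for c in candidates):
            break  # later (more expensive) strategies are skipped
    return collected
```

Ordering the strategies from cheapest to most expensive means the LLM fallback is reached only when every deterministic strategy has come up empty.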

Strategy 1: Well-Known Vendor Patterns

Checks a curated database of known advisory URLs for popular software:

Application    Advisory Source
Nginx          https://nginx.org/en/security_advisories.html
Apache HTTPD   https://httpd.apache.org/security/vulnerabilities_24.html
PostgreSQL     https://www.postgresql.org/support/security/
Redis          https://github.com/redis/redis/security/advisories
OpenSSH        https://www.openssh.com/security.html
OpenSSL        https://www.openssl.org/news/vulnerabilities.html

Each URL is validated with a HEAD request.

Strategy 2: GitHub Security Advisories

If the application has a linked GitHub repository, checks:

  • https://github.com/{repo}/security/advisories -- GitHub Security Advisories (GHSA)
  • https://github.com/{repo}/releases.atom -- Release Atom feed for version tracking
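
Deriving the two URLs from a linked repository is straightforward. The sketch below assumes `repo` is given in `owner/name` form; `github_sources` is a hypothetical name.

```python
from typing import Dict


def github_sources(repo: str) -> Dict[str, str]:
    """Build the advisory and release-feed URLs for an "owner/name" repo."""
    repo = repo.strip("/")
    if repo.count("/") != 1:
        raise ValueError(f"expected 'owner/name', got {repo!r}")
    base = f"https://github.com/{repo}"
    return {
        "advisories": f"{base}/security/advisories",
        "releases_atom": f"{base}/releases.atom",
    }
```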

Strategy 3: NVD CPE Lookup

Searches the NVD CPE Dictionary API by application name to find matching CPE entries. Returns a link to NVD vulnerability search results for the matched CPE.

Strategy 4: LLM Fallback

If no reachable source has been found, queries the Anthropic API to suggest 1 to 3 advisory feed URLs. Each suggestion is validated for reachability before being included as a candidate.
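
How the suggestions are extracted from the model's reply is not documented here. As one plausible sketch, the helper below pulls at most three distinct HTTPS URLs out of free-form text; each would then go through the reachability check before becoming a candidate. The function name, the regular expression, and the assumption that the reply is plain text are all hypothetical.

```python
import re
from typing import List

# Assumed: suggestions arrive as plain text containing https:// URLs.
URL_PATTERN = re.compile(r'https://[^\s<>()"]+')


def extract_urls(text: str, limit: int = 3) -> List[str]:
    """Return up to `limit` distinct URLs in order of first appearance."""
    urls: List[str] = []
    for match in URL_PATTERN.findall(text):
        url = match.rstrip(".,;")  # drop trailing sentence punctuation
        if url not in urls:
            urls.append(url)
        if len(urls) == limit:
            break
    return urls
```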

Candidate Selection

From the candidates collected, the agent selects the best one based on:

  1. Confidence level -- verified > inferred > not_found
  2. Reachability -- Reachable URLs are preferred over unreachable ones
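
These two criteria amount to a lexicographic ranking, which could be sketched as below. The candidate dictionaries follow the shape of the result structure on this page; `CONFIDENCE_RANK` and `select_best` are hypothetical names.

```python
from typing import Any, Dict, List, Optional

# Higher rank wins: verified > inferred > not_found.
CONFIDENCE_RANK = {"verified": 2, "inferred": 1, "not_found": 0}


def select_best(candidates: List[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Pick the candidate with the best confidence, then reachability."""
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda c: (
            CONFIDENCE_RANK.get(c.get("confidence"), -1),
            bool(c.get("reachable")),
        ),
    )
```

On a tie, `max` keeps the earliest candidate, so results from earlier strategies take precedence.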

Result Structure

The agent returns a DiscoveryResult containing:

{
  "app_name": "Nginx",
  "app_slug": "nginx",
  "candidates": [
    {
      "url": "https://nginx.org/en/security_advisories.html",
      "source_type": "vendor_advisory",
      "reachable": true,
      "confidence": "verified",
      "description": "Nginx official security advisories"
    }
  ],
  "best_candidate": {
    "url": "https://nginx.org/en/security_advisories.html",
    "source_type": "vendor_advisory",
    "reachable": true,
    "confidence": "verified",
    "description": "Nginx official security advisories"
  }
}
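
A consumer of this structure might load it into typed objects. The sketch below mirrors the field names in the JSON above; the Python classes are illustrative stand-ins rather than the project's own `DiscoveryResult` implementation.

```python
import json
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Candidate:
    url: str
    source_type: str
    reachable: bool
    confidence: str
    description: str


@dataclass(frozen=True)
class DiscoveryResult:
    app_name: str
    app_slug: str
    candidates: Tuple[Candidate, ...]
    best_candidate: Optional[Candidate]


def parse_result(raw: str) -> DiscoveryResult:
    """Decode a JSON document shaped like the example above."""
    data = json.loads(raw)
    best = data.get("best_candidate")
    return DiscoveryResult(
        app_name=data["app_name"],
        app_slug=data["app_slug"],
        candidates=tuple(Candidate(**c) for c in data.get("candidates", [])),
        best_candidate=Candidate(**best) if best else None,
    )
```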

Cost and Privacy Considerations

  • LLM calls are made only when deterministic methods fail to find a result
  • Only distribution/application names and version numbers are sent to the Anthropic API -- no asset data, hostnames, or vulnerability details
  • Each LLM call uses a small token budget (256 tokens for onboarding, 512 for source discovery)
  • Leave ANTHROPIC_API_KEY empty or unset to disable all LLM functionality; the agents then rely only on deterministic lookups