LLM Agents¶
rimae/scan includes two optional AI-powered agents that automate configuration discovery tasks. Both agents use the Anthropic API, with Claude as the reasoning engine. They run as background tasks and produce staged results for human review before any configuration is activated.
Configuration Requirements¶
Both agents need an Anthropic API key (ANTHROPIC_API_KEY) for their LLM-powered steps.
Set it in /etc/rimae-scan/rimae-scan.conf or via the environment. Without the key, the agents skip their LLM-powered discovery steps and rely only on deterministic URL template matching and API lookups.
Note: LLM calls are a fallback mechanism. The agents first attempt deterministic lookups (known URL templates, public APIs) before using the Anthropic API. Many OS versions and applications will be fully resolved without any LLM calls.
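The lookup order described in this note can be sketched roughly as follows. This is an illustration of the "deterministic first, LLM last" pattern only, not rimae/scan's actual code; `resolve`, `deterministic_lookups`, and `llm_lookup` are hypothetical names, and `llm_lookup` is assumed to be `None` when no API key is configured.

```python
from typing import Callable, Iterable, Optional, Tuple

# A lookup takes a name and returns a result string, or None if it found nothing.
Lookup = Callable[[str], Optional[str]]


def resolve(
    name: str,
    deterministic_lookups: Iterable[Lookup],
    llm_lookup: Optional[Lookup] = None,
) -> Tuple[Optional[str], str]:
    """Return (result, how), where how is 'deterministic', 'llm', or 'not_found'."""
    for lookup in deterministic_lookups:
        result = lookup(name)
        if result is not None:
            return result, "deterministic"
    # The LLM is consulted only after every deterministic lookup has failed.
    if llm_lookup is not None:
        result = llm_lookup(name)
        if result is not None:
            return result, "llm"
    return None, "not_found"
```

With this shape, a name that matches a known template never reaches the LLM callable at all.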
OS Version Onboarding Agent¶
The version onboarding agent automates the setup of OS version tracking configurations. When a new operating system version is detected in your infrastructure (e.g., a new Ubuntu LTS release), the agent runs a 7-stage pipeline to discover advisory feeds, CPE strings, OVAL definitions, and dependency chains.
7-Stage Pipeline¶
Stage 1: Identity Resolution¶
Queries the endoflife.date API for release metadata:
- Release date and end-of-life date
- Codename (e.g., jammy for Ubuntu 22.04)
- LTS status
- For Proxmox: resolves the underlying Debian base version and codename
Supported distributions: Ubuntu, Debian, AlmaLinux, Proxmox, ESXi, RHEL, CentOS, Rocky Linux
Confidence output: verified (API returned data) or not_found
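A minimal sketch of this lookup, assuming the public endoflife.date per-cycle endpoint (`/api/{product}/{cycle}.json`) and its `releaseDate`, `eol`, `codename`, and `lts` fields. The function names are hypothetical, and the field names should be checked against the current API documentation, since not every product exposes all of them (and `eol` can be a boolean rather than a date).

```python
import json
from urllib.request import urlopen

# Assumed endpoint shape of the endoflife.date API.
API = "https://endoflife.date/api/{product}/{cycle}.json"


def fetch_cycle(product: str, cycle: str) -> dict:
    """Fetch one release cycle, e.g. fetch_cycle("ubuntu", "22.04")."""
    with urlopen(API.format(product=product, cycle=cycle), timeout=10) as resp:
        return json.load(resp)


def summarize_identity(cycle: dict) -> dict:
    """Pick out the Stage 1 fields; keys the API did not return become None."""
    return {
        "release_date": cycle.get("releaseDate"),
        "eol": cycle.get("eol"),
        "codename": cycle.get("codename"),
        "lts": cycle.get("lts"),
        # An empty response is treated as "no data" in this sketch.
        "confidence": "verified" if cycle else "not_found",
    }
```

Keeping the network call separate from the field extraction makes the extraction easy to test with a canned response.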
Stage 2: Advisory Feed Discovery¶
Locates the machine-readable vulnerability advisory feed for the distribution:
- Checks known URL templates for the distro family:
    - Ubuntu: USN RSS feed (https://ubuntu.com/security/notices.rss)
    - Debian: DSA feed (https://www.debian.org/security/dsa)
    - AlmaLinux: Errata feed (https://errata.almalinux.org/)
    - Proxmox: Security advisories page
    - ESXi: Broadcom/VMware security advisories (VMSA)
- Validates each candidate URL with a HEAD request to confirm reachability
- If no known template works, falls back to the Anthropic API to reason about alternative URLs
Confidence output: verified (URL confirmed reachable), inferred (LLM suggestion confirmed), or not_found
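The reachability check can be sketched with the standard library alone. This is a hedged illustration rather than the agent's real code: `head_ok` and `first_reachable` are hypothetical names, and the timeout value is an assumption.

```python
from http.client import HTTPException
from typing import Callable, Iterable, Optional
from urllib.request import Request, urlopen


def head_ok(url: str, timeout: float = 10.0) -> bool:
    """True if a HEAD request to url completes without an HTTP or network error."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, HTTPException, ValueError):
        # URLError/HTTPError and socket timeouts are OSError subclasses;
        # ValueError covers malformed URLs.
        return False


def first_reachable(
    urls: Iterable[str], check: Callable[[str], bool] = head_ok
) -> Optional[str]:
    """Return the first URL that passes the reachability check, else None."""
    for url in urls:
        if check(url):
            return url
    return None
```

Passing the check as a parameter lets the same loop validate both known templates and LLM-suggested URLs, and keeps it testable without network access.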
Stage 3: NVD CPE Resolution¶
Queries the NVD CPE Dictionary API to find the matching CPE 2.3 string.
Vendor/product mappings: Ubuntu (Canonical), Debian, AlmaLinux, Proxmox, ESXi (VMware), RHEL (Red Hat)
Confidence output: verified (CPE found in NVD) or not_found
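To make the CPE 2.3 format concrete, here is a small sketch that builds a match string and a query URL. The vendor/product pairs are illustrative and should be confirmed against the NVD dictionary; the endpoint and the `cpeMatchString` parameter assume the NVD CPE API 2.0, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed NVD CPE API 2.0 endpoint.
NVD_CPE_API = "https://services.nvd.nist.gov/rest/json/cpes/2.0"

# Illustrative vendor/product pairs; confirm each against the NVD CPE dictionary.
CPE_NAMES = {
    "ubuntu": ("canonical", "ubuntu_linux"),
    "debian": ("debian", "debian_linux"),
    "rhel": ("redhat", "enterprise_linux"),
}


def cpe23_os(distro: str, version: str) -> str:
    """Build a CPE 2.3 formatted string for an operating system (part 'o')."""
    vendor, product = CPE_NAMES[distro]
    # cpe:2.3:<part>:<vendor>:<product>:<version>, then seven wildcard fields.
    return ":".join(["cpe", "2.3", "o", vendor, product, version] + ["*"] * 7)


def nvd_query_url(cpe: str) -> str:
    """URL that asks NVD for dictionary entries matching the given CPE string."""
    return f"{NVD_CPE_API}?{urlencode({'cpeMatchString': cpe})}"
```

For example, `cpe23_os("ubuntu", "22.04")` yields `cpe:2.3:o:canonical:ubuntu_linux:22.04:*:*:*:*:*:*:*`.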
Stage 4: OVAL Feed Discovery¶
Checks known OVAL definition repositories per distribution:
- Ubuntu: https://security-metadata.canonical.com/oval/com.ubuntu.{codename}.usn.oval.xml.bz2
- Debian: https://www.debian.org/security/oval/oval-definitions-{codename}.xml.bz2
- AlmaLinux: https://repo.almalinux.org/almalinux/{major}/security/oval/org.almalinux.alma-{major}.xml.bz2
Confidence output: verified (URL reachable), inferred (URL constructed but not confirmed), or not_applicable (no OVAL feed for this distro)
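The templates above can be expanded as in the following sketch. The templates are taken from the list; the function name and the choice to report `inferred` until a separate reachability check succeeds are assumptions made for illustration.

```python
from typing import Optional, Tuple

OVAL_TEMPLATES = {
    "ubuntu": "https://security-metadata.canonical.com/oval/com.ubuntu.{codename}.usn.oval.xml.bz2",
    "debian": "https://www.debian.org/security/oval/oval-definitions-{codename}.xml.bz2",
    "almalinux": "https://repo.almalinux.org/almalinux/{major}/security/oval/org.almalinux.alma-{major}.xml.bz2",
}


def oval_url(distro: str, **fields: str) -> Tuple[Optional[str], str]:
    """Return (url, confidence) for a distro, before any reachability check.

    Raises KeyError if a placeholder the template needs is not supplied.
    """
    template = OVAL_TEMPLATES.get(distro)
    if template is None:
        return None, "not_applicable"
    # The URL is only constructed here, so it stays "inferred" until confirmed.
    return template.format(**fields), "inferred"
```

A caller would upgrade the confidence to `verified` once a HEAD request against the returned URL succeeds.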
Stage 5: Dependency Chain Resolution¶
For distributions based on another (currently Proxmox on Debian), resolves the upstream advisory sources:
- Locates the Debian DSA feed for the base version
- Checks the Debian OVAL feed for the base codename
- Records these as secondary advisory sources
Proxmox-to-Debian version mapping:
| Proxmox | Debian Version | Debian Codename |
|---|---|---|
| 8.x | 12 | bookworm |
| 7.x | 11 | bullseye |
| 6.x | 10 | buster |
| 5.x | 9 | stretch |
Confidence output: verified, inferred, or not_applicable
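The mapping table translates directly into a lookup keyed on the Proxmox major version. A minimal sketch, with a hypothetical function name:

```python
from typing import Optional, Tuple

# Proxmox major version -> (Debian version, Debian codename), from the table above.
PROXMOX_TO_DEBIAN = {
    8: (12, "bookworm"),
    7: (11, "bullseye"),
    6: (10, "buster"),
    5: (9, "stretch"),
}


def debian_base(proxmox_version: str) -> Optional[Tuple[int, str]]:
    """Resolve e.g. "8.1" to (12, "bookworm"); None if the major version is unmapped."""
    major = int(proxmox_version.split(".")[0])
    return PROXMOX_TO_DEBIAN.get(major)
```

The returned codename is what the Debian DSA and OVAL lookups for the base version would use.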
Stage 6: Confidence Scoring¶
Aggregates per-field confidence scores into a summary report:
{
"identity": "verified",
"advisory_feed": "verified",
"cpe": "verified",
"oval": "inferred",
"dependency_chain": "not_applicable"
}
Confidence levels:
| Level | Meaning |
|---|---|
| verified | Data confirmed from authoritative source or URL reachability check |
| inferred | Data suggested by LLM or constructed from patterns but not fully verified |
| not_found | No data could be discovered |
| not_applicable | This field does not apply to this distribution |
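One way a reviewer-facing summary could be derived from such a report is sketched below. The idea of flagging `inferred` and `not_found` fields for attention is an assumption for illustration, not documented agent behavior, and `summarize_report` is a hypothetical name.

```python
from collections import Counter
from typing import Dict


def summarize_report(report: Dict[str, str]) -> dict:
    """Count confidence levels and list the fields a reviewer should look at."""
    counts = Counter(report.values())
    # Fields that are neither confirmed nor irrelevant deserve a manual check.
    needs_review = sorted(
        name for name, level in report.items() if level in ("inferred", "not_found")
    )
    return {"counts": dict(counts), "needs_review": needs_review}


report = {
    "identity": "verified",
    "advisory_feed": "verified",
    "cpe": "verified",
    "oval": "inferred",
    "dependency_chain": "not_applicable",
}
summary = summarize_report(report)
```

For the example report from Stage 6, only the `oval` field ends up in `needs_review`.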
Stage 7: Staged Commit¶
Writes the discovered configuration to the database:
- Creates an OsVersionConfig record with enabled=false and auto_discovered=true
- Stores the full resolution report (all stage results) in the resolution_report JSON field
- Creates an informational alert for admin review
Warning: Auto-discovered configurations are never automatically activated. An admin must review and approve each discovery before it begins affecting vulnerability correlation.
Triggering the Agent¶
Via the API:
# Discover all new OS versions from Wazuh inventory
curl -X POST -H "Authorization: Bearer $TOKEN" \
https://rimae-scan.example.com/api/config/os-versions/discover
# Re-run the agent for a specific OS version
curl -X POST -H "Authorization: Bearer $TOKEN" \
https://rimae-scan.example.com/api/config/os-versions/<config-id>/re-run-agent
Both endpoints return HTTP 202 Accepted and queue a background task.
Via the UI:
- The Discover New Versions button on the OS Versions settings page triggers the discovery scan
- The Re-Run Agent button on an individual version entry re-runs the pipeline for that version
Reviewing Staged Discoveries¶
Staged discoveries appear in the Staged tab on the OS Versions settings page. For each staged version:
- Review the resolution report to check confidence levels across all stages
- Edit any fields that need correction (advisory URLs, CPE strings, etc.)
- Approve to enable the version for active tracking, or leave it staged
# List staged discoveries
curl -H "Authorization: Bearer $TOKEN" \
https://rimae-scan.example.com/api/config/os-versions/staged
# Approve a staged discovery
curl -X PATCH -H "Authorization: Bearer $TOKEN" \
https://rimae-scan.example.com/api/config/os-versions/<config-id>/approve
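For scripts that prefer Python over curl, the same calls can be built with the standard library. This is a sketch under assumptions: the host is the placeholder used in the curl examples, the helper names are hypothetical, and the response bodies are assumed to be JSON (their schema is not described here, so nothing below depends on it).

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "https://rimae-scan.example.com"  # placeholder host from the curl examples


def api_request(method: str, path: str, token: str) -> Request:
    """Build (but do not send) an authenticated request to the rimae/scan API."""
    return Request(
        BASE_URL + path,
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )


def send(request: Request):
    """Send the request; return (status, decoded JSON body or None if empty)."""
    with urlopen(request, timeout=30) as resp:
        body = resp.read()
        return resp.status, (json.loads(body) if body else None)


# Example usage (not executed here): list staged discoveries, then approve one.
# status, staged = send(api_request("GET", "/api/config/os-versions/staged", token))
# status, _ = send(api_request("PATCH", f"/api/config/os-versions/{config_id}/approve", token))
```

Separating request construction from sending makes it straightforward to log or inspect a request before it is issued.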
Source Discovery Agent¶
The source discovery agent automates the process of finding vulnerability advisory sources for tracked applications (Nginx, PostgreSQL, Redis, etc.).
Discovery Strategies¶
The agent uses four strategies in sequence, stopping when a viable source is found:
Strategy 1: Well-Known Vendor Patterns¶
Checks a curated database of known advisory URLs for popular software:
| Application | Advisory Source |
|---|---|
| Nginx | https://nginx.org/en/security_advisories.html |
| Apache HTTPD | https://httpd.apache.org/security/vulnerabilities_24.html |
| PostgreSQL | https://www.postgresql.org/support/security/ |
| Redis | https://github.com/redis/redis/security/advisories |
| OpenSSH | https://www.openssh.com/security.html |
| OpenSSL | https://www.openssl.org/news/vulnerabilities.html |
Each URL is validated with a HEAD request.
Strategy 2: GitHub Security Advisories¶
If the application has a linked GitHub repository, checks:
- https://github.com/{repo}/security/advisories -- GitHub Security Advisories (GHSA)
- https://github.com/{repo}/releases.atom -- Release Atom feed for version tracking
Strategy 3: NVD CPE Search¶
Searches the NVD CPE Dictionary API by application name to find matching CPE entries. Returns a link to NVD vulnerability search results for the matched CPE.
Strategy 4: LLM Fallback¶
If no reachable source has been found, queries the Anthropic API to suggest 1--3 advisory feed URLs. Each suggestion is validated for reachability before being included as a candidate.
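The "try strategies in order, stop at the first viable source" flow can be sketched as below. It treats "viable" as "at least one reachable candidate", which is an assumption; `discover` and the strategy callables are hypothetical names, and each strategy is assumed to return a list of candidate dictionaries.

```python
from typing import Callable, Dict, List

Candidate = Dict[str, object]  # e.g. {"url": ..., "reachable": ..., "confidence": ...}
Strategy = Callable[[str], List[Candidate]]


def discover(app_name: str, strategies: List[Strategy]) -> List[Candidate]:
    """Run strategies in order; stop once one yields a reachable candidate."""
    candidates: List[Candidate] = []
    for strategy in strategies:
        found = strategy(app_name)
        candidates.extend(found)
        if any(c.get("reachable") for c in found):
            break  # later strategies, including the LLM fallback, are skipped
    return candidates
```

Placing the LLM-backed strategy last in the list means it only runs when every deterministic strategy came back without a reachable source.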
Candidate Selection¶
Once the strategies have finished, the agent selects the best candidate based on:
- Confidence level -- verified > inferred > not_found
- Reachability -- Reachable URLs are preferred over unreachable ones
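These two criteria can be expressed as a sort key. The sketch below assumes confidence takes precedence and reachability breaks ties, matching the order of the list; the actual precedence used by the agent is not stated, and `best_candidate` is a hypothetical helper.

```python
from typing import Dict, List, Optional

# Higher rank wins; unknown levels sort below not_found.
CONFIDENCE_RANK = {"verified": 2, "inferred": 1, "not_found": 0}


def best_candidate(candidates: List[Dict[str, object]]) -> Optional[Dict[str, object]]:
    """Pick the candidate with the highest confidence, preferring reachable URLs."""
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda c: (
            CONFIDENCE_RANK.get(str(c.get("confidence")), -1),
            bool(c.get("reachable")),
        ),
    )
```

Because `max` returns the first of several equally ranked items, candidates from earlier strategies win ties.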
Result Structure¶
The agent returns a DiscoveryResult containing:
{
"app_name": "Nginx",
"app_slug": "nginx",
"candidates": [
{
"url": "https://nginx.org/en/security_advisories.html",
"source_type": "vendor_advisory",
"reachable": true,
"confidence": "verified",
"description": "Nginx official security advisories"
}
],
"best_candidate": {
"url": "https://nginx.org/en/security_advisories.html",
"source_type": "vendor_advisory",
"reachable": true,
"confidence": "verified",
"description": "Nginx official security advisories"
}
}
Cost and Privacy Considerations¶
- LLM calls are made only when deterministic methods fail to find a result
- Only distribution/application names and version numbers are sent to the Anthropic API -- no asset data, hostnames, or vulnerability details
- Each LLM call uses a small token budget (256 tokens for onboarding, 512 for source discovery)
- Set ANTHROPIC_API_KEY to an empty value to disable all LLM functionality
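The constraints in this list can be pictured as a small gate in front of every LLM call. This is a hedged sketch: the token budgets and the environment variable come from the list above, while `llm_enabled`, `plan_llm_call`, and the shape of the returned dictionary are hypothetical.

```python
import os
from typing import Mapping, Optional

# Token budgets quoted in the list above.
MAX_TOKENS = {"onboarding": 256, "source_discovery": 512}


def llm_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """LLM steps are active only when ANTHROPIC_API_KEY is set and non-empty."""
    env = os.environ if env is None else env
    return bool(env.get("ANTHROPIC_API_KEY", "").strip())


def plan_llm_call(
    task: str,
    name: str,
    version: str = "",
    env: Optional[Mapping[str, str]] = None,
) -> Optional[dict]:
    """Describe an LLM call, or return None when LLM functionality is disabled."""
    if not llm_enabled(env):
        return None
    # Only the name and version leave the system: no hostnames or asset data.
    return {"max_tokens": MAX_TOKENS[task], "subject": f"{name} {version}".strip()}
```

Building the outbound subject from the name and version alone keeps the privacy guarantee easy to audit in one place.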
Related Documentation¶
- Settings Reference -- Managing OS version configs
- Settings Reference -- Managing application configs
- API Reference -- OS version API endpoints
- Troubleshooting -- Agent troubleshooting