
Correlation Engine

The correlation engine is the core of rimae/scan. It matches your asset inventory against vulnerability data to produce actionable vulnerability match records. The engine supports both full and incremental correlation runs.

Overview

Correlation answers the question: which of the thousands of CVEs in the database actually affect my specific assets? It does this by comparing installed package versions against advisory fix versions, scoring each match with a weighted composite formula, and deduplicating results from multiple overlapping sources.

The 7-Step Correlation Pipeline

Every correlation run executes the following steps in sequence. Steps are numbered from 0, and steps 2 and 3 run together as a single stage:

Step 0: Build Source Registry

The engine queries vuln_source_configs to determine which vulnerability sources are enabled and their scoring weights. This produces an immutable SourceRegistry used throughout the run.
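A registry of this kind could be sketched as follows. This is an illustration only: the SourceConfig fields, the constructor, and the accessor names are assumptions, not the engine's actual types. The sketch keeps the map unexported so the registry cannot be changed after construction.

```go
package main

import "fmt"

// SourceConfig stands in for one row of vuln_source_configs.
// The field names are assumed for this sketch.
type SourceConfig struct {
	Slug           string
	Enabled        bool
	PriorityWeight float64
}

// SourceRegistry is read-only after construction: the map is unexported
// and only reachable through accessor methods.
type SourceRegistry struct {
	weights map[string]float64
}

// NewSourceRegistry copies the enabled sources into a fresh map.
func NewSourceRegistry(configs []SourceConfig) SourceRegistry {
	weights := make(map[string]float64, len(configs))
	for _, c := range configs {
		if c.Enabled {
			weights[c.Slug] = c.PriorityWeight
		}
	}
	return SourceRegistry{weights: weights}
}

// Enabled reports whether a source slug is enabled for this run.
func (r SourceRegistry) Enabled(slug string) bool {
	_, ok := r.weights[slug]
	return ok
}

// Weight returns the scoring weight for an enabled source.
func (r SourceRegistry) Weight(slug string) (float64, bool) {
	w, ok := r.weights[slug]
	return w, ok
}

func main() {
	reg := NewSourceRegistry([]SourceConfig{
		{Slug: "nvd", Enabled: true, PriorityWeight: 1.0},
		{Slug: "siem", Enabled: false, PriorityWeight: 1.0},
	})
	fmt.Println(reg.Enabled("nvd"), reg.Enabled("siem"))
}
```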

Step 1: Asset Inventory Snapshot

A point-in-time snapshot of the entire asset inventory is loaded into memory:

  • All Asset records.
  • All OsPackage records, grouped by asset ID.
  • All AppInstance records, grouped by asset ID.
  • All InfraComponent records, grouped by asset ID.
  • All DockerImage records, grouped by asset ID.

This snapshot is immutable -- the engine never modifies inventory data during correlation.

Step 2+3: Vulnerability Source Lookup and CVE Enrichment

For each package on each asset, the engine searches two advisory indexes:

  1. Vendor advisories (Advisory + AdvisoryPackage tables) -- OS and vendor advisories that reference specific packages and CVEs.
  2. Ecosystem advisories (EcosystemAdvisory table) -- advisories from language ecosystems (OSV, GitHub, PyPA, etc.).

For each match, the engine creates a MatchCandidate enriched with CVE data including CVSS scores (v3.1 and v4), EPSS probability, KEV status, ransomware flags, and exploit availability (Metasploit, Nuclei, ExploitDB, XDB).

In incremental mode, only the specified CVE IDs are checked, significantly reducing processing time.

Step 4: Version Comparison

Each candidate is filtered to determine if the installed version is actually vulnerable:

  • If no fixed_version is known, the candidate is kept (marked as needing review).
  • If a fixed_version exists, the installed version is compared using the appropriate versioning scheme.
  • A result of installed < fixed means the asset is vulnerable; the candidate is kept.
  • A result of installed >= fixed means the asset has the fix; the candidate is discarded.
  • If version comparison fails (for example, because a version string is malformed), the candidate is kept and marked as needing review.
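The decision rules above can be sketched as a small function. The names Decision, CompareFunc, and FilterCandidate are hypothetical, and compareInts is a toy comparator for single-integer versions that stands in for a real versioning scheme.

```go
package main

import (
	"fmt"
	"strconv"
)

// Decision is the outcome of the Step 4 filter for one candidate.
type Decision int

const (
	Discard         Decision = iota // installed >= fixed: the fix is present
	Keep                            // installed < fixed: vulnerable
	KeepNeedsReview                 // no fixed version, or comparison failed
)

// CompareFunc returns a negative, zero, or positive number
// for a < b, a == b, or a > b.
type CompareFunc func(a, b string) (int, error)

// FilterCandidate applies the rules in the order they are listed above.
func FilterCandidate(installed, fixed string, cmp CompareFunc) Decision {
	if fixed == "" {
		return KeepNeedsReview
	}
	c, err := cmp(installed, fixed)
	if err != nil {
		return KeepNeedsReview
	}
	if c < 0 {
		return Keep
	}
	return Discard
}

// compareInts is a toy comparator for demonstration only.
func compareInts(a, b string) (int, error) {
	x, err := strconv.Atoi(a)
	if err != nil {
		return 0, err
	}
	y, err := strconv.Atoi(b)
	if err != nil {
		return 0, err
	}
	switch {
	case x < y:
		return -1, nil
	case x > y:
		return 1, nil
	}
	return 0, nil
}

func main() {
	fmt.Println(FilterCandidate("3", "5", compareInts) == Keep)
	fmt.Println(FilterCandidate("5", "5", compareInts) == Discard)
	fmt.Println(FilterCandidate("x", "5", compareInts) == KeepNeedsReview)
}
```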

Version Comparison Schemes

The engine supports 5 versioning schemes, auto-detected from the package source or ecosystem:

| Scheme | Used For | Detection Keys |
|--------|----------|----------------|
| semver | npm, Cargo, generic | npm, cargo, or fallback default |
| deb | Debian/Ubuntu packages | apt, dpkg, deb |
| rpm | Red Hat/AlmaLinux/Fedora | rpm, yum, dnf |
| pep440 | Python packages | pip, pypi, python |
| go | Go modules | go, golang |
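Auto-detection from the keys listed above might look like the sketch below. The function name DetectScheme is hypothetical, and the case-insensitive, whitespace-trimmed matching is an assumption; the key-to-scheme mapping and the semver fallback come from the table.

```go
package main

import (
	"fmt"
	"strings"
)

// DetectScheme maps a package source or ecosystem key to a versioning scheme.
// Unknown keys fall back to semver, as the table describes.
func DetectScheme(key string) string {
	switch strings.ToLower(strings.TrimSpace(key)) {
	case "apt", "dpkg", "deb":
		return "deb"
	case "rpm", "yum", "dnf":
		return "rpm"
	case "pip", "pypi", "python":
		return "pep440"
	case "go", "golang":
		return "go"
	default: // npm, cargo, and anything unrecognised
		return "semver"
	}
}

func main() {
	for _, k := range []string{"dpkg", "dnf", "pypi", "golang", "npm", "unknown"} {
		fmt.Printf("%s -> %s\n", k, DetectScheme(k))
	}
}
```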

Each scheme implements full specification-compliant parsing:

  • semver: major.minor.patch with pre-release comparison (a version without a pre-release tag sorts above any pre-release of the same version).
  • deb: epoch:upstream-debian with the full dpkg character weight algorithm (tilde sorts before everything).
  • rpm: epoch:version-release with rpmvercmp token splitting (numeric segments > alphabetic segments).
  • pep440: epoch, release segments, pre-release (alpha/beta/rc), post-release, and dev-release ordering.
  • go: standard semver plus Go pseudo-version detection (timestamp-hash pre-release strings).
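To make the semver pre-release rule concrete, here is a minimal comparison sketch that follows the Semantic Versioning 2.0.0 precedence rules. It is not the engine's parser: the function names are hypothetical, it accepts only major.minor.patch cores, and it does not validate identifiers beyond that.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

func sign(n int) int {
	switch {
	case n < 0:
		return -1
	case n > 0:
		return 1
	}
	return 0
}

// parse splits "1.2.3-rc.1" into the numeric core and the pre-release string.
// Build metadata ("+...") is dropped because it does not affect precedence.
func parse(v string) ([3]int, string, error) {
	var core [3]int
	raw := v
	v = strings.TrimPrefix(v, "v")
	if i := strings.IndexByte(v, '+'); i >= 0 {
		v = v[:i]
	}
	pre := ""
	if i := strings.IndexByte(v, '-'); i >= 0 {
		v, pre = v[:i], v[i+1:]
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return core, "", fmt.Errorf("malformed version %q", raw)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return core, "", fmt.Errorf("malformed version %q", raw)
		}
		core[i] = n
	}
	return core, pre, nil
}

// comparePre orders pre-release strings; an empty string means "release".
func comparePre(a, b string) int {
	switch {
	case a == b:
		return 0
	case a == "":
		return 1 // a release sorts above any pre-release
	case b == "":
		return -1
	}
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) && i < len(bs); i++ {
		an, aerr := strconv.Atoi(as[i])
		bn, berr := strconv.Atoi(bs[i])
		switch {
		case aerr == nil && berr == nil:
			if an != bn {
				return sign(an - bn)
			}
		case aerr == nil:
			return -1 // numeric identifiers sort below alphanumeric ones
		case berr == nil:
			return 1
		default:
			if c := strings.Compare(as[i], bs[i]); c != 0 {
				return c
			}
		}
	}
	return sign(len(as) - len(bs)) // more identifiers wins when the prefix is equal
}

// Compare returns -1, 0, or 1 for a < b, a == b, or a > b.
func Compare(a, b string) (int, error) {
	ac, ap, err := parse(a)
	if err != nil {
		return 0, err
	}
	bc, bp, err := parse(b)
	if err != nil {
		return 0, err
	}
	for i := range ac {
		if ac[i] != bc[i] {
			return sign(ac[i] - bc[i]), nil
		}
	}
	return comparePre(ap, bp), nil
}

func main() {
	c, err := Compare("1.0.0", "1.0.0-rc.1")
	fmt.Println(c, err)
}
```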

The engine also supports affected range checking where an advisory specifies introduced, fixed, and/or last_affected bounds.
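A range check over these three bounds could be sketched as below. The Range type and InRange function are hypothetical. The bound semantics (introduced inclusive, fixed exclusive, last_affected inclusive) are an assumption borrowed from the OSV schema convention, and compareInts is a toy comparator standing in for a real versioning scheme.

```go
package main

import (
	"fmt"
	"strconv"
)

// Range holds optional bounds; an empty string means "no bound".
type Range struct {
	Introduced   string // inclusive lower bound (assumed)
	Fixed        string // exclusive upper bound (assumed)
	LastAffected string // inclusive upper bound (assumed)
}

// CompareFunc returns a negative, zero, or positive number
// for a < b, a == b, or a > b.
type CompareFunc func(a, b string) (int, error)

// InRange reports whether the installed version lies inside the affected range.
func InRange(installed string, r Range, cmp CompareFunc) (bool, error) {
	if r.Introduced != "" {
		c, err := cmp(installed, r.Introduced)
		if err != nil {
			return false, err
		}
		if c < 0 {
			return false, nil
		}
	}
	if r.Fixed != "" {
		c, err := cmp(installed, r.Fixed)
		if err != nil {
			return false, err
		}
		if c >= 0 {
			return false, nil
		}
	}
	if r.LastAffected != "" {
		c, err := cmp(installed, r.LastAffected)
		if err != nil {
			return false, err
		}
		if c > 0 {
			return false, nil
		}
	}
	return true, nil
}

// compareInts is a toy comparator for demonstration only.
func compareInts(a, b string) (int, error) {
	x, err := strconv.Atoi(a)
	if err != nil {
		return 0, err
	}
	y, err := strconv.Atoi(b)
	if err != nil {
		return 0, err
	}
	return x - y, nil
}

func main() {
	r := Range{Introduced: "2", Fixed: "5"}
	for _, v := range []string{"1", "2", "4", "5"} {
		ok, err := InRange(v, r, compareInts)
		fmt.Println(v, ok, err)
	}
}
```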

Step 5: Composite Scoring

Every verified candidate is scored using the composite scoring formula (detailed below). Scoring failures are logged but do not block the run; the candidate is recorded with an error note.

Step 6: Deduplication

The same CVE can be reported by multiple sources (e.g. NVD, Ubuntu USN, and OSV all reporting the same vulnerability for the same package). The deduplication step merges candidates that share the same (asset_id, cve_id) pair:

  • Fixed version is selected from the most authoritative source, in order: vendor advisory, OSV, NVD, Red Hat, Ubuntu, AlmaLinux, Debian, GitHub Advisory.
  • Source references are merged as a union -- all contributing sources are preserved.
  • Enrichment flags use OR logic -- if any source reports KEV, Metasploit, or ransomware, the merged candidate inherits those flags.
  • CVSS and EPSS scores take the maximum value across all sources.
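The merge rules above can be sketched as follows. The Candidate fields and the source slugs in sourceOrder are assumptions chosen to mirror the authority order in the first bullet; the engine's real types may differ.

```go
package main

import (
	"fmt"
	"math"
)

// Candidate is a simplified stand-in for a match candidate.
type Candidate struct {
	AssetID, CVEID string
	Source         string   // slug of the reporting source (input)
	Sources        []string // union of contributing sources (output)
	FixedVersion   string
	KEV            bool
	Metasploit     bool
	Ransomware     bool
	CVSS, EPSS     float64
}

// sourceOrder lists hypothetical slugs from most to least authoritative.
var sourceOrder = []string{"vendor", "osv", "nvd", "redhat", "ubuntu", "almalinux", "debian", "ghsa"}

func rank(source string) int {
	for i, s := range sourceOrder {
		if s == source {
			return i
		}
	}
	return len(sourceOrder) // unknown sources rank last
}

func contains(list []string, s string) bool {
	for _, v := range list {
		if v == s {
			return true
		}
	}
	return false
}

// Dedupe merges candidates that share the same (asset, CVE) pair.
func Dedupe(in []Candidate) []Candidate {
	type key struct{ asset, cve string }
	index := make(map[key]int)
	var out []Candidate
	var fixedRank []int // rank of the source that supplied FixedVersion
	for _, c := range in {
		k := key{c.AssetID, c.CVEID}
		r := rank(c.Source)
		i, seen := index[k]
		if !seen {
			index[k] = len(out)
			c.Sources = []string{c.Source}
			out = append(out, c)
			fixedRank = append(fixedRank, r)
			continue
		}
		m := &out[i]
		if !contains(m.Sources, c.Source) {
			m.Sources = append(m.Sources, c.Source) // union of sources
		}
		m.KEV = m.KEV || c.KEV // OR logic for enrichment flags
		m.Metasploit = m.Metasploit || c.Metasploit
		m.Ransomware = m.Ransomware || c.Ransomware
		m.CVSS = math.Max(m.CVSS, c.CVSS) // maximum across sources
		m.EPSS = math.Max(m.EPSS, c.EPSS)
		if c.FixedVersion != "" && (m.FixedVersion == "" || r < fixedRank[i]) {
			m.FixedVersion = c.FixedVersion
			fixedRank[i] = r
		}
	}
	return out
}

func main() {
	merged := Dedupe([]Candidate{
		{AssetID: "a1", CVEID: "CVE-0000-0001", Source: "nvd", FixedVersion: "1.2.4", CVSS: 7.5},
		{AssetID: "a1", CVEID: "CVE-0000-0001", Source: "ubuntu", FixedVersion: "1.2.3-1ubuntu2", KEV: true, CVSS: 8.1},
		{AssetID: "a1", CVEID: "CVE-0000-0001", Source: "osv", FixedVersion: "1.2.5", EPSS: 0.4},
	})
	fmt.Printf("%+v\n", merged)
}
```

The CVE ID and versions in main are placeholder values for illustration.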

Step 7: Write Vulnerability Matches

The final scored and deduplicated matches are written to the vuln_match table:

  • New matches are created with status open and first_seen_at set to now.
  • Existing matches (same asset + CVE) are updated with the latest scores, enrichment, and last_confirmed_at timestamp.
  • Disappeared matches -- open matches from a previous run that are no longer produced -- are automatically set to status resolved with resolved_at timestamp.

Composite Scoring Formula

The composite score is a weighted average on a 0-10 scale. Each signal produces a raw value clamped to 0-10; the values are multiplied by their weights, summed, and divided by the sum of the weights:

composite = sum(signal_value * weight) / sum(weights)

Scoring Signals

Seven signals are currently implemented:

| Signal | Source | Default Weight | Raw Value Range | Description |
|--------|--------|----------------|-----------------|-------------|
| CVSS Base Score | nvd | 0.40 | 0-10 | CVSS v4 preferred, v3.1 fallback. Already on a 0-10 scale. |
| EPSS Probability | epss | 0.20 | 0-10 | EPSS probability (0-1) scaled to 0-10. |
| KEV Boost | kev | 0.15 | 0-5 | CISA KEV confirmed = 3.0, extended only = 2.0, ransomware confirmed = +2.0 additional. |
| Exploit Availability | exploit_signals | 0.15 | 0-2.5 | Takes the highest single signal: Metasploit = 2.5, Nuclei = 1.5, ExploitDB = 1.0, XDB = 0.5. |
| SIEM Correlation | siem | 0.05 | 0-10 | Boosts score when SIEM alerts correlate with the CVE on the same asset. |
| Asset Criticality | asset_criticality | 0.03 | 0-10 | Scales score based on the asset's criticality tier (critical, high, medium, low). |
| ThreatIntel Corroboration | threat_intel | 0.02 | 0-10 | Boosts score when threat intelligence feeds corroborate active exploitation. |
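The raw values for the EPSS, KEV, and exploit signals can be sketched from the numbers in the table. The function names are hypothetical. One assumption to note: the sketch applies the ransomware bonus only when the CVE is already listed in KEV (confirmed or extended).

```go
package main

import "fmt"

// EPSSValue scales an EPSS probability (0-1) to the 0-10 range.
func EPSSValue(probability float64) float64 {
	return probability * 10
}

// KEVBoost returns 3.0 for a CISA KEV entry, 2.0 for an extended-only entry,
// and adds 2.0 when ransomware use is confirmed (maximum 5.0).
func KEVBoost(cisaKEV, extendedOnly, ransomware bool) float64 {
	var v float64
	switch {
	case cisaKEV:
		v = 3.0
	case extendedOnly:
		v = 2.0
	}
	if v > 0 && ransomware { // assumption: bonus requires a KEV listing
		v += 2.0
	}
	return v
}

// ExploitValue takes the highest single signal rather than summing them.
func ExploitValue(metasploit, nuclei, exploitDB, xdb bool) float64 {
	switch {
	case metasploit:
		return 2.5
	case nuclei:
		return 1.5
	case exploitDB:
		return 1.0
	case xdb:
		return 0.5
	}
	return 0
}

func main() {
	fmt.Println(EPSSValue(0.5))
	fmt.Println(KEVBoost(true, false, true))
	fmt.Println(ExploitValue(false, true, true, false))
}
```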

Note: Signal weights can be overridden per-source via the scoring configuration in Settings. The source-level priority_weight multiplies the signal's default weight.

Signal Discovery

Signals are registered in the internal/correlation/signals/ package. Each signal implements the ScoringSignal interface defined in signal.go. A signal is active if its source slug matches an enabled source, or if it is tagged as builtin.
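The activation rule can be illustrated with a sketch. The interface below is a hypothetical shape, not the actual definition in signal.go, and the method names, the staticSignal type, and the ActiveSignals helper are all assumptions made for the example.

```go
package main

import "fmt"

// ScoringSignal is a hypothetical stand-in for the real interface.
type ScoringSignal interface {
	Name() string
	SourceSlug() string
	Builtin() bool
}

// staticSignal is a minimal implementation used only for this example.
type staticSignal struct {
	name, slug string
	builtin    bool
}

func (s staticSignal) Name() string       { return s.name }
func (s staticSignal) SourceSlug() string { return s.slug }
func (s staticSignal) Builtin() bool      { return s.builtin }

// ActiveSignals keeps a signal if it is builtin or if its source slug
// matches an enabled source.
func ActiveSignals(all []ScoringSignal, enabled map[string]bool) []ScoringSignal {
	var active []ScoringSignal
	for _, s := range all {
		if s.Builtin() || enabled[s.SourceSlug()] {
			active = append(active, s)
		}
	}
	return active
}

func main() {
	all := []ScoringSignal{
		staticSignal{name: "CVSS Base Score", slug: "nvd"},
		staticSignal{name: "SIEM Correlation", slug: "siem"},
		staticSignal{name: "Asset Criticality", slug: "asset_criticality", builtin: true},
	}
	for _, s := range ActiveSignals(all, map[string]bool{"nvd": true}) {
		fmt.Println(s.Name())
	}
}
```

Which signals are actually tagged builtin is not stated here; the flag on Asset Criticality is only for illustration.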

Interpreting Composite Scores

| Score Range | Interpretation |
|-------------|----------------|
| 8.0 - 10.0 | Critical priority -- active exploitation likely, patch immediately. |
| 6.0 - 7.9 | High priority -- significant risk, schedule remediation soon. |
| 3.0 - 5.9 | Medium priority -- monitor and plan remediation. |
| 0.1 - 2.9 | Low priority -- address during regular maintenance. |
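Mapping a score to these bands is a simple threshold check, sketched below. The function name and the returned labels are hypothetical. The bands are listed to one decimal place, so the sketch treats each lower bound as inclusive, and it returns "none" for a score of 0, which the bands do not cover.

```go
package main

import "fmt"

// Priority maps a composite score (0-10) to a priority band.
func Priority(score float64) string {
	switch {
	case score >= 8.0:
		return "critical"
	case score >= 6.0:
		return "high"
	case score >= 3.0:
		return "medium"
	case score > 0:
		return "low"
	default:
		return "none" // assumption: a zero score has no band
	}
}

func main() {
	for _, s := range []float64{9.1, 6.38, 3.0, 1.2, 0} {
		fmt.Printf("%.2f -> %s\n", s, Priority(s))
	}
}
```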

Triggering Correlation Runs

Full Correlation

A full run processes every asset and every CVE. This is typically scheduled periodically (e.g. after each crawl cycle) or triggered manually from the system settings.

Incremental Correlation

An incremental run processes only a specified list of CVE IDs -- typically the newly ingested CVEs from a crawler run. This is significantly faster and is the default mode after each crawler cycle completes.

API Trigger

Correlation runs can be triggered via the system API or from the Settings > System page.

Correlation Result

Each run returns an immutable CorrelationResult containing:

| Field | Description |
|-------|-------------|
| new_matches | Number of new vulnerability matches created. |
| updated | Number of existing matches that were refreshed. |
| resolved | Number of previously open matches that disappeared (auto-resolved). |
| total_processed | Total candidates after deduplication. |
| errors | List of non-fatal errors encountered during the run. |