Correlation Engine

The correlation engine is the core of rimae/scan. It matches your asset inventory against vulnerability data to produce actionable vulnerability match records. The engine supports both full and incremental correlation runs.

Overview

Correlation answers the question: which of the thousands of CVEs in the database actually affect my specific assets? It does this by comparing installed package versions against advisory fix versions, scoring each match with a weighted composite formula, and deduplicating results from multiple overlapping sources.

The 7-Step Correlation Pipeline

Every correlation run executes the following steps in sequence. Steps are numbered 0 through 7; steps 2 and 3 are described together, which gives seven stages.

Step 0: Build Source Registry

The engine queries vuln_source_configs to determine which vulnerability sources are enabled and their scoring weights. This produces an immutable SourceRegistry used throughout the run.
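As a rough illustration of this step, the sketch below builds a read-only registry from rows of vuln_source_configs. The SourceConfig field names, the NewSourceRegistry constructor, and the Weight accessor are assumptions made for the example; the real schema and API are not shown on this page.

```go
package main

import "fmt"

// SourceConfig mirrors one row of vuln_source_configs.
// Field names are hypothetical; the real schema may differ.
type SourceConfig struct {
	Slug    string
	Enabled bool
	Weight  float64
}

// SourceRegistry holds the enabled sources by slug. The map is unexported and
// never written after construction, so callers cannot change it mid-run.
type SourceRegistry struct {
	weights map[string]float64
}

// NewSourceRegistry copies the enabled rows into a fresh registry.
func NewSourceRegistry(rows []SourceConfig) SourceRegistry {
	w := make(map[string]float64, len(rows))
	for _, r := range rows {
		if r.Enabled {
			w[r.Slug] = r.Weight
		}
	}
	return SourceRegistry{weights: w}
}

// Weight returns the scoring weight for a slug and whether it is enabled.
func (r SourceRegistry) Weight(slug string) (float64, bool) {
	w, ok := r.weights[slug]
	return w, ok
}

func main() {
	reg := NewSourceRegistry([]SourceConfig{
		{Slug: "nvd", Enabled: true, Weight: 0.40},
		{Slug: "siem", Enabled: false, Weight: 0.05},
	})
	w, ok := reg.Weight("nvd")
	fmt.Println("nvd:", w, ok)
	_, ok = reg.Weight("siem")
	fmt.Println("siem enabled:", ok)
}
```

Copying the rows at construction time is one simple way to get the "immutable for the whole run" property: later changes to the configuration table do not affect a run that has already started.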

Step 1: Asset Inventory Snapshot

A point-in-time snapshot of the entire asset inventory is loaded into memory:

All Asset records.
All OsPackage records, grouped by asset ID.
All AppInstance records, grouped by asset ID.
All InfraComponent records, grouped by asset ID.
All DockerImage records, grouped by asset ID.

This snapshot is immutable -- the engine never modifies inventory data during correlation.
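The "grouped by asset ID" part of the snapshot can be pictured as a simple index. The following is a minimal sketch for OsPackage records only; the struct fields and the GroupByAsset helper are illustrative assumptions, not the engine's actual types.

```go
package main

import "fmt"

// OsPackage is a simplified stand-in for the OsPackage record.
// The fields shown here are assumptions for the example.
type OsPackage struct {
	AssetID string
	Name    string
	Version string
}

// GroupByAsset indexes packages by asset ID so that later steps can look up
// everything installed on one asset without rescanning the full list.
func GroupByAsset(pkgs []OsPackage) map[string][]OsPackage {
	byAsset := make(map[string][]OsPackage)
	for _, p := range pkgs {
		byAsset[p.AssetID] = append(byAsset[p.AssetID], p)
	}
	return byAsset
}

func main() {
	snapshot := GroupByAsset([]OsPackage{
		{AssetID: "web-01", Name: "openssl", Version: "3.0.2"},
		{AssetID: "web-01", Name: "curl", Version: "7.81.0"},
		{AssetID: "db-01", Name: "openssl", Version: "1.1.1"},
	})
	fmt.Println(len(snapshot["web-01"]), len(snapshot["db-01"]))
}
```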

Step 2+3: Vulnerability Source Lookup and CVE Enrichment

For each package on each asset, the engine searches two advisory indexes:

Vendor advisories (Advisory + AdvisoryPackage tables) -- OS and vendor advisories that reference specific packages and CVEs.
Ecosystem advisories (EcosystemAdvisory table) -- advisories from language ecosystems (OSV, GitHub, PyPA, etc.).

For each match, the engine creates a MatchCandidate enriched with CVE data including CVSS scores (v3.1 and v4), EPSS probability, KEV status, ransomware flags, and exploit availability (Metasploit, Nuclei, ExploitDB, XDB).

In incremental mode, only the specified CVE IDs are checked, significantly reducing processing time.
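The difference between full and incremental lookup can be sketched as a filter over the advisory set. The Advisory type and SelectAdvisories function below are hypothetical, and treating an empty CVE list as "full run" is an assumption made for the example.

```go
package main

import "fmt"

// Advisory is a simplified stand-in for an advisory row (hypothetical fields).
type Advisory struct {
	CVEID   string
	Package string
}

// SelectAdvisories returns every advisory for a full run (empty cveIDs) and
// only those matching the given CVE IDs for an incremental run.
func SelectAdvisories(all []Advisory, cveIDs []string) []Advisory {
	if len(cveIDs) == 0 {
		return all
	}
	wanted := make(map[string]struct{}, len(cveIDs))
	for _, id := range cveIDs {
		wanted[id] = struct{}{}
	}
	var out []Advisory
	for _, a := range all {
		if _, ok := wanted[a.CVEID]; ok {
			out = append(out, a)
		}
	}
	return out
}

func main() {
	all := []Advisory{
		{CVEID: "CVE-EXAMPLE-1", Package: "openssl"},
		{CVEID: "CVE-EXAMPLE-2", Package: "curl"},
		{CVEID: "CVE-EXAMPLE-3", Package: "zlib"},
	}
	fmt.Println(len(SelectAdvisories(all, nil)))
	fmt.Println(len(SelectAdvisories(all, []string{"CVE-EXAMPLE-2"})))
}
```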

Step 4: Version Comparison

Each candidate is filtered to determine if the installed version is actually vulnerable:

If no fixed_version is known, the candidate is kept (marked as needing review).
If a fixed_version exists, the installed version is compared using the appropriate versioning scheme.
A result of installed < fixed means the asset is vulnerable; the candidate is kept.
A result of installed >= fixed means the asset has the fix; the candidate is discarded.
If version comparison fails (for example, because a version string is malformed), the candidate is kept and marked as needing review.
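The rules above amount to a small decision function. Here is a minimal sketch, assuming a scheme-specific comparer that returns -1, 0, or 1; the Verdict type, Classify function, and the toy integer comparer are names invented for the example.

```go
package main

import (
	"fmt"
	"strconv"
)

// Verdict is the outcome of the version filter for one candidate.
type Verdict int

const (
	Vulnerable  Verdict = iota // installed < fixed: keep the candidate
	NeedsReview                // no fix known or comparison failed: keep and flag
	Fixed                      // installed >= fixed: discard the candidate
)

func (v Verdict) String() string {
	return [...]string{"vulnerable", "needs review", "fixed"}[v]
}

// Comparer returns -1, 0, or 1, or an error for malformed version strings.
type Comparer func(installed, fixed string) (int, error)

// Classify applies the Step 4 rules to one candidate.
func Classify(installed, fixed string, cmp Comparer) Verdict {
	if fixed == "" {
		return NeedsReview
	}
	c, err := cmp(installed, fixed)
	if err != nil {
		return NeedsReview
	}
	if c < 0 {
		return Vulnerable
	}
	return Fixed
}

// compareInts is a toy comparer that treats versions as plain integers.
func compareInts(a, b string) (int, error) {
	x, err := strconv.Atoi(a)
	if err != nil {
		return 0, err
	}
	y, err := strconv.Atoi(b)
	if err != nil {
		return 0, err
	}
	switch {
	case x < y:
		return -1, nil
	case x > y:
		return 1, nil
	}
	return 0, nil
}

func main() {
	fmt.Println(Classify("3", "5", compareInts))
	fmt.Println(Classify("5", "5", compareInts))
	fmt.Println(Classify("3", "", compareInts))
	fmt.Println(Classify("abc", "5", compareInts))
}
```

Keeping the candidate on a comparison error is the conservative choice: a malformed version string produces a flagged match rather than a silently dropped one.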

Version Comparison Schemes

The engine supports five versioning schemes, auto-detected from the package source or ecosystem:

Scheme	Used For	Detection Keys
semver	npm, Cargo, generic	`npm`, `cargo`, or fallback default
deb	Debian/Ubuntu packages	`apt`, `dpkg`, `deb`
rpm	Red Hat/AlmaLinux/Fedora	`rpm`, `yum`, `dnf`
pep440	Python packages	`pip`, `pypi`, `python`
go	Go modules	`go`, `golang`
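The detection table above maps naturally onto a lookup with a semver fallback. The sketch below assumes the detection key is a plain string and that matching is case-insensitive; both the DetectScheme name and the normalisation are assumptions for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// schemeByKey encodes the detection keys from the table above.
var schemeByKey = map[string]string{
	"npm": "semver", "cargo": "semver",
	"apt": "deb", "dpkg": "deb", "deb": "deb",
	"rpm": "rpm", "yum": "rpm", "dnf": "rpm",
	"pip": "pep440", "pypi": "pep440", "python": "pep440",
	"go": "go", "golang": "go",
}

// DetectScheme returns the versioning scheme for a package source or
// ecosystem key, falling back to semver for unknown keys.
// Lower-casing and trimming the key is an assumption of this sketch.
func DetectScheme(key string) string {
	if s, ok := schemeByKey[strings.ToLower(strings.TrimSpace(key))]; ok {
		return s
	}
	return "semver"
}

func main() {
	for _, k := range []string{"dnf", "PyPI", "golang", "homebrew"} {
		fmt.Printf("%s -> %s\n", k, DetectScheme(k))
	}
}
```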

Each scheme implements full specification-compliant parsing:

semver: major.minor.patch with pre-release comparison (a version without a pre-release tag sorts above the same version with one, so 1.0.0 > 1.0.0-rc.1).
deb: epoch:upstream-debian with the full dpkg character weight algorithm (a tilde sorts before everything, including the end of the string).
rpm: epoch:version-release with rpmvercmp token splitting (a numeric segment sorts above an alphabetic segment).
pep440: epoch, release segments, pre-release (alpha/beta/rc), post-release, and dev-release ordering.
go: standard semver plus Go pseudo-version detection (timestamp-hash pre-release strings).
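To make the ordering rules concrete, here is a compact sketch of the semver case only, following the precedence rules of the Semantic Versioning specification. It is not the engine's parser; the function names are invented for the example, and the other four schemes need their own comparers.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parse splits "1.2.3-rc.1+build" into its numeric core and pre-release tag.
func parse(v string) (core [3]int, pre string, err error) {
	s := strings.TrimPrefix(v, "v")
	if i := strings.IndexByte(s, '+'); i >= 0 {
		s = s[:i] // build metadata does not affect ordering
	}
	if i := strings.IndexByte(s, '-'); i >= 0 {
		s, pre = s[:i], s[i+1:]
	}
	parts := strings.Split(s, ".")
	if len(parts) != 3 {
		return core, "", fmt.Errorf("malformed version %q", v)
	}
	for i, p := range parts {
		n, convErr := strconv.Atoi(p)
		if convErr != nil || n < 0 {
			return core, "", fmt.Errorf("malformed version %q", v)
		}
		core[i] = n
	}
	return core, pre, nil
}

func sign(n int) int {
	switch {
	case n < 0:
		return -1
	case n > 0:
		return 1
	}
	return 0
}

// comparePre orders pre-release tags: no tag beats any tag, numeric
// identifiers compare numerically and sort below alphanumeric ones, and a
// longer tag wins when all shared identifiers are equal.
func comparePre(a, b string) int {
	switch {
	case a == b:
		return 0
	case a == "":
		return 1
	case b == "":
		return -1
	}
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) && i < len(bs); i++ {
		an, aErr := strconv.Atoi(as[i])
		bn, bErr := strconv.Atoi(bs[i])
		switch {
		case aErr == nil && bErr == nil:
			if an != bn {
				return sign(an - bn)
			}
		case aErr == nil:
			return -1
		case bErr == nil:
			return 1
		default:
			if c := strings.Compare(as[i], bs[i]); c != 0 {
				return c
			}
		}
	}
	return sign(len(as) - len(bs))
}

// Compare returns -1, 0, or 1, or an error for malformed input.
func Compare(a, b string) (int, error) {
	ac, ap, err := parse(a)
	if err != nil {
		return 0, err
	}
	bc, bp, err := parse(b)
	if err != nil {
		return 0, err
	}
	for i := range ac {
		if ac[i] != bc[i] {
			return sign(ac[i] - bc[i]), nil
		}
	}
	return comparePre(ap, bp), nil
}

func main() {
	c, _ := Compare("1.0.0", "1.0.0-rc.1")
	fmt.Println(c) // 1: a release outranks its own pre-release
	c, _ = Compare("1.2.3", "1.10.0")
	fmt.Println(c) // -1: segments compare numerically, not as strings
}
```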

The engine also supports affected range checking where an advisory specifies introduced, fixed, and/or last_affected bounds.
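A range check of this kind can be sketched as three bound tests. The example assumes OSV-style semantics (introduced is inclusive, fixed is exclusive, last_affected is inclusive) and treats an empty bound as "no bound"; the engine's exact semantics are not stated here. The Range type and the toy dotted-number comparer are illustrative only.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Range holds the optional bounds an advisory may specify.
// An empty string means the bound is absent (assumption of this sketch).
type Range struct {
	Introduced   string
	Fixed        string
	LastAffected string
}

// Contains reports whether version v falls inside the affected range:
// v >= introduced, v < fixed, and v <= last_affected, for each bound present.
func (r Range) Contains(v string, cmp func(a, b string) int) bool {
	if r.Introduced != "" && cmp(v, r.Introduced) < 0 {
		return false
	}
	if r.Fixed != "" && cmp(v, r.Fixed) >= 0 {
		return false
	}
	if r.LastAffected != "" && cmp(v, r.LastAffected) > 0 {
		return false
	}
	return true
}

// compareDotted is a toy comparer for dotted numeric versions such as 1.4.2.
// Non-numeric segments are treated as 0; a real scheme would reject them.
func compareDotted(a, b string) int {
	as, bs := strings.Split(a, "."), strings.Split(b, ".")
	for i := 0; i < len(as) || i < len(bs); i++ {
		var x, y int
		if i < len(as) {
			x, _ = strconv.Atoi(as[i])
		}
		if i < len(bs) {
			y, _ = strconv.Atoi(bs[i])
		}
		if x < y {
			return -1
		}
		if x > y {
			return 1
		}
	}
	return 0
}

func main() {
	r := Range{Introduced: "1.2.0", Fixed: "1.4.2"}
	for _, v := range []string{"1.1.9", "1.3.0", "1.4.2"} {
		fmt.Printf("%s affected: %v\n", v, r.Contains(v, compareDotted))
	}
}
```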

Step 5: Composite Scoring

Every verified candidate is scored using the composite scoring formula (detailed below). Scoring failures are logged but do not block the run; the candidate is recorded with an error note.

Step 6: Deduplication

The same CVE can be reported by multiple sources (e.g. NVD, Ubuntu USN, and OSV all reporting the same vulnerability for the same package). The deduplication step merges candidates that share the same (asset_id, cve_id) pair:

Fixed version is selected from the most authoritative source, in order: vendor advisory, OSV, NVD, Red Hat, Ubuntu, AlmaLinux, Debian, GitHub Advisory.
Source references are merged as a union -- all contributing sources are preserved.
Enrichment flags use OR logic -- if any source reports KEV, Metasploit, or ransomware, the merged candidate inherits those flags.
CVSS and EPSS scores take the maximum value across all sources.
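The four merge rules above can be sketched as a single pass over the candidates. The Candidate and Merged types, the source slugs in the authority list, and the Dedup function are assumptions for the example; only the merge rules themselves come from this page.

```go
package main

import (
	"fmt"
	"math"
)

// Candidate is a simplified match candidate from one source.
type Candidate struct {
	AssetID, CVEID string
	Source         string // for a merged record: the source of FixedVersion
	FixedVersion   string
	KEV            bool
	Metasploit     bool
	Ransomware     bool
	CVSS, EPSS     float64
}

// Merged is the deduplicated record for one (asset_id, cve_id) pair.
type Merged struct {
	Candidate
	Sources []string // union of all contributing sources
}

// authority lists sources from most to least authoritative.
// The slugs are hypothetical; the order follows the list above.
var authority = []string{"vendor", "osv", "nvd", "redhat", "ubuntu", "almalinux", "debian", "github"}

func rank(source string) int {
	for i, s := range authority {
		if s == source {
			return i
		}
	}
	return len(authority) // unknown sources rank last
}

func contains(list []string, s string) bool {
	for _, x := range list {
		if x == s {
			return true
		}
	}
	return false
}

// Dedup merges candidates that share the same asset and CVE.
func Dedup(in []Candidate) []*Merged {
	type key struct{ asset, cve string }
	byKey := make(map[key]*Merged)
	var out []*Merged
	for _, c := range in {
		k := key{c.AssetID, c.CVEID}
		m, seen := byKey[k]
		if !seen {
			m = &Merged{Candidate: c, Sources: []string{c.Source}}
			byKey[k] = m
			out = append(out, m)
			continue
		}
		// Fixed version: prefer the most authoritative source that has one.
		if c.FixedVersion != "" && (m.FixedVersion == "" || rank(c.Source) < rank(m.Source)) {
			m.FixedVersion, m.Source = c.FixedVersion, c.Source
		}
		if !contains(m.Sources, c.Source) {
			m.Sources = append(m.Sources, c.Source)
		}
		// Enrichment flags: OR across sources.
		m.KEV = m.KEV || c.KEV
		m.Metasploit = m.Metasploit || c.Metasploit
		m.Ransomware = m.Ransomware || c.Ransomware
		// Scores: maximum across sources.
		m.CVSS = math.Max(m.CVSS, c.CVSS)
		m.EPSS = math.Max(m.EPSS, c.EPSS)
	}
	return out
}

func main() {
	out := Dedup([]Candidate{
		{AssetID: "web-01", CVEID: "CVE-EXAMPLE-1", Source: "nvd", FixedVersion: "1.2.3", CVSS: 7.5},
		{AssetID: "web-01", CVEID: "CVE-EXAMPLE-1", Source: "osv", FixedVersion: "1.2.4", KEV: true, CVSS: 9.8},
	})
	m := out[0]
	fmt.Println(len(out), m.FixedVersion, m.Sources, m.KEV, m.CVSS)
}
```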

Step 7: Write Vulnerability Matches

The final scored and deduplicated matches are written to the vuln_match table:

New matches are created with status open and first_seen_at set to now.
Existing matches (same asset + CVE) are updated with the latest scores, enrichment, and last_confirmed_at timestamp.
Disappeared matches -- open matches from a previous run that are no longer produced -- are automatically set to status resolved with resolved_at timestamp.
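The create, update, and auto-resolve rules can be sketched against an in-memory map standing in for the vuln_match table. The Key and Match types and the Reconcile function are hypothetical. Whether a resolved match that reappears is reopened is not stated above, so this sketch leaves its status untouched.

```go
package main

import (
	"fmt"
	"time"
)

// Key identifies a match by asset and CVE.
type Key struct {
	AssetID string
	CVEID   string
}

// Match is a simplified vuln_match row (hypothetical fields).
type Match struct {
	Status          string // "open" or "resolved"
	Score           float64
	FirstSeenAt     time.Time
	LastConfirmedAt time.Time
	ResolvedAt      time.Time
}

// Reconcile applies one run's output to the stored matches and returns how
// many were created, updated, and auto-resolved.
func Reconcile(store map[Key]*Match, produced map[Key]float64, now time.Time) (created, updated, resolved int) {
	for k, score := range produced {
		if m, ok := store[k]; ok {
			// Existing match: refresh score and confirmation timestamp.
			m.Score, m.LastConfirmedAt = score, now
			updated++
			continue
		}
		store[k] = &Match{Status: "open", Score: score, FirstSeenAt: now}
		created++
	}
	for k, m := range store {
		// Open matches that this run did not produce have disappeared.
		if _, ok := produced[k]; !ok && m.Status == "open" {
			m.Status, m.ResolvedAt = "resolved", now
			resolved++
		}
	}
	return created, updated, resolved
}

func main() {
	store := map[Key]*Match{
		Key{AssetID: "web-01", CVEID: "CVE-EXAMPLE-1"}: &Match{Status: "open"},
		Key{AssetID: "web-01", CVEID: "CVE-EXAMPLE-2"}: &Match{Status: "open"},
	}
	produced := map[Key]float64{
		Key{AssetID: "web-01", CVEID: "CVE-EXAMPLE-1"}: 7.2,
		Key{AssetID: "db-01", CVEID: "CVE-EXAMPLE-3"}:  5.0,
	}
	c, u, r := Reconcile(store, produced, time.Now())
	fmt.Println("created:", c, "updated:", u, "resolved:", r)
}
```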

Composite Scoring Formula

The composite score is a weighted average on a 0-10 scale. Each signal produces a raw value (clamped to 0-10), multiplied by its weight, then normalised:

composite = sum(signal_value * weight) / sum(weights)
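As a worked example of the formula, the sketch below clamps each raw value to 0-10 and computes the weighted average. The Signal type and Composite function are illustrative names. The example assumes that only the signals passed in contribute to the denominator; how the engine treats inactive signals is not stated here.

```go
package main

import (
	"fmt"
	"math"
)

// Signal is one scoring input: a raw value and its weight.
type Signal struct {
	Name   string
	Value  float64
	Weight float64
}

// Composite returns sum(value * weight) / sum(weights), with each raw value
// clamped to the 0-10 range first. It returns 0 when there are no weights.
func Composite(signals []Signal) float64 {
	var num, den float64
	for _, s := range signals {
		v := math.Min(math.Max(s.Value, 0), 10)
		num += v * s.Weight
		den += s.Weight
	}
	if den == 0 {
		return 0
	}
	return num / den
}

func main() {
	// CVSS 9.8, EPSS probability 0.5 scaled to 5.0, KEV confirmed (3.0):
	// (9.8*0.40 + 5.0*0.20 + 3.0*0.15) / (0.40 + 0.20 + 0.15) = 5.37 / 0.75
	score := Composite([]Signal{
		{Name: "cvss", Value: 9.8, Weight: 0.40},
		{Name: "epss", Value: 5.0, Weight: 0.20},
		{Name: "kev", Value: 3.0, Weight: 0.15},
	})
	fmt.Printf("%.2f\n", score) // prints 7.16
}
```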

Scoring Signals

Seven signals are currently implemented:

Signal	Source	Default Weight	Raw Value Range	Description
CVSS Base Score	`nvd`	0.40	0-10	CVSS v4 preferred, v3.1 fallback. Already on a 0-10 scale.
EPSS Probability	`epss`	0.20	0-10	EPSS probability (0-1) scaled to 0-10.
KEV Boost	`kev`	0.15	0-5	CISA KEV confirmed = 3.0, extended only = 2.0, ransomware confirmed = +2.0 additional.
Exploit Availability	`exploit_signals`	0.15	0-2.5	Takes the highest single signal: Metasploit = 2.5, Nuclei = 1.5, ExploitDB = 1.0, XDB = 0.5.
SIEM Correlation	`siem`	0.05	0-10	Boosts score when SIEM alerts correlate with the CVE on the same asset.
Asset Criticality	`asset_criticality`	0.03	0-10	Scales score based on the asset's criticality tier (critical, high, medium, low).
ThreatIntel Corroboration	`threat_intel`	0.02	0-10	Boosts score when threat intelligence feeds corroborate active exploitation.

Note: Signal weights can be overridden per-source via the scoring configuration in Settings. The source-level priority_weight multiplies the signal's default weight.
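The KEV Boost and Exploit Availability rows of the table describe small rule sets that are easy to misread, so here is a minimal sketch of both. The Enrichment struct and function names are invented for the example. Applying the ransomware bonus even without a KEV listing is an assumption; the table does not say whether the bonus requires one.

```go
package main

import "fmt"

// Enrichment carries the flags used by the two signals (hypothetical fields).
type Enrichment struct {
	KEVConfirmed bool
	KEVExtended  bool
	Ransomware   bool
	Metasploit   bool
	Nuclei       bool
	ExploitDB    bool
	XDB          bool
}

// KEVBoost returns 3.0 for a confirmed CISA KEV entry, 2.0 for extended only,
// plus 2.0 when ransomware use is confirmed (range 0-5).
func KEVBoost(e Enrichment) float64 {
	var v float64
	switch {
	case e.KEVConfirmed:
		v = 3.0
	case e.KEVExtended:
		v = 2.0
	}
	if e.Ransomware {
		v += 2.0
	}
	return v
}

// ExploitValue returns the highest single exploit signal; values do not add up.
func ExploitValue(e Enrichment) float64 {
	switch {
	case e.Metasploit:
		return 2.5
	case e.Nuclei:
		return 1.5
	case e.ExploitDB:
		return 1.0
	case e.XDB:
		return 0.5
	}
	return 0
}

func main() {
	e := Enrichment{KEVConfirmed: true, Ransomware: true, Nuclei: true, ExploitDB: true}
	fmt.Println(KEVBoost(e), ExploitValue(e))
}
```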

Signal Discovery

Signals are registered in the internal/correlation/signals/ package. Each signal implements the ScoringSignal interface defined in signal.go. A signal is active if its source slug matches an enabled source, or if it is tagged as builtin.
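The activation rule can be sketched as follows. The real ScoringSignal interface lives in signal.go and is not reproduced on this page, so the two methods shown here (Slug and Builtin) are purely hypothetical, as is the choice of which signal is tagged builtin in the demo.

```go
package main

import "fmt"

// ScoringSignal is a hypothetical reduction of the interface in signal.go;
// the real method set is not shown in this document.
type ScoringSignal interface {
	Slug() string
	Builtin() bool
}

// staticSignal is a trivial implementation used only for the demo.
type staticSignal struct {
	slug    string
	builtin bool
}

func (s staticSignal) Slug() string  { return s.slug }
func (s staticSignal) Builtin() bool { return s.builtin }

// Active keeps a signal if it is tagged builtin or if its source slug
// matches an enabled source.
func Active(all []ScoringSignal, enabled map[string]bool) []ScoringSignal {
	var out []ScoringSignal
	for _, s := range all {
		if s.Builtin() || enabled[s.Slug()] {
			out = append(out, s)
		}
	}
	return out
}

func main() {
	all := []ScoringSignal{
		staticSignal{slug: "nvd"},
		staticSignal{slug: "siem"},
		staticSignal{slug: "asset_criticality", builtin: true}, // builtin tag assumed
	}
	for _, s := range Active(all, map[string]bool{"nvd": true}) {
		fmt.Println(s.Slug())
	}
}
```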

Interpreting Composite Scores

Score Range	Interpretation
8.0 - 10.0	Critical priority -- active exploitation likely, patch immediately.
6.0 - 7.9	High priority -- significant risk, schedule remediation soon.
3.0 - 5.9	Medium priority -- monitor and plan remediation.
0.1 - 2.9	Low priority -- address during regular maintenance.
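For automation, the bands above can be turned into a lookup. The sketch treats each band's lower edge as a threshold so that values such as 7.95 still land in a band, and it returns "none" for a score of 0, which the table does not cover; both choices are assumptions, and the Priority function name is illustrative.

```go
package main

import "fmt"

// Priority maps a composite score to the priority band from the table above.
func Priority(score float64) string {
	switch {
	case score >= 8.0:
		return "critical"
	case score >= 6.0:
		return "high"
	case score >= 3.0:
		return "medium"
	case score > 0:
		return "low"
	default:
		return "none" // not covered by the table; assumption of this sketch
	}
}

func main() {
	for _, s := range []float64{9.1, 7.16, 4.0, 1.2} {
		fmt.Printf("%.2f -> %s\n", s, Priority(s))
	}
}
```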

Triggering Correlation Runs

Full Correlation

A full run processes every asset and every CVE. It is typically run on a schedule (e.g. after each crawl cycle) or triggered manually from the system settings.

Incremental Correlation

An incremental run processes only a specified list of CVE IDs -- typically the newly ingested CVEs from a crawler run. This is significantly faster and is the default mode after each crawler cycle completes.

API Trigger

Correlation runs can be triggered via the system API or from the Settings > System page.

Correlation Result

Each run returns an immutable CorrelationResult containing:

Field	Description
`new_matches`	Number of new vulnerability matches created.
`updated`	Number of existing matches that were refreshed.
`resolved`	Number of previously open matches that disappeared (auto-resolved).
`total_processed`	Total candidates after deduplication.
`errors`	List of non-fatal errors encountered during the run.
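The fields in the table could be modelled as a plain struct. In the sketch below the field names come from the table, while the Go types, the JSON encoding, and the Encode helper are assumptions; the page does not say how the result is serialised.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CorrelationResult mirrors the fields in the table above.
// The Go types and JSON tags are assumptions of this sketch.
type CorrelationResult struct {
	NewMatches     int      `json:"new_matches"`
	Updated        int      `json:"updated"`
	Resolved       int      `json:"resolved"`
	TotalProcessed int      `json:"total_processed"`
	Errors         []string `json:"errors"`
}

// Encode renders the result as a JSON string.
func Encode(r CorrelationResult) (string, error) {
	b, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := Encode(CorrelationResult{
		NewMatches:     3,
		Updated:        10,
		Resolved:       2,
		TotalProcessed: 13,
		Errors:         []string{"example non-fatal error"},
	})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	fmt.Println(out)
}
```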

Related pages:

Asset Inventory -- the asset data that correlation consumes
Vulnerabilities -- the CVE data that correlation matches against
Remediation Queue -- the actionable output of correlation
Dashboard -- last correlation timestamp and match counts