GitHub Repository Scanning
rimae/scan scans your GitHub organization to discover repositories, parse dependency manifests, resolve upstream fork origins, run OpenSSF Scorecard assessments, and match dependencies against the vulnerability database.
Connecting a GitHub Organization
To enable GitHub scanning:
- Navigate to Settings > Integrations.
- Enter your GitHub organization name and a personal access token (or GitHub App token) with `repo` and `read:org` scopes.
- Save the configuration and trigger an initial org scan.
Warning: The token must have access to every repository you want to scan, including private repos. A token with insufficient permissions produces partial scan results.
Repository Enumeration
The org scanner calls the GitHub REST API to enumerate all repositories in your organization:
- Fetches pages of 100 repos at a time via `/orgs/{org}/repos?type=all`.
- Archived repos are included but flagged as `archived = true`.
- Empty repos (no default branch and zero size) are skipped.
- For each repo, the following metadata is recorded: name, default branch, visibility (public/private), primary language, and last push timestamp.
A ScanRun audit record is created for every org scan with timing, record counts, and errors.
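The enumeration rules above can be sketched as a small paging loop. This is a minimal illustration, not the scanner's actual code: `fetch_page` is a hypothetical callable standing in for the HTTP request (one page of `/orgs/{org}/repos?type=all` with `per_page=100`), and the output keys are illustrative names for the metadata listed above.

```python
from typing import Callable, Iterator

PAGE_SIZE = 100  # page size described above

# Hypothetical signature: returns the decoded JSON list for page N of
# /orgs/{org}/repos?type=all&per_page=100&page=N.
FetchPage = Callable[[int], list[dict]]


def enumerate_repos(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield one metadata record per non-empty repository."""
    page = 1
    while True:
        batch = fetch_page(page)
        for repo in batch:
            # Skip empty repos: no default branch and zero size.
            if not repo.get("default_branch") and repo.get("size", 0) == 0:
                continue
            yield {
                "name": repo["name"],
                "default_branch": repo.get("default_branch"),
                "visibility": "private" if repo.get("private") else "public",
                "language": repo.get("language"),
                "pushed_at": repo.get("pushed_at"),
                # Archived repos are kept, only flagged.
                "archived": bool(repo.get("archived")),
            }
        if len(batch) < PAGE_SIZE:
            break  # a short page means nothing is left to fetch
        page += 1
```

Passing the fetcher in as a callable keeps the paging logic testable without network access.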
Dependency Detection
After enumeration, rimae/scan parses dependency manifests from each repository to build a complete bill of materials.
How It Works
- The Git tree API (`/repos/{org}/{repo}/git/trees/{branch}?recursive=1`) lists all files in the default branch.
- Filenames are matched against the registered parser map.
- Matched files are fetched via the GitHub raw content API.
- Each parser extracts dependency name, version, and direct/transitive flag.
- Dependencies are upserted into the `repo_dependencies` table. Stale dependencies (present in the database but no longer in any manifest) are automatically removed.
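Two of the steps above, filename matching and stale-dependency detection, can be sketched as follows. The parser map here is a hypothetical subset, matching on the path's basename is an assumption about how nested manifests are handled, and representing a dependency as an `(ecosystem, name)` pair is illustrative only.

```python
import posixpath

# Hypothetical parser map: manifest filename -> ecosystem label.
PARSER_MAP = {
    "requirements.txt": "pip",
    "package-lock.json": "npm",
    "go.mod": "go",
    "Cargo.lock": "cargo",
}


def match_manifests(tree_paths: list[str]) -> list[tuple[str, str]]:
    """Return (path, ecosystem) for every path whose basename has a parser."""
    matches = []
    for path in tree_paths:
        ecosystem = PARSER_MAP.get(posixpath.basename(path))
        if ecosystem is not None:
            matches.append((path, ecosystem))
    return matches


def stale_dependencies(
    stored: set[tuple[str, str]], parsed: set[tuple[str, str]]
) -> set[tuple[str, str]]:
    """Dependencies in the database that no current manifest still declares."""
    return stored - parsed
```

A set difference is enough for the stale check because a dependency only counts as stale when it is absent from every manifest in the latest scan.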
Supported Lockfile Parsers
rimae/scan ships with parsers for the following manifest and lockfile formats, covering the pip, npm, go, cargo, rubygems, composer, maven, and gradle ecosystems:
| Parser | Filename(s) | Ecosystem |
|---|---|---|
| requirements.txt | requirements.txt | pip |
| Pipfile.lock | Pipfile.lock | pip |
| poetry.lock | poetry.lock | pip |
| package-lock.json | package-lock.json | npm |
| yarn.lock | yarn.lock | npm |
| pnpm-lock.yaml | pnpm-lock.yaml | npm |
| go.mod | go.mod | go |
| go.sum | go.sum | go |
| Cargo.toml | Cargo.toml | cargo |
| Cargo.lock | Cargo.lock | cargo |
| Gemfile.lock | Gemfile.lock | rubygems |
| composer.lock | composer.lock | composer |
| pom.xml | pom.xml | maven |
| build.gradle | build.gradle | gradle |
Parsers are registered in the internal/github/ package. Each parser implements the ManifestParser interface and is registered by filename pattern at startup.
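The register-by-filename pattern can be illustrated with a short sketch. It is written in Python for illustration only and does not reproduce the actual `ManifestParser` interface in `internal/github/`: the `Dependency` type, the `parse` method signature, the `REGISTRY` dict, and the `register` helper are all hypothetical. The example parser handles only pinned `name==version` lines.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Dependency:
    name: str
    version: str
    direct: bool


class ManifestParser(Protocol):
    ecosystem: str

    def parse(self, content: str) -> list[Dependency]: ...


class RequirementsTxtParser:
    """Handles only pinned `name==version` lines; everything else is ignored."""

    ecosystem = "pip"

    def parse(self, content: str) -> list[Dependency]:
        deps = []
        for raw in content.splitlines():
            # Drop comments, environment markers, and surrounding whitespace.
            line = raw.split("#", 1)[0].split(";", 1)[0].strip()
            if "==" not in line or line.startswith("-"):
                continue  # unpinned requirement or pip option such as -r
            name, version = line.split("==", 1)
            deps.append(Dependency(name.strip(), version.strip(), direct=True))
        return deps


# Registry keyed by filename, filled once at startup.
REGISTRY: dict[str, ManifestParser] = {}


def register(filename: str, parser: ManifestParser) -> None:
    REGISTRY[filename] = parser


register("requirements.txt", RequirementsTxtParser())
```

Keying the registry by filename means supporting a new format only requires a new parser class and one `register` call.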
Per-Repo Metadata
After parsing, each repository record is updated with:
- `ecosystems_detected` -- sorted list of detected ecosystems (e.g. `["go", "npm"]`).
- `dependency_count` -- total number of dependencies across all lockfiles.
- `last_scanned_at` -- timestamp of the most recent scan.
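As a rough sketch, these three fields could be derived from the parsed dependency rows like this. The `summarize_scan` helper and the row shape (a dict with an `ecosystem` key) are assumptions made for the example, as is the choice of an ISO 8601 UTC timestamp.

```python
from datetime import datetime, timezone


def summarize_scan(dependencies: list[dict]) -> dict:
    """Build the per-repo fields from parsed dependency rows.

    Each row is assumed to be a dict with at least an "ecosystem" key.
    """
    return {
        # A set removes duplicates; sorted() gives a stable order.
        "ecosystems_detected": sorted({d["ecosystem"] for d in dependencies}),
        "dependency_count": len(dependencies),
        "last_scanned_at": datetime.now(timezone.utc).isoformat(),
    }
```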
OpenSSF Scorecard Integration
rimae/scan runs OpenSSF Scorecard assessments against your repositories to evaluate their security posture. Results include:
- Overall score (0-10) stored as `scorecard_score`.
- Detailed check results stored as `scorecard_detail` JSON.
Scorecard data is available in both the repository list (score column) and the detail view (full breakdown).
Upstream URL Resolution
Many organisations maintain internal forks of open-source projects. rimae/scan's upstream resolver identifies the original project by trying five strategies in a fixed order:
| Strategy | Confidence | Method |
|---|---|---|
| Git remote | Verified | Checks GitHub API source and parent fields for fork metadata. |
| Package identity | Verified/Inferred | Inspects go.mod module path or package.json repository field. If the declared origin differs from the current org, the upstream is identified. |
| README parsing | Inferred | Scans README files for phrases like "forked from", "based on", or "upstream:" followed by a GitHub URL. |
| Convention files | Inferred | Checks for UPSTREAM, FORK_SOURCE, .upstream, or FORK.md files containing GitHub URLs. |
| Name similarity | Inferred | Fuzzy-matches the repo name against a dictionary of well-known OSS projects (Kubernetes, React, Django, Redis, Grafana, Prometheus, and others). |
Strategies are tried in order; the first successful result is used. Each result includes a confidence level (verified, inferred, or not_found) and the strategy name.
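The "first successful result wins" behaviour can be sketched as an ordered list of strategy functions. Only two of the five strategies are shown, and everything about them is an assumption made for illustration: the strategy labels, the `readme` key on the repo dict, the regular expression, and the preference for `source` over `parent`.

```python
import re
from typing import Callable, Optional

# A strategy returns (upstream_url, confidence), or None when it finds nothing.
Strategy = Callable[[dict], Optional[tuple[str, str]]]

GITHUB_URL = r"https://github\.com/[\w.-]+/[\w.-]+"
README_HINT = re.compile(
    r"(?:forked from|based on|upstream:)\s*(" + GITHUB_URL + r")", re.IGNORECASE
)


def from_fork_metadata(repo: dict) -> Optional[tuple[str, str]]:
    # Fork metadata as returned by the GitHub API for forked repositories.
    parent = repo.get("source") or repo.get("parent")
    return (parent["html_url"], "verified") if parent else None


def from_readme(repo: dict) -> Optional[tuple[str, str]]:
    match = README_HINT.search(repo.get("readme", ""))
    # Strip a sentence-ending period that the URL pattern may have captured.
    return (match.group(1).rstrip("."), "inferred") if match else None


# Order matters: earlier strategies take precedence.
STRATEGIES: list[tuple[str, Strategy]] = [
    ("git_remote", from_fork_metadata),
    ("readme_parsing", from_readme),
]


def resolve_upstream(repo: dict) -> dict:
    for name, strategy in STRATEGIES:
        result = strategy(repo)
        if result is not None:
            url, confidence = result
            return {"upstream_url": url, "confidence": confidence, "strategy": name}
    return {"upstream_url": None, "confidence": "not_found", "strategy": None}
```

Because the loop returns on the first hit, a verified fork relationship is never overridden by a weaker README match.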
Note: Upstream resolution enables rimae/scan to match vulnerabilities reported against the upstream project to your forked copy, even when the fork has a different name.
Repository Detail View
Clicking a repository opens its detail view with the following sections:
| Section | Content |
|---|---|
| Overview | Org, name, default branch, visibility, language, archived status, last push, scan status. |
| Scorecard | OpenSSF Scorecard overall score and per-check breakdown. |
| Upstream | Resolved upstream URL, confidence level, and resolution strategy. |
| Dependencies | Full list of parsed dependencies with ecosystem, package name, installed version, lockfile path, and direct/transitive flag. |
| Vulnerabilities | Matched CVEs with dependency name, installed version, severity, composite score, patch availability, fixed version, and status. |
Vulnerability Counts
The repository list view shows per-repo vulnerability counts broken down by severity:
- `vuln_count_critical` (CVSS 9.0+)
- `vuln_count_high` (CVSS 7.0-8.9)
- `vuln_count_medium` (CVSS 4.0-6.9)
- `vuln_count_low` (CVSS 0.1-3.9)
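The severity bands above translate directly into a bucketing function. This is a minimal sketch; the helper names `severity_bucket` and `vuln_counts` are hypothetical, and treating a score of 0.0 as uncounted is an assumption that follows from the lowest band starting at 0.1.

```python
from collections import Counter
from typing import Optional


def severity_bucket(cvss: float) -> Optional[str]:
    """Map a CVSS base score to the severity bands listed above."""
    if cvss >= 9.0:
        return "critical"
    if cvss >= 7.0:
        return "high"
    if cvss >= 4.0:
        return "medium"
    if cvss >= 0.1:
        return "low"
    return None  # 0.0 falls outside every band


def vuln_counts(scores: list[float]) -> dict[str, int]:
    """Count scores per band, using the column names from the list view."""
    counts = Counter(b for b in map(severity_bucket, scores) if b)
    levels = ("critical", "high", "medium", "low")
    return {f"vuln_count_{level}": counts[level] for level in levels}
```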
Filtering and Sorting
The repository list supports the following filters:
| Filter | Description |
|---|---|
| Ecosystem | Show repos that have dependencies in a specific ecosystem. |
| Has critical | Show only repos with at least one critical vulnerability. |
| Upstream status | Filter by upstream resolution confidence level. |
| Archived | Show or hide archived repositories. |
Sortable columns include repo name, last pushed, last scanned, language, vulnerability counts (by severity), dependency count, and scorecard score.
Triggering Scans
Full Org Scan
POST /api/github/scan-all enqueues a background task that re-enumerates all repositories, parses manifests, resolves upstreams, and runs vulnerability matching.
Single Repo Scan
POST /api/github/repos/{repo_id}/scan enqueues a scan for a single repository. The repo's scan_status is set to scanning until the task completes.
Both endpoints require the analyst role.
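A client call to either endpoint might be built as shown below. The endpoint paths come from this page; the base URL is a placeholder, and the bearer-token `Authorization` header is an assumption, since the authentication scheme is not described here.

```python
import urllib.request
from typing import Optional

BASE_URL = "https://rimae.example.com"  # hypothetical deployment URL


def build_scan_request(token: str, repo_id: Optional[str] = None) -> urllib.request.Request:
    """Build a POST for a full org scan, or a single-repo scan when repo_id is given."""
    if repo_id is None:
        path = "/api/github/scan-all"
    else:
        path = f"/api/github/repos/{repo_id}/scan"
    return urllib.request.Request(
        BASE_URL + path,
        method="POST",
        # Assumed auth scheme: the token must belong to a user with the analyst role.
        headers={"Authorization": f"Bearer {token}"},
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request. Both endpoints enqueue background work, so a successful response indicates the scan was queued, not that it has finished.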
Vulnerability Matching for Dependencies
Repository dependencies are matched against the same vulnerability database used for asset correlation. The RepoVulnMatch table records:
- Dependency name and installed version.
- CVE ID and severity.
- Composite score.
- Patch availability and fixed version.
- Status (open, in_review, accepted_risk, resolved).
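For readers who think in types, the recorded fields can be pictured as a record like the one below. This is an illustrative shape only, not the actual table schema: the field names, their types, and the default status of `open` are all assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MatchStatus(str, Enum):
    OPEN = "open"
    IN_REVIEW = "in_review"
    ACCEPTED_RISK = "accepted_risk"
    RESOLVED = "resolved"


@dataclass
class RepoVulnMatch:
    dependency_name: str
    installed_version: str
    cve_id: str
    severity: str
    composite_score: float
    patch_available: bool
    fixed_version: Optional[str] = None  # None when no fixed version is known
    status: MatchStatus = MatchStatus.OPEN  # assumed default for new matches
```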
Related Documentation
- Vulnerabilities -- the CVE database that dependencies are matched against
- Correlation Engine -- the scoring and version comparison logic shared with asset correlation
- Remediation Queue -- prioritised view of all vulnerability matches