GitHub Repository Scanning

rimae/scan scans your GitHub organization to discover repositories, parse dependency manifests, resolve upstream fork origins, run OpenSSF Scorecard assessments, and match dependencies against the vulnerability database.

Connecting a GitHub Organization

To enable GitHub scanning:

  1. Navigate to Settings > Integrations.
  2. Enter your GitHub organization name and a personal access token with the repo and read:org scopes (or a GitHub App token with equivalent permissions).
  3. Save the configuration and trigger an initial org scan.

Warning: The token must have access to all repositories you want to scan, including private repos. Tokens with insufficient permissions will result in partial scan results.
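
One way to catch an under-scoped token before the first scan is to inspect the scopes GitHub reports for it. For classic personal access tokens, GitHub returns the granted scopes in the X-OAuth-Scopes response header; fine-grained tokens and GitHub App tokens use permissions rather than scopes, so this check does not apply to them. The sketch below is illustrative only: the function and constant names are hypothetical and are not part of rimae/scan.

```python
# Scopes this page asks for.
REQUIRED_SCOPES = ("repo", "read:org")

# Broader classic-token scopes that also grant a required one.
IMPLIED_BY = {"read:org": {"write:org", "admin:org"}}


def missing_scopes(x_oauth_scopes: str) -> list[str]:
    """Return the required scopes not covered by an X-OAuth-Scopes header value."""
    granted = {s.strip() for s in x_oauth_scopes.split(",") if s.strip()}
    missing = []
    for scope in REQUIRED_SCOPES:
        accepted = {scope} | IMPLIED_BY.get(scope, set())
        if not (granted & accepted):
            missing.append(scope)
    return missing
```

An empty result means the header covers both scopes. It does not prove the token can see every private repository, which also depends on the token owner's access.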

Repository Enumeration

The org scanner calls the GitHub REST API to enumerate all repositories in your organization:

  • Fetches pages of 100 repos at a time via /orgs/{org}/repos?type=all.
  • Archived repos are included but flagged as archived = true.
  • Empty repos (no default branch and zero size) are skipped.
  • For each repo, the following metadata is recorded: name, default branch, visibility (public/private), primary language, and last push timestamp.

A ScanRun audit record is created for every org scan with timing, record counts, and errors.
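
The enumeration rules above can be sketched as a small paging loop. This is a minimal illustration, not the scanner's actual code: fetch_page is a hypothetical callable that returns the decoded JSON list for one page of /orgs/{org}/repos?type=all, and the output keys are assumed names. The input keys (default_branch, size, private, archived, language, pushed_at) are fields of GitHub's repository payload.

```python
from typing import Callable, Iterator

PER_PAGE = 100  # page size stated above


def enumerate_repos(fetch_page: Callable[[int], list[dict]]) -> Iterator[dict]:
    """Yield one metadata record per non-empty repository."""
    page = 1
    while True:
        batch = fetch_page(page)
        for repo in batch:
            # Skip empty repos: no default branch and zero size.
            if not repo.get("default_branch") and repo.get("size", 0) == 0:
                continue
            yield {
                "name": repo["name"],
                "default_branch": repo.get("default_branch"),
                "visibility": "private" if repo.get("private") else "public",
                "language": repo.get("language"),
                "pushed_at": repo.get("pushed_at"),
                # Archived repos are kept, only flagged.
                "archived": bool(repo.get("archived")),
            }
        if len(batch) < PER_PAGE:
            break  # a short (or empty) page means it was the last one
        page += 1
```

Injecting fetch_page keeps the paging logic separate from authentication and HTTP handling, which makes it easy to exercise with canned pages.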

Dependency Detection

After enumeration, rimae/scan parses dependency manifests from each repository to build a complete bill of materials.

How It Works

  1. The Git tree API (/repos/{org}/{repo}/git/trees/{branch}?recursive=1) lists all files in the default branch.
  2. Filenames are matched against the registered parser map.
  3. Matched files are fetched via the GitHub raw content API.
  4. Each parser extracts dependency name, version, and direct/transitive flag.
  5. Dependencies are upserted into the repo_dependencies table. Stale dependencies (present in the database but no longer in any manifest) are automatically removed.
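
Steps 2 and 5 above come down to a basename lookup and a set difference. The sketch below assumes a three-entry parser map and an (ecosystem, package name) key for dependency rows. Both are hypothetical simplifications, and the real key columns of repo_dependencies are not documented here.

```python
import posixpath

# Hypothetical subset of the parser map: basename -> ecosystem.
PARSER_MAP = {"go.mod": "go", "package-lock.json": "npm", "requirements.txt": "pip"}


def match_manifests(tree_paths: list[str]) -> list[tuple[str, str]]:
    """Return (path, ecosystem) for every tree entry whose basename has a parser."""
    return [
        (path, PARSER_MAP[posixpath.basename(path)])
        for path in tree_paths
        if posixpath.basename(path) in PARSER_MAP
    ]


def plan_upsert(stored: set[tuple[str, str]], parsed: set[tuple[str, str]]):
    """Split the work into rows to upsert and stale rows to delete.

    Both sets hold (ecosystem, package_name) keys; the key shape is an assumption.
    """
    stale = stored - parsed  # in the database, no longer in any manifest
    return parsed, stale
```

Matching on the basename means manifests in subdirectories (for example web/package-lock.json) are picked up as well as those at the repository root.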

Supported Lockfile Parsers

rimae/scan ships with parsers for the following manifest and lockfile formats:

Parser             Filename(s)        Ecosystem
requirements.txt   requirements.txt   pip
Pipfile.lock       Pipfile.lock       pip
poetry.lock        poetry.lock        pip
package-lock.json  package-lock.json  npm
yarn.lock          yarn.lock          npm
pnpm-lock.yaml     pnpm-lock.yaml     npm
go.mod             go.mod             go
go.sum             go.sum             go
Cargo.toml         Cargo.toml         cargo
Cargo.lock         Cargo.lock         cargo
Gemfile.lock       Gemfile.lock       rubygems
composer.lock      composer.lock      composer
pom.xml            pom.xml            maven
build.gradle       build.gradle       gradle

Parsers are registered in the internal/github/ package. Each parser implements the ManifestParser interface and is registered by filename pattern at startup.
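
The registration pattern can be pictured as follows. This is a language-neutral sketch written in Python, not the actual ManifestParser definition: the method name parse, the Dependency fields, and the REGISTRY dictionary are assumptions. The requirements.txt parser is deliberately simplified. It reads only name==version pins, leaves the version empty for any other specifier, and marks every entry as direct because the file format does not distinguish transitive dependencies.

```python
import re
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Dependency:
    name: str
    version: str
    direct: bool


class ManifestParser(Protocol):
    ecosystem: str

    def parse(self, content: str) -> list[Dependency]: ...


# Package name, optionally followed by an exact "==" pin.
_REQ = re.compile(r"([A-Za-z0-9._-]+)\s*(?:==\s*([^\s;]+))?")


class RequirementsTxtParser:
    ecosystem = "pip"

    def parse(self, content: str) -> list[Dependency]:
        deps = []
        for raw in content.splitlines():
            line = raw.split("#", 1)[0].strip()  # drop comments
            if not line or line.startswith("-"):  # skip blanks and pip options
                continue
            match = _REQ.match(line)
            if match:
                deps.append(Dependency(match.group(1), match.group(2) or "", direct=True))
        return deps


# Hypothetical registry keyed by filename, filled at startup.
REGISTRY: dict[str, ManifestParser] = {}


def register(filename: str, parser: ManifestParser) -> None:
    REGISTRY[filename] = parser


register("requirements.txt", RequirementsTxtParser())
```

With this shape, adding an ecosystem means writing one class and one register call, and the scan loop never needs to know which formats exist.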

Per-Repo Metadata

After parsing, each repository record is updated with:

  • ecosystems_detected -- sorted list of detected ecosystems (e.g. ["go", "npm"]).
  • dependency_count -- total number of dependencies across all lockfiles.
  • last_scanned_at -- timestamp of the most recent scan.
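
As a rough illustration, the three fields can be derived from the parsed dependency rows like this. The row shape (a dict with an "ecosystem" key) is an assumption, as is counting every row once, including a package that appears in more than one lockfile.

```python
from datetime import datetime, timezone
from typing import Optional


def repo_summary(dependencies: list[dict], now: Optional[datetime] = None) -> dict:
    """Build the per-repo metadata fields from parsed dependency rows."""
    now = now or datetime.now(timezone.utc)
    return {
        # Sorted and de-duplicated, e.g. ["go", "npm"].
        "ecosystems_detected": sorted({d["ecosystem"] for d in dependencies}),
        "dependency_count": len(dependencies),
        "last_scanned_at": now.isoformat(),
    }
```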

OpenSSF Scorecard Integration

rimae/scan runs OpenSSF Scorecard assessments against your repositories to evaluate their security posture. Results include:

  • Overall score (0-10) stored as scorecard_score.
  • Detailed check results stored as scorecard_detail JSON.

Scorecard data is available in both the repository list (score column) and the detail view (full breakdown).

Upstream URL Resolution

Many organizations maintain internal forks of open-source projects. rimae/scan's upstream resolver identifies the original project using five strategies, tried in order:

Strategy          Confidence         Method
Git remote        Verified           Checks the GitHub API source and parent fields for fork metadata.
Package identity  Verified/Inferred  Inspects the go.mod module path or the package.json repository field. If the declared origin differs from the current org, the upstream is identified.
README parsing    Inferred           Scans README files for phrases like "forked from", "based on", or "upstream:" followed by a GitHub URL.
Convention files  Inferred           Checks for UPSTREAM, FORK_SOURCE, .upstream, or FORK.md files containing GitHub URLs.
Name similarity   Inferred           Fuzzy-matches the repo name against a dictionary of well-known OSS projects (Kubernetes, React, Django, Redis, Grafana, Prometheus, and others).

Strategies are tried in order; the first successful result is used. Each result includes a confidence level (verified, inferred, or not_found) and the strategy name.

Note: Upstream resolution enables rimae/scan to match vulnerabilities reported against the upstream project to your forked copy, even when the fork has a different name.
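
The "first successful result wins" behaviour is a plain strategy chain. The sketch below implements only two of the five strategies, and everything in it is illustrative: the strategy identifiers ("git_remote", "readme_parsing"), the repo dictionary shape, and the regular expression are assumptions, not the resolver's real code.

```python
import re
from typing import NamedTuple, Optional


class Upstream(NamedTuple):
    url: Optional[str]
    confidence: str  # "verified", "inferred", or "not_found"
    strategy: Optional[str]


def from_fork_metadata(repo: dict):
    # GitHub's repository payload exposes fork origins via "source" and "parent".
    origin = repo.get("source") or repo.get("parent")
    if origin and origin.get("html_url"):
        return origin["html_url"], "verified"
    return None


_README_RE = re.compile(
    r"(?:forked from|based on|upstream:)\s*(https://github\.com/[\w.-]+/[\w.-]+)",
    re.IGNORECASE,
)


def from_readme(repo: dict):
    match = _README_RE.search(repo.get("readme", ""))
    if match:
        return match.group(1).rstrip("."), "inferred"  # drop sentence-ending dot
    return None


# Order matters: higher-confidence strategies come first.
STRATEGIES = [("git_remote", from_fork_metadata), ("readme_parsing", from_readme)]


def resolve_upstream(repo: dict, strategies=STRATEGIES) -> Upstream:
    for name, strategy in strategies:
        result = strategy(repo)
        if result is not None:
            url, confidence = result
            return Upstream(url, confidence, name)
    return Upstream(None, "not_found", None)
```

Because the loop returns on the first hit, a repository with both fork metadata and a README mention is reported as verified, and the README is never consulted.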

Repository Detail View

Clicking a repository opens its detail view with the following sections:

Section          Content
Overview         Org, name, default branch, visibility, language, archived status, last push, scan status.
Scorecard        OpenSSF Scorecard overall score and per-check breakdown.
Upstream         Resolved upstream URL, confidence level, and resolution strategy.
Dependencies     Full list of parsed dependencies with ecosystem, package name, installed version, lockfile path, and direct/transitive flag.
Vulnerabilities  Matched CVEs with dependency name, installed version, severity, composite score, patch availability, fixed version, and status.

Vulnerability Counts

The repository list view shows per-repo vulnerability counts broken down by severity:

  • vuln_count_critical (CVSS 9.0+)
  • vuln_count_high (CVSS 7.0-8.9)
  • vuln_count_medium (CVSS 4.0-6.9)
  • vuln_count_low (CVSS 0.1-3.9)
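
The severity bands above translate directly into threshold checks. The sketch below assumes the counts are derived from a numeric CVSS base score per matched vulnerability. Whether rimae/scan buckets by score or by a stored severity label is not stated here, and the function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional

LEVELS = ("critical", "high", "medium", "low")


def severity_bucket(cvss: float) -> Optional[str]:
    """Map a CVSS base score to the severity band listed above."""
    if cvss >= 9.0:
        return "critical"
    if cvss >= 7.0:
        return "high"
    if cvss >= 4.0:
        return "medium"
    if cvss >= 0.1:
        return "low"
    return None  # a score of 0.0 falls outside every band


def vuln_counts(scores: Iterable[float]) -> dict:
    """Aggregate scores into the four vuln_count_* fields."""
    counts = Counter(b for b in map(severity_bucket, scores) if b)
    return {f"vuln_count_{level}": counts.get(level, 0) for level in LEVELS}
```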

Filtering and Sorting

The repository list supports the following filters:

Filter           Description
Ecosystem        Show repos that have dependencies in a specific ecosystem.
Has critical     Show only repos with at least one critical vulnerability.
Upstream status  Filter by upstream resolution confidence level.
Archived         Show or hide archived repositories.

Sortable columns include repo name, last pushed, last scanned, language, vulnerability counts (by severity), dependency count, and scorecard score.

Triggering Scans

Full Org Scan

POST /api/github/scan-all enqueues a background task that re-enumerates all repositories, parses manifests, resolves upstreams, and runs vulnerability matching.

Single Repo Scan

POST /api/github/repos/{repo_id}/scan enqueues a scan for a single repository. The repo's scan_status is set to scanning until the task completes.

Both endpoints require the analyst role.
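
For scripting, the two endpoints can be wrapped in a small request builder. The paths come from this page. The base URL, the bearer-token Authorization header, and the function name are assumptions, so check how your deployment authenticates API calls before relying on this.

```python
import urllib.request
from typing import Optional


def scan_request(base_url: str, token: str, repo_id: Optional[int] = None) -> urllib.request.Request:
    """Build a POST request for a full org scan, or a single-repo scan if repo_id is given."""
    if repo_id is None:
        path = "/api/github/scan-all"
    else:
        path = f"/api/github/repos/{repo_id}/scan"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
    )


# To send it: urllib.request.urlopen(scan_request(base_url, token, repo_id=42))
```

Both calls only enqueue work, so a successful response means the scan was accepted, not that it has finished. Watching the repo's scan_status is the way to see when a single-repo scan completes.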

Vulnerability Matching for Dependencies

Repository dependencies are matched against the same vulnerability database used for asset correlation. The RepoVulnMatch table records:

  • Dependency name and installed version.
  • CVE ID and severity.
  • Composite score.
  • Patch availability and fixed version.
  • Status (open, in_review, accepted_risk, resolved).
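
As a reading aid, one match record can be modelled like this. The field names, the types, and the default status of open are assumptions inferred from the list above, not the actual table schema. Only the four status values are taken from this page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class MatchStatus(str, Enum):
    OPEN = "open"
    IN_REVIEW = "in_review"
    ACCEPTED_RISK = "accepted_risk"
    RESOLVED = "resolved"


@dataclass
class RepoVulnMatch:
    dependency_name: str
    installed_version: str
    cve_id: str
    severity: str
    composite_score: float
    patch_available: bool
    fixed_version: Optional[str] = None
    status: MatchStatus = MatchStatus.OPEN  # assumed default

    def __post_init__(self) -> None:
        # Accept plain strings but reject anything outside the four known values.
        self.status = MatchStatus(self.status)
```

Constructing a record with an unknown status string raises ValueError, which keeps typos out of downstream filters.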