Argus Documentation

Last updated April 17, 2026

Find the bugs your team missed. Setup, pipeline, memory, commands — everything you need to get started.

On this page

  • Getting Started
  • The Review Pipeline
  • Deep Review
  • Incremental Reviews
  • What Argus Sees
  • Architecture Visualization
  • Code Simulation
  • PR Enrichment & Diagrams
  • Conversational Review
  • Live Activity Timeline
  • Severities
  • Categories
  • Review Rules
  • Model Config
  • API Keys (BYOK)
  • BYOT Supermemory
  • Review Personas
  • Auto-review & Triggers
  • Bot Commands
  • Test Generation
  • Memory & Learning
  • Insights & Risk
  • Token & Cost Tracking
  • Light Mode
  • Feature Flags
  • Settings & Controls

Getting Started

Three minutes from zero to your first automated review.

1. Install the GitHub App

One click at github.com/apps/argus-eye. Works with orgs and personal accounts. Your repos appear in the dashboard immediately.

2. Select repositories

Choose which repos Argus watches. Enable all or pick specific ones. You can change this any time.

3. Add your API key

Bring your own key: OpenAI, Anthropic, or any OpenRouter provider. Your key, your costs, your data stays yours.

4. Open a pull request

With auto-review on, every PR triggers Argus automatically. With auto-review off (the default), tick the Trigger Argus review checkbox that Argus posts on the PR. Inline comments appear with one-click suggestion fixes you can commit straight from GitHub.

5. Teach it your standards

Choose a review persona, add custom rules, or let Argus learn your team's patterns over time. It gets sharper with every review.

The Review Pipeline

Every PR runs through a multi-stage pipeline. Each stage can use a different model, configurable per-repo. The sequence typically completes in a couple of minutes.

01 Triage

Classifies each changed file by risk before any tokens are spent. Generated files, lockfiles, and vendored dependencies are skipped.
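As a rough illustration of the skip rule, the sketch below filters changed paths before any model call. The pattern lists and the `should_skip` helper are assumptions made up for this example; the actual triage rules are not documented here.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical patterns, chosen only to illustrate the three skip classes.
LOCKFILES = {"package-lock.json", "yarn.lock", "Cargo.lock", "poetry.lock", "go.sum"}
VENDORED_DIRS = {"vendor", "node_modules", "third_party"}
GENERATED_GLOBS = ("*.min.js", "*.pb.go", "*_pb2.py")


def should_skip(path: str) -> bool:
    """Return True for lockfiles, vendored files, and generated files."""
    p = PurePosixPath(path)
    if p.name in LOCKFILES:
        return True
    # Any parent directory named like a vendoring directory marks the file as vendored.
    if any(part in VENDORED_DIRS for part in p.parts[:-1]):
        return True
    return any(fnmatch(p.name, glob) for glob in GENERATED_GLOBS)


changed = ["src/app.ts", "yarn.lock", "vendor/lib/x.go", "api/user.pb.go"]
to_review = [f for f in changed if not should_skip(f)]
print(to_review)
```

Only the files that survive this filter would need a risk classification, which is where token spend would begin.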

02 Context

Gathers cross-file context, blast radius, and relevant memory — so the review understands how your change affects the rest of the system.

03 Review

Performs focused analysis across multiple angles in parallel — correctness, security, architecture, and regression risk.

04 Refine

Deduplicates and validates findings. Noise and low-signal comments are dropped so only high-confidence issues survive.

05 Synthesize

Produces a scannable verdict with fix ordering, severity tiers, and diagrams — actionable, not a wall of text.

06 Post & Learn

Posts inline comments to GitHub and updates memory from your feedback — so future reviews get sharper.

Deep Review

Four specialist agents review every file in parallel.

Instead of one pass, Argus deploys four domain specialists per file. Each brings a different lens — and they run concurrently, so it doesn't slow you down.

bug_hunter

Logic errors, off-by-ones, nil dereferences, broken invariants, incorrect boolean chains. The specialist that catches what compiles but doesn’t work.

security

Injection, auth bypass, SSRF, path traversal, leaked credentials, insecure deserialization. Reviews with a pen-tester’s eye.

architecture

Dependency direction, API contracts, separation of concerns, blast radius. Flags when a change violates the system’s structural intent.

regression

Uses scenario memory and past review history to detect changes that re-introduce previously fixed bugs or break known invariants.

Deep Review is off by default. Once enabled in Settings → Features, it triggers automatically for complex PRs. Findings from all four specialists are deduplicated before scoring.
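The fan-out and deduplication described above can be sketched as follows. The specialist functions are stand-ins that return canned findings instead of calling a model, and the dedup key of (file, line, category) is an assumption, not a documented rule.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    category: str
    confidence: float
    source: str


def dedupe(findings):
    """Keep the highest-confidence finding per (file, line, category)."""
    best = {}
    for f in findings:
        key = (f.file, f.line, f.category)
        if key not in best or f.confidence > best[key].confidence:
            best[key] = f
    return sorted(best.values(), key=lambda f: (f.file, f.line, f.category))


# Stand-in specialists with hard-coded results (hypothetical).
def bug_hunter(path):
    return [Finding(path, 15, "bug", 0.9, "bug_hunter")]

def security(path):
    return [Finding(path, 42, "security", 0.8, "security")]

def architecture(path):
    return []

def regression(path):
    return [Finding(path, 15, "bug", 0.7, "regression")]

SPECIALISTS = [bug_hunter, security, architecture, regression]


def deep_review(path):
    """Run all specialists concurrently on one file, then merge their findings."""
    with ThreadPoolExecutor(max_workers=len(SPECIALISTS)) as pool:
        batches = list(pool.map(lambda fn: fn(path), SPECIALISTS))
    return dedupe([f for batch in batches for f in batch])


for finding in deep_review("src/lib/convert/units.ts"):
    print(finding)
```

Here the two specialists that flag line 15 collapse into one finding, and the higher-confidence report wins.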

Incremental Reviews

Push again? Argus only reviews what changed.

When you push new commits to an already-reviewed PR, Argus computes the diff since the last review and only analyzes the delta. Previous findings that are still relevant are preserved. Resolved findings are dropped.

Delta detection

Compares HEAD against the last-reviewed SHA. Only new or modified hunks enter the pipeline. Unchanged files are skipped entirely.

Finding lifecycle

Findings from the previous review are carried forward if the relevant code is unchanged. Updated code triggers a fresh review of that file only.
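A minimal sketch of delta detection and finding carry-forward, simplified to file granularity (the text above works at hunk level). It assumes each commit is available as a mapping from path to content hash; both function names are hypothetical.

```python
def changed_files(last_reviewed: dict, head: dict) -> set:
    """Paths that are new at HEAD or whose content hash differs."""
    return {path for path, blob in head.items() if last_reviewed.get(path) != blob}


def carry_forward(previous_findings: list, last_reviewed: dict, head: dict) -> list:
    """Keep earlier findings only for files that still exist and are unchanged."""
    changed = changed_files(last_reviewed, head)
    return [
        f for f in previous_findings
        if f["file"] in head and f["file"] not in changed
    ]


last = {"a.py": "h1", "b.py": "h2", "c.py": "h3"}
head = {"a.py": "h1", "b.py": "h2-edited", "d.py": "h4"}
previous = [
    {"file": "a.py", "note": "unchanged file, finding kept"},
    {"file": "b.py", "note": "file edited, will be re-reviewed"},
    {"file": "c.py", "note": "file deleted, finding dropped"},
]

print(sorted(changed_files(last, head)))
print(carry_forward(previous, last, head))
```

Only `b.py` and `d.py` would enter the pipeline again; the finding on `a.py` is preserved as-is.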

Cost reduction

Incremental reviews typically use 30–70% fewer tokens than a full re-review, depending on how much changed between pushes.

Force a full re-review with @argus-eye review --force.

What Argus Sees

Most review tools see the diff. Argus sees the system.

Before reviewing a single line of code, Argus builds a living model of your codebase that evolves with every review. This is what separates a linter from an engineer.

Cross-file context

Argus traces callers, imports, tests, and shared types. When you change a function, Argus already knows who calls it — and what breaks if the contract shifts.

Blast radius

A persistent dependency graph maps every function and class. On each PR, Argus surfaces what downstream code is affected. No more "I didn't realize that module depended on this."
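One common way to compute a blast radius is a breadth-first walk over reversed dependency edges. The sketch below assumes the graph is a plain mapping from each module to the modules it depends on; the module names are invented for the example.

```python
from collections import defaultdict, deque


def blast_radius(depends_on: dict, changed: set) -> set:
    """Return every node that transitively depends on a changed node."""
    # Invert the edges: for each dependency, who depends on it?
    dependents = defaultdict(set)
    for node, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(node)

    affected, queue = set(), deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in affected and parent not in changed:
                affected.add(parent)
                queue.append(parent)
    return affected


graph = {
    "billing": {"tax"},
    "invoices": {"billing"},
    "api": {"invoices", "auth"},
    "auth": set(),
}
print(sorted(blast_radius(graph, {"tax"})))
```

Changing `tax` reaches `billing`, `invoices`, and `api`, while `auth` stays outside the radius.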

Scenario memory

Past bugs, incidents, and edge cases are remembered across team turnover. "The last time this module changed, EU billing broke." Argus remembers so your team doesn't have to.

Decision traces

Every review, every developer reply, every fix builds a living knowledge graph. Patterns that were dismissed stop recurring. Patterns that were confirmed get reinforced.

Argus maintains a world model of your codebase. The more it reviews, the more it understands. Context is not a feature — it is the architecture.

Architecture Visualization

See your codebase as a dependency graph.

The Architecture page renders an interactive dependency graph built from every review. Nodes are files, edges are import/call/type relationships. Four analytical lenses let you see different dimensions of the system.

Risk

Colors nodes by cumulative risk score. High-severity findings, frequent changes, and unresolved comments push a file’s risk up.

Choke Points

Highlights files with high fan-in — the modules everything depends on. Breaking these breaks everything.

Hotspots

Surfaces files with the most review activity. Frequent changes + frequent findings = code that needs attention.

Coupling

Shows tightly coupled file clusters. Files that always change together likely share hidden dependencies.
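Two of these lenses reduce to simple counts. The sketch below shows fan-in (Choke Points) and co-change frequency (Coupling) under assumed input shapes: import edges as `(source, target)` pairs and change history as lists of files per commit. How Argus actually scores these lenses is not specified here.

```python
from collections import Counter
from itertools import combinations


def fan_in(edges):
    """Count how many distinct files depend on each target file."""
    return Counter(target for _, target in set(edges))


def co_change(commits):
    """Count how often each pair of files changes in the same commit."""
    pairs = Counter()
    for files in commits:
        pairs.update(combinations(sorted(set(files)), 2))
    return pairs


edges = [("a", "core"), ("b", "core"), ("c", "core"), ("a", "b")]
commits = [["a", "b"], ["a", "b", "c"], ["c"]]

print(fan_in(edges).most_common(1))
print(co_change(commits).most_common(1))
```

A file with high fan-in is a choke point candidate; a pair with a high co-change count is a coupling candidate.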

Navigation

File search

Fuzzy search across all nodes. Select a result to center and highlight it in the graph.

Smart zoom

On first load, the graph auto-zooms to the highest-risk cluster so you see what matters immediately.

Node hover

Hover any node for a metrics tooltip: risk score, review count, finding breakdown, and last review date.

The graph builds incrementally with each review. An onboarding guide walks new users through the interface on first visit.

Code Simulation

Before you merge, Argus imagines what happens.

Given a PR and known scenarios from your codebase history, Argus simulates execution paths and reports what it finds. Confidence scores tell you how certain the system is.

argus — simulation output
fails · Scenario: Concurrent subscription cancellation · confidence 94%

Root cause: No idempotency key on the cancellation path. Two concurrent requests reach the payment provider — first succeeds, second throws. DB update runs for both.

Impact: Double refund issued. Revenue loss proportional to cancellation volume.

Fix: Add mutex or idempotency key. Wrap call + DB write in a transaction.

degrades · Scenario: Cache key collision under ID reuse · confidence 78%

Root cause: Deleted user IDs are recycled. Infinite TTL cache serves stale data from the previous account holder.

Impact: Data leakage between accounts. Severity scales with user churn.

passes · Scenario: Webhook retry under network partition · confidence 91%

Result: Idempotency key already present on this path. Retry is safe. No state corruption detected.

Simulation is powered by scenario memory — the richer your review history, the more scenarios Argus can test against. Currently in experimental rollout.

PR Enrichment & Mermaid Diagrams

Argus writes the context your PR description forgot.

After reviewing, Argus appends auto-generated Mermaid diagrams and missing context directly to the PR description. Reviewers see the system impact before reading a single line of diff.

Sequence diagrams

Generated from call paths affected by the PR. Shows the request flow through services, middleware, and handlers.

Data flow diagrams

Maps how data transforms as it moves through the changed code. Input → validation → processing → output, with types annotated.

Dependency diagrams

Shows which modules the PR touches and their upstream/downstream relationships. Highlights the blast radius visually.

Diagrams render natively on GitHub. Toggle in Settings → Features → PR Enrichment.

The Conversational Review

Argus doesn't post a list of findings. It writes you a review the way a senior engineer would — conversational, opinionated, and to the point.

Every review has three layers: the summary, the inline comments, and the feedback loop.

The summary

argus — review summary

Verdict: Adds 20 utility modules but has critical security and correctness issues that must be fixed before merging.

Critical issues:

  • src/lib/convert/units.ts:L15 — Hour multiplier is 360,000ms instead of 3,600,000ms
  • src/lib/filter/predicate.ts:L42 — User input passed directly to RegExp without escaping

Warnings:

  • src/lib/color/grade.ts:L10 — No NaN check before clamping
  • src/lib/counter/rolling.ts:L28 — Unbounded bucket array (+4 more)
2 critical · 2 warnings · 4 suggestions
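The two critical findings in this sample summary correspond to well-known fixes. The sketch below shows them in Python for illustration only (the files in the summary are TypeScript, and their real contents are not shown): derive time constants from smaller units instead of typing the literal, and escape user input before compiling it as a pattern.

```python
import re

# Deriving each constant from the previous one avoids a mistyped literal.
MS_PER_SECOND = 1_000
MS_PER_MINUTE = 60 * MS_PER_SECOND
MS_PER_HOUR = 60 * MS_PER_MINUTE  # 60 * 60 * 1000 = 3,600,000


def literal_matcher(user_input: str) -> re.Pattern:
    """Compile user input as literal text, not as a regular expression."""
    return re.compile(re.escape(user_input))


print(MS_PER_HOUR)
print(bool(literal_matcher("a.b").search("axb")))
```

Without `re.escape`, the input `a.b` would match `axb`, and an input such as `(` would raise a compile error.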

Inline comments

Every inline comment follows a structured format: what the issue is, why it matters, and a one-click suggestion fix when applicable.

argus — inline comment
critical · bug

What: Two concurrent cancellation requests can both pass the status === "active" check. First succeeds at the payment provider, second throws — but the DB update runs for both.

Why: No lock or idempotency key on this path. The check-then-act window is ~200ms under load. This will cause double refunds in production.
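The check-then-act window in this sample comment closes once the status check and the state change happen as one atomic step. The sketch below uses an in-process lock as a stand-in; in a real service the equivalent would more likely be a database row lock or an idempotency key at the payment provider. The `Subscription` class is hypothetical.

```python
import threading


class Subscription:
    def __init__(self):
        self.status = "active"
        self.refunds = 0
        self._lock = threading.Lock()

    def cancel(self) -> bool:
        """Return True only for the single caller that performs the cancellation."""
        with self._lock:
            # Check and act inside the same critical section.
            if self.status != "active":
                return False
            self.status = "cancelled"
            self.refunds += 1  # stands in for the payment-provider call
            return True


sub = Subscription()
results = []
threads = [
    threading.Thread(target=lambda: results.append(sub.cancel()))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sum(results), sub.refunds)
```

However the eight threads interleave, exactly one cancellation succeeds and one refund is recorded.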

The feedback loop

Every Argus comment has approval reactions. Your feedback directly shapes future reviews.

Approve

Reinforces the pattern. Argus will catch similar issues with higher confidence in future reviews.

Dismiss

Suppresses the pattern. Argus stores a “dismissed” signal and avoids similar false positives going forward.

Live Activity Timeline

Watch reviews happen in real time.

When a review is in progress, the review detail page streams live activity via WebSocket. You see exactly what Argus is doing as it happens.

Live streaming

WebSocket-powered real-time updates. See which file is being reviewed, which specialist is assigned, and comments as they arrive.

Scoring results

Watch findings get scored in real time. Low-confidence findings drop out as scoring completes.

Token & cost counter

Live token usage and cost counter updates as each pipeline stage completes.

Elapsed timer

A running timer shows total review duration. The timeline auto-scrolls while you are at the bottom and pauses auto-scroll when you scroll up to read.

The timeline is collapsible for long reviews. All activity persists in the review detail page after completion.

Severities

Every finding is tagged with one of four severity levels. These drive the quality score and determine what gets posted.

critical

Bugs, security vulnerabilities, data loss risks, or logic errors that will cause failures in production.

warning

Performance issues, error handling gaps, race conditions, or code that works but is fragile.

suggestion

Readability improvements, style consistency, better naming, or minor refactors.

praise

Well-written code, good patterns, clever solutions, or thorough test coverage worth highlighting.

Categories

Every finding is also tagged with a category — the type of issue detected.

security

Injection vulnerabilities, leaked credentials, unsafe deserialization, SSRF, path traversal.

bug

Off-by-one errors, nil dereferences, broken invariants, incorrect boolean logic, missing edge cases.

performance

N+1 queries, unnecessary allocations, missing caching, O(n²) where O(n) is possible.

error_handling

Swallowed errors, empty catch blocks, missing error propagation, silent fallbacks.

readability

Unclear naming, complex nesting, missing comments on non-obvious logic, dead code.

style

Formatting inconsistencies, convention violations, import ordering, naming patterns.

type_design

Weak type invariants, stringly-typed APIs, missing generics, poor encapsulation.

testing

Missing edge case tests, brittle assertions, untested error paths, test-only code in production.

Review Rules

Tell Argus what matters to your team. Rules are injected into every review, so every comment reflects your standards — not generic best practices.

Org-level rules

Create rules in the dashboard under Rules. Each rule has a category, content, priority, and enabled flag. These apply to all repos in your org.

Repo-level rules

Add a .argus/rules.md file to your repo. Repo rules override org rules in the same category.

## security
- Always flag hardcoded API keys or secrets
- Check for SQL injection in raw query strings

## performance
- Flag N+1 queries in ORM code
- Warn about unbounded list fetches without pagination

## style
- Enforce camelCase for variables, PascalCase for types
- Require JSDoc on exported functions
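To make the override behaviour concrete, the sketch below parses a rules file in the format shown above and merges it with org-level rules. It assumes that "override" means a repo category replaces the org category wholesale; whether Argus merges at a finer grain is not stated. Both helpers are hypothetical.

```python
def parse_rules(text: str) -> dict:
    """Parse '## category' headings followed by '- rule' bullets."""
    rules, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            rules.setdefault(current, [])
        elif line.startswith("- ") and current is not None:
            rules[current].append(line[2:].strip())
    return rules


def effective_rules(org: dict, repo: dict) -> dict:
    """Repo categories replace org categories; other org categories pass through."""
    return {**org, **repo}


repo_text = """
## security
- Always flag hardcoded API keys or secrets

## style
- Require JSDoc on exported functions
"""
org = {
    "security": ["Check for SQL injection in raw query strings"],
    "performance": ["Flag N+1 queries in ORM code"],
}

merged = effective_rules(org, parse_rules(repo_text))
for category, items in merged.items():
    print(category, items)
```

In this example the repo's `security` list wins, `performance` is inherited from the org, and `style` exists only at repo level.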

Model Configuration

Four pipeline stages (triage, review, scoring, and synthesis) are independently configurable per-repo from the Settings page. The default model depends on your OpenRouter key. Temperature and max tokens are adjustable per stage via sliders.

| Stage | Default Model | Max Tokens | Temperature |
| --- | --- | --- | --- |
| triage | configurable | configurable | configurable |
| review | configurable | configurable | configurable |
| scoring | configurable | configurable | configurable |
| synthesis | configurable | configurable | configurable |

Supported providers: OpenRouter, OpenAI, Anthropic, Azure OpenAI, GCP Vertex AI, AWS Bedrock, and Zhipu AI. Custom model names are supported — enter any model identifier your provider accepts.

API Keys (BYOK)

Your keys, your models, your bill. Argus never stores prompts or code on our servers — API calls go straight from our backend to your chosen provider. No hidden costs, no surprises.

Setup
  1. Go to Settings in the dashboard
  2. Select a repo and choose a provider (OpenAI, Anthropic, etc.)
  3. Enter your API key — it's encrypted at rest
  4. Pick a model for each pipeline stage (triage, review, scoring, synthesis)

Security
  • Strong encryption at rest — keys never persist in plaintext.
  • In-memory only — decrypted for API calls, then discarded. Never logged or cached.
  • Workspace-isolated — no other workspace can access your keys.
  • Masked — dashboard shows sk-...**** only. Full key never sent to the frontend.

We never see your code. We never see your keys. Without a key configured, Argus posts a friendly onboarding comment on your first PR linking to Settings.

BYOT Supermemory

Bring Your Own Token for Supermemory.

Argus uses Supermemory for RAG-powered memory — storing review patterns, codebase conventions, and scenario history. You can bring your own Supermemory API key for full control over your data.

Setup
  1. Go to Integrations in the dashboard
  2. Enter your Supermemory API key under the Supermemory section
  3. Key is scoped per-org — all repos in the org share the same memory backend

Without a custom key, Argus uses its shared Supermemory instance. Your data is isolated per-installation regardless.

Review Personas

Not every PR needs the same reviewer. Personas tune the tone, focus, and severity threshold — from a gentle mentor to a zero-mercy auditor. Set a default per-repo or override per-PR.

default

Balanced across all categories. The standard Argus experience most teams start with.

security_auditor

Treats every PR like a pen test. Injection risks, auth flaws, data exposure, SSRF.

performance_engineer

Hunts N+1 queries, memory leaks, O(n²) loops, and missing cache invalidation.

mentor

Explains the why behind every comment. Suggests learning resources. Built for growing teams.

architect

Thinks in boundaries. API contracts, separation of concerns, dependency direction.

strict

No free passes. Comments on everything. Maximum coverage, minimum mercy.

custom

Define your own persona with a freeform system prompt. Full control over tone, focus, and severity.

Per-PR override

Override per-PR with @argus-eye review --persona strict

@argus-eye review --persona security_auditor

Auto-review & Triggers

Argus supports two trigger modes. Pick per-org, override per-repo.

Auto-review off (default). When a PR opens, Argus posts a Trigger Argus review checkbox comment with an estimated token + cost preview. Reviewers tick the box to run a review on demand. Pushes to an open PR do not post additional comments.

Auto-review on. Every PR opened, pushed, or reopened is reviewed automatically — no checkbox, no preview.

Precedence

Repo override beats org default. If the repo setting is unset, the org default applies. If both are unset, auto-review is off.
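The precedence rule is a three-step fallback. A minimal sketch, assuming each setting is stored as true, false, or unset (`None`); the function name is hypothetical.

```python
from typing import Optional


def auto_review_enabled(repo_override: Optional[bool], org_default: Optional[bool]) -> bool:
    """Repo override first, then org default, then off."""
    if repo_override is not None:
        return repo_override
    if org_default is not None:
        return org_default
    return False


print(auto_review_enabled(None, True))
print(auto_review_enabled(False, True))
print(auto_review_enabled(None, None))
```

Testing against `None` rather than truthiness matters here: an explicit repo-level "off" must still beat an org-level "on".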

Rate limits

Every review draws from a 10/hour per-repo bucket and a 50/day per-org bucket. Checkbox clicks and --force additionally draw from a tighter 3/hour per-repo force bucket — effectively capping on-demand triggers at 3/hour.

Cost preview

The trigger comment shows changed-file count, diff lines, and a historical average of tokens + USD cost across your last 20 reviews for this repo. USD is omitted when pricing data is unavailable (token-only fallback).
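A sketch of how such a preview could be computed from review history. It assumes history is ordered oldest first, that each record carries `tokens` and an optional `usd`, and that USD is dropped whenever any record in the window lacks pricing; the last point is an interpretation of the token-only fallback, not a documented rule.

```python
from statistics import mean
from typing import Optional


def cost_preview(history: list, window: int = 20) -> Optional[dict]:
    """Average tokens (and USD when fully priced) over the most recent reviews."""
    recent = history[-window:]
    if not recent:
        return None
    preview = {"avg_tokens": round(mean(r["tokens"] for r in recent))}
    costs = [r.get("usd") for r in recent]
    if all(c is not None for c in costs):
        preview["avg_usd"] = round(mean(costs), 4)
    return preview


# Made-up history: 5 older reviews, then 20 recent ones.
history = [{"tokens": 1000, "usd": 0.02}] * 5 + [{"tokens": 2000, "usd": 0.05}] * 20
print(cost_preview(history))

unpriced = history + [{"tokens": 2000, "usd": None}]
print(cost_preview(unpriced))
```

The first call averages only the 20 most recent reviews; the second omits `avg_usd` because one record in the window has no price.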

Fallback command

You can always trigger a review by commenting @argus-eye review, regardless of the auto-review setting. This is useful if the checkbox comment is missing (for example after a webhook redelivery, or when the PR was opened before Argus was installed).

How the checkbox works

  1. PR opens → Argus posts a single comment with cost preview and - [ ] Trigger Argus review.
  2. A user with triage-level access ticks the box. GitHub fires an issue_comment.edited webhook.
  3. Argus verifies the comment author is argus-eye[bot] (anti-hijack), rate-limits the click, swaps the checkbox for Running Argus review…, and dispatches the review.
  4. If the pipeline errors, the checkbox is restored with a retry hint. Tick again to run.

Where to toggle

Dashboard → Settings:

  • Org Defaults tab → Auto-review card for the org-wide default.
  • Repo Overrides tab → Auto-review card for a per-repo override.

Gotchas
  • Trigger comments are posted only on opened. Pushes to an open PR (synchronize) do not repost — use the existing checkbox or @argus-eye review.
  • Ticking the box on anyone else's comment that mimics our format is ignored — only comments authored by Argus trigger reviews.
  • Only the [ ]→[x] transition triggers a review. Unticking ([x]→[ ]) does nothing, and a running review cannot be cancelled from the checkbox.
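The author check and the one-way transition rule can be sketched as a predicate over the webhook payload. The field names follow the shape of GitHub's `issue_comment` payload for an `edited` action (`comment.user.login`, `comment.body`, `changes.body.from`); the marker strings and `should_trigger` are assumptions. The access check and rate limiting from the steps above are left out.

```python
BOT_LOGIN = "argus-eye[bot]"
UNCHECKED = "- [ ] Trigger Argus review"
CHECKED = "- [x] Trigger Argus review"


def should_trigger(payload: dict) -> bool:
    """True only when a bot-authored comment goes from unticked to ticked."""
    if payload.get("action") != "edited":
        return False
    comment = payload.get("comment", {})
    # Anti-hijack: ignore look-alike comments written by anyone else.
    if comment.get("user", {}).get("login") != BOT_LOGIN:
        return False
    before = payload.get("changes", {}).get("body", {}).get("from", "")
    after = comment.get("body", "")
    return UNCHECKED in before and CHECKED in after


def edited(author: str, before: str, after: str) -> dict:
    """Build a minimal payload for the examples below."""
    return {
        "action": "edited",
        "comment": {"user": {"login": author}, "body": after},
        "changes": {"body": {"from": before}},
    }


print(should_trigger(edited(BOT_LOGIN, UNCHECKED, CHECKED)))
print(should_trigger(edited(BOT_LOGIN, CHECKED, UNCHECKED)))
print(should_trigger(edited("someone-else", UNCHECKED, CHECKED)))
```

Only the first payload qualifies: the second is an untick, and the third was not authored by the bot.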

Bot Commands

Talk to Argus directly from any PR. Mention @argus-eye followed by a command and it responds in seconds.

@argus-eye review

Trigger a full review. Add --force to re-review at the same SHA. Add --persona to switch style for this PR only.

@argus-eye review --force --persona mentor

@argus-eye remember <pattern>

Teach Argus something new. Saves a pattern to memory for future reviews. Add --org to apply across all repos.

@argus-eye remember --org always check for SQL injection in raw queries

@argus-eye resolve

Scans all unresolved review threads and resolves ones where the referenced file has been updated in the latest push.

@argus-eye resolve

@argus-eye fix

Applies every suggestion block from the review as a single atomic commit pushed straight to your PR branch.

@argus-eye fix

@argus-eye test

Generate a test plan from review findings. Covers unit, edge case, integration, and regression tests.

@argus-eye test

@argus-eye test --code

Draft executable test code for findings, matching your project's framework and conventions.

@argus-eye test --code

@argus-eye review --persona <name>

Review with a specific persona for this PR only. Overrides the repo default.

@argus-eye review --persona strict

@argus-eye help

Lists all available commands and their usage right in the PR.

@argus-eye help
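For readers wiring up their own tooling around these commands, the sketch below parses a comment into a command, flags, and free text. It covers only the syntax shown above and is not Argus's parser; `parse_command` and its return shape are hypothetical.

```python
MENTION = "@argus-eye"
FLAGS_WITH_VALUE = {"--persona"}


def parse_command(comment: str):
    """Split '@argus-eye <command> [--flags] [text]' into its parts."""
    tokens = comment.split()
    if len(tokens) < 2 or tokens[0] != MENTION:
        return None
    command, rest = tokens[1], tokens[2:]
    flags, words = {}, []
    i = 0
    while i < len(rest):
        tok = rest[i]
        if tok in FLAGS_WITH_VALUE and i + 1 < len(rest):
            flags[tok[2:]] = rest[i + 1]  # flag followed by its value
            i += 2
        elif tok.startswith("--"):
            flags[tok[2:]] = True  # boolean flag
            i += 1
        else:
            words.append(tok)
            i += 1
    return {"command": command, "flags": flags, "text": " ".join(words)}


print(parse_command("@argus-eye review --force --persona mentor"))
print(parse_command("@argus-eye remember --org always check for SQL injection in raw queries"))
```

Plain whitespace splitting is used instead of shell-style tokenizing so that free-text patterns containing apostrophes or quotes do not cause parse errors.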

Test Generation

Turn review findings into tests before you merge.

Argus analyzes its own findings and generates targeted test plans or executable test code. No more “I'll add a test later.”

Test plan

@argus-eye test generates a structured test plan covering unit tests, edge cases, integration tests, and regression tests — all derived from the review findings on the current PR.

Executable test code

@argus-eye test --code drafts ready-to-run test code that matches your project's testing framework and conventions. Copy, paste, run.

Test generation uses the same review context and memory that powers the review pipeline. The richer the review, the better the tests.

Memory & Learning

Most tools forget between PRs. Argus remembers everything.

Every review, every developer reaction, every fix and dismissal feeds a growing knowledge base. The system doesn't just review code — it accumulates institutional memory that survives team turnover.

Patterns

Code conventions auto-learned from your codebase. Error handling styles, naming patterns, architecture decisions — extracted from what your team actually writes, not what a style guide says.

Scenarios

Three sources: auto-extracted from reviews, auto-imported from GitHub Issues labeled argus or bug, and manual via bot command. Each scenario includes steps, initial state, and expected outcome. Scenarios are marked outdated when referenced files change. React 👎 to dismiss.

Decision traces

Every review comment, every developer reply, every approval and dismissal. This is review history as institutional memory. Why was this pattern introduced? Who approved it? What broke last time?

Context graph

The "event clock" of your codebase. A living record of why things are the way they are — connecting reviews, patterns, scenarios, and code changes into a navigable knowledge graph.

The flywheel

Every review makes the system smarter. Patterns that get approved are reinforced. Patterns that get dismissed are suppressed. Scenarios that match real bugs get higher confidence. Over time, Argus converges on your team's actual standards — not generic rules, but the hard-won knowledge that usually lives only in senior engineers' heads.

Insights & Risk

Your codebase has a health score now.

The Insights dashboard aggregates everything Argus learns into an operational view of your codebase. Not vanity metrics — actionable risk signals drawn from real review data.

Hot files

Files most frequently flagged across reviews. These are the parts of your codebase that keep breaking — the modules that need a rewrite or better test coverage.

Risk scores

Per-file and per-module risk scores based on severity history, change frequency, and unresolved findings. Higher risk = higher attention from Argus.

Decision trace timeline

A chronological view of every review, reaction, and pattern learned. See how your codebase quality trends over time — and which decisions shaped it.

Quality trends

Track quality scores across PRs, repos, and teams. Spot regressions before they compound. Know when a refactor is paying off.

Token & Cost Tracking

Know exactly what every review costs.

Argus records per-stage token usage and cost for every review. Model and provider are tracked independently for each stage. Token data persists even on failed reviews.

Per-stage breakdown

Token usage tracked for: triage, review, scoring, synthesis, enrichment, conventions, patterns, file_synthesis, and graph. Each stage records input tokens, output tokens, model, and cost.
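A per-stage breakdown of this kind amounts to grouping usage records by stage. The sketch below assumes one record per model call with the fields named above; the record layout, model names, and numbers are made up for illustration.

```python
from collections import defaultdict


def summarize(records: list) -> dict:
    """Total input tokens, output tokens, and cost per stage."""
    by_stage = defaultdict(lambda: {"input_tokens": 0, "output_tokens": 0, "cost_usd": 0.0})
    for r in records:
        stage = by_stage[r["stage"]]
        stage["input_tokens"] += r["input_tokens"]
        stage["output_tokens"] += r["output_tokens"]
        stage["cost_usd"] += r["cost_usd"]
    return dict(by_stage)


# Illustrative records only; not real usage or pricing data.
records = [
    {"stage": "triage", "model": "model-a", "input_tokens": 1200, "output_tokens": 100, "cost_usd": 0.25},
    {"stage": "review", "model": "model-b", "input_tokens": 8000, "output_tokens": 900, "cost_usd": 1.5},
    {"stage": "review", "model": "model-b", "input_tokens": 4000, "output_tokens": 300, "cost_usd": 0.75},
]

summary = summarize(records)
total_cost = sum(s["cost_usd"] for s in summary.values())
print(summary["review"], total_cost)
```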

TokenPill

Hover any TokenPill in the review detail page to see the full cost breakdown per stage, including model name and provider.

Light Mode

Dark isn't the only option anymore.

Toggle between dark and light themes using the Sun/Moon icon in the sidebar footer. Your preference persists via localStorage and is applied instantly without a page reload.

Toggle

Click the Sun/Moon icon in the sidebar footer. Dark → Light → Dark. No page refresh required.

System preference

On first visit, Argus respects your OS prefers-color-scheme setting. After manual toggle, your choice takes precedence.

Full coverage

All dashboard pages, the architecture graph, code diffs, and marketing pages support both themes. Graph tokens use a warm cream palette in light mode.

Feature Flags

Toggle capabilities per-org from the dashboard.

Feature flags let you enable or disable advanced capabilities without code changes. All flags are scoped per-org and take effect on the next review.

Cross-PR Checks · off

Detect linked PRs across repos and run compatibility verification. Adds one extra LLM call per linked PR.

Issue Acceptance · on by default

Verify that PR diffs address linked issue acceptance criteria. Works with GitHub’s native issue-linking keywords.

Deep Review · off

4-specialist parallel review per file. Higher coverage, higher token cost.

PR Enrichment · on by default

Append Mermaid diagrams and missing context to PR descriptions after review.

Pattern Learning · on by default

Auto-extract reusable code patterns from high-confidence findings.

Convention Learning · on by default

Extract naming, error handling, and architecture conventions from diffs.

Architecture Graph · on by default

Build and maintain a persistent dependency graph from code changes.

Manage flags in Settings → Features. Changes apply to the next review triggered on any repo in the org.

Settings & Controls

Every advanced capability can be toggled independently per-repo. Start with the defaults and enable features as your team is ready.

Auto-review · off

Review every PR automatically. When off, Argus posts a Trigger checkbox on opened PRs with a token/cost preview — reviewers tick to run on demand.

Deep Review · off

Enables the 4-specialist parallel review (bug_hunter, security, architecture, regression) per file.

Cross-File Context · on by default

Enables dependency tracing and caller analysis across your codebase during review.

Blast Radius Analysis · on by default

Maps downstream impact of every change using the persistent dependency graph.

Simulation & Scenarios · off

Simulates execution paths against known scenarios. Reports confidence, root cause, and impact.

PR Enrichment · on by default

Auto-enriches PR descriptions with missing context and Mermaid diagrams.

Pattern Learning · on by default

Learns reusable patterns from high-confidence findings across reviews.

Convention Learning · on by default

Extracts codebase conventions from diffs — naming, error handling, architecture patterns.

File Synthesis · on by default

Creates per-file institutional memory — summaries of what each file does and how it has changed.

Architecture Graph · on by default

Extracts dependency graph from code changes. Powers blast radius analysis and cross-file context.

All toggles are accessible from Settings in the dashboard. Changes take effect on the next review.