Documentation
Last updated
Find the bugs your team missed. Setup, pipeline, memory, commands — everything you need to get started.
Three minutes from zero to your first automated review.
One click at github.com/apps/argus-eye. Works with orgs and personal accounts. Your repos appear in the dashboard immediately.
Choose which repos Argus watches. Enable all or pick specific ones. You can change this any time.
Bring your own key — OpenAI, Anthropic, or any OpenRouter provider. Your key, your costs, your data stays yours.
Every PR triggers Argus automatically. Inline comments appear with one-click suggestion fixes you can commit straight from GitHub.
Choose a review persona, add custom rules, or let Argus learn your team's patterns over time. It gets sharper with every review.
Every PR runs through a multi-stage pipeline. Each stage can use a different model, configurable per-repo. The sequence typically completes in a couple of minutes.
Classifies each changed file by risk before any tokens are spent. Generated files, lockfiles, and vendored dependencies are skipped.
Gathers cross-file context, blast radius, and relevant memory — so the review understands how your change affects the rest of the system.
Performs focused analysis across multiple angles in parallel — correctness, security, architecture, and regression risk.
Deduplicates and validates findings. Noise and low-signal comments are dropped so only high-confidence issues survive.
Produces a scannable verdict with fix ordering, severity tiers, and diagrams — actionable, not a wall of text.
Posts inline comments to GitHub and updates memory from your feedback — so future reviews get sharper.
Four specialist agents review every file in parallel.
Instead of one pass, Argus deploys four domain specialists per file. Each brings a different lens — and they run concurrently, so it doesn't slow you down.
Logic errors, off-by-ones, nil dereferences, broken invariants, incorrect boolean chains. The specialist that catches what compiles but doesn’t work.
Injection, auth bypass, SSRF, path traversal, leaked credentials, insecure deserialization. Reviews with a pen-tester’s eye.
Dependency direction, API contracts, separation of concerns, blast radius. Flags when a change violates the system’s structural intent.
Uses scenario memory and past review history to detect changes that re-introduce previously fixed bugs or break known invariants.
Deep Review triggers automatically for complex PRs. Enable it globally in Settings → Features. Findings from all four specialists are deduplicated before scoring.
Push again? Argus only reviews what changed.
When you push new commits to an already-reviewed PR, Argus computes the diff since the last review and only analyzes the delta. Previous findings that are still relevant are preserved. Resolved findings are dropped.
Compares HEAD against the last-reviewed SHA. Only new or modified hunks enter the pipeline. Unchanged files are skipped entirely.
Findings from the previous review are carried forward if the relevant code is unchanged. Updated code triggers a fresh review of that file only.
Incremental reviews typically use 30–70% fewer tokens than a full re-review, depending on how much changed between pushes.
Force a full re-review with @argus-eye review --force.
Most review tools see the diff. Argus sees the system.
Before reviewing a single line of code, Argus builds a living model of your codebase that evolves with every review. This is what separates a linter from an engineer.
Argus traces callers, imports, tests, and shared types. When you change a function, Argus already knows who calls it — and what breaks if the contract shifts.
A persistent dependency graph maps every function and class. On each PR, Argus surfaces what downstream code is affected. No more "I didn't realize that module depended on this."
Past bugs, incidents, and edge cases are remembered across team turnover. "The last time this module changed, EU billing broke." Argus remembers so your team doesn't have to.
Every review, every developer reply, every fix builds a living knowledge graph. Patterns that were dismissed stop recurring. Patterns that were confirmed get reinforced.
Argus maintains a world model of your codebase. The more it reviews, the more it understands. Context is not a feature — it is the architecture.
See your codebase as a dependency graph.
The Architecture page renders an interactive dependency graph built from every review. Nodes are files, edges are import/call/type relationships. Four analytical lenses let you see different dimensions of the system.
Colors nodes by cumulative risk score. High-severity findings, frequent changes, and unresolved comments push a file’s risk up.
Highlights files with high fan-in — the modules everything depends on. Breaking these breaks everything.
Surfaces files with the most review activity. Frequent changes + frequent findings = code that needs attention.
Shows tightly coupled file clusters. Files that always change together likely share hidden dependencies.
Fuzzy search across all nodes. Select a result to center and highlight it in the graph.
On first load, the graph auto-zooms to the highest-risk cluster so you see what matters immediately.
Hover any node for a metrics tooltip: risk score, review count, finding breakdown, and last review date.
The graph builds incrementally with each review. An onboarding guide walks new users through the interface on first visit.
Before you merge, Argus imagines what happens.
Given a PR and known scenarios from your codebase history, Argus simulates execution paths and reports what it finds. Confidence scores tell you how certain the system is.
Root cause: No idempotency key on the cancellation path. Two concurrent requests reach the payment provider — first succeeds, second throws. DB update runs for both.
Impact: Double refund issued. Revenue loss proportional to cancellation volume.
Fix: Add mutex or idempotency key. Wrap call + DB write in a transaction.
Root cause: Deleted user IDs are recycled. Infinite TTL cache serves stale data from the previous account holder.
Impact: Data leakage between accounts. Severity scales with user churn.
Result: Idempotency key already present on this path. Retry is safe. No state corruption detected.
Simulation is powered by scenario memory — the richer your review history, the more scenarios Argus can test against. Currently in experimental rollout.
Argus writes the context your PR description forgot.
After reviewing, Argus appends auto-generated Mermaid diagrams and missing context directly to the PR description. Reviewers see the system impact before reading a single line of diff.
Generated from call paths affected by the PR. Shows the request flow through services, middleware, and handlers.
Maps how data transforms as it moves through the changed code. Input → validation → processing → output, with types annotated.
Shows which modules the PR touches and their upstream/downstream relationships. Highlights the blast radius visually.
Diagrams render natively on GitHub. Toggle in Settings → Features → PR Enrichment.
Argus doesn't post a list of findings. It writes you a review the way a senior engineer would — conversational, opinionated, and to the point.
Every review has three layers: the summary, the inline comments, and the feedback loop.
Verdict: Adds 20 utility modules but has critical security and correctness issues that must be fixed before merging.
Critical issues:
src/lib/convert/units.ts:L15 — Hour multiplier is 360,000ms instead of 3,600,000mssrc/lib/filter/predicate.ts:L42 — User input passed directly to RegExp without escapingWarnings:
src/lib/color/grade.ts:L10 — No NaN check before clampingsrc/lib/counter/rolling.ts:L28 — Unbounded bucket array (+4 more)Every inline comment follows a structured format: what the issue is, why it matters, and a one-click suggestion fix when applicable.
What: Two concurrent cancellation requests can both pass the status === "active" check. First succeeds at the payment provider, second throws — but the DB update runs for both.
Why: No lock or idempotency key on this path. The check-then-act window is ~200ms under load. This will cause double refunds in production.
Every Argus comment has approval reactions. Your feedback directly shapes future reviews.
Reinforces the pattern. Argus will catch similar issues with higher confidence in future reviews.
Suppresses the pattern. Argus stores a “dismissed” signal and avoids similar false positives going forward.
Watch reviews happen in real time.
When a review is in progress, the review detail page streams live activity via WebSocket. You see exactly what Argus is doing as it happens.
WebSocket-powered real-time updates. See which file is being reviewed, which specialist is assigned, and comments as they arrive.
Watch findings get scored in real time. Low-confidence findings drop out as scoring completes.
Live token usage and cost counter updates as each pipeline stage completes.
Running timer shows total review duration. Auto-scrolls when you're at the bottom, stops auto-scroll when you scroll up to read.
The timeline is collapsible for long reviews. All activity persists in the review detail page after completion.
Every finding is tagged with one of four severity levels. These drive the quality score and determine what gets posted.
Bugs, security vulnerabilities, data loss risks, or logic errors that will cause failures in production.
Performance issues, error handling gaps, race conditions, or code that works but is fragile.
Readability improvements, style consistency, better naming, or minor refactors.
Well-written code, good patterns, clever solutions, or thorough test coverage worth highlighting.
Every finding is also tagged with a category — the type of issue detected.
Injection vulnerabilities, leaked credentials, unsafe deserialization, SSRF, path traversal.
Off-by-one errors, nil dereferences, broken invariants, incorrect boolean logic, missing edge cases.
N+1 queries, unnecessary allocations, missing caching, O(n²) where O(n) is possible.
Swallowed errors, empty catch blocks, missing error propagation, silent fallbacks.
Unclear naming, complex nesting, missing comments on non-obvious logic, dead code.
Formatting inconsistencies, convention violations, import ordering, naming patterns.
Weak type invariants, stringly-typed APIs, missing generics, poor encapsulation.
Missing edge case tests, brittle assertions, untested error paths, test-only code in production.
Tell Argus what matters to your team. Rules are injected into every review, so every comment reflects your standards — not generic best practices.
Create rules in the dashboard under Rules. Each rule has a category, content, priority, and enabled flag. These apply to all repos in your org.
Add a .argus/rules.md file to your repo. Repo rules override org rules in the same category.
## security
- Always flag hardcoded API keys or secrets
- Check for SQL injection in raw query strings
## performance
- Flag N+1 queries in ORM code
- Warn about unbounded list fetches without pagination
## style
- Enforce camelCase for variables, PascalCase for types
- Require JSDoc on exported functionsAll 4 pipeline stages are independently configurable per-repo from the Settings page. Default model depends on your OpenRouter key. Temperature and MaxTokens are adjustable per stage via sliders.
Supported providers: OpenRouter, OpenAI, Anthropic, Azure OpenAI, GCP Vertex AI, AWS Bedrock, and Zhipu AI. Custom model names are supported — enter any model identifier your provider accepts.
Your keys, your models, your bill. Argus never stores prompts or code on our servers — API calls go straight from our backend to your chosen provider. No hidden costs, no surprises.
sk-...**** only. Full key never sent to the frontend.We never see your code. We never see your keys. Without a key configured, Argus posts a friendly onboarding comment on your first PR linking to Settings.
Bring Your Own Token for Supermemory.
Argus uses Supermemory for RAG-powered memory — storing review patterns, codebase conventions, and scenario history. You can bring your own Supermemory API key for full control over your data.
Without a custom key, Argus uses its shared Supermemory instance. Your data is isolated per-installation regardless.
Not every PR needs the same reviewer. Personas tune the tone, focus, and severity threshold — from a gentle mentor to a zero-mercy auditor. Set a default per-repo or override per-PR.
Balanced across all categories. The standard Argus experience most teams start with.
Treats every PR like a pen test. Injection risks, auth flaws, data exposure, SSRF.
Hunts N+1 queries, memory leaks, O(n²) loops, and missing cache invalidation.
Explains the why behind every comment. Suggests learning resources. Built for growing teams.
Thinks in boundaries. API contracts, separation of concerns, dependency direction.
No free passes. Comments on everything. Maximum coverage, minimum mercy.
Define your own persona with a freeform system prompt. Full control over tone, focus, and severity.
Override per-PR with @argus-eye review --persona strict
@argus-eye review --persona security_auditorArgus supports two trigger modes. Pick per-org, override per-repo.
Auto-review off (default). When a PR opens, Argus posts a Trigger Argus review checkbox comment with an estimated token + cost preview. Reviewers tick the box to run a review on demand. Pushes to an open PR do not post additional comments.
Auto-review on. Every PR opened, pushed, or reopened is reviewed automatically — no checkbox, no preview.
Repo override beats org default. If the repo setting is unset, the org default applies. If both are unset, auto-run is off.
Every review draws from a 10/hour per-repo bucket and a 50/day per-org bucket. Checkbox clicks and --force additionally draw from a tighter 3/hour per-repo force bucket — effectively capping on-demand triggers at 3/hour.
The trigger comment shows changed-file count, diff lines, and a historical average of tokens + USD cost across your last 20 reviews for this repo. USD is omitted when pricing data is unavailable (token-only fallback).
You can always trigger a review by commenting @argus-eye review, regardless of the auto-run setting. Useful if the checkbox comment is missing (webhook redelivery, PR opened before Argus install).
- [ ] Trigger Argus review.issue_comment.edited webhook.argus-eye[bot] (anti-hijack), rate-limits the click, swaps the checkbox for Running Argus review…, and dispatches the review.Dashboard → Settings:
opened. Pushes to an open PR (synchronize) do not repost — use the existing checkbox or @argus-eye review.[ ]→[x] transition triggers a review. Unticking ([x]→[ ]) does nothing, and a running review cannot be cancelled from the checkbox.Talk to Argus directly from any PR. Mention @argus-eye followed by a command and it responds in seconds.
@argus-eye reviewTrigger a full review. Add --force to re-review at the same SHA. Add --persona to switch style for this PR only.
@argus-eye review --force --persona mentor@argus-eye remember <pattern>Teach Argus something new. Saves a pattern to memory for future reviews. Add --org to apply across all repos.
@argus-eye remember --org always check for SQL injection in raw queries@argus-eye resolveScans all unresolved review threads and resolves ones where the referenced file has been updated in the latest push.
@argus-eye resolve@argus-eye fixApplies every suggestion block from the review as a single atomic commit pushed straight to your PR branch.
@argus-eye fix@argus-eye testGenerate a test plan from review findings. Covers unit, edge case, integration, and regression tests.
@argus-eye test@argus-eye test --codeDraft executable test code for findings, matching your project's framework and conventions.
@argus-eye test --code@argus-eye review --persona <name>Review with a specific persona for this PR only. Overrides the repo default.
@argus-eye review --persona strict@argus-eye helpLists all available commands and their usage right in the PR.
@argus-eye helpTurn review findings into tests before you merge.
Argus analyzes its own findings and generates targeted test plans or executable test code. No more “I'll add a test later.”
@argus-eye test generates a structured test plan covering unit tests, edge cases, integration tests, and regression tests — all derived from the review findings on the current PR.
@argus-eye test --code drafts ready-to-run test code that matches your project's testing framework and conventions. Copy, paste, run.
Test generation uses the same review context and memory that powers the review pipeline. The richer the review, the better the tests.
Most tools forget between PRs. Argus remembers everything.
Every review, every developer reaction, every fix and dismissal feeds a growing knowledge base. The system doesn't just review code — it accumulates institutional memory that survives team turnover.
Code conventions auto-learned from your codebase. Error handling styles, naming patterns, architecture decisions — extracted from what your team actually writes, not what a style guide says.
Three sources: auto-extracted from reviews, auto-imported from GitHub Issues labeled argus or bug, and manual via bot command. Each scenario includes steps, initial state, and expected outcome. Scenarios are marked outdated when referenced files change. React 👎 to dismiss.
Every review comment, every developer reply, every approval and dismissal. This is review history as institutional memory. Why was this pattern introduced? Who approved it? What broke last time?
The "event clock" of your codebase. A living record of why things are the way they are — connecting reviews, patterns, scenarios, and code changes into a navigable knowledge graph.
Every review makes the system smarter. Patterns that get approved are reinforced. Patterns that get dismissed are suppressed. Scenarios that match real bugs get higher confidence. Over time, Argus converges on your team's actual standards — not generic rules, but the hard-won knowledge that usually lives only in senior engineers' heads.
Your codebase has a health score now.
The Insights dashboard aggregates everything Argus learns into an operational view of your codebase. Not vanity metrics — actionable risk signals drawn from real review data.
Files most frequently flagged across reviews. These are the parts of your codebase that keep breaking — the modules that need a rewrite or better test coverage.
Per-file and per-module risk scores based on severity history, change frequency, and unresolved findings. Higher risk = higher attention from Argus.
A chronological view of every review, reaction, and pattern learned. See how your codebase quality trends over time — and which decisions shaped it.
Track quality scores across PRs, repos, and teams. Spot regressions before they compound. Know when a refactor is paying off.
Know exactly what every review costs.
Argus records per-stage token usage and cost for every review. Model and provider are tracked independently for each stage. Token data persists even on failed reviews.
Token usage tracked for: triage, review, scoring, synthesis, enrichment, conventions, patterns, file_synthesis, and graph. Each stage records input tokens, output tokens, model, and cost.
Hover any TokenPill in the review detail page to see the full cost breakdown per stage, including model name and provider.
Dark isn't the only option anymore.
Toggle between dark and light themes using the Sun/Moon icon in the sidebar footer. Your preference persists via localStorage and is applied instantly without a page reload.
Click the Sun/Moon icon in the sidebar footer. Dark → Light → Dark. No page refresh required.
On first visit, Argus respects your OS prefers-color-scheme setting. After manual toggle, your choice takes precedence.
All dashboard pages, the architecture graph, code diffs, and marketing pages support both themes. Graph tokens use a warm cream palette in light mode.
Toggle capabilities per-org from the dashboard.
Feature flags let you enable or disable advanced capabilities without code changes. All flags are scoped per-org and take effect on the next review.
Detect linked PRs across repos and run compatibility verification. Adds one extra LLM call per linked PR.
Verify that PR diffs address linked issue acceptance criteria. Works with GitHub’s native issue-linking keywords.
4-specialist parallel review per file. Higher coverage, higher token cost.
Append Mermaid diagrams and missing context to PR descriptions after review.
Auto-extract reusable code patterns from high-confidence findings.
Extract naming, error handling, and architecture conventions from diffs.
Build and maintain a persistent dependency graph from code changes.
Manage flags in Settings → Features. Changes apply to the next review triggered on any repo in the org.
Every advanced capability can be toggled independently per-repo. Start with the defaults and enable features as your team is ready.
Review every PR automatically. When off, Argus posts a Trigger checkbox on opened PRs with a token/cost preview — reviewers tick to run on demand.
Enables the 4-specialist parallel review (bug_hunter, security, architecture, regression) per file.
Enables dependency tracing and caller analysis across your codebase during review.
Maps downstream impact of every change using the persistent dependency graph.
Simulates execution paths against known scenarios. Reports confidence, root cause, and impact.
Auto-enriches PR descriptions with missing context and mermaid diagrams.
Learns reusable patterns from high-confidence findings across reviews.
Extracts codebase conventions from diffs — naming, error handling, architecture patterns.
Creates per-file institutional memory — summaries of what each file does and how it has changed.
Extracts dependency graph from code changes. Powers blast radius analysis and cross-file context.
All toggles are accessible from Settings in the dashboard. Changes take effect on the next review.