Your AGENTS.md File is Lying to You

Your AGENTS.md file is making your coding agents dumber.

There, I said it. That repository context file you spent hours crafting? The one with detailed architecture diagrams, contribution guidelines, and deployment procedures? It’s actively hurting your agent’s performance.

And you’re paying 20% more in tokens for the privilege.

The Paper Nobody Wants to Talk About

On February 12th, 2026, a paper dropped on arXiv that should have sent shockwaves through the AI engineering community. Instead, it’s been quietly debated in Hacker News threads while vendors continue pushing the exact opposite advice.

The paper: “Evaluating AGENTS.md: Are Repository-Level Context Files Helpful for Coding Agents?”

The findings are brutal:

  • LLM-generated AGENTS.md files DECREASE performance by 3%
  • Human-written AGENTS.md files only improve performance by 4%
  • Token costs increase by 20%+ across all tasks
  • Context pollution from unnecessary requirements makes tasks harder

Let that sink in. The industry-standard practice of adding AGENTS.md, CLAUDE.md, or .cursor/rules files to your repository is, in most cases, making your agents worse at their jobs.

The Context Pollution Problem

Here’s what’s actually happening when you feed your agent a 2,000-line AGENTS.md file:

```
Task: Fix the authentication bug in login.ts

Agent receives:
- Your actual codebase: ~50 files, 8,000 lines
- AGENTS.md: 2,000 lines of "context"
  - 400 lines: Architecture overview (irrelevant to auth bug)
  - 300 lines: Deployment procedures (irrelevant)
  - 500 lines: Coding standards (mostly irrelevant)
  - 200 lines: Team contact info (completely irrelevant)
  - 600 lines: Historical decisions from 2024 (outdated)

Total context: 10,000 lines
Relevant context: ~50 lines
Signal-to-noise ratio: 0.5%
```

The paper authors put it diplomatically: “unnecessary requirements from context files make tasks harder.”

Translation: Your agent is drowning in noise while trying to find the signal.
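The arithmetic is trivial to check yourself. A minimal sketch, using the illustrative line counts from the example above (not measurements):

```python
# Illustrative figures from the example above, not measurements.
codebase_lines = 8_000
agents_md_lines = 2_000
relevant_lines = 50

total_context = codebase_lines + agents_md_lines
signal_to_noise = relevant_lines / total_context

print(f"Total context: {total_context} lines")
print(f"Signal-to-noise: {signal_to_noise:.1%}")
```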

The 4% Lie

Defenders of AGENTS.md files will point to the 4% improvement from human-written files and say “See? It works!”

This is the most dangerous statistic in the paper.

Yes, human-written context files provide a 4% boost. But here’s what they don’t tell you:

  1. That 4% comes at a 20% token cost increase - you’re paying a 5x premium for marginal gains
  2. The 4% average hides massive variance - some tasks improve 15%, others degrade 10%
  3. Most AGENTS.md files aren’t human-written - they’re LLM-generated, which decreases performance by 3%

A senior engineer on Hacker News put it perfectly:

“The 4% gain is ‘yuuuge’ in hard projects, but only if your AGENTS.md is actually good. Most aren’t. Most are outdated documentation dumps that confuse the agent more than they help.”

When AGENTS.md Files Actually Work

The paper reveals something crucial: AGENTS.md files only help when they describe minimal requirements.

Good AGENTS.md (150 lines):

```
# Project Constraints
- Node 22.x only
- PostgreSQL 16, never upgrade without migration review
- Auth: JWT with RS256, refresh tokens rotate every 7 days
- All API responses must include X-Request-ID
```

Bad AGENTS.md (2,000 lines):

```
# Welcome to Our Amazing Project!
## History
In 2023, we started this project because...
## Architecture Overview
Our microservices communicate via...
[continues for 40 more sections]
```

The difference? The first file tells the agent what constraints matter. The second file tells the agent everything except what matters.

The Token Economics

Let’s talk money, because this is where AGENTS.md files become indefensible.

Assume you’re using Claude Sonnet 4.6 (released Feb 17, 2026):

  • Input tokens: $0.15 per million
  • Your AGENTS.md: 2,000 lines ≈ 3,000 tokens
  • Daily agent tasks: 50
  • Daily wasted tokens: 150,000
  • Monthly wasted tokens: 4.5 million
  • Monthly cost for AGENTS.md alone: $0.68

That doesn’t sound like much until you scale:

  • Team of 10 engineers: $6.80/month
  • Company with 100 developers: $68/month
  • Enterprise with 1,000 developers: $680/month
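You can reproduce these numbers in a few lines of Python. The price, token estimate, and task counts are the assumptions stated above; the per-developer figure differs from $0.68 only by rounding:

```python
# Assumptions from the scenario above: $0.15 per million input tokens,
# ~3,000 tokens of AGENTS.md prepended to each of 50 daily tasks.
PRICE_PER_MILLION = 0.15
AGENTS_MD_TOKENS = 3_000
TASKS_PER_DAY = 50
DAYS_PER_MONTH = 30

def monthly_context_cost(developers: int) -> float:
    """Dollars per month spent re-sending AGENTS.md, by team size."""
    tokens = AGENTS_MD_TOKENS * TASKS_PER_DAY * DAYS_PER_MONTH * developers
    return tokens / 1_000_000 * PRICE_PER_MILLION

for team in (1, 10, 100, 1000):
    print(f"{team:>5} developers: ${monthly_context_cost(team):,.2f}/month")
```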

For what? A 3% performance decrease?

And that’s just the direct cost. The indirect cost—agents taking longer to complete tasks, making more mistakes, requiring human intervention—is 10x higher.

The Real Problem: We’re Treating Agents Like Humans

Here’s the fundamental mistake: we’re writing AGENTS.md files as if agents are new team members who need onboarding.

Agents aren’t humans.

A human needs context about:

  • Team culture
  • Historical decisions
  • Deployment procedures
  • Who to contact for what

An agent needs:

  • The specific files to modify
  • The constraints that affect the task
  • The test suite to run

Everything else is noise.

The paper authors note: “human-written context files should describe only minimal requirements.”

Not “minimal requirements plus everything else we think might be useful.” Just the requirements.

What to Do Instead

1. Delete Your AGENTS.md File

Start here. Just delete it. Watch what happens.

Your agents will:

  • Complete tasks faster (less context to process)
  • Make fewer mistakes (less conflicting information)
  • Cost less (fewer tokens)

2. Use Task-Specific Context

Instead of a monolithic AGENTS.md, provide context per task:

```bash
# Bad: Agent reads entire AGENTS.md
claude-code "Fix the auth bug"

# Good: Agent gets only relevant constraints
claude-code "Fix the auth bug. Constraints: JWT RS256, refresh tokens rotate every 7 days, see /lib/auth/token.ts"
```

3. Implement Context Retrieval

Build a simple retrieval system:

```python
# context_retriever.py
def get_task_context(task: str, repo_path: str) -> str:
    """Extract only relevant constraints for this task."""

    # Index your constraints by keyword
    constraint_index = {
        'auth': 'JWT RS256, refresh tokens rotate every 7 days',
        'database': 'PostgreSQL 16, migrations required',
        'api': 'All responses include X-Request-ID',
    }

    # Return only matching constraints
    relevant = []
    for keyword, constraint in constraint_index.items():
        if keyword in task.lower():
            relevant.append(constraint)

    return '\n'.join(relevant) if relevant else 'No specific constraints'
```

Now your agent gets 50 tokens of relevant context instead of 3,000 tokens of noise.
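A quick sanity check of the retriever (the function is repeated here, slightly condensed, so the snippet runs standalone):

```python
def get_task_context(task: str, repo_path: str) -> str:
    """Extract only relevant constraints for this task."""
    constraint_index = {
        'auth': 'JWT RS256, refresh tokens rotate every 7 days',
        'database': 'PostgreSQL 16, migrations required',
        'api': 'All responses include X-Request-ID',
    }
    # Keep only constraints whose keyword appears in the task description
    relevant = [c for k, c in constraint_index.items() if k in task.lower()]
    return '\n'.join(relevant) if relevant else 'No specific constraints'

print(get_task_context("Fix the auth bug", "."))    # auth constraint only
print(get_task_context("Update the README", "."))   # No specific constraints
```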

4. Monitor Agent Performance

Track these metrics:

  • Task completion rate (before/after removing AGENTS.md)
  • Token consumption per task
  • Human intervention frequency
  • Time to first correct solution
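A minimal way to start collecting those numbers, assuming you log one record per agent task (the field names and sample values here are illustrative, not from the paper):

```python
import statistics
from dataclasses import dataclass

@dataclass
class TaskRecord:
    task: str
    completed: bool            # finished without a human stepping in?
    tokens_used: int
    human_interventions: int
    seconds_to_solution: float

def summarize(records: list[TaskRecord]) -> dict:
    """Aggregate the four metrics above for an A/B comparison."""
    return {
        "completion_rate": sum(r.completed for r in records) / len(records),
        "mean_tokens": statistics.mean(r.tokens_used for r in records),
        "mean_interventions": statistics.mean(r.human_interventions for r in records),
        "mean_seconds": statistics.mean(r.seconds_to_solution for r in records),
    }

# Log one list per condition (with/without AGENTS.md), then diff the summaries.
with_md = [
    TaskRecord("fix auth bug", True, 9_200, 1, 310.0),
    TaskRecord("add api header", False, 11_500, 2, 560.0),
]
print(summarize(with_md))
```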

If removing AGENTS.md improves these metrics (and the paper suggests it will), you’ve just optimized your entire agent workflow.

The Vendor Incentive Problem

Here’s why nobody’s talking about this paper: vendors have every incentive to keep you using AGENTS.md files.

More context = more tokens = more revenue.

Anthropic, OpenAI, Cursor, GitHub Copilot—they all benefit from you dumping massive context files into every agent session. The 20% token increase isn’t a bug; it’s a feature.

The paper authors acknowledge this indirectly:

“While context files are widely recommended by AI coding assistant vendors, our results suggest that their benefits may be overstated.”

Translation: Vendors are lying to you.

The Path Forward

The AGENTS.md debate isn’t about whether context matters. It’s about signal-to-noise ratio.

Good context:

  • Minimal (under 200 tokens)
  • Task-specific
  • Constraint-focused
  • Regularly updated

Bad context:

  • Massive (2,000+ lines)
  • Repository-wide
  • Information-dumped
  • Written once, never updated

The paper gives us a framework for distinguishing between the two. It’s time we started using it.

Your Move

Here’s your challenge:

  1. Audit your AGENTS.md file - How many lines are actually constraints vs. fluff?
  2. Measure your token usage - How much are you spending on context that doesn’t help?
  3. Run an A/B test - Try one week without AGENTS.md, track performance
  4. Share your results - The community needs real-world data, not vendor marketing

The research is clear. The economics are clear. The only thing standing between you and better agent performance is the courage to delete that file.

What Do You Think?

Is your AGENTS.md file helping or hurting? Have you measured the actual impact on your agent’s performance? Drop your findings in the comments—let’s build a data-driven understanding of what actually works.

Because right now, we’re all paying 20% more for 3% worse performance.

That’s not just bad engineering. It’s bad business.

