Bot management

On a production site, the automated traffic arriving in any given hour includes verified search crawlers, LLM retrieval bots, monitoring agents, authorized AI agents, scrapers, and credential-stuffing scripts, often on the same routes. Sorting them out is detection’s job. Deciding what each one gets (allow, throttle, challenge, block) is bot management. Marketing copy sometimes treats the two as synonyms, but the work is genuinely different: detection is a probabilistic classifier, while management is a policy engine that has to make sensible decisions for hundreds of overlapping bot populations, including the ones you actually want.

This is the companion piece to bot detection, which covers the signal side. If you want the response-side detail (which surfaces to challenge, what to log, what to throttle versus what to block), the most practical reference is bot mitigation.

What “bot management” actually means

Bot management is the operational layer above detection. A bot management system:

Receives classified traffic from the detection layer.
Looks up the bot or session against a policy table.
Applies one of a small set of actions (allow, throttle, challenge, block, redirect, log).
Logs the decision in a form an operations team can audit.
Adapts over time as the bot population and the business’s needs evolve.

The policy table is the load-bearing element. Detection is widely available, so the value of a bot management product lies in knowing what to do with that information for your specific business.

The four actions, in detail

Every reputable bot management product supports the same four primary actions. The differences are in how cleanly they handle edge cases and how granular the targeting is.

Allow

Allowed traffic passes through to the origin with no friction. The bot is logged but otherwise treated identically to a human session.

Allow is the right action for:

Verified search-engine crawlers you depend on for discovery.
Verified AI agents authorized by the user to act on their behalf.
Internal monitoring agents (Datadog Synthetics, Pingdom, your own integration tests).
First-party tools whose traffic you specifically want.

The mistake to avoid is allowing by IP allow-list alone, because IPs rotate and verified bot lists go stale. It is better to allow by classification (verified bot, named provider, signed Web Bot Auth) and log every allow decision so anomalies are visible.

Throttle

Throttled traffic is permitted but rate-limited, typically per device or per session rather than per IP. Throttle is the lowest-cost mitigation that still does useful work, and it is underused.

Cases where throttling beats blocking:

An LLM crawler is fetching too aggressively. Cutting it to one request every few seconds is preferable to a 403, because a blocked crawler tends to retry, and the next instance is often harder to identify.
A scraper is grinding through pagination. Throttling makes the operation uneconomic without telling the operator they have been detected.
A user with abnormal traffic patterns might be human (frantic clicker, accessibility tool, slow connection retrying). Throttling preserves the option of being wrong.

Throttling is also the correct response to most ambiguous classifications: when the system is 60% sure something is a bot, slow it down rather than blocking it.
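As an illustration of per-session throttling, here is a minimal token-bucket sketch. The class name, the in-memory dictionary, and the explicit `now` argument are assumptions made for the example; a real deployment would more likely keep this state in a shared store at the edge.

```python
class SessionThrottle:
    """Token bucket keyed by session (or device) ID rather than by IP."""

    def __init__(self, rate_per_min: float, burst: int) -> None:
        self.rate = rate_per_min / 60.0   # tokens added per second
        self.burst = float(burst)         # maximum tokens a session can hold
        self._buckets = {}                # session_id -> (tokens, last_seen)

    def allow(self, session_id: str, now: float) -> bool:
        # A session seen for the first time starts with a full bucket.
        tokens, last_seen = self._buckets.get(session_id, (self.burst, now))
        # Refill in proportion to elapsed time, capped at the burst size.
        tokens = min(self.burst, tokens + (now - last_seen) * self.rate)
        if tokens >= 1.0:
            self._buckets[session_id] = (tokens - 1.0, now)
            return True
        self._buckets[session_id] = (tokens, now)
        return False


throttle = SessionThrottle(rate_per_min=4, burst=2)
first = throttle.allow("session-a", now=0.0)    # within the burst
second = throttle.allow("session-a", now=0.0)   # within the burst
third = throttle.allow("session-a", now=0.0)    # bucket is empty
```

Passing the clock in as `now` keeps the sketch deterministic; in production code `time.monotonic()` would be the usual source.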

Challenge

Challenges interrupt the session to demand proof of humanness or proof of authentication. They come in several forms:

Invisible (proof-of-work, behavioral). A JavaScript challenge the browser solves silently. The user notices a brief delay; bots that lack a working JS engine or fail behavioral checks do not pass.
Interactive (CAPTCHA, click puzzle). Visible to the user. Friction is significant. Modern image CAPTCHAs are also solvable by AI for $1 to $5 per 1,000 (CapSolver, 2Captcha and similar farms advertise commodity solving rates), which limits their value against motivated attackers.
Step-up authentication. For logged-in sessions, demand a second factor: a passkey re-verify, a one-time code, a biometric prompt.

Challenges have an honesty problem. Putting a CAPTCHA in front of credential stuffing costs the attacker a fraction of a cent per attempt at the solving rates above, and costs your real users several seconds each. For high-value transactions where false negatives cost more than false positives, this trade is worth it. For everyday traffic it is not.

The modern alternative to CAPTCHA for the typical bot population is passive behavioral challenge: collect mouse, scroll and timing data for the first few seconds, score it server-side, and only escalate to a visible challenge for sessions that fail the passive check. We cover the implementation of this pattern in bot mitigation and the full menu of replacements in CAPTCHA alternatives.

Block

Hard rejection. The request returns a 403 (or, controversially, a 200 with a fake error page so the operator does not learn they were blocked). Block is the right action when you have high confidence and the bot is clearly hostile.

A good block has two properties. The first is that the decision is attributable. The logs should show why the request was blocked, which signals fired, and which detection classifier made the call. A block that cannot be explained cannot be appealed when a real user gets caught.

The second is that the action is reversible. Bot populations shift, and a block rule that was correct last quarter can be harmful this quarter. A bot management system that makes blocks easy to add and hard to remove will accumulate harm faster than it adds value.
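One way to make blocks both explainable and easy to retire is to store the reason with the rule and give every rule an expiry. The sketch below assumes that design; the `BlockRule` fields and the 90-day lifetime are illustrative choices, not a description of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class BlockRule:
    rule_id: str
    reason: str            # human-readable explanation for the audit log
    signals: tuple         # detection signals that justified the rule
    created_at: datetime
    ttl: timedelta         # assumption: every block expires unless renewed

    def is_active(self, now: datetime) -> bool:
        return now < self.created_at + self.ttl


def active_rules(rules, now):
    """Expired rules drop out without anyone having to remember to delete them."""
    return [rule for rule in rules if rule.is_active(now)]


created = datetime(2026, 1, 1, tzinfo=timezone.utc)
rule = BlockRule(
    rule_id="login-headless-01",
    reason="credential stuffing from a headless framework",
    signals=("headless-artifacts", "login-velocity"),
    created_at=created,
    ttl=timedelta(days=90),
)
still_enforced = active_rules([rule], created + timedelta(days=30))
after_expiry = active_rules([rule], created + timedelta(days=120))
```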

Less-discussed actions

Two others worth knowing about.

Tarpit. Hold the connection open for an unusually long time before responding. Some operators run this against scrapers; it costs the attacker compute and time without telling them they have been detected. Modern HTTP/2 and HTTP/3 make tarpitting less effective than it was, but it still has a place.

Honeypot. Serve fake but plausible content (synthetic SKUs, fake prices, dummy account creation that does not actually create an account). Useful for poisoning a scraping operation without alerting them and for measuring the size of an ongoing scraping campaign.

Designing a policy table

A policy table is a per-route, per-classification decision matrix. The structure that works in practice:

Route                    | Classification              | Action
-------------------------+-----------------------------+-----------
GET /                    | verified-good-bot           | allow
GET /                    | unknown-bot                 | throttle (4 rpm)
GET /                    | known-malicious-bot         | block
GET /                    | human                       | allow
POST /api/login          | verified-good-bot           | block (no bot should log in)
POST /api/login          | human                       | allow
POST /api/login          | unverified-ai-agent         | challenge (step-up)
POST /api/login          | known-headless-framework    | block
POST /api/signup         | any-bot                     | block
POST /api/signup         | human (low device trust)    | challenge (email verify)
GET /api/search          | verified-good-bot           | throttle (60 rpm)
GET /api/search          | scraper-class               | block
GET /api/checkout        | any-bot                     | block
GET /api/checkout        | human (low device trust)    | challenge (step-up)

A useful matrix has fewer rows than this in early deployment. Start with one row per high-value endpoint and expand only when the bot population justifies it. Every new row should answer “what would have happened in the last 30 days under this rule?” with concrete data.
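A table like this translates fairly directly into a first-match lookup. The sketch below uses a handful of the rows above; the `Rule` type, the treatment of `any-bot` as a wildcard, and the default of allowing unmatched traffic are assumptions made for illustration.

```python
from typing import NamedTuple, Optional, Tuple


class Rule(NamedTuple):
    method: str
    path: str
    classification: str   # "any-bot" matches every non-human classification
    action: str


POLICY = [
    Rule("POST", "/api/login", "verified-good-bot", "block"),
    Rule("POST", "/api/login", "human", "allow"),
    Rule("POST", "/api/signup", "any-bot", "block"),
    Rule("GET", "/api/search", "verified-good-bot", "throttle"),
    Rule("GET", "/api/search", "scraper-class", "block"),
]

# Assumption: routes without a rule fall through to allow, which matches
# starting with one row per high-value endpoint.
DEFAULT_ACTION = "allow"


def decide(method: str, path: str, classification: str) -> Tuple[str, Optional[int]]:
    """Return the action and the index of the rule that matched (for the log)."""
    for index, rule in enumerate(POLICY):
        if (rule.method, rule.path) != (method, path):
            continue
        wildcard = rule.classification == "any-bot" and not classification.startswith("human")
        if wildcard or rule.classification == classification:
            return rule.action, index
    return DEFAULT_ACTION, None
```

Returning the matched rule index alongside the action keeps the decision explainable in the log later.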

Budget for false positives from the start, because over-aggressive policies have a real operational cost: blocked customers file support tickets, abandon carts, and rarely come back to tell you why. Run every new challenge or block rule in shadow mode first: log what it would have done for a week, then read the would-have-been-blocked sessions before enforcing. Once a rule is live, wrongly-challenged users need a recovery path that does not require contacting support: a step-up they can pass (email verification, passkey re-verify) rather than a dead-end 403. A policy table without an appeal path turns every false positive into churn.
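Shadow mode can be approximated offline by replaying a candidate rule over logged sessions and counting what it would have caught. In this sketch the session dictionaries and the `completed_purchase` flag (used as a rough "probably human" marker) are hypothetical; substitute whatever ground-truth signal your logs actually carry.

```python
def shadow_report(would_block, sessions):
    """Summarize what a candidate rule would have done, without enforcing it."""
    hits = [s for s in sessions if would_block(s)]
    # Sessions that went on to buy something are likely false positives.
    likely_humans = [s for s in hits if s.get("completed_purchase")]
    return {
        "evaluated": len(sessions),
        "would_block": len(hits),
        "likely_false_positives": len(likely_humans),
        "review_ids": [s["id"] for s in likely_humans],
    }


sessions = [
    {"id": "a", "classification": "known-headless-framework", "completed_purchase": False},
    {"id": "b", "classification": "known-headless-framework", "completed_purchase": True},
    {"id": "c", "classification": "human", "completed_purchase": True},
]
report = shadow_report(
    lambda s: s["classification"] == "known-headless-framework", sessions
)
```

The `review_ids` list is the part to read by hand before the rule goes live.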

Verified bots: who they are, why they matter

The 14% of internet traffic Imperva classifies as “good bots” matters disproportionately (Imperva 2025 report). Blocking them has real cost.

Chart (log scale): crawl-to-refer ratios, Q1 2026. Pages crawled per referral sent back; lower is more reciprocal.

Crawler                  | Pages crawled : referrals
-------------------------+--------------------------
Anthropic ClaudeBot      | 23,951 : 1
OpenAI GPTBot            | 1,276 : 1
Perplexity               | ~200 : 1
Google (search)          | ~15 : 1

Source: Cloudflare Radar, crawl-to-refer ratio analysis

Each bot consumes many pages of content for every visitor it sends back. The ratios span three orders of magnitude, which is why the chart uses a log scale. Anthropic's ratio dropped 74% from January to March 2026 (Q1 2026 average shown); the gap to Google's search crawler is still enormous.
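The "three orders of magnitude" claim can be checked with a line of arithmetic on the figures quoted above. The Perplexity and Google values are approximate in the source, so the result is only indicative.

```python
import math

# Pages crawled per referral, as quoted above (two values are approximate).
ratios = {"ClaudeBot": 23951, "GPTBot": 1276, "Perplexity": 200, "Google": 15}

# Orders of magnitude between the least and most reciprocal crawler.
span = math.log10(max(ratios.values()) / min(ratios.values()))
print(f"{span:.1f} orders of magnitude")  # prints "3.2 orders of magnitude"
```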

The bots a typical business depends on:

Search engine crawlers. Googlebot, Bingbot, DuckDuckBot, Applebot, YandexBot. Blocking these hurts organic traffic.
AI training and live retrieval crawlers. GPTBot, ChatGPT-User, ClaudeBot, Claude-User, Perplexity-User, OAI-SearchBot, Amazonbot, CCBot. These split into training crawls (slow, periodic) and live-retrieval (per-question, latency-sensitive). The trade-off for whether to allow each is a business question.
Link preview and unfurling bots. Twitterbot, Slackbot-LinkExpanding, Discordbot, LinkedInBot, WhatsApp, iMessage. Blocking these breaks share previews.
Monitoring and uptime checks. Pingdom, Datadog, UptimeRobot. Usually first-party.
Accessibility crawlers. Browse.ai, several archive services, Wayback Machine’s IA Archiver.

Verification matters because the User-Agent is freely set. Two mechanisms exist.

Reverse DNS. The traditional approach: Googlebot’s IP reverse-resolves to *.googlebot.com, and that hostname forward-resolves back to the same IP. Each vendor publishes its own verification scheme. The implementation cost is annoying because every vendor is slightly different, and it breaks when vendors rotate IP ranges.
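The reverse-then-forward check can be sketched with the standard library. The function name and the injectable resolver arguments are assumptions made so the logic can be tested offline, and `socket.gethostbyname_ex` only returns IPv4 addresses, so IPv6 would need `socket.getaddrinfo` instead.

```python
import socket


def verify_crawler_ip(ip, allowed_suffixes,
                      reverse=socket.gethostbyaddr,
                      forward=socket.gethostbyname_ex):
    """Return True if ip reverse-resolves into an allowed domain and back again."""
    try:
        hostname = reverse(ip)[0]
    except OSError:                      # covers socket.herror and socket.gaierror
        return False
    hostname = hostname.rstrip(".").lower()
    # Suffixes carry a leading dot so "evilgooglebot.com" cannot match.
    if not hostname.endswith(tuple(allowed_suffixes)):
        return False
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False
    # The forward lookup must return the IP we started with.
    return ip in addresses
```

A call such as `verify_crawler_ip(client_ip, (".googlebot.com",))` performs live DNS lookups, so results are worth caching per IP.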

Web Bot Auth. The newer mechanism. Bots sign their HTTP requests using RFC 9421 HTTP Message Signatures, with a public key published at a discoverable URL. Cloudflare proposed and is operating this pattern; Akamai and HUMAN are implementing it, and OpenAI and Anthropic are early adopters on the bot side (Cloudflare: Forget IPs, using cryptography to verify bot and agent traffic, Akamai: Web Bot Authentication).

The signed request looks like this in headers:

Signature-Input: sig1=("@authority" "@target-uri");created=1716412800;\
  expires=1716413100;keyid="ed25519-bot-key-2026-05";alg="ed25519"
Signature: sig1=:MEUCIQD...:
Signature-Agent: "https://bot.example.com"

The verifier resolves the Signature-Agent URL, fetches the public key, validates the signature over the target URI and creation timestamp, and allows the request to proceed if the signature is valid and the agent is on a per-site allow-list.
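Before any cryptography runs, a verifier can cheaply reject stale or malformed signatures. The sketch below parses the `Signature-Input` example above and checks the `created`/`expires` window. It is a simplified regex parser rather than a full structured-field parser, the 300-second maximum lifetime is an assumption, and the actual Ed25519 verification over the RFC 9421 signature base is deliberately left out because it needs a cryptography library.

```python
import re


def parse_signature_input(value: str):
    """Parse one Signature-Input member into (label, components, params)."""
    label, _, rest = value.partition("=")
    match = re.match(r'\(([^)]*)\)(.*)', rest.strip())
    if not match:
        raise ValueError("malformed Signature-Input")
    components = re.findall(r'"([^"]+)"', match.group(1))
    params = {}
    pattern = r';\s*([a-z0-9_*-]+)=(?:"([^"]*)"|([^;\s]+))'
    for key, quoted, bare in re.findall(pattern, match.group(2)):
        params[key] = quoted if quoted else bare
    return label.strip(), components, params


def signature_is_fresh(params: dict, now: int, max_lifetime: int = 300) -> bool:
    """Check the created/expires window before doing any signature math."""
    try:
        created = int(params["created"])
        expires = int(params["expires"])
    except (KeyError, ValueError):
        return False
    return created <= now < expires and expires - created <= max_lifetime


header = ('sig1=("@authority" "@target-uri");created=1716412800;'
          'expires=1716413100;keyid="ed25519-bot-key-2026-05";alg="ed25519"')
label, components, params = parse_signature_input(header)
```

Only requests that pass these checks, and whose agent is on the per-site allow-list, need to reach the key fetch and signature verification step.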

The reason this matters for bot management is that it turns “is this Googlebot?” from a reverse-DNS exercise into a cryptographic check, and it makes the same question answerable for any AI agent operator who chooses to participate. Bot management policies in 2026 should include rules for signed agents specifically.

The AI agent case

A growing share of “bot” traffic is actually authorized AI agent activity: a user has asked their agent to compare prices, book a flight, fill in a form, or run a research task. The agent shows up at the origin as a real Chromium browser with a real fingerprint and the user’s session cookies. Classifying it as “a bot” and blocking it harms the user who authorized it.

The mature pattern is:

Detect that an automation framework is present. Browser Use, Browserbase, Anthropic Computer Use, OpenAI Operator and similar all leave signals: CDP usage, specific framework artefacts, characteristic User-Agent or Client Hints. Detection identifies which agent.
Check for authorization. Is the agent signed via Web Bot Auth? Is the session attached to a known user account?
Apply per-route policy. Browsing pages and search are usually fine for any agent; form submissions and high-trust actions need explicit user consent or known-agent attribution.
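The three steps above can be sketched as a single decision function. The parameter names and the two route kinds (`"read"` and `"write"`) are assumptions for illustration, and the fallback to a step-up challenge mirrors the unverified-agent row in the policy table earlier.

```python
from typing import Optional


def agent_action(framework_detected: bool, signed: bool,
                 user_consented: bool, route_kind: str) -> Optional[str]:
    """Return an action for agent traffic, or None if no agent was detected."""
    if not framework_detected:
        return None          # not an agent; the ordinary policy table applies
    if route_kind == "read":
        return "allow"       # browsing pages and search are usually fine
    if signed or user_consented:
        return "allow"       # known-agent attribution or explicit user consent
    return "challenge"       # step-up instead of a dead-end block


unsigned_form_post = agent_action(True, signed=False, user_consented=False,
                                  route_kind="write")
```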

We go deeper on this in AI agent detection.

What to log, and what to do with the logs

A bot management deployment without good logs degrades quickly because nobody can tell whether the policies are working. The minimum useful log record per decision:

Request metadata (timestamp, route, IP, ASN, hosting class).
Detection inputs (TLS fingerprint, header set hash, JS environment hash, behavioral snapshot).
Detection outputs (classification, named framework if known, confidence, signals that fired).
Policy match (which rule matched, in which order).
Action taken (allow, throttle, challenge, block) and any user-visible response.
Outcome (if known: bot retried with a different signature, user completed step-up, request was challenged and abandoned).
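A record along these lines could be emitted as one JSON line per decision. The field names below are hypothetical and cover only a subset of the items listed; the IP and ASN are documentation-range placeholders.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class DecisionRecord:
    # Request metadata
    timestamp: str
    route: str
    ip: str
    asn: int
    # Detection inputs and outputs
    tls_fingerprint: str
    classification: str
    confidence: float
    signals: List[str]
    # Policy match and action
    rule_id: str
    action: str
    # Outcome is filled in later, if it ever becomes known
    outcome: Optional[str] = None


def to_log_line(record: DecisionRecord) -> str:
    """Serialize with sorted keys so log lines diff cleanly."""
    return json.dumps(asdict(record), sort_keys=True)


record = DecisionRecord(
    timestamp="2026-03-01T12:00:00Z",
    route="POST /api/login",
    ip="203.0.113.7",
    asn=64500,
    tls_fingerprint="example-fingerprint-hash",
    classification="known-headless-framework",
    confidence=0.97,
    signals=["headless-artifacts", "login-velocity"],
    rule_id="login-headless-01",
    action="block",
)
line = to_log_line(record)
```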

Two views of this log feed routine operations:

A traffic dashboard showing the share of allow/throttle/challenge/block decisions per route and per classification, with the ability to drill in. Anomalies (a new IP range, a new framework, a sudden change in classification rate) should be obvious at a glance.
A decision audit allowing operators to look up a specific request, see why the decision was made, and reproduce the verdict. Customer support uses this when a real user is wrongly blocked.
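The dashboard view reduces to a grouped count over decision records. A minimal sketch, assuming each record is a dictionary with `route` and `action` keys:

```python
from collections import Counter, defaultdict


def action_shares(records):
    """Return, per route, the share of decisions that ended in each action."""
    counts = defaultdict(Counter)
    for record in records:
        counts[record["route"]][record["action"]] += 1
    shares = {}
    for route, counter in counts.items():
        total = sum(counter.values())
        shares[route] = {action: n / total for action, n in counter.items()}
    return shares


records = [
    {"route": "/api/login", "action": "allow"},
    {"route": "/api/login", "action": "allow"},
    {"route": "/api/login", "action": "allow"},
    {"route": "/api/login", "action": "block"},
]
shares = action_shares(records)
```

Comparing these shares between time windows is one simple way to surface the sudden classification shifts mentioned above.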

The most common failure mode of bot management deployments is not bad detection but logs that nobody can read and rules that nobody can explain.

How Foil thinks about it

Foil treats bot management as a thin layer the customer owns and bot detection as the rich service we provide. The Foil SDK and API give each session a verdict (bot, human, or inconclusive) with a risk score, plus attribution labels that name the specific framework, anti-detect browser, AI agent, or crawler when one is identified, each label carrying its own confidence. The application uses that decision and its labels to drive its own policy table.

We deliberately do not auto-block on the customer’s behalf because the right policy is business-specific. A signup-flow policy that blocks Browser Use is correct for some sites and wrong for others; the system that knows which is true is the application, not the detector.