Datacenter proxy detection is the cheapest, fastest, and most reliable signal in network-layer bot defense. A residential consumer browsing your homepage from an AWS us-east-1 IP is not what they appear to be. The technique has been around for a decade, but the landscape changed when residential proxy networks became cheap enough to scale. This article covers how datacenter detection works, what residential proxies do to defeat it, and the cross-checks that still hold up in 2026.
This is one of the network-layer foundations referenced throughout bot detection, web scraping prevention, and price scraping. The harder sibling case is covered in residential proxy detection.
The basics: ASN and IP reputation
Every IP address on the public internet belongs to an Autonomous System (AS) identified by an Autonomous System Number (ASN). The ASN is owned by an organization (an ISP, a hosting provider, a content network, a corporate entity), and the ASN’s category is the first thing a bot detector checks.
The categories that matter:
- Consumer ISP. Comcast, AT&T, Verizon, BT, Deutsche Telekom, NTT. The expected source for consumer traffic.
- Mobile carrier. T-Mobile, Vodafone, AT&T Mobility. Expected for mobile traffic, with carrier-grade NAT often hiding the individual user.
- Hosting / cloud / datacenter. AWS (several ASNs), GCP, Microsoft Azure, DigitalOcean, OVH, Hetzner, Linode, Vultr, Scaleway, Cloudflare, Fastly, Akamai. Not expected for consumer browser traffic.
- VPN / proxy. NordVPN, ExpressVPN, ProtonVPN, Mullvad. Frequently used by privacy-conscious real users but also by abusers; the right policy depends on the application.
- Anonymising network. Tor exit nodes, I2P. Used overwhelmingly by privacy-conscious users with strong reasons; small minority of abusers in the mix.
- Corporate or enterprise. A company’s own ASN range. Internal traffic, expected for B2B applications.
The simple rule that catches a lot of automation: if a request claims to be a consumer browser but originates from a hosting ASN, it is overwhelmingly likely to be automation.
The public services that maintain ASN and IP reputation data include MaxMind, IP2Location, IPinfo, IPQualityScore, AbuseIPDB, Spamhaus and dozens of others. They differ in coverage, freshness, and false-positive rate, but the underlying ASN data is mostly the same: cloud and hosting ASNs are clearly identifiable.
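As a minimal sketch of the category check and the simple rule above, assuming the ASN has already been resolved from the IP by one of these datasets. The table is a hand-picked illustration rather than a maintained dataset, and `classifyAsn` and `hostingBrowserMismatch` are hypothetical names, not part of any library:

```typescript
type AsnCategory = "consumer-isp" | "mobile" | "hosting" | "unknown";

// Hand-picked illustration only. A real deployment would load a maintained,
// regularly refreshed dataset instead of hard-coding ASNs.
const ASN_CATEGORIES = new Map<number, AsnCategory>([
  [16509, "hosting"], // Amazon (AWS)
  [14618, "hosting"], // Amazon (AWS)
  [14061, "hosting"], // DigitalOcean
  [24940, "hosting"], // Hetzner
  [16276, "hosting"], // OVH
  [7922, "consumer-isp"], // Comcast
  [3320, "consumer-isp"], // Deutsche Telekom
  [21928, "mobile"], // T-Mobile USA
]);

function classifyAsn(asn: number): AsnCategory {
  return ASN_CATEGORIES.get(asn) ?? "unknown";
}

// The simple rule: a User-Agent that claims to be a browser (and does not
// declare itself a bot) arriving from a hosting ASN.
function hostingBrowserMismatch(asn: number, userAgent: string): boolean {
  const claimsBrowser =
    /Mozilla\/5\.0/.test(userAgent) && !/bot|crawler|spider/i.test(userAgent);
  return classifyAsn(asn) === "hosting" && claimsBrowser;
}
```

An ASN missing from the table falls through to "unknown" and never trips the rule, which keeps a stale table from producing false positives at the cost of missing some hosting traffic.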
Why this works for cheap automation
Most early-stage bot operations run on cheap hosting because that is where compute is cheapest: Hetzner CX11 instances at €3/month, the OVH eco line at similar prices, AWS Free Tier and Lightsail, DigitalOcean droplets, Vultr instances. The operator boots a Linux box, installs Python with requests or a headless Chromium stack like Puppeteer, and starts scraping or signup-stuffing.
The bot detector sees:
- Source IP in a documented hosting ASN.
- User-Agent claiming Chrome on Windows.
- No referer for the first request to a deep URL.
- Time-of-day pattern inconsistent with the claimed user location.
Any one of those is suspicious on its own, and in combination they are unambiguous. Blocking at the network edge catches this entire population for the cost of an ASN lookup.
Beyond the ASN: the rest of the network fingerprint
The ASN is the cheapest signal, not the only one. Three lower layers corroborate it and catch the operator who has done something clever with their IP but not with the stack underneath it.
TCP/IP stack fingerprinting. The SYN packet that opens every connection carries operating-system defaults the application never sees. The initial TTL gives away the real OS: Linux and macOS start at 64 and Windows at 128, and because each router hop decrements the value by one, the observed TTL still points back to its starting value. The TCP window size, the window-scaling factor, the maximum segment size, and the presence and order of TCP options also vary by kernel. The technique is old (p0f formalized it two decades ago) and still works because these fields come from the kernel, not the browser, so a Python client on a Linux VM cannot present the TCP fingerprint of the Windows Chrome it claims to be.
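The TTL part of that check can be sketched as below. This is a coarse heuristic under stated assumptions: `observedTtl` is taken to be the IP TTL of the client's SYN as captured at the edge (application code does not normally see it), the function names are hypothetical, and a middlebox that rewrites TTLs or terminates TCP defeats it.

```typescript
type OsFamily = "linux-or-macos" | "windows" | "other" | "unknown";

// Each router hop decrements the TTL by one, so the observed value is
// rounded up to the nearest common initial default (64, 128, or 255).
function inferInitialTtl(observedTtl: number): number | null {
  if (!Number.isInteger(observedTtl) || observedTtl < 1 || observedTtl > 255) {
    return null;
  }
  if (observedTtl <= 64) return 64;
  if (observedTtl <= 128) return 128;
  return 255;
}

function osFamilyFromTtl(observedTtl: number): OsFamily {
  switch (inferInitialTtl(observedTtl)) {
    case 64:
      return "linux-or-macos";
    case 128:
      return "windows";
    case 255:
      return "other";
    default:
      return "unknown";
  }
}

// True when the User-Agent's Windows claim and the TTL-derived OS family
// disagree. An invalid TTL yields no verdict rather than a contradiction.
function ttlContradictsUserAgent(observedTtl: number, userAgent: string): boolean {
  const family = osFamilyFromTtl(observedTtl);
  if (family === "unknown") return false;
  const claimsWindows = /Windows NT/.test(userAgent);
  return claimsWindows !== (family === "windows");
}
```

A production fingerprint would combine this with the window size, MSS, and option order, as p0f does; the TTL alone only separates broad OS families.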
TLS fingerprinting. The ClientHello identifies the TLS library before a single HTTP byte is sent. A Go or Python client claiming to be Chrome is contradicted by its handshake. See TLS fingerprinting for the mechanics.
HTTP/2 fingerprinting. Over HTTP/2 the client sends a SETTINGS frame, a connection window-update, and its header pseudo-fields in an order that is characteristic of the implementation. Real Chrome, real Firefox, and the common automation libraries each have a recognizable HTTP/2 fingerprint, and a mismatch against the claimed User-Agent is another high-precision tell.
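One way to turn those three observations into a comparable string is sketched below. The layout (SETTINGS, then window update, then pseudo-header order) is a simplification loosely modeled on published passive HTTP/2 fingerprinting formats; the interface, the function names, and any expected value you compare against are assumptions that would have to be collected from real traffic.

```typescript
interface Http2Observation {
  // SETTINGS parameters as [identifier, value], in the order the client sent them.
  settings: Array<[number, number]>;
  // Increment of the connection-level WINDOW_UPDATE, or 0 if none was sent.
  windowUpdate: number;
  // Pseudo-header names in the order they appear in the first request.
  pseudoHeaderOrder: string[];
}

// Serialize the observation so that both the values and their order matter.
function http2Fingerprint(obs: Http2Observation): string {
  const settings = obs.settings.map(([id, value]) => `${id}:${value}`).join(";");
  const pseudo = obs.pseudoHeaderOrder
    .map((name) => name.replace(/^:/, "").charAt(0)) // ":method" -> "m"
    .join(",");
  return `${settings}|${obs.windowUpdate}|${pseudo}`;
}

// True when the observed fingerprint differs from the one expected for the
// browser family named in the User-Agent. `expected` is supplied by the caller.
function contradictsClaimedBrowser(obs: Http2Observation, expected: string): boolean {
  return http2Fingerprint(obs) !== expected;
}
```

Because the string preserves ordering, two clients that send the same settings in a different sequence produce different fingerprints, which is the property the technique relies on.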
Together these mean an operator who sources a clean IP still has to reproduce a coherent operating-system-and-browser stack from the kernel up. Few do, which is why the network fingerprint catches automation that the ASN check alone would wave through.
Why residential proxies break the simple rule
Residential proxy networks changed the landscape. Bright Data, Oxylabs, SmartProxy, IPRoyal, NetNut and a dozen others sell access to networks of millions of consumer IPs. The mechanics:
- The proxy network has a presence on consumer devices: either through a free-VPN application the user installed (which monetises by selling exit-node capacity), through a residential ISP partnership, through a peer-to-peer model where users opt in for ad revenue, or through compromised IoT devices in less reputable cases.
- An attacker buys access to the network, configures their automation to route through it, and gets exits from real residential ISPs.
- The bandwidth is metered: pricing in 2026 ranges from $1 to $15 per GB depending on quality.
The result is that the simple ASN check no longer suffices. A scraper using residential proxies appears to come from Comcast Houston, then Spectrum Dallas, then Verizon Newark, all within minutes. Per-IP rate limiting stops working because no single IP sends enough requests to reach the limit.
The proxy networks know this and market it explicitly. Bright Data positions itself for “enterprise-scale operations” with 150M+ residential IPs and 195 countries; Oxylabs markets “highest success rates against AI-powered detection.” Residential proxy detection covers this case in depth.
The economics still favor the defender at this layer, though. Datacenter compute is effectively free at the margin, while residential proxy bandwidth is metered at dollars per gigabyte. Most operators start on cheap hosting and only escalate to residential proxies for the targets that block them, so datacenter detection still removes the larger share of automated traffic for the lowest cost, and it forces the survivors onto a metered, more expensive footing that constrains how they behave.
What still works
Three categories of signal still distinguish residential-proxied automation from real residential users.
1. The proxy itself often has tells
Residential proxies are not literally consumer devices. They are exit nodes running specific software on consumer hardware or on residentially-addressed servers. That software has its own observable behavior:
- TCP fingerprinting. Many “residential” exits are Linux gateway nodes with consumer IPs routed to them rather than the consumer devices they appear to be. A Linux TCP/IP stack produces different SYN packet fingerprints from the Windows or macOS device the browser claims to be: the maximum segment size (which reflects the MTU), the TCP window scaling, the TCP timestamp option, and the order of TCP options together form a fingerprint distinct from typical consumer OSes.
- TLS fingerprint. The TLS library being used (the underlying Chrome or Node), combined with the OS, produces a JA4 fingerprint that often does not match a real consumer Chrome on the claimed OS.
- Latency profile. Residential proxies introduce two extra network hops: from the attacker to the proxy network, and from the proxy network to your origin. The round-trip-time variance is detectable.
2. The user-claim layer often does not match
Even with a residential IP, the attacker has to pretend to be a coherent user. They often slip on the joint distribution:
- Time zone vs IP geography. Browser-reported Intl.DateTimeFormat().resolvedOptions().timeZone should match the IP’s geographic location. A proxy in Houston exiting to a browser claiming Europe/Moscow time is incoherent.
- Language vs geography. A Comcast Texas IP serving a browser with Accept-Language: ru-RU,en-US;q=0.9 is statistically unlikely.
- Behavioral local-time pattern. A residential proxy network rotates IPs continuously, so the geographic location of the requests changes minute-to-minute. A real user does not teleport across the United States every five minutes.
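The first two checks can be sketched as a coherence function. The expectation table below is a tiny, deliberately incomplete illustration (real countries span more time zones and languages than listed), and every name in it is hypothetical. Each flag is a weak signal on its own, since travellers and expatriates trip it legitimately.

```typescript
interface GeoExpectation {
  timeZones: string[]; // IANA zones considered plausible for the country
  languages: string[]; // primary language subtags considered plausible
}

// Illustrative and incomplete; a real table would come from a locale dataset.
const EXPECTED_BY_COUNTRY = new Map<string, GeoExpectation>([
  ["US", {
    timeZones: ["America/New_York", "America/Chicago", "America/Denver", "America/Los_Angeles"],
    languages: ["en", "es"],
  }],
  ["DE", { timeZones: ["Europe/Berlin"], languages: ["de", "en"] }],
]);

// Extract the primary subtag of the first (most preferred) language range.
function primaryLanguage(acceptLanguage: string): string | null {
  const match = /^\s*([A-Za-z]{2,3})\b/.exec(acceptLanguage);
  return match ? match[1].toLowerCase() : null;
}

// Return the list of incoherence flags for one request.
function coherenceFlags(
  ipCountry: string,
  browserTimeZone: string,
  acceptLanguage: string,
): string[] {
  const expected = EXPECTED_BY_COUNTRY.get(ipCountry);
  if (!expected) return []; // no data for this country: do not penalise
  const flags: string[] = [];
  if (!expected.timeZones.includes(browserTimeZone)) flags.push("timezone-mismatch");
  const lang = primaryLanguage(acceptLanguage);
  if (lang !== null && !expected.languages.includes(lang)) flags.push("language-mismatch");
  return flags;
}
```

For the example in the text, a US IP with a browser reporting Europe/Moscow and `ru-RU,en-US;q=0.9` raises both flags, while the same IP with America/Chicago and `en-US` raises none.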
3. Cluster signatures across sessions
The operator running the attack is one entity. Their session-level signatures cluster:
- The proxy provider (each residential proxy network has its own exit-node distribution that produces recognizable patterns).
- The hour-of-day pattern at the operator’s local time, regardless of where the exits are.
- The behavioral style (how the operator clicks, types, navigates).
- The target endpoints.
Two sessions that arrive from different IPs but cluster together on these features are from the same operator. Per-cluster signatures catch coordinated residential-proxy operations that per-session signatures miss.
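A crude stand-in for that clustering is exact-match grouping on a few session features, keeping only groups that span many distinct IPs. The feature set and names below are hypothetical, and a real system would use fuzzier similarity than string equality; treat the output as candidates for review, since popular browsers hitting a popular endpoint in the same hour will also collide.

```typescript
interface SessionFeatures {
  sessionId: string;
  ip: string;
  tlsFingerprint: string; // e.g. a JA4 string computed elsewhere
  activeHourUtc: number; // hour of day (0-23) in which the session started
  targetEndpoint: string; // most-requested path in the session
}

// Sessions with the same key are treated as behaving alike.
function clusterKey(s: SessionFeatures): string {
  return [s.tlsFingerprint, s.activeHourUtc, s.targetEndpoint].join("|");
}

// Group sessions by key and keep groups that span at least minDistinctIps IPs.
function suspiciousClusters(
  sessions: SessionFeatures[],
  minDistinctIps: number,
): Map<string, SessionFeatures[]> {
  const groups = new Map<string, SessionFeatures[]>();
  for (const s of sessions) {
    const key = clusterKey(s);
    const group = groups.get(key) ?? [];
    group.push(s);
    groups.set(key, group);
  }
  const result = new Map<string, SessionFeatures[]>();
  groups.forEach((group, key) => {
    const distinctIps = new Set(group.map((s) => s.ip)).size;
    if (distinctIps >= minDistinctIps) result.set(key, group);
  });
  return result;
}
```

The distinct-IP threshold is what separates this from per-session scoring: one session on one IP never forms a cluster, but the same behavior repeated across many exits does.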
The datacenter traffic you actually want
Not all hosting-ASN traffic is hostile. Search-engine crawlers, AI crawlers, link unfurlers, uptime monitors, and partner integrations all originate from datacenters by design. A naive “hosting ASN plus browser User-Agent equals bot” rule blocks Googlebot, and a too-aggressive deployment will quietly remove a site from search results.
The carve-out is verification, not a hand-maintained allow-list of IPs that rotate underneath you:
- Reverse-DNS verification. Resolve the source IP’s PTR record, then forward-resolve the result. A genuine Googlebot reverse-resolves into googlebot.com and forward-resolves back to the same address; a spoofed User-Agent from a random VPS fails the round trip. Google, Bing, and others document this as the canonical check.
- Published IP ranges. The major crawler operators publish machine-readable IP-range files. Match the source against the current file rather than against memory.
- Web Bot Auth. A newer standard in which the agent signs each request with a key tied to a verifiable identity, replacing network heuristics with a cryptographic one.
Verified bots get their own policy, usually allow with a generous rate limit. Anything from a hosting ASN that fails verification and still claims to be a human browser is the population this article is about. Bot management covers how to structure those policies, and the distinction between good and bad automation in detail.
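The reverse-DNS round trip described above can be sketched as follows. The `Resolver` interface is an assumption that mirrors the `reverse`, `resolve4`, and `resolve6` methods of Node's `dns.promises`, so that object can be passed in directly; `verifyCrawlerByDns` is a hypothetical name, and the allowed suffixes should come from each crawler operator's own documentation.

```typescript
// Shape matches the methods of Node's dns.promises that this sketch uses.
interface Resolver {
  reverse(ip: string): Promise<string[]>;
  resolve4(hostname: string): Promise<string[]>;
  resolve6(hostname: string): Promise<string[]>;
}

// Forward-confirmed reverse DNS: PTR lookup, suffix check, forward lookup,
// then confirm the forward result contains the original IP.
async function verifyCrawlerByDns(
  ip: string,
  allowedSuffixes: string[],
  resolver: Resolver,
): Promise<boolean> {
  let hostnames: string[];
  try {
    hostnames = await resolver.reverse(ip);
  } catch {
    return false; // no PTR record, or the lookup failed
  }
  for (const raw of hostnames) {
    const host = raw.toLowerCase().replace(/\.$/, "");
    // Require a label boundary so "fakegooglebot.com" does not match.
    const suffixOk = allowedSuffixes.some((s) => host === s || host.endsWith("." + s));
    if (!suffixOk) continue;
    try {
      const forward = ip.includes(":")
        ? await resolver.resolve6(host)
        : await resolver.resolve4(host);
      // Plain string comparison; IPv6 addresses may need normalising first.
      if (forward.includes(ip)) return true;
    } catch {
      // Forward lookup failed for this hostname; try the next PTR result.
    }
  }
  return false;
}
```

In Node this would be called as `verifyCrawlerByDns(ip, ["googlebot.com"], dns)` with `import { promises as dns } from "node:dns"`. Results are worth caching per IP, since the check costs two DNS round trips.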
Mobile carrier traffic is its own thing
A note on mobile traffic: mobile carriers run carrier-grade NAT, which means thousands of users can share one IP at any given moment. This makes per-IP rate limiting on mobile carrier ASNs counterproductive (you block all the legitimate users) and makes “trust the IP” lookups unreliable.
The right pattern for mobile traffic:
- Identify the ASN as mobile carrier and apply mobile-specific policies.
- Use device intelligence rather than IP for unique identification.
- For mobile apps, rely on platform-attested device identity (Play Integrity, App Attest) instead of IP.
The same caution applies to ISP-shared CGN ranges that some residential ISPs use.
IPv6 needs its own handling. Cloud and consumer providers hand out enormous IPv6 blocks, commonly a /64 per customer, so a single user can rotate through billions of addresses without changing anything that matters. Rate-limiting and reputation on IPv6 should operate on the /64 (or the provider’s documented allocation prefix), not the individual /128, or the limit is trivially evaded.
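A minimal sketch of a rate-limit key that collapses IPv6 addresses to their /64, assuming a /64 is the right allocation prefix for the provider in question. The function names are hypothetical, and IPv4-mapped or otherwise dotted IPv6 forms are left unhandled and fall back to the raw string.

```typescript
// Expand an IPv6 address to eight 16-bit groups, handling "::" compression.
// Returns null for anything this sketch does not parse.
function expandIpv6(address: string): number[] | null {
  const addr = address.split("%")[0]; // drop any zone identifier
  if (addr.includes(".")) return null; // embedded IPv4 forms not handled
  const halves = addr.split("::");
  if (halves.length > 2) return null; // "::" may appear at most once
  const parse = (part: string): number[] | null => {
    if (part === "") return [];
    const groups: number[] = [];
    for (const g of part.split(":")) {
      if (!/^[0-9a-fA-F]{1,4}$/.test(g)) return null;
      groups.push(parseInt(g, 16));
    }
    return groups;
  };
  const head = parse(halves[0]);
  const tail = halves.length === 2 ? parse(halves[1]) : [];
  if (head === null || tail === null) return null;
  if (halves.length === 1) return head.length === 8 ? head : null;
  const missing = 8 - head.length - tail.length;
  if (missing < 1) return null;
  return [...head, ...new Array<number>(missing).fill(0), ...tail];
}

// IPv4 keys on the full address; IPv6 keys on the first four groups (/64).
function rateLimitKey(ip: string): string {
  if (!ip.includes(":")) return ip;
  const groups = expandIpv6(ip);
  if (groups === null) return ip; // unparseable: fall back to the raw string
  return groups.slice(0, 4).map((g) => g.toString(16)).join(":") + "::/64";
}
```

Two addresses in the same /64, such as `2001:db8:abcd:12:1::1` and `2001:db8:abcd:12::2`, map to the same key, so rotating the low 64 bits no longer resets the counter.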
What an integration looks like
The data layer needed:
- A reasonably-fresh ASN-to-category mapping. Public datasets are good enough for most purposes; commercial sources are higher quality.
- An IP reputation feed for known abuse.
- An IP-to-geography lookup with reasonable accuracy.
The application layer:
    // Gather everything known about the source IP into one record.
    function classifyIP(ip: string) {
      const asn = lookupASN(ip);
      const reputation = lookupReputation(ip);
      const geo = lookupGeo(ip);
      return {
        asn,
        category: classifyASN(asn), // 'consumer-isp' | 'mobile' | 'hosting' | 'vpn' | 'tor' | 'corporate' | 'unknown'
        reputation, // { knownAbuse: boolean, scrapeReports: number, ... }
        geo: { country: geo.country, region: geo.region, city: geo.city },
      };
    }

    // Combine the network-layer signals into a risk score in [0, 1].
    function ipRisk(req: { ip: string; headers: Record<string, string | undefined> }) {
      const ip = classifyIP(req.ip);
      const ua = parseUA(req.headers['user-agent']);
      const claimedTz = req.headers['x-fingerprint-tz'];
      let score = 0;
      if (ip.category === 'hosting' && ua.kind === 'browser') score += 0.6;
      if (ip.category === 'tor') score += 0.4;
      if (ip.category === 'vpn') score += 0.2;
      if (ip.reputation.knownAbuse) score += 0.3;
      if (claimedTz && tzMismatch(claimedTz, ip.geo.country)) score += 0.2;
      return Math.min(1, score); // weights can sum past 1, so clamp
    }
The score is one input among many. Hosting-IP alone does not justify a block; hosting-IP plus headless browser plus inconsistent geography does.
What not to do
Three common mistakes:
Trust the IP allow-list as primary auth. IPs change. Even corporate IPs change when network teams rotate ranges. IP allow-listing as the primary trust signal will fail.
Block all VPN traffic. Privacy-conscious users use VPNs. Travelling users use them. Users in restrictive countries use them. Blocking VPN traffic hurts legitimate users more than it hurts attackers, who switch to residential proxies.
Block all Tor exit nodes by default. A small population of legitimate users access services via Tor. Allowing Tor traffic with higher per-session friction is usually the right policy unless your application has a strong legal or compliance reason to refuse.
How Foil supports it
Foil’s session detail includes the IP classification and the cross-checks against the rest of the signal stack. A session arriving from hosting infrastructure carries network.anonymity.hosting === true alongside the ASN and organization in network.routing, the reputation summary in network.reputation, and the IP-side geography in network.location for comparison against the browser-claimed environment.
The application uses these to make policy decisions. The implementation pattern:
    import { Foil } from "@abxy/foil-server";

    const client = new Foil({ secretKey: process.env.FOIL_SECRET_KEY });
    const session = await client.sessions.get(sessionId);
    const { anonymity, routing, location } = session.network;

    // hosting ASN claiming to be a consumer browser, with the IP-side
    // timezone contradicting what the browser reported
    if (anonymity.hosting && location.timezone !== clientReportedTimezone) {
      return blockOrChallenge(req, res);
    }

    if (anonymity.hosting) {
      // routing.asn / routing.organization name the provider
      return throttleAndContinue(req, res);
    }
For the broader picture, bot detection is the parent, web scraping prevention covers the scraping case, and anti-detect browser detection covers the operator side of the cluster-signature picture.
Further reading
- Google Search Central, Verifying Googlebot and other Google crawlers: developers.google.com/search/docs/crawling-indexing/verifying-googlebot
- Michał Zalewski, p0f v3 (passive TCP/IP stack fingerprinting): lcamtuf.coredump.cx/p0f3
- IETF, RFC 9113, HTTP/2 (the SETTINGS frame and framing layer): rfc-editor.org/rfc/rfc9113
- CAIDA, AS Rank (autonomous-system data and relationships): asrank.caida.org
- IETF, RFC 6177, IPv6 address assignment to end sites: rfc-editor.org/rfc/rfc6177