The Instagram Enrichment Endpoint That Still Works in 2026 (and Which Proxies to Use)

by HarvestMyData

Tags: instagram, scraping, proxies, technical

The old ?__a=1 endpoint is dead

If you've ever tried scraping Instagram profile data, you probably started with the ?__a=1 trick. You'd hit instagram.com/{username}/?__a=1 and get back a nice JSON blob with the user's bio, follower count, and profile info.

That stopped working sometime in late 2023. Instagram started returning 302 redirects to the login page, then 401s, then eventually the endpoint just disappeared. Every tutorial and StackOverflow answer that mentions ?__a=1 is outdated.

The GraphQL endpoints (graphql/query) followed shortly after. Most of them now require authentication cookies, and the ones that don't are heavily rate-limited.

So what actually works?

The endpoint: web_profile_info

There's one endpoint that still returns full profile data without authentication:

https://i.instagram.com/api/v1/users/web_profile_info/?username={username}

Note the domain: i.instagram.com, not www.instagram.com. This is Instagram's mobile API gateway, and it behaves differently from the web frontend.

Required headers

You need exactly two non-standard headers:

X-IG-App-ID: 936619743392459
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36

The X-IG-App-ID is the Instagram web app's public client ID. It hasn't changed in years. Without it, you get a 403.

The User-Agent doesn't need to be anything specific, but it should look like a real browser. We use Chrome on macOS.

Full header set we send in production:

python
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.9",
    "Accept-Encoding": "gzip, deflate, br",
    "X-IG-App-ID": "936619743392459",
}

Don't skip the compression headers. The Accept-Encoding: gzip, deflate, br header is not optional. A typical profile response is about 2-3 KB compressed with gzip. Without compression, that same response is 15-20 KB of raw JSON. That's a 7-8x difference.

When you're enriching 20,000 profiles, that turns 40-60 MB of transfer into 300-400 MB. At $0.90/GB on SmartProxy, skipping one header costs you an extra $0.25 per job. On BrightData at $4/GB, it's over a dollar extra. gzip alone gets you most of the savings; br (Brotli) compresses slightly better, but both work fine. Just make sure the header is there.
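The arithmetic works out like this, as a quick back-of-the-envelope script (the sizes and per-GB prices are the estimates from this article, using the midpoints of each range):

```python
# Rough cost of skipping Accept-Encoding on a 20,000-profile job.
PROFILES = 20_000
COMPRESSED_KB, RAW_KB = 2.5, 17.5   # midpoints of 2-3 KB and 15-20 KB

# Extra transfer if responses come back uncompressed, in GB.
extra_gb = PROFILES * (RAW_KB - COMPRESSED_KB) / 1_048_576

for provider, price_per_gb in [("SmartProxy", 0.90), ("BrightData", 4.00)]:
    print(f"{provider}: ~${extra_gb * price_per_gb:.2f} extra per job")
```

This lands at roughly $0.26 extra on SmartProxy and $1.14 on BrightData, matching the figures above.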

What you get back

A successful 200 response returns JSON with this structure:

json
{
  "data": {
    "user": {
      "username": "example",
      "full_name": "Example User",
      "biography": "Some bio text here",
      "follower_count": 12500,
      "following_count": 890,
      "media_count": 342,
      "is_private": false,
      "is_verified": false,
      "is_business_account": true,
      "is_professional_account": true,
      "category_name": "Photography",
      "external_url": "https://example.com",
      "profile_pic_url": "https://..."
    }
  }
}
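Putting the endpoint, headers, and response shape together, here is a minimal sketch of a fetch-and-parse helper. It assumes curl_cffi is installed (more on that below); the function names and the set of extracted fields are illustrative, not a fixed schema:

```python
def fetch_profile(username, proxy_url=None):
    """Fetch the raw web_profile_info JSON for one username (requires curl_cffi)."""
    from curl_cffi import requests  # lazy import: only needed for live requests

    resp = requests.get(
        f"https://i.instagram.com/api/v1/users/web_profile_info/?username={username}",
        headers={
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
            "Accept-Encoding": "gzip, deflate, br",
            "X-IG-App-ID": "936619743392459",
        },
        proxies={"https": proxy_url} if proxy_url else None,
        impersonate="chrome",
    )
    resp.raise_for_status()
    return resp.json()

def parse_profile(payload):
    """Flatten the fields we care about; returns None for a null user object."""
    user = payload.get("data", {}).get("user")
    if user is None:
        return None
    return {
        "username": user.get("username"),
        "followers": user.get("follower_count"),
        "is_private": user.get("is_private"),
        "biography": user.get("biography"),
    }
```

Keeping the parse step separate from the fetch makes it easy to unit-test against canned responses without touching the network.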

HTTP status codes you'll see

  • 200: Success, user data in response
  • 404: Account doesn't exist (deleted, banned, or typo)
  • 401: Your request looks suspicious. Usually means bad headers or the IP got flagged
  • 429: Rate limited. Back off
  • 200 with empty user: Age-restricted account. The endpoint returns 200 but the user object is null
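One way to encode that status table is a small dispatch function your retry loop can act on. This is a sketch; the category names here are made up for illustration:

```python
def classify_response(status, payload=None):
    """Map an enrichment response to an action, per the status list above."""
    if status == 200:
        user = (payload or {}).get("data", {}).get("user")
        return "ok" if user else "age_restricted"   # 200 with null user object
    if status == 404:
        return "gone"       # deleted, banned, or typo: don't retry
    if status == 401:
        return "flagged"    # bad headers or burned IP: rotate and retry
    if status == 429:
        return "backoff"    # rate limited: slow down before retrying
    return "retry"          # anything else: treat as transient
```

The important detail is the 200-with-null-user branch: without it, age-restricted accounts look like parse errors rather than a known terminal state.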

The HTTP/2 requirement

Here's something that cost us a week of debugging: this endpoint blocks HTTP/1.1 requests when accessed through residential proxies.

We originally used aiohttp for async HTTP requests. It works great for most things, but it only supports HTTP/1.1. When we routed requests through residential proxies, every single one either timed out or returned a connection reset. The same requests worked fine without a proxy, or with datacenter IPs.

The fix: use a library that supports HTTP/2. We switched to curl_cffi, which wraps libcurl and can impersonate real browser TLS fingerprints.

python
from curl_cffi.requests import AsyncSession

session = AsyncSession(
    proxy="http://user:pass@proxy:port",
    impersonate="chrome",
)

The impersonate="chrome" flag makes curl_cffi send the exact same TLS ClientHello and HTTP/2 settings that Chrome does. Instagram's edge servers see what looks like a real Chrome browser, not a Python script.

Key curl options for proxy rotation

When you're doing mass enrichment through residential proxies, you want a fresh IP for every request. By default, libcurl reuses TCP connections, which means you keep hitting Instagram from the same exit IP.

python
from curl_cffi import CurlOpt

session.curl_options = {
    CurlOpt.FRESH_CONNECT: 1,     # new TCP connection per request
    CurlOpt.FORBID_REUSE: 1,      # don't pool connections
    CurlOpt.DNS_CACHE_TIMEOUT: 300,
    CurlOpt.IPRESOLVE: 1,         # IPv4 only
    CurlOpt.TCP_NODELAY: 1,
}

FRESH_CONNECT + FORBID_REUSE is the key combo. Without these, libcurl keeps reusing the same pooled connections, so you rotate through your proxy's IP pool far more slowly and keep getting rate-limited on the same exit IPs.

Proxy comparison: BrightData vs SmartProxy

We tested both providers on the same workload: enriching 1,000 Instagram usernames pulled from our production cache. Same machine, same code, same concurrency settings. 50 concurrent requests per batch.

BrightData

BrightData (formerly Luminati) is the default choice for residential proxies. Their network is the largest, their documentation is decent, and they work out of the box for most scraping tasks.

Proxy format:

http://brd-customer-{id}-zone-{zone}:{password}@brd.superproxy.io:33335

Results on 1,000 usernames:

  • Success rate: 97.2%
  • 404 (deleted accounts): 1.8%
  • Rate limited (429): 0.3%
  • Timeouts/errors: 0.7%
  • Effective speed: ~45 req/s at 50 concurrent
  • Avg response time: 1.1s

Quirk: BrightData's proxy does SSL interception (MITM) to manage the connection. This means you need to disable certificate verification:

python
session = AsyncSession(
    proxy=brightdata_url,
    verify=False,  # required for BrightData
    impersonate="chrome",
)

This is fine for scraping public data, but worth knowing.

One big advantage: BrightData works with both HTTP/1.1 and HTTP/2. If you're using aiohttp or httpx instead of curl_cffi, BrightData will still work because their proxy handles the protocol upgrade internally. This is probably why so many tutorials recommend them. You don't need to care about HTTP/2 on your end.

Price: ~$4.00 per GB of residential traffic.

SmartProxy

SmartProxy (via smartproxy.net, not the original SmartProxy which rebranded to Decodo) is a much cheaper alternative. Their residential proxy network is smaller, but for Instagram enrichment specifically, it works.

Proxy format:

http://{user}:{pass}@proxy.smartproxy.net:3120

Results on 1,000 usernames:

  • Success rate: 95.8%
  • 404 (deleted accounts): 1.8%
  • Rate limited (429): 0.9%
  • Timeouts/errors: 1.5%
  • Effective speed: ~38 req/s at 50 concurrent
  • Avg response time: 1.4s

The catch: SmartProxy only works with HTTP/2. If you send HTTP/1.1 requests through their residential proxies to Instagram, you'll get connection resets or timeouts on nearly every request. You must use curl_cffi with impersonate="chrome" or another HTTP/2-capable client.

Price: ~$0.90 per GB of residential traffic.

Side by side

                    BrightData      SmartProxy
Success rate        97.2%           95.8%
Avg response        1.1s            1.4s
Effective speed     ~45 req/s       ~38 req/s
HTTP/1.1 support    Yes             No
HTTP/2 required     No              Yes
SSL verification    verify=False    Normal
Price per GB        $4.00           $0.90

BrightData is about 18% faster (45 vs 38 req/s) and 1.4 points more reliable. SmartProxy is about 78% cheaper per gigabyte.

For 20,000 username enrichments, you'll transfer roughly 1.5-2 GB. That's $6-8 on BrightData or $1.35-1.80 on SmartProxy.

Python libraries for Instagram enrichment

Here's what we actually use in production and why:

curl_cffi (recommended)

pip install curl_cffi

The only mainstream Python HTTP library that combines HTTP/2 support with browser TLS fingerprinting. This is what makes SmartProxy work.

python
from curl_cffi.requests import AsyncSession

async with AsyncSession(
    proxy=proxy_url,
    impersonate="chrome",
    max_clients=350,
) as session:
    response = await session.get(url, headers=headers, timeout=(4, 6))

The timeout=(4, 6) tuple means 4 seconds to connect, 6 seconds to read. Total maximum of 10 seconds per request.

aiohttp (only with BrightData)

pip install aiohttp

If you're using BrightData and don't want to deal with curl_cffi, aiohttp works fine. It's HTTP/1.1 only, but BrightData handles the protocol upgrade.

python
import aiohttp

async with aiohttp.ClientSession() as session:
    async with session.get(url, headers=headers, proxy=brightdata_url) as resp:
        data = await resp.json()

Simple, well-documented, stable. But won't work with SmartProxy or any other proxy that doesn't do protocol translation.

httpx

pip install httpx

httpx supports HTTP/2 natively via httpx.AsyncClient(http2=True), but it doesn't do browser TLS fingerprinting. Instagram can tell the difference between an httpx HTTP/2 handshake and a real Chrome one. In our testing, httpx with HTTP/2 had a ~30% failure rate through SmartProxy due to TLS fingerprint detection.

Use curl_cffi instead.

Our production setup

We run SmartProxy as primary and BrightData as failover. Each request tries SmartProxy first with up to 5 retries (0.1s delay between retries). If all 5 fail, it falls through to BrightData for another 5 attempts.

In practice, about 0.2% of requests need the BrightData failover. The other 99.8% go through SmartProxy at a fraction of the cost.
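The failover loop described above can be sketched like this. `fetch` is a stand-in for whatever coroutine performs the actual request (it should raise on failure); the names and signature are illustrative:

```python
import asyncio

async def enrich_with_failover(fetch, username, primary, fallback,
                               attempts=5, delay=0.1):
    """Try the primary proxy up to `attempts` times, then the fallback pool.

    Raises the last error only if every attempt on both pools fails.
    """
    last_error = None
    for proxy in (primary, fallback):
        for _ in range(attempts):
            try:
                return await fetch(username, proxy)
            except Exception as exc:   # retry anything that looks transient
                last_error = exc
                await asyncio.sleep(delay)
    raise last_error
```

Because the cheap pool is tried exhaustively first, the expensive pool only sees the small fraction of requests that actually need it.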

Key numbers from our production workload (enriching 20,000 usernames):

  • 350 max concurrent requests
  • 200 requests/second fire rate
  • Effective throughput: ~65 req/s after accounting for retries and timeouts
  • Total time: ~5 minutes for 20k enrichments
  • SmartProxy failover rate: 0.18%
  • Total proxy cost: ~$1.50-2.00
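The concurrency numbers above can be approximated with a semaphore for the in-flight cap plus simple pacing for the fire rate. This is a sketch, not our production scheduler; the pacing here is a crude one-second window rather than a token bucket, and the names are illustrative:

```python
import asyncio

async def run_enrichment(usernames, fetch, max_concurrent=350, fire_rate=200):
    """Launch `fetch(username)` for each name, capping in-flight requests
    at `max_concurrent` and starting at most `fire_rate` tasks per second."""
    sem = asyncio.Semaphore(max_concurrent)
    results = {}

    async def one(username):
        async with sem:                      # blocks when 350 are in flight
            results[username] = await fetch(username)

    tasks = []
    for i, username in enumerate(usernames):
        if i and i % fire_rate == 0:
            await asyncio.sleep(1)           # pace task creation per second
        tasks.append(asyncio.create_task(one(username)))
    await asyncio.gather(*tasks)
    return results
```

The gap between the 200/s fire rate and the ~65 req/s effective throughput is exactly the retries and timeouts: tasks start fast, but slow requests hold semaphore slots.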

Summary

The web_profile_info endpoint on i.instagram.com is currently the most reliable way to get Instagram profile data without authentication. The ?__a=1 trick and most GraphQL endpoints are dead.

If budget doesn't matter, use BrightData. It works with any HTTP library, handles protocol translation, and has the highest success rate. Set verify=False and you're done.

If you want to save 78% on proxy costs, use SmartProxy with curl_cffi. You need HTTP/2 and browser fingerprint impersonation, but once configured, it's nearly as reliable as BrightData at a quarter of the price.

We built HarvestMyData to handle all of this for you.

No proxies, no code, no account needed.

Try it now