Web Scraping & SERP API Glossary
Definitions for the terms you run into when working with SERP APIs, web scraping tools, and web data pipelines. From API keys and proxies to RAG pipelines and rotating proxies.
A
AI Overviews
Google's AI-generated summary boxes that appear above organic results for many queries. They synthesize information from multiple sources and cite reference links. Monitoring which queries trigger AI Overviews has become a key task for SEO and brand visibility tracking.
Anti-bot Protection
Systems websites use to detect and block automated traffic. Common techniques include CAPTCHA challenges, TLS fingerprinting, JavaScript execution challenges, and behavioral analysis. SERP APIs and web unblockers handle all of this on your behalf.
API (Application Programming Interface)
A defined interface that lets one software system request data or actions from another. Web data APIs accept HTTP requests with a query and return structured JSON, eliminating the need to build or maintain custom scrapers.
API Endpoint
A specific URL that accepts requests for a particular data type or operation. For example, a SERP API might expose /search for web queries and /product for product lookups. Each endpoint expects specific parameters and returns a defined response shape.
API Key
A unique authentication token that identifies your account when making API requests. Treat it like a password - don't embed it in client-side code or commit it to public repositories.
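A minimal sketch of keeping the key out of source code: read it from an environment variable and attach it as a request header. The endpoint URL and the `X-API-Key` header name are placeholders; providers differ, so check your API's documentation.

```python
import os
import urllib.request

# Hypothetical endpoint and header name -- check your provider's docs.
# Reading from the environment keeps the key out of your repository.
API_KEY = os.environ.get("SERP_API_KEY", "demo-key")

req = urllib.request.Request(
    "https://api.example.com/v1/search?q=web+scraping",
    headers={"X-API-Key": API_KEY},  # never hard-code the key in source
)
# urllib stores header names capitalized; the request is built but not sent.
print(req.get_header("X-api-key"))
```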
ASIN
Amazon Standard Identification Number. A 10-character alphanumeric code Amazon assigns to every product in its catalog. ASINs are the primary identifier for Amazon product API lookups - search results, reviews, pricing, and seller data all reference ASINs.
Async API
An API that processes requests asynchronously: you submit a task, and the results are returned later via polling or webhook rather than in the immediate response. Cheaper per request than live (synchronous) APIs, but not suitable for real-time use cases.
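The submit-then-poll flow can be sketched as below. The task store is simulated in-process; in a real integration `check_task` would be an HTTP GET against the provider's task-status endpoint, and you would sleep between polls. All names here are illustrative.

```python
import itertools

# Simulated task status sequence: pending twice, then done forever.
_status = itertools.chain(["pending", "pending"], itertools.repeat("done"))

def check_task(task_id):
    """Stand-in for GET /task/{id}; returns the task's current status."""
    return {"id": task_id, "status": next(_status)}

def poll_until_done(task_id, max_polls=10):
    for _ in range(max_polls):
        result = check_task(task_id)
        if result["status"] == "done":
            return result
        # In production: time.sleep(delay) between polls, or use a webhook.
    raise TimeoutError("task did not complete")

print(poll_until_done("task-123"))  # third poll returns status "done"
```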
B
Backlink
A hyperlink from one website pointing to another. Search engines treat backlinks as votes of authority and relevance. The number and quality of backlinks pointing to a page is one of the strongest signals in Google's ranking algorithm.
Base URL
The root address of an API that all endpoints are appended to. For example, if the base URL is https://api.example.com/v1, a search endpoint would be https://api.example.com/v1/search. Always check the API's documentation for the correct base URL.
BeautifulSoup
A Python library for parsing HTML and XML documents. It builds a navigable tree from raw HTML, letting you extract elements by tag, CSS class, ID, or attribute. Typically paired with the requests library: requests fetches the page, BeautifulSoup parses it. For JavaScript-rendered pages, a headless browser is needed first.
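A minimal usage sketch. The HTML here is a static snippet; in practice it would come from `requests.get(url).text` (network call omitted).

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

html = """
<div class="result">
  <h3>Example Result</h3>
  <a href="https://example.com">link</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")
title = soup.find("h3").get_text()   # extract by tag
link = soup.find("a")["href"]        # extract an attribute
print(title, link)  # Example Result https://example.com
```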
Browser Fingerprinting
A technique anti-bot systems use to identify automated browsers by analyzing dozens of signals: canvas rendering, WebGL support, font metrics, audio context, screen resolution, timezone, and JavaScript engine quirks. Even without cookies, a consistent fingerprint can identify a scraper. Modern headless frameworks like Playwright offer stealth patches to reduce fingerprint detectability.
C
CAPTCHA
Completely Automated Public Turing test to tell Computers and Humans Apart. A challenge-response test (distorted text, image selection, checkbox) designed to block automated requests. SERP API providers solve CAPTCHAs as part of their infrastructure - your application never encounters them.
Click-Through Rate (CTR)
The percentage of users who click a result after seeing it in the SERP. A page ranking at position 1 with a poor title can have a lower CTR than a page at position 3 with compelling copy. SERP APIs return position data; measuring CTR requires the impression and click counts from Search Console.
CSS Selector
A pattern used to locate HTML elements by tag, class, ID, or structural position. Widely used in scraping libraries like BeautifulSoup and Playwright. For example, div.result > h3 selects all h3 elements that are direct children of a div with class "result".
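The direct-child selector from the definition, demonstrated with BeautifulSoup's `select()`:

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

html = (
    '<div class="result"><h3>Direct child</h3></div>'
    '<div class="result"><span><h3>Nested</h3></span></div>'
)
soup = BeautifulSoup(html, "html.parser")

# div.result > h3 matches only h3 elements that are DIRECT children
# of a div with class "result" -- the nested h3 is excluded.
titles = [h3.get_text() for h3 in soup.select("div.result > h3")]
print(titles)  # ['Direct child']
```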
Web Crawler
An automated program that systematically browses the web by following links, typically to build an index of pages. Unlike a scraper (which extracts data from specific known URLs), a crawler discovers new pages. Search engines like Google run crawlers continuously.
D
Data Pipeline
A series of automated steps that collect, transform, and store data. A typical SERP data pipeline: call API → parse results → normalize fields → write to database → trigger downstream processing. Async APIs and webhooks are common components in data pipelines.
Datacenter Proxy
A proxy server hosted in a data center rather than on a residential ISP connection. Fast and inexpensive, but more likely to be flagged by sophisticated anti-bot systems since datacenter IP ranges are well-known.
E
ETL (Extract, Transform, Load)
A data engineering pattern where data is extracted from a source (such as a SERP API or web scraper), transformed into a target schema (cleaning, normalizing, enriching fields), and loaded into a destination like a database or data warehouse. Most web data pipelines follow this pattern.
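A toy ETL run under stated assumptions: the raw payload shape and field names are invented for illustration, and SQLite stands in for the destination warehouse.

```python
import sqlite3

# Extract: a raw API-style payload (hypothetical field names).
raw = {"results": [
    {"title": "  Example  ", "link": "https://example.com", "position": "1"},
    {"title": "Other", "link": "https://other.com", "position": "2"},
]}

# Transform: trim whitespace, cast position from string to int.
rows = [(r["title"].strip(), r["link"], int(r["position"]))
        for r in raw["results"]]

# Load: write the normalized rows into a database table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE serp (title TEXT, url TEXT, position INTEGER)")
conn.executemany("INSERT INTO serp VALUES (?, ?, ?)", rows)
top = conn.execute("SELECT title FROM serp WHERE position = 1").fetchone()
print(top)  # ('Example',)
```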
Extraction
The first step of a web data pipeline: retrieving raw content from a source, whether via an API response or an HTTP request to a web page. What you do with that raw content (parsing, structuring, storing) is handled in subsequent pipeline steps.
F
Featured Snippet
A highlighted answer box Google displays above organic results - sometimes called "position zero". It can be a paragraph, numbered list, table, or video. Featured snippets are parsed as a distinct field by SERP APIs and are valuable to track for content optimization.
G
Geo-targeting
Requesting search results as they appear for a specific country, region, or city. Results differ significantly by location - especially for local queries. Controlled via location, uule, or gl parameters depending on the SERP API provider.
Google AI Mode
A conversational, multi-turn AI search interface Google launched in 2025. Unlike AI Overviews (single-query summaries), AI Mode supports follow-up questions within a session and synthesizes results across multiple turns.
Google Shopping Results
Product listings with price, image, and merchant name that appear in the SERP when a query has commercial intent. Returned as a distinct array by SERP APIs, they are useful for price intelligence and competitor product monitoring.
H
Headless Browser
A web browser that runs without a visible user interface. Used in scraping and automated testing to render JavaScript-heavy pages that return empty HTML to plain HTTP requests. Playwright and Puppeteer are the most widely used headless browser tools.
Headful Browser
A web browser running with a visible user interface, as opposed to headless mode. Sometimes used in scraping to reduce detection risk, since headful browsers more closely match the behavior of a real user. Playwright and Selenium both support headful mode via a simple configuration flag.
HTML Parser
A library that reads raw HTML and builds a navigable tree structure for extracting specific elements. BeautifulSoup (Python) and Cheerio (Node.js) are the most popular HTML parsers for web scraping.
HTTP Request
A message sent by a client to a server asking for data or an action. Most web data APIs use GET requests (query parameters in the URL) or POST requests (parameters in the body). The server responds with an HTTP status code and a response body - typically JSON.
I
IP Ban
When a website blocks all requests from a specific IP address after detecting too many requests or suspicious behavior. IP bans are a common anti-scraping countermeasure. Rotating proxies and residential proxy networks are the standard response - each request appears to come from a different IP.
IP Rotation
The practice of cycling through a pool of IP addresses across requests so no single IP triggers rate limits or bans on the target server. SERP APIs and web unblockers handle IP rotation internally. Residential IP rotation is harder for anti-bot systems to detect than datacenter IP rotation.
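The simplest rotation strategy is round-robin over a pool, sketched below. The proxy addresses are placeholders; a real pool would come from your proxy provider.

```python
import itertools

# Round-robin rotation over a small placeholder pool.
proxy_pool = itertools.cycle([
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
    "http://10.0.0.3:8080",
])

def next_proxy():
    """Each call returns the next proxy, wrapping around the pool."""
    return next(proxy_pool)

print([next_proxy() for _ in range(4)])
# The 4th request reuses the first proxy: the pool wraps around.
```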
J
JavaScript Rendering
The process of executing a page's JavaScript before parsing its content. Many modern sites render their data client-side, meaning a plain HTTP request returns an empty shell. A headless browser or rendering service is required to see the full page content.
JSON (JavaScript Object Notation)
A lightweight, human-readable format for transmitting structured data as key-value pairs, arrays, and nested objects. All major web data APIs return JSON responses. Most programming languages have built-in JSON parsers.
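Parsing a response body is one call in Python. The body below is a trimmed-down, hypothetical SERP API response; real field names vary by provider.

```python
import json

body = '''{
  "query": "web scraping",
  "organic_results": [
    {"position": 1, "title": "Guide", "link": "https://example.com"}
  ]
}'''

data = json.loads(body)                  # str -> dict
first = data["organic_results"][0]       # nested objects become dicts/lists
print(first["position"], first["title"])  # 1 Guide
```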
K
Keyword Difficulty (KD)
A score (0–100) estimating how hard it is to rank organically for a given keyword, based on the authority and backlink profiles of the pages currently ranking in the top 10. Available through keyword research APIs like Semrush or DataForSEO.
Knowledge Graph
Google's database of real-world entities - companies, people, places, products - and the relationships between them. When you search for a brand or public figure, the information panel on the right side of the SERP is the Knowledge Graph card.
L
Local Pack
The map + three local business listings Google shows for location-based queries (e.g. "coffee shop near me"). Each result includes business name, address, rating, and hours. Accessible via Local Business Data APIs for lead generation and local SEO monitoring.
Long-tail Keyword
A specific, multi-word search query with lower monthly volume but higher purchase intent. "Best noise-cancelling headphones under $100" is a long-tail variant of "headphones". Long-tail keywords are easier to rank for and typically convert at higher rates.
O
Organic Results
The non-paid search listings on a SERP, ranked by Google's algorithm based on relevance, authority, and user experience signals. Organic result data - title, URL, snippet, and position - is the core output of any SERP API.
P
Pagination
The practice of splitting large result sets across multiple pages, navigated via a page or start parameter. SERP APIs expose pagination parameters to fetch results beyond the first 10. Rank trackers typically only need page 1; competitive research may require going deeper.
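Google-style pagination commonly maps a page number to a zero-based `start` offset; the arithmetic is sketched below. Parameter names differ by provider, so treat `start` as an example.

```python
def start_offset(page, per_page=10):
    """Page 1 -> start=0, page 2 -> start=10, and so on."""
    return (page - 1) * per_page

print([start_offset(p) for p in (1, 2, 3)])  # [0, 10, 20]
```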
Pay-As-You-Go (PAYG)
A pricing model where you pay only for what you use with no monthly subscription fee. Suitable for variable workloads where request volume is unpredictable. DataForSEO uses PAYG; most other SERP APIs use fixed monthly tiers.
People Also Ask (PAA)
A SERP feature showing a set of related questions with expandable answer snippets. PAA boxes appear mid-SERP and cascade dynamically as users interact. The questions are a reliable source of content ideas and FAQ copy - SERP APIs return them as a structured array.
Playwright
An open-source browser automation framework by Microsoft, supporting Chromium, Firefox, and WebKit. Used for end-to-end testing and JavaScript-heavy web scraping. Supports headless and headful modes and network request interception; community stealth plugins can reduce anti-bot detection. The most widely used modern headless browser tool for scraping.
Proxy
A server that routes your requests through a different IP address, masking the original source. SERP APIs maintain large rotating proxy pools to make requests appear as legitimate user traffic from the target location, bypassing IP-based rate limits and blocks.
Puppeteer
A Node.js library by Google for controlling Chromium via the DevTools Protocol. Widely used for headless browser scraping, PDF generation, and screenshot capture. Playwright has largely superseded it for scraping due to broader browser support (Firefox, WebKit) and a more ergonomic API.
Q
Query Parameter
A key-value pair appended to a URL (after ?) to filter or configure a request. For example, ?q=python+scraping&country=us passes the search query and geo-targeting settings. SERP APIs accept query parameters for keyword, location, language, device type, and result count.
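The standard library can build and escape the query string for you; the endpoint and parameter names below are illustrative.

```python
from urllib.parse import urlencode

# urlencode handles escaping (spaces become +, etc.).
params = {"q": "python scraping", "country": "us", "num": 10}
url = "https://api.example.com/v1/search?" + urlencode(params)
print(url)
# https://api.example.com/v1/search?q=python+scraping&country=us&num=10
```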
R
RAG (Retrieval-Augmented Generation)
An AI architecture that combines a language model with a real-time retrieval step - fetching live documents before generating a response. SERP APIs and web search APIs are commonly used as the retrieval layer to ground LLM outputs in current, factual web data.
Rate Limit
A cap on how many API requests you can make within a time window - typically per second and per month. Exceeding it returns a 429 status code. SERP APIs enforce rate limits to protect infrastructure; check your plan limits before building high-frequency polling.
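A common response to a 429 is retrying with exponential backoff. The sketch below only computes the delay schedule; a real client would `time.sleep(delay)` between attempts and honor any `Retry-After` header the server sends.

```python
def backoff_delays(retries, base=1.0, cap=30.0):
    """Wait base * 2**attempt seconds per retry, capped at `cap`."""
    return [min(base * 2 ** attempt, cap) for attempt in range(retries)]

print(backoff_delays(6))  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```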
Real-time Data
Data fetched live at the moment of the request, reflecting current state rather than a cached snapshot. Real-time SERP APIs query Google directly on each call, so results reflect current rankings, AI Overviews, and SERP features as they exist right now.
Residential Proxy
A proxy using IP addresses assigned to real residential users by ISPs. Much harder for anti-bot systems to detect than datacenter proxies, since the IPs look like genuine home internet connections. More expensive than datacenter proxies but more reliable for difficult targets.
Robots.txt
A plain-text file at the root of a website (e.g. /robots.txt) that tells crawlers which paths they are allowed or disallowed from accessing. It is a convention for well-behaved crawlers; it has no technical enforcement mechanism.
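Python's standard library can evaluate robots.txt rules directly. The sketch parses rules from a string; against a live site you would point the parser at `https://example.com/robots.txt` and call `read()`.

```python
from urllib.robotparser import RobotFileParser

rules = """
User-agent: *
Disallow: /private/
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)
print(rp.can_fetch("mybot", "https://example.com/page"))       # True
print(rp.can_fetch("mybot", "https://example.com/private/x"))  # False
```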
Rotating Proxy
A proxy setup that assigns a different IP address to each outgoing request, reducing the risk of IP bans from repeated requests. All major SERP API providers use rotating proxy pools internally as part of their anti-detection infrastructure.
S
Schema Markup
Structured data added to a page's HTML (typically as JSON-LD) to help search engines understand content type and context. It enables rich results like star ratings, FAQ dropdowns, and product prices directly in the SERP.
Scraping / Web Scraping
Automatically extracting data from websites by sending HTTP requests, loading pages, and parsing the HTML response. SERP APIs are a managed form of scraping - the provider handles the proxy infrastructure, rendering, parsing, and anti-bot challenges.
Scrapy
An open-source Python web scraping framework designed for large-scale, concurrent crawling. Includes built-in support for request scheduling, middleware, item pipelines, and output to multiple formats (JSON, CSV, databases). Better suited for structured crawl projects than ad-hoc scraping with requests + BeautifulSoup.
SDK (Software Development Kit)
A pre-built package of libraries, helpers, and documentation that simplifies integrating with an API in a specific language. SerpAPI publishes SDKs for Python, Ruby, Node.js, Go, and PHP. Using an SDK reduces boilerplate and handles common edge cases.
Search Console
Google's free tool for monitoring a site's organic search performance. Shows queries, impressions, clicks, average position, and Core Web Vitals. Useful for identifying keywords where you rank but have a low CTR, or pages with indexing errors.
Search Engine Results Page (SERP)
The page a search engine displays in response to a query. A modern Google SERP includes organic results, paid ads, and multiple feature types: Local Pack, People Also Ask, AI Overviews, Shopping results, featured snippets, and more.
Search Intent
The underlying goal of a user's query. The four main types: informational (learn something), navigational (find a specific site), commercial (research before buying), transactional (buy now). Matching content to search intent is a core principle of SEO.
Search Volume
The average number of times a keyword is searched per month, typically measured over a 12-month rolling window. Available through keyword research APIs. High volume does not always mean high opportunity - keyword difficulty and intent matter equally.
Selenium
One of the oldest browser automation frameworks, originally built for web testing. Supports multiple browsers and languages. Widely used for scraping JavaScript-rendered pages but slower than Playwright and Puppeteer, and more easily detected by anti-bot systems due to identifiable browser characteristics.
SERP API
An API that retrieves Google (or other search engine) results as structured JSON, without requiring you to build or maintain a scraper. The provider manages proxies, CAPTCHA solving, JavaScript rendering, and HTML parsing. You send a query; you get clean data back.
SERP Features
Non-standard result types Google inserts into the search results page beyond the classic "10 blue links". Includes Featured Snippets, Local Pack, People Also Ask, AI Overviews, Shopping carousel, Image pack, Video results, Knowledge Graph, and more.
SERP Volatility
The degree to which search rankings fluctuate over time for a given keyword or set of keywords. High volatility often indicates a Google algorithm update or increased competition. Rank tracking APIs make volatility visible by recording positions daily.
Session
A series of HTTP requests from a single client, maintained through cookies, tokens, or session IDs. When scraping sites that require login or maintain state between pages, the scraping tool must carry session cookies across requests. SERP APIs handle session management internally - you never deal with cookies directly.
Sitemap.xml
A file at the root of a website that lists all URLs the site owner wants search engines to crawl and index. Useful for web crawlers and scrapers to discover page URLs without following links. Also tells crawlers the last modified date and relative priority of each URL.
Snippet
The short description text Google shows under a result's title and URL. Generated by Google from page content - not directly controlled by the publisher, though meta descriptions influence it. Returned as a snippet field in SERP API responses.
Structured Data
Data organized in a predictable, machine-readable format with named fields and consistent types. SERP APIs convert raw HTML into structured data (JSON objects), making it trivial to access specific fields like position, title, or PAA questions without writing a custom parser.
T
Throttling
Server-side rate limiting that slows or degrades responses for clients exceeding request thresholds, without outright blocking them. Unlike a hard IP ban, throttling returns slower responses or partial data. Adding delays between requests and limiting concurrent connections reduces the risk of triggering it.
TLS Fingerprinting
A technique that identifies HTTP clients by the pattern of parameters in their TLS handshake - cipher suites, extensions, and their order. Each client library (Python requests, Node.js fetch, curl) has a distinct fingerprint that can reveal scraper identity even when routing through a proxy. Advanced anti-bot systems layer TLS fingerprinting on top of IP-based detection.
U
User Agent
A string sent in HTTP request headers identifying the client software - browser type, OS, version. Anti-bot systems use user agents as one signal to distinguish real browsers from scrapers. Web data APIs and web unblockers rotate realistic user agents automatically.
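Rotating a user agent is as simple as picking from a pool per request. The strings below are examples of the realistic desktop format; in production such a pool needs to be kept current.

```python
import random
import urllib.request

# Example desktop user-agent strings (illustrative; keep a real pool current).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]

req = urllib.request.Request(
    "https://example.com",
    headers={"User-Agent": random.choice(USER_AGENTS)},
)
# urllib stores header names capitalized; request built but not sent.
print(req.get_header("User-agent"))
```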
V
Viewport
The visible area of a web page as rendered by a browser, defined by width and height in pixels. When using a headless browser, setting the correct viewport (e.g. 1920x1080 for desktop, 390x844 for mobile) affects which layout breakpoints are triggered and which elements are rendered. Some anti-bot systems flag non-standard or zero-size viewports as scraper signals.
W
Web Data API
An API that returns structured data extracted from public web sources. Examples include SERP APIs (Google search results), product APIs (Amazon, Walmart), local business APIs (Google Maps), job data APIs (Google Jobs), and finance APIs (stock quotes, property listings).
Web Unblocker
A service that routes web requests through rotating residential proxies, handles JavaScript rendering, and bypasses anti-bot challenges - returning the raw HTML a real browser would see. Used when direct HTTP requests to a target site are consistently blocked.
Webhook
A mechanism where a server pushes data to a specified URL when an event occurs, rather than waiting for the client to poll. Async data APIs like DataForSEO can deliver completed task results via webhook, removing the need to repeatedly check for completion.
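Many webhook providers sign each payload with a shared secret so the receiver can verify it really came from the provider. The HMAC scheme below is a common pattern, not any specific provider's format; header and secret names are illustrative.

```python
import hashlib
import hmac

def verify_signature(payload: bytes, signature: str, secret: bytes) -> bool:
    """Recompute the HMAC-SHA256 of the body and compare in constant time."""
    expected = hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)

secret = b"webhook-secret"
payload = b'{"task_id": "abc", "status": "done"}'
sig = hmac.new(secret, payload, hashlib.sha256).hexdigest()  # sender side
print(verify_signature(payload, sig, secret))  # True
```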
X
XPath
A query language for selecting nodes from XML and HTML documents. An alternative to CSS selectors in scraping tools, often preferred for traversing complex or deeply nested structures. Supported by lxml (Python), Scrapy, and browser DevTools.
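The standard library's ElementTree supports a limited XPath subset, which is enough to show the syntax; use lxml for full XPath 1.0 against real-world HTML.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<div><ul>"
    "<li class='item'>first</li>"
    "<li class='item'>second</li>"
    "</ul></div>"
)
# .//li[@class='item'] : all li descendants with class="item"
items = [li.text for li in doc.findall(".//li[@class='item']")]
print(items)  # ['first', 'second']
```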
Related reading
What is a SERP API? →
Deep dive into how SERP APIs work, what data they return, and when to use one.
Best SERP APIs in 2026 →
7 providers tested side-by-side on pricing, data depth, and developer experience.
Real-Time Web Search API →
OpenWeb Ninja's Google SERP API - organic results, PAA, Local Pack, and more.
OpenWeb Ninja vs SerpAPI →
Head-to-head comparison on pricing, features, and code examples.
OpenWeb Ninja vs Serper →
How OWN compares to Serper on price, SERP coverage, and data breadth.
