Developers
Scrape Exchange is built API-first. Access bulk data, query specific records, stream real-time updates, and contribute your own scraped datasets.
Access Data
Bulk Data Dumps
Full database exports are available for bulk download. Whenever possible, use the BitTorrent protocol to reduce server load — we provide .torrent files for each dump, which any standard client (qBittorrent, Transmission, etc.) can open. Distributing load across seeders means faster downloads for you and lower infrastructure costs for us.
Data dumps are available on the downloads page.
REST API
Call the REST API to retrieve specific records or apply filters not available in the bulk dumps — filter by platform, entity type, uploader, creator ID, content ID, and more. No authentication is required for read access.
Full reference: scrape.exchange/docs
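As a sketch of what a filtered read looks like, the snippet below builds a query URL and fetches JSON. The endpoint path (`/api/records`) and parameter names are assumptions for illustration; consult scrape.exchange/docs for the actual routes.

```python
# Query the read-only REST API. Endpoint path and parameter names below
# are assumed for illustration -- see scrape.exchange/docs for the real ones.
import json
import urllib.parse
import urllib.request

API_BASE = "https://scrape.exchange/api"  # assumed base URL


def build_query_url(entity_type, platform, **filters):
    """Build a record-query URL from filter parameters."""
    params = {"entity_type": entity_type, "platform": platform, **filters}
    return f"{API_BASE}/records?" + urllib.parse.urlencode(params)


def fetch_records(url):
    """Fetch and decode a JSON response; no authentication needed for reads."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


if __name__ == "__main__":
    url = build_query_url("video", "youtube", uploader="alice")
    print(url)
```

Because reads are unauthenticated, this works without an account or API key.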
Real-Time WebSocket Feed
Subscribe to a live stream of new uploads via the WebSocket listener API. Choose what you receive: full scraped data, Scrape Exchange upload metadata only, or the platform metadata for each item. Filter by platform, uploader, entity type, or content creator.
A ready-to-use listener tool is available in the scrape-python repository.
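A subscription might be expressed roughly as below. The message shape (`action`, `mode`, `filters` keys) and the feed URL are assumptions for illustration; the listener tool in the scrape-python repository shows the real protocol.

```python
# Build a WebSocket subscription message. The message shape and endpoint
# are hypothetical -- see the listener in scrape-python for the real protocol.
import json

FEED_URL = "wss://scrape.exchange/ws"  # assumed endpoint


def build_subscription(mode="full", **filters):
    """mode selects what you receive: 'full' scraped data,
    'upload_meta' (Scrape Exchange upload metadata only),
    or 'platform_meta' (the platform metadata for each item)."""
    allowed = {"full", "upload_meta", "platform_meta"}
    if mode not in allowed:
        raise ValueError(f"mode must be one of {allowed}")
    return json.dumps({"action": "subscribe", "mode": mode, "filters": filters})


# Connecting would then look roughly like this (requires the third-party
# `websockets` package):
#
#   import asyncio, websockets
#   async def listen():
#       async with websockets.connect(FEED_URL) as ws:
#           await ws.send(build_subscription("full", platform="tiktok"))
#           async for message in ws:
#               print(json.loads(message))
#   asyncio.run(listen())
```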
Contribute Data
Upload via API
Upload scraped data programmatically via the REST API. Sign up for a free account to get your API keys. When choosing a schema for your upload, use an existing one where possible — it reduces friction for people who download the data you contribute.
Python examples are available in the scrape-python repository under the tools/ folder.
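A minimal upload might look like the sketch below. The endpoint path, payload fields, and bearer-token auth scheme are all assumptions; the working examples live in the tools/ folder of scrape-python.

```python
# Sketch of a programmatic upload. Endpoint, payload fields, and auth header
# are assumed for illustration -- see tools/ in scrape-python for real examples.
import json
import urllib.request

API_BASE = "https://scrape.exchange/api"  # assumed base URL


def build_upload(schema_name, schema_version, source_url, data):
    """Package one scraped record with the schema it validates against."""
    return {
        "schema": schema_name,
        "schema_version": schema_version,
        "source_url": source_url,
        "data": data,
    }


def upload(record, api_key):
    """POST one record; api_key comes from your free account."""
    req = urllib.request.Request(
        f"{API_BASE}/uploads",
        data=json.dumps(record).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Reusing an existing schema name in `build_upload` is what keeps downloads friction-free for consumers of your data.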
Register New JSON Schemas
If you are scraping data not yet covered by an existing schema — or scraping a new platform — you can register a new JSON Schema. Schemas must meet the following requirements:
- Written against the JSON Schema specification, draft 2020-12
- Describes an entity type on one of the supported platforms (e.g. a YouTube channel, TikTok video, or Instagram comment)
- Self-contained: no external $refs; URLs may only point to the jsonschema.org or scrape.exchange domains
- Versioned with semantic versioning (semver)
Schema names must follow the convention:
For examples of valid schemas, see the schemas in scrape-python/tests/collateral.
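To make the requirements concrete, here is a sketch of a schema meeting them, written as a Python dict. The schema name, `$id` layout, and fields are hypothetical; the schemas in scrape-python/tests/collateral are the authoritative examples, and real validation should use a proper JSON Schema validator (e.g. the third-party `jsonschema` package), not the crude check below.

```python
# Hypothetical self-contained schema in the required style: draft 2020-12,
# no external $refs, URLs on allowed domains only, semver in the $id.
TIKTOK_VIDEO_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://scrape.exchange/schemas/tiktok-video/1.0.0",
    "title": "tiktok-video",
    "type": "object",
    "required": ["video_id", "author_id", "url"],
    "properties": {
        "video_id": {"type": "string"},
        "author_id": {"type": "string"},
        "url": {"type": "string", "format": "uri"},
        "caption": {"type": "string"},
        "like_count": {"type": "integer", "minimum": 0},
    },
    "additionalProperties": False,
}


def has_required_fields(schema, record):
    """Crude presence check of required fields -- not full validation."""
    return all(key in record for key in schema.get("required", []))
```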
Data Quality
Data quality is a primary focus. Contributions are most valuable when they are complete and verifiable. Before uploading, ensure that:
- Your data validates against the JSON schema you specify
- Your data is consistent with the content at the source URL included in the upload
- You cover as many schema fields as possible, not just the required ones
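The third point can be checked mechanically before uploading: compute what fraction of the schema's declared properties your record actually fills. The metric below is illustrative, not an official Scrape Exchange formula.

```python
# Illustrative pre-upload coverage check: fraction of a schema's declared
# properties that a record fills with non-null values.
def field_coverage(schema, record):
    props = schema.get("properties", {})
    if not props:
        return 0.0
    filled = sum(1 for key in props if record.get(key) is not None)
    return filled / len(props)
```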
Data quality metrics are tracked and displayed in the stats section. A reputation system for uploaders is planned for a future release.
Community
Discuss Scrape Exchange with other developers, share projects built with the data, ask for help, or give feedback: