There are two core datasets:
  • People flat file — around 600M profiles
  • Companies flat file — around 50M companies

Flat file contracts can be supplemented with:
  • Data change feed (only profiles that changed are delivered), delivered on a daily, weekly, or monthly cadence
  • Job change feed (only profiles that changed jobs are delivered), delivered on a daily, weekly, or monthly cadence

Every 30 to 60 days, we may deliver a full refresh when new fields become available or significant data improvements are made.
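Together, the full flat files and the change feeds suggest a simple ingestion pattern: treat a full delivery (or a full refresh) as a complete replacement, then upsert each change feed on top of it. The sketch below illustrates that pattern with SQLite from the Python standard library. It assumes each record carries a unique key in a field named `id` (hypothetical; check the actual schema). It does not handle removed profiles, because how removals are signalled is not described here.

```python
import json
import sqlite3
from typing import Iterable, Iterator, Mapping, Tuple

# Hypothetical: the field that uniquely identifies a profile.
# Verify against the real schema before relying on it.
KEY_FIELD = "id"

UPSERT_SQL = "INSERT OR REPLACE INTO profiles (id, doc) VALUES (?, ?)"


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a table holding one JSON document per profile."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS profiles (id TEXT PRIMARY KEY, doc TEXT NOT NULL)"
    )
    return conn


def _rows(records: Iterable[Mapping]) -> Iterator[Tuple[str, str]]:
    """Turn parsed records into (key, serialized JSON) rows."""
    for record in records:
        yield str(record[KEY_FIELD]), json.dumps(record)


def apply_full_refresh(conn: sqlite3.Connection, records: Iterable[Mapping]) -> None:
    """Replace the whole table with a full flat-file delivery."""
    with conn:  # the delete and the reload commit together, or roll back together
        conn.execute("DELETE FROM profiles")
        conn.executemany(UPSERT_SQL, _rows(records))


def apply_change_feed(conn: sqlite3.Connection, records: Iterable[Mapping]) -> None:
    """Upsert changed profiles; profiles absent from the feed are left untouched."""
    with conn:
        conn.executemany(UPSERT_SQL, _rows(records))
```

At hundreds of millions of rows this data would normally live in a warehouse rather than SQLite; the point here is only the replace-versus-upsert logic.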

Technical details

  • By default, these datasets are exported as JSONL files.
  • Parquet and CSV files are also supported.
  • Size:
    • Profiles: 550 GB (around 5,900 gzipped JSONL files, each containing around 100k profiles)
    • Companies: 10 GB (around 130 gzipped JSONL files, each containing around 350k companies)
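Because each file is gzipped JSONL (one JSON object per line), it can be streamed record by record without decompressing it to disk. The sketch below sanity-checks a downloaded delivery by counting files, records, and compressed bytes. The local directory layout and the `*.jsonl.gz` file-name pattern are assumptions; adjust them to what you actually receive.

```python
from __future__ import annotations

import gzip
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl_gz(path: str | Path) -> Iterator[dict]:
    """Stream records from one gzipped JSONL file without loading it into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines, e.g. a trailing newline
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc


def summarize_dir(directory: str | Path, pattern: str = "*.jsonl.gz") -> dict:
    """Count files, records, and compressed bytes in a delivery directory."""
    files = sorted(Path(directory).glob(pattern))  # pattern is an assumption
    records = sum(1 for f in files for _ in iter_jsonl_gz(f))
    size = sum(f.stat().st_size for f in files)
    return {"files": len(files), "records": records, "compressed_bytes": size}
```

As a rough cross-check of the figures above: 5,900 files × ~100k profiles ≈ 590M profiles, and 550 GB / 5,900 files ≈ 93 MB of compressed data per file; 130 files × ~350k companies ≈ 45M companies.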

How Trials Work

  1. You can request a trial of our flat files by contacting our sales team at theswarm.com/start. Please inquire about our pricing and early-stage program for startups.
  2. Once an MNDA & Data Evaluation Agreement is in place, we deliver the flat files (people + companies) to you.
  3. You get 14 days to evaluate the data at no cost.
  4. If the evaluation meets your expectations, we send a contract, and billing starts after the trial period. Note that you start receiving the daily data change feed and the weekly job change feed only once you become a paying customer.

How Delivery Works

  • We support many delivery methods, the most popular of which are: 
    • Replication into cloud storage (AWS S3, GCP, Azure, Cloudflare R2); instructions for the most popular methods are below
    • Delivery directly into data warehouse (Databricks, Snowflake etc.)
  • We offer free support and consultation on ingesting and hosting this data in your system 

Instructions for delivery to GCP

  1. Send us your Google account ID, which Google's documentation calls the “Google Cloud service account subject ID”. We need it to give you access to our S3 bucket.
  2. We will then send you the AWS IAM role ARN needed for authentication.
  3. The full data export is at s3://theswarm-data-exports/v2/profiles/full/
  4. The daily data change exports are at s3://theswarm-data-exports/v2/profiles/data-changed/daily/
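The steps above do not say which Google Cloud tool reads from the S3 bucket. One plausible setup, and it is only an assumption, is Google's Storage Transfer Service, which can read from S3 by assuming an AWS IAM role as a Google-managed service account. That account's subject ID would be the ID you send us, and the role ARN we return goes into the transfer job's S3 source. The sketch uses the `google-cloud-storage-transfer` client library; the project ID, destination bucket, and job description are hypothetical placeholders.

```python
from __future__ import annotations

from urllib.parse import urlparse


def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/some/prefix/' into ('bucket', 'some/prefix/')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def build_transfer_job(project_id: str, role_arn: str, source_uri: str, sink_bucket: str) -> dict:
    """Describe a transfer job that copies one S3 prefix into a Cloud Storage bucket."""
    bucket, path = parse_s3_uri(source_uri)
    if path and not path.endswith("/"):
        # Storage Transfer Service treats `path` as a prefix that must end in '/'.
        raise ValueError("source prefix must end with '/'")
    return {
        "project_id": project_id,
        "description": "The Swarm export",  # hypothetical label
        "transfer_spec": {
            "aws_s3_data_source": {
                "bucket_name": bucket,
                "path": path,
                "role_arn": role_arn,  # the AWS IAM role ARN we send you
            },
            "gcs_data_sink": {"bucket_name": sink_bucket},
        },
    }


def transfer_service_subject_id(project_id: str) -> str:
    """Look up the subject ID of the project's Storage Transfer service account."""
    from google.cloud import storage_transfer  # pip install google-cloud-storage-transfer

    client = storage_transfer.StorageTransferServiceClient()
    account = client.get_google_service_account({"project_id": project_id})
    return account.subject_id


def start_transfer(job: dict):
    """Create the job, then trigger one run (without a schedule it would not run on its own)."""
    from google.cloud import storage_transfer

    client = storage_transfer.StorageTransferServiceClient()
    job = dict(job, status=storage_transfer.TransferJob.Status.ENABLED)
    created = client.create_transfer_job({"transfer_job": job})
    return client.run_transfer_job(
        {"job_name": created.name, "project_id": job["project_id"]}
    )
```

For the daily feed, a second job pointed at the `data-changed/daily/` prefix, or a schedule on the job, would also be needed; scheduling is left out of this sketch.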

Instructions for delivery to AWS

  1. Create an AWS S3 bucket for data deliveries.
  2. Enable versioning on the bucket.
  3. Add the bucket policy from the snippet below, replacing YOUR-BUCKET-NAME with your bucket name.
  4. When ready, send us your bucket name and your 12-digit AWS account ID.

Policy


{
  "Version": "2012-10-17",
  "Id": "",
  "Statement": [
    {
      "Sid": "Set-permissions-for-objects",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::768778768475:role/service-role/s3crr_role_for_theswarm-data-exports_1"
      },
      "Action": [
        "s3:ReplicateObject",
        "s3:ReplicateDelete",
        "s3:ObjectOwnerOverrideToBucketOwner"
      ],
      "Resource": "arn:aws:s3:::YOUR-BUCKET-NAME/*"
    },
    {
      "Sid": "Set permissions on bucket",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::768778768475:role/service-role/s3crr_role_for_theswarm-data-exports_1"
      },
      "Action": [
        "s3:GetBucketVersioning",
        "s3:PutBucketVersioning"
      ],
      "Resource": "arn:aws:s3:::YOUR-BUCKET-NAME"
    }
  ]
}
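The AWS steps can also be scripted. The sketch below uses boto3 (the AWS SDK for Python) and assumes your local AWS credentials are allowed to create buckets and set bucket policies; the bucket name and region are placeholders you choose. It rebuilds the policy above with your bucket name substituted, and the replication role ARN is copied verbatim from that policy.

```python
import json

# Copied verbatim from the policy above: the role used to replicate into your bucket.
SWARM_REPLICATION_ROLE = (
    "arn:aws:iam::768778768475:role/service-role/"
    "s3crr_role_for_theswarm-data-exports_1"
)


def build_bucket_policy(bucket: str) -> dict:
    """Return the bucket policy above with YOUR-BUCKET-NAME replaced by `bucket`."""
    principal = {"AWS": SWARM_REPLICATION_ROLE}
    return {
        "Version": "2012-10-17",
        "Id": "",
        "Statement": [
            {
                "Sid": "Set-permissions-for-objects",
                "Effect": "Allow",
                "Principal": principal,
                "Action": [
                    "s3:ReplicateObject",
                    "s3:ReplicateDelete",
                    "s3:ObjectOwnerOverrideToBucketOwner",
                ],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Sid": "Set permissions on bucket",
                "Effect": "Allow",
                "Principal": principal,
                "Action": ["s3:GetBucketVersioning", "s3:PutBucketVersioning"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }


def prepare_destination_bucket(bucket: str, region: str) -> str:
    """Create the bucket, enable versioning, attach the policy; return your account ID."""
    import boto3  # third-party: pip install boto3

    s3 = boto3.client("s3", region_name=region)
    if region == "us-east-1":
        # us-east-1 is the default and rejects an explicit LocationConstraint.
        s3.create_bucket(Bucket=bucket)
    else:
        s3.create_bucket(
            Bucket=bucket,
            CreateBucketConfiguration={"LocationConstraint": region},
        )
    s3.put_bucket_versioning(
        Bucket=bucket, VersioningConfiguration={"Status": "Enabled"}
    )
    s3.put_bucket_policy(Bucket=bucket, Policy=json.dumps(build_bucket_policy(bucket)))
    # The 12-digit account ID to send along with the bucket name.
    return boto3.client("sts", region_name=region).get_caller_identity()["Account"]
```

Bucket names are global across AWS, so `create_bucket` fails if the name is already taken.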