
Spare Cores Navigator Data Dumps

This repository contains filtered data dumps of the Spare Cores Navigator database, collected via the Spare Cores Crawler tool.

The crawler is scheduled to run every 5 minutes to update spot prices at most vendors (Azure is an exception, as its price lookups are too slow to query that frequently), and hourly to update all region, availability zone, server, storage, traffic, and other vendor data at the supported vendors.

The most recent version of the collected data is available as a single compressed SQLite database file. See the Further References section for its exact location.

This repository contains JSON dumps of selected tables of this database to facilitate data exploration and change tracking of records via git diffs. Note that pricing and benchmark data are excluded due to repository size limits and the high update frequency of these datasets.

License

The data records published in this repository and the referenced SQLite database file are licensed under CC BY-SA 4.0.

If these licensing terms do not suit your use case, please contact us to discuss your needs.

Use

  • The JSON dumps in this repo are tagged with the version number of the sparecores-crawler tool that generated them. If you need a dataset compatible with an earlier version of the Crawler, check the tags and use the corresponding commit.
  • Although most tables are included in this repository, some are excluded due to repository size limits and their high update frequency. If you need pricing or benchmark data, use the SQLite database file referenced below.
  • From Python, we recommend the sparecores-data package for accessing the data, as it comes with helpers to automatically fetch the latest version of the database file and update it periodically in a background thread.
  • For other languages, or if you are looking for a managed database solution, you can query the data via our public Navigator API.
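Once the compressed database file is downloaded and decompressed, any SQLite client can query it directly. The sketch below uses Python's standard sqlite3 module; the `server` table and `vcpus` column match the JSON dumps in this repo, but the rows (and the in-memory stand-in for the downloaded file) are illustrative fixtures, not real data.

```python
import sqlite3

# In practice you would connect to the downloaded, decompressed database file,
# e.g. sqlite3.connect("path/to/downloaded.db"). Here we build a tiny in-memory
# stand-in with the same `server.vcpus` column used by the JSON dumps.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE server (vendor_id TEXT, server_id TEXT, vcpus INTEGER)")
con.executemany(
    "INSERT INTO server VALUES (?, ?, ?)",
    [("aws", "t3a.small", 2), ("aws", "u-24tb1.112xlarge", 448)],
)

# Count monitored servers with more than 200 vCPUs.
(count,) = con.execute("SELECT count(*) FROM server WHERE vcpus > 200").fetchone()
print(count)  # → 1 for this fixture
```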

Repository Structure

The repository is updated via the dump command of the sparecores-crawler tool. In short, it creates a folder for each table of the SQLite database and dumps each record as a prettified JSON file, named after its primary key(s).

Example path for the t3a.small server record by AWS:

server/aws/t3a.small.json

Example to count the number of monitored servers with 200+ vCPUs:

$ find server -name '*.json' -exec cat {} \; | jq -c 'select(.vcpus > 200)' | wc -l
30
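The same filter can be expressed with Python's standard library alone. The snippet below builds a tiny illustrative fixture in a temporary directory in place of a real checkout; in practice you would point `root` at the repository's server/ folder.

```python
import json
import tempfile
from pathlib import Path

# Illustrative fixture standing in for a real checkout of this repo:
# two server records following the server/<vendor>/<name>.json layout.
root = Path(tempfile.mkdtemp()) / "server"
for vendor, name, vcpus in [("aws", "t3a.small", 2), ("aws", "u-24tb1.112xlarge", 448)]:
    path = root / vendor
    path.mkdir(parents=True, exist_ok=True)
    (path / f"{name}.json").write_text(json.dumps({"vcpus": vcpus}))

# Count monitored servers with more than 200 vCPUs, mirroring the jq pipeline.
count = sum(
    1
    for f in root.rglob("*.json")
    if json.loads(f.read_text()).get("vcpus", 0) > 200
)
print(count)  # → 1 for this fixture
```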

Historical Records

Data collection started in Q1 of 2024 as part of the sparecores-data package. Later (in Q1 2026) we decided to separate the collected data from that thin Python package due to release-management and licensing complexities, and this repository was born with a much cleaner design.

Historical records were reconstructed in this repository from the Data repository, but you can find the original binary SQLite database file in the git history of the Data repository if needed. All backfill commits reference the original commit hash and URL on GitHub.

Further References
