Recap reads and writes schemas from web services, databases, and schema registries in a standard format.
⭐️ If you like this project, please give it a star! It helps the project get more visibility.
| Format | Read | Write |
|---|---|---|
| Avro | ✅ | ✅ |
| BigQuery | ✅ | |
| Confluent Schema Registry | ✅ | |
| Hive Metastore | ✅ | |
| JSON Schema | ✅ | ✅ |
| MySQL | ✅ | |
| PostgreSQL | ✅ | |
| Protobuf | ✅ | ✅ |
| Snowflake | ✅ | |
| SQLite | ✅ | |
Install Recap and all of its optional dependencies:
pip install 'recap-core[all]'
You can also select specific dependencies:
pip install 'recap-core[avro,kafka]'
See pyproject.toml for a list of optional dependencies.
Recap comes with a command line interface that can list and read schemas from external systems.
List the children of a URL:
recap ls postgresql://user:pass@host:port/testdb
[
  "pg_toast",
  "pg_catalog",
  "public",
  "information_schema"
]
Keep drilling down:
recap ls postgresql://user:pass@host:port/testdb/public
[
  "test_types"
]
Read the schema for the test_types table as a Recap struct:
recap schema postgresql://user:pass@host:port/testdb/public/test_types
{
  "type": "struct",
  "fields": [
    {
      "type": "int64",
      "name": "test_bigint",
      "optional": true
    }
  ]
}
Recap comes with a stateless HTTP/JSON gateway that can list and read schemas from data catalogs and databases.
Start the server at http://localhost:8000:
recap serve
List the schemas in a PostgreSQL database:
curl http://localhost:8000/gateway/ls/postgresql://user:pass@host:port/testdb
["pg_toast","pg_catalog","public","information_schema"]
And read a schema:
curl http://localhost:8000/gateway/schema/postgresql://user:pass@host:port/testdb/public/test_types
{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}
The gateway fetches schemas from external systems in real time and returns them as Recap schemas.
An OpenAPI schema is available at http://localhost:8000/docs.
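The same gateway calls can be made from Python. The following is a minimal sketch using only the standard library; it assumes the `/gateway/ls/` and `/gateway/schema/` paths shown in the curl examples, and the helper names (`gateway_url`, `fetch_json`, `field_names`, `demo`) are hypothetical, not part of Recap.

```python
import json
from urllib.request import urlopen

# Assumption: `recap serve` is running locally on the default port.
GATEWAY = "http://localhost:8000/gateway"


def gateway_url(action: str, system_url: str, base: str = GATEWAY) -> str:
    """Build a gateway URL. `action` is "ls" or "schema", as in the curl calls."""
    return f"{base}/{action}/{system_url}"


def fetch_json(url: str):
    """Send a GET request and decode the JSON response body."""
    with urlopen(url) as response:
        return json.load(response)


def field_names(struct: dict) -> list:
    """Return the names of the named fields in a Recap struct (decoded JSON)."""
    return [field["name"] for field in struct.get("fields", []) if "name" in field]


def demo() -> None:
    """Needs a running gateway and a reachable database; not called on import."""
    db = "postgresql://user:pass@host:port/testdb"
    print(fetch_json(gateway_url("ls", db)))
    struct = fetch_json(gateway_url("schema", f"{db}/public/test_types"))
    print(field_names(struct))
```

Call `demo()` once the server is up. The URL and struct helpers are kept separate from the network call so they can be checked without a running gateway.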
You can store schemas in Recap's schema registry.
Start the server at http://localhost:8000:
recap serve
Put a schema in the registry:
curl -X POST \
-H "Content-Type: application/x-recap+json" \
-d '{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]}' \
http://localhost:8000/registry/some_schema
Get the schema (and version) from the registry:
curl http://localhost:8000/registry/some_schema
[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]
Put a new version of the schema in the registry:
curl -X POST \
-H "Content-Type: application/x-recap+json" \
-d '{"type":"struct","fields":[{"type":"int32","name":"test_int","optional":true}]}' \
http://localhost:8000/registry/some_schema
List schema versions:
curl http://localhost:8000/registry/some_schema/versions
[1,2]
Get a specific version of the schema:
curl http://localhost:8000/registry/some_schema/versions/1
[{"type":"struct","fields":[{"type":"int64","name":"test_bigint","optional":true}]},1]
The registry uses fsspec to store schemas in a variety of filesystems like S3, GCS, ABS, and the local filesystem. See the registry docs for more details.
An OpenAPI schema is available at http://localhost:8000/docs.
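The registry's HTTP calls can also be scripted. This sketch mirrors the curl examples above with Python's standard library; it assumes the same `/registry/<name>` paths and `application/x-recap+json` content type, and that responses are `[schema, version]` pairs as shown. The helper names (`put_request`, `split_versioned`, `demo`) are hypothetical.

```python
import json
from urllib.request import Request, urlopen

# Assumption: `recap serve` is running locally on the default port.
REGISTRY = "http://localhost:8000/registry"


def put_request(name: str, schema: dict, base: str = REGISTRY) -> Request:
    """Build the POST request that stores `schema` under `name`."""
    return Request(
        f"{base}/{name}",
        data=json.dumps(schema).encode("utf-8"),
        headers={"Content-Type": "application/x-recap+json"},
        method="POST",
    )


def split_versioned(payload: list) -> tuple:
    """Unpack a registry response of the form [schema, version]."""
    schema, version = payload
    return schema, version


def demo() -> None:
    """Needs a running registry; not called on import."""
    schema = {
        "type": "struct",
        "fields": [{"type": "int64", "name": "test_bigint", "optional": True}],
    }
    # Store the schema, then read it back along with its version number.
    with urlopen(put_request("some_schema", schema)):
        pass
    with urlopen(f"{REGISTRY}/some_schema") as response:
        stored, version = split_versioned(json.load(response))
    print(version, stored)
```

Building the `Request` separately from sending it makes it easy to inspect the method, headers, and body before anything goes over the network.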
Recap has recap.converters and recap.clients packages.
- Converters convert schemas to and from Recap schemas.
- Clients read schemas from external systems (databases, schema registries, and so on) and use converters to return Recap schemas.
Read a schema from PostgreSQL:
from recap.clients import create_client
with create_client("postgresql://user:pass@host:port/testdb") as c:
    # Keep the returned Recap struct so it can be passed to the converters
    struct = c.schema("testdb", "public", "test_types")
Convert the schema to Avro, Protobuf, and JSON schemas:
from recap.converters.avro import AvroConverter
from recap.converters.protobuf import ProtobufConverter
from recap.converters.json_schema import JSONSchemaConverter
avro_schema = AvroConverter().from_recap(struct)
protobuf_schema = ProtobufConverter().from_recap(struct)
json_schema = JSONSchemaConverter().from_recap(struct)
Transpile schemas from one format to another:
from recap.converters.json_schema import JSONSchemaConverter
from recap.converters.avro import AvroConverter
json_schema = """
{
"type": "object",
"$id": "https://recap.build/person.schema.json",
"properties": {
"name": {"type": "string"}
}
}
"""
# Use Recap as an intermediate format to convert JSON schema to Avro
struct = JSONSchemaConverter().to_recap(json_schema)
avro_schema = AvroConverter().from_recap(struct)
Store schemas in Recap's schema registry:
from recap.storage.registry import RegistryStorage
from recap.types import StructType, IntType
storage = RegistryStorage("file:///tmp/recap-registry-storage")
version = storage.put(
"postgresql://localhost:5432/testdb/public/test_table",
StructType(fields=[IntType(32)])
)
storage.get("postgresql://localhost:5432/testdb/public/test_table")
# Get all versions of a schema
versions = storage.versions("postgresql://localhost:5432/testdb/public/test_table")
# List all schemas in the registry
schemas = storage.ls()
Recap's gateway and registry are also available as a Docker image:
docker run \
-p 8000:8000 \
-e 'RECAP_URLS=["postgresql://user:pass@localhost:5432/testdb"]' \
ghcr.io/recap-build/recap:latest
See Recap's Docker documentation for more details.
See Recap's type spec for details on Recap's type system.
Recap's documentation is available at recap.build.
