Data Repositories Overview

1. Six Data Repository Types/Architectures

Data Fabric
- Allows for distributed data to be quickly/easily accessed, managed, and integrated.
  - This unites data from various sources and provides orchestration, governance, and virtualization.
- Data silos (data separated from other parts of an organization's system) can be easily accessed and have their data analyzed.
  - This supports concentrated and informed decision-making.
Data Lake
- Stores structured, semi-structured, and unstructured ('raw') data in a large and centralized manner.
- Pulls data from multiple (or even many) sources, for both analysis and processing.
- Data models and schemas are not applied (data are in their native formats).
  - This allows for quick and adaptive data analysis and reading.
    - Machine learning and complex analytics/reporting can pull from various kinds of data, due to data lake integration.
Data Mart
- A specialized data warehouse subset (smaller in size and scope and concentrated on a single purpose/subject).
- Data marts can satisfy both departmental and specific business needs by focusing on directly relevant data.
  - Schemas and data models are tailored to specific data analysis/reporting/decision-making purposes.
Data Mesh
- Scalable, accessible, and decentralized through a domain-driven data ownership model.
  - Data are distributed to data owners as self-service products, with potential for data sharing/collaboration and feedback.
    - Owners/managers across an organization can conveniently claim their data.
  - Popular in enterprise environments, because data governance and scalability can get complicated (and bottleneck productivity).
Data Pipeline
- Transfers data from source to target/destination systems, for convenient data storage and/or analysis.
Data Warehouse
- Stores structured data in an organized system, with the potential for data tracking/logging.
  - Popular for business intelligence (BI), decision support, and reporting tools (especially due to consistent data formatting).
  - Data structure is provided through the use of schemas and models, which are predefined by developers/engineers.
  - Data transformation from source systems to a consistent data warehouse format is referred to as an 'extracted, transformed, and loaded', or 'ETL', process.

2. Data-Characteristic Terms

Data Observability
- Observing the performance of, and tracking/monitoring, data at any point in its lifecycle.
Data Quality
- How accurate and consistent data are.
  - Can the data be used to perform reliable analysis and make informed decisions?

TODO: Add examples of different data repo types and how they operate.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Data Repositories Overview

Table of Contents:

1. Six Data Repository Types/Architectures

2. Data-Characteristic Terms

About

Releases

Packages

chaseofthejungle/data-repositories-overview

Folders and files

Latest commit

History

Repository files navigation

Data Repositories Overview

Table of Contents:

1. Six Data Repository Types/Architectures

2. Data-Characteristic Terms

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Packages