Knowledge Graph Storage Component #99

@cbizon

On our Miro board, we have a component labeled "KG-Hub". This is the location where KGX or similar files created by data ingestion are held and transferred to the data access working group.

We need to make a few decisions around this component.

First, what is it called? KG-Hub is a specific tool which we might or might not use (see below). To avoid confusion, I propose that we call this component something else until we decide what it is. Maybe something simple like KGStore.

Second, how is KGStore implemented? I think we're expecting that these files will be kept in some form of cloud storage such as an S3 bucket (or whatever is appropriate for the cloud provider that ITRB is using).

But on top of that, do we want to layer anything around that bucket? I am aware of 3 options; there are probably more:

  1. Nope, just a regular old bare bucket. We make some rules about the structure of the filesystem. Very low overhead/upkeep, but also provides the least support.
  2. KG-Hub (https://github.com/knowledge-Graph-Hub). This tool comes from LBNL, so we have institutional knowledge. It handles metadata and references graphs that are stored in other places (e.g. our bucket).
  3. lakeFS (https://lakefs.io/) gives you git-like versioning and actions on the data. It can be backed by numerous storage types. We have some experience with this from a different project (ProtoOKN).
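
For option 1, the "rules about the structure of the filesystem" could be as small as a key-naming convention plus a validator that ingest runs before uploading. Here is a minimal sketch, assuming a hypothetical `<source>/<version>/<filename>` layout. The layout itself, `GraphFileKey`, and `parse_key` are all illustrative names, not something we have agreed on.

```python
import re
from dataclasses import dataclass

# Hypothetical bucket layout (an assumption, not an agreed convention):
#   <source>/<version>/<filename>
KEY_PATTERN = re.compile(
    r"(?P<source>[a-z0-9][a-z0-9_-]*)/"
    r"(?P<version>[A-Za-z0-9][A-Za-z0-9._-]*)/"
    r"(?P<filename>[A-Za-z0-9][A-Za-z0-9._-]*)"
)


@dataclass(frozen=True)
class GraphFileKey:
    """One file in the store, identified by source, version, and file name."""

    source: str
    version: str
    filename: str

    def to_key(self) -> str:
        # Build the object key that would be used inside the bucket.
        return f"{self.source}/{self.version}/{self.filename}"


def parse_key(key: str) -> GraphFileKey:
    """Parse an object key, rejecting anything that breaks the layout."""
    match = KEY_PATTERN.fullmatch(key)
    if match is None:
        raise ValueError(f"key does not follow the layout: {key!r}")
    return GraphFileKey(**match.groupdict())
```

With this, `parse_key("example-source/2024-01/nodes.jsonl")` yields a `GraphFileKey`, while a key that is missing the version segment raises `ValueError`. Anything beyond naming (metadata, versioning, provenance) would still be on us, which is the trade-off against options 2 and 3.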

For the group, I have a few questions:

@sierra-moxon (?) can you say a couple of words about what features KG-Hub could provide?
@YaphetKG could you add a comment about what lakeFS could provide?

Others: are there other options in this space that we should consider?
