blank nodes reuse memory addresses, causing problems for persistent stores #418

@doriantaylor

Here's another good one: I created rdf-lmdb a while ago but only recently noticed, when I happened to have a structure containing a lot of RDF::Lists, that blank node IDs are derived from memory addresses that reliably repeat across runs on Linux (though only occasionally on macOS, which is why it took me so long to notice). The net effect is that bnodes created in different runs can end up with the same ID, so the bnode structures in persistent storage get all snarled up together.

Now, I should probably go and make some kind of bnode mapping table in rdf-lmdb, but I'm also wondering if it would make sense to generalize such a mapping table, since it's a common thing. That, or at least have some way to override RDF::Node's ID generating function in the singleton class (which is what I ended up doing to solve my short-term problem).
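To make the short-term workaround concrete, here is a minimal sketch of overriding an ID generator in a singleton class. It uses a stand-in `Node` class rather than RDF::Node itself, and the `generate_id` hook is a hypothetical name introduced only for illustration; RDF.rb's actual internals may be organized differently.

```ruby
require 'securerandom'

# Stand-in for illustration only; this is NOT RDF::Node.
class Node
  attr_reader :id

  def initialize(id = nil)
    @id = (id || self.class.generate_id(self)).to_s.freeze
  end

  # Default generator (hypothetical hook): derived from the object id,
  # so the same values can show up again in a later process.
  def self.generate_id(node)
    "g#{node.__id__}"
  end
end

# Override the generator in the singleton class so that every new node
# gets a random UUID-based ID instead.
class << Node
  def generate_id(_node)
    # Leading letter keeps the ID usable as a blank node label.
    "u#{SecureRandom.uuid.delete('-')}"
  end
end

a = Node.new
b = Node.new
puts a.id
puts b.id
```

An explicitly supplied ID still bypasses the generator, so nodes read back from a store keep whatever identifier they were stored with.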

Anyway, I think what I am proposing is a module that one can include into, for example, a persistent RDF::Repository, that will automatically map whatever bnodes get thrown at it to guaranteed-unique (does that count as skolemized?) identifiers that are passed along to the persistent store, and translated back on the way out. This mapping would be kept around for the duration of the process. As for the guaranteed-unique identifiers, the answer is of course UUIDs (which indeed the current RDF::Node implementation is already capable of), though because it is shorter, I propose using the uuid-ncname representation I designed some time ago for just this purpose. See my monkey-patched implementation for what that looks like:

[8] pry(main)> n = RDF::Node.new
=> #<RDF::Node:0x3e8(_:Enc7Jm1BRnK9y9jKSMCzBJ)>
[9] pry(main)> n.id
=> "Enc7Jm1BRnK9y9jKSMCzBJ"
[10] pry(main)> UUID::NCName.from_ncname n.id, version: 1
=> "9dcec99b-5051-49ca-9f72-f63292302cc1"
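The proposed mapping table could be sketched roughly as follows. This is plain Ruby with no dependency on RDF.rb or rdf-lmdb; the class name `BNodeMap` and its methods `externalize` and `internalize` are hypothetical, and it uses plain hex UUIDs from `SecureRandom` rather than the uuid-ncname encoding to stay self-contained.

```ruby
require 'securerandom'

# Hypothetical helper, not part of RDF.rb or rdf-lmdb: maps process-local
# blank node IDs to UUID-based IDs for storage, and back again on read.
# The mapping lives only as long as the process does.
class BNodeMap
  def initialize
    @to_store = {} # process-local id => persistent id
    @to_local = {} # persistent id    => process-local id
  end

  # Return the persistent ID for a local ID, minting one on first sight.
  def externalize(local_id)
    @to_store.fetch(local_id) do
      # Leading letter keeps the ID usable as a blank node label.
      stored = "u#{SecureRandom.uuid.delete('-')}"
      @to_store[local_id] = stored
      @to_local[stored]   = local_id
      stored
    end
  end

  # Return the local ID for a persistent ID. IDs that this process did
  # not mint (e.g. written by an earlier run) are returned unchanged.
  def internalize(stored_id)
    @to_local.fetch(stored_id, stored_id)
  end
end

map    = BNodeMap.new
stored = map.externalize('g1000')
puts stored
puts map.internalize(stored)
```

Because a fresh `BNodeMap` mints new identifiers, the same address-derived ID appearing in two different runs would map to two different stored IDs, which is the property the persistent store needs. Passing unknown stored IDs through unchanged is one possible design choice; it assumes everything already in the store was written through such a map.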
