QuickGraph#8 revisited: LossLess graph copy between Neo4j DBs with RDF-star

Roughly a year and a half ago I posted QuickGraph#8 on how to copy all or part of your graph between neo4j DBs by serialising it as RDF with Neosemantics. It concluded on a sad note though, something along these lines: “Relationship properties will be lost in this process because RDF does not allow the representation of properties in edges”. Well, now we have RDF-star and the problem is solved. This is a brief update to that post where I explain how to overcome that hurdle.

The dataset

For this example, we used the northwind graph. Start your Neo4j browser and run :play northwind-graph . You will get step-by-step instructions and will have the graph loaded in just a few clicks.

The graph copying (serialise – import)

Both source and destination graphs need to have neosemantics (n10s) installed. If you’re using Neo4j desktop you can get n10s installed locally in a click.

But you can also do it manually following the instructions in the docs.

The import is run from the target DB, and it’s a simple invocation of the n10s.rdf.import.fetch method. All we need to do is configure the serialisation format and set the source of the RDF:

  • In order to make sure there’s no data loss (and by that we mean make sure properties in relationships are preserved), we select “Turtle*” as serialisation format, which is the one that the source DB will be producing for us.
  • The source of the RDF is the other Neo4j DB, so we will set the http endpoint rdf/northwind/cypher and we will add the required parameters: the actual cypher query to run on the source DB and the authentication credentials. Also we set the output serialisation format that we want which will obviously be RDF-star.

Let’s do it! But before we run the actual import, we will need to set up the graph to import RDF.

call n10s.graphconfig.init( {handleVocabUris: "IGNORE"})

Ok, so now we’re ready to run the import. Here is the single statement you’ll need to run.

WITH "http://localhost:7474/rdf/northwind/cypher" as sourceDBEndpoint,
 { headerParams: { Authorization: "Basic " + apoc.text.base64Encode("neo4j:neo") }, 
   payload: '{ "cypher": "MATCH (cust:Customer)-[pu:PURCHASED]->(or:Order)-[o:ORDERS]->(p:Product)-[po:PART_OF]->(c) RETURN " , "format": "Turtle"}'} as params
 CALL n10s.rdf.import.fetch(sourceDBEndpoint,"Turtle*", params) 
 YIELD terminationStatus, triplesLoaded, triplesParsed, namespaces, extraInfo, callParams
 RETURN terminationStatus, triplesLoaded, triplesParsed, namespaces, extraInfo, callParams

Let’s disect it:

That’s it. If you think of it, excluding the boilerplate all you need is the credentials to access the source DB, and a cypher query that determines what you want to extract from it. Neosemantics and RDF-star will take care of all the rest and you’ll have in your DB a perfect replica of the source graph.

Like we mentioned in the original post, you can even introduce some transformations in the process by defining schema element mappings, check it out there if that is of interest.

What’s interesting about this revised QG?

Well, the interesting thing is obviously that there is a new way graph serialisation format that makes it really easy to migrate data from one Neo4j instance to another without any data loss, and you can use it today. All you’ll need is your Neo4j db and neosemantics.

Also if you want to follow the RDF-star journey to (hopefully) become a W3C recommendation, join the community group here.

2 thoughts on “QuickGraph#8 revisited: LossLess graph copy between Neo4j DBs with RDF-star

  1. Thank you for your excellent contributions to Neo4j education. I’m currently working my way through the Going Meta YouTube channel as well.

    I notice that all entities imported from RDF gets the label “Resource” associated. I do understand the purpose of tagging imports like this, but the term “Resource” is quite generic and, depending on data domain, is likely to be already used as a first class citizen (label) for instance nodes in the target graph. Can you please consider enhancing the n10s.graphconfig with an option to specify the name of the tagging label (with “Resource” as default, to stay backward compatible)

    Like

    1. Fair point, Nils-Petter. I was thinking of prefixing it with an underscore or something that would minimise the chances of clashing with existing labels. I think it would be a reasonable compromise that would do the job and reduce the impact to the codebase. What do you think?

      Like

Leave a reply to Nils-Petter Ottesen Cancel reply