QuickGraph#8 revisited: LossLess graph copy between Neo4j DBs with RDF-star

Roughly a year and a half ago I posted QuickGraph#8 on how to copy all or part of your graph between neo4j DBs by serialising it as RDF with Neosemantics. It concluded on a sad note though, something along these lines: “Relationship properties will be lost in this process because RDF does not allow the representation of properties in edges”. Well, now we have RDF-star and the problem is solved. This is a brief update to that post where I explain how to overcome that hurdle.

The dataset

For this example, we used the northwind graph. Start your Neo4j browser and run :play northwind-graph . You will get step-by-step instructions and will have the graph loaded in just a few clicks.

The graph copying (serialise – import)

Both source and destination graphs need to have neosemantics (n10s) installed. If you’re using Neo4j desktop you can get n10s installed locally in a click.

But you can also do it manually following the instructions in the docs.

The import is run from the target DB, and it’s a simple invocation of the n10s.rdf.import.fetch method. All we need to do is configure the serialisation format and set the source of the RDF:

  • In order to make sure there’s no data loss (and by that we mean make sure properties in relationships are preserved), we select “Turtle*” as serialisation format, which is the one that the source DB will be producing for us.
  • The source of the RDF is the other Neo4j DB, so we will set the http endpoint rdf/northwind/cypher and we will add the required parameters: the actual cypher query to run on the source DB and the authentication credentials. Also we set the output serialisation format that we want which will obviously be RDF-star.

Let’s do it! But before we run the actual import, we will need to set up the graph to import RDF.

call n10s.graphconfig.init( {handleVocabUris: "IGNORE"})

Ok, so now we’re ready to run the import. Here is the single statement you’ll need to run.

WITH "http://localhost:7474/rdf/northwind/cypher" as sourceDBEndpoint,
 { headerParams: { Authorization: "Basic " + apoc.text.base64Encode("neo4j:neo") }, 
   payload: '{ "cypher": "MATCH (cust:Customer)-[pu:PURCHASED]->(or:Order)-[o:ORDERS]->(p:Product)-[po:PART_OF]->(c) RETURN " , "format": "Turtle"}'} as params
 CALL n10s.rdf.import.fetch(sourceDBEndpoint,"Turtle*", params) 
 YIELD terminationStatus, triplesLoaded, triplesParsed, namespaces, extraInfo, callParams
 RETURN terminationStatus, triplesLoaded, triplesParsed, namespaces, extraInfo, callParams

Let’s disect it:

That’s it. If you think of it, excluding the boilerplate all you need is the credentials to access the source DB, and a cypher query that determines what you want to extract from it. Neosemantics and RDF-star will take care of all the rest and you’ll have in your DB a perfect replica of the source graph.

Like we mentioned in the original post, you can even introduce some transformations in the process by defining schema element mappings, check it out there if that is of interest.

What’s interesting about this revised QG?

Well, the interesting thing is obviously that there is a new way graph serialisation format that makes it really easy to migrate data from one Neo4j instance to another without any data loss, and you can use it today. All you’ll need is your Neo4j db and neosemantics.

Also if you want to follow the RDF-star journey to (hopefully) become a W3C recommendation, join the community group here.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s