QuickGraph#8 Cloning subgraphs between Neo4j instances with Cypher+RDF

I have two Neo4j instances: let’s call them instance-one and instance-two. My problem is simple, I want an easy way to copy fragments of the graph stored in instance-one to instance-two. In this post, I’ll explain here how to use:

  • Cypher to define the subgraph to be cloned and
  • RDF as the model for data exchange (serialisation format)

All with the help of the neosemantics plugin.

The dataset

To keep it simple, I’ll populate instance-one with the northwind graph (you can do it by following a step by step guide on your Neo4j browser: Just run :play northwind-graph and follow the instructions.

Let’s say we want to copy (from instance-one to instance-two) the subgraph of all customer orders for products in the “Beverages” category. The query that returns this subgraph is included in the guide, so le’ts reuse it. Here is what it looks like:

MATCH (cust:Customer)-[pu:PURCHASED]->(or:Order)-[o:ORDERS]->
(p:Product)-[po:PART_OF]->(c:Category {categoryName:"Beverages"}) 
RETURN *

When we run it, in instance-one it returns 450 nodes and 770 relationships.

These are essentially the nodes and relationships in all instances of this pattern terminating in the “Beverages” category.

Screenshot 2019-08-14 at 09.14.16.png

Cloning subgraphs across Neo4j instances

I’ve said many times that RDF is a good model for data exchange, so in this post, I’m going to prove it by using it as the serialisation format for copying graph data across Neo4j instances. Here’s my plan: I’ll run the previous cypher query on instance-one, I’ll serialise the output of that query as RDF using neosemantics and I’ll import the generated RDF into instance-two (again using neosemantics). It will take just a couple of lines of Cypher to do it.

Screenshot 2019-08-13 at 23.38.59.png

I’ve explained in previous posts how to import RDF into Neo4j …and also how to serialise data in Neo4j as RDF. Today we’ll be combining the two.

We’ll start by putting our query in a JSON map that we’ll use to query instance-one through the /rdf/cypher endpoint:

:param lecypher: '{ "cypher": "MATCH (cust:Customer)-[pu:PURCHASED]->(or:Order)-[o:ORDERS]->(p:Product)-[po:PART_OF]->(c:Category {categoryName:\\"Beverages\\"}) RETURN *" }'

This endpoint returns the result of a query serialised as RDF, so we’ll take this RDF and directly ingest it into instance-two using the semantics.importRDF procedure.

Here’s the complete Cypher statement we need to run on instance-two:

CALL semantics.importRDF("http://instance-one:7474/rdf/cypher","Turtle",
{ handleVocabUris: "IGNORE" , headerParams: { Authorization: "Basic " + apoc.text.base64Encode("neo4j:neo") }, payload: $lecypher})

There’s a number of config parameters being used here. You can read about all of them in the Neosemantics manual, but basically, we can see the URL we’re fetching the RDF from, along with all the details of the HTTP request to get the RDF: the query and the header Authorization param because the request is authenticated. We also can see that we’re specifying that the RDF serialisation is Turtle and that we want to ignore RDF namespaces when we import the data.

Screenshot 2019-08-14 at 09.57.41

That’s all, run it and you’ll find in instance-two a copy of the 450 nodes and 770 relationships (along with their properties, labels, etc) returned by the original Cypher query.

Read-Transform-Write

Let’s introduce a small variant of the problem. I want to apply some small transformations to the subgraph copied from instance-one to instance-two. A renaming of a label here, a change in the capitalisation of a relationship type there… this kind of things. The model mapping capability in neosemantics can help us here. Here’s how.

First, we’ll have to define the mappings for the elements we want to transform. Let’s say we want to re-label Customers as Clients, rename the customer attribute customerID as clientID and finally use relationship PLACES_ORDER instead of ORDERS.

These element mappings can be defined on instance-two (the instance importing the subgraph) with the following script. Check the neosemantics manual for more details on defining model mappings.

call semantics.mapping.addSchema("neo4j://vocabulary#","neo");
call semantics.mapping.addMappingToSchema("neo4j://vocabulary#","clientID","customerID");
call semantics.mapping.addMappingToSchema("neo4j://vocabulary#","PLACES_ORDER","ORDERS");
call semantics.mapping.addMappingToSchema("neo4j://vocabulary#","Client","Customer");

Once done, we can check all mappings have been properly created by running

call semantics.mapping.listMappings()

which should produce:

╒═════════════════════╤══════════════╤═══════════════╤══════════════╕
│"schemaNs"           │"schemaPrefix"│"schemaElement"│"elemName"    │
╞═════════════════════╪══════════════╪═══════════════╪══════════════╡
│"neo4j://vocabulary#"│"neo"         │"ORDERS"       │"PLACES_ORDER"│
├─────────────────────┼──────────────┼───────────────┼──────────────┤
│"neo4j://vocabulary#"│"neo"         │"Customer"     │"Client"      │
├─────────────────────┼──────────────┼───────────────┼──────────────┤
│"neo4j://vocabulary#"│"neo"         │"customerID"   │"clientID"    │
└─────────────────────┴──────────────┴───────────────┴──────────────┘

These are the transformations that will be applied. Every schema element in the schemaElement column will be transformed into the one in the elemName column.

Now we can re-run the subgraph cloning as before but this time setting the config parameter handleVocabUris: "MAP" instead of IGNORE as we did before. By doing this, the semantics.importRDF procedure will apply the transformations for the mapped elements on import and we’ll get the modified graph instead of an exact clone.

A couple of things to keep in mind

First one: On import, all nodes will be labelled as Resource and will be given a property called uri that contains a unique identifier needed to link nodes together properly during the export-import process. If you’re familiar with RDF you’ll be familiar with these. One can think of them as explicit unique identifiers for nodes in the graph.

Update Feb-2021: You can now use RDF-STAR as serialisation format and no information will be lost in the process. Check out this revised version of this post.

Second one: Relationship properties will be lost in this process, this is because RDF does not allow the representation of properties in edges. If you need them, your best option is to write a script that migrates them after you’ve run the cloning as described in this post.

And that’s it! Simple, right? Give it a try and share your experience. See you in the next Quickgraph.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s