QuickGraph #1 European Politics from DBpedia. Loading data from an RDF triple store into Neo4j via SPARQL

The first of a series of quick graphs in Neo4j built from public data. Watch this space! I’ll analyse a dataset on European politics by building a graph and querying across a number of dimensions.

The dataset

For this example I’ve used DBpedia data about European cities, the political parties in their local governments and their ideologies. So big disclaimer, this data is not meant to be complete or more accurate than what can be expected from DBpedia/Wikipedia data. The DBpedia stores data using the RDF model so in order to query it we’ll have to use the SPARQL query language.

Here is the SPARQL query I’ve used to extract the cities and countries and the political parties currently in the local government. The query can be tested on the DBPedia SPARQL endpoint.

select distinct ?party ?city ?cityName ?ctr ?ctrName ?pop
where {

     ?city a dbo:Location ;
          rdfs:label ?cityName ;
           dbo:country ?ctr ;
           dbo:leaderParty ?party .
     FILTER(langMatches(lang(?cityName), "EN"))

     ?ctr dct:subject dbc:Countries_in_Europe ;
          rdfs:label ?ctrName .
     FILTER(langMatches(lang(?ctrName), "EN"))

     optional { ?city dbo:populationTotal ?pop }

And a second SPARQL query that returns for each party, the ideologies they are associated with. Again, according to DBpedia/Wikipedia of the different political parties.

select distinct ?party ?partyName ?ideology ?ideologyName where {

     ?city a dbo:Location ;
           dbo:country ?ctr ;
           dbo:leaderParty ?party .

     ?ctr dct:subject dbc:Countries_in_Europe .

     ?party dbo:ideology ?ideology ;
            rdfs:label ?partyName .
     FILTER(langMatches(lang(?partyName), "EN"))

     ?ideology rdfs:label ?ideologyName
     FILTER(langMatches(lang(?ideologyName), "EN"))

SPARQL queries of type SELECT return a set of variable bindings, or in other words, a tabular structure that the endpoint can serialise as CSV. This is quite convenient because it can be used directly by the LOAD CSV instruction in Cypher.

Loading the data into Neo4j

Here are the two Cypher statements that create the model in Neo4j. You may also want to create an index on city nodes first to get better performance:

CREATE INDEX ON :City(city_uri)
WITH "http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+distinct+%3Fparty+%3Fcity+%3FcityName+%3Fctr+%3FctrName+%3Fpop+where+%7B%0D%0A%3Fcity+a+dbo%3ALocation+%3B%0D%0A++++++dbo%3Acountry+%3Fctr+%3B%0D%0A++++++dbo%3AleaderParty+%3Fparty+.%0D%0A%0D%0A%3Fctr+dct%3Asubject+dbc%3ACountries_in_Europe+.%0D%0A%0D%0A%3Fparty+dbo%3Aideology+%3Fideology+.%0D%0A%0D%0Aoptional+%7B+%3Fcity+dbo%3ApopulationTotal+%3Fpop+%7D%0D%0A%0D%0A%3Fcity+rdfs%3Alabel+%3FcityName%0D%0AFILTER%28langMatches%28lang%28%3FcityName%29%2C+%22EN%22%29%29%0D%0A%0D%0A%3Fctr+rdfs%3Alabel+%3FctrName%0D%0AFILTER%28langMatches%28lang%28%3FctrName%29%2C+%22EN%22%29%29%0D%0A%0D%0A%7D&format=csv" AS queryOnSPARQLEndpoint
WITH row
MERGE (cty:City {city_uri: row.city}) 
      SET cty.city_pop= coalesce(row.pop,0), cty.city_name = row.cityName
MERGE (ctr:Country {ctry_uri: row.ctr}) 
      SET ctr.ctry_name = row.ctrName
MERGE (pty:Party {party_uri: row.party})
MERGE (cty)-[:CTY_IN_COUNTRY]->(ctr)

Note that the query runs directly off the DBpedia SPARQL endpoint (same for the next one) so if the endpoint is down for maintenance or any other reason, this script won’t do much 🙂

WITH "http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+distinct+%3Fparty+%3FpartyName+%3Fideology+%3FideologyName+where+%7B%0D%0A%0D%0A%3Fcity+a+dbo%3ALocation+%3B%0D%0A++++++dbo%3Acountry+%3Fctr+%3B%0D%0A++++++dbo%3AleaderParty+%3Fparty+.%0D%0A%0D%0A%3Fctr+dct%3Asubject+dbc%3ACountries_in_Europe+.%0D%0A%0D%0A%3Fparty+dbo%3Aideology+%3Fideology+%3B%0D%0A+++++++rdfs%3Alabel+%3FpartyName+.%0D%0AFILTER%28langMatches%28lang%28%3FpartyName%29%2C+%22EN%22%29%29%0D%0A%0D%0A%3Fideology+rdfs%3Alabel+%3FideologyName%0D%0AFILTER%28langMatches%28lang%28%3FideologyName%29%2C+%22EN%22%29%29%0D%0A%0D%0A%7D&format=csv" AS queryOnSPARQLEndpoint
WITH row
MERGE (pty:Party {party_uri: row.party}) 
      SET pty.party_name = row.partyName
MERGE (ide:Ideology {ideology_uri: row.ideology}) 
      SET ide.ideology_name = row.ideologyName
MERGE (pty)-[:HAS_IDEOLOGY]->(ide)

And that’s pretty much it. The data in your graph should look something like this for a given city, for instance Bern in Switzerland:


Screen Shot 2016-07-27 at 10.22.33

Or like this for a few cities in Spain:

Screen Shot 2016-07-27 at 15.40.54

Querying the graph

Now a couple of interesting queries:

Which are the most transnational ideologies in Europe?

MATCH (ctr:Country)<-[:CTY_IN_COUNTRY]-(city)-[:GOVERNING_PARTY]->(party)-[:HAS_IDEOLOGY]->(i:Ideology)
RETURN i.ideology_name, COUNT(DISTINCT(ctr)) AS presentInCountries, COLLECT(DISTINCT(ctr.ctry_name)) AS CountryList
ORDER BY presentInCountries DESC LIMIT 5

Resulting in the following records:

│i.ideology_name    │presentInCountries│CountryList                   │
│Social democracy   │25                │[Bosnia and Herzegovina, Bulga│
│                   │                  │ria, Albania, Austria, Denmark│
│                   │                  │, United Kingdom, Estonia, Cze│
│                   │                  │ch Republic, Turkey, Switzerla│
│                   │                  │nd, Croatia, Cyprus, Spain, Sw│
│                   │                  │eden, Greece, Germany, France,│
│                   │                  │ Lithuania, Italy, Slovenia, S│
│                   │                  │erbia, Romania, Portugal, Norw│
│                   │                  │ay, Netherlands]              │
│Christian democracy│17                │[Croatia, Bulgaria, Bosnia and│
│                   │                  │ Herzegovina, Austria, Estonia│
│                   │                  │, Switzerland, Germany, France│
│                   │                  │, Italy, Lithuania, Spain, Slo│
│                   │                  │venia, Slovakia, Serbia, Roman│
│                   │                  │ia, Poland, Netherlands]      │
│Euroscepticism     │15                │[Ukraine, Turkey, Croatia, Swi│
│                   │                  │tzerland, Romania, Hungary, Gr│
│                   │                  │eece, Germany, France, United │
│                   │                  │Kingdom, Lithuania, Italy, Spa│
│                   │                  │in, Serbia, Netherlands]      │
│Conservatism       │14                │[Bulgaria, Kosovo, Bosnia and │
│                   │                  │Herzegovina, Austria, Turkey, │
│                   │                  │Estonia, Denmark, Cyprus, Croa│
│                   │                  │tia, France, Spain, Italy, Slo│
│                   │                  │venia, Romania]               │
│Pro-Europeanism    │13                │[Bosnia and Herzegovina, Bulga│
│                   │                  │ria, Albania, Kosovo, Turkey, │
│                   │                  │Denmark, Switzerland, Croatia,│
│                   │                  │ Greece, Spain, Romania, Polan│
│                   │                  │d, Norway]                    │

Which are the most ideologically similar parties from different countries?

MATCH (p1:Party)-[:HAS_IDEOLOGY]->(i)<-[:HAS_IDEOLOGY]-(p2:Party)
WHERE (ID(p1) < ID(p2))
WITH p1, p2, COUNT(DISTINCT(i)) AS sharedIdeologyCount, 
     COLLECT(DISTINCT(i.ideology_name)) AS sharedIdeologies
WHERE sharedIdeologyCount > 2
WHERE ctr1 <> ctr2
RETURN DISTINCT p1.party_name AS Party1, ctr1.ctry_name AS Country1, 
      p2.party_name AS Party2, ctr2.ctry_name AS Country2, 
      sharedIdeologyCount, sharedIdeologies 
ORDER BY sharedIdeologyCount DESC LIMIT 5

Producing these results:

│Party1           │Party2           │sharedIdeologyCou│sharedIdeologies │
│                 │                 │nt               │                 │
│Croatian Party of│Independent Greek│3                │[Euroscepticism, │
│ Rights from Croa│s from Greece    │                 │Social conservati│
│tia              │                 │                 │sm, National cons│
│                 │                 │                 │ervatism]        │
│Movement for Righ│Convergence and U│3                │[Liberalism, Popu│
│ts and Freedoms f│nion from Spain  │                 │lism, Centrism]  │
│rom Bulgaria     │                 │                 │                 │
│Greater Romania P│Swiss People's Pa│3                │[Euroscepticism, │
│arty from Romania│rty from Switzerl│                 │Right-wing populi│
│                 │and              │                 │sm, National cons│
│                 │                 │                 │ervatism]        │
│Movement for Righ│ANO 2011 from Cze│3                │[Liberalism, Cent│
│ts and Freedoms f│ch Republic      │                 │rism, Populism]  │
│rom Bulgaria     │                 │                 │                 │
│Independent Greek│Swiss People's Pa│3                │[Euroscepticism, │
│s from Greece    │rty from Switzerl│                 │Right-wing populi│
│                 │and              │                 │sm, National cons│
│                 │                 │                 │ervatism]        │


What’s interesting about this QuickGraph?

The graph is built directly from querying an RDF triple store via a SPARQL endpoint and consuming the output of the SPARQL query directly with ‘LOAD CSV’ in Cypher.

2 thoughts on “QuickGraph #1 European Politics from DBpedia. Loading data from an RDF triple store into Neo4j via SPARQL

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s