The first of a series of quick graphs in Neo4j built from public data. Watch this space! I’ll analyse a dataset on European politics by building a graph and querying across a number of dimensions.
The dataset
For this example I’ve used DBpedia data about European cities, the political parties in their local governments and their ideologies. So big disclaimer, this data is not meant to be complete or more accurate than what can be expected from DBpedia/Wikipedia data. The DBpedia stores data using the RDF model so in order to query it we’ll have to use the SPARQL query language.
Here is the SPARQL query I’ve used to extract the cities and countries and the political parties currently in the local government. The query can be tested on the DBPedia SPARQL endpoint.
select distinct ?party ?city ?cityName ?ctr ?ctrName ?pop
where {
?city a dbo:Location ;
rdfs:label ?cityName ;
dbo:country ?ctr ;
dbo:leaderParty ?party .
FILTER(langMatches(lang(?cityName), "EN"))
?ctr dct:subject dbc:Countries_in_Europe ;
rdfs:label ?ctrName .
FILTER(langMatches(lang(?ctrName), "EN"))
optional { ?city dbo:populationTotal ?pop }
}
And a second SPARQL query that returns for each party, the ideologies they are associated with. Again, according to DBpedia/Wikipedia of the different political parties.
select distinct ?party ?partyName ?ideology ?ideologyName where {
?city a dbo:Location ;
dbo:country ?ctr ;
dbo:leaderParty ?party .
?ctr dct:subject dbc:Countries_in_Europe .
?party dbo:ideology ?ideology ;
rdfs:label ?partyName .
FILTER(langMatches(lang(?partyName), "EN"))
?ideology rdfs:label ?ideologyName
FILTER(langMatches(lang(?ideologyName), "EN"))
}
SPARQL queries of type SELECT return a set of variable bindings, or in other words, a tabular structure that the endpoint can serialise as CSV. This is quite convenient because it can be used directly by the LOAD CSV instruction in Cypher.
Loading the data into Neo4j
Here are the two Cypher statements that create the model in Neo4j. You may also want to create an index on city nodes first to get better performance:
CREATE INDEX ON :City(city_uri)
WITH "http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+distinct+%3Fparty+%3Fcity+%3FcityName+%3Fctr+%3FctrName+%3Fpop+where+%7B%0D%0A%3Fcity+a+dbo%3ALocation+%3B%0D%0A++++++dbo%3Acountry+%3Fctr+%3B%0D%0A++++++dbo%3AleaderParty+%3Fparty+.%0D%0A%0D%0A%3Fctr+dct%3Asubject+dbc%3ACountries_in_Europe+.%0D%0A%0D%0A%3Fparty+dbo%3Aideology+%3Fideology+.%0D%0A%0D%0Aoptional+%7B+%3Fcity+dbo%3ApopulationTotal+%3Fpop+%7D%0D%0A%0D%0A%3Fcity+rdfs%3Alabel+%3FcityName%0D%0AFILTER%28langMatches%28lang%28%3FcityName%29%2C+%22EN%22%29%29%0D%0A%0D%0A%3Fctr+rdfs%3Alabel+%3FctrName%0D%0AFILTER%28langMatches%28lang%28%3FctrName%29%2C+%22EN%22%29%29%0D%0A%0D%0A%7D&format=csv" AS queryOnSPARQLEndpoint LOAD CSV WITH HEADERS FROM queryOnSPARQLEndpoint AS row WITH row MERGE (cty:City {city_uri: row.city}) SET cty.city_pop= coalesce(row.pop,0), cty.city_name = row.cityName MERGE (ctr:Country {ctry_uri: row.ctr}) SET ctr.ctry_name = row.ctrName MERGE (pty:Party {party_uri: row.party}) MERGE (cty)-[:CTY_IN_COUNTRY]->(ctr) MERGE (cty)-[:GOVERNING_PARTY]->(pty)
Note that the query runs directly off the DBpedia SPARQL endpoint (same for the next one) so if the endpoint is down for maintenance or any other reason, this script won’t do much 🙂
WITH "http://dbpedia.org/sparql?default-graph-uri=http%3A%2F%2Fdbpedia.org&query=select+distinct+%3Fparty+%3FpartyName+%3Fideology+%3FideologyName+where+%7B%0D%0A%0D%0A%3Fcity+a+dbo%3ALocation+%3B%0D%0A++++++dbo%3Acountry+%3Fctr+%3B%0D%0A++++++dbo%3AleaderParty+%3Fparty+.%0D%0A%0D%0A%3Fctr+dct%3Asubject+dbc%3ACountries_in_Europe+.%0D%0A%0D%0A%3Fparty+dbo%3Aideology+%3Fideology+%3B%0D%0A+++++++rdfs%3Alabel+%3FpartyName+.%0D%0AFILTER%28langMatches%28lang%28%3FpartyName%29%2C+%22EN%22%29%29%0D%0A%0D%0A%3Fideology+rdfs%3Alabel+%3FideologyName%0D%0AFILTER%28langMatches%28lang%28%3FideologyName%29%2C+%22EN%22%29%29%0D%0A%0D%0A%7D&format=csv" AS queryOnSPARQLEndpoint LOAD CSV WITH HEADERS FROM queryOnSPARQLEndpoint AS row WITH row MERGE (pty:Party {party_uri: row.party}) SET pty.party_name = row.partyName MERGE (ide:Ideology {ideology_uri: row.ideology}) SET ide.ideology_name = row.ideologyName MERGE (pty)-[:HAS_IDEOLOGY]->(ide)
And that’s pretty much it. The data in your graph should look something like this for a given city, for instance Bern in Switzerland:
Or like this for a few cities in Spain:
Querying the graph
Now a couple of interesting queries:
MATCH (ctr:Country)<-[:CTY_IN_COUNTRY]-(city)-[:GOVERNING_PARTY]->(party)-[:HAS_IDEOLOGY]->(i:Ideology) RETURN i.ideology_name, COUNT(DISTINCT(ctr)) AS presentInCountries, COLLECT(DISTINCT(ctr.ctry_name)) AS CountryList ORDER BY presentInCountries DESC LIMIT 5
Resulting in the following records:
╒═══════════════════╤══════════════════╤══════════════════════════════╕ │i.ideology_name │presentInCountries│CountryList │ ╞═══════════════════╪══════════════════╪══════════════════════════════╡ │Social democracy │25 │[Bosnia and Herzegovina, Bulga│ │ │ │ria, Albania, Austria, Denmark│ │ │ │, United Kingdom, Estonia, Cze│ │ │ │ch Republic, Turkey, Switzerla│ │ │ │nd, Croatia, Cyprus, Spain, Sw│ │ │ │eden, Greece, Germany, France,│ │ │ │ Lithuania, Italy, Slovenia, S│ │ │ │erbia, Romania, Portugal, Norw│ │ │ │ay, Netherlands] │ ├───────────────────┼──────────────────┼──────────────────────────────┤ │Christian democracy│17 │[Croatia, Bulgaria, Bosnia and│ │ │ │ Herzegovina, Austria, Estonia│ │ │ │, Switzerland, Germany, France│ │ │ │, Italy, Lithuania, Spain, Slo│ │ │ │venia, Slovakia, Serbia, Roman│ │ │ │ia, Poland, Netherlands] │ ├───────────────────┼──────────────────┼──────────────────────────────┤ │Euroscepticism │15 │[Ukraine, Turkey, Croatia, Swi│ │ │ │tzerland, Romania, Hungary, Gr│ │ │ │eece, Germany, France, United │ │ │ │Kingdom, Lithuania, Italy, Spa│ │ │ │in, Serbia, Netherlands] │ ├───────────────────┼──────────────────┼──────────────────────────────┤ │Conservatism │14 │[Bulgaria, Kosovo, Bosnia and │ │ │ │Herzegovina, Austria, Turkey, │ │ │ │Estonia, Denmark, Cyprus, Croa│ │ │ │tia, France, Spain, Italy, Slo│ │ │ │venia, Romania] │ ├───────────────────┼──────────────────┼──────────────────────────────┤ │Pro-Europeanism │13 │[Bosnia and Herzegovina, Bulga│ │ │ │ria, Albania, Kosovo, Turkey, │ │ │ │Denmark, Switzerland, Croatia,│ │ │ │ Greece, Spain, Romania, Polan│ │ │ │d, Norway] │ └───────────────────┴──────────────────┴──────────────────────────────┘
Which are the most ideologically similar parties from different countries?
MATCH (p1:Party)-[:HAS_IDEOLOGY]->(i)<-[:HAS_IDEOLOGY]-(p2:Party)
WHERE (ID(p1) < ID(p2))
WITH p1, p2, COUNT(DISTINCT(i)) AS sharedIdeologyCount,
COLLECT(DISTINCT(i.ideology_name)) AS sharedIdeologies
WHERE sharedIdeologyCount > 2
MATCH (p1)<-[:GOVERNING_PARTY]-()-[:CTY_IN_COUNTRY]->(ctr1),
(p2)<-[:GOVERNING_PARTY]-()-[:CTY_IN_COUNTRY]->(ctr2)
WHERE ctr1 <> ctr2
RETURN DISTINCT p1.party_name AS Party1, ctr1.ctry_name AS Country1,
p2.party_name AS Party2, ctr2.ctry_name AS Country2,
sharedIdeologyCount, sharedIdeologies
ORDER BY sharedIdeologyCount DESC LIMIT 5
Producing these results:
╒═════════════════╤═════════════════╤═════════════════╤═════════════════╕ │Party1 │Party2 │sharedIdeologyCou│sharedIdeologies │ │ │ │nt │ │ ╞═════════════════╪═════════════════╪═════════════════╪═════════════════╡ │Croatian Party of│Independent Greek│3 │[Euroscepticism, │ │ Rights from Croa│s from Greece │ │Social conservati│ │tia │ │ │sm, National cons│ │ │ │ │ervatism] │ ├─────────────────┼─────────────────┼─────────────────┼─────────────────┤ │Movement for Righ│Convergence and U│3 │[Liberalism, Popu│ │ts and Freedoms f│nion from Spain │ │lism, Centrism] │ │rom Bulgaria │ │ │ │ ├─────────────────┼─────────────────┼─────────────────┼─────────────────┤ │Greater Romania P│Swiss People's Pa│3 │[Euroscepticism, │ │arty from Romania│rty from Switzerl│ │Right-wing populi│ │ │and │ │sm, National cons│ │ │ │ │ervatism] │ ├─────────────────┼─────────────────┼─────────────────┼─────────────────┤ │Movement for Righ│ANO 2011 from Cze│3 │[Liberalism, Cent│ │ts and Freedoms f│ch Republic │ │rism, Populism] │ │rom Bulgaria │ │ │ │ ├─────────────────┼─────────────────┼─────────────────┼─────────────────┤ │Independent Greek│Swiss People's Pa│3 │[Euroscepticism, │ │s from Greece │rty from Switzerl│ │Right-wing populi│ │ │and │ │sm, National cons│ │ │ │ │ervatism] │ └─────────────────┴─────────────────┴─────────────────┴─────────────────┘
What’s interesting about this QuickGraph?
The graph is built directly from querying an RDF triple store via a SPARQL endpoint and consuming the output of the SPARQL query directly with ‘LOAD CSV’ in Cypher.
2 thoughts on “QuickGraph #1 European Politics from DBpedia. Loading data from an RDF triple store into Neo4j via SPARQL”