Transformations on Graph Databases for Polyglot Persistence with NotaQL 2017-03-08 Johannes Schildgen schildgen@cs.uni-kl.de Yannick Krück Stefan Deßloch
Polyglot Persistence

MongoDB: db.product.insert({ })   db.category.find()
Redis: INCR dvd_174_cnt

Data Transformation (NotaQL):
OUT._id <- IN._k.split('_')[0],
OUT.clicks <- SUM(IN._v)
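The NotaQL script on this slide groups key-value pairs by the part of the key before the first underscore and sums their values. A minimal Python sketch of that behaviour, assuming the store holds integer counters keyed like `dvd_174_cnt`; the function name `clicks_per_prefix` and the sample counters are hypothetical and only illustrate the idea:

```python
from collections import defaultdict


def clicks_per_prefix(key_values):
    """Group counters by the key prefix before the first '_' and sum them.

    Mirrors: OUT._id <- IN._k.split('_')[0], OUT.clicks <- SUM(IN._v)
    """
    totals = defaultdict(int)
    for key, value in key_values.items():
        totals[key.split("_")[0]] += int(value)
    # One output document per distinct prefix, sorted for stable output.
    return [{"_id": prefix, "clicks": total} for prefix, total in sorted(totals.items())]


# Hypothetical counters, shaped like the key incremented on the slide.
counters = {"dvd_174_cnt": 3, "dvd_175_cnt": 2, "book_9_cnt": 4}
documents = clicks_per_prefix(counters)
```

Each resulting dictionary corresponds to one output document that could then be inserted into a document store.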
Graph

G = (V, E)

[Figure: two vertices v1 and v2, shown in several variants; the last variant is annotated with the numbers 0.2 and 0.6]
Property Graphs

v1 {firstname: kai, lastname: li}
v2 {firstname: ute, lastname: li}
Edge between v1 and v2: label Friend, {since: 2015-11-11}
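A property graph attaches key-value properties to both vertices and edges, and edges carry a label. A minimal in-memory sketch in Python; the class and method names are hypothetical, and the direction of the Friend edge (v1 to v2) is an assumption, since the slide text does not state it:

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    vid: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class PropertyGraph:
    vertices: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)

    def add_vertex(self, vid, **properties):
        self.vertices[vid] = Vertex(vid, dict(properties))

    def add_edge(self, source, target, label, **properties):
        self.edges.append(Edge(source, target, label, dict(properties)))

    def out_edges(self, vid, label=None):
        """Edges leaving `vid`, optionally restricted to one label."""
        return [e for e in self.edges
                if e.source == vid and (label is None or e.label == label)]


g = PropertyGraph()
g.add_vertex("v1", firstname="kai", lastname="li")
g.add_vertex("v2", firstname="ute", lastname="li")
g.add_edge("v1", "v2", "Friend", since="2015-11-11")  # direction assumed
```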
Adjacency matrix:
1 0 0 0 1
1 1 1 0 0
0 0 1 1 0
1 0 0 0 0
0 0 0 0 0
Adjacency list:
v1: v2, v5
v2: v1, v2, v3
v3: v3, v4
v4: v1
v5:
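Adjacency matrices and adjacency lists store the same information and can be converted into each other. A short Python sketch, assuming that row i, column j of the matrix is 1 when there is an edge from vertex i to vertex j; the function names are hypothetical, and the input is the 5x5 matrix as printed on the matrix slide:

```python
def matrix_to_adjacency_list(matrix):
    """Convert a square 0/1 matrix into {vertex: [successors]}."""
    names = [f"v{i + 1}" for i in range(len(matrix))]
    return {
        names[i]: [names[j] for j, bit in enumerate(row) if bit]
        for i, row in enumerate(matrix)
    }


def adjacency_list_to_matrix(adjacency):
    """Convert {vertex: [successors]} back into a square 0/1 matrix."""
    names = list(adjacency)
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for source, targets in adjacency.items():
        for target in targets:
            matrix[index[source]][index[target]] = 1
    return matrix


matrix = [
    [1, 0, 0, 0, 1],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
adjacency = matrix_to_adjacency_list(matrix)
```

The matrix needs one cell per vertex pair, while the list only stores existing edges, which matters for sparse graphs.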
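German identifiers: knoten = vertices, kanten = edges, vorname = first name, nachname = last name, geboren = born, start / ziel = source / target, folgt = follows, Bruder = brother, seit = since, priorität = priority.

knoten (vertex attributes):
vid  attribute  value
v1   vorname    Kai
v1   nachname   Li
v2   vorname    Ute
v2   nachname   Li
v2   geboren    1985-01-01

knotenlabels (vertex labels):
vid  label
v1   person
v2   person
v2   student
v3   person
v4   person

kanten (edges):
eid  start  ziel  label
e1   v1     v3    folgt
e2   v2     v3    Bruder
e3   v2     v4    folgt
e4   v3     v4    folgt

Edge attributes:
eid  attribute  value
e1   seit       2015
e3   seit       2014
e4   seit       2015
e4   priorität  5

SELECT xv.value, xn.value
FROM knoten kai, kanten, knoten xv, knoten xn, knotenlabels
WHERE kai.attribute = 'vorname' AND kai.value = 'Kai'
  AND kanten.start = kai.vid
  AND kanten.ziel = xv.vid
  AND xv.vid = knotenlabels.vid
  AND xn.vid = xv.vid
  AND xv.attribute = 'vorname'
  AND xn.attribute = 'nachname'
  AND knotenlabels.label = 'student'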
Vertices:
vid  properties
v1   {vorname: Kai, nachname: Li}
v2   {vorname: Ute, nachname: Li, geboren: date(1985-01-01)}
v3
v4

Vertex labels:
vid  label
v1   person
v2   person
v2   student
v3   person
v4   person

Edges:
eid  start  ziel  label   properties
e1   v1     v3    folgt   {seit: 2015}
e2   v2     v3    Bruder  { }
e3   v2     v4    folgt   {seit: 2014}
e4   v3     v4    folgt   {seit: 2015, Priorität: 5}
Row-id  graph          properties                                       edges
v1      label: person  vorname: Kai, nachname: Li                       folgt_v3: 2015
v2      label: person  vorname: Ute, nachname: Li, geboren: 1985-01-01  Bruder_v3, folgt_v4-2015
v3      label: person
v4      label: person
Documents in the collection personen:
{ _id: "v1", label: "person", vorname: "Kai", nachname: "Li",
  folgt: [{ _id: "v2", seit: 2015 }] }
{ _id: "v2", label: ["person", "student"], vorname: "Ute", nachname: "Li", geboren: 1985,
  folgt: [{ _id: "v4", seit: 2014, prioritaet: 5 }], Bruder: ["v3"] }
...

Query (students followed by Kai):
ergebnis = [];
// Fetch only the folgt array of every person named Kai
kai = db.personen.find({ vorname: "Kai" }, { folgt: 1 });
while (kai.hasNext()) {
    p = kai.next();
    for (i in p.folgt) {
        id = p.folgt[i]._id;
        // Look up the followed person; keep it only if it is a student
        s = db.personen.findOne({ _id: id, label: "student" },
                                { vorname: 1, nachname: 1 });
        if (s != null) { ergebnis.push(s); }
    }
}
RDF triples (subject, predicate, object):

http://dbpedia.org/resource/Krefeld_Hauptbahnhof   rdf:type      http://schema.org/place
http://dbpedia.org/resource/Krefeld_Hauptbahnhof   foaf:name     Krefeld Hauptbahnhof
http://dbpedia.org/resource/Krefeld_Hauptbahnhof   georss:point  51.325833333333335 6.569444444444445
http://dbpedia.org/resource/Krefeld_Hauptbahnhof   rdf:comment   Krefeld Hauptbahnhof ist der größte Bahnhof der Stadt Krefeld. Dort
http://dbpedia.org/resource/Krefeld_Hauptbahnhof   country       http://dbpedia.org/resource/germany
http://dbpedia.org/resource/Germany                foaf:name     Germany
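A triple store answers queries by matching patterns against (subject, predicate, object) triples, where unbound positions act as wildcards. A minimal Python sketch of such a pattern match over some of the triples on this slide; the function name `match` is hypothetical, and real triple stores would use SPARQL and indexes rather than a list scan:

```python
STATION = "http://dbpedia.org/resource/Krefeld_Hauptbahnhof"

TRIPLES = [
    (STATION, "rdf:type", "http://schema.org/place"),
    (STATION, "foaf:name", "Krefeld Hauptbahnhof"),
    (STATION, "georss:point", "51.325833333333335 6.569444444444445"),
    (STATION, "country", "http://dbpedia.org/resource/germany"),
    ("http://dbpedia.org/resource/Germany", "foaf:name", "Germany"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern; None means 'any value'."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


# All names in the data set, and everything known about the station.
names = [o for (_, _, o) in match(TRIPLES, predicate="foaf:name")]
station_facts = match(TRIPLES, subject=STATION)
```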
- Storage
- Index Support (+Apache Lucene)
- Graph Query Languages
- ACID
- REST API
- Replication
Documents:
{ _id: 77, firstname: Kate, age: 38, city: Rome }
{ _id: 19, firstname: Jane, age: 36, city: Bern }

NotaQL:
OUT._id <- IN._id,
OUT.type <- 'person',
OUT.firstname <- IN.firstname,
OUT.age <- IN.age,
OUT.city <- IN.city

Vertices:
(type: person, _id: 77, name: Kate, age: 37, city: Rome)
(type: person, _id: 19, name: Jane, age: 35, city: Bern)
Documents:
{ _id: 77, firstname: Kate, age: 38, city: Rome }
{ _id: 19, firstname: Jane, age: 36, city: Bern }

NotaQL:
OUT._id <- IN._id,
OUT.type <- 'person',
OUT.$(IN.*.name()) <- IN.@

Vertices:
(type: person, _id: 77, name: Kate, age: 37, city: Rome)
(type: person, _id: 19, name: Jane, age: 35, city: Bern)
Documents:
{ _id: 77, firstname: Kate, age: 38, city: Rome }
{ _id: 19, firstname: Jane, age: 36, city: Bern }

NotaQL:
IN-FILTER: type = 'person',
OUT._id <- IN._id,
OUT.$(IN.name()) <- IN.@

Vertices:
(type: person, _id: 77, name: Kate, age: 37, city: Rome)
(type: person, _id: 19, name: Jane, age: 35, city: Bern)
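The three scripts on these slides copy attributes between documents and vertices: first attribute by attribute, then generically for all attributes, and finally in the other direction with a filter on the vertex type. A rough Python sketch of the generic copy and the filtered copy, using plain dictionaries for both documents and vertices; the function names are hypothetical, and reading `IN-FILTER` as "keep only matching input items" is an assumption:

```python
def documents_to_vertices(documents, vertex_type="person"):
    """Copy every attribute of each document and add a fixed type.

    Roughly: OUT._id <- IN._id, OUT.type <- 'person', OUT.$(IN.*.name()) <- IN.@
    """
    vertices = []
    for document in documents:
        vertex = {"_id": document["_id"], "type": vertex_type}
        for name, value in document.items():
            vertex[name] = value  # attribute name and value are taken over as-is
        vertices.append(vertex)
    return vertices


def vertices_to_documents(vertices, vertex_type="person"):
    """Keep only vertices of the given type and copy all their properties."""
    return [dict(vertex) for vertex in vertices if vertex.get("type") == vertex_type]


documents = [
    {"_id": 77, "firstname": "Kate", "age": 38, "city": "Rome"},
    {"_id": 19, "firstname": "Jane", "age": 36, "city": "Bern"},
]
vertices = documents_to_vertices(documents)
```

The generic variant does not need to know the attribute names in advance, which suits schema-flexible stores.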
Accessing & Traversing Edges

Example graph: two vertices connected by a friend edge (since: 2016-01-01)
(type: person, _id: 77, name: Kate, age: 37, city: Rome)
(type: person, _id: 19, name: Jane, age: 35, city: Bern)

Expression               Result
IN.age                   37, 35
IN._>e                   the friend edge
IN._>e.since             2016-01-01
IN._<e.since             2016-01-01
IN._e.since              2016-01-01, 2016-01-01
IN._e_                   the vertices Jane (_id: 19) and Kate (_id: 77)
IN._e_.name              Jane, Kate
IN._e?('friend')_.name   Jane, Kate
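The results on these slides are consistent with reading `_>e` as outgoing edges, `_<e` as incoming edges, `_e` as edges in both directions, a trailing `_` as the vertex at the other end, and `?(...)` as a predicate on the edge label. A small Python sketch of these accessors under that reading; the helper names are hypothetical, and the direction of the friend edge (from Kate to Jane) is an assumption:

```python
VERTICES = {
    77: {"type": "person", "_id": 77, "name": "Kate", "age": 37, "city": "Rome"},
    19: {"type": "person", "_id": 19, "name": "Jane", "age": 35, "city": "Bern"},
}
# Assumed direction: Kate (77) -> Jane (19).
EDGES = [{"source": 77, "target": 19, "label": "friend", "since": "2016-01-01"}]


def out_edges(vid, label=None):
    """Sketch of IN._>e and IN._>e?(label)."""
    return [e for e in EDGES if e["source"] == vid and label in (None, e["label"])]


def in_edges(vid, label=None):
    """Sketch of IN._<e and IN._<e?(label)."""
    return [e for e in EDGES if e["target"] == vid and label in (None, e["label"])]


def edges(vid, label=None):
    """Sketch of IN._e: edges in both directions."""
    return out_edges(vid, label) + in_edges(vid, label)


def neighbours(vid, label=None):
    """Sketch of IN._e_: the vertices at the other end of each edge."""
    return ([VERTICES[e["target"]] for e in out_edges(vid, label)]
            + [VERTICES[e["source"]] for e in in_edges(vid, label)])


# Evaluate the expressions once per input vertex, as a transformation would.
since_any_direction = [e["since"] for vid in VERTICES for e in edges(vid)]
friend_names = [n["name"] for vid in VERTICES for n in neighbours(vid, "friend")]
```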
Creating Edges
Example graph:
(type: person, _id: 77, name: Kate, age: 37, city: Rome)
  --father--> (type: person, _id: 25, name: Carl, age: 57, city: Rome)
  --mother--> (type: person, _id: 26, name: Carla, age: 77, city: Rome)

Goal: create an edge to every person's grandmother.
New edge from Kate to Carla: grandmother {via: father}

OUT._>e?(_id = IN._>e?( mother father )_._>e?( mother )_._id ) <- EDGE( grandmother, via <- IN._>e[@]._l )
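The edge-creating script follows a parent edge (mother or father) and then a mother edge, and connects the start vertex to the vertex reached, remembering the label of the first hop in the `via` property. A rough Python sketch of that two-hop traversal over a plain edge list; the function name and the dictionary layout are hypothetical:

```python
def grandmother_edges(edges):
    """Derive grandmother edges from parent edges.

    For every path  x --mother|father--> y --mother--> z  a new edge
    x --grandmother--> z is produced, with `via` set to the first label.
    """
    new_edges = []
    for first in edges:
        if first["label"] not in ("mother", "father"):
            continue
        for second in edges:
            if second["source"] == first["target"] and second["label"] == "mother":
                new_edges.append({
                    "source": first["source"],
                    "target": second["target"],
                    "label": "grandmother",
                    "via": first["label"],
                })
    return new_edges


# The family from the slide: Kate (77), her father Carl (25), his mother Carla (26).
family = [
    {"source": 77, "target": 25, "label": "father"},
    {"source": 25, "target": 26, "label": "mother"},
]
created = grandmother_edges(family)
```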
Iterative Computations

PageRank: PR(q) = \sum_{p \in in(q)} \frac{PR(p)}{out(p)}

OUT._id <- IN._>e_._id,
OUT.pr <- SUM(IN.pr/COUNT(IN._>e._id))

Variants of the REPEAT clause, placed in front of the script:
REPEAT: 10,
REPEAT: 99999999,
REPEAT: -1,
REPEAT: pr(0.0005%),
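One execution of the script performs a single PageRank step: every vertex passes its rank, divided by its number of outgoing edges, to each successor, and the shares are summed per target vertex. A minimal Python sketch of this step and of two stopping rules. It assumes that a numeric REPEAT value means a fixed number of iterations and that `pr(0.0005%)` means stopping once no pr value changes by more than that percentage; both readings, the function names, and the `max_iterations` safety bound are assumptions of this sketch:

```python
def pagerank_step(pr, successors):
    """One iteration: PR(q) = sum over p in in(q) of PR(p) / out(p)."""
    new_pr = {vertex: 0.0 for vertex in pr}
    for p, targets in successors.items():
        if not targets:
            continue  # a vertex without outgoing edges passes on nothing
        share = pr[p] / len(targets)
        for q in targets:
            new_pr[q] += share
    return new_pr


def pagerank(pr, successors, repeat=None, tolerance=None, max_iterations=1000):
    """Iterate until `repeat` steps are done or the relative change of every
    rank is at most `tolerance` (a fraction, e.g. 0.000005 for 0.0005 %)."""
    iterations = 0
    while iterations < max_iterations and (repeat is None or iterations < repeat):
        new_pr = pagerank_step(pr, successors)
        iterations += 1
        converged = tolerance is not None and all(
            abs(new_pr[v] - pr[v]) <= tolerance * abs(pr[v]) for v in pr
        )
        pr = new_pr
        if converged:
            break
    return pr, iterations


# Hypothetical three-vertex cycle a -> b -> c -> a with uniform start ranks.
cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
start = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
ranks, steps = pagerank(start, cycle, tolerance=0.000005)
```

A convergence-based stopping rule avoids both guessing an iteration count and running far more iterations than needed.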
Implementation Details

TinkerPop Blueprints: Generic Graph API
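Blueprints API

import com.tinkerpop.blueprints.Direction;
import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.neo4j.Neo4jGraph;

Graph graph = new Neo4jGraph("/tmp/my_graph");
// Print the id and the "vorname" (first name) property of every vertex
for (Vertex v : graph.getVertices()) {
    System.out.println(v.getId());
    System.out.println(v.getProperty("vorname"));
    // Visit the outgoing edges of the vertex
    for (Edge e : v.getEdges(Direction.OUT)) {
        ...
    }
}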
JSON

Applications
(15,000 vertices and 200,000 edges)
207 min
95 s
Java & Cypher
Graph Transformations in MongoDB
23 min
2 min
Conclusions

- NotaQL language extension for graph transformations
- access / create properties and edges
- iterative algorithms
- cross-system graph transformations
- prototype based on Blueprints and Spark