-- Users and their posts SELECT u.name, p.title FROM users u JOIN posts p ON p.user_id = u.id;
"For several decades, developers have tried to accommodate connected, semi-structured datasets inside relational databases. But whereas relational databases were initially designed to codify paper forms and tabular structures — something they do exceedingly well — they struggle when attempting to model the adhoc, exceptional relationships that crop up in the real world. Ironically, relational databases deal poorly with relationships."
Graph Databases — Ian Robinson, Jim Webber, Emil Eifrem
Users and Friendships.
CREATE TABLE users ( id INT PRIMARY KEY, name VARCHAR(100) ); CREATE TABLE friendships ( user_id INT REFERENCES users(id), friend_id INT REFERENCES users(id), PRIMARY KEY (user_id, friend_id) );
Alice ── Bob ── Dave ── Frank └── Carol ── Eve
SELECT u.name AS friend FROM friendships f JOIN users u ON u.id = f.friend_id WHERE f.user_id = 1;
SELECT DISTINCT u2.name AS fof FROM friendships f1 JOIN friendships f2 ON f2.user_id = f1.friend_id JOIN users u2 ON u2.id = f2.friend_id WHERE f1.user_id = 1 AND f2.friend_id != 1 AND f2.friend_id NOT IN ( SELECT friend_id FROM friendships WHERE user_id = 1 );
SELECT DISTINCT u3.name AS third FROM friendships f1 JOIN friendships f2 ON f2.user_id = f1.friend_id JOIN friendships f3 ON f3.user_id = f2.friend_id JOIN users u3 ON u3.id = f3.friend_id WHERE f1.user_id = 1 AND f3.friend_id != 1 AND f3.friend_id NOT IN ( SELECT friend_id FROM friendships WHERE user_id = 1 ) AND f3.friend_id NOT IN ( SELECT f2i.friend_id FROM friendships f1i JOIN friendships f2i ON f2i.user_id = f1i.friend_id WHERE f1i.user_id = 1 );
| Depth | JOINs needed | Subqueries | Lines of SQL | Status |
|---|---|---|---|---|
| 1 hop | 1 | 0 | ~5 | ✓ Clean |
| 2 hops | 2 | 1 | ~12 | Needs a comment |
| 3 hops | 3 | 2 | ~25 | Requires explanation |
| N hops | N | N−1 | O(N²) | Unmaintainable |
| Variable depth | ? | ? | ? | Impossible without recursion |
WITH RECURSIVE reachable AS ( -- Base case: Alice's direct friends (depth 1) SELECT f.friend_id AS user_id, 1 AS depth, ARRAY[1, f.friend_id] AS visited -- track path to prevent cycles FROM friendships f WHERE f.user_id = 1 UNION ALL -- Recursive case: follow one more hop SELECT f.friend_id, r.depth + 1, r.visited || f.friend_id FROM friendships f JOIN reachable r ON r.user_id = f.user_id WHERE f.friend_id != ALL(r.visited) -- avoid cycles AND r.depth < 3 -- stop at depth 3 ) SELECT u.name, r.depth FROM reachable r JOIN users u ON u.id = r.user_id WHERE r.user_id != 1 ORDER BY r.depth, u.name;
| Database | Support | Notes |
|---|---|---|
| PostgreSQL | ✅ Full | Since v8.4 (2008). Cycle detection via CYCLE clause since v14 |
| MySQL 8.0+ | ✅ Full | Since v8.0 (2018). Not in MySQL 5.x — still common in legacy |
| SQL Server | ✅ Full | Max recursion depth defaults to 100 — set OPTION (MAXRECURSION N) |
| Oracle | ✅ Full | Since 11g. Also has proprietary CONNECT BY syntax |
| SQLite | ✅ Full | Since v3.8.3 (2014) |
| BigQuery | ✅ Full | Added in 2022 — notably late for a major platform |
| Redshift | ⚠️ Partial | Poor performance on deep recursion — not recommended for graph traversal |
| Hive / Spark SQL | ❌ None | Workaround: iterative jobs or graph libraries (GraphX, GraphFrames) |
| Type | What it means | Example |
|---|---|---|
| Directed | A→B ≠ B→A | "follows" on X |
| Undirected | A—B = B—A | "friends" on Facebook |
| Weighted | Edges carry a value | Road distances |
| Labelled | Nodes/edges carry type info | Property graph |
clients
(id, name, age)
accounts
(id, client_id, balance)
transactions
(id, from_account_id,
to_account_id, amount, date)
ip_addresses
(id, value)
account_ip
(account_id, ip_id)
-- "Which clients share an IP?"
SELECT c1.name, c2.name, ip.value
FROM clients c1
JOIN accounts a1 ON a1.client_id = c1.id
JOIN account_ip ai1 ON ai1.account_id = a1.id
JOIN account_ip ai2 ON ai2.ip_id = ai1.ip_id
JOIN accounts a2 ON a2.id = ai2.account_id
JOIN clients c2 ON c2.id = a2.client_id
JOIN ip_addresses ip ON ip.id = ai1.ip_id
WHERE c1.id < c2.id;
(:Client)-[:OWNS]->(:Account) -[:TRANSFERRED_TO]->(:Account) (:Account)-[:USES]->(:IPAddress)
MATCH (c1:Client)-[:OWNS]->(a1:Account) -[:USES]->(ip:IPAddress) <-[:USES]-(a2:Account) <-[:OWNS]-(c2:Client) WHERE c1.id < c2.id RETURN c1.name, c2.name, ip.value
-- Friends within 3 hops — in Cypher (same line, any depth) MATCH (:User {name:"Alice"})-[:FRIENDS_WITH*1..3]->(friend) RETURN DISTINCT friend.name
| Model | Storage unit | Query lang |
|---|---|---|
| Property graph | Node + typed rel + props | Cypher, Gremlin |
| RDF / Triple store | Subject–Predicate–Object | SPARQL |
| Multi-model | Documents + graphs | AQL |
MATCH (a:User)-[:FRIENDS_WITH]->(b:User) WHERE a.name = "Alice" RETURN b.name
MATCH (:User {name: "Alice"})-[:FRIENDS_WITH*1..3]->(friend) RETURN DISTINCT friend.name;
3 to 6 — still one line. Change 3 to a variable — still one line.SELECT DISTINCT m.title FROM movies m JOIN watchings w ON w.movie_id = m.id JOIN friendships f ON f.user_id_2 = w.user_id JOIN users u ON u.id = f.user_id_1 WHERE u.name = 'Alice' AND m.id NOT IN ( SELECT w2.movie_id FROM watchings w2 JOIN users u2 ON u2.id = w2.user_id WHERE u2.name = 'Alice' );
SELECT DISTINCT m.title FROM movies m JOIN watchings w ON w.movie_id = m.id JOIN friendships f1 ON f1.user_id_2 = w.user_id JOIN friendships f2 ON f2.user_id_2 = f1.user_id_1 JOIN users alice ON alice.id = f2.user_id_1 WHERE alice.name = 'Alice' AND m.id NOT IN ( SELECT w2.movie_id FROM watchings w2 JOIN users u ON u.id = w2.user_id WHERE u.name = 'Alice' );
MATCH (alice:Person {name: "Alice"}) -[:FRIENDS_WITH]->(friend) -[:WATCHED]->(m:Movie) WHERE NOT (alice)-[:WATCHED]->(m) RETURN DISTINCT m.title
MATCH (alice:Person {name: "Alice"}) -[:FRIENDS_WITH*1..2]->(friend) -[:WATCHED]->(m:Movie) WHERE NOT (alice)-[:WATCHED]->(m) RETURN DISTINCT m.title, count(friend) AS recommended_by ORDER BY recommended_by DESC
*1..2 → *1..3. One token change.// Find all clients MATCH (c:Client) RETURN c.id, c.name
SELECT * FROM clients — but we describe the shape of data, not the table name.// Who owns which account? MATCH (c:Client)-[:OWNS]->(a:Account) RETURN c.name, a.id, a.balance ORDER BY a.balance DESC
// Accounts with balance under 500 MATCH (c:Client)-[:OWNS]->(a:Account) WHERE a.balance < 500 RETURN c.name, a.id, a.balance
// Accounts that share an IP address — fraud signal MATCH (a1:Account)-[:USES]->(ip:IPAddress) <-[:USES]-(a2:Account) WHERE a1.id < a2.id -- avoid duplicate pairs RETURN a1.id, a2.id, ip.value AS shared_ip
// All accounts reachable within 3 transfer hops from ACC-001 MATCH path = (:Account {id: 'ACC-001'}) -[:TRANSFERRED_TO*1..3]-> (b:Account) RETURN b.id, length(path) AS hops, [n IN nodes(path) | n.id] AS chain
*1..3 — follow this relationship type between 1 and 3 times.3 to 10. The query is identical. No rewrite.// Shortest transfer chain between two accounts MATCH path = shortestPath( (:Account {id: 'ACC-001'}) -[:TRANSFERRED_TO*]-> (:Account {id: 'ACC-004'}) ) RETURN [n IN nodes(path) | n.id] AS chain, length(path) AS hops
shortestPath() is a built-in function — not something you implement.// Hub accounts — anomalously high incoming transfer count MATCH (src:Account)-[:TRANSFERRED_TO]->(hub:Account) RETURN hub.id, count(src) AS incoming_count, sum(src.balance) AS total_incoming ORDER BY incoming_count DESC
GROUP BY in SQL — but operating over a traversal result.count(), sum(), avg(), collect() all work as expected.| Clause | Purpose | SQL analogy |
|---|---|---|
| MATCH | Find a pattern in the graph | FROM + JOIN |
| WHERE | Filter on properties | WHERE |
| RETURN | Output columns | SELECT |
| CREATE | Create node or relationship | INSERT |
| MERGE | Create if not exists | INSERT … ON CONFLICT |
| SET | Update a property | UPDATE |
| DELETE / DETACH DELETE | Remove node or relationship | DELETE |
| WITH | Pipeline intermediate results | CTE |
| UNWIND | Expand a list into rows | UNNEST / LATERAL |
MATCH works like INNER JOIN: if the pattern has no match, the row is dropped entirely.MATCH (p:Person)-[:WATCHED]->(m:Movie)OPTIONAL MATCH — if pattern not found, variables are null.MATCH (p:Person) OPTIONAL MATCH (p)-[:WATCHED]->(m:Movie) RETURN p.name, m.title
MATCH (p:Person) OPTIONAL MATCH (p)-[:WATCHED]->(m:Movie) RETURN p.name, count(m) AS watched_count ORDER BY watched_count DESC
MATCH (p:Person) WHERE NOT (p)-[:WATCHED]->(:Movie) RETURN p.name AS never_watched
WITH passes results from one stage to the next — like a Unix pipe or a SQL CTE.// ERROR: can't use count() in WHERE directly MATCH (p:Person)-[:FRIENDS_WITH]->(f) WHERE count(f) > 2 RETURN p.name
MATCH (p:Person)-[:FRIENDS_WITH]->(f) WITH p, count(f) AS friends WHERE friends > 2 RETURN p.name, friends ORDER BY friends DESC
MATCH (p:Person)-[:FRIENDS_WITH]->(f) WITH p, count(f) AS friends ORDER BY friends DESC LIMIT 3 MATCH (p)-[:WATCHED]->(m:Movie) RETURN p.name, m.title, friends
FROM + JOIN, then WITH is a subquery you can keep building on — but it reads much cleaner.// PageRank — find influential accounts CALL gds.pageRank.stream('myGraph') YIELD nodeId, score RETURN gds.util.asNode(nodeId).id AS account, score ORDER BY score DESC LIMIT 10
# Official Python driver from graphdatascience import GraphDataScience gds = GraphDataScience(uri, auth=(user, pwd)) results = gds.pageRank.stream(G) # returns a Pandas DataFrame
| Task | Right tool |
|---|---|
| Social graph, recommendations | Graph DB |
| Hierarchies, dependency trees | Graph DB |
| Fraud detection, routing | Graph DB |
| Reports, aggregations | PostgreSQL |
| Flat logs, time series | ClickHouse |
| Documents, flexible schema | MongoDB |
| Cache, sessions, counters | Redis |
| Type | Examples | Best for |
|---|---|---|
| Document | MongoDB, CouchDB | Semi-structured records |
| Key-value | Redis, DynamoDB | Caching, sessions |
| Column-family | Cassandra, HBase | Time series, write-heavy |
| Graph | Neo4j, FalkorDB | Connected data, traversals |