**Graphs**

# Definition

A *graph* is a data structure that consists of a set of nodes (also called vertices) and a set of edges that connect pairs of nodes: $G=(V,E)$, where $V$ is the set of vertices and $E$ is the set of edges.

Example:

```
        A
       / \
      /   \
     /     \
    B       C
   / \       \
  /   \       \
 /     \       \
D       E ----- J
 \     /
  \   /
   \ /
    G
```

where the vertices are `A, B, C, D, E, G, J` and the edges are `(A,B), (A,C), (B,D), (B,E), (C,J), (D,G), (E,G), (E,J)`.

A graph does not have to be "in one piece":

```
     V0
    /  \
   /    \
 V4------V1      V5---V6
                 |    |
                 |    |
        V3------V8---V7----V2
          \    /       \  /
           \  /         \/
            V9----------V10
```

# Basic Operations and Notations

Basic operations on graphs may include:

- `adjacent(u,v)`: check if there is an edge between vertices `u` and `v`
- `neighbors(u)`: return the set of vertices that are adjacent to vertex `u`
- `add_vertex(v)`: add a vertex `v` to the graph
- `add_edge(u,v)`: add an edge between vertices `u` and `v`
- `remove_vertex(v)`: remove vertex `v` and all edges connected to it
- `remove_edge(u,v)`: remove the edge between vertices `u` and `v`

Similar to tree nodes, graph vertices may contain additional data: a label, problem-specific data, etc. To retrieve or update the data, we can use

- `get_vertex_data(v)`
- `set_vertex_data(v, data)`

Edges may also contain additional data (weight, label), and we can use

- `get_edge_data(u,v)`
- `set_edge_data(u,v, data)`

Graphs may be directed or undirected. In a directed graph, edges have a direction: an edge from vertex `u` to vertex `v` does not imply an edge from `v` to `u`. In an undirected graph, edges have no direction: an edge between `u` and `v` connects `u` to `v` and `v` to `u` at the same time.

Directed graph:

```
        A
       ^ \
      /   \
     /     v
    B       C
   ^ ^       ^
  /   \       \
 v     \       v
D       E ----> J
 \     /
  \   /
   v v
    G
```

Edges (B,A), (A,C), (E,B), (D,G), (E,G), and (E,J) are directed edges, while (C,J) and (B,D) are bidirectional.
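The basic operations listed above can be sketched in C++ roughly as follows. This is a minimal illustration built on assumptions of our own: the class name `Graph`, string vertex labels, and a `std::set` of neighbors per vertex are choices made for the example, not part of any library. Vertex and edge data (`get_vertex_data` and related calls) are left out to keep it short.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Hypothetical minimal undirected graph with string labels.
// Each vertex maps to the set of its neighbors (an adjacency list built from sets).
class Graph {
public:
    // Adds v with no edges; does nothing if v already exists.
    void add_vertex(const std::string& v) { adj_.try_emplace(v); }

    // Adds the undirected edge {u, v}; missing endpoints are created.
    void add_edge(const std::string& u, const std::string& v) {
        adj_[u].insert(v);
        adj_[v].insert(u);  // undirected: store both directions
    }

    bool adjacent(const std::string& u, const std::string& v) const {
        auto it = adj_.find(u);
        return it != adj_.end() && it->second.count(v) > 0;
    }

    // Returns a copy of u's neighbor set (empty if u is unknown).
    std::set<std::string> neighbors(const std::string& u) const {
        auto it = adj_.find(u);
        return it == adj_.end() ? std::set<std::string>{} : it->second;
    }

    void remove_edge(const std::string& u, const std::string& v) {
        auto iu = adj_.find(u);
        if (iu != adj_.end()) iu->second.erase(v);
        auto iv = adj_.find(v);
        if (iv != adj_.end()) iv->second.erase(u);
    }

    // Removes v together with every edge that touches it.
    void remove_vertex(const std::string& v) {
        auto it = adj_.find(v);
        if (it == adj_.end()) return;
        const std::set<std::string> nbrs = it->second;  // copy: other sets are modified below
        for (const auto& u : nbrs)
            if (u != v) adj_[u].erase(v);
        adj_.erase(v);
    }

    std::size_t vertex_count() const { return adj_.size(); }

private:
    std::map<std::string, std::set<std::string>> adj_;
};
```

Sets keep neighbors sorted and make `adjacent` a logarithmic lookup; a `std::unordered_set` would be a reasonable alternative when order does not matter.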
For directed graphs, we can also define the following operations:

- `out_neighbors(u)`: return the set of vertices that are adjacent to vertex `u` by outgoing edges (edges that start at `u`)
- `in_neighbors(u)`: return the set of vertices that are adjacent to vertex `u` by incoming edges (edges that end at `u`)

Graphs may be weighted or unweighted. In a weighted graph, each edge has a *weight* (or cost) associated with it, which can represent distance, time, or any other metric relevant to the problem at hand.

```
        A---+
       / \   \
     3/   \4  \5
     /     \   \
    B       C   +
   / \       \  |
 1/   \2     3\ |
 /     \    4  \|
D       E ----- J
```

# Examples of Graphs in Real World

Examples of graphs from different domains:

- road network: vertices are cities, edges are roads, weights are distances or travel times
- social network: vertices are people, edges are friendships, weights can be the strength of a friendship
- computer network: vertices are computers and routers, edges are connections, weights can be bandwidth (throughput)
- molecular structure: vertices are atoms, edges are chemical bonds, weights can be bond strength or length

A benzene molecule can be represented as a graph where vertices are atoms (carbon C and hydrogen H) and edges are chemical bonds:

```
       H      H
        \    /
         C--C
       //    \\
  H -- C      C -- H
        \    /
         C==C
        /    \
       H      H
```

or, using weighted edges to represent bond strength: assuming the default bond strength is 1, we can represent double bonds with weight 2:

```
       H      H
        \    /
         C--C
       2/    \2
  H -- C      C -- H
        \    /
         C--C
        / 2  \
       H      H
```

More notations for graphs:

- $G=(V,E)$: graph with vertices $V$ and edges $E$
- undirected graph: edges are represented as unordered pairs of vertices, e.g. $e=\{u,v\}$
- directed graph: edges are represented as ordered pairs of vertices, e.g. $e=(u,v)$; `u` is called the *source* (or *origin*) vertex and `v` is called the *target* (or *destination*, *terminus*) vertex
- if two vertices `u` and `v` are connected by an edge, we say that `u` and `v` are *adjacent*, or *neighbors*
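A similar sketch for a directed, weighted graph, again with hypothetical names: it keeps both the outgoing and the incoming neighbor sets of every vertex, so `out_neighbors(u)` and `in_neighbors(u)` are simple lookups, and it stores a `double` weight per ordered pair `(u, v)` as the edge data. A bidirectional edge is modeled as two directed edges.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical directed, weighted graph for illustration.
class DiGraph {
public:
    // Adds (or overwrites) the directed edge u -> v with weight w.
    void add_edge(const std::string& u, const std::string& v, double w = 1.0) {
        out_[u].insert(v);
        in_[v].insert(u);
        weight_[std::make_pair(u, v)] = w;
    }

    std::set<std::string> out_neighbors(const std::string& u) const { return lookup(out_, u); }
    std::set<std::string> in_neighbors(const std::string& u) const { return lookup(in_, u); }

    // Both throw std::out_of_range if there is no edge u -> v.
    double get_edge_data(const std::string& u, const std::string& v) const {
        return weight_.at(std::make_pair(u, v));
    }
    void set_edge_data(const std::string& u, const std::string& v, double w) {
        weight_.at(std::make_pair(u, v)) = w;
    }

private:
    using AdjMap = std::map<std::string, std::set<std::string>>;

    static std::set<std::string> lookup(const AdjMap& m, const std::string& u) {
        auto it = m.find(u);
        return it == m.end() ? std::set<std::string>{} : it->second;
    }

    AdjMap out_, in_;  // outgoing and incoming neighbor sets
    std::map<std::pair<std::string, std::string>, double> weight_;
};
```

Storing incoming sets doubles the bookkeeping per edge; if `in_neighbors` is rarely needed, it could instead be computed by scanning all outgoing sets.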
# Paths and Connectivity

A sequence of vertices $v_1, v_2, ..., v_k$ is called a *path* if there is an edge between every pair of consecutive vertices in the sequence. The length of a path is the number of edges in it, which is $k-1$. A path can also be represented as a sequence of edges $e_1, e_2, ..., e_{k-1}$ where $target(e_i)=source(e_{i+1})$. The weight of a path is the sum of the weights of its edges.

A path is *simple* if it does not contain any repeated vertices. A path is a *cycle* if it starts and ends at the same vertex and contains at least one edge.

```
        A---+
       / \   \
     3/   \4  \5
     /  2  \   \
    B ----- C   +
   / \       \  |
 1/   \2     3\ |
 /     \    4  \|
D       E ----- J
```

Examples of paths in the graph above:

- $A \to B \to D$: simple path of length 2, weight $3+1=4$
- $B \to E \to J \to A \to C$: simple path of length 4, weight $2+4+5+4=15$
- $B \to A \to C \to B \to E$: path of length 4, weight $3+4+2+2=11$ (not simple, because vertex `B` is repeated)
- $A \to B \to E \to J \to C \to A$: cycle of length 5, weight $3+2+4+3+4=16$

A *subgraph* of a graph $G=(V,E)$ is a graph $G'=(V',E')$ where $V' \subseteq V$ and $E' \subseteq E$ (edges in $E'$ may only connect vertices from $V'$). Informally, a subgraph of $G$ is a graph obtained by deleting some vertices and edges from $G$ (deleting a vertex also deletes all of its edges).

Example of a subgraph of the graph above:

```
        A
       / \
      /   \
     /     \
    B ----- C
             \
              \
               \
        E ----- J
```

Notice that vertex `D`, and consequently edge `(B,D)`, were removed from the original graph. We also removed edges (B,E) and (A,J).

Special types of subgraphs:

- *vertex induced subgraph*: a subgraph that contains some vertices of the original graph, $V' \subseteq V$, and **ALL** edges of the original graph that connect those vertices. Formally, $G'=(V',E')$ is a vertex induced subgraph of $G=(V,E)$ if $E'=\{e \in E \mid source(e) \in V' \text{ and } target(e) \in V'\}$.
- *edge induced subgraph*: a subgraph that contains some edges of the original graph, $E' \subseteq E$, and **ALL** vertices of the original graph that are endpoints of those edges. Formally, $G'=(V',E')$ is an edge induced subgraph of $G=(V,E)$ if $V'=\{v \in V \mid v \text{ is an endpoint of an edge in } E'\}$.
- *spanning subgraph*: a subgraph that contains all vertices of the original graph. Formally, $G'=(V',E')$ is a spanning subgraph of $G=(V,E)$ if $V'=V$.
- *spanning tree*: a spanning subgraph that is also a tree (connected and acyclic). Formally, $G'=(V',E')$ is a spanning tree of $G=(V,E)$ if $V'=V$, $G'$ is connected, and $G'$ does not contain any cycles.

Examples:

- vertex induced: delete vertices E and J, and all edges connected to them

```
        A
       / \
      /   \
     /     \
    B ----- C
   /
  /
 /
D
```

- edge induced: keep only edges (A,B), (B,E), (E,J), and (A,J), and the vertices they connect

```
        A---+
       /     \
      /       \
     /         \
    B           +
     \          |
      \         |
       \        |
        E ----- J
```

- spanning subgraph: keep all vertices and delete edges (A,C) and (A,J)

```
        A
       /
      /
     /
    B ----- C
   / \       \
  /   \       \
 /     \       \
D       E ----- J
```

- spanning tree: keep all vertices and delete edges (A,J), (B,C), and (C,J) to eliminate cycles, while making sure that all vertices stay connected by the remaining edges

```
        A
       / \
      /   \
     /     \
    B       C
   / \
  /   \
 /     \
D       E ----- J
```

A *connected component* of an undirected graph is a (vertex induced) subgraph in which any two vertices are connected to each other by a path, and which is connected to no additional vertices in the supergraph. A graph is *connected* if it has only one connected component.
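One way to find the connected components of an undirected graph is to start a search from every vertex that has not been reached yet and give everything reachable from it the same label. The sketch below assumes vertices numbered `0..n-1` and an adjacency list `adj`; the function name `connected_components` is our own. It marks a vertex as soon as it is pushed, which is fine here because only the labels matter, not the visiting order.

```cpp
#include <stack>
#include <vector>

// Labels every vertex of an undirected graph with the id of its connected component.
// adj[v] lists the neighbors of v. comp[u] == comp[v] iff u and v are in the same component.
std::vector<int> connected_components(const std::vector<std::vector<int>>& adj) {
    const int n = static_cast<int>(adj.size());
    std::vector<int> comp(n, -1);  // -1 means "not reached yet"
    int next_id = 0;
    for (int s = 0; s < n; ++s) {
        if (comp[s] != -1) continue;  // s already belongs to a found component
        std::stack<int> st;
        st.push(s);
        comp[s] = next_id;
        while (!st.empty()) {
            int v = st.top();
            st.pop();
            for (int u : adj[v]) {
                if (comp[u] == -1) {  // first time u is reached
                    comp[u] = next_id;
                    st.push(u);
                }
            }
        }
        ++next_id;  // the next unreached vertex starts a new component
    }
    return comp;
}
```

The graph is connected exactly when every label is 0. For the two-component example below, numbering V0, V1, V4, V5, V6, V7, V8 as 0..6, vertices 0-2 get one label and vertices 3-6 get another.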
Here is an example of a graph with two connected components:

```
     V0
    /  \
   /    \
 V4------V1      V5---V6
                 |    |
                 |    |
                V8---V7
```

Example of a connected graph:

```
            V0
           /  \
          /    \
        V4------V1
       /  \    /  \
      /    \  /    \
     /      V5---V6 \
    /       |    |   \
   V3------V8---V7----V2
     \    /       \  /
      \  /         \/
       V9----------V10
```

For directed graphs, we can define *strongly connected components* as maximal subgraphs in which any two vertices are connected to each other by paths in both directions. A directed graph is *strongly connected* if it has only one strongly connected component.

Example of a strongly connected directed graph (a single strongly connected component):

```
    A ----> B
   ^ ^       \
  /   \       \
 /     \       v
F <-----+------C
 \      |     /
  \     |    /
   \    |   /
    +-> D <-+
```

If we remove edge (C,F), the graph is no longer strongly connected: `F` cannot be reached from any other vertex:

```
    A ----> B
   ^ ^       \
  /   \       \
 /     \       v
F       +      C
 \      |     /
  \     |    /
   \    |   /
    +-> D <-+
```

# Famous problems on Graphs

- *shortest path*: find the shortest path between two vertices in a graph. This can be solved using Dijkstra's algorithm, the Bellman-Ford algorithm, or the A* algorithm.
- *minimum spanning tree*: find a spanning tree of a graph that has the minimum total weight. This can be solved using Prim's algorithm or Kruskal's algorithm.
- *maximum flow*: find the maximum flow from a source vertex to a sink vertex in a flow network. This can be solved using the Ford-Fulkerson algorithm or the Edmonds-Karp algorithm.
- *graph/map coloring*: assign colors to the vertices of a graph such that no two adjacent vertices share the same color, using as few colors as possible.
- *clique*: find the largest complete subgraph (clique) in a graph. This is an NP-hard problem; it can be solved exactly with backtracking (exponential time in the worst case) or approximately with heuristics.
- *Eulerian path*: find a path that uses every edge of a graph exactly once. This can be solved using Hierholzer's algorithm.
The classic example is the seven bridges of Königsberg: is there a walk through the city that crosses each of the seven bridges exactly once? Euler proved that such a walk does not exist. Here the vertices are land masses and the edges are bridges:

```
+--------B
|       / \
|  +---+   \
| /         \
|/           \
A-------------C
|\           /
| \         /
|  +---+   /
|       \ /
+--------D
```

- *Hamiltonian path*: find a path that visits every vertex of a graph exactly once. This is much harder than the Eulerian path problem: deciding whether such a path exists is NP-complete, and no polynomial-time algorithm is known.
- *planar graph*: determine whether a graph can be drawn on a plane without any edges crossing, and if so, draw it. Kuratowski's theorem characterizes planar graphs, and planarity can be tested in linear time (e.g. with the Hopcroft-Tarjan algorithm).

# Graph representations

The two most common ways to represent a graph in a computer are the *adjacency list* and the *adjacency matrix*.

## Adjacency Matrix

An adjacency matrix is a 2D array of size $|V| \times |V|$ where the entry at row $i$ and column $j$ is 1 if there is an edge from vertex $i$ to vertex $j$, and 0 otherwise. For example, the graph

```
        A
       / \
      /   \
     /     \
    B ----- C
   / \       \
  /   \       \
 /     \       \
D       E ----- F
```

is represented as an adjacency matrix of `boolean` values as follows:

```
  | A B C D E F
--+------------
A | 0 1 1 0 0 0
B | 1 0 1 1 1 0
C | 1 1 0 0 0 1
D | 0 1 0 0 0 0
E | 0 1 0 0 0 1
F | 0 0 1 0 1 0
```

Notice that the matrix is symmetric because the graph is undirected. Thus we may save about half of the space by storing only the upper (or lower) triangle of the matrix:

```
  | A B C D E F
--+------------
A | 0 1 1 0 0 0
B |   0 1 1 1 0
C |     0 0 0 1
D |       0 0 0
E |         0 1
F |           0
```

Note that a *loop* (an edge from a vertex to itself) is a legal edge in a graph, therefore we keep the diagonal entries in the adjacency matrix. When loops are impossible, you may save a little more space by not storing the diagonal entries.
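The upper triangle (diagonal included) has $n(n+1)/2$ entries. A common trick is to pack it row by row into a one-dimensional array: row $i$ stores columns $i..n-1$, so rows $0..i-1$ occupy $i \cdot n - i(i-1)/2 = i(2n-i+1)/2$ slots, and entry $(i,j)$ with $i \le j$ lands at index $i(2n-i+1)/2 + (j-i)$. Below is a minimal sketch of that layout; the class name is our own, and it assumes vertices numbered `0..n-1`.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Symmetric boolean adjacency matrix that stores only the upper triangle
// (diagonal included), packed row by row into n*(n+1)/2 entries.
class UpperTriangularAdjacency {
public:
    explicit UpperTriangularAdjacency(std::size_t n) : n_(n), bits_(n * (n + 1) / 2, false) {}

    void add_edge(std::size_t u, std::size_t v) { bits_[index(u, v)] = true; }
    bool adjacent(std::size_t u, std::size_t v) const { return bits_[index(u, v)]; }
    std::size_t storage_size() const { return bits_.size(); }

private:
    // (i, j) and (j, i) share one slot; rows 0..i-1 hold i*(2n-i+1)/2 entries.
    std::size_t index(std::size_t i, std::size_t j) const {
        if (i > j) std::swap(i, j);
        return i * (2 * n_ - i + 1) / 2 + (j - i);
    }

    std::size_t n_;
    std::vector<bool> bits_;
};
```

For the 6-vertex graph above this needs 21 entries instead of 36, and `adjacent(u, v)` and `adjacent(v, u)` read the same slot, which keeps the stored relation symmetric by construction.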
For weighted graphs, the entry can be the weight of the edge instead of just 1:

```
        A
       / \
     2/   \4
     /  3  \
    B ----- C
   / \       \
 5/   \2      \7
 /     \   1   \
D       E ----- F
```

and the corresponding adjacency matrix of weights is:

```
  | A B C D E F
--+------------
A | 0 2 4 0 0 0
B | 2 0 3 5 2 0
C | 4 3 0 0 0 7
D | 0 5 0 0 0 0
E | 0 2 0 0 0 1
F | 0 0 7 0 1 0
```

Again, it is symmetric because the graph is undirected.

Note that using `0` to represent the absence of an edge is not always possible, because `0` can be a valid edge weight. In that case, we can use `null` or infinity (whichever is available in the programming language) to represent the absence of an edge. In C++, for floating-point weights, we can use `std::numeric_limits<double>::infinity()` from `<limits>`. For some algorithms it even simplifies the implementation. Instead of first checking whether the edge exists and then comparing the weights:

```cpp
if (adj_matrix[u][v] != 0) {
    if (total + adj_matrix[u][v] < best) {
        // do something
    }
}
```

we can use the weights directly and rely on the fact that adding infinity yields infinity, which makes the comparison evaluate to `false` when the edge does not exist:

```cpp
// assumes adj_matrix[u][v] == std::numeric_limits<double>::infinity() when there is no edge
if (total + adj_matrix[u][v] < best) {
    // do something
}
```

If infinity is not available in the programming language, we can use a very large number that is **guaranteed to be larger than any value in the computation**. Notice that a number just bigger than the maximum edge weight in the graph may not be enough, because we may be comparing **several** weights added together. In some cases, like the shortest path problem with non-negative weights, we know that the weight of any simple path is at most the sum of all edge weights in the graph, so we can use a number larger than that. The value must also be small enough that adding an edge weight to it does not overflow the integer type. In other cases, we may need to analyze the problem to find a suitable value for infinity.

For directed graphs, the adjacency matrix is not necessarily symmetric.
For example, the directed graph

```
        A
       ^ \
      /   \
     /     v
    B ----> C
   ^ \       \
  /   \       \
 /     v       v
D       E <---- F
```

is represented as an adjacency matrix of `boolean` values as follows:

```
  | A B C D E F
--+------------
A | 0 0 1 0 0 0
B | 1 0 1 0 1 0
C | 0 0 0 0 0 1
D | 0 1 0 0 0 0
E | 0 0 0 0 0 0
F | 0 0 0 0 1 0
```

## Adjacency List

An adjacency list is a collection of lists (or arrays), one per vertex, where the list of a vertex contains its adjacent vertices (or its edges). For example, the undirected graph with no weights

```
        A
       / \
      /   \
     /     \
    B ----- C
   / \       \
  /   \       \
 /     \       \
D       E ----- F
```

may be represented as an adjacency list as follows:

```
A: B, C
B: A, C, D, E
C: A, B, F
D: B
E: B, F
F: C, E
```

For directed graphs, the adjacency list contains only the outgoing edges. For example, for the directed graph

```
        A
       ^ \
      /   \
     /     v
    B ----> C
   ^ \       \
  /   \       \
 /     v       v
D       E <---- F
```

the adjacency list is:

```
A: C
B: A, C, E
C: F
D: B
E:
F: E
```

If edges have weights (or any other information), we can include the weights in the adjacency list. For example, for the graph

```
        A
       / \
     2/   \4
     /  3  \
    B ----- C
   / \       \
 5/   \2      \7
 /     \   1   \
D       E ----- F
```

the adjacency list with weights is:

```
A: (B,2), (C,4)
B: (A,2), (C,3), (D,5), (E,2)
C: (A,4), (B,3), (F,7)
D: (B,5)
E: (B,2), (F,1)
F: (C,7), (E,1)
```

Note that each line lists only the target vertices and weights, but not the source vertex, because the source is implied by the line itself. If we don't want to rely on the line to imply the source vertex, we can include the source vertex in each edge.
Some information is duplicated (the source vertex), but this form may be more convenient for some algorithms:

```
A: (A,B,2), (A,C,4)
B: (B,A,2), (B,C,3), (B,D,5), (B,E,2)
C: (C,A,4), (C,B,3), (C,F,7)
D: (D,B,5)
E: (E,B,2), (E,F,1)
F: (F,C,7), (F,E,1)
```

# Graph algorithms

## Graph traversals

Graph traversals are similar to tree traversals, with three important differences:

- there is no root vertex in a graph, so we can start the traversal from any vertex
- a graph may contain cycles, so we need to keep track of visited vertices to avoid infinite loops
- a graph may not be connected, so we may need to start a new traversal from an unvisited vertex after finishing the traversal from the first vertex

The first is usually simple to handle: we either know that any vertex may be used as a starting point (Prim's algorithm), or we are given a specific vertex to start from (Dijkstra's algorithm). The second and third can be handled together by keeping track of visited vertices: we either maintain a container of visited vertices, or the vertex data type has a boolean field `visited`.

Depth-first traversal can be implemented recursively as follows:

```
DF_traversal( G ) {
  for each vertex v in G {   // this loop is needed to handle disconnected graphs
    if (v is not visited) {
      DF_traversal_helper(v)
    }
  }
}

DF_traversal_helper( v ) {
  mark v as visited (or add v to visited set)
  for each neighbor u of v {
    if (u is not visited) {
      DF_traversal_helper(u)
    }
  }
}
```

Note that if all vertices in the graph are reachable from the starting vertex, all of them are visited during the first iteration of the for-loop, so we can call `DF_traversal_helper` once without the for-loop. This is the case for connected undirected graphs: a single call `DF_traversal_helper(start)` is enough.
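The recursive pseudocode might look like this in C++, assuming vertices numbered `0..n-1` and an adjacency list `adj` (out-neighbors for a directed graph). The function names mirror the pseudocode, but the signatures and the `order` vector that records when each vertex is first visited are our own additions for illustration.

```cpp
#include <vector>

// Visits v and, recursively, every unvisited vertex reachable from it.
void df_traversal_helper(int v, const std::vector<std::vector<int>>& adj,
                         std::vector<bool>& visited, std::vector<int>& order) {
    visited[v] = true;      // mark v as visited
    order.push_back(v);     // record the visiting order
    for (int u : adj[v]) {
        if (!visited[u]) df_traversal_helper(u, adj, visited, order);
    }
}

// Depth-first traversal of the whole graph; returns vertices in visiting order.
std::vector<int> df_traversal(const std::vector<std::vector<int>>& adj) {
    std::vector<bool> visited(adj.size(), false);
    std::vector<int> order;
    // Restarting from every unvisited vertex handles disconnected graphs
    // (and, for directed graphs, vertices unreachable from earlier starts).
    for (int v = 0; v < static_cast<int>(adj.size()); ++v) {
        if (!visited[v]) df_traversal_helper(v, adj, visited, order);
    }
    return order;
}
```

With A..F numbered 0..5, the directed example graph used in the traversal traces gives the order A, C, F, E, B, D. On very deep graphs the recursion can exhaust the call stack, which is one practical reason to prefer an explicit stack.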
We can also implement depth-first traversal iteratively, using a stack instead of recursion:

```
DF_traversal( G ) {
  Stack stack
  for each vertex v in G {   // this loop is needed to handle disconnected graphs
    if (v is not visited) {
      stack.push(v)
      while (!stack.isEmpty()) {
        vertex = stack.pop()
        if (vertex is not visited) {
          mark vertex as visited
          for each neighbor u of vertex {
            if (u is not visited) {
              stack.push(u)
            }
          }
        }
      }
    }
  }
}
```

Changing `Stack` to `Queue` gives us breadth-first traversal instead of depth-first traversal. Both traversals visit every vertex reachable from the start, but in different orders. A graph has no predefined levels like a tree, but breadth-first traversal creates its own: it visits vertices in order of their distance (number of edges) from the starting vertex, while depth-first traversal follows one path as deep as possible before backtracking.

Example:

```
        A
       / \
      /   \
     /     \
    B ----- C
   / \       \
  /   \       \
 /     \       \
D       E ----- F
```

using `DF_traversal` with a `Stack`. In this trace the stack is written bottom to top (the top is on the right), and the unvisited neighbors of a vertex are pushed in reverse alphabetical order, so that they are popped in alphabetical order:

```
Start for-loop with v=A
=======================
Stack: A
Visited:
-----------------
Pop A, neighbors B and C. Both are not visited, so push C, then B
Stack: C, B
Visited: A
-----------------
Pop B, neighbors A, C, D, E. Vertex A is already visited; push E, D, C
Stack: C, E, D, C
Visited: A, B
Note that C is now in the stack twice. We could have avoided this by marking
C as visited when we push it, but it is not necessary, because we check
whether a vertex is visited when we pop it out of the stack.
-----------------
Pop C, neighbors A, B, F. Vertices A and B are already visited; push F
Stack: C, E, D, F
Visited: A, B, C
-----------------
Pop F, neighbors C, E. Vertex C is already visited; push E
Stack: C, E, D, E
Visited: A, B, C, F
-----------------
Pop E, neighbors B, F. Both are already visited
Stack: C, E, D
Visited: A, B, C, F, E
-----------------
Pop D, neighbor B. Vertex B is already visited
Stack: C, E
Visited: A, B, C, F, E, D
-----------------
Pop E, it is already visited - skip
Stack: C
-----------------
Pop C, it is already visited - skip
Stack:
Visited: A, B, C, F, E, D
-----------------
for-loop with v=B, it is already visited - skip
=======================
for-loop with v=C, it is already visited - skip
=======================
for-loop with v=D, it is already visited - skip
=======================
for-loop with v=E, it is already visited - skip
=======================
for-loop with v=F, it is already visited - skip
=======================
```

Note that since the graph is connected, all vertices were visited during the first iteration of the for-loop.

Let's consider another example, this time a directed graph.

```
        A
       ^ \
      /   \
     /     v
    B ----> C
   ^ \       \
  /   \       \
 /     v       v
D       E <---- F
```

`DF_traversal` with a `Stack`:

```
Start for-loop with v=A
=======================
Stack: A
Visited:
-----------------
Pop A, neighbor C. Vertex C is not visited, so push it
Stack: C
Visited: A
-----------------
Pop C, neighbor F. Vertex F is not visited, so push it
Stack: F
Visited: A, C
-----------------
Pop F, neighbor E. Vertex E is not visited, so push it
Stack: E
Visited: A, C, F
-----------------
Pop E, no neighbors
Stack:
Visited: A, C, F, E
-----------------
for-loop with v=B, it is not visited, so we start a new traversal from B
=======================
Stack: B
Visited: A, C, F, E
-----------------
Pop B, neighbors A, C, E. All are already visited, so we do not push them
Stack:
Visited: A, C, F, E, B
-----------------
for-loop with v=C, it is already visited - skip
=======================
for-loop with v=D, it is not visited, so we start a new traversal from D
=======================
Stack: D
Visited: A, C, F, E, B
-----------------
Pop D, neighbor B. Vertex B is already visited, so we do not push it
Stack:
Visited: A, C, F, E, B, D
-----------------
for-loop with v=E, it is already visited - skip
=======================
for-loop with v=F, it is already visited - skip
=======================
```

Because not all vertices are reachable from vertex `A`, we had to start new traversals from vertex `B` and then from vertex `D` to visit all vertices in the graph.

## Minimum Spanning Tree

A minimum spanning tree is a spanning tree of a graph that has the minimum total weight. A spanning tree is a subgraph that contains all vertices of the original graph and is a tree (connected and acyclic). Obviously the original graph must be connected, otherwise there is no spanning tree.

Given

```
        A---+
       / \   \
     3/   \4  \7
     /  2  \   \
    B ----- C   +
   / \       \  |
 1/   \2     5\ |
 /     \       \|
D ----- E ----- J
    3       6
```

here are examples of spanning trees for the graph (not necessarily minimum):

```
        A---+
       /     \
     3/       \7
     /         \        total weight 20
    B       C   +
     \       \  |
      \2     5\ |
       \       \|
D ----- E       J
    3
```

```
        A
       /
     3/
     /                  total weight 17
    B       C
   / \       \
 1/   \2     5\
 /     \       \
D       E ----- J
            6
```

```
        A
       / \
     3/   \4
     /     \            total weight 15
    B       C
   / \       \
 1/   \2     5\
 /     \       \
D       E       J
```

```
        A
       /
     3/
     /  2               total weight 13
    B ----- C
   / \       \
 1/   \2     5\
 /     \       \
D       E       J
```

The last one is actually the minimum spanning tree for the graph.
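The totals above are easy to double-check in code. The sketch below uses names of our own and numbers the vertices A, B, C, D, E, J as 0..5. It tests whether a set of edges (assumed to be taken from the graph) is a spanning tree: there must be exactly $|V|-1$ edges and none of them may connect two vertices that are already connected, which is tracked with one component label per vertex. A forest with $|V|-1$ edges on $|V|$ vertices has exactly one component, so it is also connected.

```cpp
#include <numeric>
#include <vector>

struct Edge {
    int u, v;  // endpoints, vertices are numbered 0..n-1
    int w;     // weight
};

// True if `edges` form a spanning tree on n vertices: n-1 edges and no cycle.
bool is_spanning_tree(int n, const std::vector<Edge>& edges) {
    if (static_cast<int>(edges.size()) != n - 1) return false;
    std::vector<int> label(n);
    std::iota(label.begin(), label.end(), 0);  // each vertex starts in its own component
    for (const Edge& e : edges) {
        const int lu = label[e.u], lv = label[e.v];
        if (lu == lv) return false;            // endpoints already connected: cycle
        for (int& l : label)                   // merge the two components
            if (l == lv) l = lu;
    }
    return true;
}

// Sum of the edge weights.
int total_weight(const std::vector<Edge>& edges) {
    int sum = 0;
    for (const Edge& e : edges) sum += e.w;
    return sum;
}
```

For example, the first tree above is `{{0,1,3},{0,5,7},{1,4,2},{2,5,5},{3,4,3}}`, which passes the check with total weight 20, while replacing one of its edges by (A,C) in the last tree would close the cycle A-B-C and fail.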
### Kruskal's algorithm

The idea is to

- try edges from the shortest to the longest (since we want a MINIMUM spanning tree)
- before adding an edge, check that it does not create a cycle (since we want a minimum spanning TREE)
- stop when we have added $|V|-1$ edges (since we want a minimum SPANNING tree)

```
Kruskal( G ) {   // assumes that G is connected
  sort edges of G in non-decreasing order of weight
  T = all vertices of G, no edges
  for each edge e in sorted order, while T has fewer than |V|-1 edges {
    if (T + e does not contain a cycle) {
      T = T + e
    }
  }
  return T
}
```

```
        A---+
       / \   \
     3/   \4  \7
     /  2  \   \
    B ----- C   +
   / \       \  |
 1/   \2     5\ |
 /     \       \|
D ----- E ----- J
    3       6
```

Let's execute Kruskal's algorithm on the graph above.

Sorted edges: (B,D,1), (B,E,2), (B,C,2), (D,E,3), (A,B,3), (A,C,4), (C,J,5), (E,J,6), (A,J,7)

T = all vertices, no edges

Edge (B,D,1) does not create a cycle, add it to T:

```
        A

    B       C
   /
  /
 /
D       E       J
```

Edge (B,E,2) does not create a cycle, add it to T:

```
        A

    B       C
   / \
  /   \
 /     \
D       E       J
```

Edge (B,C,2) does not create a cycle, add it to T:

```
        A

    B ----- C
   / \
  /   \
 /     \
D       E       J
```

Edge (D,E,3) creates a cycle (B-D-E-B), do not add it to T.

Edge (A,B,3) does not create a cycle, add it to T:

```
        A
       /
      /
     /
    B ----- C
   / \
  /   \
 /     \
D       E       J
```

Edge (A,C,4) creates a cycle (A-B-C-A), do not add it to T.

Edge (C,J,5) does not create a cycle, add it to T:

```
        A
       /
      /
     /
    B ----- C
   / \       \
  /   \       \
 /     \       \
D       E       J
```

We now have the required $|V|-1=5$ edges, so we stop and return T.

How exactly do we detect cycles? Here is a simple idea: keep track of the connected components of the growing spanning tree T. When adding a new edge, make sure its endpoints are **not** in the same component. If the endpoints are in the same component, there is already a path connecting them, and adding an edge between the first and last vertices of that path would complete a cycle.

Steps:

- maintain a set of connected components (initially each vertex is in its own component). You may think of components as vertices of the same color: initially all vertices have different colors.
- when we want to add an edge (u,v), check if u and v are in the same component (have the same color)
- if they have the same color, adding the edge would create a cycle, so we do not add it
- if they have different colors, add the edge and change the color of all vertices in the component of v to the color of the component of u (or vice versa), i.e. merge the two components into one component

Example, using the graph from the previous example:

```
        A---+
       / \   \
     3/   \4  \7
     /  2  \   \
    B ----- C   +
   / \       \  |
 1/   \2     5\ |
 /     \       \|
D ----- E ----- J
    3       6
```

Sorted edges: (B,D,1), (B,E,2), (C,B,2), (D,E,3), (A,B,3), (A,C,4), (C,J,5), (E,J,6), (A,J,7)

Components:

```
A B C D E J
0 1 2 3 4 5   <-- initial components, each vertex is in its own component/color
```

Edge (B,D,1): color(B)=1, color(D)=3, different colors, so add the edge and merge the components of B and D (change all vertices of color 3 to 1):

```
A B C D E J
0 1 2 1 4 5
```

Edge (B,E,2): color(B)=1, color(E)=4, different colors, so add the edge and merge the components of B and E (change all vertices of color 4 to 1):

```
A B C D E J
0 1 2 1 1 5
```

Edge (C,B,2): color(C)=2, color(B)=1, different colors, so add the edge and merge the components of C and B (change all vertices of color 1 to 2):

```
A B C D E J
0 2 2 2 2 5
```

Edge (D,E,3): color(D)=color(E)=2, same color, skip.

Edge (A,B,3): color(A)=0, color(B)=2, different colors, so add the edge and merge the components of A and B (change all vertices of color 2 to 0):

```
A B C D E J
0 0 0 0 0 5
```

Edge (A,C,4): color(A)=color(C)=0, same color, skip.

Edge (C,J,5): color(C)=0, color(J)=5, different colors, so add the edge and merge the components of C and J (change all vertices of color 5 to 0):

```
A B C D E J
0 0 0 0 0 0
```

Stop: we have added all 5 required edges. Notice that all vertices are in the same component, which means our tree is a spanning tree.
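Putting the pieces together, here is a sketch of Kruskal's algorithm with the color-array cycle test just described, under the assumptions that vertices are numbered `0..n-1` and the graph is connected. The `WeightedEdge` type and the function signature are illustrative choices, not a library API.

```cpp
#include <algorithm>
#include <numeric>
#include <vector>

struct WeightedEdge {
    int u, v, w;  // endpoints and weight
};

// Kruskal's algorithm with a color array for cycle detection.
// O(E log E) for sorting plus O(V) recoloring for each of the V-1 accepted edges.
std::vector<WeightedEdge> kruskal(int n, std::vector<WeightedEdge> edges) {
    std::sort(edges.begin(), edges.end(),
              [](const WeightedEdge& a, const WeightedEdge& b) { return a.w < b.w; });
    std::vector<int> color(n);
    std::iota(color.begin(), color.end(), 0);  // each vertex is its own component
    std::vector<WeightedEdge> tree;
    for (const WeightedEdge& e : edges) {
        if (static_cast<int>(tree.size()) == n - 1) break;  // spanning tree complete
        const int cu = color[e.u], cv = color[e.v];
        if (cu == cv) continue;  // same color: the edge would create a cycle
        tree.push_back(e);
        for (int& c : color)     // merge: recolor v's component with u's color
            if (c == cv) c = cu;
    }
    return tree;
}
```

Because `std::sort` is not stable, edges of equal weight may be tried in a different order than in the hand trace. The total weight of the result is still minimal, although the tree itself can differ when a graph has several minimum spanning trees. For the example graph (A, B, C, D, E, J as 0..5) the result has total weight 13.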
Runtime analysis:

- sorting the edges takes $O(E \log E)$ time
- checking if two vertices are in the same component is $O(1)$
- changing colors is $O(V)$, since we have to traverse the whole array. Colors change exactly $|V|-1$ times, so the total time for changing colors is $O(V^2)$

Thus the total runtime of this implementation of Kruskal's algorithm is $O(E \log E + V^2)$.

Note that we cannot tell which of the two terms is dominant, because it depends on the density of the graph:

- for sparse graphs, $E$ is close to $V$, so $O(E \log E)$ is close to $O(V \log V)$, and the dominant term is $O(V^2)$
- for dense graphs, $E$ is close to $V^2$, so $O(E \log E)$ is close to $O(V^2 \log V)$, and the dominant term is $O(E \log E)$

#### Disjoint-set data structure

The implementation of Kruskal's algorithm described above is not very efficient, because changing colors takes $O(V)$ time. We can improve this by using a more sophisticated data structure called *disjoint-set* (also known as union-find) to keep track of connected components.

The disjoint-set data structure supports the following operations:

- `make_set(x)`: create a new set containing the single element x
- `find_set(x)`: return the representative of the set containing x
- `union(x, y)`: merge the sets containing x and y into a single set

One implementation uses an array of linked lists. Each linked list represents a connected component, and the elements of the list are the vertices that belong to that component. The first element of the list is the *representative* of the component. Example:

```
Component 1: A -> B -> E
Component 2: C -> D
Component 3: F
```

The first component consists of vertices A, B, and E, and its representative is A. The second component consists of vertices C and D, and its representative is C. The third component consists of the single vertex F, which is its own representative.
The second data structure is an array of representatives, which maps each vertex to the representative of the component it belongs to. It is identical to the color array in the previous implementation, but instead of a color it stores the representative of the component. Example:

```
Vertex: A B C D E F
Rep:    A A C C A F
```

As before, to check whether two vertices u and v are in the same component, we check if `rep[u] == rep[v]`. To combine components, we

- merge the linked list of the smaller component into the linked list of the larger component
- update the representative of every vertex in the smaller component to the representative of the larger component (to keep the `rep` array up to date)

This is where we save time: we know exactly which vertices should change their representatives, namely the vertices on the smaller linked list. Thus, instead of traversing the whole array, we only traverse the smaller linked list.

To keep track of the head, tail, and size of each linked list, we can use a special head node for each list that stores this information:

```cpp
struct VertexNode {
    int vertex;        // the vertex itself
    VertexNode* next;  // pointer to the next vertex in the component
};

struct HeadNode {
    int size;          // number of vertices in the component
    VertexNode* head;  // pointer to the first vertex in the component
    VertexNode* tail;  // pointer to the last vertex in the component
};
```

Example of the linked list describing a component of size 3 with vertices A, B, and E (a `0` pointer marks the end of the list):

```
  head node
  +---+---+---+
  | 3 | o | o-+-------------------+
  +---+-+-+---+                   |
        |                         |
        v                         v
      +---+---+    +---+---+    +---+---+
      | A | o-+--->| B | o-+--->| E | 0 |
      +---+---+    +---+---+    +---+---+
```

The linked lists are organized in an array, where each list is stored at the index of its representative vertex. Vertices that are not representatives of any component (i.e. vertices that are not the head of any linked list) have empty lists. For example, for the components above the array of linked lists looks like this:

```
index  size  list
  A     3    A -> B -> E
  B     0    (empty)
  C     2    C -> D
  D     0    (empty)
  E     0    (empty)
  F     1    F
```

Example of the union operation. Say we want to add the edge (E,D):

- first we check whether E and D are in the same component by comparing their representatives: `rep[E]` is A and `rep[D]` is C, so they are in different components
- using the representatives from the previous step, we look up the component sizes and see that D's component (size 2) is smaller than E's component (size 3)
- we traverse the smaller list and update the array of representatives for its vertices (`rep[C]=A`, `rep[D]=A`)
- we attach the linked list of the smaller component (component of D) to the end of the linked list of the larger component (component of E):
  - update the `next` pointer of the tail of the larger component to point to the head of the smaller component
  - update the `tail` of the larger component to be the tail of the smaller component
  - update the `size` of the larger component to be the sum of the sizes of the two components
  - update the head node of the smaller component to be empty (since it is now merged into the larger component)

```
index  size  list
  A     5    A -> B -> E -> C -> D
  B     0    (empty)
  C     0    (empty)
  D     0    (empty)
  E     0    (empty)
  F     1    F

Vertex: A B C D E F
Rep:    A A A A A F
```

Run-time complexity, where V is the number of vertices:

- `make_set(x)`: $O(1)$ per vertex, so initializing all $V$ vertices takes $O(V)$
- `find_set(x)`: $O(1)$, constant
- `union(x, y)`: $O(V)$ in the worst case, linear, since we traverse a list

The worst-case cost of a single union does not give the real picture. Consider a sequence of $V-1$ unions, as happens in Kruskal's algorithm: the first unions merge very short lists, and only the last unions may be proportional to the number of vertices. Because we always relabel the smaller list, every time a vertex's representative changes, the size of its component at least doubles; a component cannot grow beyond $V$ vertices, so each vertex is relabeled at most $\log_2 V$ times. Therefore a whole sequence of $V-1$ unions takes only $O(V \log V)$ time.

Back to Kruskal's algorithm. Run-time complexity (E is the number of edges):

- sorting the edges: $O(E \log E)$
- the for-loop makes at most E iterations:
  - at most E pairs of `find_set` calls, $O(E)$ in total
  - at most $V-1$ of them are followed by a union, $O(V \log V)$ in total

Thus the run-time complexity of Kruskal's algorithm is $O(E \log E) + O(E) + O(V \log V) = O(E \log E)$, since $E \ge V-1$ for a connected graph.

## Dijkstra's algorithm

## Prim's algorithm

## Bellman-Ford algorithm

# Exercises

1.
Given graph

```
              A
             / \
            /   \
     E-----+-----+-----B
     |    /       \    |
     |   /         \   |
     |  /           \  |
     | /             \ |
     |/               \|
     +                 +
    /|                 |\
   / |                 | \
  /  F-----------------G  \
 /                         \
D---------------------------C
```

- provide the graph description using an adjacency matrix
- provide the graph description using an adjacency list
- find the connected components of the graph

2. Given graph

```
      A
     / \
    /   \
   /     \
  B-------C
  |       |
  |       |
  |       |
  |       |
  D-------E
```

- provide an Eulerian path for the graph
- provide a Hamiltonian path for the graph

3. Can you draw this graph in the 2D plane without edge crossings? I.e. is this graph planar?

```
      A-----------+
     /|\           \
    / | \           \
   /  |  \           \
  B---+---+-----------C
  |   |    \         /
  |   +     \       /
  |  /       \     /
  | /         \   /
  |/           \ /
  D-------------E
```

Explain using English. For example:

- keep vertices A and B where they are and move vertex C to the left of vertex A, or inside the triangle, etc.
- arrange vertices A, B, C in a triangle, or a circle, or on a line, etc.

4. Same question:

```
    A-------------B
   /:            /|
  / :           / |
 /  :          /  |
D-------------C   |
|   :         |   |
|   :         |   |
|   E - - - - | - F
|  /          |  /
| /           | /
|/            |/
H-------------G
```

5. Execute Kruskal's algorithm on the graph:

```
A / \ 1/ \2 / \ / 1 \ B-----------C / \3 2/ \ 3/ \ 4 / \5 / D-----E \ / |3 5| \ / 7 | | 5 \ F--------G-----H------ K \ / 5 \ / 6\ 8/ \4 /1 \ / 9 \ / M ------------ N
```

Provide a list of sorted edges, and under each edge indicate whether it is added to the spanning tree or not:

```
 1     1     1     2          <--- weights, just for error checking
A-B,  B-C,  N-K,  A-C, ....
 +     +     +     -          "+" - add, "-" - skip
```