Graphs

Goals

Understand and identify data structures.
Understand node and graph data structures.
Implement and use a linked list.

Concepts

array
data structure
directed graph
doubly-linked list
edge
graph
instance graph
linked list
node
traverse
vertex

Library

java.lang.UnsupportedOperationException

Lesson

When you learned about algorithms, you discovered that certain of them require you to first organize data in a certain way. A binary search, for example, requires that all input data first be sorted. Not only the order but sometimes even the general organization of data in the computer memory can be important or even necessary for an algorithm to work.

Data Structures

Until this point, whenever you need groups of objects, you have been storing them in arrays. In computer science many other approaches have been invented for holding groups of objects in addition to arrays. These approaches are called data structures, and they serve as general-purpose containers for data. Many times an algorithm will require that data be stored in a particular data structure in order for the algorithm to work. Data structures generally the same way across all programming languages, and they each have their own benefits and trade-offs.

Array

The Java “array objects” you have been using are objects like any non-primitive value. You may have given no thought about how actual values are stored inside the array object, but conceptually the values are stored in a row, in contiguous memory positions. In data structure terms, this is the actual “array”. You could say that the Java array object stores its values in an array data structure.

Index	`0`	`1`	`2`	`3`	…`n-1`
Value	`element_1`	`element_2`	`element_3`	`element_4`	`element_n`

An array data structure is rather rigid. It has a fixed length, and to add new elements we would somehow have to allocate more memory and/or move the array somewhere else in memory. Inserting an element into the array requires shifting all the other elements down to make room. But perhaps one of the biggest drawbacks is that finding an object in an array means that we have to look through all the elements of the array, one at a time. We could get lucky and find an object toward the front of the array. If we were not so lucky we might not find the object until we reached the end of the array.

Graph

A solution to many of the drawbacks of arrays is to link the objects together in memory, much like a TinkerToy® set. Linking objects together creates a data structure called a graph.

In graph terminology, each object above is called a vertex of the graph, and the lines between the objects are called edges. Graphs can come in all shapes an sizes. A graph could consist of several vertices strung out in a single line. A graph could consist of several vertices each connected to all the other vertices. A graph could even consist of a single vertex not connected to anything. The key concept of a graph is that a vertex has a way to link to one or more other vertices.

Node

But how can we link our objects in real life? If we have three Animal instances, for example, how could we link them together? One solution would be to add a nextAnimal variable to the Animal class. If we wanted to link together several animals, we could use animal1.setNextAnimal(animal2), and use animal2.setNextAnimal(animal3), and so on—sort of like a row of dogs on leashes, each leashed to the other. But do we really want to pollute our Animal class with special linking variables just because we want a way to group our animals together? This approach would also come with some restrictions: it would be difficult for an Animal to appear in more than one group, for example.

A more flexible approach is to create a separate “wrapper class” for each of the vertexes. Normally this data structure called a node, and its sole purpose is to hold the object and point to one or more other nodes.

Here is conceptually how a Node class might be implemented. In reality the number of nodes would depend on the specific type of graph.

Example Node implementation.

public class Node {

  private final Object value;

  /** @return The value represented by this node. */
  public @Nonnull Object getValue() {
    return value;
  }

  private Node[] links = new Node[0];

  /**
   * Creates a node containing an object.
   * @param value The object to store in the node.
   */
  public Node(@Nonnull final Object value) {
    this.value = checkNotNull(value);
  }

  /**
   * Sets the links to other nodes.
   * @param links The other nodes this node links to.
   */
  public void setLinks(@Nonnull final Node... links)
  {
    this.links = checkNotNull(links);
  }

}

We could then construct the graph pictured above:

Creating a graph data structure with Node instances.

final Object object1 = new Object();  //sample objects; could be any objects
final Object object2 = new Object();
final Object object3 = new Object();
final Object object4 = new Object();
final Object object5 = new Object();
final Node node1 = new Node(object1);  //create the nodes
final Node node2 = new Node(object2);
final Node node3 = new Node(object3);
final Node node4 = new Node(object4);
final Node node5 = new Node(object5);
node1.setLinks(node3, node2, node5);  //link the nodes
node2.setLinks(node3);
node3.setLinks(node4, node5);
node5.setLinks(node3);

A Node instance is not a “value object”—something you will learn about in a future lesson! Although a Node “wraps” an object much like a value object, one Node instance is not equivalent other arbitrary Node instance that contain the same values—those nodes might be part of a different graph! Additionally a Node may be mutable so as to create or remove links with other nodes as needed. There may be some larger, enclosing data type that is immutable and hides access to the node, however.

If we were wanting to create a graph of Animals, we could have made our Node class hold an Animal instance. But then we'd have to create another Node type if we wanted to represent a graph of Vehicles. Instead we have fashioned Node so that it holds an Object instance, making it work for any non-primitive type. How could you store primitive values in a graph in which the Nodes store Object instances?

Graphs in general can become quite complex and can help solve complicated problems. But for merely storing objects there are a few main graph types that are relatively simple.

Linked List

A linked list data structure is a graph consisting of a single line of nodes strung together.

Because each node can only connect to one node in front of it (or none at all, if it is the last node), a LinkedListNode for a linked list can be simpler than the more general Node source code shown in the earlier section above.

Example LinkedListNode implementation.

public class LinkedListNode {

  private final Object value;  //stores a single value

  /** @return The value represented by this node. */
  public @Nonnull Object getValue() {
    return value;
  }

  private LinkedListNode next = null; //links at most to one other node

  /** @return The next node in the list, or {@code null} if this node is the end of the list. */
  public @Nullable LinkedListNode getNext() {
    return next;
  }

  /**
   * Sets the link to next node in the list.
   * @param next The next node in the list, or {@code null} if this node is the end of the list.
   */
  public void setNext(@Nullable final LinkedListNode next) {
    this.next = next;
  }

  /**
   * Creates a node containing an object.
   * @param value The object to store in the node.
   */
  public LinkedListNode(@Nonnull final Object value) {
    this.value = checkNotNull(value);
  }

}

A linked list can thus be a substitution for an array! Here is an abbreviated implementation of a linked list, allowing elements to be added and a length to be calculated.

Example LinkedList implementation.

public class LinkedList {

  private LinkedListNode first = null;

  /**
   * Adds an object to the end of the list.
   * @param object The object to add to the list.
   */
  public void add(final Object object) {
    final LinkedListNode newNode = new LinkedListNode(object);
    if(first == null) {  //if the list is empty
      first = newNode;
      return;
    }
    //look for the end of the list
    LinkedListNode node = first;
    while(node.getNext() !=null) {
      node = node.getNext();
    }
    assert node.getNext() == null;
    //add the node to the end of the list
    node.setNext(newNode);
  }

  /** @return The length of the list. */
  public int getLength() {
    if(first == null) {  //if the list is empty
      return 0;
    }
    //look for the end of the list
    int length=1;
    LinkedListNode node = first;
    while(node.getNext() !=null) {
      node = node.getNext();
      length++;
    }
    return length;
  }
}

A linked list is very efficient for adding elements; although it must traverse (or pass across) each node to find the end of the list, it doesn't need to copy any nodes—it can simply rearrange references to nodes. But a plain linked list is less efficient than an array for certain operations. While an array can immediately return the element at a given index, a linked list will need to traverse each element of the list until it reaches the ith node. Even finding the length of the list requiring traversing over each node until the end of the list is located. Some hybrid linked lists mitigate some but not all of of these inefficiencies, for example by storing a reference to the last node, or storing a separate variable with the current length which it constantly keeps up to date.

Some linked lists use nodes that have a reference to the previous node in addition to the next. This allows traversal both forwards and backwards. Such a list is called a doubly-linked list.

Review

Gotchas

Arrays are always mutable; anyone with access to an array can change its contents.
Arrays allow very fast access, but take a long time to insert or delete items because the other elements must be copied.
Linked list allow quick insertion and deletion, but take longer to look up a certain index because the list must be traversed.

In the Real World

Data structures are the workhorses of computer programs. You will use them day in and day out to shuffle data around.

Think About It

If you have worked with graph-based structures before, the needed implementation may become obvious to you and you will be able to sit down and code it from the top. But if you are just starting to work with graphs and node traversal, it may all seem overwhelming! That's when you should start with what you know and take it one step at a time. Find a small piece that is so simple that you can't fail to understand it. Implement it, and then make a unit test for it. Then add another step, and another unit test, and keep going until it all works. Let's walk through an example.

Linked List Length

Let's start with finding the length of a linked list. The solution for this can already be found in the main part of this lesson; that solution is iterative. But let's take another approach at how we might think about the problem.

LinkedListNode class diagram. — `LinkedListNode` class diagram.

First what is a linked list made up of? Nodes. Before trying to implement the whole “kit and caboodle” as it were, let's just think about nodes. In fact don't even think of nodes: think of a single node. You can look at the the implementation of LinkedListNode in the lesson to follow along.

What if we have a single node containing the string "foo" as its value, and containing a next node of null? This is more or less the simplest node we can make. Can it still be considered a linked list? Yes, it can! What is its length? It's length is 1. How did you know that? Because the next node is null. You already knew that, right? So write a method for it.

Let's not worry about the actual LinkedList implementation yet, but we can put it in a LinkedList class as a static method, just to have somewhere for the method to go.

LinkedList class for initial implementation of static methods.

public class LinkedList {

  /**
   * Determines the length of one or more nodes for a linked list.
   * @param node The first node in the chain.
   * @return The number of nodes strung together.
   */
  static int getLength(@Nonnull final LinkedListNode node) {
    if(node.getNext() == null)) {
      return 1;
    } else {
      throw new UnsupportedOperationException("I don't know how to implement length > 1 yet.");
    }
  }

}

So if LinkedListNode.getNext() == null then there is only one node. That's not hard to do at all! Now you need to test it. Create a unit test LinkedListTest in the same package.

You'll notice above that we gave the static LinkedList.getLength(LinkedListNode) method the default package access. This way nothing outside the package can access it—but our unit test, which is in the same package, can access it for testing.

LinkedListTest class for testing the static methods in the LinkedList class.

public class LinkedListTest {

  @Test
  public void testSingleNodeGetLength() {
    final LinkedListNode node = new LinkedListNode("foo");
    assertThat(LinkedList.getLength(node), is(1));
  }

}

At this point you should be on very firm footing. You know that if a node has a next node of null, there are no other nodes so the length is 1. You've written a test for it, and it passed. You might even think that writing a test for such a simple case is ridiculous—but that would be wrong. In fact you could go ahead and write a test for lists longer than 1. There is an entire movement that advocates writing tests before you do the implementation, just so you'll know what you're trying to do! Let's try it for a linked list of two nodes.

Test for length of two LinkedListNodes.

public class LinkedListTest {

  @Test
  public void testSingleNodeGetLength() {
    final LinkedListNode node = new LinkedListNode("foo");
    assertThat(LinkedList.getLength(node), is(1));
  }

  @Test
  public void testTwoNodesGetLength() {
    final LinkedListNode node1 = new LinkedListNode("foo");
    final LinkedListNode node2 = new LinkedListNode("bar");
    node1.setNext(node2);
    assertThat(LinkedList.getLength(node1), is(2));
  }

}

Run the test; it will fail. That's good—you now know what you're trying to implement.

So let's go back to our LinkedList.getLength(LinkedListNode) method. We already know that if LinkedListNode.getNext() is null, then there is only one node in the list. But what if LinkedListNode().getNext() is not null? What do we know? We know that there is one or more nodes in linked to the first node. You can think of these remaining nodes (however many they are) as a linked list themselves. In other words, if you take away the first node, we know that there exists one or more nodes linked together, and the first node in that list is whatever LinkedListNode().getNext() references. If we only knew how long that list was, we could simply add 1 to it, because we've counted the first node:

Planning implementation of length calculation for more than one LinkedListNode.

public class LinkedList {

  static int getLength(@Nonnull final LinkedListNode node) {
    if(node.getNext() == null)) {
      return 1;
    } else {
      /* TODO implement for length > 1
       * The length must be 1 (our current node) plus
       * the length of the linked list starting with "next".
       * Something like  1 + somehowGetLengthOf(next).
       * But how do we calculate the length of "next"?
       */
    }
  }

}

So all you need to do know is find out how to find the length of the linked list starting with the next node. But don't you already have a method for calculating the length of a linked list of nodes starting with some node?

Take a step back and look at your code; you are writing a getLength(LinkedListNode) already! Why can't you call that to get the length of the next node?

“But wait,” you may protest, “that method is not finished yet!” But is it not? Try it:

Recursive implementation of length calculation for more than one LinkedListNode.

public class LinkedList {

  static int getLength(@Nonnull final LinkedListNode node) {
    if(node.getNext() == null)) {
      return 1;
    } else {
      return 1 + getLength(node.getNext());
    }
  }

}

Think about what this method can do now:

getLength(LinkedListNode) can find the length of a list with length 1. This is the base case.
getLength(LinkedListNode) can find the the length of any list greater than 1 by adding 1 to the length of the nodes in front of it. This is the general case.

And isn't that all it needs to know? Instead of the iterative approach in the lesson, this is in fact the recursive approach. If you have a string of 5 beads, you know that the length of the beads is 1 plus the length of the remaining 4 beads. And the length of 4 beads is 1 plus the length of the remaining 3 beads, and so on down to the last bead, which of course has length of 1. (We always need a way to stop when we are using recursion.)

Now run the unit test LinkedListTest.testTwoNodesGetLength(); it will pass. Write one for three nodes, for five nodes, and for 500 nodes; they will all pass.

We now can calculate the length of a list of one node and anything greater than one node. What about no nodes—an empty list? We can't represent an empty list with a node; we'll have to go to the LinkedList level for that, but we don't have to throw away all our hard work in getLength(LinkedListNode). If the list is not empty, we can delegate to the method we already wrote, like this:

Delegating to static LinkedList.getLength(LinkedListNode) method from LinkedList.getLength().

public class LinkedList {

  private LinkedListNode first;

  …

  public int getLength() {
    if(first == null) {  //if there is not a node... there are no nodes!
      return 0;
    } else {  //if we have at least one node, we already know how to calculate the length!
      return getLength(first);
    }
  }

  static int getLength(@Nonnull final LinkedListNode node) {
    if(node.getNext() == null)) {
      return 1;
    } else {
      return 1 + getLength(node.getNext());
    }
  }

}

So now if you feel lost, you have an approach to attack any of these graph traversal problems, and indeed most problems in software development.

Take the simplest case you can think of.
Start working directly with the nodes; you can work with the higher-level data structure later.
Create some static method to work directly with the simple case.
Create a unit test to test that simple case.
Write another unit test to test something just a little more difficult.
Go back and finish writing the method to handle the more difficult case.
Repeat until finished.

Self Evaluation

Why are data structures needed?
What is the distinction between a Java array object and an array data structure? How are the two related?
What are some drawbacks of an array data structure?
What must be done in order to insert an object into the middle of an array?
How could you arbitrarily indicate relationships among Foo instances in a graph without modifying the Foo class?
In what ways is a linked list better than an array? How might a linked list be less efficient than an array?

Task

Create a new Maven project in the same Git repository to hold the data structures you implement in the two sections below.

Move your existing Booker project files to a separate booker subdirectory. Because you are mostly just moving files, you can commit these changes directly to the repository without creating a merge request. The remaining steps will be part of your merge request for this lesson.
- You will need to move your .gitignore file as well, if you used the absolute path form for any files or subdirectories.
Create a new subdirectory named datastruct for your new data structure project.
In the datastruct directory create a new Maven Java project using the package my.package.datastruct (substituting your base package name).
- You will need to create an additional .gitignore file for the new project directory.
Provide appropriate Maven coordinates, using the artifact ID datastruct.
Add the datastruct project coordinates as a dependency to the booker project.

When you build the booker project, its datastruct dependency project will have to have first been built and added to the local Maven repository cache. That is, you will need to use mvn install for the datastruct project each time it is updated before you attempt to build the booker project.

Create your own implementation of a linked list, implementing the following interface, complete with unit tests. Override the default implementation of add(Object object) only if you think it appropriate.

public interface LinkedList {

  /** Returns the first node.
   * @return The first node in the list, or {@code null} if there is no first node.
   * @deprecated Normally a linked list shouldn't expose its nodes; this method
   *     in the interface is temporary to provide access to the search method,
   *     and will be removed when better approaches for searching are added.
   */
  @Deprecated
  @Nullable LinkedListNode getFirstNode();

  /** @return The length of the list. */
  int getLength();

  /**
   * Retrieves the object at the given index.
   * @param index The zero-based index at which to retrieve the object.
   * @return The object at the given index, which may be {@code null}.
   * @throws IndexOutOfBoundsException if the given index is less than zero,
   *     or greater than or equal to the number of elements in the list.
   */
  @Nullable Object get(int index);

  /**
   * Add an item into the list after the current last item.
   * @param object the item to add.
   */
  default void add(@Nullable Object object) {
    insert(getLength(), object);
  }

  /**
   * Inserts an item into the list at a given index.
   * @param index The index at which the object should be inserted,
   *     with a minimum of zero and a maximum of the length of the list.
   * @param object the item to add.
   * @throws IndexOutOfBoundsException if the given index is less than zero,
   *     or greater than the number of elements in the list.
   */
  void insert(int index, @Nullable Object object);

}

If you implement an interface and have no particular name for distinguishing the implementation, it is customary to add an Impl suffix (meaning “implementation”), e.g. LinkedListImpl.

Your LinkedList implementation should replace the array for storing books in your Booker program.

Implement the following method in your Booker application and add appropriate unit tests for it. This method should use LinkedList.getFirstNode() and traverse the nodes manually; it should not iterate over the indexes as you would an array.

/**
 * Finds a book in a list by its title.
 * @param list The list of books; it can be assumed that all items in the list are instances of {@link Book}.
 * @param title The title of the book.
 * @return The first book with the given title, or <code>null</code> if no book with that title was found.
 */
public static @Nullable Book findBookByTitle(@Nonnull LinkedList list, @Nonnull String title) {
  …
}

Normally a linked list should not expose its nodes. The method getFirstNode() in the LinkedList interface is temporary to provide access to the search method, and will be removed in a future lesson when better approaches for searching are introduced.

References

Acknowledgments

“TINKERTOY” is a trademark of Hasbro.