Graphs
Goals
- Understand and identify data structures.
- Understand node and graph data structures.
- Implement and use a linked list.
Concepts
- array
- data structure
- directed graph
- doubly-linked list
- edge
- graph
- instance graph
- linked list
- node
- traverse
- vertex
Library
Lesson
When you learned about algorithms, you discovered that certain of them require you to first organize data in a certain way. A binary search, for example, requires that all input data first be sorted. Not only the order but sometimes even the general organization of data in the computer memory can be important or even necessary for an algorithm to work.
Data Structures
Until this point, whenever you need groups of objects, you have been storing them in arrays. In computer science many other approaches have been invented for holding groups of objects in addition to arrays. These approaches are called data structures, and they serve as general-purpose containers for data. Many times an algorithm will require that data be stored in a particular data structure in order for the algorithm to work. Data structures generally the same way across all programming languages, and they each have their own benefits and trade-offs.
Array
The Java “array objects” you have been using are objects like any non-primitive value. You may have given no thought about how actual values are stored inside the array object, but conceptually the values are stored in a row, in contiguous memory positions. In data structure terms, this is the actual “array”. You could say that the Java array object stores its values in an array data structure.
Index | 0 | 1 | 2 | 3 | …n-1 |
---|---|---|---|---|---|
Value | element_1 | element_2 | element_3 | element_4 | element_n |
An array data structure is rather rigid. It has a fixed length, and to add new elements we would somehow have to allocate more memory and/or move the array somewhere else in memory. Inserting an element into the array requires shifting all the other elements down to make room. But perhaps one of the biggest drawbacks is that finding an object in an array means that we have to look through all the elements of the array, one at a time. We could get lucky and find an object toward the front of the array. If we were not so lucky we might not find the object until we reached the end of the array.
Graph
A solution to many of the drawbacks of arrays is to link the objects together in memory, much like a TinkerToy® set. Linking objects together creates a data structure called a graph.
In graph terminology, each object above is called a vertex of the graph, and the lines between the objects are called edges. Graphs can come in all shapes an sizes. A graph could consist of several vertices strung out in a single line. A graph could consist of several vertices each connected to all the other vertices. A graph could even consist of a single vertex not connected to anything. The key concept of a graph is that a vertex has a way to link to one or more other vertices.
Node
But how can we link our objects in real life? If we have three Animal
instances, for example, how could we link them together? One solution would be to add a nextAnimal
variable to the Animal
class. If we wanted to link together several animals, we could use animal1.setNextAnimal(animal2)
, and use animal2.setNextAnimal(animal3)
, and so on—sort of like a row of dogs on leashes, each leashed to the other. But do we really want to pollute our Animal
class with special linking variables just because we want a way to group our animals together? This approach would also come with some restrictions: it would be difficult for an Animal
to appear in more than one group, for example.
A more flexible approach is to create a separate “wrapper class” for each of the vertexes. Normally this data structure called a node, and its sole purpose is to hold the object and point to one or more other nodes.
Here is conceptually how a Node
class might be implemented. In reality the number of nodes would depend on the specific type of graph.
We could then construct the graph pictured above:
Graphs in general can become quite complex and can help solve complicated problems. But for merely storing objects there are a few main graph types that are relatively simple.
Linked List
A linked list data structure is a graph consisting of a single line of nodes strung together.
Because each node can only connect to one node in front of it (or none at all, if it is the last node), a LinkedListNode
for a linked list can be simpler than the more general Node
source code shown in the earlier section above.
A linked list can thus be a substitution for an array! Here is an abbreviated implementation of a linked list, allowing elements to be added and a length to be calculated.
Review
Gotchas
- Arrays are always mutable; anyone with access to an array can change its contents.
- Arrays allow very fast access, but take a long time to insert or delete items because the other elements must be copied.
- Linked list allow quick insertion and deletion, but take longer to look up a certain index because the list must be traversed.
In the Real World
- Data structures are the workhorses of computer programs. You will use them day in and day out to shuffle data around.
Think About It
If you have worked with graph-based structures before, the needed implementation may become obvious to you and you will be able to sit down and code it from the top. But if you are just starting to work with graphs and node traversal, it may all seem overwhelming! That's when you should start with what you know and take it one step at a time. Find a small piece that is so simple that you can't fail to understand it. Implement it, and then make a unit test for it. Then add another step, and another unit test, and keep going until it all works. Let's walk through an example.
Linked List Length
Let's start with finding the length of a linked list. The solution for this can already be found in the main part of this lesson; that solution is iterative. But let's take another approach at how we might think about the problem.
First what is a linked list made up of? Nodes. Before trying to implement the whole “kit and caboodle” as it were, let's just think about nodes. In fact don't even think of nodes: think of a single node. You can look at the the implementation of LinkedListNode
in the lesson to follow along.
What if we have a single node containing the string
This is more or less the simplest node we can make. "foo"
as its value, and containing a next
node of null
?Can it still be considered a linked list?
Yes, it can! What is its length?
It's length is 1
. How did you know that?
Because the next
node is null
. You already knew that, right? So write a method for it.
So if LinkedListNode.getNext() == null
then there is only one node. That's not hard to do at all! Now you need to test it. Create a unit test LinkedListTest
in the same package.
At this point you should be on very firm footing. You know that if a node has a next
node of null
, there are no other nodes so the length is 1
. You've written a test for it, and it passed. You might even think that writing a test for such a simple case is ridiculous—but that would be wrong. In fact you could go ahead and write a test for lists longer than 1
. There is an entire movement that advocates writing tests before you do the implementation, just so you'll know what you're trying to do! Let's try it for a linked list of two nodes.
Run the test; it will fail. That's good—you now know what you're trying to implement.
So let's go back to our LinkedList.getLength(LinkedListNode)
method. We already know that if LinkedListNode.getNext()
is null
, then there is only one node in the list. But what if
What do we know? We know that there is one or more nodes in linked to the first node. You can think of these remaining nodes (however many they are) as a linked list themselves. In other words, if you take away the first node, we know that there exists one or more nodes linked together, and the first node in that list is whatever LinkedListNode().getNext()
is not null
?LinkedListNode().getNext()
references. If we only knew how long that list was, we could simply add 1
to it, because we've counted the first node:
So all you need to do know is find out how to find the length of the linked list starting with the next
node. But don't you already have a method for calculating the length of a linked list of nodes starting with some node?
Take a step back and look at your code; you are writing a getLength(LinkedListNode)
already! Why can't you call that to get the length of the next
node?
“But wait,” you may protest, “that method is not finished yet!” But is it not? Try it:
Think about what this method can do now:
getLength(LinkedListNode)
can find the length of a list with length1
. This is the base case.getLength(LinkedListNode)
can find the the length of any list greater than1
by adding1
to the length of the nodes in front of it. This is the general case.
And isn't that all it needs to know? Instead of the iterative approach in the lesson, this is in fact the recursive approach. If you have a string of 5
beads, you know that the length of the beads is 1
plus the length of the remaining 4
beads. And the length of 4
beads is 1
plus the length of the remaining 3
beads, and so on down to the last bead, which of course has length of 1
. (We always need a way to stop when we are using recursion.)
Now run the unit test LinkedListTest.testTwoNodesGetLength()
; it will pass. Write one for three nodes, for five nodes, and for 500
nodes; they will all pass.
We now can calculate the length of a list of one node and anything greater than one node. What about no nodes—an empty list? We can't represent an empty list with a node; we'll have to go to the LinkedList
level for that, but we don't have to throw away all our hard work in getLength(LinkedListNode)
. If the list is not empty, we can delegate to the method we already wrote, like this:
So now if you feel lost, you have an approach to attack any of these graph traversal problems, and indeed most problems in software development.
- Take the simplest case you can think of.
- Start working directly with the nodes; you can work with the higher-level data structure later.
- Create some static method to work directly with the simple case.
- Create a unit test to test that simple case.
- Write another unit test to test something just a little more difficult.
- Go back and finish writing the method to handle the more difficult case.
- Repeat until finished.
Self Evaluation
- Why are data structures needed?
- What is the distinction between a Java array object and an array data structure? How are the two related?
- What are some drawbacks of an array data structure?
- What must be done in order to insert an object into the middle of an array?
- How could you arbitrarily indicate relationships among
Foo
instances in a graph without modifying theFoo
class? - In what ways is a linked list better than an array? How might a linked list be less efficient than an array?
Task
Create a new Maven project in the same Git repository to hold the data structures you implement in the two sections below.
- Move your existing Booker project files to a separate
booker
subdirectory. Because you are mostly just moving files, you can commit these changes directly to the repository without creating a merge request. The remaining steps will be part of your merge request for this lesson.- You will need to move your
.gitignore
file as well, if you used the absolute path form for any files or subdirectories.
- You will need to move your
- Create a new subdirectory named
datastruct
for your new data structure project. - In the
datastruct
directory create a new Maven Java project using the packagemy.package.datastruct
(substituting your base package name).- You will need to create an additional
.gitignore
file for the new project directory.
- You will need to create an additional
- Provide appropriate Maven coordinates, using the artifact ID
datastruct
. - Add the
datastruct
project coordinates as a dependency to thebooker
project.
Create your own implementation of a linked list, implementing the following interface, complete with unit tests. Override the default implementation of add(Object object)
only if you think it appropriate.
Implement the following method in your Booker application and add appropriate unit tests for it. This method should use LinkedList.getFirstNode()
and traverse the nodes manually; it should not iterate over the indexes as you would an array.
See Also
References
- Data structure (Wikipedia)
- Dictionary of Algorithms and Data Structures (NIST)
- Graph (Wikipedia)
- Linked list (Wikipedia)
Acknowledgments
- “TINKERTOY” is a trademark of Hasbro.