Collections

Goals

Understand how the Java Collections Framework defines abstract data types with corresponding data structure implementations.
Recognize the major Collection<E> interfaces and their most-used implementations.
Explore the Java collection utilities.

Concepts

capacity
collection
decorator pattern
deque
fail-fast iterator
first-in, first-out (FIFO)
head
inclusive
insertion-order
Java Collections Framework (JCF)
last-in, first-out (LIFO)
list
live view
natural ordering
queue
set
stack
tail
view

Library

Lesson

At this point you're very familiar with a few basic data structures, including arrays, linked lists, and trees. You realize that on a higher level, an abstract data type (ADT) is the type of interface that determines how we use data; we can choose among data structures to use as the implementation for a particular ADT. For example, one ADT is a list of items; one could store a list of items in an array or a linked list (or even a tree structure, although that isn't common). The list interface allows us to add and remove things at list indexes; the underlying data structure determines how best to store that data in memory. You've even written a ListADT type and put a LinkedListADT implementation behind it.

Figure: Java Collections Framework interfaces class diagram.

Java has an entire set of ADTs and implementations, referred to as the Java Collections Framework (JCF), as part of the standard library. These collections are used day-in and day-out with Java, and third-party libraries such as Google Guava have added extensions and utilities for working with Java collections.

The core Java collections library consists of a set of interfaces, mostly organized as either a Collection or a Map. These interfaces represent Java's conception of abstract data types, and the collections library includes many data structures that implement these ADT interfaces. In this lesson and subsequent lessons, you will learn to use the core Java collection classes. You will discontinue use of your custom ADTs and implementations, and switch to using Java collections.

Collections

In the Java Collections Framework, a collection refers specifically to those ADTs representing groups of individual elements. Technically, therefore, not all of Java's ADTs are collections per se. Other ADTs that store item relationships will be discussed in a future lesson.

List: A list is a sequence of elements that can be accessed via indexes.
Set: A set is a collection of elements with no duplicates and no inherent order.
Queue: A queue is a first-in, first-out (FIFO) collection of elements, similar to a line of people waiting in a bank.
Deque: A deque (pronounced like deck) is a double-ended queue, allowing insertion and deletion from either end. In addition to functioning as a queue, a deque can also function as a last-in, first-out (LIFO) data type called a stack.

Lists Versus Sets

Main differences between List and Set.

	List	Set
Duplicates	yes	no
Ordered	yes	no (some set implementations can be sorted)

The two most-often used Java collection ADTs are lists and sets. Each has different characteristics, so it is important to choose the appropriate one for your purpose. The primary distinction between lists and sets is that lists allow duplicates and provide positional access, while sets prevent duplicates and may not guarantee order.

`Collection<E>`

All collections ultimately implement the base java.util.Collection<E> interface, which brings a wealth of functionality shared among all collection types. Here are some of the most important features of Collection<E>:

Iterable<T>: All Java collections automatically implement java.lang.Iterable<T>. This means that you can get an iterator to the elements in any collection, or use it in an enhanced for(…) loop!
size(): You can easily ask any collection for the number of elements in it.
isEmpty(): This is a convenience method logically equivalent to checking size() == 0. Calling isEmpty() is preferred over checking the size, because the underlying implementation may have a more efficient way than retrieving the number of elements.
contains(Object object): You can ask a collection if a particular object is contained in the collection. Normally this uses Object.equals(…) for comparison, so most of the time you aren't checking to see if an actual instance is literally in the collection. Some Collection<E> implementations can check for an object more quickly and efficiently than others, as you will learn below.
add(E element): All collections provide a common method for adding items to the collection.
addAll(Collection<? extends E> collection): Convenience method for adding elements in bulk from another collection. This may be more efficient than adding the elements individually.
remove(Object object): All collections provide a common method for removing items from the collection.
clear(): Removes all items from a collection, leaving the collection empty. Depending on the specific ADT and implementation, this method may be more efficient than removing items individually.
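As a rough illustration of the methods listed above, the following sketch exercises them against an ArrayList<E>. The class name CollectionBasics and the helper countByIterating(…) are made up for this example; only the Collection<E> methods themselves come from the JDK.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;

/** Hypothetical demo class; not part of the lesson's projects. */
class CollectionBasics {

  /** Counts the elements of any collection using the enhanced for loop. */
  static int countByIterating(final Collection<?> collection) {
    int count = 0;
    for(final Object element : collection) {  //possible because every collection is Iterable
      count++;
    }
    return count;
  }

  public static void main(final String[] args) {
    final Collection<String> names = new ArrayList<String>();
    System.out.println(names.isEmpty());  //true
    names.add("Alice");
    names.addAll(Arrays.asList("Bob", "Carol"));  //bulk add
    System.out.println(names.size());  //3
    System.out.println(names.contains(new String("Bob")));  //true; compared using equals(…), not identity
    names.remove("Alice");
    System.out.println(countByIterating(names));  //2
    names.clear();
    System.out.println(names.isEmpty());  //true
  }
}
```

The variable is declared as Collection<String> rather than ArrayList<String> to show that none of these calls depend on the list-specific API.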

Note that contains(Object) and remove(Object) do not take generic types. This was done for various reasons, primarily for backwards-compatibility with collections before generics were introduced. See Java Puzzlers IV: The Phantom Reference Menace, Attack of the Clone, and Revenge of The Shift (YouTube - GoogleTechTalks) at 7:00.

Not all collections allow modification of the collection contents. If the collection is read-only, methods such as Collection.add(…) (as well as modification methods added by sub-interfaces) will throw an UnsupportedOperationException.

The fact that collections normally compare the contained elements using Object.equals(…) can be a little tricky. And be careful: some collection implementations are made especially for comparing objects based upon identity rather than on equivalence. These collections bend the contract rules for very useful reasons, though, and they usually make this departure from the Collection<E> contract clear in their names and documentation. Still, be careful.

In terms of complexity, the Collection<E> interface can be thought of as sitting between a simple Iterable<T> and a more specific ADT collection type such as List<E> (below).

Because Collection<E> is so abstract, serving as the root interface for all the collection types, there are usually no direct implementations of the Collection<E> interface itself. If you need to pass a Collection<E> to a method, just pass an instance of a simple implementation of one of the more specific collection ADT types. (The Guava immutable collection utilities, explained in a future lesson, are especially useful for this.)

As you learned in the lesson on iterators, the Iterator.remove() method provides a convenient way to discard items from the underlying data structure, as shown in this example which removes all empty strings from a collection of strings:

public static void removeEmptyStrings(@Nonnull final Collection<String> strings) {
  final Iterator<String> stringIterator = strings.iterator();
  while(stringIterator.hasNext()) {
    if(stringIterator.next().isEmpty()) {
      stringIterator.remove();
    }
  }
}

Java does not allow you to manually modify a collection while you are iterating through it by calling modification methods directly on the underlying collection. The collection iterators are fail-fast and will detect the modification, throwing a java.util.ConcurrentModificationException.
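To see the fail-fast behavior described above, here is a small sketch that deliberately does the wrong thing: it calls remove(…) on the collection itself while an enhanced for loop is iterating over it. The class and method names are hypothetical. Note that the check is made on a best-effort basis by the iterator, so it should be treated as a bug detector rather than something to rely on; the sample data below places the empty string first so that an ArrayList<E> iterator reliably notices the change.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.ConcurrentModificationException;

/** Hypothetical demo class showing what NOT to do while iterating. */
class FailFastDemo {

  /**
   * Tries to remove empty strings by calling remove(…) directly on the collection during iteration.
   * @return true if the iterator detected the modification and failed.
   */
  static boolean removeDirectlyFails(final Collection<String> strings) {
    try {
      for(final String string : strings) {
        if(string.isEmpty()) {
          strings.remove(string);  //modifies the collection behind the iterator's back
        }
      }
      return false;  //no modification was detected
    } catch(final ConcurrentModificationException concurrentModificationException) {
      return true;
    }
  }

  public static void main(final String[] args) {
    final Collection<String> strings = new ArrayList<String>(Arrays.asList("", "a", "b"));
    System.out.println(removeDirectlyFails(strings));  //true
  }
}
```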

`List<E>`

The java.util.List<E> interface is perhaps the most-used of the Java collection types, and is equivalent to the ListADT interface you created. The List.add(…) method (inherited from Collection<E>) will add the item to the end of the list. The List<E> interface also adds several methods that bring functionality specific to lists:

get(int index): Retrieves an item at a specific index in the list.
set(int index, E element): Changes the item at a specific index in the list.
add(int index, E element): Inserts an item into a specific index in the list. This operation inherently increases the indexes of all following items in the list, if any.
remove(int index): Removes an item from a specific index in the list. This operation inherently decreases the indexes of all following items in the list, if any.
listIterator(): Returns a java.util.ListIterator<E>, an iterator with special capabilities for traversing lists (such as the ability to iterate both forwards and backwards).
subList(int fromIndex, int toIndex): Returns a List<E> implementation that is a view of the original list for some particular range. The new list is a live view in that changes made to it are reflected in the original list. Note that for the expressed range the fromIndex parameter is inclusive, indicating the first index to include; but the toIndex parameter is exclusive, indicating one index past the last element to include.
sort(Comparator<? super E> comparator): Sorts the list, using the provided strategy for comparing individual items in the list. This is the collection equivalent of the java.util.Arrays.sort(T[], Comparator<? super T>) utility method for working with arrays you learned about in the lesson on the strategy pattern.
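The index-based methods above can be traced step by step in a short sketch. The class name ListMethodsDemo is invented for illustration, and Comparator.reverseOrder() is just one convenient comparison strategy from the JDK; any Comparator would do.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical demo class tracing the list-specific methods. */
class ListMethodsDemo {

  static List<String> demo() {
    final List<String> letters = new ArrayList<String>(Arrays.asList("b", "d"));
    letters.add(1, "c");  //[b, c, d]; "d" shifts from index 1 to index 2
    letters.set(0, "a");  //[a, c, d]; replaces without shifting anything
    letters.remove(1);  //[a, d]; "d" shifts back to index 1
    letters.add("b");  //[a, d, b]; the inherited add(…) appends to the end
    letters.sort(Comparator.<String>reverseOrder());  //[d, b, a]
    return letters;
  }

  public static void main(final String[] args) {
    System.out.println(demo());  //[d, b, a]
  }
}
```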

Because List.subList(…) provides such an easy way to create a list view of a range of list elements, there is no need to create methods that accept a range of indexes, like this:

public void doSomething(List<Foo> list, int startIndex, int endIndex) { …

Instead you should simply accept a list as the method parameter:

public void doSomething(List<Foo> list) { …

After all you can simply call doSomething(list.subList(startIndex, endIndex)) if you want to process only a range of the list items; the method doSomething(…) will see the subrange view as a self-contained list itself.
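A minimal sketch of this idea follows. The method zeroOut(…) stands in for doSomething(…) and is purely hypothetical; it knows nothing about ranges, yet because the sublist is a live view its changes show up in the original list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical demo class for List.subList(…). */
class SubListDemo {

  /** Replaces every element of the given list with zero; takes no index range. */
  static void zeroOut(final List<Integer> list) {
    for(int i = 0; i < list.size(); i++) {
      list.set(i, 0);
    }
  }

  static List<Integer> demo() {
    final List<Integer> numbers = new ArrayList<Integer>(Arrays.asList(10, 20, 30, 40, 50));
    zeroOut(numbers.subList(1, 3));  //indexes 1 and 2; index 3 is excluded
    return numbers;  //the original list sees the changes made through the view
  }

  public static void main(final String[] args) {
    System.out.println(demo());  //[10, 0, 0, 40, 50]
  }
}
```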

Implementations

java.util.ArrayList<E>: A List<E> backed by an array. This is the analogue of ArrayListADT<E> in these lessons.
java.util.LinkedList<E>: A List<E> implemented using doubly-linked nodes. This is the analogue of LinkedListADT<E> in these lessons.
java.util.Vector<E>: A legacy data structure from Java 1.0, and later retrofitted to implement List<E>. This class is thread-safe, but its concurrency implementation is so overbearing that it is inefficient compared with other approaches.

An ArrayList<E> internally maintains a capacity indicating the amount of memory reserved for adding items before a new internal array must be created. The constructor ArrayList(int initialCapacity) allows this capacity to be specified at the beginning. If you are going to use the list to process an existing collection, you may know the maximum number of items the list could ever hold, and specifying this value when creating the list could save memory or make the task more efficient by preventing copying from needing to occur in the middle of the algorithm.

public static List<String> getNonEmptyStrings(@Nonnull final Collection<String> strings) {
  final List<String> nonEmptyStrings = new ArrayList<String>(strings.size());
  for(final String string : strings) {
    if(!string.isEmpty()) {
      nonEmptyStrings.add(string);
    }
  }
  return nonEmptyStrings;
}

You've learned that linked lists provide more efficient insertion and deletion. This is true in general, but because java.util.ArrayList<E> uses low-level memory copying techniques, the difference in efficiency between it and java.util.LinkedList<E> is not as big as you might imagine for small lists. In addition, list modification tends to happen less frequently than list lookup, and ArrayList<E> greatly excels in index-based retrieval. LinkedList<E> is not as memory efficient, as it uses many small nodes rather than a simple array of elements. There is some discussion and debate in the community in this regard (see e.g. When to use LinkedList over ArrayList?), but in general use, unless you have a specific reason to use a linked list, use ArrayList<E> by default.

Do not use the Vector<E> class unless you have to. It provides a brute-force thread safety, but many times no thread safety is needed when used locally and temporarily. This class is provided mostly for backwards compatibility for those old APIs that require a Vector<E> instance. In new code use one of the other List<E> implementations with appropriate thread-safety added, as you will learn in an upcoming lesson on thread-safety.

`Set<E>`

A java.util.Set<E> is a collection of elements; in this way it is similar to a list. But that's where the similarities end. A Set<E> does not allow duplicates; if you try to add an object that equals(…) another item in the set, the new item will be ignored. Furthermore a Set<E> has no inherent order; when you iterate over the items in a Set<E>, you have no guarantees which items will be returned first. A Set<E> provides no additional methods over those included in Collection<E>, but its implementation will likely perform many of them more efficiently than other collection types.

If you want to make sure you have no duplicates of something, just add them all to a Set<E>. The resulting contents of the Set<E> will have no duplicates, as the Set<E> will throw out the duplicates automatically. Like ArrayList<E>, HashSet<E> (see below) has a constructor HashSet(int initialCapacity) that allows its initial capacity to be specified.

public static Set<String> getUniqueStrings(@Nonnull final Collection<String> strings) {
  final Set<String> uniqueStrings = new HashSet<String>(strings.size());
  for(final String string : strings) {
    uniqueStrings.add(string);  //simply adding to a set will remove duplicates
  }
  return uniqueStrings;
}

The same result could be obtained in a simpler and more efficient manner using the addAll(Collection<? extends E> collection) bulk add method.

public static Set<String> getUniqueStrings(@Nonnull final Collection<String> strings) {
  final Set<String> uniqueStrings = new HashSet<String>(strings.size());
  uniqueStrings.addAll(strings);
  return uniqueStrings;
}

Or even simpler by specifying the input collection in the HashSet(Collection<? extends E> collection) constructor.

public static Set<String> getUniqueStrings(@Nonnull final Collection<String> strings) {
  return new HashSet<String>(strings);
}

A Set<E> is therefore an ideal tool for making a checklist of things. If you want to keep track of which values have been included, visited, excluded, tallied, or whatever, just add them to a Set<E> one by one. You can check to see if the item has been included by using contains(…). All Collection<E> instances support contains(…), but a Set<E> implements it more efficiently—as well as automatically ignoring duplicate values.
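The checklist idea can be sketched as a method that reports the first value it has already seen. The names FirstDuplicateFinder and findFirstDuplicate(…) are hypothetical. The sketch relies on the fact that add(…) returns false when the set did not change, which avoids a separate contains(…) call.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical demo class using a set as a "seen" checklist. */
class FirstDuplicateFinder {

  /** @return The first string encountered a second time, or null if there are no duplicates. */
  static String findFirstDuplicate(final Collection<String> strings) {
    final Set<String> seen = new HashSet<String>();
    for(final String string : strings) {
      if(!seen.add(string)) {  //add(…) returns false if the set already contained the element
        return string;
      }
    }
    return null;
  }

  public static void main(final String[] args) {
    System.out.println(findFirstDuplicate(Arrays.asList("a", "b", "a", "b")));  //a
  }
}
```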

Subinterfaces

java.util.SortedSet<E>: If a Set<E> implements this interface, it guarantees some ordering of its elements. This is usually the natural ordering or the ordering of some provided Comparator<T> strategy.
java.util.NavigableSet<E>: The NavigableSet<E> interface provides additional methods for navigating to items in order. For example higher(E element) returns the first element in the set that is higher than the given element. The headSet(E toElement) method returns a view to the set that only includes elements lower than the given element. The fact that a NavigableSet<E> has some order might have tipped you off that a NavigableSet<E> is a SortedSet<E>.

Implementations

java.util.HashSet<E>: This Set<E> implementation efficiently keeps track of objects and detects duplicates by using hash codes, similar to how you implemented your HashTableImpl<K, V>. Elements must implement Object.hashCode() and Object.equals(…) correctly. HashSet<E> should be the default Set<E> implementation you choose unless you have reason to choose another implementation.
java.util.LinkedHashSet<E>: Equivalent to a HashSet<E>, except that it also maintains a doubly-linked list to maintain iteration order. Iteration occurs in insertion-order, the order in which you add items to the set.
java.util.TreeSet<E>: A Set<E> that automatically sorts its elements by storing items in a tree data structure. The elements are either sorted by their natural ordering, or by a java.util.Comparator<T> provided in the constructor. A TreeSet<E> is a NavigableSet<E>, which implies it is also a SortedSet<E>.
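The difference between these implementations is easiest to see by loading the same input into each. The following is only a sketch with made-up class, method, and sample data names; HashSet<E> is left out because its iteration order is unspecified.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical demo class comparing set iteration orders. */
class SetOrderingDemo {

  static final List<String> INPUT = Arrays.asList("pear", "apple", "fig", "apple");

  /** LinkedHashSet iterates in the order elements were first added. */
  static List<String> insertionOrder() {
    return new ArrayList<String>(new LinkedHashSet<String>(INPUT));
  }

  /** TreeSet iterates in natural (here alphabetical) order. */
  static List<String> sortedOrder() {
    return new ArrayList<String>(new TreeSet<String>(INPUT));
  }

  /** @return The next element strictly higher than the one given, or null if there is none. */
  static String nextAfter(final String element) {
    final NavigableSet<String> fruits = new TreeSet<String>(INPUT);
    return fruits.higher(element);
  }

  /** @return The elements strictly lower than the one given. */
  static List<String> before(final String element) {
    final NavigableSet<String> fruits = new TreeSet<String>(INPUT);
    return new ArrayList<String>(fruits.headSet(element));
  }

  public static void main(final String[] args) {
    System.out.println(insertionOrder());  //[pear, apple, fig]
    System.out.println(sortedOrder());  //[apple, fig, pear]
    System.out.println(nextAfter("apple"));  //fig
    System.out.println(before("pear"));  //[apple, fig]
  }
}
```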

A Set<E> is conceptually no more than a Map<K, V> for which each object is mapped to a Boolean value indicating whether it appears in the set. Java provides a utility method java.util.Collections.newSetFromMap(Map<E,Boolean> map) for creating a set from any existing map with Boolean values. Thus if Java had no HashSet<E>, you could create the equivalent of one by calling Collections.newSetFromMap(new HashMap<E, Boolean>()), where E is the type of set element. Additional helper methods are covered below under Utilities.
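As a brief sketch of that last point, the following builds a hash-backed set without mentioning HashSet<E> at all. The class and method names are hypothetical; note that Collections.newSetFromMap(…) expects the map passed in to be empty.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Set;

/** Hypothetical demo class for Collections.newSetFromMap(…). */
class SetFromMapDemo {

  /** @return A new, empty set that stores its elements as keys of a HashMap. */
  static Set<String> newHashBackedSet() {
    return Collections.newSetFromMap(new HashMap<String, Boolean>());  //the map must start out empty
  }

  public static void main(final String[] args) {
    final Set<String> set = newHashBackedSet();
    set.add("a");
    set.add("a");  //ignored as a duplicate, just as with HashSet
    System.out.println(set.size());  //1
  }
}
```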

`Queue<E>`

One of the easiest ways to get a Queue<E> instance is to create a LinkedList<E>, as it implements the Queue<E> interface.

A java.util.Queue<E> represents a queue such as a line at a bank, allowing first-in, first-out (FIFO) processing. A queue is an ordered collection of elements, like a list; in fact a linked list is one of the most common data structure implementations for a queue. The main difference between the Queue<E> and List<E> interfaces is that Queue<E> provides special methods for adding items to the tail of the queue (e.g. the end of a linked list) and removing items from the head of the queue (e.g. the beginning of a linked list). As such it is usually used for holding elements for later processing.

The Queue<E> interface provides pairs of methods for performing various functions: one method that attempts to perform the operation and throws an exception if not possible; and another that only attempts the operation and returns a special value if the operation is not possible. The forms that return special values on error are mostly for use with capacity-restricted queues.

Operation	Description	Exception on Error	Special Value on Error
Add	Adds element to tail of queue.	`add(E element)`	`offer(E element)`
Remove	Removes element from head of queue.	`remove()`	`poll()`
Examine	Returns element from head of queue.	`element()`	`peek()`

Principal methods of the java.util.Queue<E> interface.
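The two families of methods in the table can be contrasted in a short sketch that uses LinkedList<E> as the queue implementation. The class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Queue;

/** Hypothetical demo class for the Queue<E> method pairs. */
class QueueDemo {

  /** Removes everyone from the line in FIFO order and returns them in the order served. */
  static List<String> serveAll(final Queue<String> line) {
    final List<String> served = new ArrayList<String>();
    String customer;
    while((customer = line.poll()) != null) {  //poll() returns null rather than throwing when empty
      served.add(customer);
    }
    return served;
  }

  /** @return true if remove() on an empty queue throws, as the "exception" column indicates. */
  static boolean removeOnEmptyThrows() {
    final Queue<String> empty = new LinkedList<String>();
    try {
      empty.remove();
      return false;
    } catch(final NoSuchElementException noSuchElementException) {
      return true;
    }
  }

  public static void main(final String[] args) {
    final Queue<String> line = new LinkedList<String>();
    line.offer("Ann");
    line.offer("Ben");
    line.add("Cy");
    System.out.println(line.peek());  //Ann; peek() does not remove the head
    System.out.println(serveAll(line));  //[Ann, Ben, Cy]
    System.out.println(removeOnEmptyThrows());  //true
  }
}
```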

`Deque<E>`

The java.util.Deque<E> interface extends Queue<E> and represents a double-ended queue. Most importantly a deque allows insertion and removal at both the head and tail ends. The Deque<E> interface also provides pairs of methods for performing various functions: one method that attempts to perform the operation and throws an exception if not possible; and another that only attempts the operation and returns a special value if the operation is not possible. The forms that return special values on error are mostly for use with capacity-restricted deques.

Operation	Description	Head: Exception on Error	Head: Special Value on Error	Tail: Exception on Error	Tail: Special Value on Error
Add	Adds element to head or tail of deque.	`addFirst(E element)`	`offerFirst(E element)`	`addLast(E element)`	`offerLast(E element)`
Remove	Removes element from head or tail of deque.	`removeFirst()`	`pollFirst()`	`removeLast()`	`pollLast()`
Examine	Returns element from head or tail of deque.	`getFirst()`	`peekFirst()`	`getLast()`	`peekLast()`

Principal methods of the java.util.Deque<E> interface.
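Here is a small sketch that works on both ends of a deque. It assumes LinkedList<E> as the implementation, which also implements Deque<E>; the class name DequeDemo is hypothetical.

```java
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical demo class for the Deque<E> head and tail methods. */
class DequeDemo {

  /** @return The removed head, the examined tail, and the remaining size, in that order. */
  static List<Integer> demo() {
    final Deque<Integer> deque = new LinkedList<Integer>();
    deque.addLast(2);  //[2]
    deque.addFirst(1);  //[1, 2]
    deque.offerLast(3);  //[1, 2, 3]
    final int head = deque.removeFirst();  //removes 1, leaving [2, 3]
    final int tail = deque.peekLast();  //examines 3 without removing it
    return Arrays.asList(head, tail, deque.size());
  }

  public static void main(final String[] args) {
    System.out.println(demo());  //[1, 3, 2]
  }
}
```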

Stack

Besides functioning as a first-in, first-out (FIFO) queue, a deque can be used as a stack of items, allowing last-in, first-out (LIFO) processing. The Deque<E> interface even provides special stack-related methods, although they duplicate the more general deque methods.

Operation	Description	Stack Method	Equivalent Deque Method
Push	Adds element to top of stack.	`push(E element)`	`addFirst(E element)`
Pop	Removes element from top of stack.	`pop()`	`removeFirst()`
Examine	Returns element from top of stack.	`peek()`	`peekFirst()`

Stack-related methods of the java.util.Deque<E> interface.
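One classic use of LIFO processing is reversing a sequence, sketched below with the stack methods from the table. The names StackDemo and reverse(…) are made up, and LinkedList<E> is assumed as the Deque<E> implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedList;
import java.util.List;

/** Hypothetical demo class using a Deque<E> as a stack. */
class StackDemo {

  /** @return A new list containing the given items in reverse order. */
  static List<String> reverse(final List<String> items) {
    final Deque<String> stack = new LinkedList<String>();
    for(final String item : items) {
      stack.push(item);  //each new item goes on top of the previous ones
    }
    final List<String> reversed = new ArrayList<String>(items.size());
    while(!stack.isEmpty()) {
      reversed.add(stack.pop());  //the last item pushed is the first one popped
    }
    return reversed;
  }

  public static void main(final String[] args) {
    System.out.println(reverse(Arrays.asList("a", "b", "c")));  //[c, b, a]
  }
}
```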

Utilities

In addition to the collection interfaces and their implementations, Java comes with a Collections class that comprises a set of utilities for working with collections (any implementation of the collection interfaces, not just those that ship with Java). It would be a good idea to take a look at the java.util.Collections documentation to get an idea of the full set of utilities. The most useful ones are explained below.

You may want to statically import java.util.Collections.* so that the utility methods read more fluently.

Empty Collections

If you know that your method will return an empty collection, instead of creating a collection implementation such as ArrayList<E> you can use one of Java's pre-made, immutable, empty collections. The Collections class provides methods such as emptyList() and emptySet() that even return the correct generic type of collection you require.

Example method using Collections.emptySet().

/**
 * Returns a set of all characters in the string, with no duplicates.
 * @param string The string of characters.
 * @return The set of characters encountered in the string.
 */
public static Set<Character> getCharacters(@Nonnull final String string) {
  if(string.isEmpty()) {
    return java.util.Collections.emptySet();  //provides the correct generic type
  }
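  final Set<Character> characters = new HashSet<Character>();
  for(int i = 0; i < string.length(); i++) {
    characters.add(string.charAt(i));  //the set ignores duplicate characters
  }
  return characters;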
}

Behind the scenes Java takes advantage of generics erasure, using the trick you already learned in the lesson on generics. Because the returned collection is empty and cannot be modified, it actually doesn't make any difference what generic type it is; at runtime the generic type will be erased anyway. Java therefore simply keeps a single empty collection and casts this static instance to the correct generic type, using the captured generic type the caller requested. The compiler knows this is normally unsafe, but because it doesn't matter in this case Java uses an annotation to suppress this warning.

@SuppressWarnings("unchecked")
public static final <T> Set<T> emptySet() {
    return (Set<T>) EMPTY_SET;
}

Immutable Collections

Sometimes you need to have a collection that you do not allow anyone to modify. Rather than implementing this functionality in each collection type, Java provides a set of immutability utilities such as unmodifiableCollection(…), unmodifiableList(…), and unmodifiableSet(…) which turn any collection instance into an immutable one. The original collection is still mutable; Java instead returns a new collection instance that provides immutable access to your existing collection. Calling any modifying method on the returned instance will result in an UnsupportedOperationException.

Creating an immutable set using Collections.unmodifiableSet(…).

public static final Set<String> NICKNAMES;

static {
  final Set<String> nicknames = new HashSet<String>();
  nicknames.add("Will");
  nicknames.add("Willy");
  nicknames.add("Bill");
  NICKNAMES = java.util.Collections.unmodifiableSet(nicknames);
}

Java pulls off this immutability trick by wrapping your collection with a lightweight implementation of the appropriate collection type. This wrapper class keeps a reference to your collection, and forwards all the methods to the methods of your collection instance, except for those methods that would modify the collection; for those Java throws an UnsupportedOperationException. Here is the general idea, implemented as a static nested class:

private static class ImmutableSet<E> implements Set<E> {

  private final Set<E> wrappedSet;

  public ImmutableSet(@Nonnull final Set<E> wrappedSet) {
    this.wrappedSet = checkNotNull(wrappedSet);
  }

  …

  @Override
  public int size() {
    return wrappedSet.size();  //delegate to wrapped collection
  }

  …

  @Override
  public boolean add(final E element) {
    throw new UnsupportedOperationException();
  }

  …

}

This is in fact yet another design pattern, referred to as the decorator pattern because the wrapper class decorates some instance in order to give it more capabilities or change the way it functions. It is also yet another example of indirection.
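Because the wrapper only delegates, it behaves as a read-only live view rather than a frozen copy. The following sketch, with hypothetical class and method names, shows both consequences: modification through the wrapper is rejected, while changes made to the original set remain visible through it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical demo class for Collections.unmodifiableSet(…). */
class UnmodifiableViewDemo {

  /** @return true if the given set refuses to accept a new element. */
  static boolean addIsRejected(final Set<String> set) {
    try {
      set.add("x");
      return false;
    } catch(final UnsupportedOperationException unsupportedOperationException) {
      return true;
    }
  }

  /** @return The size seen through the read-only wrapper after the original set has grown. */
  static int viewSizeAfterOriginalChanges() {
    final Set<String> nicknames = new HashSet<String>(Arrays.asList("Will", "Bill"));
    final Set<String> readOnly = Collections.unmodifiableSet(nicknames);
    nicknames.add("Willy");  //modify the original, not the wrapper
    return readOnly.size();  //the wrapper delegates to the original, so it sees the new element
  }

  public static void main(final String[] args) {
    System.out.println(addIsRejected(Collections.unmodifiableSet(new HashSet<String>())));  //true
    System.out.println(viewSizeAfterOriginalChanges());  //3
  }
}
```

If nobody should ever be able to change the contents, do not hand out or keep using a reference to the original collection, as the NICKNAMES example above does by keeping the mutable set local to the static initializer.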

Hamcrest

Many of these matchers are for Iterable<T>, but they work just as well with Collection<E> because collections are iterable.

The Hamcrest library you have been using with JUnit for unit testing comes with several matchers that assist in testing collections. They make it much easier to verify the result of tests that return things such as lists and sets. These matchers are all retrieved via static methods of org.hamcrest.Matchers; here are just a few of the most useful of them:

contains(E... items): Checks whether an Iterable<T> contains exactly the given items, in the order they are given. All items must be present. Useful for verifying the contents of a list of items.
containsInAnyOrder(T... items): Checks whether an Iterable<T> contains exactly the given items, in any order. All items must be present. Useful for verifying the contents of a set of items.
empty(): Checks whether a Collection<E> is empty.
hasItem(T item): Checks whether an Iterable<T> contains at least the given item.
hasItems(T... items): Checks whether an Iterable<T> contains at least the given items, in any order. All items must be present.
hasSize(int size): Checks whether a Collection<E> is of the given size.

Be careful not to confuse Matchers.hasItems(T... items) with Matchers.containsInAnyOrder(T... items). The former, hasItems(T... items), would still match a collection that has items in addition to the ones given. The latter, Matchers.containsInAnyOrder(T... items), requires exactly the items given, no more and no less.

Review

Gotchas

Don't directly modify a collection while you are iterating or the iterator will throw a java.util.ConcurrentModificationException. If you need to remove an item during iteration, use the iterator's Iterator.remove() method.
Don't forget that in the range of List.subList(int fromIndex, int toIndex) the toIndex parameter is exclusive, and indicates one index past the last element to include.
Don't assume that a Set<E>'s Iterator<T> will return the elements in any particular order.
Don't use Matchers.hasItems(T... items) if you want to check the exact contents of a collection. Instead use Matchers.contains(T... items) to verify the exact contents of a list, and Matchers.containsInAnyOrder(T... items) to verify the exact contents of a set.

In the Real World

Use Collection.isEmpty() as often as possible rather than checking its size for zero, as isEmpty() may be more efficient.
The Vector<E> class is old and outdated. It brings a thread-locking overhead that is not always necessary. Use one of the other Java List<E> implementations unless an old API requires an instance of Vector<E>.
Make ArrayList<E> your default go-to List<E> implementation. The insertion and deletion penalties are not as big as might be imagined.

Think About It

Choose the correct type of collection for the job:
- Do the items you are keeping track of have duplicates, or a required positional order? If so you need a List<E>.
- Do you want to prevent duplicates, or quickly check to see if you've encountered an item before? If so use a Set<E>.
- Do you want to keep a sequence of items for processing in the same order as you add more? You may want a Queue<E>.
- Do you want to process items in the reverse order you added them? You will find the stack-related methods of a Deque<E> useful.

Self Evaluation

What does it mean that collection iterators are fail-fast?
In what order does an iterator of a LinkedHashSet<E> return its contents? How does this differ from the iterator of a plain HashSet<E>?
What is an object's natural ordering?

Task

Convert your booker project to use equivalent Java collections rather than your linked list implementations from your datastruct project.

Once you are finished, your booker project should have no references to any linked lists from the datastruct project.
The only remaining datastruct dependency should be the use of your hash table implementation.

Put your main list of publications into a static immutable List<Publication>. Your main list of publications should have already been converted from using an array to use a list.

Make the list immutable.

References

Collections Framework Overview (Oracle)

Resources

The Collections Framework (Oracle - Java™ Platform Overview)

Acknowledgments

Some symbols are from Font Awesome by Dave Gandy.
