Collections
Goals
- Understand how the Java Collections Framework defines abstract data types with corresponding data structure implementations.
- Recognize the major
Collection<E>
interfaces and their most-used implementations. - Explore the Java collection utilities.
Concepts
- capacity
- collection
- decorator pattern
- deque
- fail-fast iterator
- first-in, first-out (FIFO)
- head
- inclusive
- insertion-order
- Java Collections Framework (JCF)
- last-in, first-out (LIFO)
- list
- live view
- natural ordering
- queue
- set
- stack
- tail
- view
Library
java.lang.Comparable<T>
java.lang.Comparable.compareTo(T)
java.lang.Iterable<T>
java.util.ArrayList<E>
java.util.ArrayList.ArrayList(int initialCapacity)
java.util.Collection<E>
java.util.Collection.add(E element)
java.util.Collection.addAll(Collection<? extends E> collection)
java.util.Collection.clear()
java.util.Collection.contains(Object object)
java.util.Collection.isEmpty()
java.util.Collection.remove(Object object)
java.util.Collection.size()
java.util.Collections
java.util.Collections.emptyList()
java.util.Collections.emptySet()
java.util.Collections.newSetFromMap(Map<E,Boolean> map)
java.util.Collections.unmodifiableCollection(…)
java.util.Collections.unmodifiableList(…)
java.util.Collections.unmodifiableSet(…)
java.util.Comparator<T>
java.util.ConcurrentModificationException
java.util.Deque<E>
java.util.Deque.addFirst(E element)
java.util.Deque.addLast(E element)
java.util.Deque.getFirst()
java.util.Deque.getLast()
java.util.Deque.offerFirst(E element)
java.util.Deque.offerLast(E element)
java.util.Dequq.peek()
java.util.Deque.peekFirst()
java.util.Deque.peekLast()
java.util.Deque.pollFirst()
java.util.Deque.pollLast()
java.util.Deque.pop()
java.util.Deque.push(E element)
java.util.Deque.removeFirst()
java.util.Deque.removeLast()
java.util.HashSet<E>
java.util.HashSet.HashSet(Collection<? extends E> collection)
java.util.HashSet.HashSet(int initialCapacity)
java.util.Iterator<E>
java.util.Iterator.remove()
java.util.LinkedHashSet<E>
java.util.LinkedList<E>
java.util.List<E>
java.util.List.add(int index, E element)
java.util.List.get(int index)
java.util.List.listIterator()
java.util.List.remove(int index)
java.util.List.set(int index, E element)
java.util.List.sort(Comparator<? super E> comparator)
java.util.List.subList(int fromIndex, int toIndex)
java.util.ListIterator<E>
java.util.NavigableSet<E>
java.util.Queue<E>
java.util.Queue.add(E element)
java.util.Queue.element()
java.util.Queue.offer(E element)
java.util.Queue.peek()
java.util.Queue.poll()
java.util.Queue.remove()
java.util.Set<E>
java.util.SortedSet<E>
java.util.TreeSet<E>
java.util.Vector<E>
org.hamcrest.Matchers
org.hamcrest.Matchers.contains(E... items)
org.hamcrest.Matchers.containsInAnyOrder(T... items)
org.hamcrest.Matchers.empty()
org.hamcrest.Matchers.hasItem(T item)
org.hamcrest.Matchers.hasItems(T... items)
org.hamcrest.Matchers.hasSize(int size)
Lesson
At this point you're very familiar with a few basic data structures; including arrays; linked lists; and trees. You realize that on a higher level, an abstract data type (ADT) is the type of interface that determines how we use data; we can choose among data structures to use as the implementation for a particular ADT. For example, one ADT is list
of items; one could store a list of items in an array or a linked list (or even a tree structure, although that isn't common). The list
interface allows us to add and remove things at list indexes; the underlying data structure determines how best to store that data in memory. You've even written an ListADT
type and put a LinkedListADT
implementation behind it.
Java has an entire set of ADTs and implementations, referred to as the Java Collections Framework (JCF), as part of the standard library. These collections are used day-in and day-out with Java, and third-party libraries such as Google Guava have added extensions and utilities for working with Java collections.
The core Java collections library consists of a set of interfaces, mostly organized as either a
or a Collection
. These interfaces represent Java's conception of abstract data types, and the collections library includes many data structures that implement these ADT interfaces. In this lesson and subsequent lessons, you will learn to use the core Java collection classes. You will discontinue use of your custom ADTs and implementations, and switch to using Java collections.Map
Collections
In the Java Collections Framework, a collection refers specifically to those ADTs representing groups of individual elements. Technically therefore not all of Java's ADTs are collections
per-se. Other ADTs that store item relationships will be discussed in a future lesson.
List
- A list is a sequence of elements that can be accessed via indexes.
Set
- A set is a collection of elements with no duplicates and no inherent order.
Queue
- A queue is a first-in, first-out (FIFO) collection of elements, similar to a line of people waiting in a bank.
Deque
- A deque (pronounced like
deck
) is a double-ended queue, allowing insertion and deletion from either end. In addition to functioning as a queue, a deque can also function as a last-in, first-out (LIFO) data type called a stack.
Lists Versus Sets
List | Set | |
---|---|---|
Duplicates | yes | no |
Ordered | yes | no Some set implementations can be sorted. |
The two most-often used Java collection ADTs are lists and sets. Each has different characteristics, so it is important to choose the appropriate one for your purpose. The primary distinction between lists and sets is that lists allow duplicates and provide positional access, while sets prevent duplicates and may not guarantee order.
Collection<E>
All collections ultimately implement the base java.util.Collection<E>
interface, which brings a wealth of functionality shared among all collection types. All Java collections automatically implement java.lang.Iterable<T>
, for example. This means that you can get an iterator to the elements in any collection, or use it in an enhanced for(…)
loop! Here are some of the most important methods Collection<E>
:
Iterable<T>
- All Java collections automatically implement
java.lang.Iterable<T>
. This means that you can get an iterator to the elements in any collection, or use it in an enhancedfor(…)
loop! size()
- You can easily ask any collection for the number of elements in it.
isEmpty()
- This is a convenience method logically equivalent to checking
size() == 0
. CallingisEmpty()
is preferred over checking the size, because the underlying implementation may have a more efficient way than retrieving the number of elements. contains(Object object)
- You can ask a collection if a particular object is contained in the collection. Normally this uses
Object.equals(…)
for comparison, so most of the time you aren't checking to see if an actual instance is literally in the collection. SomeCollection<E>
implementations can check for an object more quickly and efficiently than others, as you will learn below. add(E element)
- All collections provide a common method for adding items to the collection.
addAll(Collection<? extends E> collection)
- Convenience method for adding elements in bulk from another collection. This may be efficiently than adding the elements individually.
remove(Object object)
- All collections provide a common method for removing items from the collection.
clear()
- Removes all items from a collection, leaving the collection empty. Depending on the specific ADT and implementation, this method may be more efficient than removing items individually.
List<E>
The java.util.List<E>
interface is perhaps the most-used of the Java collection types, and is equivalent to the ListADT
interface you created. The List.add(…)
method (inherited from Collection<E>
) will add the item to the end of the list. The List<E>
interface also adds several methods that bring functionality specific to lists:
get(int index)
- Retrieves an item at a specific index in the list.
set(int index, E element)
- Changes the item at a specific index in the list.
add(int index, E element)
- Inserts an item into a specific index in the list. This operation inherently increases the indexes of all following items in the list, if any.
remove(int index)
- Removes an item from a specific index in the list. This operation inherently decreases the indexes of all following items in the list, if any.
listIterator()
- Returns a
java.util.ListIterator<E>
, an iterator with special capabilities for traversing lists (such as the ability to iterate both forwards and backwards). subList(int fromIndex, int toIndex)
- Returns a
List<E>
implementation that is a view of the original list for some particular range. The new list is a live view in that changes made to it are reflected in the original list. Note that for the expressed range thefromIndex
parameter is inclusive, indicating the first index to include; but thetoIndex
parameter is exclusive, indicating one index past the last element to include. sort(Comparator<? super E> comparator)
- Sorts the list, using the provided strategy for comparing individual items in the list. This is the collection equivalent of the
java.util.Arrays.sort(T[], Comparator<? super T>)
utility method for working with arrays you learned about in the lesson on the strategy pattern.
Implementations
java.util.ArrayList<E>
- A
List<E>
backed by an array. This is the analogue ofArrayListADT<E>
in these lessons. java.util.LinkedList<E>
- A
List<E>
implemented using doubly-linked nodes. This is the analogue ofLinkedListADT<E>
in these lessons. java.util.Vector<E>
- A legacy data structure from Java 1.0, and later retrofitted to implement
List<E>
. This class is thread-safe, but its concurrency implementation is so overbearing that overwhelming that it is inefficient compared with other approaches.
Set<E>
A java.util.Set<E>
is a collection of elements; in this way it is similar to a list. But that's where the similarities end. A Set<E>
does not allow duplicates; if you try to add an object that equals(…)
another item in the set, the new item will be ignored. Furthermore a Set<E>
has no inherent order; when you iterator over the items in a Set<E>
, you have no guarantees which items will be returned first. A Set<E>
provides no additional methods over those included in Collection<E>
, but its implementation will likely perform many of them more efficiently than other collection types.
Subinterfaces
java.util.SortedSet<E>
- If a
Set<E>
implements this interface, it guarantees some ordering of its elements. This is usually the natural ordering or the ordering of some providedComparator<T>
strategy. java.util.NavigableSet<E>
- The
NavigableSet<E>
interface provides additional methods fornavigating
to items in order. For examplehigher(E element)
returns the first element in the set that ishigher
than the given element. TheheadSet(E toElement)
method returns a view to the set that only includes elements lower than the given element. The fact that aNavigableSet<E>
has some order might have tipped you off that aNavigableSet<E>
is aSortedSet<E>
.
Implementations
java.util.HashSet<E>
- This
Set<E>
implementation efficiently keeps track of objects and detects duplicates by using hash codes, similar to how you implemented yourHashTableImpl<K, V>
. Keys must implementObject.hashCode()
correctly.HashSet<E>
should be the defaultSet<E>
implementation you choose unless you have reason to choose another implementation. java.util.LinkedHashSet<E>
- Equivalent to a
HashSet<E>
, except that it also maintains a doubly-linked list to maintain iteration order. Iteration occurs in insertion-order, the order in which you add items to the set. java.util.TreeSet<E>
- A
Set<E>
that automatically sorts its elements by storing items in a tree data structure. The elements are either sorted by their natural ordering, or by ajava.util.Comparator<T>
provided in the constructor. ATreeSet<E>
is aNavigableSet<E>
, which implies it is also aSortedSet<E>
.
Queue<E>
A java.util.Queue<E>
represents a queue such a line at a bank, allowing first-in, first-out (FIFO) processing. A queue is an ordered collection of elements, like a list—and in fact a linked list is one of the most common data structure implementations for a queue. The main difference between a the Queue<E>
and List<E>
interfaces is that Queue<E>
provides special methods for adding adding items to the tail of the queue (e.g. the end of a linked list) and removing items from the head of the queue (e.g. the beginning of a linked list). As such it is usually used for holding elements for later processing.
The Queue<E>
interface provides pairs of method for performing various functions: one method that attempts to perform the operation and throws an exception if not possible; and another that only attempts the operation and returns a special value if the operation is not possible. The forms that return special values on error are mostly for use with capacity-restricted queues.
Operation | Description | Exception on Error | Special Value on Error |
---|---|---|---|
Add | Adds element to tail of queue. | add(E element) | offer(E element) |
Remove | Removes element from head of queue. | remove() | poll() |
Examine | Returns element from head of queue. | element() | peek() |
java.util.Queue<E>
interface.Deque<E>
The java.util.Deque<E>
interface extends Queue<E>
and represents a double-ended queue
. Most importantly a deque allows insertion and removal at both the head and tail ends. The Deque<E>
interface also provides pairs of method for performing various functions: one method that attempts to perform the operation and throws an exception if not possible; and another that only attempts the operation and returns a special value if the operation is not possible. The forms that return special values on error are mostly for use with capacity-restricted deques.
Operation | Description | Head | Tail | ||
---|---|---|---|---|---|
Exception on Error | Special Value on Error | Exception on Error | Special Value on Error | ||
Add | Adds element to tail of queue. | addFirst(E element) | offerFirst(E element) | addLast(E element) | offerLast(E element) |
Remove | Removes element from head of queue. | removeFirst() | pollFirst() | removeLast() | pollLast() |
Examine | Returns element from head of queue. | getFirst() | peekFirst() | getLast() | peekLast() |
java.util.Deque<E>
interface.Stack
Besides functioning as a first-in, first-out (FIFO) queue, a deque can be used as a stack of items, allowing last-in, first-out (LIFO) processing. The Deque<E>
interface even provides special stack-related methods, although they duplicate the more general deque methods.
Operation | Description | Stack Method | Equivalent Deque Method |
---|---|---|---|
Push | Adds element to top of stack. | push(E element) | addFirst(E element) |
Pop | Removes element from top of stack. | pop() | removeFirst() |
Examine | Returns element from top of stack. | peek() | peekFirst() |
java.util.Deque<E>
interface.Utilities
In addition to the collection interfaces and their implementations, Java comes with a Collections
class that comprises a set of utilities for working with collections—any implementation of the collection interfaces, not just those that ship with Java. It would be a good idea to take a look at the java.util.Collections
documentation to get in idea of the full set of utilities. The most useful ones are explained below.
Empty Collections
If you know that your method will return an empty collection, instead of creating a collection implementation such as ArrayList<E>
you can use one of Java's pre-made, immutable, empty collections. The Collections
class provides methods such as emptyList()
and emptySet()
that even return the correct generic type of collection you require.
Immutable Collections
Sometimes you need to have a collection that you do not allow anyone to modify. Rather implementing this functionality into each collection type, Java provides a set of immutability utilities such as unmodifiableCollection(…)
, unmodifiableList(…)
, and unmodifiableSet(…)
which turn any collection instance into an immutable one. The original collection is still mutable; Java instead returns a new collection instance that is provides immutable access to your existing collection. Calling any modifying method will result in an UnsupportedOperationException
.
Hamcrest
The Hamcrest library you have been using with JUnit for unit testing comes with several matchers that assist in testing collections. They make it much easier to verify the result of tests that return things such as lists and sets. These matchers are all retrieved via static methods of org.hamcrest.Matchers
; here are just a few of the most useful of them:
contains(E... items)
- Checks whether an
Iterable<T>
containsexactly
the given items, in the order they are given. All items must be present. Useful for verifying the contents of a list of items. containsInAnyOrder(T... items)
- Checks whether an
Iterable<T>
containsexactly
the given items, in any order. All items must be present. Useful for verifying the contents of a set of items. empty()
- Checks whether a
Collection<E>
is empty. hasItem(T item)
- Checks whether an
Iterable<T>
contains at least the given item. hasItems(T... items)
- Checks whether an
Iterable<T>
contains at least the given items, in any order. All items must be present. hasSize(int size)
- Checks whether a
Collection<E>
is of the given size.
Review
Gotchas
- Don't directly modify a collection while you are iterating or the collection will throw a
java.util.ConcurrentModificationException
. If you need to remove an item during iteration, use the iterator'sIterator.remove()
method. - Don't forget that in the range of
List.subList(int fromIndex, int toIndex)
thetoIndex
parameter is exclusive, and indicates one index past the last element to include. - Don't assume that a
Set<E>
'sIterator<T>
will return the elements in any particular order. - Don't use
Matchers.hasItems(T... items)
if you want to check the exact contents of a collection. Instead useMatchers.contains(T... items)
to verify the exact contents of a list, andMatchers.containsInAnyOrder(T... items)
to verify the exact contents of a set.
In the Real World
- Use
Collection.isEmpty()
as often as possible rather than checking its size for zero, asisEmpty()
may be more efficient. - The
Vector<E>
class is old and outdated. It brings a thread-locking overhead that is not always necessary. Use one of the other JavaList<E>
implementations unless an old API requires an instance ofVector<E>
. - Make
ArrayList<E>
your default go-toList<E>
implementation. The insertion and deletion penalties are not as big might be imagined.
Think About It
- Choose the correct type of collection for the job:
- Do the items you are keeping track of have duplicates, or a required positional order? If so you need a
List<E>
. - Do you want to prevent duplicates, or quickly check to see if you've encountered an item before? If so use a
Set<E>
. - Do want to keep a sequence of items for processing in the same order as you add more? You may want a
Queue<E>
. - Do you want to process items in the reverse order you added them? You will find the stack-related methods of a
Deque<E>
useful.
- Do the items you are keeping track of have duplicates, or a required positional order? If so you need a
Self Evaluation
- What does it mean that collection iterators are
fail-fast
? - In what order does an iterator of a
LinkedHashSet<E>
return its contents? How does this differ from the iterator of a plainHashSet<E>
? - What is an object's
natural ordering
?
Task
Convert your booker
project to use equivalent Java collections rather than your linked list implementations from your datastruct
project.
- Once you are finished, your
booker
project should have no references to any linked lists from thedatastruct
project. - The only remaining
datastruct
dependency should be the use of your hash table implementation.
Put your main list of publications into a static immutable List<Publication>
. Your main list of publications should have already been converted from using an array to use a list.
- Make the list immutable.
See Also
- Trail: Collections (Oracle - The Java™ Tutorials)
- Collection Interfaces (Oracle - The Java™ Tutorials)
- When to use LinkedList over ArrayList? (Stack Overflow)
- Hamcrest Collections Cookbook (Baeldung)
References
Resources
Acknowledgments
- Some symbols are from Font Awesome by Dave Gandy.