I/O
Goals
- Learn about file systems and path representations.
- Represent paths using Java interfaces.
- Access files.
- Work with byte-oriented input and output streams.
Concepts
- absolute path
- buffer
- directory
- Faux Pas
- file
- file system
- flush
- input/output (I/O)
- input stream
- mark
- output stream
- parent directory
- path
- relative path
- reset
- root directory
- try-with-resources
Language
try
(with resources)
Library
java.io
java.io.BufferedInputStream
java.io.BufferedOutputStream
java.io.ByteArrayInputStream
java.io.ByteArrayOutputStream
java.io.ByteArrayOutputStream.toByteArray()
java.io.Closeable
java.io.Closeable.close()
java.io.DataInputStream
java.io.DataOutputStream
java.io.File
java.io.File.toPath()
java.io.FileInputStream
java.nio.file.Files.copy(Path source, Path target, CopyOption... options)
java.nio.file.Files.createDirectories(Path dir, FileAttribute<?>... attrs)
java.nio.file.Files.createDirectory(Path dir, FileAttribute<?>... attrs)
java.nio.file.Files.list(Path dir)
java.nio.file.Files.newInputStream(Path path, OpenOption... options)
java.nio.file.Files.newOutputStream(Path path, OpenOption... options)
java.nio.file.Files.readAllBytes(Path path)
java.nio.file.Files.write(Path path, byte[] bytes, OpenOption... options)
java.io.FilterInputStream
java.io.FilterOutputStream
java.io.Flushable
java.io.Flushable.flush()
java.io.IOException
java.io.InputStream
java.io.InputStream.mark(int readlimit)
java.io.InputStream.markSupported()
java.io.InputStream.read()
java.io.InputStream.read(byte[] b)
java.io.InputStream.reset()
java.io.ObjectInputStream
java.io.ObjectOutputStream
java.io.OutputStream
java.io.OutputStream.write(int b)
java.io.OutputStream.write(byte[] b)
java.io.OutputStream.write(byte[] b, int off, int len)
java.io.PrintStream
java.io.UncheckedIOException
java.io.UncheckedIOException.getCause()
java.lang.AutoCloseable
java.lang.Byte.toUnsignedInt(byte)
java.nio
java.nio.file.FileSystem
java.nio.file.FileSystem.getPath(String first, String... more)
java.nio.file.FileSystem.getRootDirectories()
java.nio.file.FileSystem.getSeparator()
java.nio.file.FileSystems
java.nio.file.FileSystems.getDefault()
java.nio.file.Files
java.nio.file.Files.exists(Path path, LinkOption... options)
java.nio.file.Files.isDirectory(Path path, LinkOption... options)
java.nio.file.Path
java.nio.file.Path.getFileName()
java.nio.file.Path.isAbsolute()
java.nio.file.Path.getNameCount()
java.nio.file.Path.getRoot()
java.nio.file.Path.relativize(Path other)
java.nio.file.Path.resolve(Path other)
java.nio.file.Path.resolve(String other)
java.nio.file.Path.toFile()
java.nio.file.Paths
java.nio.file.Paths.get(String first, String... more)
java.nio.file.StandardOpenOption.CREATE
java.nio.file.StandardOpenOption.TRUNCATE_EXISTING
Lesson
The Evolution of Java I/O
The JDK has been released with various input/output (I/O) libraries over the years. Some aspects of newer libraries replaced older ones. Other aspects continue to coexist with old classes. Here is a quick overview of the evolution of Java I/O over the years, just to get a feel for what has come before and to recognize some of the terminology when you see it.
- IO - The original classes in the java.io package concentrated on traditional I/O streams and random-access file classes. Many of these classes can still be used, although newer libraries bring alternatives for special applications. The java.io.IOException exception class still pervades I/O code, but the java.io.File class, which was central to this package, should be abandoned for new code.
- NIO - Java 1.4 introduced the java.nio package ("new I/O"), which included new whiz-bang concepts such as java.nio.Buffer and channels. These will be discussed in upcoming lessons.
- NIO.2 - Java 7 added some new approaches to asynchronous I/O, but most significant across the board was the introduction of the java.nio.file.Path interface (to replace java.io.File), along with many other classes in the java.nio.file package. Note that Java did not create a new java.nio2 package for these additions; instead the new classes are scattered across packages.
Exceptions
java.io.IOException
- A checked exception traditionally used as a general indication of I/O error.
java.io.UncheckedIOException
- An unchecked exception representing an I/O error. This class was added in Java 8. It is especially useful for code with lambda expressions, as many functional interfaces do not allow for checked exceptions.
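As a minimal sketch of why the unchecked variant helps with lambdas, the following wraps an IOException so that it can escape a stream pipeline, then unwraps it again using UncheckedIOException.getCause(). The class and method names (UncheckedDemo, sizeOf, totalSize) are hypothetical and only serve the illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class UncheckedDemo {

    /** Returns the size of a file, converting any IOException to an unchecked one. */
    static long sizeOf(Path path) {
        try {
            return Files.size(path);
        } catch (IOException ioException) {
            // ToLongFunction.applyAsLong() declares no checked exceptions, so wrap it.
            throw new UncheckedIOException(ioException);
        }
    }

    /** Sums the sizes of the given files; the caller sees an ordinary IOException. */
    static long totalSize(List<Path> paths) throws IOException {
        try {
            return paths.stream().mapToLong(UncheckedDemo::sizeOf).sum();
        } catch (UncheckedIOException uncheckedIOException) {
            // getCause() returns the original IOException.
            throw uncheckedIOException.getCause();
        }
    }
}
```

Callers of totalSize() deal only with the checked IOException; the unchecked wrapper never leaves the method.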
File Systems
Computers store persisted information in files on some file system. Different file systems have different aspects such as security, attributes, and case sensitivity. Examples include NTFS (primarily Windows) and ext4 (primarily Linux).
Java represents information about a file system using the java.nio.file.FileSystem class. To get the default file system, use the FileSystems.getDefault() method of the helper class java.nio.file.FileSystems.
The files on a file system are usually divided into plain files and directories, which are used to group files hierarchically. A file (or directory) is identified by its path. A path can be an absolute path if it indicates the complete path (from the outermost or root directory) necessary to locate a file; or a relative path if it only indicates the portion of the path necessary to locate the file from some other directory.
Paths are separated into parts by a separator character; on Linux and related systems this is the forward slash / character; on Windows it is the backslash \ character. An earlier part in the path indicates the parent directory of the directory or file later in the path. The special directory names . and .. refer to the current directory and parent directory, respectively.
Path | Relative/Absolute | Description |
---|---|---|
. | relative | Current directory. |
.. | relative | Parent directory. |
foobar.txt | relative | |
./foobar.txt | relative | Same file as above. |
foo/example.txt | relative | |
../bar/example.txt | relative | |
C:\foo\bar.txt | absolute | File system on Windows OS. |
/etc/foo/bar.txt | absolute | File system on Linux OS. |
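The following sketch asks the default file system for its separator and root directories, and classifies a few paths as relative or absolute in the same way as the table above. The class name FileSystemDemo and the helper kind() are hypothetical; the printed separator and roots depend on the platform.

```java
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Path;

class FileSystemDemo {

    /** Classifies a path on the default file system as "absolute" or "relative". */
    static String kind(String first, String... more) {
        Path path = FileSystems.getDefault().getPath(first, more);
        return path.isAbsolute() ? "absolute" : "relative";
    }

    public static void main(String[] args) {
        FileSystem fileSystem = FileSystems.getDefault();
        // "/" on Linux and related systems, "\" on Windows.
        System.out.println("Separator: " + fileSystem.getSeparator());
        for (Path root : fileSystem.getRootDirectories()) {
            System.out.println("Root directory: " + root);
        }
        System.out.println(kind("foo", "example.txt"));
        System.out.println(kind("..", "bar", "example.txt"));
    }
}
```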
Path
Java provides a versatile interface for identifying files and directories: the java.nio.file.Path interface. You can get a Path instance by asking the FileSystem for it using FileSystem.getPath(…). Rather than calling FileSystems.getDefault().getPath(…), you can use the Paths.get(…) method of the java.nio.file.Paths utility class.
Path methods, using directory /foo/bar/ as an example.
Path Method | Description | Paths.get("/foo/bar/") | Returns |
---|---|---|---|
Path.getRoot() | The root of the path. | .getRoot() | / |
Path.getFileName() | The name of the file or directory. | .getFileName() | bar |
Path.getNameCount() | The number of name elements in the path, not counting the root. | .getNameCount() | 2 |
Path.isAbsolute() | Whether the path is absolute. | .isAbsolute() | true |
Path.relativize(Path other) | Determines the relative path from this path to the other path. | .relativize(Paths.get("/foo/bar/some/example.txt")) | some/example.txt |
Path.resolve(Path other) | Combines this path with a relative path. | .resolve(Paths.get("some/example.txt")) | /foo/bar/some/example.txt |
Path.resolve(String other) | Combines this path with a relative path given as a string. | .resolve("some/example.txt") | /foo/bar/some/example.txt |
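The methods in the table can be tried out with a short program such as the sketch below. The class name PathDemo and the helpers entryIn() and locationWithin() are hypothetical; the values printed by main() assume a Unix-like file system, and a Windows file system will report a different root and separator.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class PathDemo {

    /** Returns the path of an entry inside a directory. */
    static Path entryIn(Path directory, String relativeName) {
        return directory.resolve(relativeName);
    }

    /** Returns the location of an entry relative to a directory. */
    static Path locationWithin(Path directory, Path entry) {
        return directory.relativize(entry);
    }

    public static void main(String[] args) {
        Path directory = Paths.get("/foo/bar/");
        System.out.println("root: " + directory.getRoot());
        System.out.println("file name: " + directory.getFileName());
        System.out.println("name count: " + directory.getNameCount());
        System.out.println("absolute: " + directory.isAbsolute());

        Path example = entryIn(directory, "some/example.txt");
        System.out.println("resolved: " + example);
        System.out.println("relativized: " + locationWithin(directory, example));
    }
}
```

Note that relativize() takes a Path, not a string, and that both paths must be of the same kind (both absolute or both relative).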
Files
For actually working with files on a disk, you can use the utilities in the java.nio.file.Files class. This class contains a wealth of methods, including methods for checking whether a file is readable or writable.
Files.createDirectories(Path dir, FileAttribute<?>... attrs)
- Creates a hierarchy of directories if they do not exist. No error is generated if one or more of the directories already exist.
Files.createDirectory(Path dir, FileAttribute<?>... attrs)
- Creates a single new directory. A FileAlreadyExistsException is thrown if the directory already exists.
Files.exists(Path path, LinkOption... options)
- Checks to see if a file exists at the path.
Files.isDirectory(Path path, LinkOption... options)
- Determines whether a path represents a directory.
Files.list(Path dir)
- Returns a Stream<Path> listing all the entries in a directory. Using stream filtering and processing operations, you can easily return a list of only files with certain filenames, for example. The stream returned by this method must be closed, or you will leak resources, which could eventually crash your application.
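As a sketch of how these methods fit together, the following helper makes sure a directory exists and lists the entries with a given filename suffix, closing the Stream<Path> with try-with-resources. The class and method names (FilesDemo, ensureDirectory, listBySuffix) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FilesDemo {

    /** Ensures that a directory, including any missing parents, exists. */
    static void ensureDirectory(Path directory) throws IOException {
        if (Files.exists(directory) && !Files.isDirectory(directory)) {
            throw new IOException(directory + " exists but is not a directory");
        }
        // No error is raised for directories that already exist.
        Files.createDirectories(directory);
    }

    /** Lists the entries of a directory whose filenames end with the given suffix. */
    static List<Path> listBySuffix(Path directory, String suffix) throws IOException {
        // The stream holds an open directory handle, so it must be closed.
        try (Stream<Path> entries = Files.list(directory)) {
            return entries
                    .filter(entry -> entry.getFileName().toString().endsWith(suffix))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }
}
```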
Byte Streams
The most fundamental approach to processing I/O in Java relies on specialized classes that allow programs to process information a byte at a time. An input stream allows a program to read a stream of bits from a data source as bytes. An output stream allows a program to write a stream of bits to a data destination, one or more bytes at a time.
Input Streams
The following input stream classes are all in the java.io
package.
InputStream
- Abstract class that forms the basis of all input streams.
BufferedInputStream
- Provides buffering of other input streams.
ByteArrayInputStream
- An input stream to an existing array of bytes.
FileInputStream
- An input stream to a file. This class uses the old java.io.File class and should only be used with legacy code.
FilterInputStream
- A simple input stream wrapper allowing subclasses to do more processing on data after reading.
DataInputStream
- Provides methods to read primitive Java types in a consistent way across platforms.
ObjectInputStream
- An input stream that allows deserialization of Java objects and their instance graphs.
Output Streams
The following output stream classes are all in the java.io
package.
OutputStream
- Abstract class that forms the basis of all output streams.
BufferedOutputStream
- Provides buffering of other output streams.
ByteArrayOutputStream
- An output stream to a dynamically managed internal array of bytes. The collected data can later be retrieved using ByteArrayOutputStream.toByteArray().
FileOutputStream
- An output stream to a file. This class uses the old java.io.File class and should only be used with legacy code.
FilterOutputStream
- A simple output stream wrapper allowing subclasses to do more processing on data before writing.
DataOutputStream
- Provides methods to write primitive Java types in a consistent way across platforms.
ObjectOutputStream
- An output stream that allows serialization of Java objects and their instance graphs.
PrintStream
- An output stream that helps write certain data using methods such as println(). This class does not correctly encode characters and strings across platforms; it should not be used unless you have no other option.
Reading Single Bytes
The abstract class java.io.InputStream forms the basis of all byte stream-based input. Its main method is InputStream.read(), which returns eight bits of information (a byte), but the byte is returned as an int! This is because the special int value -1 is used to indicate that no further bytes are available to be read (the end of the stream has been reached). If a byte value were returned, there would be no way to distinguish between a value -1 indicating the end of the stream and the byte value -1 (a byte is signed) representing 0b11111111. A data byte is therefore returned as an int in the range 0 to 255.
The following example shows how to read from an input stream consisting of an existing array of bytes using java.io.ByteArrayInputStream. The first half of the example merely creates a sequence of bytes to serve as the data to read.
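A minimal sketch of such an example follows, closing the stream in a finally clause. The class name ReadSingleBytes, the helper readValues(), and the sample byte values are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

class ReadSingleBytes {

    /** Reads the stream a byte at a time and returns each value that was read. */
    static List<Integer> readValues(InputStream inputStream) throws IOException {
        List<Integer> values = new ArrayList<>();
        int value;
        while ((value = inputStream.read()) != -1) { // -1 signals the end of the stream
            values.add(value); // a data byte is always in the range 0 to 255
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // First half: create a sequence of bytes to serve as the data to read.
        byte[] bytes = {2, 3, 4, (byte) 234, 0, 0, 0, (byte) 234};
        // Second half: read the bytes back, closing the stream in a finally clause.
        InputStream inputStream = new ByteArrayInputStream(bytes);
        try {
            System.out.println(readValues(inputStream));
        } finally {
            inputStream.close();
        }
    }
}
```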
Try-with-Resources
You already know how to use try … finally … to ensure that you close a Closeable resource in the finally {…} clause. Java offers a further enhancement of the try statement: if a class implements java.lang.AutoCloseable (and the Closeable interface extends AutoCloseable, so all input and output streams are candidates), it can be used in a try-with-resources statement. Simply declare and assign the AutoCloseable resource in parentheses after the try keyword. Java will automatically add, in the compiled code, the equivalent of a finally clause that calls close() on the resource, whether or not the try clause throws an exception. Here is how the above try … finally statement would be rewritten to use try-with-resources:
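A minimal sketch of a try-with-resources statement reading single bytes is shown below. The class TryWithResourcesDemo and the nested TrackingInputStream, which exists only to make the automatic close() call observable, are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class TryWithResourcesDemo {

    /** Hypothetical stream that remembers whether close() was called. */
    static final class TrackingInputStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingInputStream(byte[] bytes) {
            super(bytes);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Counts the bytes in the source, closing it when done. */
    static int countBytes(InputStream source) throws IOException {
        // The resource is declared and assigned in parentheses after try.
        try (InputStream inputStream = source) {
            int count = 0;
            while (inputStream.read() != -1) {
                count++;
            }
            return count;
        } // inputStream.close() is called here, even if an exception was thrown
    }
}
```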
Mark and Reset
There may be times you are reading from a stream and decide, "Oops, I wish I could unread some information, and go back to start reading at some earlier location." The InputStream class has a facility for placing a "marker" at a location to later go back to.
- At any time when reading from an input stream, you can call InputStream.mark(int readlimit) to request the input stream to mark the current location. The readlimit value indicates the maximum number of bytes you might read before wanting to go back to the mark.
- If you later call InputStream.reset() you will reset the stream to the marked location, and the next bytes read will be those directly after the marked location, even if you've already read those bytes earlier.
The mark/reset facility therefore provides a way for the input stream to somehow "remember" any bytes (up to the readlimit you provided) you read after the mark and somehow effectively "put them back" into the input stream to be read again. Not every input stream supports this facility; call InputStream.markSupported() to find out.
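The sketch below uses mark and reset to look ahead in a stream without consuming anything. The class name MarkResetDemo and the method peek() are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

class MarkResetDemo {

    /** Returns up to count upcoming byte values, leaving the stream position unchanged. */
    static int[] peek(InputStream inputStream, int count) throws IOException {
        if (!inputStream.markSupported()) {
            throw new IllegalArgumentException("stream does not support mark/reset");
        }
        inputStream.mark(count); // we promise to read at most count bytes before reset()
        int[] values = new int[count];
        int length = 0;
        int value;
        while (length < count && (value = inputStream.read()) != -1) {
            values[length++] = value;
        }
        inputStream.reset(); // the bytes just read will be read again
        return Arrays.copyOf(values, length);
    }
}
```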
Writing Single Bytes
The complement to InputStream is the java.io.OutputStream. An output stream allows writing of single bytes using OutputStream.write(int b). But moving data between streams a byte at a time is inefficient; there are much more efficient ways to move data between streams, as explained in the following sections.
Reading and Writing Multiple Bytes
Many times you will want to read and write larger sections of data by transferring them to and from a buffer, an area of memory designated for transferring the data. InputStream provides an InputStream.read(byte[] b) method that reads bytes into an existing byte array buffer. There always exists the possibility that, for whatever reason, fewer bytes than the buffer can hold (even 0!) might be read; this method therefore returns an int indicating the number of bytes read. If the method returns -1, it indicates that the end of the stream has been reached.
We can use such a buffer to copy between two streams. OutputStream provides a corresponding OutputStream.write(byte[] b), but this method assumes that the entire buffer is full and that all the bytes should be written. Because the read operation may not have filled the buffer, we must take care to write only the number of bytes that were read each time around. This can be done using OutputStream.write(byte[] b, int off, int len), which takes a starting offset (in this case 0) and a length, the number of bytes to write (in this case the number of bytes read).
In this example we copy everything from the input stream to a java.io.ByteArrayOutputStream, which collects all the bytes; we then retrieve them using the ByteArrayOutputStream.toByteArray() method and print them out.
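A minimal sketch of such a copy loop follows. The class name CopyDemo, the method copy(), its bufferSize parameter, and the sample bytes are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Arrays;

class CopyDemo {

    /** Copies all remaining bytes from input to output; returns the number copied. */
    static long copy(InputStream inputStream, OutputStream outputStream, int bufferSize)
            throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int count;
        while ((count = inputStream.read(buffer)) != -1) { // -1 means end of stream
            // Write only the bytes actually read, not the whole buffer.
            outputStream.write(buffer, 0, count);
            total += count;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = {2, 3, 4, (byte) 234, 0, 0, 0, (byte) 234};
        try (InputStream inputStream = new ByteArrayInputStream(bytes);
                ByteArrayOutputStream outputStream = new ByteArrayOutputStream()) {
            copy(inputStream, outputStream, 4);
            System.out.println(Arrays.toString(outputStream.toByteArray()));
        }
    }
}
```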
File Streams
You can get an input stream for reading from a file, or an output stream for writing to a file, by using the Files.newInputStream(Path path, OpenOption... options) or the Files.newOutputStream(Path path, OpenOption... options) method, respectively. Here's an example of printing out all the bytes in a /etc/foo/bar.txt file.
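A minimal sketch of reading and writing a file through these streams is shown below. The class name FileStreamDemo and the helpers writeBytes() and describeBytes() are hypothetical, and main() only succeeds if the example file actually exists.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.StringJoiner;

class FileStreamDemo {

    /** Writes the bytes to the file, creating it or replacing its content. */
    static void writeBytes(Path file, byte[] bytes) throws IOException {
        try (OutputStream outputStream = Files.newOutputStream(file,
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            outputStream.write(bytes);
        }
    }

    /** Reads every byte of the file and returns the values 0 to 255 separated by spaces. */
    static String describeBytes(Path file) throws IOException {
        StringJoiner description = new StringJoiner(" ");
        try (InputStream inputStream = Files.newInputStream(file)) {
            int value;
            while ((value = inputStream.read()) != -1) {
                description.add(Integer.toString(value));
            }
        }
        return description.toString();
    }

    public static void main(String[] args) throws IOException {
        // The path is the lesson's example; the file must exist for this to succeed.
        System.out.println(describeBytes(Paths.get("/etc/foo/bar.txt")));
    }
}
```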
Buffered Streams
Java provides java.io.BufferedInputStream and java.io.BufferedOutputStream for converting any input or output stream to a buffered version. These classes make working with relatively slow connections more efficient, because they will read or write blocks of data to an internal buffer in memory. You can still read and write the data a byte at a time, but you will be accessing an internal buffer, which is much quicker than reading or writing data a byte at a time with e.g. a hard drive. These classes will transfer the data in blocks to the ultimate destination when needed. Because these classes use the decorator pattern, you can simply wrap an existing stream on the fly. There is no need to close the underlying stream; closing the wrapper stream will close the decorated stream as well.
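Wrapping a file stream on the fly might look like the following sketch, in which single-byte reads and writes go through the in-memory buffer. The class name BufferedDemo and its methods are hypothetical.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

class BufferedDemo {

    /** Writes the same byte value to the file the given number of times. */
    static void writeRepeated(Path file, int value, int times) throws IOException {
        try (OutputStream outputStream =
                new BufferedOutputStream(Files.newOutputStream(file))) {
            for (int i = 0; i < times; i++) {
                outputStream.write(value); // goes to the buffer, not straight to disk
            }
        } // closing the wrapper flushes the buffer and closes the file stream
    }

    /** Counts the bytes in the file by reading them one at a time. */
    static long countBytes(Path file) throws IOException {
        try (InputStream inputStream =
                new BufferedInputStream(Files.newInputStream(file))) {
            long count = 0;
            while (inputStream.read() != -1) { // served from the buffer
                count++;
            }
            return count;
        }
    }
}
```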
Review
Summary
- Use java.nio.file.Path to identify files and directories.
- Use java.nio.file.Files to manipulate the actual file a Path refers to.
Gotchas
- If you are typing a string containing a Windows path, don't forget to escape the backslash character! C:\foo\bar in Java must be entered as "C:\\foo\\bar" for Java to correctly understand the string.
- The java.io.File.pathSeparator constant is used to separate more than one path in a list of paths; if you want the separator for separating directory parts in the same path, use FileSystem.getSeparator().
- The PrintStream methods for printing characters and strings won't correctly encode all characters.
- You must close the Stream<Path> returned from Files.list(…) (using try-with-resources if you like) or Java will leave files open on the file system and eventually be unable to perform further file operations.
- The InputStream.read(byte[] b) and related methods may not fill the given buffer, and in fact may not read any bytes at all, even if the end of the stream is not reached! Be sure to check the returned value to find out how many bytes were read, if any.
- If your method is passed an open stream, don't close it; this is the caller's responsibility.
- If your method wraps an OutputStream given by the caller in a BufferedOutputStream, you must flush the output before returning, as the caller will have no access to the buffered data.
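The last two points can be sketched together: the method below wraps a stream owned by the caller, flushes the wrapper before returning, and never closes it. The class name CallerOwnedStreamDemo and the method writeSequence() are hypothetical.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class CallerOwnedStreamDemo {

    /** Writes the values 1 to count to a stream that the caller opened and will close. */
    static void writeSequence(OutputStream callerStream, int count) throws IOException {
        // Not declared in try-with-resources: closing the wrapper would close callerStream too.
        BufferedOutputStream bufferedStream = new BufferedOutputStream(callerStream);
        for (int value = 1; value <= count; value++) {
            bufferedStream.write(value);
        }
        // Without this, bytes still held in the buffer would never reach callerStream.
        bufferedStream.flush();
    }
}
```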
In the Real World
- Most code you come across will use File.separator and File.separatorChar to indicate the file system's separator character for building paths. To get out of the habit of using the java.io.File class, you can use FileSystems.getDefault().getSeparator() instead. Try not to build paths manually anyway; use the appropriate Path factory or utility method.
- Don't use PrintStream if you can help it; there are better approaches for reading and writing strings, as you will learn in an upcoming lesson.
- If you need to support mark/reset but a given InputStream doesn't support mark/reset, find an InputStream that does, such as BufferedInputStream, and wrap the given InputStream with that.
- Copying data using buffers is much more efficient than copying data a byte at a time.
Self Evaluation
- If you read a buffer of bytes from an input stream, and then try to print out each byte individually with no conversion, why might some of them be printed as negative numbers? Does this matter?
- When would you want to use a BufferedInputStream or a BufferedOutputStream? When would you not want to use these classes?
Task
You are going to create a repository implementation that uses the file system as its data store, storing publications in individual files. You have not yet learned how to store the individual publications, but prepare for this eventuality by implementing the basic FilePublicationRepository class structure.
- The repository will be given a Path during its creation that indicates the directory in which it should store information.
- The repository will ensure that the directory exists and is indeed a directory for storing publications using the FilePublicationRepository.
  - The validation of the directory will be done in an initialize() method of the repository. It is not a good idea to do I/O operations in the constructor, as they may throw an IOException, which is messy to deal with at this point.
  - When initialize() is called, verify that the repository directory exists. If the directory does not exist, create it.
  - That the directory is appropriate for a file repository will be determined by the presence of a signature.dat file. The file will contain the following bytes: 2, 3, 4, 234, 0, 0, 0, 234. These bytes have no meaning other than to verify the purpose of the directory. You may wish to use the online Hex-works tool, indicated in the Resources section, for producing this file if you have no hex editor.
  - When initialize() is called, if the signature file does not exist in the repository directory, create it. If the signature file does exist, verify that it contains exactly the correct bytes; if not, throw an IOException.
  - As part of the implementation of signature checking, create a separate method that takes an InputStream and validates that it contains the signature bytes. You will be able to create a unit test for this method and pass in test sequences using a ByteArrayInputStream. Your main signature checking method based upon a file path will delegate to the InputStream version.
  - To provide consistency across implementations, you therefore need to put the initialize() method in the PublicationRepository interface, and make clear in its contract that initialize() must be called for each repository implementation after creation. You may want to give this method a default implementation that does nothing, to make it easier for implementations that do not need initialization. For completeness you should call initialize() for your SnapshotPublicationRepository as well, even though in this implementation this method does nothing.
  - In order to create a robust API, you will want to specify a contract and implementation that detects if PublicationRepository.initialize() was not called before a repository is used, and that throws an IllegalStateException if it is called more than once for a repository.
- Do not actually hook up your FilePublicationRepository to the Booker application yet; continue using the snapshot repository for now.
- In your main Booker application's constructor, create and store a path to a .booker directory in the user's home directory. This will be the main directory Booker uses to store its configuration and data.
- When the Booker application starts running, check to make sure the .booker directory exists; if not, create it.
See Also
- Basic I/O (Oracle - The Java™ Tutorials)
- The try-with-resources Statement (Oracle - The Java™ Tutorials)
References
- Java I/O, NIO, and NIO.2 (Oracle - Tech Notes: Guides)
- Legacy File I/O Code (Oracle - The Java™ Tutorials)
- The Java® Language Specification, Java SE 8 Edition: 14.20.3. try-with-resources (Oracle)
Resources
- Hex-works
- Online hex editor tool.
- HxD
- Free hex editor and disk editor. (Windows)
- Hex Editor Neo
- Free hex editor optimized for large files. (Windows)
Acknowledgments
- Some symbols are from Font Awesome by Dave Gandy.