Serialization
Goal
- Understand concepts, terms, and keywords of classic Java serialization.
- Be able to serialize and deserialize general instance graphs.
- Know how to use custom serialization techniques.
Concepts
- instance graph
- deserialize
- opaque
- serialization
- serialize
- singleton
Language
transient
Library
java.io.DataInput
java.io.DataInputStream
java.io.DataOutput
java.io.DataOutputStream
java.io.NotSerializableException
java.io.ObjectInputStream
java.io.ObjectInputStream.readObject()
java.io.ObjectOutputStream
java.io.ObjectOutputStream.defaultWriteObject()
java.io.ObjectOutputStream.writeObject(Object)
java.io.Serializable
com.globalmentor.io.Files.encodeCrossPlatformFilename(String filename)
Dependencies
Lesson
The term serialization in general refers to the process of converting an object to a series of bytes for storing so that you can reconstitute the object later. In Java there exist many ways to serialize objects, but the term most often refers to the system based on the java.io.Serializable
interface. This system uses the java.io.ObjectInputStream
and java.io.ObjectOutputStream
classes, and comes with a lot of related rules on how serialization occurs.
Serialization Streams
ObjectInputStream
and ObjectOutputStream
, which you briefly saw in the first lesson on I/O, are the main vehicles used to serialize and deserialize an object instance graph.
Using these streams is conceptually straightforward: call ObjectOutputStream.writeObject(Object) to serialize an object, and ObjectInputStream.readObject() to deserialize it.
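As a minimal sketch of such a round trip, the following writes an object graph to an in-memory byte array and reads it back. The class name RoundTripDemo and its helper methods are hypothetical, and a byte array stands in for a file only to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RoundTripDemo {

  /** Serializes the given object graph to a byte array. */
  static byte[] serialize(Object object) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(object);
    }
    return bytes.toByteArray();
  }

  /** Reconstitutes an object graph from bytes produced by serialize(). */
  static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    List<String> names = new ArrayList<>(Arrays.asList("Duck", "Unicorn"));
    Object copy = deserialize(serialize(names));
    System.out.println(copy.equals(names)); // true: same contents
    System.out.println(copy != names); // true: a distinct instance
  }
}
```

Note that readObject() returns Object, so callers normally cast the result to the type they expect.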
Serializable
If reading and writing objects using object streams were all there was to serialization, things would be easy indeed. But there is much more to serialization. For starters, only classes that implement java.io.Serializable
can be serialized. This applies not only to the instance you are serializing, but to all the instances in the graph. Otherwise, a java.io.NotSerializableException
will be thrown.
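The following sketch illustrates the rule about the whole graph. Car and Engine are hypothetical classes: Car implements Serializable, but it refers to an Engine that does not, so writing a Car fails.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class NotSerializableDemo {

  /** Hypothetical class that does not implement Serializable. */
  static class Engine {
  }

  /** Serializable itself, but its graph reaches a non-serializable Engine. */
  static class Car implements Serializable {
    private static final long serialVersionUID = 1L;
    private final Engine engine = new Engine();
  }

  /** Returns true if writing the object fails with NotSerializableException. */
  static boolean failsToSerialize(Object object) throws IOException {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(object);
      return false;
    } catch (NotSerializableException e) {
      return true;
    }
  }

  public static void main(String[] args) throws IOException {
    System.out.println(failsToSerialize(new Car())); // true
    System.out.println(failsToSerialize("plain string")); // false
  }
}
```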
serialVersionUID
Classes can change over the course of development, and even after you've released a version of your product. If you serialize one version of a class and try to deserialize it as another, the serialized data may not be compatible. To prevent reading incompatible data, Java generates a serialVersionUID
static variable for each serializable class. When deserializing an object, the JVM compares the stored serialVersionUID
to the version of the class to be instantiated. If they don't match, Java will throw a java.io.InvalidClassException.
The problem is that almost any change in the class (even method signatures, for example) will cause Java to change the generated serialVersionUID
when the class is compiled. This could mean that you suddenly can't load data you saved earlier just because you tweaked the class. To prevent this Java allows you to maintain the serialVersionUID
yourself. Just declare it as static final long
and give it any value you want.
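A minimal sketch of such a declaration follows. The Publication class here is a hypothetical stand-in, and the value 1L is arbitrary. ObjectStreamClass.lookup() is used only to show which version the serialization mechanism would store for the class.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionDemo {

  /** Hypothetical class that pins its own serialization version. */
  static class Publication implements Serializable {
    // The field must have exactly this name and type to be recognized.
    private static final long serialVersionUID = 1L;
    private String title;
  }

  /** Returns the version the serialization mechanism associates with a serializable class. */
  static long versionOf(Class<?> type) {
    return ObjectStreamClass.lookup(type).getSerialVersionUID();
  }

  public static void main(String[] args) {
    System.out.println(versionOf(Publication.class)); // 1
  }
}
```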
transient
By default Java will store all the non-static fields of a serializable class. There may be some variables that you don't want serialized; you can mark those with transient
, and they will be ignored and not stored in the output stream.
For example consider a Person
class that has givenName
and familyName
fields. It may contain a read-made constant named fullName
that keeps the precomposed full name around in case it is needed. There is no reason to serialize this data—it duplicates information in the other variables, and could be recalculated after deserialization—so we can mark it as transient
.
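The effect can be sketched as follows, assuming a hypothetical Person class with the fields described above. Because fullName is transient it is not written, and because constructors of serializable classes are not run during deserialization, it comes back as null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class TransientDemo {

  /** Hypothetical Person whose precomposed full name is not serialized. */
  static class Person implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String givenName;
    private final String familyName;
    private transient String fullName; // skipped by default serialization

    Person(String givenName, String familyName) {
      this.givenName = givenName;
      this.familyName = familyName;
      this.fullName = givenName + " " + familyName;
    }

    String getGivenName() {
      return givenName;
    }

    String getFullName() {
      return fullName;
    }
  }

  /** Serializes a person to memory and reads a copy back. */
  static Person roundTrip(Person person) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(person);
    }
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (Person) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    Person copy = roundTrip(new Person("Jane", "Doe"));
    System.out.println(copy.getGivenName()); // Jane
    System.out.println(copy.getFullName()); // null: the transient field was not restored
  }
}
```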
Custom Serialization
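If you want to take complete control over how an object is serialized or deserialized, you can implement one or both of the following private methods in your class: writeObject(ObjectOutputStream), which the serialization mechanism calls to write the object, and readObject(ObjectInputStream), which it calls to read the object back.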
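A minimal sketch of a class that writes its own state might look like the following. The Point class is hypothetical. Its fields are transient so that the default mechanism stores nothing for them, and the calls to defaultWriteObject() and defaultReadObject() are included because the Java Object Serialization Specification asks for them before any custom data is written or read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class CustomSerializationDemo {

  /** Hypothetical class that controls exactly which values are written. */
  static class Point implements Serializable {
    private static final long serialVersionUID = 1L;
    private transient int x;
    private transient int y;

    Point(int x, int y) {
      this.x = x;
      this.y = y;
    }

    int getX() {
      return x;
    }

    int getY() {
      return y;
    }

    /** Called by the serialization mechanism instead of the default field writing. */
    private void writeObject(ObjectOutputStream out) throws IOException {
      out.defaultWriteObject(); // writes no field data here, as all fields are transient
      out.writeInt(x);
      out.writeInt(y);
    }

    /** Must read values in the same order that writeObject() wrote them. */
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
      in.defaultReadObject();
      x = in.readInt();
      y = in.readInt();
    }
  }

  /** Serializes a point to memory and reads a copy back. */
  static Point roundTrip(Point point) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(point);
    }
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (Point) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    Point copy = roundTrip(new Point(3, 4));
    System.out.println(copy.getX() + "," + copy.getY()); // 3,4
  }
}
```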
If you don't intend to completely replace the bytes used in serialization, the special methods ObjectOutputStream.defaultWriteObject()
and ObjectInputStream.defaultReadObject()
may be used to write or read the default version of the object (the bytes that serialization would have written or read by default). Here's how you would make sure the Person.fullName
variable gets updated upon deserialization if you have marked it as transient
:
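What follows is a minimal sketch of that technique, not a definitive implementation. The Person class and the RecomputeDemo wrapper are hypothetical. The readObject() method first lets the default mechanism restore givenName and familyName, then recomputes the transient fullName.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class RecomputeDemo {

  /** Hypothetical Person that rebuilds its transient full name when read. */
  static class Person implements Serializable {
    private static final long serialVersionUID = 1L;
    private final String givenName;
    private final String familyName;
    private transient String fullName;

    Person(String givenName, String familyName) {
      this.givenName = givenName;
      this.familyName = familyName;
      this.fullName = givenName + " " + familyName;
    }

    String getFullName() {
      return fullName;
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
      in.defaultReadObject(); // restores the non-transient fields
      fullName = givenName + " " + familyName; // reconstitute the transient value
    }
  }

  /** Serializes a person to memory and reads a copy back. */
  static Person roundTrip(Person person) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(person);
    }
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return (Person) in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(roundTrip(new Person("Jane", "Doe")).getFullName()); // Jane Doe
  }
}
```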
Deserializing Alternate Objects
Java recognizes two other magic methods
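that allow you to completely replace the object being read or written with one of your choosing: writeReplace(), which substitutes another object for the one being written, and readResolve(), which substitutes another object for the one just read.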
One of the most common uses of readResolve()
is to accomplish deserialization of a singleton, a type for which you only want at most one instance. The default serialization mechanism would create a different instance of each object as it is deserialized, but with readResolve()
you can take over the process at the last minute and return the singleton instance instead.
If we were to implement a Farm
, we could make the Animal
interface serializable. We could then write and read as many instances of, for example, a Duck
as there exist ducks on our farm. But the Unicorn
is a special, magical beast; only one unicorn exists, and it is the same unicorn that appears on our farm and in fact on all the farms on the JVM. To automatically create a singleton Unicorn instance, we create a static final INSTANCE
constant. Whenever a Unicorn is read, instead of letting the serialization mechanism create a new instance we instead return the singleton instance inside readResolve()
.
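The unicorn described above can be sketched like this. The Animal, Duck, and Unicorn types follow the names in the text, but their bodies and the SingletonDemo wrapper are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SingletonDemo {

  interface Animal extends Serializable {
  }

  /** An ordinary animal: every deserialization yields a new instance. */
  static class Duck implements Animal {
    private static final long serialVersionUID = 1L;
  }

  /** The one and only unicorn in the JVM. */
  static final class Unicorn implements Animal {
    private static final long serialVersionUID = 1L;

    static final Unicorn INSTANCE = new Unicorn();

    private Unicorn() {
    }

    /** Discards the freshly deserialized copy in favor of the singleton. */
    private Object readResolve() {
      return INSTANCE;
    }
  }

  /** Serializes an object to memory and reads a copy back. */
  static Object roundTrip(Object object) throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(object);
    }
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(roundTrip(Unicorn.INSTANCE) == Unicorn.INSTANCE); // true
    Duck duck = new Duck();
    System.out.println(roundTrip(duck) == duck); // false
  }
}
```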
Review
Gotchas
- Many of the rules for using Serializable are not checked by the compiler. The related methods are not even part of the Serializable interface, and can only be found in the API documentation. Much of Java serialization relies on this sort of "built-in" behavior of the JVM, not on actual language and API constructs.
- If you decide to manage the serialVersionUID yourself, don't forget to update the value when the class changes in an incompatible way.
- If you make a member transient, don't forget to reconstitute the value after deserialization.
In the Real World
- Classic Java serialization has many shortcomings and gotchas. Feel free to use it for value objects that might need to be saved; but for larger, more complex objects consider alternate storage frameworks.
Think About It
- Do you really need Optional<T> as a field? Is the containing object Serializable, or do you plan on it being used with a Serializable object? Optional<T> does not implement Serializable, so you will have to put extra work into custom serialization; make the choice carefully.
Self Evaluation
- What is the role of a class constructor during deserialization?
- What is the transient keyword used for?
- What is a common use case of the readResolve() method?
Task
Improve your FilePublicationRepository
implementation so that it actually saves and loads all the publications. There are several ways to approach this. We aren't so concerned about performance here, so we could forego any caching and deal directly with the file system. But we have several lookup methods that search for attributes besides the publication title, such as lookup by type. We therefore choose to load all the publications at the beginning and cache them.
- Save each publication in a file named publication-name.pub.dat, where publication-name is the name of the publication.
  - Some characters that appear in publication names may not be valid filename characters. Create a way to "clean up" the publication names, but make sure the process is symmetric in that it allows you to go backwards and determine the original publication name. For example, a book named "Either / Or" would need the slash / character in its title encoded somehow. You may wish to use GlobalMentor's globalmentor-core library, listed in Dependencies above, which comes with the com.globalmentor.io.Files.encodeCrossPlatformFilename(String filename) method, which performs this functionality for you.
- In the initialize() method, load all the publications by deserializing them from their respective files.
  - To know which publications there are to load, you will need to list the directory contents, looking at only the *.pub.dat files.
- When a caller requests to retrieve a publication, you can search through those cached in memory (as is currently implemented).
- Add a method to PublicationRepository that allows a caller to add a publication, if you don't have one already.
  - Review the lesson on the repository pattern if needed.
  - Specify in the contract that if a publication with the same name is added, it will replace the existing one.
- If a caller wants to store a new publication (or update an old one), you can update those in memory, but you will also need to update the version on disk by serializing it.
- Create unit tests of the individual methods that store and retrieve files. You should design these methods in a way that they can read and write given merely a file path without assumptions about the repository.
- In your main Booker application class, instead of creating and storing a SnapshotPublicationRepository, switch to creating and using a FilePublicationRepository, storing files in a .booker/publication-files directory in the user's home directory.
  - If you have been programming to interfaces, only a single line of your code should change, involving the creation of the specific PublicationRepository implementation.
  - Pass the actual .booker/publication-files path to the FilePublicationRepository during construction.
  - Don't hard-code the actual path; instead, resolve publication-files against Booker's configuration directory (which should default to .booker).
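As a starting point for the file handling in the steps above, here is a minimal sketch of path-based helper methods, not a complete repository. The class and method names are hypothetical, a plain String stands in for a publication, and publication names are assumed to be already safe as filenames. The glob passed to Files.newDirectoryStream() restricts the listing to *.pub.dat files.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class PublicationFilesDemo {

  /** Serializes an object to the given file, replacing any existing content. */
  static void save(Path file, Serializable object) throws IOException {
    try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
      out.writeObject(object);
    }
  }

  /** Deserializes the object stored in the given file. */
  static Object load(Path file) throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
      return in.readObject();
    }
  }

  /** Loads every object stored in a *.pub.dat file directly inside the directory. */
  static List<Object> loadAll(Path directory) throws IOException, ClassNotFoundException {
    List<Object> loaded = new ArrayList<>();
    try (DirectoryStream<Path> files = Files.newDirectoryStream(directory, "*.pub.dat")) {
      for (Path file : files) {
        loaded.add(load(file));
      }
    }
    return loaded;
  }

  public static void main(String[] args) throws Exception {
    Path directory = Files.createTempDirectory("publication-files");
    save(directory.resolve("Walden.pub.dat"), "Walden"); // a String stands in for a publication
    Files.write(directory.resolve("notes.txt"), new byte[0]); // not matched by the glob
    System.out.println(loadAll(directory)); // [Walden]
  }
}
```

Because the helpers take only a path, they can be unit tested against a temporary directory without constructing a repository.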
Add a new command load-snapshot
to the command-line interface of Booker, which will copy the snapshot list of publications into the current repository. This is easily accomplished by making a utility method that iterates over the publications in one PublicationRepository
and adds them to another. This would be a good place to show off your mastery of streams and lambda expressions.
booker list [--name <name>] [--type (book|periodical)]
booker load-snapshot
booker -h | --help
Option | Alias | Description |
---|---|---|
list | | Lists all available publications. |
load-snapshot | | Loads the snapshot list of publications into the current repository. |
--help | -h | Prints out a help summary of available switches. |
--name | -n | Indicates a filter by name for the list command. |
--type | -t | Indicates the type of publication to list, either book or periodical. If not present, all publications will be listed. |
See Also
- Discover the secrets of the Java Serialization API (Todd Greanier, Oracle Technology Network Articles)
- 5 things you didn't know about ... Java Object Serialization (IBM developerWorks® Technical library)
- Serialization and magic methods (Olivier Croisier, The Coders Breakfast)
References
- Java Object Serialization Specification (Java Platform Standard Edition 8 Documentation, Oracle)
- The Java® Language Specification, Java SE 8 Edition: 8.3.1.3. transient Fields (Oracle)
- Java Examples in a Nutshell, 3rd Edition: Chapter 10. Object Serialization (O'Reilly)
Acknowledgments
- Some symbols are from Font Awesome by Dave Gandy.