Monday, May 23, 2011

Externalizable and Serializable in Java

One obvious difference that Serializable is a marker interface and doesn't contain any methods whereas Externalizable interface contains two methods: writeExternal(ObjectOutput) and readExternal(ObjectInput). But, the main difference between the two is that Externalizable interface provides complete control to the class implementing the interface over the object serialization process whereas Serializable interface normally uses default implementation to handle the object serialization process.

While implementing Serializable, you are not forced to define any method as it's a marker interface. However, you can use the writeObject or readObject methods to handle the serilaization process of complex objects. But, while implementing Externalizable interface, you are bound to define the two methods: writeExternal and readExternal and all the object serialization process is solely handled by these two methods only.

In case of Serializable interface implementation, state of Superclasses are automatically taken care by the default implementation whereas in case of Externalizable interface the implementing class needs to handle everything on its own as there is no default implementation in this case.

Example Scenario: when to use what?

If everything is automatically taken care by implementing the Serializable interface, why would anyone like to implement the Externalizable interface and bother to define the two methods? Simply to have the complete control on the process. OKay... let's take a sample example to understand this. Suppose we have an object having hundreds of fields (non-transient) and we want only few fields to be stored on the persistent storage and not all. One solution would be to declare all other fields (except those which we want to serialize) as transient and the default Serialization process will automatically take care of that. But, what if those few fields are not fixed at design tiime instead they are conditionally decided at runtime. In such a situation, implementing Externalizable interface will probably be a better solution. Similarly, there may be scenarios where we simply don't want to maintain the state of the Superclasses (which are automatically maintained by the Serializable interface implementation).

Which has better performance - Externalizable or Serializale?

In most of the cases (or in all if implemented correctly), Externalizable would be more efficient than Serializable for the simple reason that in case of Externalizable the entire process of marshalling, un-marshalling, writing to the stream, and reading back from stream, etc. is under your control i.e., you got to write the code and you can of course choose the best way depending upon the situaton you are in. In case of Serializable, this all (or at least most of it) is done implicitly and the internal implementation being generic to support any possible case, can ofcourse not be the most efficient. The other reason for Serializable to be less efficient is that in this case several reflective calls are made internally to get the metadata of the class. Of course, you would not need any such call is needed in case Externalizable.

However, the efficiency comes at a price. You lose flexibility because as soon as your class definition changes, you would probably need to modify your Externaliable implementation as well. Additionally, since you got to write more code in case Externalizable, you increase the chances of adding more bugs in your application.

Another disadvantage of Externalizable is that you got to have the class to interpret the stream as the stream format is an opaque binary data. Normal Serialization adds field names and types (this why reflective calls are needed here) into the stream, so it's possible to re-construct the object even without the availability of the object's class. But, you need to write the object reconstruction code yourself as Java Serialization doesn't provide any such API at the moment. The point is that in case of Serialzable you can at least write your code as the stream is enriched with field names and types whereas in case Externalizable the stream contains just the data and hence you can't unless you use the class definition. As you can see Serializable not only makes many reflective calls, but also puts the name/type info into the stream and this would of course take some time making Serialzable slower than the corresponding Externalizable process where you got to stuff only the data into the stream.

No comments: