Monday, May 23, 2011

differences between SAX and DOM parsers

SAX Parser:

· A SAX (Simple API for XML) parser does not create any internal structure. Instead, it takes the occurrences of components of an input document as events, and tells the client what it reads as it reads through the input document.

· A SAX parser serves the client application always only with pieces of the document at any given time.

· A SAX parser, however, is much more space efficient in case of a big input document (because it creates no internal structure). What’s more, it runs faster and is easier to learn than DOM parser because its API is really simple. But from the functionality point of view, it provides a fewer functions, which means that the users themselves have to take care of more, such as creating their own data structures.

DOM Parser:

· A DOM (Document Object Model) parser creates a tree structure in memory from an input document and then waits for requests from client.

· A DOM parser always serves the client application with the entire document no matter how much is actually needed by the client.

· A DOM parser is rich in functionality. It creates a DOM tree in memory and allows you to access any part of the document repeatedly and allows you to modify the DOM tree. But it is space inefficient when the document is huge, and it takes a little bit longer to learn how to work with it.

Externalizable and Serializable in Java

One obvious difference that Serializable is a marker interface and doesn't contain any methods whereas Externalizable interface contains two methods: writeExternal(ObjectOutput) and readExternal(ObjectInput). But, the main difference between the two is that Externalizable interface provides complete control to the class implementing the interface over the object serialization process whereas Serializable interface normally uses default implementation to handle the object serialization process.

While implementing Serializable, you are not forced to define any method as it's a marker interface. However, you can use the writeObject or readObject methods to handle the serilaization process of complex objects. But, while implementing Externalizable interface, you are bound to define the two methods: writeExternal and readExternal and all the object serialization process is solely handled by these two methods only.

In case of Serializable interface implementation, state of Superclasses are automatically taken care by the default implementation whereas in case of Externalizable interface the implementing class needs to handle everything on its own as there is no default implementation in this case.

Example Scenario: when to use what?

If everything is automatically taken care by implementing the Serializable interface, why would anyone like to implement the Externalizable interface and bother to define the two methods? Simply to have the complete control on the process. OKay... let's take a sample example to understand this. Suppose we have an object having hundreds of fields (non-transient) and we want only few fields to be stored on the persistent storage and not all. One solution would be to declare all other fields (except those which we want to serialize) as transient and the default Serialization process will automatically take care of that. But, what if those few fields are not fixed at design tiime instead they are conditionally decided at runtime. In such a situation, implementing Externalizable interface will probably be a better solution. Similarly, there may be scenarios where we simply don't want to maintain the state of the Superclasses (which are automatically maintained by the Serializable interface implementation).

Which has better performance - Externalizable or Serializale?

In most of the cases (or in all if implemented correctly), Externalizable would be more efficient than Serializable for the simple reason that in case of Externalizable the entire process of marshalling, un-marshalling, writing to the stream, and reading back from stream, etc. is under your control i.e., you got to write the code and you can of course choose the best way depending upon the situaton you are in. In case of Serializable, this all (or at least most of it) is done implicitly and the internal implementation being generic to support any possible case, can ofcourse not be the most efficient. The other reason for Serializable to be less efficient is that in this case several reflective calls are made internally to get the metadata of the class. Of course, you would not need any such call is needed in case Externalizable.

However, the efficiency comes at a price. You lose flexibility because as soon as your class definition changes, you would probably need to modify your Externaliable implementation as well. Additionally, since you got to write more code in case Externalizable, you increase the chances of adding more bugs in your application.

Another disadvantage of Externalizable is that you got to have the class to interpret the stream as the stream format is an opaque binary data. Normal Serialization adds field names and types (this why reflective calls are needed here) into the stream, so it's possible to re-construct the object even without the availability of the object's class. But, you need to write the object reconstruction code yourself as Java Serialization doesn't provide any such API at the moment. The point is that in case of Serialzable you can at least write your code as the stream is enriched with field names and types whereas in case Externalizable the stream contains just the data and hence you can't unless you use the class definition. As you can see Serializable not only makes many reflective calls, but also puts the name/type info into the stream and this would of course take some time making Serialzable slower than the corresponding Externalizable process where you got to stuff only the data into the stream.

Serialization

What is Serialization?

- Serializable is a marker interface. When an object has to be transferred over a network ( typically through rmi or EJB) or persist the state of an object to a file, the object Class needs to implement Serializable interface. Implementing this interface will allow the object converted into bytestream and transfer over a network.

What is use of serialVersionUID?

- During object serialization, the default Java serialization mechanism writes the metadata about the object, which includes the class name, field names and types, and superclass. This class definition is stored as a part of the serialized object. This stored metadata enables the deserialization process to reconstitute the objects and map the stream data into the class attributes with the appropriate type
Everytime an object is serialized the java serialization mechanism automatically computes a hash value. ObjectStreamClass's computeSerialVersionUID() method passes the class name, sorted member names, modifiers, and interfaces to the secure hash algorithm (SHA), which returns a hash value.The serialVersionUID is also called suid.
So when the serilaize object is retrieved , the JVM first evaluates the suid of the serialized class and compares the suid value with the one of the object. If the suid values match then the object is said to be compatible with the class and hence it is de-serialized. If not InvalidClassException exception is thrown.

Changes to a serializable class can be compatible or incompatible. Following is the list of changes which are compatible:

Add fields
Change a field from static to non-static
Change a field from transient to non-transient
Add classes to the object tree
List of incompatible changes:

Delete fields
Change class hierarchy
Change non-static to static
Change non-transient to transient
Change type of a primitive field
So, if no suid is present , inspite of making compatible changes, jvm generates new suid thus resulting in an exception if prior release version object is used .
The only way to get rid of the exception is to recompile and deploy the application again.

If we explicitly metion the suid using the statement:

private final static long serialVersionUID =

then if any of the metioned compatible changes are made the class need not to be recompiled. But for incompatible changes there is no other way than to compile again.

What is the need of Serialization?

- The serialization is used :-

To send state of one or more object’s state over the network through a socket.
To save the state of an object in a file.
An object’s state needs to be manipulated as a stream of bytes.


Other than Serialization what are the different approach to make object Serializable?

- Besides the Serializable interface, at least three alternate approaches can serialize Java objects:

1)For object serialization, instead of implementing the Serializable interface, a developer can implement the Externalizable interface, which extends Serializable. By implementing Externalizable, a developer is responsible for implementing the writeExternal() and readExternal() methods. As a result, a developer has sole control over reading and writing the serialized objects.
2)XML serialization is an often-used approach for data interchange. This approach lags runtime performance when compared with Java serialization, both in terms of the size of the object and the processing time. With a speedier XML parser, the performance gap with respect to the processing time narrows. Nonetheless, XML serialization provides a more malleable solution when faced with changes in the serializable object.
3)Finally, consider a "roll-your-own" serialization approach. You can write an object's content directly via either the ObjectOutputStream or the DataOutputStream. While this approach is more involved in its initial implementation, it offers the greatest flexibility and extensibility. In addition, this approach provides a performance advantage over Java serialization.

Do we need to implement any method of Serializable interface to make an object serializable?

- No. Serializable is a Marker Interface. It does not have any methods.

What happens if the object to be serialized includes the references to other serializable objects?

- If the object to be serialized includes the references to other objects whose class implements serializable then all those object’s state also will be saved as the part of the serialized state of the object in question. The whole object graph of the object to be serialized will be saved during serialization automatically provided all the objects included in the object’s graph are serializable.

What happens if an object is serializable but it includes a reference to a non-serializable object?

- If you try to serialize an object of a class which implements serializable, but the object includes a reference to an non-serializable class then a ‘NotSerializableException’ will be thrown at runtime.


e.g.
public class NonSerial {
//This is a non-serializable class
}

public class MyClass implements Serializable{
private static final long serialVersionUID = 1L;
private NonSerial nonSerial;
MyClass(NonSerial nonSerial){
this.nonSerial = nonSerial;
}
public static void main(String [] args) {
NonSerial nonSer = new NonSerial();
MyClass c = new MyClass(nonSer);
try {
FileOutputStream fs = new FileOutputStream("test1.ser");
ObjectOutputStream os = new ObjectOutputStream(fs);
os.writeObject(c);
os.close();
} catch (Exception e) { e.printStackTrace(); }
try {
FileInputStream fis = new FileInputStream("test1.ser");
ObjectInputStream ois = new ObjectInputStream(fis);
c = (MyClass) ois.readObject();
ois.close();
}
tch (Exception e) {
e.printStackTrace();
}

}
}

On execution of above code following exception will be thrown –
java.io.NotSerializableException: NonSerial
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java)

Are the static variables saved as the part of serialization?

- No. The static variables belong to the class and not to an object they are not the part of the state of the object so they are not saved as the part of serialized object.

What is a transient variable?

- These variables are not included in the process of serialization and are not the part of the object’s serialized state.

What will be the value of transient variable after de-serialization?

- It’s default value.
e.g. if the transient variable in question is an int, it’s value after deserialization will be zero.

public class TestTransientVal implements Serializable{

private static final long serialVersionUID = -22L;
private String name;
transient private int age;
TestTransientVal(int age, String name) {
this.age = age;
this.name = name;
}

public static void main(String [] args) {
TestTransientVal c = new TestTransientVal(1,"ONE");
System.out.println("Before serialization: - " + c.name + " "+ c.age);
try {
FileOutputStream fs = new FileOutputStream("testTransients.ser");
ObjectOutputStream os = new ObjectOutputStream(fs);
os.writeObject(c);
os.close();
} catch (Exception e) { e.printStackTrace(); }


try {
FileInputStream fis = new FileInputStream("testTransients.ser");
ObjectInputStream ois = new ObjectInputStream(fis);
c = (TestTransientVal) ois.readObject();
ois.close();
} catch (Exception e) { e.printStackTrace(); }
System.out.println("After de-serialization:- " + c.name + " "+ c.age);
}

}


Result of executing above piece of code –
Before serialization: - Value of non-transient variable ONE Value of transient variable 1
After de-serialization:- Value of non-transient variable ONE Value of transient variable 0

Explanation –
The transient variable is not saved as the part of the state of the serailized variable, it’s value after de-serialization is it’s default value.

Does the order in which the value of the transient variables and the state of the object using the defaultWriteObject() method are saved during serialization matter?

- Yes. As while restoring the object’s state the transient variables and the serializable variables that are stored must be restored in the same order in which they were saved.

How can one customize the Serialization process? or What is the purpose of implementing the writeObject() and readObject() method?

- When you want to store the transient variables state as a part of the serialized object at the time of serialization the class must implement the following methods –

private void wrtiteObject(ObjectOutputStream outStream)
{
//code to save the transient variables state as a part of serialized object
}
private void readObject(ObjectInputStream inStream)
{
//code to read the transient variables state and assign it to the de-serialized object
}

e.g.

public class TestCustomizedSerialization implements Serializable{

private static final long serialVersionUID =-22L;
private String noOfSerVar;
transient private int noOfTranVar;

TestCustomizedSerialization(int noOfTranVar, String noOfSerVar) {
this.noOfTranVar = noOfTranVar;
this.noOfSerVar = noOfSerVar;
}

private void writeObject(ObjectOutputStream os) {

try {
os.defaultWriteObject();
os.writeInt(noOfTranVar);
} catch (Exception e) { e.printStackTrace(); }
}

private void readObject(ObjectInputStream is) {
try {
is.defaultReadObject();
int noOfTransients = (is.readInt());
} catch (Exception e) {
e.printStackTrace(); }
}

public int getNoOfTranVar() {
return noOfTranVar;
}

}

The value of transient variable ‘noOfTranVar’ is saved as part of the serialized object manually by implementing writeObject() and restored by implementing readObject().
The normal serializable variables are saved and restored by calling defaultWriteObject() and defaultReadObject()respectively. These methods perform the normal serialization and de-sirialization process for the object to be saved or restored respectively.

If a class is serializable but its superclass in not , what will be the state of the instance variables inherited from super class after deserialization?

- The values of the instance variables inherited from superclass will be reset to the values they were given during the original construction of the object as the non-serializable super-class constructor will run.

E.g.
public class ParentNonSerializable {
int noOfWheels;

ParentNonSerializable(){
this.noOfWheels = 4;
}

}


public class ChildSerializable extends ParentNonSerializable implements Serializable {

private static final long serialVersionUID = 1L;
String color;


ChildSerializable() {
this.noOfWheels = 8;
this.color = "blue";
}
}


public class SubSerialSuperNotSerial {

public static void main(String [] args) {

ChildSerializable c = new ChildSerializable();
System.out.println("Before : - " + c.noOfWheels + " "+ c.color);
try {
FileOutputStream fs = new FileOutputStream("superNotSerail.ser");
ObjectOutputStream os = new ObjectOutputStream(fs);
os.writeObject(c);
os.close();
} catch (Exception e) { e.printStackTrace(); }

try {
FileInputStream fis = new FileInputStream("superNotSerail.ser");
ObjectInputStream ois = new ObjectInputStream(fis);
c = (ChildSerializable) ois.readObject();
ois.close();
} catch (Exception e) { e.printStackTrace(); }
System.out.println("After :- " + c.noOfWheels + " "+ c.color);
}

}

Result on executing above code –
Before : - 8 blue
After :- 4 blue

The instance variable ‘noOfWheels’ is inherited from superclass which is not serializable. Therefore while restoring it the non-serializable superclass constructor runs and its value is set to 8 and is not same as the value saved during serialization which is 4.

Immutable Class

What is an immutable class?

- Immutable class is a class which once created, it’s contents can not be changed. Immutable objects are the objects whose state can not be changed once constructed. e.g. String class

How to create an immutable class?

- To create an immutable class following steps should be followed:

Create a final class.
Set the values of properties using constructor only.
Make the properties of the class final and private
Do not provide any setters for these properties.
If the instance fields include references to mutable objects, don't allow those objects to be changed:
Don't provide methods that modify the mutable objects.
Don't share references to the mutable objects. Never store references to external, mutable objects passed to the constructor; if necessary, create copies, and store references to the copies. Similarly, create copies of your internal mutable objects when necessary to avoid returning the originals in your methods.

E.g.
public final class FinalPersonClass {

private final String name;
private final int age;

public FinalPersonClass(final String name, final int age) {
super();
this.name = name;
this.age = age;
}
public int getAge() {
return age;
}
public String getName() {
return name;
}

}

Immutable objects are automatically thread-safe –true/false?

- True. Since the state of the immutable objects can not be changed once they are created they are automatically synchronized/thread-safe.

Which classes in java are immutable?

- All wrapper classes in java.lang are immutable –
String, Integer, Boolean, Character, Byte, Short, Long, Float, Double, BigDecimal, BigInteger

What are the advantages of immutability?

- The advantages are:
1) Immutable objects are automatically thread-safe, the overhead caused due to use of synchronisation is avoided.
2) Once created the state of the immutable object can not be changed so there is no possibility of them getting into an inconsistent state.
3) The references to the immutable objects can be easily shared or cached without having to copy or clone them as there state can not be changed ever after construction.
4) The best use of the immutable objects is as the keys of a map.