Saturday, August 10, 2024

 Performance tuning in Cassandra


Cassandra Performance Tuning:

Read minimization:

+++++++++++++++++++

Of course, we cannot afford a full SSTable scan on every read. Cassandra implements several mechanisms by default to make reads cheaper.

+ The first one is about Bloom filters.

A Bloom filter is a probabilistic data structure that checks whether a key may exist. It can return false positives but never false negatives, so Cassandra can almost always avoid reading the disk for keys that do not exist.
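To make the idea concrete, here is a minimal Bloom filter sketch in Java. The class name, the bit-array size, the number of hashes and the hash functions are all assumptions chosen for illustration; this is not how Cassandra implements its filters.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Minimal Bloom filter sketch. The bit-array size, hash count and hash
// functions are illustrative assumptions, not Cassandra's implementation.
class BloomFilterSketch {
    private final BitSet bits;
    private final int size;
    private final int hashCount;

    BloomFilterSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    // FNV-1a over the UTF-8 bytes, used as the second base hash.
    private static int fnv1a(String key) {
        int hash = 0x811C9DC5;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xFF);
            hash *= 16777619;
        }
        return hash;
    }

    // Derives the i-th bit position from two base hashes (double hashing).
    private int position(String key, int i) {
        return Math.floorMod(key.hashCode() + i * fnv1a(key), size);
    }

    void add(String key) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(key, i));
        }
    }

    // false: the key was definitely never added. true: it may have been added.
    boolean mightContain(String key) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(key, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        BloomFilterSketch filter = new BloomFilterSketch(1024, 3);
        filter.add("user:42");
        // Always true for a key that was added.
        System.out.println(filter.mightContain("user:42"));
        // Usually false, but a false positive is possible.
        System.out.println(filter.mightContain("user:99"));
    }
}
```

A "false" answer lets the reader skip the SSTable entirely, which is where the saved disk reads come from.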

+ The Key Cache stores the on-disk location of a specific key.

This saves a lookup, and therefore time, when reading that key.

+ Cassandra has a Row Cache,

which stores specific row values in memory. It keeps ‘hot’, frequently accessed values, and allows finer-grained caching than the file system cache. A row cache hit avoids disk reads.
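The "keep hot values in memory" idea can be sketched with a small least-recently-used cache. This is only an illustration of the caching principle under the assumption of an LRU eviction policy; the class name is hypothetical and Cassandra's real row cache is implemented differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// LRU cache sketch built on LinkedHashMap's access ordering.
// Illustrative only; not Cassandra's row cache implementation.
class RowCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    RowCacheSketch(int capacity) {
        super(16, 0.75f, true); // true = order entries by most recent access
        this.capacity = capacity;
    }

    // Called by LinkedHashMap after each insertion; evicts the coldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }

    public static void main(String[] args) {
        RowCacheSketch<String, String> cache = new RowCacheSketch<>(2);
        cache.put("a", "row-a");
        cache.put("b", "row-b");
        cache.get("a");          // "a" becomes the hottest entry
        cache.put("c", "row-c"); // evicts "b", the least recently used
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```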

+ Compaction! SSTables are immutable.

As a result, update- and delete-heavy workloads generate large SSTables that hold several versions of the same data. Sometimes we need to remove entries and their associated tombstones. Sometimes we want to merge an entry and its updates into a single entry. We also want the rows of a partition to be located in the same SSTable. So Cassandra regularly reads and rewrites the SSTables to make our reads easier and to save disk space.
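The merge step of compaction can be sketched as follows. This is a simplified model under stated assumptions: each "SSTable" is a sorted map from key to a cell, a null value stands for a tombstone, and tombstones are dropped immediately (real Cassandra only purges them after a grace period). All names are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified compaction sketch: merge several sorted tables, keep the newest
// version of each key, and drop keys whose newest version is a tombstone.
class CompactionSketch {
    // One version of a row; a null value represents a tombstone (deletion).
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        boolean isTombstone() {
            return value == null;
        }
    }

    static TreeMap<String, Cell> compact(List<Map<String, Cell>> sstables) {
        TreeMap<String, Cell> merged = new TreeMap<>();
        for (Map<String, Cell> table : sstables) {
            for (Map.Entry<String, Cell> entry : table.entrySet()) {
                // Keep whichever version carries the higher write timestamp.
                merged.merge(entry.getKey(), entry.getValue(),
                        (a, b) -> a.timestamp >= b.timestamp ? a : b);
            }
        }
        // Simplification: purge tombstones right away.
        merged.values().removeIf(Cell::isTombstone);
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Cell> older = new TreeMap<>();
        older.put("k1", new Cell("v1", 1));
        older.put("k2", new Cell("x", 1));

        Map<String, Cell> newer = new TreeMap<>();
        newer.put("k1", new Cell("v2", 2)); // update of k1
        newer.put("k2", new Cell(null, 3)); // deletion of k2

        TreeMap<String, Cell> result = compact(List.of(older, newer));
        System.out.println(result.keySet());        // prints [k1]
        System.out.println(result.get("k1").value); // prints v2
    }
}
```

After the merge a read for k1 touches one entry instead of two, and the deleted k2 no longer occupies disk space.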

+ Metadata Analysis:

describe table;

There you can check:

compression

compaction

Bloom filter settings

Cache settings

To see Cassandra-level metrics (global cache results, load, etc.):

+ nodetool info

To see detailed statistics about your tables, including:

Read/Write request counts

Read/Write latencies

Space on disk

SSTable count

Bloom filter statistics

+ nodetool cfstats

You can also monitor detailed percentile latencies, per table, using:

+ nodetool cfhistograms apache_james messageidtable

Adding row cache

You can enable row caching. It can avoid significant READ load on your disks.

War story: you need to restart the node after enabling the row cache through the row_cache_size_in_mb setting in the cassandra.yaml configuration file. nodetool alone will not be enough.

Once this (annoying) configuration parameter is set, you can enable the row cache per table with CQL:

use apache_james ;

ALTER TABLE modseq

WITH caching = {'keys': 'ALL', 'rows_per_partition': '10'} ;

Updating the compaction strategy of your tables can be done without downtime, at the cost of running compactions. Warning: this can consume I/O and memory, and thus degrade performance while the compaction is running.

You need to modify the CQL table declaration to change the compaction strategy:

use apache_james ;

ALTER TABLE modseq

WITH compaction = { 'class' : 'LeveledCompactionStrategy' };

For the changes to take effect, you need to compact the SSTables. To force this, you need to use nodetool:

+ nodetool compact keyspace table

To follow the progress of a compaction on large tables, you can use:

+ nodetool compactionstats

The rule of thumb for estimating compaction time, with our hardware (16GB, HDD), is approximately one hour per GB stored in the table.

JVM tuning for Cassandra:

https://tobert.github.io/pages/als-cassandra-21-tuning-guide.html

Methodology for Perf tuning:

  • Methodology
  • Observability
  • Practice
OODA Loop

Try to identify whether there is saturation at any level, especially at the device I/O level. Saturation means the device is receiving more requests than it can serve, so requests are probably being queued.
  1. iostat
  2. Errors: TCP retransmits
Anti-Patterns:
  1. Watching only the average latency
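A small worked example shows why the average is misleading. The latency samples below are made up for illustration, and the nearest-rank percentile helper is an assumption of this sketch, not something nodetool exposes.

```java
import java.util.Arrays;

// Compares the mean with percentiles on a made-up latency sample.
class LatencyPercentileSketch {
    // 98 fast requests at 2 ms and 2 slow requests at 500 ms (invented data).
    static long[] sampleLatencies() {
        long[] samples = new long[100];
        Arrays.fill(samples, 0, 98, 2L);
        Arrays.fill(samples, 98, 100, 500L);
        return samples;
    }

    static double mean(long[] samples) {
        return Arrays.stream(samples).average().orElse(0.0);
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static long percentile(long[] samples, int p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * (double) sorted.length / 100.0);
        return sorted[Math.max(rank - 1, 0)];
    }

    public static void main(String[] args) {
        long[] samples = sampleLatencies();
        System.out.println("mean = " + mean(samples));           // about 11.96 ms
        System.out.println("p50  = " + percentile(samples, 50)); // 2 ms
        System.out.println("p99  = " + percentile(samples, 99)); // 500 ms
    }
}
```

The mean of roughly 12 ms describes no real request: most take 2 ms and the slow ones take 500 ms, which only the high percentiles reveal.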
Some useful commands to understand the internal workings of Cassandra:


  • chunk_length_in_kb (default: 16KiB): specifies the number of kilobytes of data per compression chunk. The main tradeoff here is that larger chunk sizes give compression algorithms more context and improve their ratio, but require reads to deserialize and read more off disk. This means there is a configurable tradeoff between read and disk space here.

  • Advanced Use

  • Advanced users can provide their own compression class by implementing the interface at org.apache.cassandra.io.compress.ICompressor.

Saturday, October 21, 2017


Serialization in JAVA:



Short story about serialization
After many years of hard work, Earth's scientists developed a robot that could help them in their daily work. But this robot had fewer features than the robots developed by the scientists from planet Mars.
After a meeting between both planets' scientists, it was decided that Mars would send its robots to Earth. But a problem occurred. The cost of sending 100 robots to Earth was $100 million, and the trip would take around 60 days.
Finally, Mars's scientists decided to share their secret with Earth's scientists. This secret was about the structure of the class/robot. Earth's scientists developed the same structure on Earth itself. Mars's scientists serialized the data of each robot and sent it to Earth. Earth's scientists deserialized the data and fed it into each robot accordingly.
This process saved them time in communicating a massive amount of data.
Some of the robots were being used in defensive work on Mars. So their scientists marked some crucial properties of those robots as transient before sending their data to Earth. Note that a transient property is set to null (in case of a reference) or to the default value (in case of a primitive type) when the object gets deserialized.
One more point noticed by Earth's scientists was that Mars's scientists asked them to create some static variables to keep details about the environment. These details are used by some robots. But Mars's scientists did not share these details, because Earth's environment was different from Mars's environment.
Even though they knew the robot class structure and had the serialized data, Earth's scientists were not able to deserialize the data and get the robots working.
Exception in thread "main" java.io.InvalidClassException:
SerializeMe; local class incompatible: stream classdesc
:
Mars's scientists were waiting for the complete payment. Once the payment was done, Mars's scientists shared the serialVersionUID with Earth's scientists. Earth's scientists set it on the robot class and everything started working.
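The story's points about transient and static fields can be sketched in a few lines of Java. The Robot class, its field names and the helper methods are hypothetical, invented for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class TransientSketch {
    // Hypothetical robot class used only for this illustration.
    static class Robot implements Serializable {
        private static final long serialVersionUID = 1L;
        static String environment = "Mars"; // static: never written to the stream
        String model;                       // regular field: serialized
        transient String weaponCode;        // transient reference: comes back as null
        transient int shieldLevel;          // transient primitive: comes back as 0

        Robot(String model, String weaponCode, int shieldLevel) {
            this.model = model;
            this.weaponCode = weaponCode;
            this.shieldLevel = shieldLevel;
        }
    }

    static byte[] serialize(Robot robot) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(robot);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static Robot deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Robot) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(new Robot("R2", "secret", 9));
        Robot.environment = "Earth"; // the receiving side has its own static state
        Robot copy = deserialize(data);
        System.out.println(copy.model);        // prints R2
        System.out.println(copy.weaponCode);   // prints null
        System.out.println(copy.shieldLevel);  // prints 0
        System.out.println(Robot.environment); // prints Earth
    }
}
```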

Java provides a mechanism to convert a live object into a byte stream, which can be saved or transmitted and later converted back into an object. This process is called serialization. To achieve it, Java provides a marker interface, Serializable (an interface with no methods and no fields). Any class that implements Serializable can be serialized and deserialized.

Below is an example of a simple Java bean class that supports serialization.
package com.kunal.serialization;

import java.io.Serializable;

public class SerializedStudent implements Serializable {

    /**
     * For versioning of class.
     */
    // private static final long serialVersionUID = 1L;

    int roll;
    String name;
    // String pass;

    public SerializedStudent(int aRoll, String aName) {
        this.roll = aRoll;
        this.name = aName;
    }

    /*
    public String getPass() {
        return pass;
    }

    public void setPass(String pass) {
        this.pass = pass;
    }
    */

    public int getRoll() {
        return roll;
    }

    public void setRoll(int roll) {
        this.roll = roll;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}



package com.kunal.serialization;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;



public class Driver {

    public static void main(String[] args) {
        File file = new File("serializedStudent");

        // Write section: serialize the object to the file.
        // try-with-resources closes the streams even if writing fails.
        try (ObjectOutputStream ostream = new ObjectOutputStream(new FileOutputStream(file))) {
            SerializedStudent student = new SerializedStudent(25, "KuchBhi");
            ostream.writeObject(student);
        } catch (IOException e) {
            e.printStackTrace();
            return;
        }

        // Read section: deserialize the object back from the file.
        try (ObjectInputStream iObjStream = new ObjectInputStream(new FileInputStream(file))) {
            SerializedStudent readStudent = (SerializedStudent) iObjStream.readObject();
            System.out.println(readStudent.getName());
            System.out.println(readStudent.getRoll());
        } catch (IOException | ClassNotFoundException e) {
            e.printStackTrace();
        }
    }
}


Output:
KuchBhi
25

If we uncomment the pass field in our student bean and comment out the ObjectOutputStream (write) section of the driver, so that the old serialized object is read into the updated class,
we will get the exception below:
java.io.InvalidClassException: com.kunal.serialization.SerializedStudent; local class incompatible: stream classdesc serialVersionUID = -6360204280632258449, local class serialVersionUID = 651924956756896015
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:616)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1843)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2000)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at com.kunal.serialization.Driver.main(Driver.java:24)
This is because, if we do not provide a serialVersionUID, Java computes a default serialVersionUID as per its specification, and that computation depends heavily on the structure of the class. So originally our class had a serialVersionUID, say x.
But when we added the field pass, the structure of the class changed and so did the default serialVersionUID. Hence, when we tried to deserialize the older file (which carried the older serialVersionUID) against the updated class, we got an error.

serialVersionUID is used to ensure that during deserialization the same class (that was used during the serialization process) is loaded. This is a one-line definition of why a serialVersionUID is used.

Apart from the above definition, there are quite a few things to learn about serialVersionUID. As per the javadocs, the following is the format of serialVersionUID:

serialVersionUID Syntax

ANY-ACCESS-MODIFIER static final long serialVersionUID = 42L;
  • serialVersionUID is a static final field. You can assign any number of your choice to it. Later I will explain the significance of these two statements.

Why serialVersionUID?

Let's start with the annoying warning message you get in your IDE when you declare a class as Serializable.
The serializable class Lion does not declare a static final serialVersionUID field of type long
Most of us ignore this message, as we always do with warnings. My general advice is to always pay attention to Java warning messages. It will help you learn a lot of fundamentals.

serialVersionUID is a must in the serialization process, but it is optional for the developer to add it in the Java source file. If you do not add it in the source file, the serialization runtime will generate a serialVersionUID and associate it with the class. The serialized object will contain this serialVersionUID along with other data.
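The value the runtime associates with a class can be inspected through java.io.ObjectStreamClass. The sketch below uses two hypothetical classes, one that declares the field and one that leaves it to the runtime.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionUidLookup {
    // Declares its own serialVersionUID.
    static class Declared implements Serializable {
        private static final long serialVersionUID = 42L;
        int value;
    }

    // Leaves the serialVersionUID to the serialization runtime.
    static class Generated implements Serializable {
        int value;
    }

    // Returns the serialVersionUID the runtime associates with the class.
    static long suidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println(suidOf(Declared.class));  // prints 42
        // A computed hash; the value depends on the class structure.
        System.out.println(suidOf(Generated.class));
    }
}
```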

Even though serialVersionUID is a static field, its value gets written to the stream along with the object, as part of the class description. This is one exception to the general serialization rule that “static fields are not serialized”.

How is serialVersionUID generated?

serialVersionUID is a 64-bit hash of the class name, interface class names, methods and fields. The serialization runtime generates a serialVersionUID if you do not add one in the source. Refer to the Java Object Serialization Specification for the algorithm used to generate serialVersionUID.
It is advised to have serialVersionUID as unique as possible. That's why the Java runtime chose such a complex algorithm to generate it.
If you want help generating it, the JDK provides a tool named serialver. On older JDKs, serialver -show starts the GUI version of the tool.



How does serialVersionUID work?

When an object is serialized, the serialVersionUID is serialized along with the other contents.
Later, when the object is deserialized, the serialVersionUID is extracted from the stream and compared with the serialVersionUID of the loaded class.
If the numbers do not match, an InvalidClassException is thrown.
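The comparison can be provoked in a single program by tampering with the serialized bytes. The sketch below relies on the stream layout in which the 8-byte serialVersionUID directly follows the class name in the class description. The byte patching is a demonstration trick, not something to do in real code, and all names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class SuidMismatchSketch {
    static class Robot implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "R2";
    }

    static byte[] serialize(Object object) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Flips the lowest bit of the serialVersionUID stored in the stream,
    // simulating data written by a different version of the class.
    static byte[] withTamperedSuid(byte[] stream, Class<?> type) {
        byte[] copy = stream.clone();
        String asText = new String(copy, StandardCharsets.ISO_8859_1);
        String className = type.getName();
        int suidStart = asText.indexOf(className) + className.length();
        copy[suidStart + 7] ^= 1; // last byte of the big-endian long
        return copy;
    }

    // Returns the simple name of the exception thrown while reading, or "none".
    static String readOutcome(byte[] stream) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(stream))) {
            in.readObject();
            return "none";
        } catch (IOException | ClassNotFoundException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        byte[] original = serialize(new Robot());
        System.out.println(readOutcome(original)); // prints none
        byte[] tampered = withTamperedSuid(original, Robot.class);
        System.out.println(readOutcome(tampered)); // prints InvalidClassException
    }
}
```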

Serialization and inheritance:
Case 1 : If the super class is serializable, then the subclass is automatically serializable.
For details please refer to my Git repo (link shared at the bottom)
Case 2 : If the super class is not serializable and the subclass is serializable
- In this case the fields belonging to the super class are not written to the stream at serialization time.
- At deserialization time the JVM executes the super class's no-arg constructor (which is a must in this scenario to avoid a runtime exception), so those fields get whatever values that constructor assigns.
For examples please refer to the Git repo.
Case 3 : If the super class is serializable and we want to prevent the subclass from being serialized, there is no direct way to do so. To achieve this, we can throw an exception (for example NotSerializableException) from the subclass's private writeObject/readObject methods.
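Cases 2 and 3 can be sketched as follows. The class names (Machine, Robot, Vehicle, SecretVehicle) and helpers are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class InheritanceSketch {
    // Case 2: the parent is NOT serializable, the child is.
    static class Machine {
        String origin;
        Machine() { this.origin = "unknown"; } // runs again during deserialization
        Machine(String origin) { this.origin = origin; }
    }

    static class Robot extends Machine implements Serializable {
        private static final long serialVersionUID = 1L;
        String name;
        Robot(String origin, String name) { super(origin); this.name = name; }
    }

    // Case 3: the parent is serializable, the child refuses serialization.
    static class Vehicle implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    static class SecretVehicle extends Vehicle {
        private void writeObject(ObjectOutputStream out) throws IOException {
            throw new NotSerializableException("SecretVehicle");
        }
        private void readObject(ObjectInputStream in) throws IOException {
            throw new NotSerializableException("SecretVehicle");
        }
    }

    static byte[] serialize(Object object) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        }
        return bytes.toByteArray();
    }

    static Object roundTrip(Object object) {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(serialize(object)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    static boolean canSerialize(Object object) {
        try {
            serialize(object);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Robot copy = (Robot) roundTrip(new Robot("Mars", "R2"));
        System.out.println(copy.name);   // prints R2 (child field was serialized)
        System.out.println(copy.origin); // prints unknown (set by Machine's no-arg constructor)
        System.out.println(canSerialize(new SecretVehicle())); // prints false
    }
}
```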

Serialization in Singleton
As we know, a singleton can easily be breached using reflection or serialization. One way to preserve a singleton across serialization is the readResolve method. This method should not be called elsewhere in the code; it is meant to be called only by the Java runtime, to identify and return the singleton instance.

For Serializable and Externalizable classes, the readResolve method allows a class to replace/resolve the object read from the stream before it is returned to the caller. By implementing the readResolve method, a class can directly control the types and instances of its own instances being deserialized. The method is defined as follows:
        ANY-ACCESS-MODIFIER Object readResolve()
                throws ObjectStreamException;
The readResolve method is called when ObjectInputStream has read an object from the stream and is preparing to return it to the caller. ObjectInputStream checks whether the class of the object defines the readResolve method. If the method is defined, the readResolve method is called to allow the object in the stream to designate the object to be returned.
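A minimal sketch of a singleton protected by readResolve is shown below. The Registry class is hypothetical; the point is only that deserialization hands back the existing instance rather than a fresh copy.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamException;
import java.io.Serializable;
import java.io.UncheckedIOException;

class SingletonSketch {
    static final class Registry implements Serializable {
        private static final long serialVersionUID = 1L;
        static final Registry INSTANCE = new Registry();

        private Registry() {
        }

        // Invoked by the serialization runtime after the object has been read;
        // the freshly read copy is discarded in favour of the singleton.
        private Object readResolve() throws ObjectStreamException {
            return INSTANCE;
        }
    }

    static Object roundTrip(Object object) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object copy = roundTrip(Registry.INSTANCE);
        System.out.println(copy == Registry.INSTANCE); // prints true
    }
}
```

Without the readResolve method, the same round trip would return a second Registry object and the comparison would print false.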
Externalizable
At sender end:
Externalization lets the programmer customize the flow of serialization through two methods, writeExternal(ObjectOutput out) and readExternal(ObjectInput in). The JVM gives priority to Externalizable first; if the class being serialized implements Serializable but not Externalizable, ObjectOutputStream uses default serialization instead.
At receiver end:
The object is constructed using its public no-arg constructor and then readExternal is called. Otherwise, if the class only implements Serializable, it is reconstructed by ObjectInputStream using default deserialization. You must have a public no-arg constructor if you implement Externalizable.
package com.kunal.serialization.externalization;



import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;

public class ExternalizedStudent implements Externalizable {

    /**
     * For versioning of class.
     */
    private static final long serialVersionUID = 1L;

    int roll;
    String name;

    public ExternalizedStudent(int aRoll, String aName) {
        this.roll = aRoll;
        this.name = aName;
    }

    // Public no-arg constructor, required for Externalizable classes.
    public ExternalizedStudent() {
    }

    public int getRoll() {
        return roll;
    }

    public void setRoll(int roll) {
        this.roll = roll;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(roll);
        out.writeObject(name);
    }

    // Fields must be read in the same order they were written.
    @Override
    public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
        roll = in.readInt();
        name = (String) in.readObject();
    }
}

Driver:
package com.kunal.serialization.externalization;

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;

public class ExternalizableApp {

    public static void main(String[] args) {
        ExternalizedStudent student = new ExternalizedStudent(4, "kunal");

        // Serialize: try-with-resources closes the streams even if writing fails.
        try (ObjectOutputStream outStream = new ObjectOutputStream(new FileOutputStream("student.ser"))) {
            outStream.writeObject(student);
        } catch (IOException i) {
            i.printStackTrace();
            return;
        }

        // Deserialize
        student = null;
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("student.ser"))) {
            student = (ExternalizedStudent) in.readObject();
        } catch (IOException i) {
            i.printStackTrace();
            return;
        } catch (ClassNotFoundException c) {
            System.out.println("student class not found");
            c.printStackTrace();
            return;
        }
        System.out.println("Deserialized student...");
        System.out.println("Name: " + student.getName());
        System.out.println("roll: " + student.getRoll());
    }
}
Inheritance in Externalization:

Case 1: What if the super class does not implement Externalizable:

If the super class does not implement Externalizable and the child class does, the child class can still serialize the inherited fields itself in its writeExternal method (and restore them in readExternal).
Case 2: What if the super class implements Externalizable:
In this case both the super class and the sub class implement Externalizable, so both override readExternal/writeExternal. The child class needs to call super.writeExternal and super.readExternal explicitly, passing along the ObjectOutput and ObjectInput parameters.
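Case 2 can be sketched like this. Person and Student are hypothetical classes made up for the illustration; the key lines are the calls to super.writeExternal and super.readExternal.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;

class ExternalizableInheritanceSketch {
    public static class Person implements Externalizable {
        String name;

        public Person() {
        }

        Person(String name) {
            this.name = name;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeObject(name);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            name = (String) in.readObject();
        }
    }

    public static class Student extends Person {
        int roll;

        public Student() {
        }

        Student(String name, int roll) {
            super(name);
            this.roll = roll;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            super.writeExternal(out); // parent writes its own fields first
            out.writeInt(roll);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException, ClassNotFoundException {
            super.readExternal(in);   // read back in the same order
            roll = in.readInt();
        }
    }

    static Object roundTrip(Object object) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(object);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Student copy = (Student) roundTrip(new Student("kunal", 4));
        System.out.println(copy.name); // prints kunal
        System.out.println(copy.roll); // prints 4
    }
}
```

If Student skipped the super calls, the name field would come back as null, because nothing else writes the parent's state for an Externalizable class.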

Please refer to my git-repo