Sunday, September 22, 2013

Java Collections hierarchy

Collection Hierarchy


How to choose collection for my use case?

Thursday, September 19, 2013

Locale character sorting - java

I am currently working on multilingual site development. Yesterday I faced an interesting requirement to sort the locale characters.

Assuming the english text sorting, we have written the code as Treemap based sorting.
Map data = new TreeMap();  
     data .put("sarav", "000");  
     data .put("rohit", "002"); 
As i18n involved, what happens all locale based text got the last position in the sorting.
data .put("ärav", "003"); 

Interestingly Treemap has a constructor with Compartor

TreeMap(Collator.getInstance(Locale.GERMAN))

The Collator class performs locale-sensitive String comparison. By using this Collator, we will be able to build searching and sorting routines for natural language text.

Collator is an abstract base class. Subclasses implement specific collation strategies. One subclass, RuleBasedCollator, is currently provided with the Java 2 platform and is applicable to a wide set of languages.

Like other locale-sensitive classes, you can use the static factory method, getInstance, to obtain the appropriate Collator object for a given locale, as I used above.

We can set a Collator's strength property to determine the level of difference considered significant in comparisons. Four strengths are provided: PRIMARY, SECONDARY, TERTIARY, and IDENTICAL. The exact assignment of strengths to language features is locale dependant. For example, in Czech, "e" and "f" are considered primary differences, while "e" and "ê" are secondary differences, "e" and "E" are tertiary differences and "e" and "e" are identical.

I didn't set any strength for my requirement, it worked as-is. I found a sample for this in the below URL

http://stackoverflow.com/questions/7502642/sort-a-list-of-hungarian-strings-in-the-hungarian-alphabetical-order


Saturday, December 10, 2011

How to do parallel processing in java?

I have this question in my mind for long time. We have dual processor/multi processor whether we are using the full power of those. How JVM itself can split the task across the processor and make the instruction set execution more efficient?

As usual before I move 5 steps in my thought process, industry will move 50 steps. No wonder, we have solve for this in Java 7.

It's the fork/join framework is an implementation of the ExecutorService interface that helps you take advantage of multiple processors. As with any ExecutorService, the fork/join framework distributes tasks to worker threads in a thread pool. The fork/join framework is distinct because it uses a work-stealing algorithm. Worker threads that run out of things to do can steal tasks from other threads that are still busy.

 If you are not using Java 7 you also need to have "jsr166y.jar" in the classpath.
package com.sarav.forkjoin;

import java.util.Random;

public class Problem {
private final int[] list = new int[2000000]; public Problem() { Random generator = new Random(19580427); for (int i = 0; i < list.length; i++) { list[i] = generator.nextInt(500000); } } public int[] getList() { return list; } }
RecrusiveAction implementation
package com.sarav.forkjoin;

import java.util.Arrays;

import jsr166y.forkjoin.RecursiveAction;

public class Solver extends RecursiveAction {
 private int[] list;
 public long result;

 public Solver(int[] array) {
  this.list = array;
 }

 @Override
 protected void compute() {
  if (list.length == 1) {
   result = list[0];
  } else {
   int midpoint = list.length / 2;
   int[] l1 = Arrays.copyOfRange(list, 0, midpoint);
   int[] l2 = Arrays.copyOfRange(list, midpoint, list.length);
   Solver s1 = new Solver(l1);
   Solver s2 = new Solver(l2);
   forkJoin(s1, s2);
   result = s1.result + s2.result;
  }
 }
}
Test to verify my understanding
package com.sarav.forkjoin.testing;

import jsr166y.forkjoin.ForkJoinExecutor;
import jsr166y.forkjoin.ForkJoinPool;
import com.sarav.forkjoin.Problem;
import com.sarav.forkjoin.Solver;

public class Test {

 public static void main(String[] args) {
  Problem test = new Problem();
  // Check the number of available processors
  int nThreads = Runtime.getRuntime().availableProcessors();
  System.out.println(nThreads);
  Solver mfj = new Solver(test.getList());
  ForkJoinExecutor pool = new ForkJoinPool(nThreads);
  pool.invoke(mfj);
  long result = mfj.getResult();
  System.out.println("Done. Result: " + result);
  long sum = 0;
  // Check if the result was ok
  for (int i = 0; i < test.getList().length; i++) {
   sum += test.getList()[i];
  }
  System.out.println("Done. Result: " + sum);
 }
}


I didn't apply this in my real time application to give some performance report. Soon I will :)

Friday, October 28, 2011

Linked List - Reverse & identifying the cycle

Linked List- another interesting and simple data structure. Most of the interviewers ask for reverse and identifying the cycle in linked list.

Identifying the cycle
Traversal of a linked list with a cycle will never end, so there needs to be some method to keep track of what nodes have been encountered.


One idea is to mark nodes that have been seen, either by adding a flag to the node itself or storing information about the node in a separate data structure. Unfortunately, this method requires modification of the node and/or additional space.


Interviewers won't accept this way as they want to understand/see how we traverse across nodes.

Another idea is to move the pointers in different speed to identify a cycle as below. To make one pointer faster than the other, just advance it two nodes instead of one.

public static boolean hasCycle(Node head) {

Node fast = head;
Node slow = head;

while (true) {
if (fast == null || fast.next == null)
return false;
else if (fast == slow || fast.next == slow)
return true;
else {
fast = fast.next.next;
slow = slow.next;
}
}
}
Reverse


With three pointers, reverse will be pretty straight forward to move on.



Node previous = null;
Node current = head;
Node forward;

while (current != null) {
forward = current.next;
current.next = previous;
previous = current;
current = forward;
}

Download the code (Just copy & paster in your editor)

Saturday, October 22, 2011

Social Commerce - my view

What is Social commerce?

Social commerce is a another dimension of e-commerce. With the social media interaction and user contributions in its side, it will promote/drive online user to buy and sell products or services.

Why it gets big hype in recent timings?
E-commerce sites target the big user bank in social networking sites like Facebook, Twitter etc. E-commerce sites want to drive traffic to their site with these platform which they can convert as a purchase.

Why Facebook is in the top compared to other sites?
It's because of its user bank -
More than 800 million active users
Average user has 130 friends
More than 70 languages available on the site
More than 7 million apps and websites are integrated with Facebook
More than 350 million active users currently access Facebook through their mobile devices
Data from http://www.facebook.com/press/info.php?statistics

Which industries are getting hot?
Fashion (Clothing & shoes) , Electronics (which includes mobile phones, tech gadets)

How I see its different from traditional E-commerce?
  • Group Buy – With friends and family, I can buy and make them buy.Buying something together.
  • Advise - I can get friend's advice on a prospective purchase. Same I can provide too.
  • Recommend/Review - suggesting a specific item to a friend who is looking for one with some review.
  • Share - Something like show off, what I own
  • Trust - Trust will be/can be created not only by item description but from my friends and family.
Readymade Solutions:
IBM social commerce, Microsoft Commerce

Though we can say solutions, but it has to be tied with different departments Marketing, Analytics, PR etc

Prediction: $30billion revenue from social commerce market in 2015

Thursday, July 14, 2011

Reverse an Integer and String

Just tried out the very basic problem today. How to reverse an integer and string? Please let me know, if i can improve anything on this.

Integer logic 1234 = 4 *10^3+3*10^2+2*10^1+1*10^0
String logic Sarav= replace the indices value till you reach the middle.

package com.sarav.util;
/**
* @author sparamasivan
*
*/
public class ReverseUtil {

public static void main(String[] args) {
System.out.println(reverse(1234));
System.out.println(reverse("Sarav"));
}

public static int reverse(int number){
int length=String.valueOf(number).length();
int reverse=0;
while(length>0){
//int tens=(int)(Math.pow(10.00,length-1));
//reverse=reverse + ((number%10)*tens);
reverse=(reverse*10) + ((number%10));
number= number/10;
length--;
}
return reverse;
}

public static String reverse( String s ) {
int length = s.length(), last = length - 1;
char[] chars = s.toCharArray();
for ( int i = 0; i < length/2; i++ ) {
char c = chars[i];
chars[i] = chars[last - i];
chars[last - i] = c;
}
return new String(chars);
}
}


Tuesday, July 5, 2011

Protocol Buffers & Avro


I am consuming service from world's biggest market place for my site inventory. Current SLA for their services - ~200msec. I was stunned on hearing their new SLA for us ~20msec. Heard about their optimization happened on data serialization and digged little bit today to upgrade myself.

Most server-client platforms use a serialization technique to serialize into a leaner data format, and then de-serialize on the receiving end.Many languages offer native serialization APIs, but when serializing the data using the native API, Metadata about the class is serialized into the output too. is it possible to serialize only data and not with metadata?

Yes we have - Google Protocal Buffers & Apache Avro

Google Protocol Buffers
Protocol Buffers is a serialization format with an interface description language developed by Google. It is available under free software, open source license. Protocol Buffers design goals are emphasized performance and simplicity. It is a language and platform neutral technology that is an extensible mechanism for serializing structured data.

It works by you defining how you want your data to be structured via proto files, which are simply structure text files. Once you have decided the structure in your proto file, the proto executable is called on it, and a generated class (Adobe Actionscript 3, Java, C, C++, Python) is produced. The class can be generated into multiple different technologies, which means the class can be generated for the client and server technologies. Thus securing a data contract (which is type safe) between the two. The protocol buffer technology provides the ability to update the data structure without breaking deployed programs that are compiled against the old format.

Protocol buffers claims it takes between 100 to 200 nanoseconds to parse. As the overhead of the data structure is not needed in protocol buffers, only the object fields’ values is serialized. Protocol buffers will find the most compact serialization technique for a particular data type (always primitives), and only serialize fields that are not null.

Apache Avro
Avro is another very recent serialization system. It provides rich data structures that are compact, and are transported in a binary data format.

Avro relies on a schema-based system that defines a data contract to be exchanged. When Avro data is read, the schema used when writing it is always present. Similar to Protocol Buffers, it is only the values in the data structure that are serialized and sent. The strategy employed by Avro (and Protocol Buffers), means that a minimal amount of data is generated, enabling fast transport.

The schemas are equivalent to protocol buffers proto files, but they do not have to be generated. The JSON format is used to declare the data structures.

Protocol Buffers & Avro

Google’s Protocol buffer provides a much richer API for defining a data contract than Avro. Below is a list of features available to Protocol Buffers and not Avro:


  • Declare nested types

  • Define requires, repeated and optional fields

  • Specify default values on fields

  • Declare enumerations and set a fields default value from it

  • Multiple message types in the same document

  • Import other proto files

  • Declare a range of field numbers in a message available for third party extensions (Extensions) Nested Extensions

  • Define services

Avro is only compatible with C, Java and Python, but Protocol Buffers is compatible with C, C++, Adobe Actionscript 3, Java and Python.


Now you would have guessed what that company applied for their middleware solution as a push mechanism between server and client.