Philosophy behind NodeJS

In this section I will try to present the philosophical aspects behind NodeJS, covering the design principles and programming techniques that inspired its development.

  1. Hollywood Principle: “Don’t call us, we’ll call you”. A small story helps here: a struggling actor goes to a director and asks for a role. The director has no suitable assignment at the moment and does not want to be bothered again and again, so he asks the actor to leave his contact details with the assistant and says, “Don’t call me; I will call you if an assignment comes up.” This principle is the idea behind the callback mechanism: a task that depends on another task for some processing simply submits its request along with a callback method/function, which the other task calls when the result is available. This way tasks do not tie up important resources such as a thread while waiting.
  2. Asynchronous Call: ‘Asynchronous’ means ‘not synchronised’. A synchronous call generally waits until the final result, success or failure, has been computed. In an asynchronous call you submit your request along with a callback handler that is triggered when the result is ready; your process does not wait but continues with its flow. Threads are of course still required to achieve this, but NodeJS handles them internally; for the developer it remains a single-threaded programming model.
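    Example: executeQuery("select empid from employee", resultHandler)
    Here resultHandler is the callback handler. Note that the function itself is passed; writing resultHandler(data) would call it immediately instead of registering it.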

    Please see http://blog.mixu.net/2011/02/01/understanding-the-node-js-event-loop/ for more detail on the single-threaded model.

  3. Non-Blocking IO: NodeJS is suited to IO-intensive applications rather than computation-intensive ones (more on this is easy to find online). Non-blocking IO means that your program’s execution flow is not blocked by any IO operation, and this too is achieved through callbacks. JavaScript supports callbacks but had no IO API of its own, so a non-blocking IO implementation was added for Node and is available to the developer out of the box.
    Example: executeQuery("select empid from employee", resultHandler);
    displayProcessingWindow();
    Here the call that shows the processing window comes after the database call, and the window is displayed right away because the execution flow is not blocked by ‘executeQuery’. You do nothing special to achieve this; it is handled out of the box.
  4. Multiple Request Single Thread: JavaScript follows a single-threaded programming model, which means you cannot spawn a new thread in your own code; this keeps the programming model simple. Parallel execution is handled for you out of the box. Node assumes that any call other than IO completes almost immediately, which is why it is not recommended for computation-intensive tasks: a long computation holds up the single thread, whereas IO operations are non-blocking and do not halt your execution flow. (IO operations are handled in parallel in the background using threads.)
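To make the callback idea from the points above concrete outside of Node, here is a minimal sketch in Java. Everything in it is an assumption made for illustration: executeQuery is a hypothetical stand-in for a database call (no real database is involved), the employee ids are made up, and the latch only simulates the moment at which the database answers. It shows that displayProcessingWindow() runs before the result handler, because the caller never waits for the query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.function.Consumer;

class CallbackSketch {
    // Thread-safe log so the order of events can be inspected afterwards.
    static final List<String> events = Collections.synchronizedList(new ArrayList<>());

    // Hypothetical stand-in for a database call. The "query" waits on a
    // background thread and hands its rows to resultHandler when it is done.
    static CompletableFuture<Void> executeQuery(String sql,
                                                CountDownLatch databaseAnswered,
                                                Consumer<List<String>> resultHandler) {
        CompletableFuture<List<String>> rows = CompletableFuture.supplyAsync(() -> {
            try {
                databaseAnswered.await(); // simulated I/O wait, off the caller's thread
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            return Arrays.asList("101", "102"); // made-up employee ids
        });
        // "Don't call us, we'll call you": the handler is invoked once rows exist.
        return rows.thenAccept(resultHandler);
    }

    static void resultHandler(List<String> rows) {
        events.add("result handled: " + rows);
    }

    static void displayProcessingWindow() {
        events.add("processing window shown");
    }

    static List<String> run() {
        events.clear();
        CountDownLatch databaseAnswered = new CountDownLatch(1);
        CompletableFuture<Void> pending = executeQuery(
            "select empid from employee", databaseAnswered, CallbackSketch::resultHandler);
        displayProcessingWindow();    // runs at once; the query has not finished yet
        databaseAnswered.countDown(); // the simulated database answers now
        pending.join();               // wait only so the demo can return the full log
        return new ArrayList<>(events);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Note that, unlike Node, this sketch runs the handler on a background thread; it only illustrates the calling convention, not Node's single-threaded execution.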

Conclusion: The Hollywood principle is the key design principle behind the non-blocking implementation, and callbacks together with the event loop are the technique used to achieve asynchronous/non-blocking behaviour.
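The event loop itself can be sketched in a few lines. The following is a toy model in Java, not how Node is actually implemented: TinyEventLoop, readAsync and the file names are all hypothetical. Simulated IO runs on a small background pool, but every callback is put on a queue and executed one at a time on the single loop thread, which is the "multiple requests, single thread" idea described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

class TinyEventLoop {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    private final ExecutorService ioPool = Executors.newFixedThreadPool(2);
    private int pendingIo = 0; // read and written only by the loop thread

    // Starts a simulated IO operation in the background. Its callback is not
    // run by the IO thread; it is queued so that the loop thread runs it.
    // Must be called from the loop thread (i.e. from inside a task).
    void readAsync(String resource, Consumer<String> callback) {
        pendingIo++;
        ioPool.execute(() -> {
            String data = "contents of " + resource; // pretend this was slow IO
            queue.add(() -> {
                pendingIo--;
                callback.accept(data);
            });
        });
    }

    // Runs queued tasks one at a time on the calling thread until no task
    // is queued and no IO is outstanding.
    void run(Runnable firstTask) {
        queue.add(firstTask);
        try {
            while (!queue.isEmpty() || pendingIo > 0) {
                queue.take().run(); // blocks while only IO is outstanding
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            ioPool.shutdown();
        }
    }

    static List<String> demo() {
        List<String> log = new ArrayList<>(); // written by the loop thread only
        TinyEventLoop loop = new TinyEventLoop();
        Thread loopThread = Thread.currentThread();
        loop.run(() -> {
            for (String request : new String[] {"a.txt", "b.txt", "c.txt"}) {
                loop.readAsync(request, data -> log.add(
                    data + " handled on the loop thread: "
                        + (Thread.currentThread() == loopThread)));
            }
            log.add("all requests submitted");
        });
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The first line printed is always "all requests submitted", since the submitting task runs to completion before any callback gets its turn; the order of the three callbacks may vary between runs.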

Purpose

  1. Node is designed for applications that are heavy on input/output but light on computation (computation-intensive tasks should be delegated out of Node).
  2. To give developers the simplicity of a single-threaded programming model, while the complex multithreading is handled in the background out of the box.
  3. To reduce the memory overhead that is inherent in the thread-per-request model.


“Instanceof” is it bad?

Once an interviewer asked me: we know using “instanceof” is bad, so can we avoid using it with the help of some pattern (name it)?

In my 9 years of Java programming I don’t remember using instanceof anywhere apart from overriding the “equals” method. So the question of whether we can avoid it with the help of some pattern confused me, and I was not able to answer it. Later, after some reading and trying out some examples, I came to the conclusion that it is not the right question. Whether and how to avoid “instanceof” in your code depends entirely on the problem you are solving or the requirement you are implementing. Let me explain in detail.

  1. Is using “instanceof” bad? Can you avoid it when overriding the equals method in your Java bean? The answer is NO. In a situation similar to the equals method you cannot avoid it, but so far I have not come across any other such situation, so at present I think we can avoid it everywhere apart from the equals method, and there is no silver-bullet pattern for doing so.
  2. Please read this article: “http://butunclebob.com/ArticleS.UncleBob.VisitorVersusInstanceOf”. Its solution uses the visitor pattern with reflection to avoid “instanceof”. My feeling is that the solution needs neither visitor nor reflection; there may well be situations where visitor helps, but my intent here is to show that there is no silver-bullet pattern for avoiding “instanceof”. The problem in the article is this: we have courses and we want to generate a report about each course, so separate classes are created for courses and for course report generators (to achieve separation of concerns). The article’s solution tries to identify the report generator based on the course and then calls methods on that report generator to generate the report. The use of reflection with visitor can easily be avoided if we put the data part in the course classes (course title, duration, sysRequired, etc.) and the report generation behaviour in a generic report generator or in specific report generators. Let’s look at the generic report generator first.

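    class GenericReportGenerator {
        Course course;

        void generateStandardReport() { /* calls the build methods below */ }

        void buildTitle() {
            String title = course.getTitle();
            // use course title to generate report title.
        }

        void buildHeader() { /* ... */ }
        void buildBody() { /* ... */ }
        void buildFooter() { /* ... */ }
    }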
    You can see that with a generic report generator there is no need to check the type of a course or of a course report generator with instanceof. With the help of the builder pattern you can also change the report structure as required (e.g. building a report using only buildTitle and buildBody).
    Now let’s look at specific report generators. We can have a JavaReportGenerator and an AOODReportGenerator. In this solution, every time we create a specific report generator we register it (think of a Map with the course as key and the course report generator as value); we can then use a factory that returns the course report generator for a given course.
    This way we clearly avoid the instanceof and reflection usage described in the article above; indeed, in many situations you can avoid instanceof and reflection by using appropriate patterns. Some people think that calling a specific method introduced by a sub-class/sub-implementor* requires type checking with “instanceof”. (I think this is wrong: the new method should go in the base class, and sub-classes can skip implementing it if they want to.) In any case, this can be avoided by using generics, if you are on JDK 5 or above, or by using a specific report generator implementation.

    Some people think “instanceof” introduces a lot of if-else statements and that the visitor pattern is the way to avoid them. I think that is wrong: as stated above, refactoring such code depends on the particular implementation, and factory, state, builder or any other pattern can be used depending on the situation. You can even use Predicate (a high-level pattern**); please see http://www.infoq.com/presentations/3-Patterns-Cleaner-Code
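    The register-and-look-up idea for specific report generators could be sketched as follows. This is only one possible shape, under assumptions of mine: the Course class with a code and a title, the CourseReportGenerator interface, the report strings and the choice of the course code as the Map key are all hypothetical and do not come from the article.

```java
import java.util.HashMap;
import java.util.Map;

class ReportDemo {
    static String report(Course course) {
        ReportGeneratorFactory factory = new ReportGeneratorFactory();
        // Each specific generator is registered once, keyed by course code.
        factory.register("JAVA", new JavaReportGenerator());
        factory.register("AOOD", new AOODReportGenerator());
        // No instanceof and no reflection: the lookup picks the generator.
        return factory.forCourse(course).generateReport(course);
    }

    public static void main(String[] args) {
        System.out.println(report(new Course("JAVA", "Java Programming")));
        System.out.println(report(new Course("AOOD", "Advanced OO Design")));
    }
}

// Hypothetical course type holding only the data part.
class Course {
    private final String code;
    private final String title;

    Course(String code, String title) {
        this.code = code;
        this.title = title;
    }

    String getCode() { return code; }
    String getTitle() { return title; }
}

interface CourseReportGenerator {
    String generateReport(Course course);
}

class JavaReportGenerator implements CourseReportGenerator {
    public String generateReport(Course course) {
        return "Java course report: " + course.getTitle();
    }
}

class AOODReportGenerator implements CourseReportGenerator {
    public String generateReport(Course course) {
        return "AOOD course report: " + course.getTitle();
    }
}

class ReportGeneratorFactory {
    // Registry: course code -> generator responsible for that course.
    private final Map<String, CourseReportGenerator> registry = new HashMap<>();

    void register(String courseCode, CourseReportGenerator generator) {
        registry.put(courseCode, generator);
    }

    CourseReportGenerator forCourse(Course course) {
        CourseReportGenerator generator = registry.get(course.getCode());
        if (generator == null) {
            throw new IllegalArgumentException(
                "No report generator registered for " + course.getCode());
        }
        return generator;
    }
}
```

    Adding a new course type then means adding one generator class and one register call, without touching any type checks.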

* in the case of an interface where you don’t want to break existing implementations by introducing a new method in the interface.
** a pattern that is a mix of several patterns
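For completeness, this is the one place mentioned above where “instanceof” is the usual tool: overriding equals in a bean. The Employee class and its fields are hypothetical and serve only as an illustration.

```java
import java.util.Objects;

final class Employee {
    private final int empId;
    private final String name;

    Employee(int empId, String name) {
        this.empId = empId;
        this.name = Objects.requireNonNull(name);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        // The parameter is a plain Object, so its type has to be checked
        // before the cast. instanceof is also false for null.
        if (!(other instanceof Employee)) {
            return false;
        }
        Employee that = (Employee) other;
        return empId == that.empId && name.equals(that.name);
    }

    @Override
    public int hashCode() {
        // Kept consistent with equals: equal objects have equal hash codes.
        return Objects.hash(empId, name);
    }

    public static void main(String[] args) {
        Employee a = new Employee(1, "Asha");
        System.out.println(a.equals(new Employee(1, "Asha"))); // true
        System.out.println(a.equals("Asha"));                  // false
        System.out.println(a.equals(null));                    // false
    }
}
```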

Amazon Elastic MapReduce with Talend

Submit EMR Job from Talend

Prerequisites

  • Amazon account with AccessKey and SecretKey.
  • AWS SDK for Java.
  • Talend Studio for BigData.
  • Prior knowledge of how to create a job in Talend.

Steps

  1. Create a new job in Talend.
  2. Use the components ‘tLibraryLoad’ and ‘tJava’, and connect them.
  3. tLibraryLoad settings: add the AWS SDK jars to tLibraryLoad using the advanced settings.
  4. Use the tJava advanced settings to import all required dependencies.
  5. Code to submit the MR job to EMR in tJava:
    System.out.println("Starting T Java component for EMR job.");
    // Set credentials (accessKey and secretKey); fill in your own values.
    String accessKey = "";
    String secretKey = "";
    AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
    AmazonElasticMapReduceClient emrClient = new AmazonElasticMapReduceClient(credentials);
    StepFactory stepFactory = new StepFactory();

    // Hadoop jar step
    HadoopJarStepConfig jarStepConfig = new HadoopJarStepConfig();
    jarStepConfig.setJar("s3://mddwordcount1/MR_JAR/hadoop-0.20.2-examples.jar");
    jarStepConfig.setMainClass("wordcount");
    ArrayList<String> args = new ArrayList<String>();
    args.add("s3://mddwordcount1/input/catalina.2012-06-12.log");
    args.add("s3://mddwordcount1/output/wordcount");
    jarStepConfig.setArgs(args);

    // Debug step config that will help us when things go wrong.
    StepConfig enableDebugging = new StepConfig();
    enableDebugging.withName("Enable Debugging");
    enableDebugging.withActionOnFailure("TERMINATE_JOB_FLOW");
    enableDebugging.withHadoopJarStep(stepFactory.newEnableDebuggingStep());

    // Hadoop step config
    StepConfig hadoopJarConf = new StepConfig();
    hadoopJarConf.withName("Jar Test");
    hadoopJarConf.withActionOnFailure("TERMINATE_JOB_FLOW");
    hadoopJarConf.withHadoopJarStep(jarStepConfig);

    // Instance config
    JobFlowInstancesConfig instancesConfig = new JobFlowInstancesConfig();
    instancesConfig.setMasterInstanceType("m1.small");
    instancesConfig.setSlaveInstanceType("m1.small");
    instancesConfig.setHadoopVersion("0.20.205");
    instancesConfig.setInstanceCount(2);
    instancesConfig.setPlacement(new PlacementType("us-east-1c"));
    instancesConfig.withKeepJobFlowAliveWhenNoSteps(false);
    instancesConfig.setTerminationProtected(false);

    // Job request creation.
    RunJobFlowRequest request = new RunJobFlowRequest();
    request.withName("CustomJarStepConfigtest");
    request.withSteps(enableDebugging, hadoopJarConf);
    request.withLogUri("s3://mddwordcount1/log");
    request.withInstances(instancesConfig);
    request.withAmiVersion("latest");

    // Finally, submit the job.
    RunJobFlowResult result = emrClient.runJobFlow(request);

  6. Run your job and check the output in the Amazon EMR job console.
  7. In case of failure, select your job and click ‘Debug’ to see the error logs.
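
For step 4, the import lines that the tJava code relies on would look roughly like this. I am assuming the package layout of the AWS SDK for Java 1.x here; check the names against the jars you actually loaded in tLibraryLoad.

```java
// Imports for the tJava advanced settings (assumes AWS SDK for Java 1.x).
import java.util.ArrayList;

import com.amazonaws.auth.AWSCredentials;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClient;
import com.amazonaws.services.elasticmapreduce.model.HadoopJarStepConfig;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.PlacementType;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowResult;
import com.amazonaws.services.elasticmapreduce.model.StepConfig;
import com.amazonaws.services.elasticmapreduce.util.StepFactory;
```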