Tuesday, April 9, 2013

Logging Frameworks

In any complex application comprising several components working together, tracking failures effectively becomes challenging. Even though the application is separated into individual components, a trace of operations is required to investigate potential failures. In such circumstances, logging individual component activities comes in handy and provides great insight into day-to-day operations. Logging with System.out and FileWriter used to be prevalent in Java, but with more sophisticated frameworks available, such techniques have become a thing of the past. Apart from countless others, three logging projects dominate the Java world: Log4J, SLF4J (a logging facade) and Logback.

Java Logging API
The Java Logging API provides a basic set of logging capabilities in the java.util.logging package through the Logger class. Loggers form a hierarchy, and a . (dot) in a logger name indicates a level in that hierarchy: the Logger for com.example is a child of the com Logger, which in turn is a child of the root Logger (the one for the empty string). Configuring a parent logger affects all of its children. Log levels such as SEVERE, WARNING and INFO define the severity of a message; the Level class is used to define which messages should be written to the log, with OFF and ALL turning logging off entirely or logging everything. Each logger can have several handlers attached, which receive the log records from the logger and export them to a target such as a file (FileHandler) or the console (ConsoleHandler). Each handler's output can be configured with a formatter, such as SimpleFormatter for plain-text messages or XMLFormatter for XML output. The LogManager is responsible for creating and managing loggers and for maintaining the configuration.
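To illustrate the hierarchy, handlers and formatters described above, here is a minimal, self-contained sketch (the class name, logger name and file name are illustrative):

import java.io.IOException;
import java.util.logging.ConsoleHandler;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class JulExample {

    // child of the "com.example" logger, which is a child of the root logger
    private static final Logger LOGGER = Logger.getLogger("com.example.app");

    public static void main(String[] args) throws IOException {
        // handlers receive records from the logger and export them to a target
        FileHandler fileHandler = new FileHandler("myApp.log");
        fileHandler.setFormatter(new SimpleFormatter());
        fileHandler.setLevel(Level.INFO);

        ConsoleHandler consoleHandler = new ConsoleHandler();
        consoleHandler.setLevel(Level.ALL);

        LOGGER.addHandler(fileHandler);
        LOGGER.addHandler(consoleHandler);
        LOGGER.setLevel(Level.ALL);
        // avoid duplicate console output through the root logger's handlers
        LOGGER.setUseParentHandlers(false);

        LOGGER.severe("severe message");
        LOGGER.warning("warning message");
        LOGGER.info("info message");
        LOGGER.fine("fine message - below INFO, so only the console handler writes it");
    }
}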

The logging can be configured using a log.properties file with the sample configuration below.

# Logging
handlers = java.util.logging.FileHandler, java.util.logging.ConsoleHandler
.level = ALL

# File Logging
java.util.logging.FileHandler.pattern = %h/myApp.log
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
java.util.logging.FileHandler.level = INFO

# Console Logging
java.util.logging.ConsoleHandler.level = ALL


The "-Djava.util.logging.config.file=/absolute-path/logging.properties" parameter is used to load a custom log.properties for java util logging. It works with following cases:
  • Move the file log.properties to the default package (the root folder for your sources)
  • add it directly to the classpath (just like a JAR)
  • You can specify the package in which the file is, replacing "." with "/": -Djava.util.logging.config.file=com/company/package/log.properties
  • You can specify the absolute path

A common brute-force way to disable all logging output from any framework is to redirect the standard output and error streams to the null device, as follows:
  // requires java.io.FileNotFoundException, java.io.FileOutputStream and java.io.PrintStream
  static {
    try {
        // "NUL:" is the Windows null device; on Linux/Unix use "/dev/null" instead
        PrintStream nps = new PrintStream(new FileOutputStream("NUL:"));
        System.setErr(nps);
        System.setOut(nps);
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
  }


Log4J Framework
Log4J is the oldest of the above frameworks and is widely used due to its simplicity. It defines various log levels and message formats. Log4j is thread-safe, optimized for speed, and based on a named logger hierarchy. It supports multiple output appenders per logger as well as internationalization.
Log4j is not restricted to a predefined set of facilities; its logging behavior can be set at runtime using a configuration file, and it was designed to handle Java exceptions from the start. Log4j uses the levels ALL, TRACE, DEBUG, INFO, WARN, ERROR and FATAL. The format of the log output can easily be changed by extending the Layout class, and the target of the log output as well as the writing strategy can be altered by implementations of the Appender interface. Log4j is fail-stop, but it does not guarantee that each log statement will be delivered to its destination.
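Assuming log4j 1.x is on the classpath, typical usage of the named logger hierarchy looks like the following sketch (the class name and messages are illustrative):

import org.apache.log4j.Logger;

public class Log4jExample {

    // logger names form the hierarchy; by convention the fully qualified class name is used
    private static final Logger LOGGER = Logger.getLogger(Log4jExample.class);

    public static void main(String[] args) {
        LOGGER.debug("debug message");
        LOGGER.info("info message");
        LOGGER.warn("warn message");
        LOGGER.error("error message", new RuntimeException("sample exception"));
    }
}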
   Below is a sample log4j properties file (log4j.properties):

#suppress logging from spring and hibernate to warn
log4j.logger.org.hibernate=WARN
log4j.logger.org.springframework=WARN

# Set root logger level to INFO and attach Appender1 and Appender2 to it.
log4j.rootLogger=INFO, Appender1,Appender2
# Appender1 is set to be a ConsoleAppender.
log4j.appender.Appender1=org.apache.log4j.ConsoleAppender
log4j.appender.Appender2=org.apache.log4j.RollingFileAppender
log4j.appender.Appender2.File=sample.log
# Both appenders use PatternLayout.
log4j.appender.Appender1.layout=org.apache.log4j.PatternLayout
log4j.appender.Appender1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
log4j.appender.Appender2.layout=org.apache.log4j.PatternLayout
log4j.appender.Appender2.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n

Log4j sample code is as follows (uses org.apache.log4j.Logger, Level, LogManager and PropertyConfigurator):
      // Load log4j.properties from the classpath and configure log4j programmatically.
      try {
            Properties props = new Properties();
            props.load(TestHTTP.class.getResourceAsStream("/log4j.properties"));
            System.out.println("props = " + props.toString());
            PropertyConfigurator.configure(props);
      } catch (IOException e) {
            e.printStackTrace();
      }

      // Silence everything logged through the root logger.
      LogManager.getRootLogger().setLevel(Level.OFF);

      // Obtain a named logger and log at the desired level.
      Logger log = Logger.getLogger("myApp");
      log.setLevel(Level.ALL);
      log.info("initializing - trying to load configuration file ...");

      // Alternatively, configure log4j from an external properties file.
      try {
          Properties preferences = new Properties();
          FileInputStream configFile = new FileInputStream("/path/to/app.properties");
          preferences.load(configFile);
          configFile.close();
          PropertyConfigurator.configure(preferences);
      } catch (IOException ex)  {
          System.out.println("WARNING: Could not open configuration file");
          System.out.println("WARNING: Logging not configured (console output only)");
      }

      log.info("starting myApp");

Logback Framework
The Logback framework is the successor to log4j and natively implements the SLF4J API. Logging configuration can be provided either in XML or in Groovy. It provides a SiftingAppender which makes it possible to maintain separate log files per user session and to switch the log level for individual users. Logback automatically reloads its configuration upon changes and provides better I/O failover in case of server failure.

Logback delegates the task of writing a logging event to components called appenders.
Appenders must implement the ch.qos.logback.core.Appender interface, whose doAppend() method is responsible for outputting the logging events in a suitable format to the appropriate output device.
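Application code logs through the SLF4J API; since a SiftingAppender discriminates on an MDC key, the calling code only needs to populate that key. A minimal sketch (the MDC key name "userid" and the messages are illustrative):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class LogbackExample {

    // code depends only on the SLF4J API; logback is picked up as the native implementation
    private static final Logger LOGGER = LoggerFactory.getLogger(LogbackExample.class);

    public static void main(String[] args) {
        // a SiftingAppender configured with this key writes a separate log file per user
        MDC.put("userid", "alice");
        try {
            LOGGER.info("request started");
            LOGGER.debug("result = {}", 42);
        } finally {
            MDC.remove("userid");
        }
    }
}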

A sample configuration (logback.xml) for the logback framework, with a console appender and a sifting appender that writes a rolling error log per user, is as follows:

<configuration scan="true">

  <contextName>myApp</contextName>
  <property name="logdir" value="logs" />

  <!-- Console appender -->
  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
    <encoder>
      <pattern>%date [%thread] %mdc %-5level %logger %msg %n %ex</pattern>
    </encoder>
  </appender>

  <!-- Sifting appender: one rolling error log per discriminator value -->
  <appender name="SIFT" class="ch.qos.logback.classic.sift.SiftingAppender">
    <discriminator>
      <key>userid</key>
      <defaultValue>unknown</defaultValue>
    </discriminator>
    <sift>
      <appender name="ERRORFILE-${userid}" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <filter class="ch.qos.logback.classic.filter.LevelFilter">
          <level>ERROR</level>
          <onMatch>ACCEPT</onMatch>
          <onMismatch>DENY</onMismatch>
        </filter>
        <file>${logdir}/${contextName}Error.log</file>
        <append>true</append>
        <rollingPolicy class="ch.qos.logback.core.rolling.TimeBasedRollingPolicy">
          <fileNamePattern>${logdir}/${contextName}Error%d{yyyy-MM-dd}.%i.log</fileNamePattern>
          <timeBasedFileNamingAndTriggeringPolicy
              class="ch.qos.logback.core.rolling.SizeAndTimeBasedFNATP">
            <!-- roll over when the file reaches 10MB -->
            <maxFileSize>10MB</maxFileSize>
          </timeBasedFileNamingAndTriggeringPolicy>
          <!-- keep 30 days of history -->
          <maxHistory>30</maxHistory>
        </rollingPolicy>
        <encoder>
          <pattern>%date [%thread] %mdc %-5level %logger %msg %n</pattern>
        </encoder>
      </appender>
    </sift>
  </appender>

  <root level="INFO">
    <appender-ref ref="STDOUT" />
    <appender-ref ref="SIFT" />
  </root>

</configuration>




Wednesday, February 27, 2013

Maven Plugin Development


Maven carries out all of its work through plugins, which makes them central to its operation. But there are often times when a customized plugin implementation is needed to carry out some peculiar build-related task, especially tasks involving Jenkins build operations or command-line operations that are better handled with Maven than with Ant scripts. Further, plugins can call other plugins and define custom goals to carry out a large series of operations. Hence Maven plugin development comes in handy for creating customized Maven plugins.

A Maven plugin contains a series of Mojos (goals), each Mojo being a single Java class with a set of annotations that tell Maven how to generate the plugin descriptor. Every Mojo must implement the Mojo interface, which requires the class to implement the getLog(), setLog() and execute() methods. The abstract class AbstractMojo provides default implementations of getLog() and setLog(), leaving only execute() to be implemented. The getLog() method gives access to the Maven logger, which has info(), debug() and error() methods to log at various levels. The execute() method is the entry point of the plugin execution and provides the customized build-process implementation for the Maven plugin.
           An AbstractMojo implementation is required to have a @goal annotation in its class-level javadoc. The goal name specified with the javadoc @goal annotation, together with the goal prefix, defines the Maven goal name used to execute the plugin. The Mojo goal can be used directly on the command line or from the POM by specifying mojo-specific configuration. The @phase annotation, if specified, binds the Mojo to a particular phase of the standard build lifecycle, e.g. install; note that binding to a phase does not cause the lifecycle phases to be run in sequence up to that phase. The @execute annotation can be used to specify either a phase and lifecycle, or a goal, to be invoked before the plugin implementation executes. When the Mojo goal is invoked, it will first run a parallel lifecycle, ending at the given phase; if a goal is provided instead of a phase, that goal is executed in isolation. The execution of either will not affect the current project, but will make the ${executedProject} expression available if required. The @requiresProject annotation denotes whether the plugin executes inside a project and thus requires a POM, or can be executed without one; it defaults to true, requiring the plugin to run inside a project. The @requiresOnline annotation mandates that the plugin be executed in online mode. The Maven Mojo API Specification describes all the available annotations in detail.
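As a minimal sketch of what these annotations and AbstractMojo look like together (the goal name and message are illustrative):

import org.apache.maven.plugin.AbstractMojo;
import org.apache.maven.plugin.MojoExecutionException;
import org.apache.maven.plugin.MojoFailureException;

/**
 * A minimal Mojo: the javadoc @goal annotation makes it runnable as
 * <prefix>:greet from the command line or from a POM.
 *
 * @goal greet
 * @requiresProject false
 */
public class GreetMojo extends AbstractMojo {

    public void execute() throws MojoExecutionException, MojoFailureException {
        // getLog() is inherited from AbstractMojo
        getLog().info("Hello from a custom Maven plugin");
    }
}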
      A Maven Mojo class can also access Maven-specific objects such as MavenSession, MavenProject and Maven using the parameter expressions "${session}", "${project}" and "${maven}". These Maven model objects can be used to read project details from the POM or to alter the session to execute another project. Below is a sample Maven plugin Mojo which reads another POM, creates a new Maven project and alters the session to execute the new project; it also lists the plugins present in the Maven project.

/**
 * @goal sample-task
 * @requiresProject false
 * @execute lifecycle="mvnsamplecycle" phase="generate-sources"
 */
public class SampleMojo extends AbstractMojo {
 /**
  * The Maven Session Object
  * @parameter expression="${session}"
  * @required
  * @readonly
  */
 private MavenSession session;

 /**
  * The maven project.
  * @parameter expression="${project}"
  * @readonly
  */
 private MavenProject project;

 public void execute() throws MojoExecutionException, MojoFailureException {

   try {
       // Create a new MavenProject instance from the pom.xml and set it as the current project.
       MavenXpp3Reader mavenreader = new MavenXpp3Reader();
       File file = new File("../../pom.xml");
       FileReader reader = new FileReader(file);
       Model model = mavenreader.read(reader);
       model.setPomFile(file);

       MavenProject newProject = new MavenProject(model);
       project.setBuild(newProject.getBuild());
       project.setExecutionProject(newProject);
       project.setFile(file);
       session.setCurrentProject(newProject);
       session.setUsingPOMsFromFilesystem(true);

       // Directory of the project to execute (illustrative value).
       String app = "sample-app";

       // Create a new MavenSession instance and set it to execute the new maven project.
       ReactorManager reactorManager = new ReactorManager(session.getSortedProjects());
       MavenSession newsession = new MavenSession(session.getContainer(), session.getSettings(),
           session.getLocalRepository(), session.getEventDispatcher(), reactorManager,
           session.getGoals(), session.getExecutionRootDirectory() + "/" + app,
           session.getExecutionProperties(), session.getUserProperties(), new Date());

       newsession.setUsingPOMsFromFilesystem(true);
       session = newsession;
       project.setParent(newProject);
       project.addProjectReference(newProject);
       project.setBasedir(new File(app));

       // List all the plugins in the project pom.
       List plugins = project.getBuildPlugins();

       for (Iterator iterator = plugins.iterator(); iterator.hasNext();) {
          Plugin plugin = (Plugin) iterator.next();
          getLog().info("plugin = " + plugin);
       }
   } catch (Exception e) {
       throw new MojoExecutionException("Failed to execute sample-task", e);
   }
 }
}

Below are the required dependencies for the Maven plugin. Note that the last three dependencies (maven-invoker, plexus-component-annotations and plexus-utils) are optional and are used to access the Maven object model (MavenSession, MavenProject etc.):

<dependencies>
    <dependency>
      <groupId>org.apache.maven</groupId>
      <artifactId>maven-plugin-api</artifactId>
      <version>2.0</version>
    </dependency>
    <dependency>
      <groupId>commons-io</groupId>
      <artifactId>commons-io</artifactId>
      <version>2.1</version>
    </dependency>

    <!-- Optional: access to the Maven object model -->
    <dependency>
      <groupId>org.apache.maven.shared</groupId>
      <artifactId>maven-invoker</artifactId>
      <version>2.1.1</version>
    </dependency>
    <dependency>
      <groupId>org.codehaus.plexus</groupId>
      <artifactId>plexus-component-annotations</artifactId>
      <version>1.5.5</version>
    </dependency>
    <dependency>
      <groupId>org.codehaus.plexus</groupId>
      <artifactId>plexus-utils</artifactId>
      <version>3.0.8</version>
    </dependency>
</dependencies>

The plugin itself is built with the maven-plugin-plugin, which also sets the goal prefix:

<build>
   <plugins>
     ...................................
     <plugin>
       <artifactId>maven-plugin-plugin</artifactId>
       <version>2.3</version>
       <configuration>
         <goalPrefix>samples</goalPrefix>
       </configuration>
     </plugin>
     ...................................
   </plugins>
</build>

Maven Lifecycle

The process of building and distributing a particular artifact (project) is defined by the Maven build lifecycle. There are three built-in build lifecycles: default, clean and site. The default lifecycle handles project deployment, the clean lifecycle handles project cleaning, and the site lifecycle handles the creation of the project's site documentation. Each build lifecycle is defined by a different list of build phases, where a build phase represents a stage in the lifecycle. The build phases listed in the lifecycle are executed sequentially to complete the build lifecycle. Executing a specific build phase from the command line runs not only that build phase, but also every build phase prior to it in the lifecycle. This works in multi-module scenarios too. A build phase carries out its operations through the goals bound to it.

A goal represents a specific task (finer than a build phase) which contributes to the building and managing of a project. It may be bound to zero or more build phases. A goal not bound to any build phase could be executed outside of the build lifecycle by direct invocation. The order of execution depends on the order in which the goal(s) and the build phase(s) are invoked. Moreover, if a goal is bound to one or more build phases, that goal will be called in all those phases. Furthermore, a build phase can also have zero or more goals bound to it. If a build phase has no goals bound to it, that build phase will not execute. But if it has one or more goals bound to it, it will execute all those goals mostly in the same order of declaration as in the POM.
Goals can be bound to a particular lifecycle phase by configuring a plugin in the project. The goals that are configured will be added to the goals already bound to the lifecycle from the selected phase. If more than one goal is bound to a particular phase, the order used is that those from the selected phase are executed first, followed by those configured in the POM. Note that the <executions> element can be used to gain more control over the order of particular goals. It can also run the same goal multiple times with different configuration if required. Separate executions can also be given an ID so that during inheritance or the application of profiles, it can be controlled whether the goal configuration is merged or turned into an additional execution. When multiple executions are given that match a particular phase, they are executed in the order specified in the POM, with inherited executions running first.

  
<phases>
  <phase>
    <id>process-classes</id>
    <executions>
      <execution>
        <goals>
          <goal>jcoverage:instrument</goal>
        </goals>
      </execution>
    </executions>
  </phase>
  <phase>
    <id>test</id>
    <executions>
      <execution>
        <goals>
          <goal>surefire:test</goal>
        </goals>
        <configuration>
          <classesDirectory>${project.build.directory}/generated-classes/jcoverage</classesDirectory>
          <ignoreFailures>true</ignoreFailures>
        </configuration>
      </execution>
    </executions>
  </phase>
</phases>


Report Plugin

Writing a report plugin is similar to writing a Mojo, except that we extend the AbstractMavenReport class instead of AbstractMojo. The report plugin can be added to the plugins of the reporting section of the POM to generate the report along with the Maven site. The goal to be executed is specified in the report tag of the reportSet section, which controls the execution of the goals. The methods getProject(), getOutputDirectory(), getSiteRenderer(), getDescription(), getName(), getOutputName(), getBundle() and executeReport() are required to be overridden.

Note: In order to create the report without using Doxia, e.g. via XSL transformation from some XML file, add the following method to the report Mojo:
public boolean isExternalReport() {
    return true;
}

The following dependencies are required for a Maven report plugin:

<dependency>
    <groupId>org.apache.maven.reporting</groupId>
    <artifactId>maven-reporting-api</artifactId>
    <version>2.0.8</version>
</dependency>

<dependency>
    <groupId>org.apache.maven.reporting</groupId>
    <artifactId>maven-reporting-impl</artifactId>
    <version>2.0.4.3</version>
</dependency>

<dependency>
    <groupId>org.codehaus.plexus</groupId>
    <artifactId>plexus-utils</artifactId>
    <version>2.0.1</version>
</dependency>

AbstractMavenReportRenderer handles the basic operations with the Doxia sink to set up the head, title and body of the HTML report. The renderBody() method is implemented to fill in the middle of the report using the Doxia utilities for sections and tables. To use the Doxia Sink API we import the org.apache.maven.doxia.sink.Sink class and call the getSink() method to get an instance. We then use the Doxia API, as in the example below, to emit the header, title and body. A starting tag is denoted by xxx() while the corresponding end tag is denoted by xxx_(), similar to HTML tags. The rawText() method outputs exactly the specified text, while the text() method adds escaping characters. The sectioning is strict, which means that section level 2 must be nested in section level 1 and so forth. The sample report Mojo below overrides the required methods and shows basic usage of the Doxia API.

public class ReportMojo extends AbstractMavenReport {

 /**
 * Report output directory.
 * @parameter expression="${project.reporting.outputDirectory}"
 * @required
 * @readonly
 */
 private String outputDirectory;

 /**
 * Maven Project Object.
 * @parameter default-value="${project}"
 * @required
 * @readonly
 */
 private MavenProject project;
 
 /**
 * Maven Report Renderer.
 * @component
 * @required
 * @readonly
 */
 private Renderer siteRenderer;

 protected MavenProject getProject() {
  return project;
 }

 protected String getOutputDirectory() {
  return outputDirectory;
 }

 protected Renderer getSiteRenderer() {
  return siteRenderer;
 }

 public String getDescription(Locale locale) {
  return getBundle(locale).getString("report.description");
 }

 public String getName(Locale locale) {
  return getBundle(locale).getString("report.title");
 }

 public String getOutputName() {
  return "sample-report";
 }

 private ResourceBundle getBundle(Locale locale) {
  return ResourceBundle.getBundle("sample-report", locale, this.getClass().getClassLoader());
 }

 @Override
 protected void executeReport(Locale locale) throws MavenReportException {

     Sink sink = getSink();
     sink.head();
     sink.title();
     sink.text( getBundle(locale).getString("report.title") );
     sink.title_();
     sink.head_();
   
     sink.body();
     sink.section1();
     sink.sectionTitle1();
     // 'version' is assumed to be a plugin parameter (declaration not shown)
     sink.text( String.format(getBundle(locale).getString("report.header"), version) );
     sink.sectionTitle1_();
     sink.section1_();
      
     sink.lineBreak();

     sink.table();
     sink.tableRow();
     sink.tableHeaderCell( );
     sink.bold();
     sink.text( "Id" );
     sink.bold_();
     sink.tableHeaderCell_();
     sink.tableRow_();

     sink.tableRow();
     sink.tableCell();
     sink.link( "http://some_url" );
     sink.text( "123" );
     sink.link_();
     sink.tableCell_();
     sink.tableRow_();
     sink.table_();
      
     sink.body_();
     sink.flush();
     sink.close();
 }
}

MultiPage Report Plugin

Oftentimes there is a need to create Maven reports with multiple pages, but the Maven report plugin only provides a single Doxia sink instance with which to create an HTML page. If we copy the implementation of the execute() method from the AbstractMavenReport class and loop over it with different filenames, we do get the required multiple pages, but this only works when the report plugin is executed directly, without the Maven site. The Maven site plugin does not call the execute() method; it calls the actual implementation of the executeReport(Locale) method. Hence such logic works for direct execution of the plugin but not for mvn site. The ReportDocumentRenderer from maven-site-plugin creates the SiteRendererSink and calls report.generate(sink, locale), which in turn calls executeReport(Locale); using the createSink() method fails in this case, and there is no way to create more SiteRendererSinks within the report because those sinks come from a different classloader. Maven does provide the AbstractMavenMultiPageReport class, but it too offers no way to create multiple sink instances.

After upgrading to maven-reporting-api 3.0, the AbstractMavenReport class gains a new method, getSinkFactory(). It allows new sink instances to be created when executeReport() is called from the site plugin, which initializes the factory instance. When the multipage report plugin is executed directly, however, the execute() method of AbstractMavenReport neither initializes the factory nor offers a setter for it. In that case we resort to the dirty hack of copying the execute() method implementation into the executeReport() method of the multipage report class to create a new sink instance. To access the getSinkFactory() method we upgrade maven-reporting-api to 3.0 as follows:

<dependency>
    <groupId>org.apache.maven.reporting</groupId>
    <artifactId>maven-reporting-api</artifactId>
    <version>3.0</version>
    <exclusions>
        <exclusion>
            <groupId>org.apache.maven.doxia</groupId>
            <artifactId>doxia-sink-api</artifactId>
        </exclusion>
    </exclusions>
</dependency>

<dependency>
    <groupId>org.apache.maven.doxia</groupId>
    <artifactId>doxia-sink-api</artifactId>
    <version>1.3</version>
</dependency>

<dependency>
    <groupId>org.apache.maven.reporting</groupId>
    <artifactId>maven-reporting-impl</artifactId>
    <version>2.2</version>
</dependency>

The following code provides an overview of the implementation, with an example of generating a multipage report:
public class MultiPageReportMojo extends AbstractMavenReport {

  .......................

  /**
   * Copied implementation from {@link AbstractMavenReport}. Generates the index page and 
   * report pages for all the environments. If the {@link SinkFactory} is null 
   * (when invoked directly) then creates a new {@link SiteRendererSink} object using 
   * {@link RenderingContext}. If the {@link SinkFactory} is not null (usually for mvn site) 
   * then uses its createSink() method to create a new {@link Sink} object. 
   * @see org.apache.maven.reporting.AbstractMavenReport#execute()
   */
   @Override
   protected void executeReport(Locale locale) throws MavenReportException {

    List<String> envList = Arrays.asList("local", "devl", "qual", "cert", "prod");
  
    // index method uses getSink() method from AbstractMavenReport class to directly access 
    // the sink and render the index page.
    executeReportIndex(locale, envList);
  
    for (String env : envList) {
   
      File outputDirectory = new File( getOutputDirectory() );
      Writer writer = null;
   
      try {
    
         String filename = outputPrefix + env + ".html";
         SinkFactory factory = getSinkFactory(); 

         if(factory == null) {
     
           SiteRenderingContext siteContext = new SiteRenderingContext();
           siteContext.setDecoration( new DecorationModel() );
           siteContext.setTemplateName( "org/apache/maven/doxia/siterenderer/resources/default-site.vm" );
           siteContext.setLocale( locale );
               
           RenderingContext context = new RenderingContext( outputDirectory, filename );

           SiteRendererSink renderSink = new SiteRendererSink( context );

           // This method uses the sink instance passed for the environment to render the report page.
           executeConfigReport(locale, renderSink);

           renderSink.close();

           if ( !isExternalReport() ) { // MSHARED-204: only render Doxia sink if not an external report
                
             outputDirectory.mkdirs();
             writer = new OutputStreamWriter( new FileOutputStream( new File( outputDirectory, filename ) ), "UTF-8" );
             getSiteRenderer().generateDocument( writer, renderSink, siteContext );
           }
         }
         else {
           Sink renderSink = factory.createSink(outputDirectory, filename);

           // This method uses the sink instance passed for the environment to render the report page.
           executeConfigReport(locale, renderSink);

           renderSink.close();
         }
      } catch (Exception e) {
         getLog().error("Report, Failed to create server-config-env: " + e.getMessage(), e);
         throw new MavenReportException(getName( Locale.ENGLISH ) + "Report, Failed to create server-config-env: " 
                                                                  + e.getMessage(), e);
      } finally {
         IOUtil.close( writer );
      }
    }
   }

  .......................

  /**
   * Renders the table header cell with the specified width and text using the specified sink instance.
   * @param sink
   *   {@link Sink} instance to render the table header cell.
   * @param width
   *   {@link String} of the table header cell.
   * @param text
   *   {@link String} in the table header cell.
   */
  protected void sinkHeaderCellText(Sink sink, String width, String text) {

        SinkEventAttributes attrs = new SinkEventAttributeSet();
        attrs.addAttribute(SinkEventAttributes.WIDTH, width);
        sink.tableHeaderCell(attrs);
        sink.text(text);
        sink.tableHeaderCell_();
  }
}

Tuesday, January 29, 2013

Solr: An Opensource Search Platform

Searching is a basic requirement for almost any application in the current software world. With the emergence of Yahoo and Google, search technology has been revolutionized, with various types of information including books, videos, maps and personal profiles becoming instantly searchable online. Although search technology has progressed as far as speech recognition and AI-assisted search, it has mostly remained proprietary to a few giant corporations, and there were very few alternatives to make your website searchable other than adding a Google or Yahoo custom search bar. With the advent of the Apache Lucene project, full-text indexing and fast, powerful search became possible in the open source community. Further, Lucene's core approach of treating a document as a collection of text fields gives it the ability to search text in various file formats such as XML, HTML, plain text, PDF, MS Word, Open Office documents etc.

Setting up Solr
First, download and install Apache Tomcat in order to set up Solr. We assume that a Java runtime is installed and already configured on the system. In order to access the manager page for Apache Tomcat, edit $CATALINA_HOME\conf\tomcat-users.xml to add a user with the admin role associated with it.
   Now download the latest Apache Solr release on the system. Extract the archive and copy the .war file to the webapps directory of the Tomcat installation. Copy the Solr configuration from the example directory of the Solr distribution into the $CATALINA_HOME directory:
cp -R /path/to/apache-solr-x.x.x/example/solr $CATALINA_HOME/solr

In order for Tomcat to know about the Solr webapp, we add a file named solr.xml to the Tomcat configuration directory, i.e. $CATALINA_HOME/conf/Catalina/localhost. Now we open the newly created solr.xml in any text editor and add the following configuration:
<Context docBase="{/full/path/to/webapps/solr.war}" debug="0" crossContext="true" allowLinking="true" privileged="true">
  <Environment name="solr/home" type="java.lang.String" value="{/full/path/to/CATALINA_HOME/solr}" override="true" />
</Context>

After restarting the Tomcat server we are able to access the Solr interface at http://localhost:8080/solr/admin/.


Configuring Solr
Solr can be configured using two configuration files, solrconfig.xml and schema.xml. Both files reside in the conf directory under the Solr home directory. solrconfig.xml configures the Solr server itself, while schema.xml specifies the fields the documents may contain, which are used when indexing the documents and when querying to search them.

The solrconfig.xml file contains the following configuration information:
1) The lib directive specifies the path to Solr plugins so they can be loaded. If there are dependencies, list the lowest-level dependency JAR first. It also supports regular expressions to control the loading of JARs.
2) The dataDir directive specifies the location of the index data files, which are stored in the "/data" directory by default.
3) The indexConfig section allows low-level configuration of the Lucene index writers, such as index sizing, index merging, index locks and other parameters.
4) The updateHandler section relates to the low-level internal handling of updates, such as the maximum number of uncommitted documents, the maximum time before an auto commit or soft auto commit is triggered, and whether a new searcher is opened on hard commits. It also defines listeners such as RunExecutableListener (which executes external commands) for the update events postCommit and postOptimize, and the maxPendingDeletes parameter which limits the number of deletions that Solr will buffer during document deletion.
Data sent to Solr is not searchable until it has been committed to the index. Because commits can be slow, they should be done in isolation from other commit requests to avoid overwriting data, so it is preferable to control when data is committed using the commit and soft commit options above (see the SolrJ sketch after this list). A soft commit, as opposed to a normal commit, does not guarantee that documents are in stable storage after committing.
5) The query section controls everything related to search queries, such as the maximum number of clauses in a boolean query. It contains the caching section and the event listener section.
  • The caching section is used to configure the caching parameters depending on the size of the index. Solr caches are associated with a specific instance of an index searcher, a specific view of an index that doesn't change during the lifetime of that searcher. As long as that index searcher is being used, any items in its cache will be valid and available for reuse. When a new searcher is opened, the current searcher continues servicing requests while the new one auto-warms its cache, using the current searcher's cache to pre-populate its own. When the new searcher is ready, it is registered as the current searcher and begins handling all new search requests; the old searcher is closed once it has finished servicing all its requests. The details of each cache are as follows:
    The filterCache is used by SolrIndexSearcher for filters and unordered sets of all documents matching a query; for a new searcher the filterCache is pre-populated using the most recently accessed items. The queryResultCache caches the results of previously executed queries, while the documentCache caches Lucene Document objects, which contain the fields of a document. Generic user-defined caches can be defined and accessed via the SolrIndexSearcher methods getCache(), cacheLookup() and cacheInsert(). There are also optimizations to use a filter for a search and to enable the use of the queryResultCache for a specific number of result items.
  • The listener section defines a set of listeners triggered by query-related events to perform operations such as preparing the cache for a new or first searcher.
6) The requestDispatcher section configures how Solr's RequestDispatcher handles HTTP requests, including whether it should handle "/select" URLs (handleSelect exists for backward compatibility), HTTP request parsing, remote streaming support, the maximum multipart file upload size etc.
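Since documents only become searchable after a commit (item 4 above), here is a minimal SolrJ sketch of a hard commit and a soft commit (assuming SolrJ 4.x; the URL and field names are illustrative):

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class CommitExample {

    public static void main(String[] args) throws IOException, SolrServerException {
        SolrServer solr = new HttpSolrServer("http://localhost:8080/solr");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("name", "sample document");

        solr.add(doc);            // buffered: not searchable yet
        solr.commit();            // hard commit: flushed to stable storage and visible

        // soft commit: visible to searchers, but not yet guaranteed to be on stable storage
        solr.commit(true, true, true);
    }
}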


Indexing files
In order to index HTML and other files using post.jar, we add the library files "apache-solr-core-x.x.x" and "apache-solr-solrj-x.x.x", as well as "slf4j-api", "commons-io", "httpcore", "httpmime", "httpclient" and others from the Solr_Setup/dist and Solr_Setup/dist/solrj-lib directories, to $CATALINA_HOME/webapps/solr/WEB-INF/lib. This avoids class-not-found issues while indexing the files.
  To index files we use the post.jar located in the Solr_Setup/example/exampledocs directory using the following command: java -jar post.jar *.xml

Below are the required Maven dependencies for using the SolrJ library:
    <dependency>
      <groupId>org.apache.solr</groupId>
      <artifactId>solr-solrj</artifactId>
      <version>${solr.version}</version>
    </dependency>

    <dependency>
      <groupId>org.apache.solr</groupId>
      <artifactId>solr-core</artifactId>
      <version>${solr.version}</version>
    </dependency>

Below is the sample code to index html pages using SolrJ:
  // Scan the directory for all Html files and get instance of Solr server to index those files
  public static void indexDirectory(File directory) throws Exception {

    SolrServer solr = new HttpSolrServer("http://localhost:8090/solr");

    // Pattern to match all html files (the dot must be escaped)
    String pattern = "^.*\\.html$";

    // FileUtils requires Apache Commons-IO library
    Collection<File> files = FileUtils.listFiles(directory, 
                                                 new RegexFileFilter(pattern), 
                                                 DirectoryFileFilter.DIRECTORY );
    for (File file : files) {
      indexFile(solr, file);
    }
  }


  // Add the file to the index of Solr and commit
  public static void indexFile(SolrServer solr, File file) throws Exception {

    // do not try to index files that cannot be read
    if (file.canRead()) {
      if (file.isDirectory()) {

        String[] files = file.list();
        // an IO error could occur

        if (files != null) {
          for (int i = 0; i < files.length; i++) {
            indexFile(solr, new File(file, files[i]));
          }
        }
      } else {

        try {

         ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
         String parts[] = file.getName().split("\\.");
         String type = "text";

         // use the last part of the file name as its extension/type
         if (parts.length > 1) {
          type = parts[parts.length - 1];
         }

         req.addFile(file, new MimetypesFileTypeMap().getContentType(file));
         req.setParam("literal.id", file.getAbsolutePath());
         req.setParam("literal.name", file.getName());
         req.setParam("literal.content_type", type);
         req.setAction(ACTION.COMMIT, true, true);
     
         solr.request(req); // submits one req at a time.   
        }
        catch (FileNotFoundException fnfe) {
          fnfe.printStackTrace();
        }
      }
    }
  }

Moving on to searching the indexed files, the following method displays the contents of a query response:
  public static void showResults(QueryResponse queryResponse) {
   
  System.out.println("Response Header = " + queryResponse.getHeader());
   
  System.out.println("Elapsed Time: " + queryResponse.getElapsedTime());
  System.out.println("Query Time:" + queryResponse.getQTime());
  System.out.println("Number Of Results:" + ((SolrDocumentList)(queryResponse.getResponse().get("response"))).getNumFound());
  System.out.println("Results: \n\n");
  SolrDocumentList solrDocumentList = queryResponse.getResults();

  Iterator<SolrDocument> solrDocumentIterator = solrDocumentList.iterator();
  while (solrDocumentIterator.hasNext()) {
   SolrDocument solrDocument = solrDocumentIterator.next();
   Map<String, Object> fieldValueMap = solrDocument.getFieldValueMap();
   for (String key : fieldValueMap.keySet()) {
    
    if(key.equals("content")) {
     String value = (String) fieldValueMap.get(key);
     value = value.replaceAll("\\s+", " ");
     System.out.println(key + " = " + value);
    }
    else {
     System.out.println(key + " = " + fieldValueMap.get(key));
    }
   } 
  }
  }
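For completeness, a hedged sketch of building and executing a query and passing the response to the showResults method above (requires org.apache.solr.client.solrj.SolrQuery, SolrServer, SolrServerException, impl.HttpSolrServer and response.QueryResponse; the "content" field and the query term are illustrative):

  // Build and execute a query, then display the response with showResults() above.
  public static void searchIndexedFiles() throws SolrServerException {

    SolrServer solr = new HttpSolrServer("http://localhost:8090/solr");

    // search the extracted content field for a term and fetch the first 10 results
    SolrQuery query = new SolrQuery("content:solr");
    query.setRows(10);

    QueryResponse queryResponse = solr.query(query);
    showResults(queryResponse);
  }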

Sunday, January 13, 2013

Known Issues/Resolutions in upgrading to Hibernate 3.6.0


  • SELECT COUNT returns Long instead of Integer as of Hibernate 3.2, to align with the JPA spec. See "Changed aggregation (count, sum, avg) function return types" in the Hibernate guide for details (it also shows how to override the new behavior with the Hibernate 3.1 behavior, but this is not recommended unless an appropriate amount of time is available). A sketch of the change is shown after this list.
  • org.hibernate.AssertionFailure: no collection snapshot for orphan delete ... The cause of this one is not 100% clear; research indicated something to do with a collection being mutable. It appears to happen when performing a HibernateCallback via the template's execute() method: Hibernate tries to flush the cache during the execute() call, which is where the mutability issue surfaces. The place where this was initially found was a SELECT, so when comparing with a HibernateCallback that did work, we found that changing execute() to executeFind() resolved the issue (presumably execute() has to assume you are going to write data, whereas executeFind() is read-only so a flush is not necessary). We tried to identify the root cause, but none of our hypotheses held up; more understanding of this issue would be good. See the fix below:
     BEFORE FIX:    
    List<Measurement> mesurementList = (List<Measurement>) getIAFHibernateTemplate().execute(hibernateCallback);


    AFTER FIX:
    List<Measurement> mesurementList = (List<Measurement>) getIAFHibernateTemplate().executeFind(hibernateCallback);

  • The object alias is also part of the returned scalar array, so the read order of the scalar array should account for the position of the object alias. Refer to the fix below:
    BEFORE FIX:    
    equipment = (Equipment)equipmentsDataArr[5];            
    equipment.setActiveInd((String)equipmentsDataArr[0]);        
    equipment.setDeviceTypeId((Integer)equipmentsDataArr[1]);    
    equipment.setDeviceNumber((String)equipmentsDataArr[2]);    
    equipment.setOrganizationAccountId((Integer)equipmentsDataArr[3]);    
    equipment.setLicenseTypeId((Integer)equipmentsDataArr[4]);

    AFTER FIX:
   equipment = (Equipment)equipmentsDataArr[0];    
    equipment.setActiveInd((String)equipmentsDataArr[1]);    
    equipment.setDeviceTypeId((Integer)equipmentsDataArr[2]);    
    equipment.setDeviceNumber((String)equipmentsDataArr[3]);    
    equipment.setOrganizationAccountId((Integer)equipmentsDataArr[4]);
    equipment.setLicenseTypeId((Integer)equipmentsDataArr[5]);


  • java.lang.ClassCastException: java.lang.String incompatible with java.lang.Character. For Hibernate projections, when restrictions are added on a database column of CHAR type, the restriction value must be given as a Java char rather than a String. Refer to the fix below:
    BEFORE FIX:
    criteria.add(Restrictions.eq("deleted", "N"));
    criteria.add(Restrictions.eq("display", "Y"));  

    AFTER FIX:
    criteria.add(Restrictions.eq("deleted", 'N'));
    criteria.add(Restrictions.eq("display", 'Y'));
  • There were issues with lazy loading, e.g. org.hibernate.LazyInitializationException: failed to lazily initialize a collection of role: no session or session was closed ... The performance impact would have to be analyzed for each case.
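As a sketch of the count-return-type change from the first item above (the entity name and session handling are illustrative):

import org.hibernate.Session;

public class CountExample {

    // Hibernate 3.1 and earlier returned Integer for count/sum/avg aggregates;
    // as of Hibernate 3.2 the result is a Long, in line with the JPA spec.
    public long countMeasurements(Session session) {
        Long count = (Long) session.createQuery("select count(m) from Measurement m")
                                   .uniqueResult();
        return count.longValue();
    }
}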

Sunday, December 16, 2012

Key Generation using ERACOM and Keytool


Java Keystore (keytool) setup

1       Introduction

A keystore is a password-protected file which stores the keys and certificates. The keytool application can import, export and list the contents of a keystore. The keytool can also be used to generate self-signed certificates for test purposes.
Following are the keytool command attributes:
§  -genkey (Java 1.5) or –genkeypair (Java 1.6): This flag generates a key pair (a public key and associated private key). Wraps the public key into an X.509 v3 self-signed certificate, which is stored as a single-element certificate chain. This certificate chain and the private key are stored in a new keystore entry identified by alias.
§  -genseckey (Java 1.6): The –genseckey flag generates a secret key and stores it in a new entry identified by the name specified in the –alias flag.
§  -keyalg: keyalg specifies the algorithm to be used to generate the key. The default value of –keyalg when –genkey / –genkeypair flag is set, is “DSA”, while the value is “DES” when –genseckey flag is set.
NOTE: AES\DES algorithms are not available in Java 1.5.0
§  -keysize: The –keysize specifies the size of each key to be generated. By default the value is 1024 (when using –genkey / –genkeypair), 56 (when using -genseckey and -keyalg is "DES") and 168 (when using -genseckey and -keyalg is "DESede").
§  -alias: The –alias flag refers to a particular entity in the keystore.
§  –validity: Certificates generated by keytool are valid for 90 days by default. The –validity flag allows changing the length of validity of a certificate to n days.
§  -keystore: The keytool uses as default a keystore file ".keystore" located in the user’s home directory. To use another keystore file use the -keystore flag.
§  -storetype:  The flag specifies the key store type that should be used. Below are the supported keystore types:

Sr  Store Type  Description
1   JKS         Java KeyStore. Oracle's KeyStore format; the default keystore type.
2   JCEKS       Java Cryptography Extension KeyStore. A more secure version of JKS.
3   PKCS12      Public-Key Cryptography Standards #12 KeyStore. RSA's KeyStore format.
4   PKCS12S2    A second version of the PKCS12 keystore type.
5   JCERACFKS   Java Cryptography Extension RACF KeyStore. A RACF (Resource Access Control Facility) keyring keystore, available only on z/OS systems with RACF installed.

§  -list: The –list flag is used to list the content of the keystore.
  • -delete: The flag is used to delete the keystore entry identified by the alias flag. The user is prompted for the alias if none is provided on the command line.

2       Creating a RSA Key

RSA is a public-key cryptosystem based on the difficulty of factoring large integers. Both DSA and RSA algorithms can be used to generate key pairs using –genkey or –genkeypair. The commands to generate RSA keys in the Java keystore are as follows:
·         Generate an RSA keypair: 

keytool -genkey -alias RSAKey -keyalg RSA -validity 365 -keystore keystore/msmkeystore.jks

·         Enter keystore password: mysecret

What is your first and last name?

      [Unknown]: www.mobilefish.com

What is the name of your organizational unit?

      [Unknown]:Research and Development

What is the name of your organization?

      [Unknown]: Mobilefish.com

What is the name of your City or Locality?

      [Unknown]: Zaandam

What is the name of your State or Province?

      [Unknown]: Noord-Holland

What is the two-letter country code for this unit?

      [Unknown]: NL

Is CN=www.mobilefish.com, OU=Research and Development, O=Mobilefish.com, L=Zaandam, ST=Noord-Holland, C=NL correct?

      [no]: y



Enter key password for <RSAKey>

         (RETURN if same as keystore password):

·         To view the fingerprints of certificates in the keystore, type:

keytool -list -keystore keystore/msmkeystore.jks

·         To view the personal information about the issuer and owner of the certificate, type:

keytool -list -v -keystore keystore/msmkeystore.jks

·         To remove entries from the keystore, enter the following command:

keytool -keystore keystore/msmkeystore.jks -delete -alias RSAKey



3       Creating an AES (or DES) Key (Java 1.6.0)
AES is a symmetric-key algorithm; hence we generate a single secret key using the –genseckey flag. Further, the default keystore type of the Java keytool is JKS, which cannot store symmetric keys, so the keystore type is changed to JCEKS for the AES encryption algorithm. The commands to generate AES keys in the Java keystore are as follows:
·         Generate the secret key (Using AES as encryption algorithm): 

 keytool -genseckey -alias AESKey -keyalg AES -keysize 128  -validity 365 -keystore keystore/msmkeystore.jks -storetype JCEKS

·         To view the fingerprints of certificates in the keystore, type:

keytool -list -storetype JCEKS -keystore keystore/msmkeystore.jks

NOTE: AES keysize should be 128 bit in order for the SunJCE / IBMJCE Provider to initialize the key.

4       Mobile Security Properties
  •          JCEProvider.KeyStoreProvider: Name of the Provider for cryptology operations. Eg: SunJCEProvider or IBMJCE.
  •          JCEProvider.KeyStoreFile: Path to JKS Keystore
  •          JCEProvider.KeyStorePIN: PIN for JKS Keystore
  •          JCEProvider.TokenPIN: PIN for Key Token in JKS Keystore
  •          JCEProvider.CryptoKey.Alias: Alias Name for Key in JKS Keystore
  •          JCEProvider.CryptoKey.CipherAlgorithm: Algorithm for generating the Alias Key
  •          JCEProvider.CryptoKey.CipherMode: Mode to be used during encryption or decryption using the Alias Key
  •          JCEProvider.CryptoKey.CipherPadding: Padding type to be used during encryption or decryption using the Alias Key
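A hedged sketch of how these properties are typically consumed: load the JCEKS keystore, fetch the AES key by its alias and build a Cipher from the configured transformation (the paths, PINs, alias and transformation below are illustrative stand-ins for the JCEProvider.* values):

import java.io.FileInputStream;
import java.security.Key;
import java.security.KeyStore;
import javax.crypto.Cipher;

public class KeyStoreCryptoExample {

    public static void main(String[] args) throws Exception {
        // values that would normally come from the JCEProvider.* properties above
        String keyStoreFile   = "classes/certs/msmkeystore.jks";
        char[] keyStorePin    = "mysecret".toCharArray();
        char[] tokenPin       = "mysecret".toCharArray();
        String alias          = "AESKey";
        String transformation = "AES/CBC/PKCS5Padding"; // algorithm/mode/padding

        // JCEKS is needed to hold symmetric keys (see the keytool section above)
        KeyStore keyStore = KeyStore.getInstance("JCEKS");
        FileInputStream in = new FileInputStream(keyStoreFile);
        try {
            keyStore.load(in, keyStorePin);
        } finally {
            in.close();
        }

        Key aesKey = keyStore.getKey(alias, tokenPin);

        Cipher cipher = Cipher.getInstance(transformation);
        cipher.init(Cipher.ENCRYPT_MODE, aesKey);
        // in CBC mode the generated IV (cipher.getIV()) must be kept for decryption
        byte[] encrypted = cipher.doFinal("sample clear text".getBytes("UTF-8"));
        System.out.println("Encrypted " + encrypted.length + " bytes");
    }
}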


5       Notes


1         Symmetric keys cannot be generated using the keytool of Java 1.5.0.
2         A keystore generated with the keytool of Java 1.6.0 cannot be used by an application running Java 1.5.0 (and vice versa).
3         The Sun and IBM JCE providers support PKCS5Padding for AES but PKCS1Padding for RSA.
4         The IBM WebSphere JDK does not come with the SunJCE provider or any other Sun provider; they are substituted by the IBMJCE provider and other IBM versions of the providers (IBMPKCS11, IBMJSSEProvider etc). Adding SunJCE providers to the IBM WebSphere application library may cause java.lang.UnsatisfiedLinkError: sun/misc/Unsafe.registerNatives().
5         The IBM JCE provider (and probably others) does not allow private keys to be used for encryption and public keys for decryption.
6         The IBM JCE provider used with the RSA algorithm (PKCS1Padding and ECB mode), with the public key used for encryption and the private key for decryption, gives the following exception: javax.crypto.BadPaddingException: Not PKCS#1 block type 2 or Zero padding
7         The path for msmkeystore.jks can be specified as (classes\\certs\\msmkeystore.jks) on Windows but should be specified as (classes/certs/msmkeystore.jks) on Linux in ‘300-mobile-security.properties’.



LINUX: ERACOM Hardware Security Module setup

1       Installation

1         Install PTKC Runtime and PTKC SDK Package from “Eracom 3.3” Folder.
2         Install PTKJ Runtime and PTKJ SDK Package from “PTKJ3.06” Folder.
3         Install an additional package called PCI HSM-Provider from “\Eracom 3.3\pci_hsm_access_provider” Folder if using a real HSM board.
Sample installation commands:
mount -o loop 007553-003MI_ptkc.iso /mnt/eracom
./safeNet-install.sh
chmod a+x eracom-install.sh
./eracom-install.sh

NOTE: IT IS IMPORTANT THAT THE ERACOM SOFTWARE SHOULD BE INSTALLED WITH THE SAME USER PERMISSIONS AS THE WEBSPHERE USER IN THE LINUX BOX.

2       HSM Slot and Key Creation Commands:
1)      ctconf -c3
Create a new user slot (creates slot 3).
2)      ctstat
Show the status of the tokens and the objects in the Protect Toolkit.
3)      ctconf -n3
Initialize the token in the specified slot (here slot 3).
4)      ctkmu p -s3
     Initialize the User PIN or change an existing PIN (either the User or SO PIN). If the specified slot (here slot 3) contains a token without an initialized User PIN, this command will prompt for the current SO PIN and then for the new User PIN (123456). If the PIN is already initialized, the current PIN will be prompted for before the new PIN may be specified.
5)      ctkmu c -s3 -t aes -z128 -n AESTestKey03 -aWUxED
(c) Create a (-aWUxED) CKA_WRAP, CKA_UNWRAP, CKA_EXPORTABLE, CKA_ENCRYPT and CKA_DECRYPT key of (-t) type “aes” and (-z) size “128”, with the (-n) name “AESTestKey03” in (-s3) slot number 3.

AVAILABLE KEYS:

Single key types: DES, Double DES, Triple DES, AES (16, 24, or 32 bytes), IDEA, CAST128 (1 to 16 bytes), RC2 (1 to 128 bytes), RC4 (1 to 256 bytes), SEED
Key pair types: RSA (Public/Private), DSA (Public/Private), DH (Public/Private), EC (Public/Private)


6)      ctkmu l -s3
List the keys or objects stored on the token in the specified slot. Lists the actual keys and certificates in the specified slot.
7)      ctkmu c -s3 -t aes -k 2 -z128 -n WrapKey -aWUxED
(c) Create a (-aWUxED) CKA_WRAP, CKA_UNWRAP, CKA_EXPORTABLE, CKA_ENCRYPT and CKA_DECRYPT key of (-t) type “aes”, with (-k) two key components and (-z) size “128”, with the (-n) name “WrapKey” in (-s3) slot number 3.
Note: the -k option corresponds to the number of key components required to be entered, or the number to be generated (when the -g parameter is specified).

3       Copy Jar and Lib files to JRE:
Here “/opt/” is considered the installation location of WebSphere and of the Eracom PTKC and PTKJ SDKs. Please change the paths depending on where they are installed on the system.
1)      Copy jprov.jar from “/opt/ERACjprov/lib” to “/opt/WebSphere/70/java/jre/lib/ext”
2)      Copy jcprov.jar from “/opt/PTK/lib” to “/opt/WebSphere/70/java/jre/lib/ext”
3)      Copy libcryptoki.so from “/opt/PTK/lib” to “/opt/WebSphere/70/java/jre/lib/ext”
4)      Copy libjcprov.so from “/opt/PTK/lib” to “/opt/WebSphere/70/java/jre/lib/ext”
5)      Copy libjcryptoki.so from “/opt/PTK/lib” to “/opt/WebSphere/70/java/jre/lib/ext”
6)      Sample Commands:
cd /opt/PTK/lib
cp -i jcprov.jar  /opt/WebSphere/70/java/jre/lib/ext
cp  libcryptoki.so  /opt/WebSphere/70/java/jre/lib/ext
cp -i libjcryptoki.so  /opt/WebSphere/70/java/jre/lib/ext
cp -i libjcprov.so  /opt/WebSphere/70/java/jre/lib/ext

NOTE: FOLLOWING FILES ARE SPECIFICALLY FOR HSM SW SIMULATOR. YOU MAY NEED TO COPY ADDITIONAL FILES FOR THE ACTUAL HSM BOARD. PLEASE REFER THE MANUAL FOR DETAILS.

4       Add EracomProvider Entry to Java Security:
1)      Go to Websphere JRE and find the java.security file under {WEBSPHERE_HOME}/70/java/jre/lib/security Folder
2)      Add the following entry to the java.security file under “List of providers” section  security.provider.13=au.com.eracom.crypto.provider.slot0.ERACOMProvider
Note: Please change the number of the provider from “13” to the next number from the last provider in the list.

5       Setting up the Library Path:
Please add the following shell commands to the Server Startup Script
LD_LIBRARY_PATH=/opt/PTK/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH
PATH=/opt/PTK/bin:$PATH
export PATH
Here the “/opt/PTK/lib” and “/opt/PTK/bin” paths depend on the installation path of the PTKC SDK package on the system.
For example, if the server startup script is located at /opt/wasapps/was/70/profiles/NFCSrv01/bin/startServer.sh,
add the above lines at the start of that startServer.sh script.

NOTE: RESTART THE WEBSPHERE TO SET THE LIBRARY PATH


WINDOWS: ERACOM Hardware Security Module setup
1       Creation of Slots:
  1. Go to gCtAdmin (SW) and open Adapter Management.
  2. Then select File -> Create Slots
  3. Enter the number of slots to be created, the slots are created and it will go to login again.
  4. Then go in Edit -> Tokens, select the Slot which is uninitialized token.
  5. Press “Initialise”, then enter the token label, Security Officer PIN and User PIN.
  6. The user pin is used to access the slot for encryption and decryption in MSM.
  7. Click “Done” to finish. Hence Slot is created and initialized successfully.


2       Creation of Secret Keys:
  1. Go to KMU(SW) to open Key Management Utility. And select the token with the label name specified before.
  2. If it shows an exception, initialize the token using the CRYPTOKI utility as described in the next section.
  3. Otherwise, enter the User PIN.
  4. From the top menu tool bar, select “Secret Key” for AES Key or Key Pair for Public-Private Key.
  5. For “Secret Key”, select the Mechanism as "AES", Label Name (Key Alias), Key Size (128) and make sure to select "Encrypt", "Decrypt" options to enable it for encryption and decryption.
  6. Press Ok to successfully create the Eracom Secret Key.


3       Testing the Keys:
  1. Go to Browser (SW) i.e. CRYPTOKI Token Browser
  2. Skim through the branches to find the slot created earlier.
  3. Right-click on the slot label, select “init token” and confirm (all keys and the PIN set up for the slot are erased).
  4. To find all the keys in the slot, double click on the "Objects" label and all the keys under the slot will appear.
  5. We can copy or delete the key and use it for Encryption and Decryption.