The Spring Batch framework, a collaborative effort from Accenture and SpringSource, is a lightweight, comprehensive framework for developing the batch applications that support the day-to-day operations of enterprise systems. Batch applications (batch processing) are automated offline systems that perform bulk data processing, periodic updates and delegated processing.
Examples include loading CSV file data into a database, processing a feed file once it is received, and pushing daily transactions to upstream or downstream systems.
CommandLineJobRunner is one of the ways to bootstrap a Spring Batch job. A job defined in an XML configuration file still needs a Java main method as its entry point; CommandLineJobRunner supplies that main method, so you can start the job directly from its XML configuration.
The CommandLineJobRunner performs 4 tasks:
Load the appropriate ApplicationContext.
Parse command line arguments into JobParameters.
Locate the appropriate job based on arguments.
Use the JobLauncher provided in the application context to launch the job.
CommandLineJobRunner takes two required arguments: jobPath, the location of the XML file used to create the ApplicationContext, and jobName, the name of the job to run.
bash$ java CommandLineJobRunner DailyJobConfig.xml processDailyJob
These arguments must be passed in with the path first and the name second. All arguments after these are considered to be JobParameters and must be in the format of 'name=value'.
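The argument rules above can be sketched in plain Java. This is a hypothetical, simplified illustration: the class JobArgs, its parse method and the run.date parameter are invented for this example and are not CommandLineJobRunner's actual code. The real runner also accepts options such as -restart and a typed-parameter syntax, which are not modeled here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: splits arguments the way the text describes
// (jobPath first, jobName second, then name=value job parameters).
// Not the real CommandLineJobRunner implementation.
class JobArgs {
    final String jobPath;
    final String jobName;
    final Map<String, String> parameters;

    private JobArgs(String jobPath, String jobName, Map<String, String> parameters) {
        this.jobPath = jobPath;
        this.jobName = jobName;
        this.parameters = parameters;
    }

    static JobArgs parse(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("Usage: <jobPath> <jobName> [name=value ...]");
        }
        // LinkedHashMap keeps the parameters in command-line order.
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 2; i < args.length; i++) {
            int eq = args[i].indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected name=value but got: " + args[i]);
            }
            params.put(args[i].substring(0, eq), args[i].substring(eq + 1));
        }
        return new JobArgs(args[0], args[1], params);
    }

    public static void main(String[] args) {
        JobArgs parsed = parse(new String[] {
            "DailyJobConfig.xml", "processDailyJob", "run.date=2016-05-05"});
        // prints: DailyJobConfig.xml processDailyJob {run.date=2016-05-05}
        System.out.println(parsed.jobPath + " " + parsed.jobName + " " + parsed.parameters);
    }
}
```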
Parallel processing enables multiple batch jobs to run at the same time, reducing the total elapsed batch processing time. It is simplest when the processes do not share the same file or database table; otherwise, the processes should work on partitioned data.
Another approach is to use a control table that records interdependencies and tracks whether each shared resource is currently in use by any process.
Other key issues in parallel processing include load balancing and the availability of general system resources such as files and database buffer pools. Note also that the control table itself can easily become a critical resource.
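The control-table idea can be approximated in memory. In this hypothetical sketch a ConcurrentHashMap stands in for the database table (in practice it would be a shared table protected by database locking), and the names ControlTable, acquire and release are illustrative only. The atomic putIfAbsent call is what lets only one process claim a resource at a time; it also shows why the table becomes a critical resource, since every process has to go through it.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical in-memory stand-in for a control table: each entry maps a
// shared resource (file or table name) to the process currently using it.
class ControlTable {
    private final ConcurrentMap<String, String> inUse = new ConcurrentHashMap<>();

    // Atomically claims the resource; false if another process already holds it.
    boolean acquire(String resource, String processId) {
        String owner = inUse.putIfAbsent(resource, processId);
        return owner == null || owner.equals(processId);
    }

    // Frees the resource, but only if the caller is the current owner.
    boolean release(String resource, String processId) {
        return inUse.remove(resource, processId);
    }

    public static void main(String[] args) {
        ControlTable table = new ControlTable();
        System.out.println(table.acquire("CUSTOMER_TABLE", "job-A")); // true
        System.out.println(table.acquire("CUSTOMER_TABLE", "job-B")); // false: in use by job-A
        table.release("CUSTOMER_TABLE", "job-A");
        System.out.println(table.acquire("CUSTOMER_TABLE", "job-B")); // true
    }
}
```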
There are many implementations including the ones that allow read and write operations on,
Partitioning allows multiple instances of a large batch application to run concurrently, minimizing the elapsed time required to process long batch jobs. Processes that can be partitioned successfully are those where the input file can be split and/or the main database tables can be partitioned, so that the application can run against different sets of data.
Processes which are partitioned must be designed to only process their assigned data set.
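Splitting by key range is one common way to give each process its own data set. The sketch below is a hypothetical, stdlib-only illustration that splits an inclusive id range into contiguous sub-ranges, one per worker. In Spring Batch this role is played by a Partitioner, whose partition(int gridSize) method returns one ExecutionContext per partition; here plain long[] pairs stand in for those contexts, and the "partitionN" names are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical range partitioner: splits the inclusive id range [minId, maxId]
// into at most gridSize contiguous, non-overlapping sub-ranges.
class RangePartitioner {
    static Map<String, long[]> partition(long minId, long maxId, int gridSize) {
        if (gridSize < 1 || maxId < minId) {
            throw new IllegalArgumentException("gridSize must be >= 1 and minId <= maxId");
        }
        long total = maxId - minId + 1;
        long size = (total + gridSize - 1) / gridSize; // ceiling division
        Map<String, long[]> partitions = new LinkedHashMap<>();
        long start = minId;
        int index = 0;
        while (start <= maxId) {
            long end = Math.min(start + size - 1, maxId);
            // Each worker would only query its own range, e.g. "WHERE id BETWEEN start AND end".
            partitions.put("partition" + index++, new long[] {start, end});
            start = end + 1;
        }
        return partitions;
    }

    public static void main(String[] args) {
        // prints partition0: 1-4, partition1: 5-8, partition2: 9-10 (one per line)
        partition(1, 10, 3).forEach((name, r) -> System.out.println(name + ": " + r[0] + "-" + r[1]));
    }
}
```

Because every id falls into exactly one range, each partitioned process only ever touches its assigned data set.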
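The item bean that each record is mapped to can implement org.springframework.batch.item.ItemCountAware, a marker interface, to have the item's position tracked: readers that extend AbstractItemCountingItemStreamItemReader set the current item count on such items.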
A JobInstance represents the concept of a logical job run.
Use a scheduling tool such as Quartz, Control-M or Autosys. Quartz is lightweight and doesn't have all the features of Control-M or Autosys. Even OS-based schedulers, such as the Windows Task Scheduler or cron jobs, can be used to schedule Spring Batch jobs.
There are 3 required dependencies:
and one or more steps.
The Spring Batch Meta-Data tables are used to persist batch domain objects such as JobInstance, JobExecution, JobParameters, and StepExecution for internally managing the Batch Jobs.
The JobRepository is responsible for saving and storing each Java object in its correct table.
Cron is a daemon that executes scheduled commands. Cron is started automatically from /etc/init.d on entering multi-user runlevels. Cron searches its spool area (/var/spool/cron/crontabs) for crontab files (which are named after accounts in /etc/passwd); crontabs found are loaded into memory.
Cron wakes up every minute, examining all stored crontabs, checking each command to see if it should be run in the current minute. When executing commands, any output is mailed to the owner of the crontab (or to the user named in the MAILTO environment variable in the crontab, if such exists).
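The per-minute check can be illustrated with a deliberately simplified matcher over the five time fields of a crontab line. This is a hypothetical sketch, not cron's source code: CronMatcher and its methods are invented names, and only a subset of the crontab syntax is supported (see the comments).

```java
import java.time.LocalDateTime;

// Simplified cron matcher: decides whether a five-field expression
// "minute hour day-of-month month day-of-week" matches a given minute,
// i.e. the check cron makes each time it wakes up.
// Supports "*", "*/n", single values, ranges "a-b" and comma lists only.
// Not modeled: names (MON, JAN), "a-b/n" steps, and cron's rule that
// day-of-month and day-of-week are OR-ed when both are restricted.
class CronMatcher {
    static boolean fieldMatches(String field, int value, int min) {
        for (String part : field.split(",")) {
            if (part.equals("*")) {
                return true;
            } else if (part.startsWith("*/")) {
                int step = Integer.parseInt(part.substring(2));
                if ((value - min) % step == 0) return true; // steps count from the field's minimum
            } else if (part.contains("-")) {
                String[] bounds = part.split("-");
                if (value >= Integer.parseInt(bounds[0]) && value <= Integer.parseInt(bounds[1])) return true;
            } else if (Integer.parseInt(part) == value) {
                return true;
            }
        }
        return false;
    }

    static boolean matches(String expression, LocalDateTime t) {
        String[] f = expression.trim().split("\\s+");
        if (f.length != 5) throw new IllegalArgumentException("expected 5 fields: " + expression);
        int dow = t.getDayOfWeek().getValue() % 7; // java.time: Monday=1..Sunday=7; cron: Sunday=0
        // Cron also accepts 7 for Sunday.
        boolean dowMatches = fieldMatches(f[4], dow, 0) || (dow == 0 && fieldMatches(f[4], 7, 0));
        return fieldMatches(f[0], t.getMinute(), 0)
            && fieldMatches(f[1], t.getHour(), 0)
            && fieldMatches(f[2], t.getDayOfMonth(), 1)
            && fieldMatches(f[3], t.getMonthValue(), 1)
            && dowMatches;
    }

    public static void main(String[] args) {
        // 2016-05-05 was a Thursday, 2016-05-08 a Sunday.
        System.out.println(matches("0 9 * * 1-5", LocalDateTime.of(2016, 5, 5, 9, 0))); // true
        System.out.println(matches("0 9 * * 1-5", LocalDateTime.of(2016, 5, 8, 9, 0))); // false
    }
}
```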
ItemReader is an abstraction that represents the retrieval of input for a Step, one item/row/record at a time. When the ItemReader has exhausted the items it can provide, it will indicate this by returning null.
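The null-at-end contract can be shown with a framework-free sketch. SimpleItemReader and InMemoryReader are hypothetical names; SimpleItemReader is a simplified stand-in for ItemReader<T>, whose real read() method also declares checked exceptions that are omitted here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical in-memory reader over a fixed list of records.
class InMemoryReader<T> implements SimpleItemReader<T> {
    private final Iterator<T> iterator;

    InMemoryReader(List<T> items) {
        this.iterator = new ArrayList<>(items).iterator();
    }

    @Override
    public T read() {
        return iterator.hasNext() ? iterator.next() : null; // null signals "no more input"
    }

    // Typical consumer loop: keep reading one item at a time until read() returns null.
    static <T> List<T> readAll(SimpleItemReader<T> reader) {
        List<T> result = new ArrayList<>();
        T item;
        while ((item = reader.read()) != null) {
            result.add(item);
        }
        return result;
    }
}

// Simplified stand-in for Spring Batch's ItemReader<T>.
interface SimpleItemReader<T> {
    T read();
}
```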
Concurrent/online batch processing refers to batch processing of data that is being used or updated concurrently by online users, so the data cannot be locked in the database or file, because the online users need it. Data updates should also be committed frequently, every few transactions, to minimize both the portion of data that is unavailable to other processes and the time it is unavailable.
It is SERIALIZABLE by default, to prevent the same job instance from being executed concurrently.
Reading a large number of records from a database, file, queue or any other medium, processing them, and storing the processed records in a medium such as a database.
You can synchronize the read() method. Remember that you will lose restartability, so the best practice is to mark the step as not restartable and, to be safe (and efficient), also set saveState=false on the reader.
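A minimal, stdlib-only sketch of the synchronization idea follows. SynchronizedReader and readConcurrently are hypothetical names, and plain threads stand in for the task executor of a multi-threaded step. It shows both halves of the advice: synchronizing read() keeps two threads from receiving the same item, but the processing order becomes nondeterministic, so a saved reader position would be meaningless on restart.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of a multi-threaded step sharing one reader.
class SynchronizedReaderDemo {
    static class SynchronizedReader<T> {
        private final Iterator<T> delegate;

        SynchronizedReader(List<T> items) {
            this.delegate = new ArrayList<>(items).iterator();
        }

        // hasNext() and next() must run as one atomic step, hence synchronized.
        synchronized T read() {
            return delegate.hasNext() ? delegate.next() : null;
        }
    }

    // Runs `threads` workers against one shared reader; returns everything they processed.
    static List<Integer> readConcurrently(List<Integer> items, int threads) {
        SynchronizedReader<Integer> reader = new SynchronizedReader<>(items);
        List<Integer> processed = Collections.synchronizedList(new ArrayList<>());
        List<Thread> workers = new ArrayList<>();
        for (int i = 0; i < threads; i++) {
            Thread worker = new Thread(() -> {
                Integer item;
                while ((item = reader.read()) != null) {
                    processed.add(item);
                }
            });
            workers.add(worker);
            worker.start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed; // every item exactly once, in no guaranteed order
    }
}
```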
A JobLauncher can be used to execute a Spring Batch job. A batch job can also be launched or scheduled from within a web container.
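A logical run of a job is a JobInstance, identified by the job name and its identifying JobParameters. Each attempt to run a JobInstance is a JobExecution, which is assigned an execution id; that id can be used to restart a failed execution if required (for example, through JobOperator.restart).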
A job can be configured with parameters, which are passed to it by the JobLauncher.
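In Spring Batch, a JobInstance is identified by the job name plus its identifying JobParameters, and every attempt to run it is a separate JobExecution. The toy, in-memory model below makes that concrete. JobRegistry, launch and instanceOf are hypothetical names; a real JobRepository persists this data in the meta-data tables and also tracks status (for example, it refuses to relaunch an instance that already completed, raising JobInstanceAlreadyCompleteException), which this sketch does not model.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical, in-memory): same job name + same parameters => same
// instance; every launch creates a new execution id pointing back to that instance.
class JobRegistry {
    private final Map<String, Long> instanceIdsByKey = new HashMap<>();
    private final Map<Long, Long> instanceIdByExecution = new HashMap<>();
    private long nextInstanceId = 1;
    private long nextExecutionId = 1;

    // TreeMap sorts the parameters, so their order does not affect identity.
    private static String instanceKey(String jobName, Map<String, String> params) {
        return jobName + new TreeMap<>(params);
    }

    // Returns the id of the newly created execution.
    long launch(String jobName, Map<String, String> params) {
        long instanceId = instanceIdsByKey.computeIfAbsent(
            instanceKey(jobName, params), key -> nextInstanceId++);
        long executionId = nextExecutionId++;
        instanceIdByExecution.put(executionId, instanceId);
        return executionId;
    }

    long instanceOf(long executionId) {
        return instanceIdByExecution.get(executionId);
    }
}
```

Launching processDailyJob twice with the same run.date (a made-up parameter) yields two execution ids that share one instance; a different run.date yields a new instance.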
A Job in Spring Batch contains a sequence of one or more Steps. Each Step can be configured with the parameters/attributes required to execute it.
next: the next step to execute.
tasklet: the task or chunk to execute. A chunk can be configured with an ItemReader, an ItemProcessor and an ItemWriter.
decision: decides which steps need to be executed.
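How next and decision drive a job can be sketched as a small state machine. This is a hypothetical illustration, not Spring Batch's flow implementation: steps are Supplier<String> functions returning an exit status, the step names are made up, and a decision is simply a step whose status selects the next branch (in Spring Batch a JobExecutionDecider plays that role).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical flow sketch: each step returns an exit status, and a transition
// table maps "stepName:status" to the next step. A missing transition ends the flow.
class FlowRunner {
    static List<String> run(String firstStep,
                            Map<String, Supplier<String>> steps,
                            Map<String, String> transitions) {
        List<String> executed = new ArrayList<>();
        String current = firstStep;
        while (current != null) {
            executed.add(current);
            String status = steps.get(current).get();
            current = transitions.get(current + ":" + status);
        }
        return executed;
    }

    public static void main(String[] args) {
        Map<String, Supplier<String>> steps = new HashMap<>();
        steps.put("loadFile", () -> "COMPLETED");
        steps.put("checkErrors", () -> "HAS_ERRORS"); // acts as the decision
        steps.put("sendErrorReport", () -> "COMPLETED");
        steps.put("archiveFile", () -> "COMPLETED");
        Map<String, String> transitions = new HashMap<>();
        transitions.put("loadFile:COMPLETED", "checkErrors");
        transitions.put("checkErrors:HAS_ERRORS", "sendErrorReport");
        transitions.put("checkErrors:OK", "archiveFile");
        System.out.println(run("loadFile", steps, transitions));
        // prints [loadFile, checkErrors, sendErrorReport]
    }
}
```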
ItemProcessor is an abstraction that represents the business processing of an item. While the ItemReader reads one item and the ItemWriter writes items, the ItemProcessor provides a hook to transform the item or apply other business processing. If, while processing an item, it is determined that the item is not valid, returning null indicates that the item should not be written out.
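The null-means-filter contract can be sketched without any framework classes. ProcessorDemo, normalize and itemsToWrite are hypothetical names, a java.util.function.Function stands in for ItemProcessor<I, O>, and the validation rule (a line must contain a comma) is made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of ItemProcessor-style filtering: the processor returns a
// transformed item or null, and null results are never passed to the writer.
class ProcessorDemo {
    // Example processor: trims and upper-cases "name,amount" lines;
    // blank lines and lines without a comma are treated as invalid.
    static String normalize(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty() || !trimmed.contains(",")) {
            return null; // invalid record: do not write it
        }
        return trimmed.toUpperCase();
    }

    // Collects the items that would be handed to the writer.
    static List<String> itemsToWrite(List<String> items, Function<String, String> processor) {
        List<String> toWrite = new ArrayList<>();
        for (String item : items) {
            String result = processor.apply(item);
            if (result != null) {
                toWrite.add(result);
            }
        }
        return toWrite;
    }
}
```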
A Job is an entity that encapsulates an entire batch process.
A Job is wired together using an XML configuration file or Java-based configuration. This configuration is also referred to as the "job configuration".
A Job is simply a container for Steps; it combines multiple steps that logically run together in a flow.
Restartability: Restart a batch program from where it failed.
Different Readers and Writers: Provides readers for text files, CSV, JMS, JDBC, Hibernate, iBatis etc., and writers for JMS, JDBC, Hibernate, files and many more.
Chunk Processing: If we have 1 million records to process, they can be processed in configurable chunks (for example, 1,000 or 10,000 at a time).
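The chunk loop behind this feature can be sketched in plain Java. ChunkLoop and run are hypothetical names; an Iterator and a Function stand in for ItemReader and ItemProcessor, and each element of the returned list stands in for one ItemWriter call plus its transaction commit. In this sketch the commit interval counts items read, so a chunk holds fewer items when the processor filters some out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical chunk-oriented loop: items are read and processed one at a time,
// and after every `commitInterval` reads the processed results are written together.
// Items mapped to null are filtered; this sketch skips the write if a chunk is empty.
class ChunkLoop {
    static <I, O> List<List<O>> run(Iterator<I> reader, Function<I, O> processor, int commitInterval) {
        List<List<O>> writtenChunks = new ArrayList<>();
        while (reader.hasNext()) {
            List<O> chunk = new ArrayList<>();
            int read = 0;
            while (read < commitInterval && reader.hasNext()) {
                O out = processor.apply(reader.next());
                read++;
                if (out != null) {
                    chunk.add(out);
                }
            }
            if (!chunk.isEmpty()) {
                writtenChunks.add(chunk); // stands in for writer.write(chunk) followed by a commit
            }
        }
        return writtenChunks;
    }
}
```

With 1,000,000 records and a commit interval of 1,000, this loop would perform 1,000 writes (and, in the analogy, 1,000 commits).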
Easy to implement proper transaction management even when using chunk processing.
Easy to implement parallel processing. With simple configuration, different steps can be run in parallel.
A cron job is a command or script scheduled through the Linux/Unix cron daemon to run repetitive tasks on your server automatically. Scripts executed as cron jobs are typically used to modify files or databases and to manage caching.
A Step is a domain object that encapsulates an independent and sequential phase of a batch job while a StepExecution represents a single attempt to execute a step.
Normal processing during an offline batch window.
Concurrent batch or online processing.
Parallel processing of many different batch runs or jobs at the same time.
Partitioning (processing of many instances of the same job at the same time).
An ExecutionContext represents a collection of key/value pairs that are persisted and controlled by the framework in order to provide the developers a placeholder to store persistent state that is scoped to a StepExecution or JobExecution.
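To see why persisting state in an ExecutionContext matters for restart, here is a hypothetical, stdlib-only sketch. A Map<String, Object> stands in for ExecutionContext, and RestartableReader and its key name are invented; it loosely mirrors how Spring Batch's ItemStream contract uses open and update to restore and save a reader's position.

```java
import java.util.List;
import java.util.Map;

// Hypothetical restartable reader: it stores how many items it has read in a
// key/value context (a plain Map standing in for ExecutionContext). A new reader
// created with a previously saved context resumes from the saved position.
class RestartableReader {
    static final String READ_COUNT_KEY = "reader.read.count"; // illustrative key name

    private final List<String> items;
    private int position;

    RestartableReader(List<String> items, Map<String, Object> context) {
        this.items = items;
        Object saved = context.get(READ_COUNT_KEY);
        this.position = (saved == null) ? 0 : (Integer) saved; // fresh start or resume
    }

    String read() {
        return position < items.size() ? items.get(position++) : null;
    }

    // Called at each commit so the saved position matches the committed work.
    void update(Map<String, Object> context) {
        context.put(READ_COUNT_KEY, position);
    }
}
```

If the job fails after a commit, a new reader built from the persisted context skips the records that were already committed instead of starting over.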
Database-driven applications are driven by rows or values received from the database.
File-driven applications are driven by records or values retrieved from a file.
Message-driven applications are driven by messages retrieved from a message queue.
At the time of writing, the latest available version was 3.0.7.
Normal processing refers to batch processes that run in a separate batch window, where the data being updated is not required by online users or other batch processes, concurrency is not a concern, and a single commit can be done at the end of the batch run.
A single commit point may be a concern in terms of scalability and the volume of data it can handle, so it is always good practice to have restart/recovery options.
Spring Batch has a layered architecture comprising three major high-level components:
Application, Core and Infrastructure.
The Application layer contains all the batch job configurations, custom code for business logic and job meta information developed by application developers.
The Batch Core has the core runtime classes necessary to launch and control any batch job. Some of the core runtime classes include JobLauncher, Job, and Step implementations.
The Infrastructure layer contains APIs for common readers and writers, and services such as retrying on failure and repeating jobs. It is used both by application developers (ItemReader and ItemWriter) and by the core framework itself to control the batch job (for example, retry and repeat). Thus the Batch Core and Application layers are built on top of the Infrastructure layer.
The JobRepository is used for basic CRUD operations of the various persisted domain objects within Spring Batch, such as JobExecution and StepExecution. It is required by many of the major framework features, such as the JobLauncher, Job, and Step.
Step scope: there is only one instance of such a bean per executing step.
<bean id="..." class="..." scope="step">
Job scope: there is only one instance of such a bean per executing job.
<bean id="..." class="..." scope="job">
No. There must be at least one step, flow or split configured within a Spring Batch job.
Usually the Java batch job's main class and its dependencies are passed to the java command, and that command is stored in a command-line batch file or, on Linux/Unix, a shell script.
These script files can be run by a scheduler such as Autosys in the production environment.
The framework leverages the Spring programming model, allowing developers to concentrate on the business logic or business procedure while the framework provides the infrastructure.
Clear separation of concerns between the infrastructure, the batch execution environment, the batch application and the different steps/processes within a batch application.
Provides common, scenario-based core execution services as interfaces that applications can implement; in addition, the framework provides default implementations that developers can use or partially override based on their business logic.
Easily configurable and extendable services across different layers.
Provides a simple deployment model built using Maven.
The job configuration contains,
Spring 3 introduced the ability to configure applications using Java instead of XML, and since Spring Batch 2.2.0, batch jobs can be configured using the same Java config.
There are two components of the Java-based configuration:
the @EnableBatchProcessing annotation and two builders (JobBuilderFactory and StepBuilderFactory).
@EnableBatchProcessing provides a base configuration for building batch jobs.
The core interface for this configuration is BatchConfigurer. The default implementation provides beans that can be autowired, such as JobRepository and JobLauncher.
ResourceAware is a marker interface: readers such as MultiResourceItemReader set the current Resource on any item that implements it.
Tasklet is an interface for performing a single task, such as setting up resources, running a SQL update or cleaning up resources.
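The Tasklet contract can be illustrated with a simplified stand-in. In Spring Batch the real method is RepeatStatus execute(StepContribution, ChunkContext), and the step keeps calling it while it returns RepeatStatus.CONTINUABLE. The sketch below drops the framework parameters and defines its own enum and interface; TaskletDemo, SimpleTasklet and the file-cleanup example are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical stand-in for the Tasklet contract: execute() is called again and
// again until it reports FINISHED, mirroring RepeatStatus.CONTINUABLE / FINISHED.
class TaskletDemo {
    enum RepeatStatus { CONTINUABLE, FINISHED }

    interface SimpleTasklet {
        RepeatStatus execute();
    }

    // Runs one tasklet step and returns how many times execute() was invoked.
    static int runStep(SimpleTasklet tasklet) {
        int calls = 0;
        RepeatStatus status;
        do {
            status = tasklet.execute();
            calls++;
        } while (status == RepeatStatus.CONTINUABLE);
        return calls;
    }

    public static void main(String[] args) {
        Deque<String> staleFiles = new ArrayDeque<>(
            Arrays.asList("a.tmp", "b.tmp", "c.tmp", "d.tmp", "e.tmp"));
        // Cleanup tasklet: removes up to two entries per call.
        int calls = runStep(() -> {
            for (int i = 0; i < 2 && !staleFiles.isEmpty(); i++) {
                staleFiles.poll();
            }
            return staleFiles.isEmpty() ? RepeatStatus.FINISHED : RepeatStatus.CONTINUABLE;
        });
        System.out.println(calls); // 3
    }
}
```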
ItemWriter is an abstraction that represents the output of a Step, one batch or chunk of items at a time. Generally, an item writer has no knowledge of the input it will receive next, only the item that was passed in its current invocation.
JobLauncher represents a simple interface for launching a Job with a given set of JobParameters.
Spring Batch and Quartz have different features and responsibilities. Spring Batch provides functionality for processing large volumes of data, while Quartz provides functionality for scheduling tasks. Quartz therefore complements Spring Batch, and a common combination is to use Quartz, with a cron expression, as the trigger for a Spring Batch job.