Spring Day | Behind the Scenes at Spring Batch | Dave Syer

2011-10-31 | 01:30 PM - 02:15 PM

Spring Batch has a large user base and a good track record in production systems, but what is it all really about, and why does it work? This presentation provides a short bootstrap to get a new user started with the Batch domain, showing the key concepts and explaining the benefits of the framework. Then it goes into a deeper dive and looks at what holds it all together, with a close look at some of the most important but least understood features, including restart, retry and transactions.

Page 1: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Copyright 2007 SpringSource. Copying, publishing or distributing without express written permission is prohibited.

Inside Spring Batch

Dave Syer, VMware, JAX London 2011

Page 2: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Overview

• Very quick start with Spring Batch
• Spring Batch Admin
• State management – thread isolation
• Retry and skip
• Restart and transactions

Page 3: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Processing the Same File Twice…

Page 4: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Spring Batch

Application: Business logic

Batch Core: Quality of service, auditability, management information

Batch Infrastructure: Re-usable low level stuff (flat files, XML files, database keys)

Page 5: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Spring Batch Admin

Application

Batch Core

Batch Infrastructure

Runtime services (JSON and Java) plus optional UI

Page 6: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Simple Sequential Sample

<job id="job">
    <step id="businessStep">
        <tasklet>
            <chunk reader="itemGenerator" writer="itemWriter" commit-interval="1"/>
        </tasklet>
        <next on="FAILED" to="recoveryStep"/>
        <end on="*"/>
    </step>
    <step id="recoveryStep">
        <tasklet ref="reportFailure"/>
    </step>
</job>

Page 7: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Item-Oriented Processing

• Input-output can be grouped together = Item-Oriented Processing

[Diagram]
Step: execute() (repeat, retry, etc.), returns ExitStatus
ItemReader: read(), returns an item
ItemWriter: write(items)
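
The reader/writer loop in the diagram can be sketched in plain Java. This is a minimal illustration, not the Spring Batch implementation: `SketchReader`, `SketchWriter` and `ItemOrientedStepSketch` are hypothetical stand-ins for the roles on the slide, and the convention that `read()` returns `null` at end of input is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the reader and writer roles; not the Spring Batch interfaces.
interface SketchReader<T> {
    T read(); // assumed to return null when the input is exhausted
}

interface SketchWriter<T> {
    void write(List<T> items);
}

final class ItemOrientedStepSketch {

    /** Reads items one at a time, writes them in chunks, and returns an exit status. */
    static <T> String execute(SketchReader<T> reader, SketchWriter<T> writer, int commitInterval) {
        List<T> chunk = new ArrayList<>();
        T item;
        while ((item = reader.read()) != null) {
            chunk.add(item);
            if (chunk.size() == commitInterval) {
                writer.write(new ArrayList<>(chunk)); // hand over a copy, then start a new chunk
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            writer.write(new ArrayList<>(chunk)); // final, partially filled chunk
        }
        return "COMPLETED";
    }
}
```

The step owns the loop, so the reader and writer only deal with single items and lists of items; retry, skip and transaction handling can then be layered around the same loop.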

Page 8: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Job Configuration and Execution

[Diagram: Job 1..* JobInstance 1..* JobExecution]
Job: The EndOfDay Job
JobInstance: The EndOfDay Job for 2007/05/05
JobExecution: The first attempt at EndOfDay Job for 2007/05/05
JobParameters: schedule.date = 2007/05/05
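
The relationship above can be sketched as a small in-memory registry: a job instance is identified by the job name plus its parameters, and each attempt to run that instance is a separate execution. `JobIdentitySketch` and its methods are hypothetical names for illustration only and say nothing about how the real JobRepository stores this data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class JobIdentitySketch {

    // instance key (job name + parameters) -> status of each execution attempt, in order
    private final Map<String, List<String>> executionsByInstance = new HashMap<>();

    /** Records one attempt to run the job with these parameters; returns the attempt number. */
    int recordExecution(String jobName, Map<String, String> jobParameters, String status) {
        // A sorted copy gives the same key regardless of the caller's map ordering.
        String instanceKey = jobName + new TreeMap<>(jobParameters);
        List<String> attempts = executionsByInstance.computeIfAbsent(instanceKey, k -> new ArrayList<>());
        attempts.add(status);
        return attempts.size();
    }

    /** Number of distinct job instances seen so far. */
    int instanceCount() {
        return executionsByInstance.size();
    }
}
```

Running "EndOfDay" twice with `schedule.date = 2007/05/05` yields attempts 1 and 2 of the same instance, while a different date creates a new instance.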

Page 9: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

State Management

• Isolation – thread safety
• Retry and skip
• Restart

Page 10: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Thread Isolation: StepScope

<bean class="org.sfw...FlatFileItemWriter" scope="step">
    <property name="resource">
        <value>/data/#{jobName}-#{stepName}.csv</value>
    </property>
</bean>

The file writer needs to be step scoped so it can flush and close the output stream.

Because it is step scoped, the writer has access to the StepContext and can replace these patterns with runtime values.

Page 11: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Step Scope Responsibilities

• Create beans for the duration of a step
• Respect Spring bean lifecycle metadata (e.g. InitializingBean at start of step, DisposableBean at end of step)
• Recognise StepScopeAware components and inject the StepContext
• Allows stateful components in a multithreaded environment
• Well-known internal services recognised automatically

Page 12: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Quality of Service

• Stuff happens:
    – Item fails
    – Job fails
• Failures can be
    – Transient – try again and see if you succeed
    – Skippable – ignore it and maybe come back to it later
    – Fatal – need manual intervention
• Mark a job execution as FAILED
• When it restarts, pick up where you left off
• All framework concerns: not business logic

Page 13: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Quality of Service Sample

<step id="step1">
    <tasklet>
        <chunk reader="itemGenerator" writer="itemWriter"
               commit-interval="1" retry-limit="3" skip-limit="10">
            ...
        </chunk>
    </tasklet>
</step>

Page 14: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Retry and the Transaction

REPEAT(while more input) {
    chunk = ACCUMULATE(size=500) {
        input;
    }
    RETRY {
        TX {
            for (item : chunk) { process; }
            write;
        }
    }
}

Callouts: the ACCUMULATE block is the Chunk Provider; the RETRY block is the Chunk Processor.

Page 15: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Retry and Skip: Failed Processor

RETRY(up to n times) {
    TX {
        for (item : chunk) { process; }
        write;
    }
} RECOVER {
    TX {
        for (item : successful) { process; }
        write;
        skip(item);
    }
}

Skip is just an exhausted retry
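
The pseudocode above can be turned into a small runnable sketch in plain Java. It is a simplification under stated assumptions: `RetryThenSkipSketch`, `Outcome` and `handleChunk` are hypothetical names, the transaction is modelled by a temporary list that is discarded when an attempt fails, and items are assumed to be non-null. It is not how Spring Batch implements retry internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class RetryThenSkipSketch {

    /** What happened to one chunk: outputs that were written and inputs that were skipped. */
    static final class Outcome<I, O> {
        final List<O> written = new ArrayList<>();
        final List<I> skipped = new ArrayList<>();
    }

    static <I, O> Outcome<I, O> handleChunk(List<I> chunk, Function<I, O> processor, int retryLimit) {
        if (retryLimit < 1) {
            throw new IllegalArgumentException("retryLimit must be at least 1");
        }
        Outcome<I, O> outcome = new Outcome<>();
        List<I> candidates = new ArrayList<>(chunk);
        while (true) {
            I failed = null;
            for (int attempt = 1; attempt <= retryLimit; attempt++) {
                List<O> outputs = new ArrayList<>(); // stands in for the TX: discarded on failure
                failed = null;
                for (I item : candidates) {
                    try {
                        outputs.add(processor.apply(item));
                    } catch (RuntimeException e) {
                        failed = item; // "rollback": abandon this attempt
                        break;
                    }
                }
                if (failed == null) {
                    outcome.written.addAll(outputs); // "write" and commit
                    return outcome;
                }
            }
            // RECOVER: retries are exhausted, so the failing item is skipped
            outcome.skipped.add(failed);
            candidates.remove(failed);
        }
    }
}
```

With a retry limit of 3 and one item that always fails, the processor sees that item three times before it is skipped and the remaining items are processed and written.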

Page 16: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Flushing: ItemWriter

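import java.util.List;
import org.springframework.batch.item.ItemWriter;

public class RewardWriter implements ItemWriter<Reward> {

    // Spring Batch 2.x signature: ItemWriter<T>.write(List<? extends T>) throws Exception
    public void write(List<? extends Reward> rewards) throws Exception {
        // do stuff to output Reward records
        // and flush changes here…
    }

}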

Page 17: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Retry and Skip: Failed Write

RETRY(up to n times) {
    TX {
        for (item : chunk) { process; }
        write;
    }
} RECOVER {
    for (item : chunk) {
        TX {
            process;
            write;
        } CATCH {
            skip(item);
        }
    }
}

Scanning for failed item
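
A failed write does not say which item in the chunk was at fault, so the recovery path writes items one by one to find it. The sketch below illustrates that scan in plain Java. `WriteScanSketch` and `writeWithScan` are hypothetical names, and the writer is assumed to be all-or-nothing (a failed call leaves nothing written), which is what the surrounding transaction provides in the pseudocode.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

final class WriteScanSketch {

    /** Writes the chunk, falling back to an item-by-item scan; returns the skipped items. */
    static <T> List<T> writeWithScan(List<T> chunk, Consumer<List<T>> writer, int retryLimit) {
        for (int attempt = 1; attempt <= retryLimit; attempt++) {
            try {
                writer.accept(chunk); // whole chunk in one "transaction"
                return new ArrayList<>();
            } catch (RuntimeException e) {
                // "rolled back": nothing was written, so try again
            }
        }
        // RECOVER: one "transaction" per item, so a failure identifies the bad item
        List<T> skipped = new ArrayList<>();
        for (T item : chunk) {
            try {
                writer.accept(Collections.singletonList(item));
            } catch (RuntimeException e) {
                skipped.add(item);
            }
        }
        return skipped;
    }
}
```

The scan costs one transaction per item, which is why it only runs after the retries for the whole chunk are exhausted.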

Page 18: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Restart and Partial Failure

• Store state to enable restart
• What happens to the business data on error?
• What happens to the restart data?
• Goal: they all need to roll back together

Page 19: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Partial Failure: Piggyback the Business Transaction

JOB {
    STEP {
        REPEAT(while more input) {
            TX {
                REPEAT(size=500) {
                    input; output;
                }
                FLUSH and UPDATE;
            }
        }
    }
}

Inside business transaction

Persist context data for next execution

Database
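
The idea of committing restart data inside the business transaction can be sketched with an in-memory stand-in for the database. In this illustration, `PiggybackSketch`, `committedOutput` and `committedPosition` are hypothetical names, and a "transaction" is modelled by working on local copies that only become visible at the commit point.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class PiggybackSketch {

    // Committed state of the hypothetical "database": business output plus restart data.
    final List<String> committedOutput = new ArrayList<>();
    int committedPosition = 0;

    /** Processes input from the committed position onwards; fails on the item equal to failOn. */
    void run(List<String> input, int chunkSize, String failOn) {
        while (committedPosition < input.size()) {
            // TX begin: work on local copies so a failure leaves the committed state untouched
            List<String> pendingOutput = new ArrayList<>();
            int position = committedPosition;
            int end = Math.min(position + chunkSize, input.size());
            while (position < end) {
                String item = input.get(position);
                if (item.equals(failOn)) {
                    throw new IllegalStateException("failed on " + item);
                }
                pendingOutput.add(item.toUpperCase(Locale.ROOT));
                position++;
            }
            // TX commit: business data and restart data become visible together
            committedOutput.addAll(pendingOutput);
            committedPosition = position;
        }
    }
}
```

Because the position is only advanced at the commit point, a restart after a failure resumes at the first uncommitted item and neither repeats nor loses output.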

Page 20: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

ItemStream

[Diagram]
ItemStream: open(executionContext), save(executionContext) (called before commit), close()
Step: execute() (repeat, retry, etc.), returns ExitStatus
Step → JobRepository: update(executionContext)
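
A restartable component following the open/save/close contract on this slide might look like the sketch below. It uses the method names from the slide and a plain `Map` in place of the execution context. `RestartableListReader` and the key `reader.position` are hypothetical, and the class deliberately does not implement the real Spring Batch interface.

```java
import java.util.List;
import java.util.Map;

final class RestartableListReader {

    private static final String KEY = "reader.position"; // hypothetical context key

    private final List<String> items;
    private int position;

    RestartableListReader(List<String> items) {
        this.items = items;
    }

    /** Restores the position saved by a previous execution, or starts at 0. */
    void open(Map<String, Object> executionContext) {
        position = (Integer) executionContext.getOrDefault(KEY, 0);
    }

    /** Returns the next item, or null when the input is exhausted. */
    String read() {
        return position < items.size() ? items.get(position++) : null;
    }

    /** Called before commit so the stored position matches the committed output. */
    void save(Map<String, Object> executionContext) {
        executionContext.put(KEY, position);
    }

    void close() {
        position = 0;
    }
}
```

If an execution reads an item after the last `save` and then fails, a new reader opened with the same context returns that item again, which matches the rolled-back business data.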

Page 21: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Overview

• Very quick start with Spring Batch
• Spring Batch Admin
• State management – thread isolation
• Retry and skip
• Restart and transactions

Page 22: Spring Day | Behind the Scenes at Spring Batch | Dave Syer

Q&A