This post is part of a series delving into the details of the JSR-352 (Java Batch) specification. Each post examines a very specific part of the specification and looks at how it works and how you might use it in a real batch application. To start at the beginning, follow the link to the first post.
The next post in the series is here.
This series is also available as a podcast on iTunes, Google Play, Stitcher, or use the link to the RSS feed.
-----
The issue can be found here.
When a call is made to set the persistent user data, does the batch container make a copy of the user data, or does it just keep the reference? That’s the question raised by this issue. What’s the difference?
Let’s suppose we have some serializable class we defined to hold our user data. We’re in the Item Reader and we set some fields in an instance of this object. The reader calls the API to set the persistent user data.
Suppose this instance of our user data object is also accessible to our Item Processor. If the processor changes something inside the object, what happens? If the batch container holds only a reference to the user data object, then updating the data inside the object updates the persistent user data. When we reach the end of the chunk and commit the chunk transaction, whatever happens to be in the user data object at that time will be persisted.
But what if the batch container makes a copy of the user data? Then the update made by the processor won’t have any effect on the copy of the data kept by the container and only the initial values set by the reader will be persisted.
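To make the two outcomes concrete, here is a minimal sketch in plain Java. It does not use the real batch API: FakeStepContext, MyUserData, the recordsSeen field, and the copyOnSet switch are all made up for illustration, and the "copy" behavior is assumed to be a serialization round trip, which is only one way an implementation might do it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class UserDataSemanticsDemo {

    /** Hypothetical user data class; the field is invented for this example. */
    static class MyUserData implements Serializable {
        private static final long serialVersionUID = 1L;
        int recordsSeen;
    }

    /** Stand-in for a batch container's step context. NOT the real API. */
    static class FakeStepContext {
        private final boolean copyOnSet;
        private Serializable held;

        FakeStepContext(boolean copyOnSet) {
            this.copyOnSet = copyOnSet;
        }

        void setPersistentUserData(Serializable data) {
            // Either snapshot the object now, or just remember the reference.
            held = copyOnSet ? deepCopy(data) : data;
        }

        /** What this fake container would persist when the chunk commits. */
        Serializable dataPersistedAtCommit() {
            return held;
        }
    }

    /** Copies an object by serializing and deserializing it in memory. */
    static Serializable deepCopy(Serializable data) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(data);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Serializable) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("copy failed", e);
        }
    }

    /** Simulates one chunk and returns the value that ends up persisted. */
    static int runChunk(boolean copyOnSet) {
        FakeStepContext ctx = new FakeStepContext(copyOnSet);
        MyUserData data = new MyUserData();

        data.recordsSeen = 1;            // the "reader" fills in the object
        ctx.setPersistentUserData(data); // ...and sets it exactly once

        data.recordsSeen = 2;            // the "processor" changes it, no second set

        return ((MyUserData) ctx.dataPersistedAtCommit()).recordsSeen;
    }

    public static void main(String[] args) {
        System.out.println("reference kept: " + runChunk(false));
        System.out.println("copy made:      " + runChunk(true));
    }
}
```

With the reference kept, the processor's change is what gets persisted. With a copy, only the values the reader set survive.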
Of course you can get consistent behavior by just calling the Set API every time you make a change to the user data. But that’s rather clunky, especially if it isn’t necessary. In fact, it might be handy to just have to set the user data object once at the start of the step and let the batch container just persist whatever happens to be in there at the end of each chunk. Then the batch application can just update it as needed without having to worry about remembering to Set it.
The problem raised by this issue is that the specification doesn’t actually say which behavior is required. This leaves implementations free to do either, which essentially requires you to assume a copy is being made and call Set every time you change the data, just to be safe. If the implementation you develop with decided to keep a reference and you depend on that, your application might not behave correctly when run on a different implementation. And that was certainly NOT the goal of having a specification!
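One way to make the "call Set every time" habit less error-prone is to route every change through a small helper that applies the change and then sets the data again. The sketch below is only an illustration under assumptions: UserDataSink, Progress, CopyingSink, and updateAndSet are hypothetical names, and CopyingSink stands in for a container that copies on Set. In a real step, a method reference such as stepContext::setPersistentUserData should fit the sink interface, since that method takes a Serializable.

```java
import java.io.Serializable;
import java.util.function.Consumer;

public class DefensiveUserDataUpdate {

    /** Hypothetical one-method view of "the thing we call Set on". */
    interface UserDataSink {
        void setPersistentUserData(Serializable data);
    }

    /** Hypothetical user data class for this example. */
    static class Progress implements Serializable {
        private static final long serialVersionUID = 1L;
        int recordsSeen;
    }

    /** Simulates the worst case: a container that copies the data on every Set. */
    static class CopyingSink implements UserDataSink {
        Progress persisted;

        @Override
        public void setPersistentUserData(Serializable data) {
            Progress copy = new Progress();
            copy.recordsSeen = ((Progress) data).recordsSeen;
            persisted = copy;
        }
    }

    /** Applies a change to the user data, then hands it to the sink again. */
    static <T extends Serializable> void updateAndSet(
            UserDataSink sink, T data, Consumer<T> change) {
        change.accept(data);
        sink.setPersistentUserData(data);
    }

    /** Runs two updates through the helper and returns what the sink holds. */
    static int persistedAfterTwoUpdates() {
        CopyingSink sink = new CopyingSink();
        Progress progress = new Progress();
        updateAndSet(sink, progress, p -> p.recordsSeen = 1); // reader
        updateAndSet(sink, progress, p -> p.recordsSeen = 2); // processor
        return sink.persisted.recordsSeen;
    }

    public static void main(String[] args) {
        System.out.println("persisted recordsSeen: " + persistedAfterTwoUpdates());
    }
}
```

Because every mutation goes through updateAndSet, the result is the same whether the container copies or keeps a reference. The cost is the extra Set call on each change.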
The point here is that a decision should be made as to which behavior should be in the specification and TCK tests should be written to verify it.
Have an opinion? Follow the link above and comment on the issue!