1) Introduction

Over the past 10 years, numerous persistence frameworks have been developed for Java. Each of these frameworks has led to various implementations, each of which has addressed particular issues well, but this stew of activity has produced many different and divergent ways of solving what is a common problem: persisting data to back-end storage.

The Java Persistence API (JPA) attempts to standardize this multiplicity.

Implementations of the JPA standard are already available from a variety of providers, including Sun (GlassFish JPA), BEA (Kodo), Oracle (TopLink), RedHat/JBoss (Hibernate EntityManager), IBM WebSphere 6.1 (EJB3 service pack since December 2007), Tangosol Coherence (Cache Store), OpenJPA, and JPOX, to name a few.

This blog entry is the first in a series. It covers the basics of JPA, in preparation for the later entries, which will cover implementing JPA outside a JEE container and, in the final entry, caching and database HA with the JPA.

2) Key Concepts

There are several key concepts to cover when using JPA to persist and retrieve data. This section briefly discusses each in turn.

Skip this blog entry if you know your JPA already.

All of the concepts identified here can be read up on in detail in EJB3 in Action chapters 6 to 11, and in extreme detail in the whole of Java Persistence with Hibernate.

The Entity Manager

A Java application using the JPA does not interact with data storage directly via JDBC connections. Instead, the main interface used by any layer above the JPA is the EntityManager, as defined at http://java.sun.com/javaee/5/docs/api/javax/persistence/EntityManager.html.

The EntityManager interface is used to support Create, Retrieve, Update, and Delete (CRUD) data actions and custom query actions on object data.

Probably the clearest description of the purpose of the methods on an EntityManager instance is in Chapter 9, Manipulating Entities with EntityManager, of EJB3 in Action.

Here, we’ll use example code to clarify the concepts this reference discusses.

The following example uses an EntityManager instance outside a container (namely, outside JBoss) to save the details of a Book:

    void addBook(Book a_Book) throws MyException
    {
        EntityTransaction currentTrans = null;
        final EntityManager em = super.getEntityManager();
        try
        {
            // Every write must happen inside a transaction
            currentTrans = em.getTransaction();
            currentTrans.begin();
            em.persist(a_Book);
            currentTrans.commit();
        }
        catch (Exception anyE)
        {
            // currentTrans is still null if getTransaction() itself failed
            if (currentTrans != null && currentTrans.isActive())
            {
                currentTrans.rollback();
            }
            throw new MyException(errorMessage, anyE);
        }
    }
Within the example, the call to the method persist is used when adding new data via the EntityManager instance.

Which EntityManager method applies in which situation is best explained by reference to the life-cycle of a typical JEE Entity, which is covered in the Lifecycle section below.

Note: For JavaDoc for the persistence API look under the package javax.persistence at http://java.sun.com/javaee/5/docs/api/ .

Entities

An application views or traverses data in data storage indirectly, via abstractions known as JEE Entities.

Annotations

JEE Entities and the relationships between them are defined by annotations applied within Java classes, and these Entities are used to map to the relational schema of the data storage.

From the example code in the previous section, the book entity (Book.java) is defined by the following code, which exhibits common annotations which you will see repeated throughout the entity definitions within the example application:

package com.whatever.example.dao.entities;

import com.whatever.dao.UniqueIDGenerator;
import java.util.Date;
import java.util.HashSet;
import java.util.Set;
import javax.persistence.CascadeType;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import javax.persistence.OneToMany;
import javax.persistence.Table;

/**
* Book Entity
*
* @author jball
*/
@Entity
@Table(name = "book", schema = "public")
public class Book implements java.io.Serializable
{
    /** Size of DATE field */
    private static final int SIZE_OF_DATE_FIELD = 29;
    /** Size of BIG field */
    private static final int SIZE_OF_BIG_FIELD = 2000;
    /** Book UUID */
    private String m_Bookuuid;
    /** Category */
    private Category m_Category;
    /** Library */
    private Library m_Library;
    /** title */
    private String m_Title;
    /** author */
    private String m_Author;
    /** published */
    private Date m_Published;
    /** description */
    private String m_Description;
    /** on loan records */
    private Set<OnLoan> m_OnLoans = new HashSet<OnLoan>(0);

    /**
     * Default empty constructor
     */
    public Book()
    {
    }
    /**
     * Kitchen sink constructor. Note the entity makes its own
     * identifier which is a distributed storage safe UUID.
     *
     * @param a_Title title
     * @param a_Author author
     * @param a_Published published date
     * @param a_Description description
     * @param a_Category category
     * @param a_Library library
     */
    public Book(String a_Title,
                String a_Author,
                Date a_Published,
                String a_Description,
                Category a_Category,
                Library a_Library)
    {
        this.m_Bookuuid    = UniqueIDGenerator.getID();
        this.m_Title       = a_Title;
        this.m_Author      = a_Author;
        this.m_Published   = a_Published;
        this.m_Description = a_Description;
        this.m_Category    = a_Category;
        this.m_Library     = a_Library;
    }
    /**
     * Get the PK
     *
     * @return PK
     */
    @Id
    @Column(name = "bookuuid", unique = true, nullable = false)
    public String getBookuuid()
    {
        return this.m_Bookuuid;
    }
    /**
     * Set the PK
     *
     * @param a_Bookuuid PK
     */
    public void setBookuuid(String a_Bookuuid)
    {
        this.m_Bookuuid = a_Bookuuid;
    }
    /**
     * Get the book category
     *
     * @return Book Category
     */
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "categoryname", nullable = false)
    public Category getCategory()
    {
        return this.m_Category;
    }
    /**
     * Set the Book Category
     *
     * @param a_Category Book Category
     */
    public void setCategory(Category a_Category)
    {
        this.m_Category = a_Category;
    }
    /**
     * Get the book's library
     *
     * @return Library that holds book
     */
    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "libraryuuid", nullable = false)
    public Library getLibrary()
    {
        return this.m_Library;
    }
    /**
     * Set the book's library
     *
     * @param a_Library Library
     */
    public void setLibrary(Library a_Library)
    {
        this.m_Library = a_Library;
    }
    /**
     * Get the book title
     *
     * @return Book title
     */
    @Column(name = "title", nullable = false)
    public String getTitle()
    {
        return this.m_Title;
    }
    /**
     * Set the book title
     *
     * @param a_Title book title
     */
    public void setTitle(String a_Title)
    {
        this.m_Title = a_Title;
    }
    /**
     * Get the book author
     *
     * @return book author
     */
    @Column(name = "author", nullable = false)
    public String getAuthor()
    {
        return this.m_Author;
    }
    /**
     * Set the book author
     *
     * @param a_Author Book Author
     */
    public void setAuthor(String a_Author)
    {
        this.m_Author = a_Author;
    }
    /**
     * Get the published date
     *
     * @return published date
     */
    @Column(name = "published", nullable = false,
            length = SIZE_OF_DATE_FIELD)
    public Date getPublished()
    {
        return this.m_Published;
    }
    /**
     * Set published date
     *
     * @param a_Published published date
     */
    public void setPublished(Date a_Published)
    {
        this.m_Published = a_Published;
    }
    /**
     * Get the book description
     *
     * @return book description
     */
    @Column(name = "description", nullable = false,
            length = SIZE_OF_BIG_FIELD)
    public String getDescription()
    {
        return this.m_Description;
    }
    /**
     * Set the book description
     *
     * @param a_Description book description
     */
    public void setDescription(String a_Description)
    {
        this.m_Description = a_Description;
    }
    /**
     * Get the on loan details for the book
     *
     * @return on loan details
     */
    @OneToMany(cascade = CascadeType.ALL, fetch = FetchType.LAZY,
               mappedBy = "book")
    public Set<OnLoan> getOnLoans()
    {
        return this.m_OnLoans;
    }
    /**
     * Set the on loan details
     *
     * @param a_OnLoans on loan details
     */
    public void setOnLoans(Set<OnLoan> a_OnLoans)
    {
        this.m_OnLoans = a_OnLoans;
    }
}

There are several key features in this code:

The @Entity annotation marks the class definition as a JEE Entity which can be managed via the JPA EntityManager.
The @Table annotation is not mandatory. It is suggested you use it, as it ensures any RDBMS table creation actions the JPA EntityManagerFactory may perform on your behalf (see the persistence unit section below) results in the correct mapping between the object layer and the relational layer.
The @Id annotation identifies the unique identifier or primary key for an instance of entity type Book. In our example, the code uses a custom UUID generator for the primary key. This is a by-product of our example also being used for a multi-database instance example illustrating HA-JDBC failover, which will be described in a later blog. There are many strategies for generating the unique ID: you can write your own custom code, get the JPA provider to do it for you, or rely on the background data storage to do it for you (say via a database sequence).

Note: The annotations that can be used for ID generation are discussed in some detail in section 4.2.3 Database Primary Keys of Java Persistence with Hibernate. Make sure you choose an ID generation strategy that scales and de-couples the JPA from the DB implementation properly; we reckon the less dependent the method is on a particular database implementation the better, as otherwise you can end up in danger of producing a JPA persistence layer that only works with one type of database, which misses the point of the JPA.

Every attribute you want stored against an entity needs a setter and a getter method. These follow the usual Java convention, namely get<Attribute> and set<Attribute>, where <Attribute> is the attribute name starting with an upper case letter. You can leave it to the JPA to decide what the actual database column names should be, but in our pedantic example the @Column annotation spells out every database column characteristic we consider important.
The relationship between one entity and another can be expressed via the annotations @OneToOne, @OneToMany, @ManyToOne, and @ManyToMany. The @OneToMany annotation in the example code above indicates there is a relationship between a Book entity and an OnLoan Entity, where a book may have been on loan to many different members over any particular time frame. Be advised that there are many different argument permutations allowable for these relationship annotations. In the example here, the arguments to the annotation indicate the cascade type is ALL, and the fetch type is LAZY. With cascade type ALL, every EntityManager operation applied to a Book (persist, merge, remove, and refresh) is cascaded to its OnLoan records, so, for example, if we delete a Book entity, the JPA will also delete any OnLoan records that refer to the Book. LAZY means the JPA will only perform the SQL required to retrieve OnLoan records when an application with a handle on the Book object actually invokes the getOnLoans method.
All the JPA entities should implement Serializable to avoid “between-tier” transport issues in multi-tier multi-box applications.
All the JPA entities should contain a default empty constructor. The JPA specification requires a public or protected no-argument constructor, and we’ve discovered that it is used behind the scenes by the Hibernate JPA implementation to help maintain the correct entity content. Without it, entities do not get properly hydrated by the Hibernate JPA provider on data retrieval.
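
The UniqueIDGenerator class used in the Book constructor is not shown in this entry. As a rough idea of what such a helper might look like, here is a minimal sketch that assumes a random UUID from the JDK's java.util.UUID is acceptable as a primary key. The class body is our assumption for illustration, not the code from the example application:

```java
import java.util.UUID;

/** Hypothetical stand-in for the UniqueIDGenerator used by the Book entity. */
final class UniqueIDGenerator
{
    private UniqueIDGenerator()
    {
        // static helper, never instantiated
    }

    /**
     * Returns a random (version 4) UUID as a 36 character string.
     * The value is generated inside the JVM, so it does not depend on a
     * database sequence or identity column.
     */
    static String getID()
    {
        return UUID.randomUUID().toString();
    }
}
```

Because the identifier is created in application code, the same entity classes should work unchanged against different databases, which is the de-coupling the note above argues for.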

Lifecycle

All Entities within the JPA architecture have a life-cycle, best described in section 9.1, The persistence lifecycle, of Java Persistence with Hibernate. This life-cycle is what explains the methods on an EntityManager instance. We'll cover it really briefly here, but please refer to the reference documentation for detail.

Entities start in a New state, and this state represents their condition just after construction.
When an entity is passed to the persist method on an EntityManager it moves to the Managed state, and the data representation of it is scheduled to be added to data storage. Managed entities are also returned from an EntityManager that executes either a query or the find operation.

In this managed state, an application can navigate the relationships expressed in the entity method signatures, so for example the call getOnLoans on an object of type Book would result in the JPA retrieving the OnLoan records for the book.

When an entity moves outside of the scope of the EntityManager that is looking after it (which can occur if the EntityManager has been closed, the entity has been passed between tiers in a multi-tier application, or the entity has been passed between threads) the entity will move to the Detached state.

You can amend the content of a detached entity by setting its values, but the content of this object then needs to be merged back to back-end storage. Our example code always uses the merge operation when amended entity data is posted back to data storage, to cover detachment scenarios. It is suggested you follow the same pattern to avoid errors such as detached entity passed to persist from the JPA provider layer.

The merge operation schedules the content of an entity to be updated in data storage.
An Entity can be removed using the remove method, which schedules the persistence provider to delete the representation in the data storage.
Note: Note the phrase scheduled to in the descriptions of the add, update, and delete actions. The actual operations are performed by the Hibernate implementation of the JPA when the persistence context is flushed, which happens when you commit a transaction or when the provider decides to flush.
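
The state changes described above can be summarised as a small state machine. The sketch below is only a simplified model for illustration: the EntityState enum and its method names are ours and are not part of the JPA API, and a real provider does considerably more work on each call.

```java
/**
 * Simplified model of the entity states described above. This enum is an
 * illustration only and is not part of the JPA API.
 */
enum EntityState
{
    NEW, MANAGED, DETACHED, REMOVED;

    /** State after EntityManager.persist is called on the entity. */
    EntityState persist()
    {
        if (this == DETACHED)
        {
            // Providers reject this case; Hibernate reports it as
            // "detached entity passed to persist".
            throw new IllegalStateException("detached entity passed to persist");
        }
        return MANAGED;
    }

    /** State once the EntityManager is closed or the entity leaves its scope. */
    EntityState detach()
    {
        return this == MANAGED ? DETACHED : this;
    }

    /**
     * State of the instance returned by EntityManager.merge. Note that merge
     * returns a managed copy; the instance passed in is left as it was.
     */
    EntityState merge()
    {
        if (this == REMOVED)
        {
            throw new IllegalStateException("cannot merge a removed entity");
        }
        return MANAGED;
    }

    /** State after EntityManager.remove is called on the entity. */
    EntityState remove()
    {
        if (this == DETACHED)
        {
            throw new IllegalStateException("cannot remove a detached entity");
        }
        // remove has no effect on an entity that is new or already removed
        return this == MANAGED ? REMOVED : this;
    }
}
```

For example, EntityState.NEW.persist().detach().merge() ends in MANAGED, which mirrors the persist, detach, and merge sequence described in this section.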

The Persistence Unit

The Java classes defining JEE Entities need to be grouped together into their own JAR and this is known as a persistence unit. An application can have many persistence units, and each persistence unit can be configured independently to use different back end storage.

It is sensible to organize your persistence unit to cover a particular relational data-model or object domain model that expresses the data scope of your solution. There is usually no point in breaking up a single logical data model into multiple persistence units.

Each persistence unit JAR keeps its classes at the top level of the JAR.

A persistence unit also needs a configuration file called persistence.xml that defines the entity content of the persistence unit and the way in which the persistence unit relates to the relational data model. In our example code, the content of META-INF/persistence.xml ensures that the application creates the database schema it needs from the JPA entity annotations it has if said schema is missing, or just uses the one already there. So JPA gives you a way of creating schema in any database without writing the DDL.

Here is the file for our sample persistence unit:

<persistence>
    <persistence-unit name="libraryDAO" transaction-type="RESOURCE_LOCAL">
        
        <class>com.example.dao.entities.Book</class>
        <class>com.example.dao.entities.Category</class>
        <class>com.example.dao.entities.Library</class>
        <class>com.example.dao.entities.Member</class>
        <class>com.example.dao.entities.OnLoan</class>
        <class>com.example.dao.entities.OnLoanId</class>
        
        <properties>
            <property name="hibernate.ejb.cfgfile"
                      value="hibernate.cfg.xml"/>
            <property name="hibernate.hbm2ddl.auto" value="update"/>
        </properties>
    </persistence-unit>
</persistence>

The transaction-type attribute defines how transactions are managed for the unit. Ours is RESOURCE_LOCAL because it does not make use of the Java Transaction API (JTA), which could have been used if this persistence unit had been deployed under JBoss.
Our persistence unit doesn’t need a JEE container to run. RESOURCE_LOCAL is a short way of telling the JPA provider to use the default JDBC transaction model.

The hibernate.ejb.cfgfile setting allows us to get the JPA provider to obtain the variable part of the configuration of the persistence unit (say the connection details) from a file that lives in the class path of the application outside the persistence unit JAR boundary. This file is called hibernate.cfg.xml and is discussed later.

Transactions

All entity adding/deleting/altering actions need to be encapsulated in transactions within the JPA. In our outside-a-container example code, a transaction bounds all the operations performed by an EntityManager instance. So, repeating the code snippet given previously, the calls on currentTrans (getTransaction, begin, commit, and rollback) represent the transaction management for an add operation:

    public void addBook(Book a_Book) throws DAOException
    {
        EntityTransaction currentTrans = null;
        final EntityManager em = super.getEntityManager();
        try
        {
            // Start the unit of work
            currentTrans = em.getTransaction();
            currentTrans.begin();
            em.persist(a_Book);
            // Complete the unit of work
            currentTrans.commit();
        }
        catch (Exception anyE)
        {
            // currentTrans is still null if getTransaction() itself failed
            if (currentTrans != null && currentTrans.isActive())
            {
                currentTrans.rollback();
            }
            throw new DAOException(errorMessage, anyE);
        }
    }

As can be seen, this transaction management code represents a significant part of the code, which unfortunately is the case when you use the JPA outside a JEE container.

Outside a JEE container you are responsible in your own code for identifying the beginning of a unit of work with a transaction.begin statement, and indicating when this unit of work is complete with a transaction.commit or transaction.rollback statement.
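
One way to cut down this boilerplate is to keep the begin, commit, and rollback sequence in a single helper and pass the unit of work in as a callback. The sketch below is an assumption on our part rather than code from the example application. So that it compiles without a JPA provider on the class path, it uses a small hypothetical interface, SimpleTransaction, that mirrors the four EntityTransaction methods used above; in real code you would pass the EntityTransaction itself. The TransactionHelper name is also ours.

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical stand-in mirroring the begin, commit, rollback and isActive
 * methods of javax.persistence.EntityTransaction.
 */
interface SimpleTransaction
{
    void begin();

    void commit();

    void rollback();

    boolean isActive();
}

/** Runs a unit of work inside a transaction so callers need not repeat the try/catch. */
final class TransactionHelper
{
    private TransactionHelper()
    {
        // static helper, never instantiated
    }

    /**
     * Begins the transaction, runs the work, and commits. If anything fails
     * while the transaction is still active, it is rolled back and the
     * original exception is rethrown to the caller.
     */
    static <T> T inTransaction(SimpleTransaction a_Trans, Callable<T> a_Work)
        throws Exception
    {
        try
        {
            a_Trans.begin();
            T result = a_Work.call();
            a_Trans.commit();
            return result;
        }
        catch (Exception anyE)
        {
            if (a_Trans.isActive())
            {
                a_Trans.rollback();
            }
            throw anyE;
        }
    }
}
```

With a helper like this, each DAO method only has to supply the persist, merge, or remove call, and the rollback handling lives in one place.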

Complex Queries

In most cases, you can navigate the object model of the data returned from an EntityManager.find operation to get hold of the data you may need from the object’s get methods.

Sometimes you’ll need more power than this interface offers. To get more power, you need to create custom queries, which can be defined in the query language JPQL.

These custom queries can form part of the entity classes themselves, or can be in a file. We chose the file route as it then means we know where to go to alter our queries without altering code.

Custom queries are scoped to a particular persistence unit by defining an orm.xml file embedded inside the persistence unit at the same directory location as the persistence.xml.

Here is the orm.xml for our example code:

<?xml version="1.0" encoding="UTF-8"?>
<entity-mappings xmlns="http://java.sun.com/xml/ns/persistence/orm"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:schemaLocation="http://java.sun.com/xml/ns/persistence/orm http://java.sun.com/xml/ns/persistence/orm_1_0.xsd"
                 version="1.0">
    <named-query name="getAllBooks">
        <query>
            <![CDATA[
                from Book b order by b.title asc
            ]]>
        </query>
    </named-query>
    <named-query name="getBookByTitle">
        <query>
            <![CDATA[
                from Book b
                where b.title = :bookTitle
            ]]>
        </query>
    </named-query>
    <named-query name="getAllCategories">
        <query>
            <![CDATA[
                from Category c order by c.categoryname asc
            ]]>
        </query>
    </named-query>
    <named-query name="getAllLibraries">
        <query>
            <![CDATA[
                from Library l order by l.name asc
            ]]>
        </query>
    </named-query>
    <named-query name="getAllMembers">
        <query>
            <![CDATA[
                from Member m order by m.fullname asc
            ]]>
        </query>
    </named-query>
    <named-query name="getMemberByName">
        <query>
            <![CDATA[
                from Member m
                where m.fullname = :memberName
            ]]>
        </query>
    </named-query>
    <named-query name="getAllLoans">
        <query>
            <![CDATA[
                from OnLoan ol order by ol.member.fullname asc
            ]]>
        </query>
    </named-query>
    <named-query name="getLoansForMember">
        <query>
            <![CDATA[
                from OnLoan ol
                where ol.id.memberuuid = :memberID
            ]]>
        </query>
    </named-query>
    <named-query name="getLoansForLibrary">
        <query>
            <![CDATA[
                from OnLoan ol
                where ol.member.library.libraryuuid = :libraryID
            ]]>
        </query>
    </named-query>
</entity-mappings>

Each query identified within the file is of type named-query. This is useful insofar as you can amend the caching properties of these queries to prevent them from hitting the database each time they are run, and by presenting them as named queries, the EntityManagerFactory will pre-compile these on initial startup for speedy usage.

The CDATA tag allows us to embed unruly characters in the query statements if we need to, avoiding any XML processing errors.

Each query statement uses the Entity class names rather than the database table names, and uses the attribute names defined by get<Attribute> and set<Attribute> on the Entity classes instead of table columns. Note that our queries omit the select clause, a shorthand that Hibernate accepts; strictly portable JPQL would begin with, for example, select b from Book b. Make sure the first character of each attribute name used in the orm.xml is lower case, unlike the access methods on the Entity class.
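
The lower-case rule for attribute names follows the JavaBeans naming convention. As a small illustration, the JDK's java.beans.Introspector.decapitalize method can show how a getter name maps to the attribute name you would use in a query. The PropertyNames class and its fromGetter method are hypothetical names of ours, and this is not a claim about how any particular JPA provider is implemented:

```java
import java.beans.Introspector;

/** Hypothetical helper showing how a getter name maps to an attribute name. */
final class PropertyNames
{
    private PropertyNames()
    {
        // static helper, never instantiated
    }

    /**
     * Strips the leading "get" and lower-cases the first character of what
     * remains, as the JavaBeans convention prescribes.
     */
    static String fromGetter(String a_GetterName)
    {
        if (!a_GetterName.startsWith("get") || a_GetterName.length() == 3)
        {
            throw new IllegalArgumentException("not a getter: " + a_GetterName);
        }
        return Introspector.decapitalize(a_GetterName.substring(3));
    }
}
```

So getTitle on Book becomes title, and getOnLoans becomes onLoans, which is the form a query such as getBookByTitle uses in b.title.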

A query statement can use placeholders for variable data, an example placeholder being :libraryID in the last named query in the file. Your DAL code should take these placeholders as arguments where it needs to.

Both Java Persistence with Hibernate and EJB3 in Action have whole chapters specifically on the JPQL language. For an online guide try the complete and excellent example provided by BEA at http://edocs.bea.com/kodo/docs41/full/html/ejb3_langref.html

External Configuration

Any persistence unit should be configurable via external configuration, to prevent us from having to rebuild a persistence unit when it needs to be tweaked to a new database location or new database flavour. In our example code the file hibernate.cfg.xml performs this task.

When the EntityManagerFactory for a particular Data Access Layer (DAL) is started up, the persistence.xml refers out to the hibernate.cfg.xml. Within our sample application, the following hibernate.cfg.xml content is used:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE hibernate-configuration PUBLIC
    "-//Hibernate/Hibernate Configuration DTD 3.0//EN"
    "http://hibernate.sourceforge.net/hibernate-configuration-3.0.dtd">
    <hibernate-configuration>
        <session-factory>
            <property name="hibernate.show_sql">true</property>
            <property name="hibernate.format_sql">true</property>
            <property name="hibernate.connection.driver_class">
                org.postgresql.Driver
            </property>
            <property name="hibernate.connection.url">
                jdbc:postgresql://localhost:5432/libraries
            </property>
            <property name="hibernate.connection.username">postgres</property>
            <property name="hibernate.connection.password">hello123</property>
            <property name="hibernate.dialect">
                org.hibernate.dialect.PostgreSQLDialect
            </property>
            <property name="hibernate.c3p0.min_size">5</property>
            <property name="hibernate.c3p0.max_size">20</property>
            <property name="hibernate.c3p0.max_statements">50</property>
            <property name="hibernate.c3p0.timeout">1800</property>
            <property name="hibernate.cache.provider_class">
                org.hibernate.cache.EhCacheProvider
            </property>
         <property name="hibernate.cache.use_query_cache">true</property>
        </session-factory>
    </hibernate-configuration>

The hibernate.show_sql property can be set to false for production systems and true for development systems. With this set to true, you will see the SQL sent to the database in response to JPA queries and actions on the console of the JVM.

The connection properties specify the URL, username, password, and driver to use to connect. These can be a JBoss data source definition, straightforward JDBC definitions, or HA-JDBC clusters for client-side DB HA, which we’ll cover in a future blog.

The c3p0 properties are only used for connection pooling outside a JEE container. When the persistence unit is configured for JBoss execution, no pool definition is needed here, because the pooling is done within JBoss via its data-source configuration. Outside a container, the stand-alone Hibernate JPA implementation uses c3p0.

The cache.provider_class attribute is used to inform Hibernate of a second-level caching implementation at the JVM process level. This technique is used to stop the JPA implementation from hitting the database unless the data has changed, or is no longer in the second-level cache.
The cache.use_query_cache attribute informs Hibernate to turn on named-query caching for any suitably annotated query.

We’ll discuss caching as a feature in a follow on blog entry to this one.

Conclusions

For those of you without an in-depth knowledge of the JPA, we’re hoping this blog entry has given you enough basic information and confidence to start investigating the JPA for suitability on your projects.

In the next blog entry in the series we’ll cover implementing the ideas covered here for out-of-container usage of the JPA, specifying tools, tips, and hopefully pointing you in the right direction.

In the final blog on the series we’ll look at caching and database HA with the JPA.

Jim

The time poor programmer

Tuesday, 6 May 2008

How to JPA outside your JEE container: Part 1: The Basics