The best way to use JOIN FETCH and Pagination with Spring
Are you struggling with performance issues in your Spring, Jakarta EE, or Java EE application?
What if there were a tool that could automatically detect what caused performance issues in your JPA and Hibernate data access layer?
Wouldn’t it be awesome to have such a tool to watch your application and prevent performance issues during development, long before they affect production systems?
Well, Hypersistence Optimizer is that tool! And it works with Spring Boot, Spring Framework, Jakarta EE, Java EE, Quarkus, Micronaut, or Play Framework.
So, rather than fixing performance issues in your production system on a Saturday night, you are better off using Hypersistence Optimizer to help you prevent those issues so that you can spend your time on the things that you love!
Introduction
In this article, we are going to see how we can use the JOIN FETCH clause when fetching a child collection eagerly while also limiting the number of parent records using pagination in a Spring Data JPA application.
I decided to write this article because the most common solution used in many projects turns out to be extremely inefficient.
Domain Model
Let’s assume we have a parent Post
entity that has a one-to-many relationship with the child PostComment
entity.
The Post
parent entity is mapped as follows:
@Entity(name = "Post") @Table(name = "post") public class Post { @Id private Long id; private String title; @Column( name = "created_on", nullable = false ) private LocalDateTime createdOn; @OneToMany( mappedBy = "post", cascade = CascadeType.ALL, orphanRemoval = true ) private List<PostComment> comments = new ArrayList<>(); }
And the PostComment
child entity, like this:
@Entity(name = "PostComment") @Table(name = "post_comment") public class PostComment { @Id private Long id; @ManyToOne(fetch = FetchType.LAZY) private Post post; private String review; @Column(name = "created_on") private LocalDateTime createdOn; }
Using JOIN FETCH with Spring Pagination
Our business requirement is to filter the Post
entities by a given predicate, order them by the creation timestamp, and fetch the Top N such Post
entities along with all their associated comments
.
The most obvious but wrong way of implementing this requirement is to combine a JOIN FETCH on the child collection with the parent entity pagination in a single Spring @Query
that looks as follows:
@Query( value = """ select p from Post p left join fetch p.comments where p.title like :titlePattern """, countQuery = """ select count(p) from Post p where p.title like :titlePattern """ ) Page<Post> findAllByTitleWithCommentsAntiPattern( @Param("titlePattern") String titlePattern, Pageable pageable );
At first sight, this query looks like the natural solution to our problem, but when running it:
Page<Post> posts = postRepository.findAllByTitleWithCommentsAntiPattern( "High-Performance Java Persistence %", PageRequest.of( 0, maxCount, Sort.by("createdOn") ) );
We get the following entries printed in the log:
WARN: HHH90003004: firstResult/maxResults specified with collection fetch; applying in memory -- JOIN FETCH query SELECT p1_0.id, c1_0.post_id, c1_0.id, c1_0.created_on, c1_0.review, p1_0.created_on, p1_0.title FROM post p1_0 LEFT JOIN post_comment c1_0 ON p1_0.id=c1_0.post_id WHERE p1_0.title LIKE 'High-Performance Java Persistence %' ESCAPE '' ORDER BY p1_0.created_on ASC -- Count query SELECT count(p1_0.id) FROM post p1_0 WHERE p1_0.title LIKE 'High-Performance Java Persistence %' ESCAPE ''
The WARN
message tells us that the pagination was not done in the database but in the application memory, and if we inspect the SQL query that follows the warning message, we can see that, indeed, there’s no SQL-level pagination.
Because a WARN message can be overlooked, it’s better to enable the Hibernate
hibernate.query.fail_on_pagination_over_collection_fetch
property which will throw aHibernateException
when detecting this issue.
spring.jpa.properties.hibernate.query.fail_on_pagination_over_collection_fetch=true
Check out this article for a detailed explanation of this property.
To verify the performance of a given SQL query, the only valid strategy is to inspect the execution plan.
However, istead of using EXPLAIN ANALYZE
, which would just give us the estimated plan, we are going to use auto_explain
so that we can collect the actual execution plan for a given executed SQL query.
In our case, if we inspect the execution plan for the aforementioned SQL query, we can see that it fetches 100,000
records from the database (e.g., 10,000
matching post
entries multiplied by 10
post_comments
for each post
):
LOG: duration: 368.062 ms Query Text: SELECT p1_0.id, c1_0.post_id, c1_0.id, c1_0.created_on, c1_0.review, p1_0.created_on, p1_0.title FROM post p1_0 LEFT JOIN post_comment c1_0 ON p1_0.id=c1_0.post_id WHERE p1_0.title like $1 ESCAPE '' ORDER BY p1_0.created_on ASC Sort (cost=2659.65..2660.90 rows=500 width=1072) (actual time=137.324..168.576 rows=100000 loops=1) Sort Key: p1_0.created_on Sort Method: external merge Disk: 13512kB -> Hash Right Join (cost=248.63..2637.24 rows=500 width=1072) (actual time=4.356..51.336 rows=100000 loops=1) Hash Cond: (c1_0.post_id = p1_0.id) -> Seq Scan on post_comment c1_0 (cost=0.00..2126.00 rows=100000 width=540) (actual time=0.033..14.343 rows=100000 loops=1) -> Hash (cost=248.00..248.00 rows=50 width=532) (actual time=4.306..4.307 rows=10000 loops=1) Buckets: 16384 (originally 1024) Batches: 1 (originally 1) Memory Usage: 1075kB -> Seq Scan on post p1_0 (cost=0.00..248.00 rows=50 width=532) (actual time=0.056..2.470 rows=10000 loops=1) Filter: ((title)::text ~~ 'High-Performance Java Persistence %'::text)
Without using SQL-level pagination, such SQL queries are prone to fetching all the post
records matching the filtering criteria along with all their associated post_comments
.
Fetching the parent-child hierarchy while limiting the parent entities using SQL
Spring 6 uses Hibernate 6, and starting with this version, JPQL queries can use Window Functions and Derived Tables, which we can also use for our business use case.
So, a much more efficient way to JOIN FETCH the child collection while also using pagination on the parent entities is to use the following Spring @Query
:
@Query(""" select p from Post p left join fetch p.comments pc where p.id in ( select id from ( select id as id, dense_rank() over (order by createdOn ASC) as ranking from Post where title like :titlePattern ) pr where ranking <= :maxCount ) """ ) List<Post> findFirstByTitleWithCommentsByTitle( @Param("titlePattern") String titlePattern, @Param("maxCount") int maxCount );
When running the findFirstByTitleWithCommentsByTitle
query method:
List<Post> posts = postRepository.findFirstByTitleWithCommentsByTitle( "High-Performance Java Persistence %", maxCount );
Hibernate generates the following SQL query:
SELECT p1_0.id, c1_0.post_id, c1_0.id, c1_0.created_on, c1_0.review, p1_0.created_on, p1_0.title FROM post p1_0 LEFT JOIN post_comment c1_0 ON p1_0.id=c1_0.post_id WHERE p1_0.id IN ( SELECT pr1_0.id FROM ( SELECT p2_0.id, dense_rank() over( ORDER BY p2_0.created_on ASC ) FROM post p2_0 WHERE p2_0.title LIKE 'High-Performance Java Persistence %' ESCAPE '' ) pr1_0(id, ranking) WHERE pr1_0.ranking<=25 )
The execution plan for this query looks as follows:
LOG: duration: 8.569 ms Query Text: SELECT p1_0.id, c1_0.post_id, c1_0.id, c1_0.created_on, c1_0.review, p1_0.created_on, p1_0.title FROM post p1_0 LEFT JOIN post_comment c1_0 ON p1_0.id=c1_0.post_id WHERE p1_0.id in (SELECT pr1_0.id FROM (SELECT p2_0.id, dense_rank() over( ORDER BY p2_0.created_on ASC) FROM post p2_0 WHERE p2_0.title like $1 ESCAPE '') pr1_0(id, ranking) WHERE pr1_0.ranking<=$2) Nested Loop Left Join (cost=251.70..1228.20 rows=500 width=1072) (actual time=5.630..7.673 rows=250 loops=1) -> Hash Semi Join (cost=251.41..501.22 rows=50 width=532) (actual time=5.532..7.240 rows=25 loops=1) Hash Cond: (p1_0.id = pr1_0.id) -> Seq Scan on post p1_0 (cost=0.00..223.00 rows=10000 width=532) (actual time=0.018..0.821 rows=10000 loops=1) -> Hash (cost=250.79..250.79 rows=50 width=8) (actual time=5.500..5.502 rows=25 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 9kB -> Subquery Scan on pr1_0 (cost=249.41..250.79 rows=50 width=8) (actual time=5.452..5.488 rows=25 loops=1) -> WindowAgg (cost=249.41..250.29 rows=50 width=24) (actual time=5.451..5.484 rows=25 loops=1) Run Condition: (dense_rank() OVER (?) <= '25'::bigint) -> Sort (cost=249.41..249.54 rows=50 width=16) (actual time=5.415..5.417 rows=27 loops=1) Sort Key: p2_0.created_on Sort Method: quicksort Memory: 931kB -> Seq Scan on post p2_0 (cost=0.00..248.00 rows=50 width=16) (actual time=0.024..3.271 rows=10000 loops=1) Filter: ( (title)::text ~~ 'High-Performance Java Persistence %'::text ) -> Index Scan using idx_post_comment_post_id on post_comment c1_0 (cost=0.29..9.54 rows=500 width=540) (actual time=0.007..0.013 rows=10 loops=25) Index Cond: (post_id = p1_0.id)
Compared to the previous query that took 368 ms
, this one takes only 8 ms
because it only fetches 250
records from the database, not 100_000
rows.
Awesome, right?
Fetching the parent-child hierarchy using Blaze Persistence
Another way to fetch a parent-child hierarchy while painting the parent entities is to use Blaze Persistence.
For that, we will create a CustomPostRepository
:
public interface CustomPostRepository { List<Post> findAllByTitleWithComments( String titlePattern, PageRequest pageRequest ); }
And we will provide the following implementation for it:
public class CustomPostRepositoryImpl implements CustomPostRepository { @PersistenceContext private EntityManager entityManager; @Autowired private CriteriaBuilderFactory criteriaBuilderFactory; @Override public List<Post> findAllByTitleWithComments( String titlePattern, PageRequest pageRequest) { return criteriaBuilderFactory .create(entityManager, Post.class) .fetch(Post_.COMMENTS) .where(Post_.TITLE) .like().value(titlePattern).noEscape() .orderBy(Post_.CREATED_ON, true) .orderBy(Post_.ID, true) .page( (int) pageRequest.getOffset(), pageRequest.getPageSize() ) .withCountQuery(false) .getResultList(); } }
The findAllByTitleWithComments
query method uses Blaze Persistence to create the query dynamically, which uses both JOIN FETCH
and Post
entity pagination.
To use the methods from the CustomPostRepository
, the PostRepository
will extend from it:
@Repository public interface PostRepository extends BaseJpaRepository<Post, Long>, CustomPostRepository { }
For more details about using custom Repositories with Spring Data JPA< check out this article as well.
When calling the findAllByTitleWithComments
method defined by the CustomPostRepository
:
List<Post> posts = postRepository.findAllByTitleWithComments( "High-Performance Java Persistence %", PageRequest.of(0, maxCount, Sort.by("createdOn")) );
Hibernate will generate the following SQL query:
SELECT p1_0.id, c1_0.post_id, c1_0.id, c1_0.created_on, c1_0.review, p1_0.created_on, p1_0.title FROM post p1_0 LEFT JOIN post_comment c1_0 ON p1_0.id=c1_0.post_id WHERE p1_0.id in ( SELECT p2_0.id FROM post p2_0 WHERE p2_0.title like 'High-Performance Java Persistence %' ESCAPE '' ORDER BY p2_0.created_on ASC, p2_0.id ASC FETCH FIRST 25 ROWS ONLY ) ORDER BY p1_0.created_on ASC, p1_0.id ASC
And when analyzing the execution plan:
LOG: duration: 6.075 ms plan: Query Text: SELECT p1_0.id, c1_0.post_id, c1_0.id, c1_0.created_on, c1_0.review, p1_0.created_on, p1_0.title FROM post p1_0 LEFT JOIN post_comment c1_0 ON p1_0.id=c1_0.post_id WHERE p1_0.id in (SELECT p2_0.id FROM post p2_0 WHERE p2_0.title like $1 ESCAPE '' ORDER BY p2_0.created_on ASC,p2_0.id ASC FETCH FIRST 25 ROWS ONLY) ORDER BY p1_0.created_on ASC, p1_0.id ASC Sort (cost=795.08..795.71 rows=250 width=1072) (actual time=5.384..5.406 rows=250 loops=1) Sort Key: p1_0.created_on, p1_0.id Sort Method: quicksort Memory: 64kB -> Nested Loop Left Join (cost=250.36..785.12 rows=250 width=1072) (actual time=5.083..5.250 rows=250 loops=1) -> Nested Loop (cost=250.07..421.63 rows=25 width=532) (actual time=5.046..5.106 rows=25 loops=1) -> HashAggregate (cost=249.79..250.04 rows=25 width=8) (actual time=4.986..4.996 rows=25 loops=1) Group Key: p2_0.id Batches: 1 Memory Usage: 24kB -> Limit (cost=249.41..249.47 rows=25 width=16) (actual time=4.970..4.974 rows=25 loops=1) -> Sort (cost=249.41..249.54 rows=50 width=16) (actual time=4.968..4.970 rows=25 loops=1) Sort Key: p2_0.created_on, p2_0.id Sort Method: top-N heapsort Memory: 26kB -> Seq Scan on post p2_0 (cost=0.00..248.00 rows=50 width=16) (actual time=0.072..3.628 rows=10000 loops=1) Filter: ((title)::text ~~ 'High-Performance Java Persistence %'::text) -> Index Scan using post_pkey on post p1_0 (cost=0.29..6.86 rows=1 width=532) (actual time=0.004..0.004 rows=1 loops=25) Index Cond: (id = p2_0.id) -> Index Scan using idx_post_comment_post_id on post_comment c1_0 (cost=0.29..9.54 rows=500 width=540) (actual time=0.003..0.004 rows=10 loops=25) Index Cond: (post_id = p1_0.id)
We can see that the executed SQL query is extremely efficient. So, Blaze Persistence offers a solution that’s both convenient and efficient.
For more details about Blaze Persistence, check out this article as well.
Split the filtering and the fetching into two separate queries
The previous solution works great for a Top-N query, but if you want to expand it to a Next-N query, you’d need to use Keyset Pagination.
However, there’s another approach that is worth mentioning.
We can split the filtering and the fetching into two separate queries like this:
@Query( value = """ select p.id from Post p where p.title like :titlePattern """, countQuery = """ select count(p) from Post p where p.title like :titlePattern """ ) List<Long> findAllPostIdsByTitle( @Param("titlePattern") String titlePattern, Pageable pageable ); @Query(""" select p from Post p left join fetch p.comments where p.id in :postIds """ ) List<Post> findAllByIdWithComments( @Param("postIds") List<Long> postIds );
The first query filters the Post
identifiers that match the filtering criteria and fetches the page of identifiers we are interested in.
And the second query takes a list of Post
identifiers and fetches the matching Post
entities along with their associated PostComment
child entities.
The ForumService
would then define a single transaction method that calls these two PostRepository
query methods:
@Transactional(readOnly = true) public List<Post> findAllPostsByTitleWithComments( String titlePattern, PageRequest pageRequest) { return postRepository.findAllByIdWithComments( postRepository.findAllPostIdsByTitle( titlePattern, pageRequest ) ); }
The reason why we are using a Service method is that we want these two queries to be executed in the context of the same database transaction.
And when calling the findAllPostsByTitleWithComments
service method:
List<Post> posts = forumService.findAllPostsByTitleWithComments( "High-Performance Java Persistence %", PageRequest.of( 0, maxCount, Sort.by("createdOn") ) );
We would get two very simple SQL queries executed by Hibernate:
-- The findAllPostIdsByTitle query SELECT p1_0.id FROM post p1_0 WHERE p1_0.title LIKE 'High-Performance Java Persistence %' ESCAPE '' ORDER BY p1_0.created_on ASC OFFSET 0 ROWS FETCH FIRST 25 ROWS ONLY -- The findAllByIdWithComments query SELECT p1_0.id, c1_0.post_id, c1_0.id, c1_0.created_on, c1_0.review, p1_0.created_on, p1_0.title FROM post p1_0 LEFT JOIN post_comment c1_0 ON p1_0.id=c1_0.post_id WHERE p1_0.id IN ( 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 )
The execution plans for these two queries looks as follows:
LOG: duration: 4.084 ms Query Text: SELECT p1_0.id FROM post p1_0 WHERE p1_0.title like $1 ESCAPE '' ORDER BY p1_0.created_on ASC OFFSET $2 ROWS FETCH FIRST $3 ROWS ONLY Limit (cost=249.41..249.47 rows=25 width=16) (actual time=4.069..4.072 rows=25 loops=1) -> Sort (cost=249.41..249.54 rows=50 width=16) (actual time=4.067..4.069 rows=25 loops=1) Sort Key: created_on Sort Method: top-N heapsort Memory: 26kB -> Seq Scan on post p1_0 (cost=0.00..248.00 rows=50 width=16) (actual time=0.082..2.934 rows=10000 loops=1) Filter: ((title)::text ~~ 'High-Performance Java Persistence %'::text) LOG: duration: 20.008 ms Query Text: SELECT p1_0.id, c1_0.post_id, c1_0.id, c1_0.created_on, c1_0.review, p1_0.created_on, p1_0.title FROM post p1_0 LEFT JOIN post_comment c1_0 ON p1_0.id=c1_0.post_id WHERE p1_0.id in( $1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12, $13, $14, $15, $16, $17, $18, $19, $20, $21, $22, $23, $24, $25 ) Hash Right Join (cost=142.23..2530.84 rows=250 width=1072) (actual time=0.091..19.621 rows=250 loops=1) Hash Cond: (c1_0.post_id = p1_0.id) -> Seq Scan on post_comment c1_0 (cost=0.00..2126.00 rows=100000 width=540) (actual time=0.022..11.337 rows=100000 loops=1) -> Hash (cost=141.92..141.92 rows=25 width=532) (actual time=0.062..0.063 rows=25 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 11kB -> Bitmap Heap Scan on post p1_0 (cost=79.38..141.92 rows=25 width=532) (actual time=0.053..0.055 rows=25 loops=1) Recheck Cond: (id = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25}'::bigint[])) Heap Blocks: exact=1 -> Bitmap Index Scan on post_pkey (cost=0.00..79.31 rows=25 width=0) (actual time=0.050..0.050 rows=25 loops=1) Index Cond: (id = ANY ('{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25}'::bigint[]))
This solution will work just fine for both Top-N queries that display the first page and Next-N queries that can display the remaining pages.
If you enjoyed this article, I bet you are going to love my Book and Video Courses as well.
Conclusion
The best way to use JOIN FETCH on a child collection while limiting the parent entities is to use SQL-level pagination, and starting with Spring 6 and Hibernate 6, this can be done using a single JPQL query.
On Spring 5 and Hibernate 5, or if you need both Top-N and Next-N queries, you can use the two-query approach. All in all, just avoid the initial JPQL query that mixes both JOIN FETCH and Pagination since it could lead to terrible performance issues.

You forget the other option, which is to use the Spring Data integration of Blaze-Persistence that generates these kinds of pagination queries automatically 🙂
Does Blaze Persistence manage to combine JOIN FETCH and pagination? Can you point me to the docs where this feature is mentioned?
Of course. It sees the query with join fetches and creates an id-subquery from that (prunes away everything that is select clause related), which looks similar to what you are presenting here.
See the pagination documentation https://persistence.blazebit.com/documentation/1.6/core/manual/en_US/#pagination and the under the hood section to understand what is going on.
Already for quite some time, the id query is inlined into the object query. Same thing happens for the count query.
Another nice thing that Blaze-Persistence offers is bounded counting to avoid a full table scan: https://persistence.blazebit.com/documentation/1.6/core/manual/en_US/#bounded-counting
Great. That’s for pointing it out. I’ll have to include it in the article.