Is your feature request related to a problem? Please describe.
When dealing with vast and intricate datasets, optimizing query performance becomes crucial. In such scenarios, leveraging subqueries can significantly enhance performance. Subqueries provide the ability to filter data before performing joins or aggregations, thereby reducing the volume of data processed in subsequent operations. By applying filtering at an early stage, the number of records involved in complex operations is limited, resulting in expedited query execution.
Here is an example that uses subqueries to page the source table STAR_SCHEMA (which has millions of rows) and then joins in data from 6 other tables:
STAR_SCHEMA.FACT
STAR_SCHEMA.FIRST_DIM
STAR_SCHEMA.SECOND_DIM
STAR_SCHEMA.THIRD_DIM
STAR_SCHEMA.FOURTH_DIM
STAR_SCHEMA.FIFTH_DIM
SELECT fields ...
FROM
(SELECT FCT.*
FROM STAR_SCHEMA.FACT FCT
WHERE FIRST_ID = (SELECT ID FROM STAR_SCHEMA.FIRST_DIM WHERE NAME = 'SOME Name')
and SECOND_B_ID = (select id from STAR_SCHEMA.SECOND_DIM where id = 67)
limit 1000
) FCT
JOIN STAR_SCHEMA.FIRST_DIM FIRST ON FIRST.ID = FCT.FIRST_ID
JOIN STAR_SCHEMA.THIRD_DIM THIRD ON THIRD.ID = FCT.THIRD_ID
JOIN STAR_SCHEMA.FOURTH_DIM FOURTH ON FOURTH.ID = FCT.FOURTH_ID
JOIN STAR_SCHEMA.FIFTH_DIM FIFTH ON FIFTH.ID = FCT.FIFTH_ID
LEFT JOIN STAR_SCHEMA.SECOND_DIM SECOND_A ON FCT.SECOND_A_ID = SECOND_A.ID
LEFT JOIN STAR_SCHEMA.SECOND_DIM SECOND_B ON FCT.SECOND_B_ID = SECOND_B.ID;
Describe the solution you'd like
As of now, JPAStreamer does not support subqueries.
Describe alternatives you've considered
The absence of subquery support in JPAStreamer can be attributed, in part, to the limitations of the JPA Criteria API on which JPAStreamer relies. Although the JPA Criteria API does offer support for WHERE-subqueries, it lacks the capability to handle subqueries using the FROM expression. As a result, JPAStreamer is unable to provide full subquery functionality due to these constraints within the underlying API. We might however be able to add partial support, e.g. allowing WHERE-subqueries.
For now - here is an equivalent approach that uses JPAStreamer. This will yield the same results albeit cannot match the performance of the raw SQL above.
- Page and filter the POINT_FACT table (assuming the matching entity is called PointFact).
Set<Integer> set = jpaStreamer.stream(StarSchema.class)
.filter(StarSchema$.firstDim.equal(“some value”).and(StarSchema$.secondDim.equal(“some other value”) // the filters cannot be expressed as subqueries at this point
.skip(100) // perform paging
.limit(1000) // perform paging
.mapToInt(StarSchema$.id) // map to some unique identifier, id is just an example
.collect(Collectors.toSet());
- Define the join and projection
final StreamConfiguration<StarSchema> sc = StreamConfiguration.of(StarSchema.class)
.joining(StarSchema$.projectDim, JoinType.INNER) // All relations must be defined in the PointFact Entity for JPAStreamer to know which column to join on, in this case the project id.
.joining(StarSchema $.firstDim, JoinType.INNER)
.joining(StarSchema $.secondDim, JoinType.INNER)
.joining(StarSchema $.thirdDim, JoinType.INNER)
.joining(StarSchema $.fourthDim, JoinType.LEFT)
.joining(StarSchema $.fifthDim, JoinType.LEFT)
.select(StarSchema.of(StarSchema$.X, StarSchema$.Y, StarSchema$.Z)); // Note that the projection requires that there is a matching constructor defined in the StarSchema Entity e.g. StarSchema(X, Y, Z)
- Perform the join only on rows in the predefined set
Stream<PointFact> films = jpaStreamer.stream(sc)
.filter(PointFact$.[id.in](http://id.in/)(set))) // only perform the join on rows in the predefined set
. …
Is your feature request related to a problem? Please describe.
When dealing with vast and intricate datasets, optimizing query performance becomes crucial. In such scenarios, leveraging subqueries can significantly enhance performance. Subqueries provide the ability to filter data before performing joins or aggregations, thereby reducing the volume of data processed in subsequent operations. By applying filtering at an early stage, the number of records involved in complex operations is limited, resulting in expedited query execution.
Here is an example that uses subqueries to page the source table STAR_SCHEMA (which has millions of rows) and then joins in data from 6 other tables:
Describe the solution you'd like
As of now, JPAStreamer does not support subqueries.
Describe alternatives you've considered
The absence of subquery support in JPAStreamer can be attributed, in part, to the limitations of the JPA Criteria API on which JPAStreamer relies. Although the JPA Criteria API does offer support for WHERE-subqueries, it lacks the capability to handle subqueries using the FROM expression. As a result, JPAStreamer is unable to provide full subquery functionality due to these constraints within the underlying API. We might however be able to add partial support, e.g. allowing WHERE-subqueries.
For now - here is an equivalent approach that uses JPAStreamer. This will yield the same results albeit cannot match the performance of the raw SQL above.