Implement Parallelized map and optimize Database search API #2669

Open: 3 commits to be merged into base: master

Collaborator

@ndegwamartin ndegwamartin commented Sep 7, 2024

IMPORTANT: All PRs must be linked to an issue (except for extremely trivial and straightforward changes).

Fixes #2668

Description
Optimizes the block in the DatabaseImpl search API that maps serialized FHIR resources to HAPI FHIR structures, by introducing a parallelized map that runs each mapping iteration in its own async coroutine.

Alternative(s) considered
Have you considered any alternatives? And if so, why have you chosen the approach in this PR?

Type
Enhancement

Screenshots (if applicable)

Checklist

  • I have read and acknowledged the Code of conduct.
  • I have read the Contributing page.
  • I have signed the Google Individual CLA, or I am covered by my company's Corporate CLA.
  • I have discussed my proposed solution with code owners in the linked issue(s) and we have agreed upon the general approach.
  • I have run ./gradlew spotlessApply and ./gradlew spotlessCheck to check my code follows the style guide of this project.
  • I have run ./gradlew check and ./gradlew connectedCheck to test my changes locally.
  • I have built and run the demo app(s) to verify my change fixes the issue and/or does not break the demo app(s).

@ndegwamartin ndegwamartin requested a review from a team as a code owner September 7, 2024 10:43
- Optimize Database search API
Collaborator

@FikriMilano FikriMilano left a comment

The change looks great!

Additionally, could you provide a performance comparison between the old and new code? That would be good to know.

Collaborator

@jingtang10 jingtang10 left a comment

great work thanks @ndegwamartin!

ndegwamartin added a commit to opensrp/android-fhir that referenced this pull request Sep 10, 2024
FORK
- With unmerged PR #9
  - WUP #13

SDK
- WUP google#2178
- WUP google#2650
- WUP google#2663

PERF
- WUP google#2669
- WUP google#2565
- WUP google#2561
- WUP google#2535
@jingtang10
Collaborator

To summarise our discussion yesterday: I think there's still work to be done in this PR. @ndegwamartin to investigate the thread pool etc. Please comment when this is ready for the next round of review. @FikriMilano @aditya-07 @yigit @stevenckngaa @vorburger @kevinmost please also take a look at this.

@ndegwamartin
Collaborator Author

Device: Physical, Samsung Galaxy Tab Active2
Mode: Benchmarking with Kotlin's measureTimeMillis (kotlin.system)
Scope: The search method of the Database search API
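
For readers who want to reproduce this kind of measurement, here is a minimal sketch of how the two timing columns could be collected with measureTimeMillis. The helpers `runQuery` and `deserialize`, and the `SearchTiming` class, are hypothetical stand-ins for the real SQLite query and HAPI FHIR parsing; they are not taken from the PR.

```kotlin
import kotlin.system.measureTimeMillis

// Hypothetical stand-in for the database query: returns serialized rows.
fun runQuery(count: Int): List<String> = List(count) { """{"id":"$it"}""" }

// Hypothetical stand-in for the JSON-to-resource mapping step.
fun deserialize(json: String): Map<String, String> =
  mapOf("id" to json.substringAfter("\"id\":\"").substringBefore("\""))

data class SearchTiming(val queryMillis: Long, val totalMillis: Long, val resultCount: Int)

fun timedSearch(count: Int): SearchTiming {
  var queryMillis = 0L
  var resultCount = 0
  // The outer timer covers query plus mapping; the inner timer covers the query only.
  val totalMillis = measureTimeMillis {
    var rows: List<String> = emptyList()
    queryMillis = measureTimeMillis { rows = runQuery(count) }
    resultCount = rows.map(::deserialize).size
  }
  return SearchTiming(queryMillis, totalMillis, resultCount)
}

fun main() {
  val timing = timedSearch(1_000)
  println("query=${timing.queryMillis} ms, total=${timing.totalMillis} ms, rows=${timing.resultCount}")
}
```

Timing the query separately from the whole call is what makes it possible to attribute the remaining time to the mapping step.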

Optimization: None

Run 1

| Resource type | Total records | Time taken (s) | DB query (s) |
| --- | --- | --- | --- |
| Group | ~1K | 8 | ~0.2 |
| Task | ~17K | ~24 | ~1.8 |
| Patient | ~11K | ~456 | ~1.3 |

Run 2

| Resource type | Total records | Time taken (s) | DB query (s) |
| --- | --- | --- | --- |
| Group | ~1K | ~2 | ~0.1 |
| Task | ~17K | ~22 | ~1.7 |
| Patient | ~11K | ~472 | ~1.2 |

Optimization: Using async with parent context (usually Dispatchers.IO)

Run 1

| Resource type | Total records | Time taken (s) | DB query (s) |
| --- | --- | --- | --- |
| Group | ~1K | 4.8 | ~0.2 |
| Task | ~17K | ~24 | ~1.7 |
| Patient | ~11K | ~450 | ~1.3 |

Run 2

| Resource type | Total records | Time taken (s) | DB query (s) |
| --- | --- | --- | --- |
| Group | ~1K | ~2 | ~0.1 |
| Task | ~17K | ~24 | ~1.7 |
| Patient | ~11K | ~455 | ~1.3 |

Optimization: Using async with Dispatchers.Default
(Note: Thread safety of the FHIR JsonParser is achieved by creating a new instance in each iteration.)

Run 1

| Resource type | Total records | Time taken (s) | DB query (s) |
| --- | --- | --- | --- |
| Group | ~1K | ~5 | ~0.2 |
| Task | ~17K | ~5.4 | ~1.8 |
| Patient | ~11K | ~208 | ~1.4 |

Run 2

| Resource type | Total records | Time taken (s) | DB query (s) |
| --- | --- | --- | --- |
| Group | ~1K | ~0.5 | ~0.1 |
| Task | ~17K | ~5 | ~1.7 |
| Patient | ~11K | ~204 | ~1.3 |

Note: The tests were carried out in a QA test environment. In a real-world deployment there would be more Patients than Groups (i.e. Patients = ~10 x no. of Groups) and even more Tasks than Patients (i.e. Tasks = ~30 x no. of Patients).
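
As an illustration of the third variant (async on Dispatchers.Default with a new parser per iteration), here is a rough sketch. `StatefulParser` and `deserializeAll` are hypothetical names used only for this example; `StatefulParser` merely imitates a parser with mutable internal state and is not the HAPI FHIR JsonParser.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.runBlocking

// Hypothetical stand-in for a parser that keeps mutable state and is
// therefore not safe to share between threads.
class StatefulParser {
  private val buffer = StringBuilder()

  fun parse(json: String): String {
    buffer.setLength(0)
    buffer.append(json.trim())
    return buffer.toString().uppercase()
  }
}

suspend fun deserializeAll(rows: List<String>): List<String> = coroutineScope {
  rows
    .map { row ->
      // Each task runs on the CPU-bound dispatcher and builds its own parser,
      // so no parser instance is ever touched by two threads.
      async(Dispatchers.Default) { StatefulParser().parse(row) }
    }
    .awaitAll() // results are returned in the original order of rows
}

fun main() = runBlocking {
  println(deserializeAll(listOf(" a ", " b ", " c ")))
}
```

Creating one parser per item trades some allocation overhead for not having to reason about shared mutable state.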

@ndegwamartin
Collaborator Author

Full specs of the device:

  • Samsung Galaxy Tab Active2
  • Android 9 (28)
  • 3GB Memory

@ndegwamartin ndegwamartin marked this pull request as ready for review September 18, 2024 10:46
@@ -460,6 +470,11 @@ internal class DatabaseImpl(
}
}

/** Implementation of a parallelized map */
suspend fun <A, B> Iterable<A>.pmap(f: suspend (A) -> B): List<B> = coroutineScope {
Collaborator

Can you rename pmap to something that signals it should be passed functions doing CPU-intensive work?
Maybe "pmapCPU"?

Collaborator Author

Yeah, that makes total sense because of the Dispatcher constraint.

Collaborator Author

I had restricted it for use in the DB search API class but with the rename I could potentially move it out to the generic Utils class for reuse elsewhere.
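
For context on the rename discussed in this thread, a reusable helper might look roughly like the sketch below. The name `pmapCPU` comes from the reviewer's suggestion; the `dispatcher` parameter and the use of withContext are assumptions made for illustration and are not the code in this PR.

```kotlin
import kotlinx.coroutines.CoroutineDispatcher
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.async
import kotlinx.coroutines.awaitAll
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

/**
 * Parallel map intended for CPU-intensive transforms. Every element is mapped
 * in its own coroutine on [dispatcher] (an assumed parameter that defaults to
 * Dispatchers.Default), and the results keep the order of the receiver.
 */
suspend fun <A, B> Iterable<A>.pmapCPU(
  dispatcher: CoroutineDispatcher = Dispatchers.Default,
  f: suspend (A) -> B,
): List<B> = withContext(dispatcher) { map { async { f(it) } }.awaitAll() }

fun main() = runBlocking {
  println(listOf(1, 2, 3).pmapCPU { it * it })
}
```

Putting the dispatcher in the signature keeps the CPU-bound intent visible at the call site while still allowing a test dispatcher to be passed in.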

Development

Successfully merging this pull request may close these issues.

Optimize the Database Search API
4 participants