-
Notifications
You must be signed in to change notification settings - Fork 653
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Performance improvements #935
base: cabg_cardio_gmf_htn_trial
Are you sure you want to change the base?
Performance improvements #935
Conversation
Added 3 more commits, bringing the speed up to over 40% on my machines |
Thank you for the pull request!! There is a lot of content to consider, especially the caching and unnecessary object creation. The concurrency fixes will need to be reviewed more closely, because we actually have run Synthea in a multi-threaded fashion before, although not recently. Another consideration we have to review is that the results are exactly the same between runs (assuming the random seeds and initial conditions are the same). Repeatability is important, so we'll need to review your changes from that perspective. Finally, I can tell you we won't accept your implementation of random Gaussian, even if you cite the great Donald Knuth (which I appreciate) and even if it speeds up the software by an order of magnitude. In this case, when the tradeoff is speed and using the standard library, I'm going to err on the side of the standard library 99 times out of 100. Finally, it appears as though some of the unit tests are failing (you can check these locally by running I may have additional thoughts after a closer review, but thank you again! |
Threading is still used in the application today for population generation. But since the threading assigns a single thread to work on a single Person object, everything that is solely used from within that Person object doesn't need to be thread safe. This is where I have replaced the ConcurrentHashMap instances with the HashMap instances. You will notice that for the clone() method of the Module class I have not replaced it, because I was unsure of its used across threads and did not have time to investigate, so in that case I simply fixed the construction to be single shot instead of multiple calls to put(). The constructor is inherently thread safe because the object doesn't really "exist" until after the constructor is finished, so ConcurrentHashMap doesn't have to deal with locks inside the constructor. The put on the other hand has to be thread safe, so there are locks involved which severely slow it down. So in the runtime of Module.clone() was greatly improved, while keeping the runtime penalty on the ConcurrentHashMap.get() when later used. Some objects are just very expensive to construct, and in that case I replaced them with either ThreadLocal instances if the object is not thread safe (eg: Calendar), or with a simple static instance where it is at least used in a thread safe manner (eg: NormalDistribution) "My" implementation of the random generator is actually copy&paste of the JVM implementation (See https://github.com/AdoptOpenJDK/openjdk-jdk11/blob/master/src/java.base/share/classes/java/util/Random.java#L583 ) The only difference being that the method signature is not annotated with "synchronized". I was simply removing the performance penalty of the implicit lock, since we do not need thread safety (the Random instance is only used inside Person object which is only accessed from single thread as per above). I understand if you prefer not to use it, just know it's not a "random" implementation of Random :) The unit tests failing are the same one that are already failing on the cabg_cardio_gmf_htn_trial branch, my changes did not fix those break, but also haven't caused new break as far as I've noticed). If you have any more concerns, let me know and I'll do my best to explain the changes. |
Thank you, I realize that. Not all of our experiments are public, and some of those have not restricted 1 thread to 1 patient as you see here. So, it is something for us to consider. Regarding the Gaussian implementation, please revert that change from the Pull Request. Regarding the Unit Test failures -- OK, understood. Thank you. |
Reduced runtime by around 30% when tested on two different machines, a 4c/8t laptop, and a 40c/80t server.