
Avoid synchronization on GetQueues #116

Draft
klockla wants to merge 11 commits into master from concurrentlinkedhashmap
Conversation

@klockla (Collaborator) commented Jan 22, 2025

The purpose of this PR is to avoid the synchronization on the internal queue map.

The synchronized(getQueues()) call causes contention during long queue traversals, which are very likely to occur during calls to countURLs and ListURLs.

This change is based on the ConcurrentOrdereredMap class, which uses a StampedLock to provide fine-grained locking over the concurrent map.

(An alternative, ConcurrentLinkedHashMap, is based on ReentrantLock; the two still need to be compared under stress tests.)
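To illustrate the idea behind the change (a minimal sketch, not the PR's actual code; all class and method names here are hypothetical): values live in a ConcurrentHashMap so plain lookups never block, and a StampedLock only guards the insertion-order index, so long ordered traversals such as countURLs/ListURLs contend on a shared read lock instead of a global monitor.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.locks.StampedLock;

// Hypothetical sketch of a StampedLock-backed ordered map:
// - data lives in a ConcurrentHashMap, so get() is lock-free
// - a StampedLock guards only the insertion-order index, so ordered
//   iteration takes a read lock instead of a global synchronized block
class OrderedConcurrentMap<K, V> {
    private final ConcurrentMap<K, V> data = new ConcurrentHashMap<>();
    private final Set<K> order = new LinkedHashSet<>();
    private final StampedLock orderLock = new StampedLock();

    V get(K key) {
        return data.get(key); // lock-free read path
    }

    V put(K key, V value) {
        long stamp = orderLock.writeLock();
        try {
            order.add(key); // no-op if present: key keeps its first position
        } finally {
            orderLock.unlockWrite(stamp);
        }
        return data.put(key, value);
    }

    V remove(K key) {
        long stamp = orderLock.writeLock();
        try {
            order.remove(key);
        } finally {
            orderLock.unlockWrite(stamp);
        }
        return data.remove(key);
    }

    // Snapshot of the keys in insertion order; only this path reads the index,
    // and concurrent writers are blocked only for the duration of the copy.
    List<K> orderedKeys() {
        long stamp = orderLock.readLock();
        try {
            return new ArrayList<>(order);
        } finally {
            orderLock.unlockRead(stamp);
        }
    }
}
```

Readers of the ordered view pay for one snapshot copy, but queue lookups never touch the lock at all, which is where the contention in the current code comes from.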

Signed-off-by: Laurent Klock <[email protected]>

Signed-off-by: Laurent Klock <[email protected]>
Finalized AbstractFrontierService modification (removal of synchronizations on queues)

Signed-off-by: Laurent Klock <[email protected]>
@klockla klockla self-assigned this Jan 22, 2025
@jnioche (Collaborator) commented Jan 22, 2025

Thanks @klockla, sounds good.
It would be great to have a test or benchmark comparing the existing approach vs ConcurrentOrdereredMap vs ConcurrentLinkedHashMap.
Given that the same approach is used in key parts of StormCrawler, a better approach would be valuable there too.
Is the code for ConcurrentOrdereredMap and ConcurrentLinkedHashMap from another project, or did you write it? We would need a strong set of unit tests to make sure they are bulletproof.

@klockla klockla force-pushed the concurrentlinkedhashmap branch from c13f23d to 10bc16e Compare February 18, 2025 15:04
Signed-off-by: Laurent Klock <[email protected]>
…erence to the order entry in the value map

(huge boost on remove operation by avoiding searching the matching key)

Signed-off-by: Laurent Klock <[email protected]>
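The commit above describes keeping a direct reference to the order entry inside the value map, so a remove no longer has to search the order structure for the matching key. A minimal sketch of that idea (all names hypothetical, not the PR's actual code), with each entry carrying a back-reference to its node in a doubly linked order list:

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: every map entry stores a reference to its own node
// in the insertion-order list, so remove() unlinks that node in O(1)
// instead of scanning the list for the matching key.
class OrderedMapO1Remove<K, V> {
    private static final class Node<K> {
        final K key;
        Node<K> prev, next;
        Node(K key) { this.key = key; }
    }

    private static final class Entry<K, V> {
        final V value;
        final Node<K> node; // back-reference into the order list
        Entry(V value, Node<K> node) { this.value = value; this.node = node; }
    }

    private final ConcurrentHashMap<K, Entry<K, V>> map = new ConcurrentHashMap<>();
    private Node<K> head, tail; // order list, guarded by the synchronized methods

    V get(K key) {
        Entry<K, V> e = map.get(key);
        return e == null ? null : e.value;
    }

    synchronized void put(K key, V value) {
        Node<K> node = new Node<>(key);
        if (tail == null) { head = tail = node; }
        else { tail.next = node; node.prev = tail; tail = node; }
        map.put(key, new Entry<>(value, node));
    }

    synchronized V remove(K key) {
        Entry<K, V> e = map.remove(key);
        if (e == null) return null;
        Node<K> n = e.node; // O(1): no search through the order list
        if (n.prev != null) n.prev.next = n.next; else head = n.next;
        if (n.next != null) n.next.prev = n.prev; else tail = n.prev;
        return e.value;
    }

    synchronized K firstKey() {
        return head == null ? null : head.key;
    }
}
```

The sketch uses coarse synchronized methods for brevity; the point it demonstrates is only the O(1) unlink via the stored node reference, which is what the commit message credits for the "huge boost" on remove.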
@klockla klockla force-pushed the concurrentlinkedhashmap branch from 10bc16e to 0f52dcc Compare February 18, 2025 15:31
Signed-off-by: Laurent Klock <[email protected]>
@klockla (Collaborator, Author) commented Feb 19, 2025

Hello @jnioche,
I tried some other implementations with different combinations of data structures and locking mechanisms.
So far, the best candidate according to my tests is ConcurrentStripedOrderedMap (yes, I'll change this name later :) ).
See some benchmarks in the attached Excel file:

ConcurentOrderMapTests.xlsx

I'm going to drop the others, but your feedback is welcome.

@jnioche (Collaborator) commented Feb 19, 2025

Thanks. What's the difference between Bench, ReadHeavy and WriteHeavy?
Is Synchronized LinkedHashMap what we currently have, and isn't it performing better on Bench and WriteHeavy?
Is this code taken from somewhere, or did you write it from scratch?

@klockla (Collaborator, Author) commented Feb 20, 2025

The code is written from scratch. You can see the different performance tests in ConcurrentMapStressTest.java
The purpose of these tests is to measure performance under heavy load, with many threads doing many operations on the maps.

Basically, in Bench every thread just does puts and gets, plus a remove every 2 loop cycles.
ReadHeavy mostly does gets, with a put every 1000 cycles (on a pre-initialized map).
WriteHeavy is the opposite: a put every cycle, a get every 100 cycles and a remove every 500.
Yes, the synchronized LinkedHashMap performs a bit better on Bench and WriteHeavy, but the whole point is to have a data structure that avoids a global synchronization while we're iterating over its elements.
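The workload mixes described above can be sketched as simple per-thread loops (hypothetical shapes based on this description only; the real tests are in ConcurrentMapStressTest.java and may differ in detail):

```java
import java.util.Map;

// Hypothetical sketch of the ReadHeavy and WriteHeavy workload mixes:
// each worker thread runs one of these loops against the map under test.
class WorkloadSketch {
    // ReadHeavy: mostly gets, with a put every 1000 cycles.
    static void readHeavy(Map<Integer, String> map, int cycles) {
        for (int i = 0; i < cycles; i++) {
            map.get(i % 1000);
            if (i % 1000 == 0) map.put(i, "v");
        }
    }

    // WriteHeavy: a put every cycle, a get every 100 cycles,
    // a remove every 500 cycles.
    static void writeHeavy(Map<Integer, String> map, int cycles) {
        for (int i = 0; i < cycles; i++) {
            map.put(i, "v");
            if (i % 100 == 0) map.get(i);
            if (i % 500 == 0) map.remove(i / 2);
        }
    }
}
```

Running the same loops from many threads against each candidate map, and timing the total, is what produces the throughput numbers in the attached spreadsheet.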
