Hi, I have a question about containers in the QUASI_CLOSED state. I've read the design document [1] and a comment in the source [2], and I understand that a container ends up QUASI_CLOSED when it could not be closed properly through Ratis consensus.
So I have some questions about the state:
Q1: Do we have the FORCE_CLOSE transition [2] as of Ozone 1.3?
Q2: Alternatively, can we manually or automatically move QUASI_CLOSED containers to the CLOSED state? (Does HDDS-7980 fix that?)
Q3: If the BCS ID does not match among the DNs, I think the replica with the highest BCS ID will be chosen [1]. Do we have handlers that do that? Or how do we fix them manually (e.g., remove the other replicas and trigger the under-replicated handler)?
We've got these errors:
2023-05-10 11:11:24,565 [qtp1209411469-737118] INFO org.apache.hadoop.hdds.scm.storage.BlockInputStream: Unable to read information for block conID: 2785903 locID: 107544262875088826 bcsId: 9298 from pipeline PipelineID=53f5bfe7-b15c-405f-82ba-f3a507e437bf: Unable to find the block with bcsID 9298 .Container 2785903 bcsId is 9285.
2023-05-10 11:11:24,567 [qtp1209411469-737118] WARN org.apache.hadoop.hdds.scm.storage.BlockInputStream: No new pipeline for block conID: 2785903 locID: 107544262875088826 bcsId: 9298
2023-05-10 11:11:24,567 [qtp1209411469-737118] WARN org.eclipse.jetty.server.HttpChannel: handleException /path/to/file org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Unable to find the block with bcsID 9298 .Container 2785903 bcsId is 9285.
2023-05-10 11:11:24,567 [qtp1209411469-737118] WARN org.eclipse.jetty.server.HttpChannelState: unhandled due to prior sendError
javax.servlet.ServletException: javax.servlet.ServletException: org.glassfish.jersey.server.ContainerException: org.apache.hadoop.hdds.scm.container.common.helpers.StorageContainerException: Unable to find the block with bcsID 9298 .Container 2785903 bcsId is 9285.
@k5342
The error you hit occurs when the container is in the QUASI_CLOSED state and the DN that served the read is not in sync.
The FORCE_CLOSE transition is available in 1.3.
The QUASI_CLOSED to CLOSED transition is done automatically by the replication manager, "based on the number of DNs in quasi-closed for the container being more than a majority of the replication factor".
It chooses the highest bcsId among all the replicas available at SCM and uses that.
The replication manager then replicates the replica with the highest bcsId and deletes the replicas with a lower bcsId.
So if a container is not coming out of the QUASI_CLOSED state, check whether a majority of the DNs for the container are available.
This mechanism is available in 1.3 using LegacyReplicationManager.
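To make the rule above concrete, here is a minimal Java sketch of the decision as I read it from this answer. It is not the LegacyReplicationManager code: the `Replica` class, the method names, and the datanode names are all hypothetical, and I am assuming "majority" means more than half of the replication factor. The bcsId values 9298 and 9285 are taken from the log above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: none of these types or methods are Ozone APIs.
class QuasiClosedSketch {

  // Hypothetical, simplified view of one container replica as SCM might see it.
  static final class Replica {
    final String datanode;
    final String state;
    final long bcsId;

    Replica(String datanode, String state, long bcsId) {
      this.datanode = datanode;
      this.state = state;
      this.bcsId = bcsId;
    }
  }

  // Assumption: force-close is possible when more than half of the
  // replication factor reports the container as QUASI_CLOSED.
  static boolean canForceClose(List<Replica> replicas, int replicationFactor) {
    long quasiClosed = replicas.stream()
        .filter(r -> "QUASI_CLOSED".equals(r.state))
        .count();
    return quasiClosed > replicationFactor / 2;
  }

  // The highest bcsId among the known replicas (-1 if there are none).
  static long highestBcsId(List<Replica> replicas) {
    return replicas.stream().mapToLong(r -> r.bcsId).max().orElse(-1L);
  }

  // Datanodes whose replica is behind the highest bcsId; per the answer,
  // these replicas are the ones that get deleted and re-replicated.
  static List<String> staleDatanodes(List<Replica> replicas) {
    long max = highestBcsId(replicas);
    List<String> stale = new ArrayList<>();
    for (Replica r : replicas) {
      if (r.bcsId < max) {
        stale.add(r.datanode);
      }
    }
    return stale;
  }

  public static void main(String[] args) {
    List<Replica> replicas = Arrays.asList(
        new Replica("dn1", "QUASI_CLOSED", 9298L),
        new Replica("dn2", "QUASI_CLOSED", 9298L),
        new Replica("dn3", "QUASI_CLOSED", 9285L));
    System.out.println("can force close: " + canForceClose(replicas, 3));
    System.out.println("highest bcsId:   " + highestBcsId(replicas));
    System.out.println("stale replicas:  " + staleDatanodes(replicas));
  }
}
```

Under these assumptions, a read served by the hypothetical dn3 would fail the same way as in the log, because its replica is at bcsId 9285 while the block was committed at 9298.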
Footnotes
[1] https://docs.google.com/document/d/1vqZoafqIjueqSwoYRIDaCwz9XLpsNoyizlxHXmYrHWI/preview
[2] https://github.com/apache/ozone/blob/ozone-1.3/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/container/ContainerStateManager.java#L52