-
Notifications
You must be signed in to change notification settings - Fork 502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-11462. Enhancing DataNode I/O Monitoring Capabilities. #7206
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @slfan1989. I only reviewed the metrics changes for now since the web UI changes are not something I'm too familiar with.
clientProtocolServer = new HddsDatanodeClientProtocolServer( | ||
datanodeDetails, conf, HddsVersionInfo.HDDS_VERSION_INFO, | ||
reconfigurationHandler); | ||
|
||
serviceRuntimeInfo.setRpcPort(String.valueOf(clientProtocolServer.getClientRpcAddress().getPort())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The datanode has multiple RPC ports for various purposes, see here. This one is specifically the one used by the client for config reload. Do other components have an RPC port metric, and do they add more information about what this port is used for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you very much for your question!
I hope to present the RPC port needed for client connections, which might be more appropriately named ClientRpcPort
. I should add some comments to make the code clearer.
-
OM
displays theRpcPort (9862)
in JMX.
ozone/hadoop-ozone/ozone-manager/src/main/java/org/apache/hadoop/ozone/om/OMMXBean.java
Lines 28 to 33 in d3899d2
@InterfaceAudience.Private public interface OMMXBean extends ServiceRuntimeInfo { String getRpcPort(); List<List<String>> getRatisRoles(); -
SCM
currently displaysDatanodeRpcPort (9861)
andClientRpcPort (9860)
in JMX.
ozone/hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/server/SCMMXBean.java
Lines 34 to 50 in d3899d2
/** * Get the SCM RPC server port that used to listen to datanode requests. * @return SCM datanode RPC server port */ String getDatanodeRpcPort(); /** * Get the SCM RPC server port that used to listen to client requests. * @return SCM client RPC server port */ String getClientRpcPort(); /** * Returns safe mode status. * @return boolean */ boolean isInSafeMode();
@@ -303,15 +304,17 @@ public void start() { | |||
datanodeDetails.setPort(DatanodeDetails.newPort(HTTPS, | |||
httpServer.getHttpsAddress().getPort())); | |||
} | |||
serviceRuntimeInfo.setHttpPort(String.valueOf(httpServer.getHttpAddress().getPort())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about the HTTPS port and address? are other components publishing these as separate metrics?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are correct. we should add the corresponding port metric for HTTPS
. As for the hostname
, I believe we can have HTTP
and HTTPS
share the same one, since they are theoretically consistent.
...tainer-service/src/main/java/org/apache/hadoop/ozone/container/common/impl/ContainerSet.java
Outdated
Show resolved
Hide resolved
public void setContainers(long count) { | ||
this.containers = count; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This class currently pulls most of its metrics from the volume
field and does not depend on external setters except for dbCompactLatency
. Can we do the same for this metric?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I came up with a solution. we will inject the ContainerController
into the HddsVolume
to make it easier to retrieve the container count on each disk. What do you think about this approach?
@@ -110,9 +112,19 @@ public void scanContainer(Container<?> c) | |||
|
|||
@Override | |||
public Iterator<Container<?>> getContainerIterator() { | |||
recordContainersMetric(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only called at the beginning of a background scan, which on dense volumes may happen once every few weeks. Real time updates as containers are moved, created, deleted, should be plugged directly into the ContainerSet
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was very helpful and helped me better understand this part of the code. I’ve improved the code. If you have time, please take a look.
@errose28 Thank you very much for reviewing the code! I will make improvements to it as soon as possible. |
@errose28 I have made some modifications to the PR code. Could you please take some time to review it again? Thank you very much! |
@ivandika3 This pr involves some changes to the display of DN. I'm wondering if you could help with the review? cc: @errose28 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@slfan1989 Thanks for the patch. Overall LGTM. Left some comments.
hadoop-hdds/container-service/src/main/resources/webapps/hddsDatanode/dn-overview.html
Show resolved
Hide resolved
hadoop-hdds/container-service/src/main/resources/webapps/hddsDatanode/dn-overview.html
Show resolved
Hide resolved
hadoop-hdds/container-service/src/main/resources/webapps/hddsDatanode/dn.js
Outdated
Show resolved
Hide resolved
hadoop-hdds/framework/src/main/resources/webapps/static/templates/overview.html
Outdated
Show resolved
Hide resolved
hadoop-hdds/container-service/src/main/resources/webapps/hddsDatanode/dn.js
Show resolved
Hide resolved
@ivandika3 Thank you very much for helping to review the code! I will improve it as soon as possible based on your suggestions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@slfan1989 Thanks for the update. LGTM +1.
Please help to do final verification on the pages and update the screenshot in the description page.
hadoop-hdds/container-service/src/main/resources/webapps/hddsDatanode/dn-overview.html
Show resolved
Hide resolved
hadoop-hdds/container-service/src/main/resources/webapps/hddsDatanode/dn.js
Show resolved
Hide resolved
@ivandika3 Thank you very much for reviewing this PR. I will prepare the relevant screenshots as soon as possible.
|
Thanks @slfan1989 for the patch and @errose28 for the review. |
@ivandika3 @errose28 Thank you very much for the review! |
What changes were proposed in this pull request?
In this enhanced functionality, we have added the following features:
What is the link to the Apache JIRA
JIRA: HDDS-11462. Enhancing DataNode I/O Monitoring Capabilities.
How was this patch tested?
Junit Test.