HDDS-11462. Enhancing DataNode I/O Monitoring Capabilities. #7206

Merged: 3 commits, Nov 8, 2024

@slfan1989 slfan1989 commented Sep 16, 2024

What changes were proposed in this pull request?

This PR adds the following enhancements:

  1. DataNode UI now supports displaying heartbeat information.

image

  2. The DataNode Volume list now shows the number of containers.

image

  3. DataNode includes new tabs for I/O Stats and Data Scanner.

IO Status

image

Data Scanner

image

  4. I/O Stats now supports more detailed performance metrics.

{
    "name" : "Hadoop:service=HddsDatanode,name=VolumeIOStats-/data2/ozonedata/hddsdata",
    "modelerType" : "VolumeIOStats-/data2/ozonedata/hddsdata",
    "tag.StorageDirectory" : "/data2/ozonedata/hddsdata/hdds",
    "tag.Hostname" : "bigdata-s2688-hdp446.apache01",
    "ReadLatency60sNumOps" : 0,
    "ReadLatency60s50thPercentileLatency" : 0,
    "ReadLatency60s75thPercentileLatency" : 0,
    "ReadLatency60s90thPercentileLatency" : 0,
    "ReadLatency60s95thPercentileLatency" : 0,
    "ReadLatency60s99thPercentileLatency" : 0,
    "WriteLatency60sNumOps" : 0,
    "WriteLatency60s50thPercentileLatency" : 0,
    "WriteLatency60s75thPercentileLatency" : 0,
    "WriteLatency60s90thPercentileLatency" : 0,
    "WriteLatency60s95thPercentileLatency" : 0,
    "WriteLatency60s99thPercentileLatency" : 0,
    "ReadBytes" : 128011956,
    "ReadOpCount" : 131,
    "ReadTimeNumOps" : 131,
    "ReadTimeAvgTime" : 1.3125,
    "WriteBytes" : 0,
    "WriteOpCount" : 0,
    "WriteTimeNumOps" : 0,
    "WriteTimeAvgTime" : 0.0
}

  5. Optimized the display of JVM information in the DataNode UI.

image

image
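
For readers of the VolumeIOStats sample in this description, the following is a minimal sketch of how a few summary figures could be derived from the raw counters. It is not code from this patch: the class `VolumeIoStatsSummary` and its helper methods are hypothetical, and the time unit of `ReadTimeAvgTime` is not stated in the description, so the total is left unit-less.

```java
import java.util.Locale;

/** Hypothetical helper: derives summary figures from the VolumeIOStats sample. */
final class VolumeIoStatsSummary {

    /** Average bytes per operation; returns 0 when no operations were recorded. */
    static double avgBytesPerOp(long bytes, long opCount) {
        return opCount == 0 ? 0.0 : (double) bytes / opCount;
    }

    /** Approximate total time spent: operation count times average time per op. */
    static double totalTime(long numOps, double avgTime) {
        return numOps * avgTime;
    }

    public static void main(String[] args) {
        // Values copied from the sample bean in the PR description.
        long readBytes = 128_011_956L;
        long readOpCount = 131L;
        long readTimeNumOps = 131L;
        double readTimeAvgTime = 1.3125;

        System.out.printf(Locale.ROOT, "avg bytes per read: %.1f%n",
            avgBytesPerOp(readBytes, readOpCount));
        // Reported in whatever unit ReadTimeAvgTime uses (not confirmed here).
        System.out.printf(Locale.ROOT, "total read time: %.4f%n",
            totalTime(readTimeNumOps, readTimeAvgTime));
        // All write counters are zero in the sample, so the guard returns 0.0.
        System.out.printf(Locale.ROOT, "avg bytes per write: %.1f%n",
            avgBytesPerOp(0L, 0L));
    }
}
```

The zero guard matters for this sample because every write counter is 0, which would otherwise produce a division by zero result of NaN.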

What is the link to the Apache JIRA?

JIRA: HDDS-11462. Enhancing DataNode I/O Monitoring Capabilities.

How was this patch tested?

JUnit tests.

@slfan1989 slfan1989 marked this pull request as ready for review September 16, 2024 11:22
@errose28 errose28 left a comment

Thanks for working on this @slfan1989. I only reviewed the metrics changes for now since the web UI changes are not something I'm too familiar with.

clientProtocolServer = new HddsDatanodeClientProtocolServer(
datanodeDetails, conf, HddsVersionInfo.HDDS_VERSION_INFO,
reconfigurationHandler);

serviceRuntimeInfo.setRpcPort(String.valueOf(clientProtocolServer.getClientRpcAddress().getPort()));
Contributor

The datanode has multiple RPC ports for various purposes, see here. This one is specifically the one used by the client for config reload. Do other components have an RPC port metric, and do they add more information about what this port is used for?

Contributor Author

Thank you very much for your question!

I intended to expose the RPC port that clients connect to, so ClientRpcPort might be a more appropriate name. I will add some comments to make the code clearer.

@@ -303,15 +304,17 @@ public void start() {
datanodeDetails.setPort(DatanodeDetails.newPort(HTTPS,
httpServer.getHttpsAddress().getPort()));
}
serviceRuntimeInfo.setHttpPort(String.valueOf(httpServer.getHttpAddress().getPort()));
Contributor

What about the HTTPS port and address? Are other components publishing these as separate metrics?

Contributor Author

You are correct. We should add the corresponding port metric for HTTPS. As for the hostname, I believe HTTP and HTTPS can share the same one, since they are theoretically consistent.

Comment on lines 165 to 163
public void setContainers(long count) {
this.containers = count;
}
Contributor

This class currently pulls most of its metrics from the volume field and does not depend on external setters except for dbCompactLatency. Can we do the same for this metric?

Contributor Author

I came up with a solution. We will inject the ContainerController into the HddsVolume to make it easier to retrieve the container count on each disk. What do you think about this approach?
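
To illustrate the idea discussed in this thread, here is a minimal sketch of a metric that is read through the volume rather than pushed in by an external setter. It is not the code from this patch: `PullBasedGaugeSketch`, `Volume`, and `VolumeMetrics` are hypothetical stand-ins, and the `LongSupplier` takes the place of whatever the injected controller would provide.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

/** Hypothetical sketch: a metrics class that pulls its value from the volume. */
final class PullBasedGaugeSketch {

    /** Stand-in for a volume that can report its own container count. */
    static final class Volume {
        private final String storageDir;
        private final LongSupplier containerCounter;

        Volume(String storageDir, LongSupplier containerCounter) {
            this.storageDir = storageDir;
            this.containerCounter = containerCounter;
        }

        String getStorageDir() {
            return storageDir;
        }

        long getContainers() {
            return containerCounter.getAsLong();
        }
    }

    /** Stand-in for a metrics holder with no setter for the container count. */
    static final class VolumeMetrics {
        private final Volume volume;

        VolumeMetrics(Volume volume) {
            this.volume = volume;
        }

        /** Reads through to the volume on every call, so the value is never stale. */
        long getContainers() {
            return volume.getContainers();
        }
    }

    public static void main(String[] args) {
        // The AtomicLong simulates the live count owned by another component.
        AtomicLong liveCount = new AtomicLong();
        Volume volume = new Volume("/data2/ozonedata/hddsdata", liveCount::get);
        VolumeMetrics metrics = new VolumeMetrics(volume);

        System.out.println(volume.getStorageDir() + " containers: " + metrics.getContainers());
        liveCount.incrementAndGet();
        System.out.println(volume.getStorageDir() + " containers: " + metrics.getContainers());
    }
}
```

Because the metrics object holds no copy of the count, there is no setter that a caller could forget to invoke.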

@@ -110,9 +112,19 @@ public void scanContainer(Container<?> c)

@Override
public Iterator<Container<?>> getContainerIterator() {
recordContainersMetric();
Contributor

This is only called at the beginning of a background scan, which on dense volumes may happen once every few weeks. Real-time updates as containers are moved, created, or deleted should be plugged directly into the ContainerSet.

Contributor Author

Your comment was very helpful and gave me a better understanding of this part of the code. I’ve improved the code. If you have time, please take a look.
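
The reviewer's point in this thread, updating the count whenever a container is added or removed instead of only at the start of a scan, can be sketched roughly as follows. This is an illustration under stated assumptions, not the change made in this patch: `ContainerCountSketch` and its methods are hypothetical and do not mirror the real ContainerSet API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: per-volume container counts kept current on every change. */
final class ContainerCountSketch {
    // containerId -> volume path that holds the container
    private final Map<Long, String> containerToVolume = new HashMap<>();
    // volume path -> number of containers currently on that volume
    private final Map<String, Long> countsByVolume = new HashMap<>();

    /** Registers a container; returns false if the ID is already present. */
    synchronized boolean addContainer(long containerId, String volumePath) {
        if (containerToVolume.putIfAbsent(containerId, volumePath) != null) {
            return false;
        }
        countsByVolume.merge(volumePath, 1L, Long::sum);
        return true;
    }

    /** Unregisters a container; returns false if the ID is unknown. */
    synchronized boolean removeContainer(long containerId) {
        String volumePath = containerToVolume.remove(containerId);
        if (volumePath == null) {
            return false;
        }
        countsByVolume.merge(volumePath, -1L, Long::sum);
        return true;
    }

    /** Current count for a volume; 0 for a volume that was never seen. */
    synchronized long containerCount(String volumePath) {
        return countsByVolume.getOrDefault(volumePath, 0L);
    }

    public static void main(String[] args) {
        ContainerCountSketch set = new ContainerCountSketch();
        String volume = "/data2/ozonedata/hddsdata";
        set.addContainer(1L, volume);
        set.addContainer(2L, volume);
        set.removeContainer(1L);
        System.out.println(volume + " containers: " + set.containerCount(volume));
    }
}
```

The methods are synchronized for simplicity, so both maps always change together. A real implementation would likely need a finer-grained concurrency strategy.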

@slfan1989

@errose28 Thank you very much for reviewing the code! I will make improvements to it as soon as possible.

@slfan1989

@errose28 I have made some modifications to the PR code. Could you please take some time to review it again? Thank you very much!

@slfan1989

@ivandika3 This PR involves some changes to the DataNode (DN) display. Could you help with the review?
I would like to keep this PR moving forward.

cc: @errose28

@ivandika3 ivandika3 self-requested a review October 16, 2024 07:00
@ivandika3 ivandika3 left a comment

@slfan1989 Thanks for the patch. Overall LGTM. Left some comments.

@slfan1989

@ivandika3 Thank you very much for helping to review the code! I will improve it as soon as possible based on your suggestions.

@ivandika3 ivandika3 left a comment

@slfan1989 Thanks for the update. LGTM +1.

Please do a final verification of the pages and update the screenshots in the PR description.

@slfan1989 slfan1989 commented Nov 7, 2024

@ivandika3 Thank you very much for reviewing this PR. I will prepare the relevant screenshots as soon as possible.

Overview

image

image

IOStatus

image

Data Scanner

image

@ivandika3 ivandika3 merged commit 4e603aa into apache:master Nov 8, 2024
40 checks passed
@ivandika3

Thanks @slfan1989 for the patch and @errose28 for the review.

@slfan1989

@ivandika3 @errose28 Thank you very much for the review!
