Skip to content

ES-10063 Add multi-project support for more stats APIs #127650

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

PeteGillinElastic
Copy link
Member

@PeteGillinElastic PeteGillinElastic commented May 2, 2025

This affects the following APIs:

  • GET _nodes/stats:
    • For indices, it now prefixes the index name with the project ID (for non-default projects). Previously, it didn't tell you which project an index was in, and it failed if two projects had the same index name.
    • For ingest, it now gets the pipeline and processor stats for all projects, and prefixes the pipeline ID with the project ID. Previously, it only got them for the default project.
  • GET /_cluster/stats:
    • For ingest, it now aggregates the pipeline and processor stats for all projects. Previously, it only got them for the default project.
  • GET /_info:
    • For ingest, same as for GET /_nodes/stats.

It does not add multi-project support for the repositories section on _nodes/stats, since repositories aren't MP-ready yet AFAIK.

This is done by making IndicesService.stats() and IngestService.stats() include project IDs in the NodeIndicesStats and IngestStats objects they return, and making those stats objects incorporate the project IDs when converting to XContent.

The transitive callers of these two methods are rather extensive (including all callers to NodeService.stats(), all callers of TransportNodesStatsAction, and so on). To ensure the change is safe, the callers were all checked out, and they fall into the following cases:

  • The behaviour change is one of the desired enhancements described above.
  • There is no behaviour change because it was getting node stats but neither indices nor ingest stats were requested.
  • There is no behaviour change because it was getting indices and/or ingest stats but only using aggregate values.
  • In MachineLearningUsageTransportAction and TransportGetTrainedModelsStatsAction, the IngestStats returned will return stats from all projects instead of just the default with this change, but they have been changed to filter the non-default project stats out, so this change is a noop there. (These actions are not MP-ready yet.)
  • MonitoringService will be affected, but this is the legacy monitoring module which is not in use anywhere that MP is going to be enabled. (If anything, the behaviour is probably improved by this change, as it will now include project IDs, rather than producing ambiguous unqualified results and failing in the case of duplicates.)

This affects the following APIs:
 - `GET _nodes/stats`:
   - For `indices`, it now prefixes the index name with the project ID (for non-default projects). Previously, it didn't tell you which project an index was in, and it failed if two projects had the same index name.
   - For `ingest`, it now gets the pipeline and processor stats for all projects, and prefixes the pipeline ID with the project ID. Previously, it only got them for the default project.
 - `GET /_cluster/stats`:
   - For `ingest`, it now aggregates the pipeline and processor stats for all projects. Previously, it only got them for the default project.
 - `GET /_info`:
   - For `ingest`, same as for `GET /_nodes/stats`.

This is done by making `IndicesService.stats()` and `IngestService.stats()` include project IDs in the `NodeIndicesStats` and `IngestStats` objects they return, and making those stats objects incorporate the project IDs when converting to XContent.

The transitive callers of these two methods are rather extensive (including all callers to `NodeService.stats()`, all callers of `TransportNodesStatsAction`, and so on). To ensure the change is safe, the callers were all checked out, and they fall into the following cases:
 - The behaviour change is one of the desired enhancements described above.
 - There is no behaviour change because it was getting node stats but neither `indices` nor `ingest` stats were requested.
 - There is no behaviour change because it was getting `indices` and/or `ingest` stats but only using aggregate values.
 - In `MachineLearningUsageTransportAction` and `TransportGetTrainedModelsStatsAction`, the `IngestStats` returned will return stats from all projects instead of just the default with this change, but they have been changed to filter the non-default project stats out, so this change is a noop there. (These actions are not MP-ready yet.)
 - `MonitoringService` will be affected, but this is the legacy monitoring module which is not in use anywhere that MP is going to be enabled. (If anything, the behaviour is probably improved by this change, as it will now include project IDs, rather than producing ambiguous unqualified results and failing in the case of duplicates.)
@PeteGillinElastic PeteGillinElastic added >non-issue :Data Management/Stats Statistics tracking and retrieval APIs v9.1.0 labels May 2, 2025
@PeteGillinElastic PeteGillinElastic marked this pull request as ready for review May 2, 2025 16:40
@elasticsearchmachine elasticsearchmachine added the Team:Data Management Meta label for data/management team label May 2, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-data-management (Team:Data Management)

@PeteGillinElastic PeteGillinElastic changed the title Add multi-project support for more stats APIs ES-10063 Add multi-project support for more stats APIs May 2, 2025
Copy link
Contributor

@nielsbauman nielsbauman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for working on this, Pete! I did a first round of review by just looking at the production files. I'll have a look at the tests later on. This feels like the right direction; my comments are mostly small and/or thinking out loud.

Comment on lines +717 to +722
for (Map.Entry<
ProjectId,
Map<String, List<org.elasticsearch.ingest.IngestStats.ProcessorStat>>> processorStatsForProject : nodeStat
.getIngestStats()
.processorStats()
.entrySet()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we extract some variables here to keep this a bit more readable? The multi-line types inside a for loop declaration is a bit intense. Same goes for the loop below.

Also, is there a reason this class is fully qualified instead of imported? I don't immediately see any name conflicts but maybe I'm missing something. I still think it's better to extract variables - I'm personally not a big fan of multi-line for loop declarations but maybe that's just me - but it would already help to import the class.

String,
List<org.elasticsearch.ingest.IngestStats.ProcessorStat>> processorStats : processorStatsForProject.getValue()
.entrySet()) {
pipelineIdsByProject.computeIfAbsent(projectId, k -> new HashSet<>()).add(processorStats.getKey());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to keep a map of all the pipeline IDs, right? Since we only need the count in the end, we can just keep one set when we loop over the processors in the project and then add the size of the set to an int/long variable that's initialized outside all the loops (technically it could cause an integer overflow, but I don't think that's something we really need to protect against here).

Comment on lines +104 to +105
* Constructs an instance. If {@code projectsByIndex} argument is non-null, then the index-to-project map will be stored, and the
* project IDs will be prepended to the index names when converting this instance to XContent (except when it is the default project).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you mean empty instead of non-null. Also, I think the then the index-to-project map will be stored part is not relevant anymore as we're always storing that field? But maybe I'm misinterpreting it.

Comment on lines +322 to +324
if (Objects.equals(projectId, Metadata.DEFAULT_PROJECT_ID)) {
return index.getName();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little bit on the fence about this approach. Thinking out loud: on the one hand, this maintains BwC with little effort. On the other hand, if we ever have indices in the default project this'll be a bit off. It's been mentioned a handful of times that we might want to consider (so, super hypothetically) having a shared cluster-level GeoIP index to avoid downloading the same DB over again. We shouldn't make this decision purely based on the GeoIP index, but the idea of the faint possibility of cluster-level indices is something to think about. At the same time, this approach doesn't "break" anything in that case either, it just makes the output inconsistent. So, we either try to account for that situation proactively now, or we decide to go with this approach and we solve that issue if/when it comes. Any thoughts?

static List<PipelineStat> merge(List<PipelineStat> first, List<PipelineStat> second) {
var totalsPerPipeline = new HashMap<String, PipelineStat>();
record MergeKey(ProjectId projectId, String pipelineId) {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TIL you could define records inside methods! Do you have any idea how that compiles? I'm mostly curious if the compiler extracts that record definition somewhere or whether it's going to redefine that record on every invocation of this method.

@@ -11,7 +11,7 @@ esplugin {

dependencies {
testImplementation project(path: ':test:test-clusters')
clusterModules project(':test:external-modules:test-multi-project')
clusterModules(project(':test:external-modules:test-multi-project'), project(":modules:ingest-common"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the more common syntax is:

Suggested change
clusterModules(project(':test:external-modules:test-multi-project'), project(":modules:ingest-common"))
clusterModules project(':test:external-modules:test-multi-project')
clusterModules project(":modules:ingest-common")

@@ -79,7 +79,7 @@ static NodeIndicesStats buildNodeIndicesStats(RoutingNode routingNode, long byte
IndexShardStats shardStats = new IndexShardStats(shardId, new ShardStats[] { shardStat });
indexStats.computeIfAbsent(shardId.getIndex(), k -> new ArrayList<>()).add(shardStats);
}
return new NodeIndicesStats(COMMON_STATS, Map.of(), indexStats, true);
return new NodeIndicesStats(COMMON_STATS, Map.of(), indexStats, Map.of(), true);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we add a (small) comment here why were using Map.of() for the projectsByIndex parameter?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some YAML tests from the ingest-common module are muted in this file because they depend on stats APIs:

    '^test/ingest/15_info_ingest/*', // uses cluster info API
    '^test/ingest/70_bulk/Test bulk request with default pipeline', // uses stats API
    '^test/ingest/70_bulk/Test bulk request without default pipeline', // uses stats API

However, because we decided to prefix the project ID, we won't be able to unmute those tests... I don't immediately have a good suggestion, but I thought I'd mention it so we're aware. If we decide that that is OK, we should move these lines to the bottom so they're next to the comment explaining why some stats API tests are muted in MP.

Comment on lines +576 to +579
return indices.values()
.stream()
.map(AbstractIndexComponent::index)
.collect(Collectors.toMap(index -> index, index -> clusterService.state().metadata().projectFor(index).id()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if it's worth it to build this map manually instead of using Streams, for performance reasons. I believe the stats API is called pretty regularly by monitoring agents (but I could be mistaken), and if we're doing this for all indices in all projects, the difference between Streams and non-streams might be noticeable. What do you think?

Comment on lines +576 to +579
return indices.values()
.stream()
.map(AbstractIndexComponent::index)
.collect(Collectors.toMap(index -> index, index -> clusterService.state().metadata().projectFor(index).id()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking out loud: by using Metadata#projectFor, we throw an exception if the index is not found. I'm not familiar enough with the IndicesService to know whether there is a possibility of the service being "behind" the state in ClusterService. For instance, when an index is deleted from the cluster state, is the indices service updated atomically? Or if an index is added, is it first added to the cluster state or the IndicesService? If you don't already have an answer to those questions (which is completely fine), I think it's worth having a look to be sure what kind of behavior we could see here.

Copy link
Contributor

@samxbr samxbr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall! I haven't gone over the tests yet, will run another pass for the tests tomorrow.

public NodeIndicesStats(
CommonStats oldStats,
Map<Index, CommonStats> statsByIndex,
Map<Index, List<IndexShardStats>> statsByShard,
Map<Index, ProjectId> projectsByIndex,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I like this approach of passing in the index to project map

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Data Management/Stats Statistics tracking and retrieval APIs >non-issue Team:Data Management Meta label for data/management team v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants