Project proposal: Registry Data Quality Improvements #2246

Draft
wants to merge 3 commits into
base: main

Conversation

svrnm
Member

@svrnm svrnm commented Jul 31, 2024

This has been discussed in multiple places, and I have tried to write it down in different shapes and forms before, but @mx-psi's issue (open-telemetry/opentelemetry.io#4921) gave me the urgently needed last kick to create and publish this.

Note that this should not be a blocker for the mentioned issue (opentelemetry.io#4921), since that issue is related to the Collector 1.0 release.

@samsp-msft

I was pointed at this by @CodeBlanch.

For areas like opentelemetry-dotnet-contrib, it's unclear once you start delving into the component tree as to what the supported state of the component is, who the owners are and policies for updates etc. If a lot of that data is going to be schematized into the component yaml - then can we also have a component that can be included in the readme to return the data in a common format, like a badge? Maybe this also acts as hyperlink into the registry page for that component?
Maybe also a GitHub Action that looks for and validates the YAML, errors if it's invalid, and otherwise pushes a notification to the central repository to scrape the data or something?

@svrnm
Member Author

svrnm commented Aug 1, 2024

For areas like opentelemetry-dotnet-contrib, it's unclear once you start delving into the component tree as to what the supported state of the component is, who the owners are and policies for updates etc.

This is not unique to .NET contrib. The Collector SIG started to address this with mdatagen, which is why I called it out as a starting point.

If a lot of that data is going to be schematized into the component yaml - then can we also have a component that can be included in the readme to return the data in a common format, like a badge?

That would be great, yes!

Maybe this also acts as hyperlink into the registry page for that component?

Yes:-)

Maybe also a GitHub Action that looks for and validates the YAML, errors if it's invalid, and otherwise pushes a notification to the central repository to scrape the data or something?

That's part of the automation I was thinking about. We have a scraper that works on data from README.md and some other heuristics, which is far from optimal. Having a standardized YAML + schema across the project (and ecosystem) is what this project is about!
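
As a rough illustration of the validation step discussed above, the sketch below checks an already-parsed metadata record against a tiny schema. The field names (`name`, `stability`, `owners`) and the allowed stability values are assumptions made for this example, not the schema this proposal would define, and parsing the YAML file into a `dict` is assumed to happen beforehand.

```python
# Hypothetical minimal schema: field name -> expected Python type.
REQUIRED_FIELDS = {"name": str, "stability": str, "owners": list}

# Assumed set of stability levels, for illustration only.
ALLOWED_STABILITY = {"development", "alpha", "beta", "stable", "deprecated"}


def validate_component(record: dict) -> list[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    # Check presence and type of every required field.
    for field, expected in REQUIRED_FIELDS.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    # Check field-specific constraints only when the type is already right.
    stability = record.get("stability")
    if isinstance(stability, str) and stability not in ALLOWED_STABILITY:
        errors.append(f"stability: unknown value {stability!r}")
    owners = record.get("owners")
    if isinstance(owners, list) and not owners:
        errors.append("owners: must not be empty")
    return errors


if __name__ == "__main__":
    problems = validate_component({"name": "example", "stability": "beta", "owners": []})
    for problem in problems:
        print(problem)
    # A CI job could fail the build here when any problem was reported.
    raise SystemExit(1 if problems else 0)
```

A CI job could run such a check on every pull request and only notify the central repository when the returned list is empty.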

@theletterf
Member

The parent issue is Collector related, but other OTel projects that produce metadata (such as language SDKs) might have different schemas or needs. That complicates things a bit. Would working on a unified metadata schema help here?

Context: We pull upstream metadata into Splunk docs, or produce it downstream, and it's challenging.

@svrnm svrnm added the area/project-proposal Submitting a filled out project template label Aug 5, 2024
@svrnm
Member Author

svrnm commented Aug 6, 2024

The parent issue is Collector related, but other OTel projects that produce metadata (such as language SDKs) might have different schemas or needs. That complicates things a bit. Would working on a unified metadata schema help here?

That's what this project is about (80% of it; the rest is tooling & automation).
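
One way to picture a unified schema is as a common record that differently shaped upstream metadata gets normalized into. The sketch below is only an illustration under stated assumptions: the collector-side input is loosely modelled on the nesting of an `mdatagen` metadata file, the SDK-side input (`package`, `maturity`, `maintainers`) is entirely hypothetical, and the `Component` record is not the schema this project would produce.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical unified record shared by all metadata sources."""
    name: str
    kind: str
    # Maps a stability level to the signals it applies to.
    stability: dict[str, list[str]] = field(default_factory=dict)
    owners: list[str] = field(default_factory=list)


def from_collector(meta: dict) -> Component:
    """Normalize a nested, collector-style record (assumed shape)."""
    status = meta.get("status", {})
    return Component(
        name=meta["type"],
        kind=status.get("class", "unknown"),
        stability=dict(status.get("stability", {})),
        owners=list(status.get("codeowners", {}).get("active", [])),
    )


def from_sdk(meta: dict) -> Component:
    """Normalize a flat, SDK-style record (entirely hypothetical shape)."""
    return Component(
        name=meta["package"],
        kind="instrumentation",
        # A flat record has no per-signal detail, so the signal list is empty.
        stability={meta.get("maturity", "unknown"): []},
        owners=list(meta.get("maintainers", [])),
    )


if __name__ == "__main__":
    print(from_collector({"type": "example", "status": {"class": "receiver"}}))
    print(from_sdk({"package": "example-instrumentation", "maturity": "beta"}))
```

Downstream consumers, such as a docs pipeline, would then only need to understand the unified record rather than each source format.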

projects/registry-data-quality.md
2. After a package has been added to the registry, it is a manual process to verify if the package is still available and
if the meta-data is still correct.
2. Packages lack a lot of information, and the information available is sometimes of bad quality. Metadata that might be available
for packages is not in a human readable format, or in the case of the collector (`mdatagen`-data) not consumed by the registry.
Member

Is there a technical reason why it is not consumed? Or is it just "we have not implemented this"?

Member Author

It's a mix of "we have not implemented this" and "let's make sure that what we consume is in an agreed format" (hence this project proposal).

and how useable is that data. If we address those issues end users will see the registry as a valuable place and will use
it to find the building blocks they need, and by reaching that state, we can use the registry to accomplish other goals, like

- making end users aware of non-otel-community created components they can use. Combined with a system to label "good quality"
Member

I think this is another whole project of its own 😄

Member Author

Agreed, this project would be the foundation for such a project.

@svrnm svrnm mentioned this pull request Sep 4, 2024
Member

@mx-psi mx-psi left a comment

This looks good to me. I guess the big question is the staffing, but the project itself seems sound and important.

Labels
area/project-proposal Submitting a filled out project template
4 participants