[New Feature]: Remove superfluous data from DISP-S1 product GRQ ES #1035
philipjyoon added the enhancement and needs triage labels, Dec 4, 2024
philipjyoon added a commit that referenced this issue, Dec 5, 2024
…han the .nc file. Currently we are repeating the exact same metadata for .xml, .png, and so on, which is unnecessary and makes the DB huge
philipjyoon added a commit that referenced this issue, Dec 6, 2024
…need to commit, deploy, and then test
philipjyoon added a commit that referenced this issue, Dec 6, 2024
…de is hard to test so we need to commit, deploy, and then test
philipjyoon added a commit that referenced this issue, Dec 6, 2024
philipjyoon added a commit that referenced this issue, Dec 6, 2024
philipjyoon added a commit that referenced this issue, Dec 9, 2024
philipjyoon added a commit that referenced this issue, Dec 9, 2024
philipjyoon added a commit that referenced this issue, Dec 9, 2024
philipjyoon added a commit that referenced this issue, Dec 10, 2024
philipjyoon added a commit that referenced this issue, Dec 10, 2024
philipjyoon added a commit that referenced this issue, Dec 10, 2024
philipjyoon added a commit that referenced this issue, Dec 12, 2024
…re repetitive and take up large amount of space
Checked for duplicates
Yes - I've already checked
Alternatives considered
Yes - and alternatives don't suffice
Related problems
No response
Describe the feature request
From Slack:
"Looking at the actual DISP-S1 product metadata, it repeats the exact same information over and over again, and that's why it holds 1 MB worth of data. It lists the lineage and the input file list (which are themselves redundant with each other) 4 times over, once for each of the .nc, .png, .iso.xml files, etc.
Is this all really necessary? As far as I'm aware, we need this metadata for two purposes: 1) to enable bach-ui (do we even run bach-ui anymore?) and 2) DAAC consumption."
"
Ok, I've looked at product2dataset.py, opera_pge_wrapper.py, send_notify_msg.py, send_notify_msg.sh, bach api utils source code, hysds-io.json.send_notify_msg, the docker files for cnm send, and the GRQ rule for triggering the cnm send job, and I'm not seeing anything that consumes the DISP-S1 input file list, lineage list, or localize list. I guess we store all that information for debugging purposes, perhaps.
Those 3 lists repeat 4 times in total; each list is about 500 lines of JSON. So that's roughly 6000 lines of JSON out of 6650. If we drop 3 of the 4 copies and then also drop the localize and input file lists, keeping just one set of lineage, we end up with a ~1200-line JSON file, which is an 80% reduction in data volume. That would yield roughly 200 GB at end of production instead of the previous 1 TB. I think that would be a reasonable volume.
It looks like there are 4 products in the JSON and we are copy-pasting the identical runconfig into each of them. So perhaps we could get rid of that altogether, gain more space savings, and maybe make this process a little easier too. Clearly the runconfig used to generate the png or iso_xml for the same product is identical.
"
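As a sanity check on the numbers quoted above, here is a small sketch of the arithmetic. Every input is one of the rough estimates from the Slack message (6650 total lines, 3 lists, 4 copies, about 500 lines per list, 1 TB end-of-production volume); none of them are measurements, and the function name is only illustrative.

```python
# Rough estimates taken from the Slack message above; not measured values.
TOTAL_LINES = 6650      # lines of JSON in one product metadata document
LINES_PER_LIST = 500    # approximate size of each list
NUM_LISTS = 3           # input file list, lineage list, localize list
NUM_COPIES = 4          # each list is repeated once per product file
CURRENT_VOLUME_GB = 1000  # "previous 1 TB" end-of-production estimate


def estimate_reduction(total=TOTAL_LINES, per_list=LINES_PER_LIST,
                       lists=NUM_LISTS, copies=NUM_COPIES):
    """Return (repeated_lines, remaining_lines, fraction_saved)."""
    repeated = per_list * lists * copies      # lines taken up by the lists
    kept = per_list                           # keep a single lineage list
    remaining = total - repeated + kept       # what is left after trimming
    fraction_saved = 1 - remaining / total
    return repeated, remaining, fraction_saved


repeated, remaining, saved = estimate_reduction()
projected_gb = CURRENT_VOLUME_GB * (1 - saved)

print(f"lines in repeated lists: {repeated}")
print(f"lines remaining:         {remaining}")
print(f"fraction saved:          {saved:.1%}")
print(f"projected volume:        {projected_gb:.0f} GB")
```

With these inputs the remaining size comes out slightly under the ~1200 lines quoted, and the saving slightly above 80%, so the ~200 GB figure in the message is a conservative rounding.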
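One way the proposed trimming could look is sketched below. This is not the code in product2dataset.py; the metadata layout, the key names (`name`, `lineage`, `input_files`, `localize_urls`, `runconfig`), the fourth file suffix, and the function `slim_product_metadata` are all hypothetical and chosen only to illustrate the idea of keeping one lineage and one runconfig on the .nc entry and dropping the repeated lists everywhere else.

```python
import copy
import json


def slim_product_metadata(products, primary_suffix=".nc",
                          drop_keys=("input_files", "localize_urls"),
                          keep_once_keys=("lineage", "runconfig")):
    """Return a trimmed copy of a list of per-file metadata dicts.

    All key names are hypothetical. Keys in drop_keys are removed from
    every entry; keys in keep_once_keys survive only on the entry whose
    name ends with primary_suffix.
    """
    slimmed = []
    for product in products:
        entry = copy.deepcopy(product)  # never mutate the caller's data
        for key in drop_keys:
            entry.pop(key, None)
        if not entry.get("name", "").endswith(primary_suffix):
            for key in keep_once_keys:
                entry.pop(key, None)
        slimmed.append(entry)
    return slimmed


# Synthetic stand-in for the repeated metadata (made-up values).
shared = {
    "lineage": [f"input_{i}.h5" for i in range(500)],
    "input_files": [f"input_{i}.h5" for i in range(500)],
    "localize_urls": [f"input_{i}.h5" for i in range(500)],
    "runconfig": {"example_setting": "value"},
}
# ".other" is a placeholder for the fourth file type, which the issue does not name.
products = [{"name": f"product{ext}", **copy.deepcopy(shared)}
            for ext in (".nc", ".png", ".iso.xml", ".other")]

slim = slim_product_metadata(products)
before = len(json.dumps(products))
after = len(json.dumps(slim))
print(f"serialized size: {before} -> {after} bytes")
```

Keeping the shared fields on the .nc entry only is a design assumption; hoisting them to a single top-level field would save the same space if downstream consumers can be adjusted.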