readers.py chokes on rwl when there are repeated series IDs #33

Open
kanchukaitis opened this issue Jun 27, 2022 · 6 comments
@kanchukaitis
Collaborator

This is a common problem with dplR too. readers.py needs a way to deal with repeated sample IDs: either a verbose warning or a modification of the sample ID (e.g. adding an underscore). In general, we need to test readers.py with a variety of .rwl files (not just the idealized test ones), and we need informative error messages.
viet001.rwl.txt

@AndyBunn
Collaborator

AndyBunn commented Jun 28, 2022 via email

@CosiMichele
Collaborator

CosiMichele commented Aug 22, 2022

With @ifeoluwaale back, we can address this. @kanchukaitis do you have an error message we can see? Or is the file you uploaded here an example of an input file that causes the error?

Edit: are there any other specific sample files from ITRDB that we can look at, so @ifeoluwaale can break readers.py some more (and fix all the problems)?

@kanchukaitis
Collaborator Author

Hi @CosiMichele @ifeoluwaale - yeah, the viet001.rwl is giving an error. Ideally, we should now shoot for being able to acquire ANY ITRDB rwl file and read it in (not just the 3 test files we have). Even if there is ultimately a failure, we need verbose error output showing where the failure occurs. But yes, let's start with the attached viet001.rwl and go from there.
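
As a rough illustration of what "verbose error output of where the failure is occurring" could look like, here is a minimal sketch. It is not the actual readers.py code. `RwlParseError` and `parse_rwl_lines` are hypothetical names, and the sketch assumes a plain fixed-width decadal layout (series ID in columns 1-8, decade year in columns 9-12, whitespace-separated values after that) with no header lines, which many real ITRDB files will not satisfy.

```python
class RwlParseError(ValueError):
    """Hypothetical error type that records where in the file parsing failed."""

    def __init__(self, path, lineno, line, reason):
        self.path = path
        self.lineno = lineno
        self.line = line
        self.reason = reason
        super().__init__(f"{path}, line {lineno}: {reason}\n    {line!r}")


def parse_rwl_lines(lines, path="<rwl>"):
    """Parse decadal lines into (series_id, decade_year, values) tuples.

    Assumes fixed-width columns and no header lines (an assumption, see above).
    """
    rows = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        series_id = line[:8].strip()
        try:
            year = int(line[8:12])
            values = [int(v) for v in line[12:].split()]
        except ValueError as exc:
            # Report the file, the line number, and the offending text.
            raise RwlParseError(
                path, lineno, line, f"could not parse decade or values ({exc})"
            ) from exc
        rows.append((series_id, year, values))
    return rows
```

With something like this, a failure on a real file would name the file and line and echo the offending text, rather than surfacing a bare `ValueError` from deep inside the reader.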

@kanchukaitis
Collaborator Author

@CosiMichele @ifeoluwaale - here are some others to try with various challenges:

https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/asia/th001.rwl
https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/northamerica/canada/cana157.rwl
https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/northamerica/canada/cana323.rwl

All three have some classic challenges typical of some of the LDEO rwl files (particularly in series names)

@AndyBunn
Collaborator

AndyBunn commented Aug 23, 2022 via email

kanchukaitis added and then removed the "question" label (Further information is requested) on Mar 20, 2023
@kanchukaitis
Collaborator Author

@ifeoluwaale is looking into whether/how we solved this before we close it. The procedure would be (1) identify repeated sample identifications, (2) warn/yell at the user, and then optionally (3) rename one or more series in a predictable way, but without using common core IDs (A, B, C, etc. risk making the problem worse). Opinions, @AndyBunn, about the best way to deal with this?
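
The three-step procedure above could be sketched roughly as follows. This is only an illustration under stated assumptions, not what readers.py does: `dedupe_series_ids` is a hypothetical helper, it assumes the reader has already split the file into separate series (one ID per series, in file order), and the underscore-plus-number suffix is just one possible predictable renaming that avoids core letters such as A, B, C.

```python
import warnings
from collections import Counter


def dedupe_series_ids(ids, rename=False, sep="_"):
    """Warn about repeated series IDs and optionally rename the repeats.

    `ids` holds one ID per series, in file order (an assumption of this sketch).
    """
    counts = Counter(ids)
    repeated = sorted(sid for sid, n in counts.items() if n > 1)
    if not repeated:
        return list(ids)

    # Steps 1 and 2: identify the repeats and warn the user about them.
    warnings.warn(
        f"Repeated series IDs found: {', '.join(repeated)}", UserWarning
    )
    if not rename:
        return list(ids)

    # Step 3 (optional): give later occurrences a numeric suffix,
    # skipping any suffix that would collide with an existing ID.
    taken = set(ids)
    seen = Counter()
    renamed = []
    for sid in ids:
        seen[sid] += 1
        if seen[sid] == 1:
            renamed.append(sid)  # first occurrence keeps its ID
            continue
        n = seen[sid]
        candidate = f"{sid}{sep}{n}"
        while candidate in taken:
            n += 1
            candidate = f"{sid}{sep}{n}"
        taken.add(candidate)
        renamed.append(candidate)
    return renamed
```

Keeping the first occurrence unchanged and leaving `rename` off by default means the user always sees the warning and has to opt in to any modified IDs.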
