Documenting the available test data #141

Closed
jypeter opened this issue May 31, 2017 · 8 comments

@jypeter
Member

jypeter commented May 31, 2017

Could we have a web page documenting the content of uvcdat/sample_data? It's not always easy to infer the content of a file from just the file name and extension, and there is no README file.

Information like:

  • variable name and shape (useful if you want to know which variables are 3D, or how many time steps are available)
  • what type of grid (regular, etc...)
  • what the variable can be used to test
  • optionally where the variable comes from
  • ...

I usually use clt.nc; otherwise I have to make an educated guess and run ncdump -h on some files until I find what I need.
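The variable names and shapes asked for above can be pulled out of the text that ncdump -h prints. What follows is a minimal sketch, not part of CDAT: `summarize_header` and its regular expressions are hypothetical helpers, the parser only covers the common CDL header layout, and the sample header is invented for illustration rather than copied from a real sample file.

```python
import re

# Matches "name = 46 ;" and "name = UNLIMITED ; // (12 currently)".
DIM_RE = re.compile(
    r"^\s*(\w+)\s*=\s*(?:UNLIMITED\s*;\s*//\s*\((\d+) currently\)|(\d+)\s*;)"
)
# Matches "float tas(time, latitude, longitude) ;" and scalar "int crs ;".
VAR_RE = re.compile(r"^\s*(\w+)\s+(\w+)\s*(?:\(([^)]*)\))?\s*;")


def summarize_header(text):
    """Return {variable: (type, dimension names, shape)} from `ncdump -h` text."""
    dims = {}
    variables = {}
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in ("dimensions:", "variables:", "data:"):
            section = stripped[:-1]
            continue
        if stripped.startswith("// global attributes"):
            section = "global"
            continue
        if section == "dimensions":
            match = DIM_RE.match(line)
            if match:
                dims[match.group(1)] = int(match.group(2) or match.group(3))
        elif section == "variables":
            # Attribute lines look like "tas:units = ..." and do not match.
            match = VAR_RE.match(line)
            if match:
                vtype, name, dimlist = match.groups()
                names = tuple(d.strip() for d in dimlist.split(",")) if dimlist else ()
                variables[name] = (vtype, names, tuple(dims[d] for d in names))
    return variables


# Invented header, for illustration only.
SAMPLE_HEADER = """\
netcdf example {
dimensions:
    latitude = 4 ;
    longitude = 8 ;
    time = UNLIMITED ; // (12 currently)
variables:
    float tas(time, latitude, longitude) ;
        tas:long_name = "Surface air temperature" ;
        tas:units = "K" ;
    double time(time) ;
        time:units = "months since 2000-1-1" ;
    float latitude(latitude) ;
    float longitude(longitude) ;

// global attributes:
        :Conventions = "CF-1.0" ;
}
"""

summary = summarize_header(SAMPLE_HEADER)
# Names of the variables that have exactly three dimensions.
three_d = [name for name, (_, names, _) in summary.items() if len(names) == 3]
for name, (vtype, names, shape) in summary.items():
    print(name, vtype, names, shape)
```

Working from the header text keeps the sketch free of any netCDF library, so the same function could be applied to header dumps published on a web page.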

I use the following to download the test data, but I don't remember if this is documented somewhere

python -c 'import vcs; vcs.download_sample_data_files(); print("\nFinished downloading sample data to " + vcs.sample_data)'

@dnadeau4
Contributor

dnadeau4 commented May 31, 2017

Actually @doutriaux1 is downloading sample_data on demand when you run python run_tests.py in CDMS. The files needed to run the tests are listed in share/test_data_files.txt
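For anyone who wants to check a local copy against such a listing, here is a rough sketch. It rests on an assumption that is not confirmed in this thread: that the listing's first non-empty line is a base URL and every following line is an MD5 checksum followed by a file name. The helpers `parse_listing`, `md5_of` and `files_to_fetch` are hypothetical and are not CDMS functions.

```python
import hashlib
from pathlib import Path


def parse_listing(text):
    """Split a listing into (base_url, [(file name, md5), ...]).

    ASSUMED format: first non-empty line is the base URL, then one
    "<md5> <file name>" pair per line.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "", []
    entries = []
    for ln in lines[1:]:
        md5, name = ln.split(None, 1)
        entries.append((name, md5))
    return lines[0], entries


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_fetch(listing_text, local_dir):
    """Return URLs of listed files that are missing locally or fail the checksum."""
    base_url, entries = parse_listing(listing_text)
    local_dir = Path(local_dir)
    needed = []
    for name, md5 in entries:
        target = local_dir / name
        if not target.is_file() or md5_of(target) != md5:
            needed.append(base_url.rstrip("/") + "/" + name)
    return needed
```

The function only reports what would need downloading; the download step itself is left out on purpose, since the real test runner already handles it.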

@doutriaux1
Contributor

@jypeter, the test suite now downloads only the data it needs, as @dnadeau4 mentioned. The sample_data is so old now that it is really only relevant for the test suites.

@jypeter
Member Author

jypeter commented Jun 1, 2017

Well, nobody cares about how old the data is, as long as it is suitable for the tests or tutorials.

I think it would be useful to explicitly tell people how to download and use the sample data if they want to make a simple script to reproduce a bug (rather than having to upload their own data somewhere), or for teaching purposes.

I have just remembered that I have 2 open issues on related but slightly different topics:

@durack1
Member

durack1 commented Jun 1, 2017

@jypeter you have some great ideas, and I'd also guess some great demo code that should be included in the repo. @doutriaux1, where would it be best for @jypeter to drop new code in a PR? @jypeter, rather than talking about this in issues, the fastest path might be to put these suggestions into a PR so the code/pieces can be pulled into the master branch.

@jypeter
Member Author

jypeter commented Jun 1, 2017

Thanks @durack1, but unfortunately the only thing I know how to do with git(hub) is creating issues! I don't have experienced git users around me, and you guys are a bit too far away to give me a walkthrough (probably more than one would be required).

I'm more comfortable reporting what might be corrected/improved and letting you guys decide what to do with it (and when), and implement/test it, depending on your cdat roadmap. I'm not even supposed to spend time on this (but I know it's useful for us in the short and long run), so taking (quite some) time to open issues related to my usage of cdat is about as much help as I can give.

@dnadeau4
Contributor

dnadeau4 commented Jan 9, 2018

@jypeter I created this quick and dirty page for now.

http://cdms.readthedocs.io/en/cdmsdocsmerge/manual/sample_data.html

@dnadeau4 dnadeau4 closed this as completed Jan 9, 2018
@jypeter
Member Author

jypeter commented Jan 10, 2018

Thanks @dnadeau4! This will make it possible to put a reference link to the data in test scripts, if need be. What I had in mind was also describing which kind of tests each test file was suited for, but someone would need to spend time on it.

A quick and dirty workaround would be to put a link to the text output of ncdump -h for each file below the links to the nc files, so that people can get an idea of what's in the file without having to download it. That is the kind of info you'd get from a dods server, without needing the dods server.
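Generating those text files could be scripted along these lines. This is only a sketch under stated assumptions: `write_header_dumps` is a hypothetical helper, the ncdump executable is assumed to be on the PATH, and the `.nc.txt` naming is an arbitrary choice rather than anything the documentation page uses.

```python
import subprocess
from pathlib import Path


def write_header_dumps(data_dir, out_dir, command=("ncdump", "-h")):
    """Write the output of `command <file>` to out_dir/<file>.txt for each .nc file.

    `command` is a parameter so that a different header tool can be swapped in.
    Returns the list of text files written.
    """
    data_dir = Path(data_dir)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for nc_file in sorted(data_dir.glob("*.nc")):
        # check=True raises CalledProcessError if the tool fails on a file.
        result = subprocess.run(
            list(command) + [str(nc_file)],
            capture_output=True,
            text=True,
            check=True,
        )
        target = out_dir / (nc_file.name + ".txt")
        target.write_text(result.stdout)
        written.append(target)
    return written
```

A call such as `write_header_dumps("sample_data", "sample_data_headers")` (both directory names are placeholders) would leave one text file per NetCDF file, ready to be linked next to the data files.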

@dnadeau4
Contributor

dnadeau4 commented Apr 2, 2019
