Default field size and decimal length when writing shapefiles #114
Comments
I have no idea if this is still active, as it was created in 2017, but it's open and it looks like it was added to a milestone in 2022, so here we go: First, I see from the docs that 'F' and 'N' are the same -- that is a pity, as it would be really good to support an actual integer type. For example, we are writing shapefiles that have truly integer fields, but other software detects them as Real -- for example OGR (GDAL) (via Python):
It would be really nice if the integers could come through as integers. I can convert in the client code, but that limits the discoverability of the data. (in the above: And making 'N' and 'F' different would help with the defaults -- an integer would be an integer :-) I haven't looked carefully at the DBF format yet -- the ESRI shapefile spec helpfully (not) just references the DBASE spec -- without even a link :-( But according to Wikipedia: """
It doesn't sound like anything over 12 is supported anyway. But if you are correct that it's an option to go larger, maybe these values are reasonable defaults?
This is a serious challenge -- there simply is no sensible default if you have to have a fixed number of decimal places -- it depends on the order of magnitude of the number. Is actual floating point not an option? (That is: 1.234e10 and 1.234e-10 -- same amount of precision, totally different number of places after the decimal point.) If it does have to be fixed, I think there should be no default -- it depends on what data you are trying to store, and only the person writing the data can know what's appropriate. (I've got to say -- it is really bad that we are so dependent on such an ancient file format! -- but what can you do?)
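To make the fixed-versus-floating point concrete, here is a minimal sketch in plain Python. It does not use PyShp, and the helper names and the choice of 6 decimals and 4 significant digits are assumptions picked only for illustration:

```python
def fixed_text(value, decimals):
    """Format with a fixed number of places after the decimal point."""
    return f"{value:.{decimals}f}"


def exponent_text(value, sig_digits):
    """Format in exponent notation with a fixed number of significant digits."""
    return f"{value:.{sig_digits - 1}e}"


# Two values with the same precision but very different magnitudes.
for v in (1.234e10, 1.234e-10):
    print(f"{fixed_text(v, 6):>20}  {exponent_text(v, 4)}")
```

With 6 fixed decimals the small value is written as all zeros, while the exponent form keeps the same four significant digits for both values.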
wait! looking now at your docs, it seems it DOES support true float. e.g:
In that case, a C float is about 8 decimal digits, and a double about 16 -- a Python float is a C double, so also about 16 digits. So 8 or 16 digits would be reasonable defaults. For integers, 64-bit ints support 20 digits, but those are really big; 32-bit ints are, I think, 10 digits, so not a bad default. If I'm totally wrong here, do you have a pointer to the spec for the DBF format as used by shapefiles? I haven't been able to find it yet. I did find this: http://www.manmrk.net/tutorials/database/xbase/data_types.html#DATA_TYPES
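Those digit counts can be checked from plain Python. The sketch below is only a sanity check using the standard library, and the helper names are made up for illustration. Note that a signed 64-bit integer tops out at 19 digits (20 only if unsigned), and that a negative value stored as text needs one more character for the sign:

```python
import struct
import sys


def max_digits(bits, signed=True):
    """Number of decimal digits in the largest integer of the given width."""
    largest = 2 ** (bits - 1) - 1 if signed else 2 ** bits - 1
    return len(str(largest))


def roundtrip_float32(value):
    """Pack a Python float into a 4-byte C float and unpack it again."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


print(max_digits(32), max_digits(64), max_digits(64, signed=False))
print(sys.float_info.dig)               # digits a C double is guaranteed to preserve
print(roundtrip_float32(0.1234567891))  # a C float cannot hold all ten digits
```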
Looking a bit more, perhaps you could follow defaults similar to those of the OGR Shapefile writer (https://gdal.org/drivers/vector/shapefile.html):
Due to changes made since 1.2.10, several users have raised concerns about field and value types. Most recently, @klasko2 pointed out in #99 that saving a float value to an 'F' field will save it as an integer, because the default number of decimals is 0 when defining a new field. This raises the more general question for the next version of PyShp:
What should be the default field 'size' and 'decimal' for different field types?
I hope this thread can be used as a place for people to voice their concerns and share their experiences and expectations regarding shapefiles and dbf field types.
The Issue
Until now, field size (i.e. how many bytes) has always been set to 50, and decimal always to 0.
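As a rough illustration of what those two settings mean for a stored value, here is a minimal sketch in plain Python. `format_numeric` is a hypothetical helper, not PyShp's implementation, and it assumes numeric values are stored as right-justified text padded with spaces:

```python
def format_numeric(value, size=50, decimal=0):
    """Render a number as fixed-width text, given a field size and decimal count.

    Hypothetical helper for illustration only; not PyShp's own code.
    """
    if decimal == 0:
        text = str(int(value))          # the fractional part is dropped
    else:
        text = f"{value:.{decimal}f}"   # fixed number of decimal places
    if len(text) > size:
        raise ValueError(f"{text!r} does not fit in {size} characters")
    return text.rjust(size)             # pad on the left with spaces


print(repr(format_numeric(3.75)))                      # size 50, decimal 0
print(repr(format_numeric(3.75, size=10, decimal=2)))  # narrower, 2 decimals
```

Under these assumptions, the size-50, decimal-0 setting stores 3.75 as a lone "3" padded out to 50 characters, which matches the behaviour reported in #99.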
Instead, I think the case can be made that any numeric field should default to a decimal number. This leaves us with some open questions:
-100000000000000000000000000000000000000000000000.0, or as detailed as -0.000000000000000000000000000000000000000000000001 (provided the decimal arg is set accordingly)? That might actually seem excessively high for most users, so perhaps it should be lowered to produce smaller shapefiles? What's the default in other software?
0.123456. Perhaps this is too small; should it instead be 12 or 16? What's the default in other software?
For the remaining field types I think the following would be non-controversial:
abcdeabcdeabcdeabcdeabcdeabcdeabcdeabcdeabcdeabcdeabcdeabcdeabcdeabcdeabcdeabcde.
Any and all thoughts are appreciated!