New optional variable attribute: "user_unit" #190
Comments
I'm aware of one example of this. The controlled vocabularies for UKCP (UK Climate Projections) include a I'm not against CF recommending an attribute name for 'user units', but if there are already a variety of alternatives in use then adoption of a new name may be slow. |
Thanks @DanHollis, from this I take it that the concept as such is useful. Finding an attribute name that suits everyone might as you say be more difficult. It would be good to know if there are other more or less well established alternatives already in use. @cameronsmith1: in another issue you hinted at similar needs for labeling plots. And, @Dave-Allured, in the same issue you were not entirely satisfied (here and here) with formal restrictions to how/when "by-volume" and other recognised units can be used. Would adding such an optional attribute address your concerns? |
Hi @larsbarring . I am sure many people would find this convenient. However, there is a general aversion within CF to allowing equivalent information to appear more than once. The concern is that file generators may alter one of the metadata items and forget about the other, leaving an inconsistency within the file. This issue has been discussed in other contexts before, and I think this concern has always won the argument. |
Using a standard name without units isn't allowed in CF, so if the canonical units specified in the standard name table are not ... useful, we'd need a new standard name. I dislike the idea of a second units attribute, because, as others have said, providing multiple fields with the same information can cause problems - when they don't agree, and/or when code doesn't know to check multiple fields. |
@ngalbraith If the standard name describes a dimensionless quantity, you may omit the unit attribute: From section 3.1:
Thus far, every time I've personally encountered someone who thought they could not use the units they were accustomed to, it was due to a misunderstanding of the relationship between canonical units and their actual units.
@larsbarring, sorry, I missed your ping. Your proposal is constructive and generous. However I think it is moving in the wrong direction. I feel that alternate labeling mechanisms are symptoms of a larger problem, the deficiency of the I would like to remove some of the traditional restrictions on |
It didn't occur to me that the proposal was only for dimensionless variables, but, if so, then I have no objection at all. Thanks! |
Hello, it is certainly good to try to minimise redundancy, but CF does already embrace attributes that often overlap in information content with others. The So, that all said. I'm OK in principle with the possibility of redundancy in this case of standardised and non-standardised units. This is independent of whether or not the new attribute is considered a good idea! So, on the question of whether or not it may be a good idea, I think that it is fine if some groups will find it useful, and I don't think that an extra optional attribute of this nature would be a burden on the conventions. With regard to the name of the proposed attribute, I'm not keen on it including the word "user" - a common software term but not very descriptive for a metadata standard. I quite like "long_units", for the connection with long_name, and because the attribute value will often have more characters (not a great consideration, I admit!). The original proposal suggests that the attribute value would be wholly unstandardised, which is fine, but it occurs to me that you could do an inverse standardisation by saying that its value cannot be a valid CF value. This is easily checked, and would prevent the attribute being accidentally used in place of the real Thanks,
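The "inverse standardisation" check suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the attribute name "long_units" is one of the candidate names from this thread, `KNOWN_CF_UNITS` is a tiny hypothetical stand-in for a real UDUNITS lookup, and `check_long_units` is an invented helper, not part of any CF checker.

```python
# Hypothetical stand-in for a real UDUNITS/CF units lookup (assumption:
# a real checker would ask a units library instead of using a fixed set).
KNOWN_CF_UNITS = {"1", "K", "m", "s", "kg", "days", "m s-1"}


def is_valid_cf_units(value, known=KNOWN_CF_UNITS):
    """Return True if the string is recognised as a valid units value."""
    return value.strip() in known


def check_long_units(attrs, is_valid=is_valid_cf_units):
    """Return a list of problems found with the optional long_units attribute.

    attrs is a plain dict of variable attributes. The rule sketched here is
    the inverse standardisation idea: long_units must NOT be a valid units
    string, so it cannot silently stand in for the real units attribute.
    """
    problems = []
    long_units = attrs.get("long_units")
    if long_units is None:
        return problems  # the attribute is optional, nothing to check
    if is_valid(long_units):
        problems.append(
            f"long_units={long_units!r} is a valid units string; "
            "use the units attribute for it instead"
        )
    return problems


print(check_long_units({"units": "1", "long_units": "days per year"}))
print(check_long_units({"units": "K", "long_units": "K"}))
```

Note that under such a rule a value like "days", which is itself a valid unit string, would be rejected as an alternative unit, so the rule would interact with the days-versus-1 use case discussed in this thread.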
The In some other datasets I've seen, usually the Unless CF wants to control the contents/values of the attribute, CF should not define or recommend anything other than the usual "a file may also contain non-standard attributes" and you can do whatever you want with them. You can add a "user_units" attribute to your data right now and continue to be CF compliant. I guess my feeling on this is, defining an attribute and not controlling it with the conventions is basically the same as not doing anything with the existing conventions, which allows whatever the user wants already. |
If the "other units" are intended as information to be read by humans, rather than processed by programs, they could be recorded in the |
Hello Jonathan, we could of course put anything in the long_name, but I feel that that would make the other units concept less tractable, because I think that humans and software may want to access the other units independently of the identity of the quantity. For example, if the reader legitimately changed the |
Since the interest is in recording a unit which isn't standardised and may not be udunits- or SI-compliant, I don't think one would expect generic software to touch it. Whenever someone does an operation on a field, they should pay attention to whether the |
Hello Jonathan, what do you think about the case of being able to delete the other units attribute, leaving the long name alone? Software could easily do this, but it could never modify the long_name attribute in this way. |
I think we're probably talking about non-existing software here that might be able to automatically adjust all the metadata automatically whenever a CF-compliant netCDF variable is modified (in ways that can't all be anticipated) and then rewritten. If such software did exist, it would probably have to routinely eliminate the "long_name" and any "comment" (which might no longer apply after the data have been modified) and eliminate any non-CF attributes recorded in the original file (which also might need to be changed in ways unknown to the software). If the new units attribute were made part of CF (as proposed above), it too would have to be eliminated in the case of a change in actual units. More useful software would probably ask the user whether any of the metadata it couldn't interpret should be modified before rewriting the modified data. If that's what's envisioned for software of this kind, then adding another attribute (alternative units) would just be added to the list of attributes the software would have to eliminate. I guess one could therefore justify adding alternative units if enough folks would find it helpful. On the other hand, I think it unlikely this attribute would be defined except in very rare cases. For those cases, I would suggest that the data provider propose the "new units" be added to udunits as a valid unit, rather than modifying the CF conventions. Alternatively, for a sub-community of users who find this attribute helpful, they could mandate it be used (as a non-CF required attribute) for any data they exchange. For CMIP there are more than a dozen non-CF standard attributes that are required to be included in the netCDF files shared under that project (e.g., experiment_id, source_id, among others listed here). I therefore vote against adding an "alternative units" attribute unless there is more evidence provided showing that it will find broad use across the climate and weather forecast community. |
This discussion made me wonder why we have I guess the benefit of prescribing an attribute name is that it encourages all data writers to use the same attribute for similar information (which helps data readers know where to look for such information). However this only really works if it is introduced on day 1. To introduce something similar now (such as I mentioned at the start of this thread that UKCP have a |
IMO Different issue/question it's about replacing/extending UDUNITS as standard reference (with exceptions) for the |
Hello, It occurs to me that CF has, in some sense, always had this feature in that it allows Some potential uses for a non-standardised attribute will, quite rightly, probably never be acceptable to Udunits, nor CF, such as Karl's point about software is a good one. Which properties should be modified or deleted after a field has been modified or combined with another field is subjective, and there are various approaches to how a software library behaves by default. For example, in cf-python, if you divide "air_temperature" by "time" the result will have modified units and be stripped of standard and long names; but (e.g.) comment, history, etc. will remain as those that were present on the left hand side operand. I thought that that was probably OK most of the time, but that will not always be the case. Adding the potential removal of another attribute is, for me, just another line or two of library code, so is not really a burden. Similarly, if you were doing the aforementioned operation without the benefit of a CF-aware library, then you will either not worry about the metadata because you just need the numbers, or else you will, in which case one more item on a small list of concerns seems OK to me. Perhaps a guidance list (website/appendix ?) of standardised attributes that may need attention after field manipulation would be a useful resource. Thanks,
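The kind of default metadata handling described above might look roughly like the sketch below. It works on plain dictionaries and is not the cf-python API; the function name `attrs_after_units_change`, the tuple `IDENTITY_ATTRS`, and the attribute name "long_units" are all assumptions made for illustration.

```python
# Attributes that describe the identity of the quantity and are assumed
# to become stale once the units change (an assumption of this sketch).
IDENTITY_ATTRS = ("standard_name", "long_name")


def attrs_after_units_change(attrs, new_units, alt_units_name="long_units"):
    """Return a copy of attrs adjusted for a result with new units.

    If the units differ from the original, the identity attributes and the
    (hypothetical) alternative units attribute are dropped. Everything else,
    such as comment and history, is carried over from the original operand.
    """
    result = dict(attrs)
    if new_units != attrs.get("units"):
        for name in IDENTITY_ATTRS + (alt_units_name,):
            result.pop(name, None)  # ignore attributes that are absent
    result["units"] = new_units
    return result


temperature = {
    "standard_name": "air_temperature",
    "long_name": "Air temperature",
    "units": "K",
    "long_units": "degrees Kelvin",
    "comment": "example only",
}
# Dividing by a time in seconds gives units of K s-1.
print(attrs_after_units_change(temperature, "K s-1"))
```

Dropping the alternative units here is indeed one extra entry in the tuple of names to remove; whether comment or history should also be touched remains a judgement call that this sketch does not try to settle.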
Thanks David @davidhassell --- your comment captures much of my thinking behind the initial post. I just the other day learned about the Just to briefly respond to some of the earlier comments: @DanHollis writes:
Yes, ideally this should have been solved on day 1, but new needs arrive from time to time. And the fact that one -- or several -- subgroup(s) have solved it, possibly in inconsistent ways, does not decrease the usability of a common attribute name. New users will be guided by the recommendation, and there is even a chance that groups with their own solution will at some future time find it useful to switch to what is recommended. After all, this is how standards and conventions arise in the first place. @taylor13 writes:
The particular use case I am involved with is standard name @DocOtak and @JonathanGregory suggest that the alternative unit should be included in the long name instead of having a separate attribute. But the idea behind the proposal is exactly what @davidhassell suggests: it will be easier to manipulate if the need arises. Take, for example the situation where @ngalbraith: No, the proposed enhancement is not limited to dimensionless variables only.
Dear @larsbarring, @davidhassell et al. David suggests that when the On the contrary, I think it would be easier to put the user-preferred units in the In Lars's own user case of David also commented
I think that's a good idea, and it could perhaps conveniently be done with an extra flag column in Appendix A. Best wishes Jonathan |
Hello, In @JonathanGregory's example of changing the units from Thanks, |
Hi, Could we use
If there's a regex pattern I can match it to, even better. |
@huard I think xclim should rely on the regular "units" attribute and ignore any additional (and optional) "long_units" attribute, which will likely be included in less than 0.01% of variables written (that is to say "hardly ever included"). I think adding a new attribute that is unneeded nearly all the time is embellishing CF in a way that makes it less approachable for new users.
@taylor13 xclim computes a number of indices in the "count_events_above_threshold" category, so there would be a legitimate case for us to support this type of feature if it went into the convention. Our team has had discussions with @larsbarring and his team about this over the last years and we look forward to a clean mechanism to include such information in the metadata. |
Thanks for chiming in. For your use case, how are the units actually going to be used? As labels or titles on a plot? Will your software convert the units to some other units? Could you provide a bit more explanation about how software would use the new attribute? |
Indeed, we do have downstream utilities that use I think our main issue with respect to this topic is that we currently define indicators of the "count_events_above_threshold" with units set to "days", even though we know it's not CF-compliant. We'd like our output to be fully CF-compliant, but not at the expense of leaving out information that we feel is essential to interpret the results. |
In response to both @JonathanGregory's question
and to @taylor13's
work related to the WMO ET-CID has been ongoing for at least a year to expand the capabilities of the widely used software package CLIMPACT (ping @heroldn) to calculate trends in the supported indices. And if users already now find the canonical unit EDIT: Ahh, and the canonical unit for the trend is
Dear all David writes
Suppose a quantity in I appreciate the wish for a more familiar unit to describe the quantity, but I continue to feel that putting it in the Best wishes Jonathan |
Dear all @davidhassell and I have just talked about this over a cup of tea. As a result, we feel that a critical question is the one Karl @taylor13 asked: What is the alternative units string going to be used for? If it's intended for labelling plots, the In the use-case of @larsbarring and @huard, with quantities having standard names like Best wishes Jonathan |
Dear Jonathan, Thanks for these comments/questions. I agree that the question @taylor13 asked is at the heart of this issue. And as you note in your comment, there are [at least] two answers:
|
Dear @larsbarring By all means, let us think of new standard names if that would solve the immediate problem. I remember your other open issues cf-convention/vocabularies#31 and cf-convention/vocabularies#19, which are both productive discussions that seem near to an outcome. Thanks Best wishes Jonathan |
Sometimes the formal requirements associated with the units attribute are not fully aligned with what a data producer/user is used to. Examples of this are ppmv and related "by-volume" units that were recently discussed (see here), and unit "1" vs. "days" discussed in association with the standard name number_of_days_with_X_above|below_threshold (see here). Also, the unit required by units for salinity vs. what is used in practice has been debated on several occasions over the years.

To alleviate this situation I suggest adding a new optional variable attribute, where the attribute value is not managed by CF.

If we think of the long_name as something like a succinct plot title or table header, this new attribute would provide a [kind-of] associated "unit" users would be familiar with in the case the [usual] units attribute is felt to be too much restricted by formal requirements.

This new attribute could be called "user_unit", "alternative_unit" or something similar, or maybe "long_unit" if we want to more closely link it to the long name. It is expected to be used only when there is a widely recognized difference between the units and what is in common use.
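
As a rough illustration of the plot-label use of such an attribute, the sketch below prefers the alternative unit when it is present and falls back to the regular units otherwise. The attribute name "user_unit" is just one of the candidate names mentioned in this issue, `axis_label` is a hypothetical helper, and the long name "Summer days" is an invented example value.

```python
def axis_label(attrs, alt_units_name="user_unit"):
    """Build a human-readable axis label from variable attributes.

    attrs is a plain dict of attributes. The alternative unit (assumed
    attribute name: user_unit) is used for display only; the regular
    units attribute is never modified.
    """
    name = attrs.get("long_name") or attrs.get("standard_name") or "unknown"
    unit = attrs.get(alt_units_name) or attrs.get("units")
    return f"{name} [{unit}]" if unit else name


summer_days = {
    "standard_name": "number_of_days_with_air_temperature_above_threshold",
    "long_name": "Summer days",
    "units": "1",
    "user_unit": "days",
}
print(axis_label(summer_days))  # Summer days [days]
```

Any software that converts or computes with units would keep reading the regular units attribute; only the display path looks at the alternative one.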