diff --git a/appd.adoc b/appd.adoc index 4561359c..4561b277 100644 --- a/appd.adoc +++ b/appd.adoc @@ -279,8 +279,8 @@ No `standard_name` has been defined for `C` or `depth_c`. === Ocean sigma over z coordinate -**The description of this type of parametric vertical coordinate is defective in version 1.8 and earlier versions of the standard, in that it does not state what values the vertical coordinate variable should contain. -Therefore, in accordance with the rules, all versions of the standard before 1.9 are deprecated for datasets that use the "ocean sigma over z" coordinate.** +**The description of this type of parametric vertical coordinate is defective in version 1.8 and earlier versions of these conventions, in that it does not state what values the vertical coordinate variable should contain. +Therefore, in accordance with the rules, all versions of the conventions before 1.9 are deprecated for datasets that use the "ocean sigma over z" coordinate.** ---- standard_name = "ocean_sigma_z_coordinate" diff --git a/appf.adoc b/appf.adoc index 8c819518..7c291148 100644 --- a/appf.adoc +++ b/appf.adoc @@ -17,7 +17,7 @@ These are: - `semi_major_axis` - `semi_minor_axis` -In general we have used the FGDC "Content Standard for Digital Geospatial Metadata" <> as a guide in choosing the values for **`grid_mapping_name`** and the attribute names for the parameters describing map projections. +In general, the FGDC "Content Standard for Digital Geospatial Metadata" <> is used as a guide in choosing the values for **`grid_mapping_name`** and the attribute names for the parameters describing map projections. === Albers Equal Area @@ -84,7 +84,7 @@ This model is independent of the physical scan principles of any observing instr The model consists conceptually of a set of two rotating circles with a colocated centre, whose axes of rotation are perpendicular to each other. 
The axis of the outer circle is stationary, while the axis of the inner circle moves about the stationary axis. This means that a given viewing angle described using this model is the result of matrix multiplication, which is not commutative, so that order of operations is essential in achieving accurate results. -The two axes are conventionally called the sweep-angle and fixed-angle axes; we adhere to this terminology, although some find these terms confusing, for the sake of interoperability with existing implementations. +The two axes are conventionally called the sweep-angle and fixed-angle axes; this terminology is used here for the sake of interoperability with existing implementations, although some find these terms confusing. + The algorithm for computing the mapping may be found at link:$$https://www.cgms-info.org/documents/pdf_cgms_03.pdf$$[https://www.cgms-info.org/documents/pdf_cgms_03.pdf]. diff --git a/apph.adoc b/apph.adoc index 29eaa434..88f71f8a 100644 --- a/apph.adoc +++ b/apph.adoc @@ -233,7 +233,7 @@ When the intention of a data variable is to contain only a single time series, t ==== While an idealized time series is defined at a single, stable point location, there are examples of time series, such as cabled ocean surface mooring measurements, in which the precise position of the observations varies slightly from a nominal fixed point. It is quite common that the deployment position of a station changes after maintenance or repositioning after it drifts. -In the following example we show how the spatial positions of such a time series should be encoded in CF. In addition, this example shows how lossless compression by gathering <> has been applied to the deployment coordinate variables, which otherwise would contain a lot of missing or repetitive data. +The following example shows how the spatial positions of such a time series should be encoded in CF. 
In addition, this example shows how lossless compression by gathering <> has been applied to the deployment coordinate variables, which otherwise would contain a lot of missing or repetitive data. Note that although this example shows only a single time series, the technique is applicable to all of the representations. [[example-h.5]] @@ -1203,7 +1203,7 @@ In the latter case, listing the vertical coordinate variable in the coordinates ==== Ragged array representation of time series profiles When the number of profiles and levels for each station varies, one can use a ragged array representation. -Each of the two element dimensions (time and vertical) could in principle be stored either contiguous or indexed, but this convention supports only one of the four possible choices. +Each of the two element dimensions (time and vertical) could in principle be stored either contiguous or indexed, but these conventions support only one of the four possible choices. This uses the contiguous ragged array representation for each profile (9.3.3), and the indexed ragged array representation to organise the profiles into time series (9.3.4). The canonical use case is when writing real-time data streams that contain profiles from many stations, arriving randomly, with the data for each entire profile written all at once. @@ -1443,7 +1443,7 @@ In the latter case, listing the vertical coordinate variable in the coordinates ==== Ragged array representation of trajectory profiles When the number of profiles and levels for each trajectory varies, one can use a ragged array representation. -Each of the two element dimensions (along a trajectory, within a profile) could in principle be stored either contiguous or indexed, but this convention supports only one of the four possible choices. 
+Each of the two element dimensions (along a trajectory, within a profile) could in principle be stored either contiguous or indexed, but these conventions support only one of the four possible choices. This uses the contiguous ragged array representation for each profile (9.3.3), and the indexed ragged array representation to organise the profiles into trajectories (9.3.4). The canonical use case is when writing real-time data streams that contain profiles from many trajectories, arriving randomly, with the data for each entire profile written all at once. diff --git a/appm.adoc b/appm.adoc index c1f7a7b2..32d3791b 100644 --- a/appm.adoc +++ b/appm.adoc @@ -5,7 +5,7 @@ == Leap Seconds This appendix describes the treatment of leap seconds in CF in more detail than <>, and provides guidance for writers of datasets about the choice of a suitable CF calendar. -Because precision to the second has rarely been needed in the climate and forecast community, leap seconds have typically been ignored, including in many datasets and versions of the CF standard before 1.12. +Because precision to the second has rarely been needed in the climate and forecast community, leap seconds have typically been ignored, including in many datasets and versions of the CF conventions before 1.12. In CF 1.13 and later, __in all calendars except the **`utc`** calendar__, @@ -13,7 +13,7 @@ In CF 1.13 and later, __in all calendars except the **`utc`** calendar__, * the difference is always 60 seconds between the time coordinates for the start of consecutive minutes. -If you are producing model-generated datasets with datetimes that follow the Gregorian calendar, the **`proleptic_gregorian`** calendar is recommended, because it unequivocally indicates to the user of the dataset that leap seconds are not included in the timeline (we assume this is true for model data). 
+If you are producing model-generated datasets with datetimes that follow the Gregorian calendar, the **`proleptic_gregorian`** calendar is recommended, because it unequivocally indicates to the user of the dataset that leap seconds are not included in the timeline (it is assumed that this is true for model data). On these grounds, it is preferable to the **`standard`** calendar, which is ambiguous in CF versions before 1.13 about the inclusion of leap seconds (see below). If you are producing real-world datasets with datetimes in UTC, and if it's important for the datetimes and time intervals to be accurate to the second, the **`utc`** calendar is recommended. @@ -34,7 +34,7 @@ It has the following consequences: * The difference between two time coordinates will differ from the duration of the time interval between the two instants by the net number of leap seconds that occurred between them. -We illustrate the differences between the **`utc`** and **`standard`** calendars with examples related to the leap second that was added to UTC at the end of 2016. +The differences between the **`utc`** and **`standard`** calendars are illustrated with examples related to the leap second that was added to UTC at the end of 2016. If **`calendar="utc"`** and **`units="seconds since 2016-12-31 23:59:58"`**, a value of 4 for the time coordinate represents the datetime 2017-01-01 00:00:01, because four seconds had elapsed by that instant since 2016-12-31 23:59:58 (seconds ending at 2016-12-31 23:59:59, 2016-12-31 23:59:60, 2017-01-01 00:00:00, 2017-01-01 00:00:01). If **`calendar="standard"`**, with the same **`units="seconds since 2016-12-31 23:59:58"`**, the datetime 2017-01-01 00:00:01 is represented by a time coordinate of 3, because the UTC leap second (from 2016-12-31 23:59:60 to 2017-01-01 00:00:00) is ignored when calculating time coordinates in the **`standard`** calendar. 
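The leap-second arithmetic above can be sketched in Python. This is a hedged illustration, not part of the conventions: the leap-second table below contains only the single 2016 entry needed for this example, not the full UTC history, and the function name is invented.

```python
from datetime import datetime

# Hypothetical, minimal leap-second table: UTC instants at which a positive
# leap second had just finished being inserted (only the 2016 entry is listed).
LEAP_SECOND_ENDS = [datetime(2017, 1, 1, 0, 0, 0)]

def elapsed_seconds(epoch, instant, calendar):
    """Seconds elapsed between two datetimes under a CF calendar.

    In the "standard" calendar leap seconds are ignored, so the result is
    plain datetime arithmetic; in the "utc" calendar each leap second that
    occurred between the two instants adds one real elapsed second.
    """
    seconds = (instant - epoch).total_seconds()
    if calendar == "utc":
        seconds += sum(1 for ls in LEAP_SECOND_ENDS if epoch < ls <= instant)
    return seconds

epoch = datetime(2016, 12, 31, 23, 59, 58)
instant = datetime(2017, 1, 1, 0, 0, 1)
print(elapsed_seconds(epoch, instant, "standard"))  # 3.0
print(elapsed_seconds(epoch, instant, "utc"))       # 4.0
```

The two results match the worked example: a time coordinate of 3 in the **`standard`** calendar and 4 in the **`utc`** calendar for the same datetime.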
diff --git a/cf-conventions.adoc b/cf-conventions.adoc index 61f64312..a23f2bd8 100644 --- a/cf-conventions.adoc +++ b/cf-conventions.adoc @@ -51,7 +51,7 @@ This enables users of data from different sources to decide which quantities are The CF conventions generalize and extend the COARDS conventions <>. The extensions include metadata that provides a precise definition of each variable via specification of a standard name, describes the vertical locations corresponding to dimensionless vertical coordinate values, and provides the spatial coordinates of non-rectilinear gridded data. Since climate and forecast data are often not simply representative of points in space/time, other extensions provide for the description of coordinate intervals, multidimensional cells and climatological time coordinates, and indicate how a data value is representative of an interval or cell. -This standard also relaxes the COARDS constraints on dimension order and specifies methods for reducing the size of datasets. +These conventions also relax the COARDS constraints on dimension order and specify methods for reducing the size of datasets. :numbered: include::ch01.adoc[] diff --git a/ch01.adoc b/ch01.adoc index 7217436a..bc89aaba 100644 --- a/ch01.adoc +++ b/ch01.adoc @@ -11,18 +11,18 @@ It is possible to provide the metadata describing how a field is located in time The purpose in restricting how the metadata is represented is to make it practical to write software that allows a machine to parse that metadata and to automatically associate each data value with its location in time and space. It is equally important that the metadata be easy for human users to write and to understand. -This standard is intended for use with climate and forecast data, for atmosphere, surface and ocean, and was designed with model-generated data particularly in mind. 
-We recognise that there are limits to what a standard can practically cover; we restrict ourselves to issues that we believe to be of common and frequent concern in the design of climate and forecast metadata. -Our main purpose therefore, is to propose a clear, adequate and flexible definition of the metadata needed for climate and forecast data. -Although this is specifically a netCDF standard, we feel that most of the ideas are of wider application. +These conventions are intended for use with climate and forecast data, for atmosphere, surface and ocean, and were designed with model-generated data particularly in mind. +It is recognised that there are limits to what a set of conventions can practically cover; these conventions are restricted to issues that are considered to be of common and frequent concern in the design of climate and forecast metadata. +The main purpose, therefore, is to propose a clear, adequate and flexible definition of the metadata needed for climate and forecast data. +Although these conventions specifically target the netCDF format, most of the ideas are of wider application. The metadata objects could be contained in file formats other than netCDF. Conversion of the metadata between files of different formats will be facilitated if conventions for all formats are based on similar ideas. -This convention is designed to be backward compatible with the COARDS conventions <>, by which we mean that a conforming COARDS dataset also conforms to the CF standard. +These conventions are designed to be backward compatible with the COARDS conventions <>, which implies that a conforming COARDS dataset also conforms to the CF conventions. Thus new applications that implement the CF conventions will be able to process COARDS datasets. -We have also striven to maximize conformance to the COARDS standard, that is, wherever the COARDS metadata conventions provide an adequate description we require their use. 
-Extensions to COARDS are implemented in a manner such that the content that doesn't depend on the extensions is still accessible to applications that adhere to the COARDS standard. +These conventions also strive to maximize conformance to the COARDS conventions, that is, wherever the COARDS metadata conventions provide an adequate description their use is required here. +Extensions to COARDS are implemented in a manner such that the content that doesn't depend on the extensions is still accessible to applications that adhere to the COARDS conventions. [[design, Section 1.2, "Principles for design"]] === Principles for design @@ -47,7 +47,7 @@ Therefore CF-netCDF does not use codes, but instead relies on controlled vocabul 8. Conventions are provided to allow data-producers to describe the data they wish to produce, rather than attempting to prescribe what data they should produce; consequently most CF conventions are optional. -9. Because many datasets remain in use for a long time after production, it is desirable that metadata written according to previous versions of the convention should also be compliant with and have the same interpretation under later versions. +9. Because many datasets remain in use for a long time after production, it is desirable that metadata written according to previous versions of the conventions should also be compliant with and have the same interpretation under later versions. 10. Because all previous versions must generally continue to be supported in software for the sake of archived datasets, and in order to limit the complexity of the conventions, there is a strong preference against introducing any new capability to the conventions when there is already some method that can adequately serve the same purpose (even if a different method would arguably be better than the existing one). 
@@ -65,7 +65,7 @@ aggregation variable:: A variable containing no data, but which enables the form ancestor group:: A group from which the referring group is descended via direct parent-child relationships -auxiliary coordinate variable:: Any netCDF variable that contains coordinate data, but is not a coordinate variable (in the sense of that term defined by the <> and used by this standard - see below). +auxiliary coordinate variable:: Any netCDF variable that contains coordinate data, but is not a coordinate variable (in the sense of that term defined by the <> and used by these conventions - see below). Unlike coordinate variables, there is no relationship between the name of an auxiliary coordinate variable and the name(s) of its dimension(s). boundary variable:: A boundary variable is associated with a variable that contains coordinate data. @@ -123,8 +123,8 @@ path:: Paths must follow the UNIX style path convention and may begin with eithe quantization variable:: A variable used as a container for attributes that define a specific quantization algorithm. The type of the variable is arbitrary since it contains no data. -recommendation:: Recommendations in this convention are meant to provide advice that may be helpful for reducing common mistakes. -In some cases we have recommended rather than required particular attributes in order to maintain backwards compatibility with COARDS. +recommendation:: Recommendations in these conventions are meant to provide advice that may be helpful for reducing common mistakes. +In some cases the use of particular attributes is recommended rather than required in order to maintain backwards compatibility with COARDS. An application must not depend on a dataset's adherence to recommendations. referring group:: The group in which a reference to a variable or dimension occurs. 
@@ -148,11 +148,11 @@ vertical dimension:: A dimension of a netCDF variable that has an associated ver [[overview, Section 1.4, "Overview"]] === Overview -No variable or dimension names are standardized by this convention. -Instead we follow the lead of the <> and standardize only the names of attributes and some of the values taken by those attributes. +No variable or dimension names are standardized by these conventions. +Instead, the lead of the <> is followed and only the names of attributes and some of the values taken by those attributes are standardized. Variable or dimension names can either be a single variable name or a path to a variable. The overview provided in this section will be followed with more complete descriptions in following sections. -<> contains a summary of all the attributes used in this convention. +<> contains a summary of all the attributes used in these conventions. Files using this version of the CF Conventions must set the <> defined attribute **`Conventions`** to contain the string value "**`CF-{current-version-as-attribute}`**" to identify datasets that conform to these conventions. @@ -171,7 +171,7 @@ The use of standard names will facilitate the exchange of climate and forecast d Four types of coordinates receive special treatment by these conventions: latitude, longitude, vertical, and time. Every variable must have associated metadata that allows identification of each such coordinate that is relevant. -Two independent parts of the convention allow this to be done. +Two independent parts of the conventions allow this to be done. There are conventions that identify the variables that contain the coordinate data, and there are conventions that identify the type of coordinate represented by that data. There are two methods used to identify variables that contain coordinate data. 
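As a hedged sketch (not part of the conventions themselves) of one of these identification methods, an application might classify a coordinate's type from its **`units`** attribute. The heuristics below are deliberately simplified and the function name is invented; real applications use a units package such as UDUNITS for full equivalence checking.

```python
def coordinate_type(units):
    """Crude classification of a CF coordinate type from its units string."""
    # A few of the latitude/longitude unit spellings permitted by CF.
    if units in ("degrees_north", "degree_north", "degrees_N"):
        return "latitude"
    if units in ("degrees_east", "degree_east", "degrees_E"):
        return "longitude"
    # Time units have the form "<unit> since <reference datetime>".
    if " since " in units:
        return "time"
    # Units of pressure identify a vertical coordinate.
    if units in ("Pa", "hPa", "mbar", "bar"):
        return "vertical"
    return "unknown"

print(coordinate_type("degrees_north"))           # latitude
print(coordinate_type("hours since 2001-01-01"))  # time
```

Other vertical coordinates cannot be recognised this way, which is why the conventions also provide the **`positive`** and **`axis`** attributes.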
@@ -183,12 +183,12 @@ Once the variables containing coordinate data are identified, further convention Latitude, longitude, and time coordinates are identified solely by the value of their **`units`** attribute. Vertical coordinates with units of pressure may also be identified by the **`units`** attribute. Other vertical coordinates must use the attribute **`positive`** which determines whether the direction of increasing coordinate value is up or down. -Because identification of a coordinate type by its units involves the use of an external package <>, we provide the optional attribute **`axis`** for a direct identification of coordinates that correspond to latitude, longitude, vertical, or time axes. +Because identification of a coordinate type by its units involves the use of an external package <>, the optional attribute **`axis`** is provided for a direct identification of coordinates that correspond to latitude, longitude, vertical, or time axes. Latitude, longitude, and time are defined by internationally recognized standards, and hence, identifying the coordinates of these types is sufficient to locate data values uniquely with respect to time and a point on the earth's surface. On the other hand identifying the vertical coordinate is not necessarily sufficient to locate a data value vertically with respect to the earth's surface. In particular a model may output data on the parametric (usually dimensionless) vertical coordinate used in its mathematical formulation. -To achieve the goal of being able to spatially locate all data values, this convention provides a mapping, via the **`standard_name`** and **`formula_terms`** attributes of a parametric vertical coordinate variable, between its values and dimensional vertical coordinate values that can be uniquely located with respect to a point on the earth's surface (<>; <>). 
+To achieve the goal of being able to spatially locate all data values, these conventions provide a mapping, via the **`standard_name`** and **`formula_terms`** attributes of a parametric vertical coordinate variable, between its values and dimensional vertical coordinate values that can be uniquely located with respect to a point on the earth's surface (<>; <>). It is often the case that data values are not representative of single points in time, space and other dimensions, but rather of intervals or multidimensional cells. CF defines a **`bounds`** attribute to specify the extent of intervals or cells. @@ -215,20 +215,20 @@ The attribute **`compress`** is defined for this purpose. These conventions generalize and extend the COARDS conventions <>. A major design goal has been to maintain __backward compatibility__ with COARDS. Hence applications written to process datasets that conform to these conventions will also be able to process COARDS conforming datasets. -We have also striven to maximize __conformance__ to the COARDS standard so that datasets that only require the metadata that was available under COARDS will still be able to be processed by COARDS conforming applications. +These conventions also strive to maximize __conformance__ to the COARDS conventions so that datasets that only require the metadata that was available under COARDS will still be able to be processed by COARDS conforming applications. But because of the extensions that provide new metadata content, and the relaxation of some COARDS requirements, datasets that conform to these conventions will not necessarily be recognized by applications that adhere to the COARDS conventions. The features of these conventions that allow writing netCDF files that are not COARDS conforming are summarized below. COARDS standardizes the description of grids composed of independent latitude, longitude, vertical, and time axes. 
In addition to standardizing the metadata required to identify each of these axis types, COARDS requires (_time_, _vertical_, _latitude_, _longitude_) as the CDL order for the dimensions of a variable, with longitude being the most rapidly varying dimension (the last dimension in CDL order). Because of I/O performance considerations it may not be possible for models to output their data in conformance with the COARDS requirement. -The CF convention places no rigid restrictions on the order of dimensions, however we encourage data producers to make the extra effort to stay within the COARDS standard order. +The CF conventions place no rigid restrictions on the order of dimensions; however, data producers are encouraged to make the extra effort to stay within the order of dimensions required by COARDS. The use of non-COARDS axis ordering will render files inaccessible to some applications and limit interoperability. Often a buffering operation can be used to minimize performance penalties when axis ordering in model code does not match the axis ordering of a COARDS file. COARDS addresses the issue of identifying dimensionless vertical coordinates, but does not provide any mechanism for mapping the dimensionless values to dimensional ones that can be located with respect to the earth's surface. -For backwards compatibility we continue to allow (but do not require) the **`units`** attribute of dimensionless vertical coordinates to take the values "level", "layer", or "sigma_level." -But we recommend that the **`standard_name`** and **`formula_terms`** attributes be used to identify the appropriate definition of the dimensionless vertical coordinate (see <>). +For backwards compatibility, the **`units`** attribute of dimensionless vertical coordinates is still allowed (but not required) to take the values "level", "layer", or "sigma_level". 
+But it is recommended that the **`standard_name`** and **`formula_terms`** attributes be used to identify the appropriate definition of the dimensionless vertical coordinate (see <>). The CF conventions define attributes which enable the description of data properties that are outside the scope of the COARDS conventions. These new attributes do not violate the COARDS conventions, but applications that only recognize COARDS conforming datasets will not have the capabilities that the new attributes are meant to enable. diff --git a/ch02.adoc b/ch02.adoc index cdfe5ecc..d0f91e70 100644 --- a/ch02.adoc +++ b/ch02.adoc @@ -1,8 +1,8 @@ == NetCDF Files and Components The components of a netCDF file are described in section 2 of the <>. -In this section we describe conventions associated with filenames and the basic components of a netCDF file. -We also introduce new attributes for describing the contents of a file. +In this section conventions associated with filenames and the basic components of a netCDF file are described. +New attributes for describing the contents of a file are also introduced. === Filename @@ -53,15 +53,15 @@ The examples in this document that use string-valued variables alternate between === Naming Conventions It is recommended that variable, dimension, attribute and group names begin with a letter and be composed of letters, digits, and underscores. -By the word _letters_ we mean the standard ASCII letters uppercase `A` to `Z` and lowercase `a` to `z`. -By the word _digits_ we mean the standard ASCII digits `0` to `9`, and similarly _underscores_ means the standard ASCII underscore `_`. +The word _letters_ means the standard ASCII letters uppercase `A` to `Z` and lowercase `a` to `z`. +The word _digits_ means the standard ASCII digits `0` to `9`, and similarly _underscores_ means the standard ASCII underscore `_`. 
Note that this is in conformance with the COARDS conventions, but is more restrictive than the netCDF interface which allows almost all Unicode characters encoded as multibyte UTF-8 characters (link:$$https://docs.unidata.ucar.edu/nug/current/file_format_specifications.html$$[NUG Appendix B]). The netCDF interface also allows leading underscores in names, but the NUG states that this is reserved for system use. Case is significant in netCDF names, but it is recommended that names should not be distinguished purely by case, i.e., if case is disregarded, no two names should be the same. It is also recommended that names should be obviously meaningful, if possible, as this renders the file more effectively self-describing. -This convention does not standardize any variable or dimension names. +These conventions do not standardize any variable or dimension names. Attribute names and their contents, where standardized, are given in English in this document and should appear in English in conforming netCDF files for the sake of portability. Languages other than English are permitted for variables, dimensions, and non-standardized attributes. The content of some standardized attributes are string values that are not standardized, and thus are not required to be in English. @@ -71,14 +71,14 @@ For example, a description of what a variable represents may be given in a non-E === Dimensions A variable may have any number of dimensions, including zero, and the dimensions must all have different names. -__COARDS strongly recommends limiting the number of dimensions to four, but we wish to allow greater flexibility__. +__COARDS strongly recommends limiting the number of dimensions to four, but these conventions allow greater flexibility__. The dimensions of the variable define the axes of the quantity it contains. Dimensions other than those of space and time may be included. Several examples can be found in this document. 
Under certain circumstances, one may need more than one dimension in a particular quantity. For instance, a variable containing a two-dimensional probability density function might correlate the temperature at two different vertical levels, and hence would have temperature on both axes. -If any or all of the dimensions of a variable have the interpretations of "date or time" (**`T`**), "height or depth" (**`Z`**), "latitude" (**`Y`**), or "longitude" (**`X`**) then we recommend, but do not require (see <>), those dimensions to appear in the relative order **`T`**, then **`Z`**, then **`Y`**, then **`X`** in the CDL definition corresponding to the file. +If any or all of the dimensions of a variable have the interpretations of "date or time" (**`T`**), "height or depth" (**`Z`**), "latitude" (**`Y`**), or "longitude" (**`X`**) then it is recommended, but not required (see <>), that those dimensions appear in the relative order **`T`**, then **`Z`**, then **`Y`**, then **`X`** in the CDL definition corresponding to the file. All other dimensions should, whenever possible, be placed to the left of the spatiotemporal dimensions. Dimensions may be of any size, including unity. @@ -90,7 +90,7 @@ For example, a variable containing data for temperature at 1.5 m above the groun [[variables]] === Variables -This convention does not standardize variable names. +These conventions do not standardize variable names. NetCDF variables that contain coordinate data are referred to as __coordinate variables__, __auxiliary coordinate variables__, __scalar coordinate variables__, or __multidimensional coordinate variables__. @@ -105,7 +105,7 @@ Missing data is not allowed in coordinate variables. The NUG conventions for missing data changed significantly between version 2.3 and version 2.4. 
Since version 2.4 the NUG defines missing data as all values outside of the **`valid_range`**, and specifies how the **`valid_range`** should be defined from the **`_FillValue`** (which has library specified default values) if it hasn't been explicitly specified. -If only one missing value is needed for a variable then we recommend that this value be specified using the **`_FillValue`** attribute. +If only one missing value is needed for a variable then it is recommended that this value be specified using the **`_FillValue`** attribute. Doing this guarantees that the missing value will be recognized by generic applications that follow either the before or after version 2.4 conventions. The scalar attribute with the name **`_FillValue`** and of the same type as its variable is recognized by the netCDF library as the value used to pre-fill disk space allocated to the variable. @@ -119,26 +119,26 @@ Note that values that are identified as missing should not be transformed. Since the missing value is outside the valid range it is possible that applying a transformation to it could result in an invalid operation. For example, the default **`_FillValue`** is very close to the maximum representable value of IEEE single precision floats, and multiplying it by 100 produces an "Infinity" (using single precision arithmetic). -This convention defines a two-element vector attribute **`actual_range`** for variables containing numeric data. +These conventions define a two-element vector attribute **`actual_range`** for variables containing numeric data. If the variable is packed using the **`scale_factor`** and **`add_offset`** attributes (see <>), the elements of the **`actual_range`** should have the type intended for the unpacked data. The elements of **`actual_range`** must be exactly equal to the minimum and the maximum data values which occur in the variable (when unpacked if packing is used), and both must be within the **`valid_range`** if specified. 
If the data is all missing or invalid, the **`actual_range`** attribute cannot be used. === Attributes -This standard describes many attributes (some mandatory, others optional), but a file may also contain non-standard attributes. -Such attributes do not represent a violation of this standard. +These conventions describe many attributes (some mandatory, others optional), but a file may also contain non-standard attributes. +Such attributes do not represent a violation of these conventions. Application programs should ignore attributes that they do not recognise or which are irrelevant for their purposes. Conventional attribute names should be used wherever applicable. Non-standard names should be as meaningful as possible. Before introducing an attribute, consideration should be given to whether the information would be better represented as a variable. In general, if a proposed attribute requires ancillary data to describe it, is multidimensional, requires any of the defined netCDF dimensions to index its values, or requires a significant amount of storage, a variable should be used instead. -When this standard defines string attributes that may take various prescribed values, the possible values are generally given in lower case. +When these conventions define string attributes that may take various prescribed values, the possible values are generally given in lower case. However, application programs should not be sensitive to case in these attributes. -Several string attributes are defined by this standard to contain "blank-separated lists". +Several string attributes are defined by these conventions to contain "blank-separated lists". Consecutive words in such a list are separated by one or more adjacent spaces. The list may begin and end with any number of spaces. -See <> for a list of attributes described by this standard. +See <> for a list of attributes described by these conventions. 
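The **`actual_range`** and packing rules described above can be sketched in Python. This is a hedged illustration only: the data values and packing attributes are invented for the example, and the unpacking formula shown is the generic `packed * scale_factor + add_offset` convention.

```python
import numpy as np

# Invented packed data and packing attributes, for illustration only.
packed = np.array([0, 2, 4, 6], dtype=np.int16)
scale_factor = np.float32(0.5)
add_offset = np.float32(20.0)

# Unpack as generic applications do: unpacked = packed * scale_factor + add_offset.
unpacked = packed * scale_factor + add_offset

# actual_range holds the minimum and maximum of the *unpacked* data,
# in the type intended for the unpacked data (float32 here).
actual_range = np.array([unpacked.min(), unpacked.max()], dtype=np.float32)
print(actual_range)  # [20. 23.]
```

Values flagged as missing would be excluded before taking the minimum and maximum, and if all values were missing the attribute would be omitted, as stated above.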
[[identification-of-conventions]] ==== Identification of Conventions @@ -167,7 +167,7 @@ For readability in ncdump outputs it is recommended to embed newline characters For backwards compatibility with COARDS none of these global attributes is required. The <> defines **`title`** and **`history`** to be global attributes. -We wish to allow the newly defined attributes, i.e., **`institution`**, **`source`**, **`references`**, and **`comment`**, to be either global or assigned to individual variables. +The newly defined attributes, i.e., **`institution`**, **`source`**, **`references`**, and **`comment`**, may be either global or assigned to individual variables. When an attribute appears both globally and as a variable attribute, the variable's version has precedence. **`title`**:: A succinct description of what is in the dataset. @@ -180,7 +180,7 @@ If it is observational, **`source`** should characterize it (e.g., "**`surface o **`history`**:: Provides an audit trail for modifications to the original data. Well-behaved generic netCDF filters will automatically append their name and the parameters with which they were invoked to the global history attribute of an input netCDF file. -We recommend that each line begin by indicating the date and time of day that the program was executed. +It is recommended that each line begin by indicating the date and time of day that the program was executed. **`references`**:: Published or web-based references that describe the data or methods used to produce it. @@ -196,9 +196,9 @@ The only attribute for which CF standardises the use of external variables is ** === Groups Groups provide a powerful mechanism to structure data hierarchically. -This convention does not standardize group names. +These conventions do not standardize group names. It may be of benefit to name groups in such a way that human readers can interpret them. 
-However, files that conform to this standard shall not require software to interpret or decode information from group names. +However, files that conform to these conventions shall not require software to interpret or decode information from group names. References to out-of-group variable and dimensions shall be found by applying the scoping rules outlined below. ==== Scope @@ -245,7 +245,7 @@ The lateral search algorithm may only be used for <> coordinate variables; [NOTE] ==== This use of the lateral search strategy to find them is discouraged. -They are allowed mainly for backwards-compatibility with existing datasets, and may be deprecated in future versions of the standard. +They are allowed mainly for backwards-compatibility with existing datasets, and may be deprecated in future versions of these conventions. ==== ==== Application of attributes @@ -405,7 +405,7 @@ Each of the rows contains the sizes of the fragments along that dimension, padde 90 45 45 180 180 _ ``` -From this array we can deduce, for instance, that the shape of the fragment (in its canonical form, see <>) at position `[0, 1, 1]` of the array of fragments is `(17, 45, 180)`; and that this fragment occupies zero-based indices 0 to 16 of the Z aggregated dimension, 90 to 134 of the Y aggregated dimension, and 180 to 359 of the X aggregated dimension. +From this array it can be deduced, for instance, that the shape of the fragment (in its canonical form, see <>) at position `[0, 1, 1]` of the array of fragments is `(17, 45, 180)`; and that this fragment occupies zero-based indices 0 to 16 of the Z aggregated dimension, 90 to 134 of the Y aggregated dimension, and 180 to 359 of the X aggregated dimension. See <>. In the special case that aggregated data is scalar, the `map` variable must also be scalar and contain the value `1`. 
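The deduction described above can be sketched in code. The Y and X rows below are taken from the example `map` values shown earlier; the Z row's first entry (17) follows from the stated fragment shape, while its second entry is assumed for illustration, and `None` stands for the padding (missing) value:

```python
# Sketch: deduce a fragment's shape and its zero-based index ranges along the
# aggregated dimensions from the rows of fragment sizes in the "map" variable.
map_rows = [
    [17, 5, None],     # Z: fragment sizes (second entry assumed; None = padding)
    [90, 45, 45],      # Y: from the example above
    [180, 180, None],  # X: from the example above
]

def fragment_extent(map_rows, position):
    """Return (shape, index_ranges) for the fragment at the given position."""
    shape = []
    ranges = []
    for sizes, i in zip(map_rows, position):
        sizes = [s for s in sizes if s is not None]   # drop the padding
        start = sum(sizes[:i])                        # sizes of earlier fragments
        shape.append(sizes[i])
        ranges.append((start, start + sizes[i] - 1))  # zero-based, inclusive
    return tuple(shape), ranges

shape, ranges = fragment_extent(map_rows, (0, 1, 1))
print(shape)    # (17, 45, 180)
print(ranges)   # [(0, 16), (90, 134), (180, 359)]
```

These are exactly the shape and index ranges stated in the text for the fragment at position `[0, 1, 1]`.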
@@ -507,7 +507,7 @@ The data for the `level`, `latitude` and `longitude` variables are omitted for [[fragment-interpretation, Section 2.8.2 "Fragment Interpretation"]] ==== Fragment Interpretation -Fragment datasets can be encoded in many different but equivalent ways, so we define a __canonical form__ of a fragment that provides a view of the fragment for which its data are consistent with the data from other fragments, as well as with the attributes of the aggregation variable. +Fragment datasets can be encoded in many different but equivalent ways, so a __canonical form__ of a fragment is defined that provides a view of the fragment for which its data are consistent with the data from other fragments, as well as with the attributes of the aggregation variable. When constructing the aggregated data, it is assumed that each fragment's data has been transformed to its canonical form. The canonical form of a fragment's data is such that: diff --git a/ch03.adoc b/ch03.adoc index 217dbe9c..8ba1a0f1 100644 --- a/ch03.adoc +++ b/ch03.adoc @@ -1,19 +1,19 @@ == Description of the Data The attributes described in this section are used to provide a description of the content and the units of measurement for each variable. -We continue to support the use of the **`units`** and **`long_name`** attributes as defined in COARDS. -We extend COARDS by adding the optional **`standard_name`** attribute which is used to provide unique identifiers for variables. +The use of the **`units`** and **`long_name`** attributes as defined in the COARDS conventions is maintained in these conventions. +The COARDS conventions are extended by adding the optional **`standard_name`** attribute which is used to provide unique identifiers for variables. This is important for data exchange since one cannot necessarily identify a particular variable based on the name assigned to it by the institution that provided the data. 
The **`standard_name`** attribute can be used to identify variables that contain coordinate data. -But since it is an optional attribute, applications that implement these standards must continue to be able to identify coordinate types based on the COARDS conventions. +But since it is an optional attribute, applications that implement these conventions must continue to be able to identify coordinate types based on the COARDS conventions. [[units, Section 3.1, "Units"]] === Units The **`units`** attribute is required for all variables that represent dimensional quantities (except for boundary variables defined in <> and climatology boundary variables defined in <>). The **`units`** attribute is permitted but not required for dimensionless quantities (see <>). -If multiplication by a dimensionless constant and addition of a dimensionless constant are the only operations required for the value of a dimensional quantity expressed in one unit to be converted to the value expressed in another unit, we describe the two units as __physically equivalent__. +If multiplication by a dimensionless constant and addition of a dimensionless constant are the only operations required for the value of a dimensional quantity expressed in one unit to be converted to the value expressed in another unit, the two units are considered __physically equivalent__. The value of the **`units`** attribute is a string that can be recognized by the UDUNITS package <>, with the exceptions that are given in <> and <>. Note that case is significant in the **`units`** strings. @@ -22,7 +22,7 @@ CF does not assume or require that the UDUNITS software will be used for **`unit In most **`units`** conversions, the sole operation on the data is multiplication by a scale factor. Special treatment is required in converting the **`units`** of variables that involve temperature (<>) and the **`units`** of time coordinate variables (<>). 
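The definition of physically equivalent units above can be sketched as follows. The table of length units is illustrative only, not a CF registry, and real applications would rely on UDUNITS; note that for these units the offset is zero, so conversion reduces to multiplication by a scale factor, as stated above:

```python
# Illustrative only: each unit is an affine map to an arbitrary base unit,
# value_base = scale * value + offset.  Two units are physically equivalent
# when such a scale and offset suffice to convert between them.
UNITS = {
    "m":  (1.0, 0.0),
    "km": (1000.0, 0.0),
    "cm": (0.01, 0.0),
}

def convert(value, src, dst):
    """Convert a value between two physically equivalent units."""
    s1, o1 = UNITS[src]
    s2, o2 = UNITS[dst]
    return (s1 * value + o1 - o2) / s2

print(convert(2.5, "km", "m"))   # 2500.0
```

Temperature units need a non-zero offset in addition to the scale factor, which is the special treatment discussed in the following subsection.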
-The COARDS convention prohibits the unit `degrees` altogether, but this unit is not forbidden by the CF convention because it may in fact be appropriate for a variable containing, say, solar zenith angle. +The COARDS conventions prohibit the unit `degrees` altogether, but this unit is not forbidden by the CF conventions because it may in fact be appropriate for a variable containing, say, solar zenith angle. The unit `degrees` is also allowed on coordinate variables such as the latitude and longitude coordinates of a transformed grid. In this case the coordinate values are not true latitudes and longitudes, which must always be identified using the more specific forms of `degrees` as described in <> and <>. @@ -36,16 +36,16 @@ The canonical unit (see also <>) for dimensionless quantities tha The UDUNITS package defines a few dimensionless units, such as `percent`, `ppm` (parts per million, 1e-6), and `ppb` (parts per billion, 1e-9). As an alternative to the canonical **`units`** of `1` or some other unitless number, the **`units`** for a dimensionless quantity may be given as a ratio of dimensional units, for instance `mg kg-1` for a mass ratio of 1e-6, or `microlitre litre-1` for a volume ratio of 1e-6. Data-producers are invited to consider whether this alternative would be more helpful to the users of their data. -The CF convention supports dimensionless units that are UDUNITS compatible, with one exception, concerning the dimensionless units defined by UDUNITS for volume ratios, such as `ppmv` and `ppbv`. +The CF conventions support dimensionless units that are UDUNITS compatible, with one exception, concerning the dimensionless units defined by UDUNITS for volume ratios, such as `ppmv` and `ppbv`. These units are allowed in the **`units`** attribute by CF only if the data variable has no **`standard_name`**. 
These units are prohibited by CF if there is a **`standard_name`**, because the **`standard_name`** defines whether the quantity is a volume ratio, so the **`units`** are needed only to indicate a dimensionless number.
Information describing a dimensionless physical quantity itself (e.g. "area fraction" or "probability") does not belong in the **`units`** attribute, but should be given in the **`long_name`** or **`standard_name`** attributes (see <> and <>), in the same way as for physical quantities with dimensional units.
As an exception, to maintain backwards compatibility with COARDS, the text strings `level`, `layer`, and `sigma_level` are allowed in the **`units`** attribute, in order to indicate dimensionless vertical coordinates.
-This use of **`units`** is not compatible with UDUNITS, and is deprecated by this standard because conventions for more precisely identifying dimensionless vertical coordinates are available (see <>).
+This use of **`units`** is not compatible with UDUNITS, and is deprecated by these conventions because conventions for more precisely identifying dimensionless vertical coordinates are available (see <>).

-The UDUNITS syntax that allows scale factors and offsets to be applied to a unit is not supported by this standard, except for case of specifying reference time, see section <>.
+The UDUNITS syntax that allows scale factors and offsets to be applied to a unit is not supported by these conventions, except for the case of specifying reference time, see section <>.
The application of any scale factors or offsets to data should be indicated by the **`scale_factor`** and **`add_offset`** attributes.
Use of these attributes for data packing, which is their most important application, is discussed in detail in <>.

@@ -54,7 +54,7 @@ Use of these attributes for data packing, which is their most important applicat

==== Temperature units

The **`units`** of temperature imply an origin (i.e. zero point) for the associated measurement scale.
-When the temperature value is the degree of warmth with respect to the origin of the measurement scale, we call it an _on-scale temperature_. +When the temperature value is the degree of warmth with respect to the origin of the measurement scale, it is called an _on-scale temperature_. When **`units`** of on-scale temperature are converted, the data may require the addition of an offset as well as multiplication by a scale factor, because the physical meaning of a numerical value of zero for an on-scale temperature depends on the unit of measurement. On-scale temperature is _unique_ among quantities in the respect that the origin and the unit of measurement are both defined by the **`units`** and therefore cannot be chosen independently. For all other quantities, the origin and the unit of measurement are independent. @@ -70,7 +70,7 @@ A **`standard_name`** (<>) or **`standard_name`** modifier (<>; <>) imply that temperature must be interpreted as temperature difference, but this attribute is optional too. In order to convert the **`units`** correctly, it is essential to know whether a temperature is on-scale or a difference. -Therefore this standard strongly recommends that any variable whose **`units`** involve a temperature unit should also have a **`units_metadata`** attribute to make the distinction. +Therefore these conventions strongly recommend that any variable whose **`units`** involve a temperature unit should also have a **`units_metadata`** attribute to make the distinction. This attribute must have one of the following three values: `temperature: on_scale`, `temperature: difference`, `temperature: unknown`. The **`units_metadata`** attribute, **`standard_name`** modifier (<>) and **`cell_methods`** attribute (<>) must be consistent if present. A variable must not have a **`units_metadata`** attribute if it has no **`units`** attribute or if its **`units`** do not involve a temperature unit. 
@@ -102,11 +102,11 @@ This value of **`units_metadata`** indicates that the data-writer does not know If the **`units_metadata`** attribute is not present, the data-reader should assume `temperature: unknown`. The **`units_metadata`** attribute was introduced in CF 1.11. In data written according to versions before 1.11, `temperature: unknown` should be assumed for all **`units`** involving temperature, if it cannot be deduced from other metadata. -We note (for guidance _only_ regarding `temperature: unknown`, _not_ as a CF convention) that the UDUNITS software assumes `temperature: on_scale` for **`units`** strings containing only a unit of temperature, and `temperature: difference` for **`units`** strings in which a unit of temperature is raised to any power other than unity, or multiplied or divided by any other unit. +It is noted (for guidance _only_ regarding `temperature: unknown`, _not_ as a CF convention) that the UDUNITS software assumes `temperature: on_scale` for **`units`** strings containing only a unit of temperature, and `temperature: difference` for **`units`** strings in which a unit of temperature is raised to any power other than unity, or multiplied or divided by any other unit. With `temperature: on_scale`, correct conversion can be guaranteed only for pure temperature **`units`**. If the quantity is an on-scale temperature multiplied by some other quantity, it is not possible to convert the data from the **`units`** given to any other **`units`** that involve a temperature with a different origin, given only the **`units`**. -For instance, when temperature is on-scale, a value in `kg degree_C m-2` can be converted to a value in `kg K m-2` only if we know separately the values in `degree_C` and `kg m-2` of which it is the product. +For instance, when temperature is on-scale, a value in `kg degree_C m-2` can be converted to a value in `kg K m-2` only if the values in `degree_C` and `kg m-2` of which it is the product are separately known. 
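The distinction above can be sketched with a hypothetical function (not a CF-defined API), showing how the `units_metadata` value changes the conversion of `degree_C` data to `K`:

```python
# Sketch: converting degree_C data to K depends on whether the quantity is an
# on-scale temperature or a temperature difference.
def degc_to_kelvin(value, units_metadata):
    if units_metadata == "temperature: on_scale":
        return value + 273.15   # the origin shifts, so an offset is applied
    if units_metadata == "temperature: difference":
        return value * 1.0      # a difference of 1 degree_C equals 1 K
    # "temperature: unknown": the correct conversion cannot be determined
    raise ValueError("cannot convert safely without knowing on_scale/difference")

print(degc_to_kelvin(20.0, "temperature: on_scale"))    # 293.15
print(degc_to_kelvin(20.0, "temperature: difference"))  # 20.0
```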
[[units-multiples, Section 3.1.3, "Scale factors and offsets"]] @@ -115,7 +115,7 @@ For instance, when temperature is on-scale, a value in `kg degree_C m-2` can be UDUNITS recognises the <> prefixes shown in <> for decimal multiples and submultiples of units, and allows them to be applied to non-SI units as well. UDUNITS offers a syntax for indicating arbitrary scale factors and offsets to be applied to a unit. (Note that this is different from the scale factors and offsets used for converting between **`units`**, as discussed for temperature in <>.) -This UDUNITS syntax for arbitrary transformation of **`units`** is not supported by the CF standard, except for the case of specifying reference time (<>). +This UDUNITS syntax for arbitrary transformation of **`units`** is not supported by the CF conventions, except for the case of specifying reference time (<>). The application of any scale factors or offsets to data should be indicated by the **`scale_factor`** and **`add_offset`** attributes. Use of these attributes for data packing, which is their most important application, is discussed in detail in <>. @@ -152,7 +152,7 @@ A fundamental requirement for exchange of scientific data is the ability to desc To some extent this is the role of the **`long_name`** attribute as defined in the <>. However, usage of **`long_name`** is completely ad-hoc. For many applications it is desirable to have a more definitive description of the quantity, which allows users of data from different sources (some of which might be models and others observational) to determine whether quantities are in fact comparable. -For this reason each variable may optionally be given a "standard name", whose meaning is defined by this convention. +For this reason each variable may optionally be given a "standard name", whose meaning is defined by these conventions. 
There may be several variables in a dataset with any given standard name, and these may be distinguished by other metadata, such as coordinates (<>) and **`cell_methods`** (<>). A standard name is associated with a variable via the attribute **`standard_name`** which takes a string value comprised of a standard name optionally followed by one or more blanks and a standard name modifier (a string value from <>). @@ -170,11 +170,11 @@ Unless it is dimensionless, a variable with a **`standard_name`** attribute must Units of time coordinates (<>), whose **`units`** attribute includes the word **`since`**, are _not_ physically equivalent to time units that do not include **`since`** in the **`units`**. To mark this distinction, the canonical unit given for quantities used for time coordinates is **`s since 1972-01-01`**. The reference datetime in the canonical unit (the beginning of the day i.e. midnight on 1st January 1972 at 0 `degrees_east`) is not restrictive; the time coordinate variable's own **`units`** may contain any reference datetime (after **`since`**) that is valid in its calendar. -(We use `1972-01-01` because it is when the current definition of UTC came into force, and a valid datetime in all CF calendars; see also <>.) +(`1972-01-01` is used because that is when the current definition of UTC came into force, and a valid datetime in all CF calendars; see also <>.) In both kinds of time **`units`** attribute (with or without **`since`**), any unit for measuring time can be used i.e. any unit which is physically equivalent to the SI base unit of time, namely the second. description:: The description is meant to clarify the qualifiers of the fundamental quantities such as which surface a quantity is defined on or what the flux sign conventions are. -We don't attempt to provide precise definitions of fundumental physical quantities (e.g., temperature) which may be found in the literature. 
+No attempt is made to provide precise definitions of fundamental physical quantities (e.g., temperature) which may be found in the literature.
The description may define rules on the variable type, attributes and coordinates which must be complied with by any variable carrying that standard name (such as in Example 3.5).

The standard name table is located at
diff --git a/ch04.adoc b/ch04.adoc
index ca51ad11..cee506f0 100644
--- a/ch04.adoc
+++ b/ch04.adoc
@@ -6,18 +6,18 @@

The commonest use of coordinate variables is to locate the data in space and time, but coordinates may be provided for any other continuous geophysical quantity (e.g. density, temperature, radiation wavelength, zenith angle of radiance, sea surface wave frequency) or discrete category (see <>, e.g. area type, model level number, ensemble member number) on which the data variable depends.
Four types of coordinates receive special treatment by these conventions: latitude, longitude, vertical, and time.
-We continue to support the special role that the **`units`** and **`positive`** attributes play in the COARDS convention to identify coordinate type.
-As an extension to COARDS, we strongly recommend that a parametric (usually dimensionless) vertical coordinate variable should be associated, via **`standard_name`** and **`formula_terms`** attributes, with its explicit definition, which provides a mapping between its values and dimensional vertical coordinate values that can be uniquely located with respect to a point on the earth's surface.
+The special role that the **`units`** and **`positive`** attributes play in the COARDS conventions to identify coordinate type continues to be supported.
+As an extension to COARDS, it is strongly recommended that a parametric (usually dimensionless) vertical coordinate variable should be associated, via **`standard_name`** and **`formula_terms`** attributes, with its explicit definition, which provides a mapping between its values and dimensional vertical coordinate values that can be uniquely located with respect to a point on the earth's surface. -Because identification of a coordinate type by its units is complicated by requiring the use of an external package <>, we provide two optional methods that yield a direct identification. +Because identification of a coordinate type by its units is complicated by requiring the use of an external package <>, two optional methods are provided that yield a direct identification. The attribute **`axis`** may be attached to a coordinate variable and given one of the values **`X`**, **`Y`**, **`Z`** or **`T`** which stand for a longitude, latitude, vertical, or time axis respectively. Alternatively the **`standard_name`** attribute may be used for direct identification. But note that these optional attributes are in addition to the required COARDS metadata. -To identify generic spatial coordinates we recommend that the **`axis`** attribute be attached to these coordinates and given one of the values **`X`**, **`Y`** or **`Z`**. +To identify generic spatial coordinates, it is recommended that the **`axis`** attribute be attached to these coordinates and given one of the values **`X`**, **`Y`** or **`Z`**. The values **`X`** and **`Y`** for the axis attribute should be used to identify horizontal coordinate variables. If both X- and Y-axis are identified, **`X-Y-up`** should define a right-handed coordinate system, i.e. rotation from the positive X direction to the positive Y direction is anticlockwise if viewed from above. -We strongly recommend that coordinate variables be used for all coordinate types whenever they are applicable. 
+It is strongly recommended that coordinate variables be used for all coordinate types whenever they are applicable. The methods of identifying coordinate types described in this section apply both to coordinate variables and to auxiliary coordinate variables named by the **`coordinates`** attribute (see <>). @@ -81,7 +81,7 @@ float lon(lon) ; Application writers should note that the UDUNITS package has limited recognition of the directionality implied by the "east" part of the unit specification. It defines **`degrees_east`** to be pi/180 radians, and hence equivalent to **`degrees_north`**. -We recommend the determination that a coordinate is a longitude type should be done via a string match between the given unit and one of the acceptable forms of **`degrees_east`**. +Hence, determination that a coordinate is a longitude type should be done via a string match between the given unit and one of the acceptable forms of **`degrees_east`**. Optionally, the longitude type may be indicated additionally by providing the **`standard_name`** attribute with the value **`longitude`**, and/or the **`axis`** attribute with the value **`X`**. @@ -94,7 +94,7 @@ Variables representing dimensional height or depth axes must always explicitly i The direction of positive (i.e., the direction in which the coordinate values are increasing), whether up or down, cannot in all cases be inferred from the units. The direction of positive is useful for applications displaying the data. -For this reason the attribute **`positive`** as defined in the COARDS standard is required if the vertical axis units are not a valid unit of pressure (as determined by the UDUNITS package <>) -- otherwise its inclusion is optional. +For this reason the attribute **`positive`** as defined in the COARDS conventions is required if the vertical axis units are not a valid unit of pressure (as determined by the UDUNITS package <>) -- otherwise its inclusion is optional. 
The **`positive`** attribute may have the value **`up`** or **`down`** (case insensitive). This attribute may be applied to either coordinate variables or auxiliary coordinate variables that contain vertical coordinate data. @@ -136,8 +136,8 @@ Plural forms are also acceptable. ==== Dimensionless Vertical Coordinate The **`units`** attribute is not required for dimensionless coordinates. -For backwards compatibility with COARDS we continue to allow the **`units`** attribute to take one of the values: **`level`**, **`layer`**, or **`sigma_level`**. -These values are not recognized by the UDUNITS package, and are considered a deprecated feature in the CF standard. +For backwards compatibility with COARDS the **`units`** attribute may take one of the values: **`level`**, **`layer`**, or **`sigma_level`**. +These values are not recognized by the UDUNITS package, and are considered a deprecated feature in the CF conventions. [[parametric-vertical-coordinate, Section 4.3.3, "Parametric Vertical Coordinate"]] ==== Parametric Vertical Coordinate @@ -146,7 +146,7 @@ In some cases dimensional vertical coordinates are a function of horizontal loca The `standard_name` of the parametric (usually dimensionless) vertical coordinate variable can be used to find the definition of the associated computed (always dimensional) vertical coordinate in <>. The definition provides a mapping between the parametric vertical coordinate values and computed values that can positively and uniquely indicate the location of the data. The `formula_terms` attribute can be used to associate terms in the definitions with variables in a netCDF file, and the `computed_standard_name` attribute can be used to supply the `standard_name` of the computed vertical coordinate values computed according to the definition. -To maintain backwards compatibility with COARDS the use of these attributes is not required, but is strongly recommended. 
+To maintain backwards compatibility with the COARDS conventions the use of these attributes is not required, but is strongly recommended. Some of the definitions may be supplemented with information stored in the `grid_mapping` variable about the datum used as a vertical reference (e.g. geoid, other geopotential datum or reference ellipsoid; see <> and <>). [[atm-sigma-coord-ex]] @@ -233,7 +233,7 @@ It must comprise a unit of measure that is physically equivalent (see <>) The time coordinate exactly equals the length of the time interval from the instant identified by the reference datetime to the instant identified by the time coordinate, in all cases except when leap seconds occur between the two instants in the **`standard`** calendar. (See <> for details.) -The CF standard follows UDUNITS (<>) in the definition of the acceptable units of measure for time. +The CF conventions follow UDUNITS (<>) in the definition of the acceptable units of measure for time. The most commonly used of these units (and their symbols) are **`day`** (**`d`**), **`hour`** (**`h`**), **`minute`** (**`min`**) and **`second`** (**`s`**). Plural forms are also acceptable. In CF, following UDUNITS, any unit may optionally have one of the decimal prefixes for multiples and submultiples (<>) e.g. **`millisecond`** or **`ms`**. @@ -241,12 +241,12 @@ CF recommends __not__ to use these prefixes with any unit of time other than **` UDUNITS defines a **`year`** to be exactly 365.242198781 days (the interval between 2 successive passages of the sun through vernal equinox). __It is not a calendar year.__ UDUNITS defines a **`month`** to be exactly **`year/12`**, which is __not a calendar month__. -We recommend that **`year`** and **`month`** should not be used, because of the potential for mistakes and confusion. +It is recommended that **`year`** and **`month`** should not be used, because of the potential for mistakes and confusion. 
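The relationship described earlier in this section, in which a time coordinate equals the interval elapsed since the reference datetime given after **`since`**, can be sketched with Python's `datetime` module. That module uses the proleptic Gregorian calendar and ignores leap seconds, so the sketch corresponds to the **`proleptic_gregorian`** calendar; the coordinate values are hypothetical:

```python
from datetime import datetime, timedelta

# Sketch: decode "days since 1972-01-01" time coordinate values.  Python's
# datetime is proleptic Gregorian and ignores leap seconds, so this matches
# the proleptic_gregorian calendar (not the standard or utc calendars).
units = "days since 1972-01-01"
reference = datetime(1972, 1, 1)        # parsed from the units string

time_coords = [0.0, 31.0, 365.25]       # hypothetical coordinate values
datetimes = [reference + timedelta(days=t) for t in time_coords]

print(datetimes[1])   # 1972-02-01 00:00:00
```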
UDUNITS defines a **`minute`** as 60 **`seconds`**, an **`hour`** as 3600 **`seconds`** and a **`day`** as 86400 **`seconds`**, consistent with the <> definitions of these non-SI units. These are fixed units of measure. When a leap second is inserted into UTC, the minute, hour and day affected differ by one second from their usual durations according to clock time, but the units of **`minute`**, **`hour`** and **`day`** do not. -To avoid mistakes and confusion, we therefore recommend that these units should not be used in the **`utc`** calendar (<>). +To avoid mistakes and confusion, it is therefore recommended that these units should not be used in the **`utc`** calendar (<>). UDUNITS permits a number of alternatives to the word **`since`** in the units of time coordinates. All the alternatives have exactly the same meaning in UDUNITS. @@ -263,8 +263,8 @@ Its format is __y__-__m__-__d__ [__H__:__M__:__S__[ __T__]], where [...] indicat __T__ is __not__ a time zone name or acronym; it is an interval of time. The default for time zone offset __T__ is zero, which may also be explicitly indicated in any of the numeric formats for __T__ defined below, or by the letter **`Z`**, sometimes referred to as "Zulu Time". -We suggest that a zero offset be stated explicitly to avoid confusion in situations where omitting it might be misunderstood as indicating local time. -We recommend that a non-zero time zone offset should __not__ be specified __in any situation__, because it is easy to make mistakes about the sign of the offset and allowance for daylight-saving/summer time. +It is suggested that a zero offset be stated explicitly to avoid confusion in situations where omitting it might be misunderstood as indicating local time. +It is recommended that a non-zero time zone offset should __not__ be specified __in any situation__, because it is easy to make mistakes about the sign of the offset and allowance for daylight-saving/summer time. 
A non-zero offset is __not allowed__ in the **`utc`** and **`tai`** calendars (see <>). In a time zone with zero offset, time (approximately) equals mean solar time for 0 **`degrees_east`** of longitude. @@ -352,7 +352,7 @@ Leap seconds are ignored by most software, including UDUNITS. * Use the **`standard`** calendar for observational data in other cases. * Use the **`proleptic_gregorian`** calendar for model-generated data. -(In principle a model could be programmed to include leap seconds, but we assume that this is not the case.) +(In principle a model could be programmed to include leap seconds, but it is assumed that this is not the case.) <> compares these calendars' treatment of leap seconds and explains the above recommendations. Leap seconds do not need to be considered in the **`360_day`**, **`365_day`** and **`366_day`** calendars. diff --git a/ch05.adoc b/ch05.adoc index 67820286..6d7031a1 100644 --- a/ch05.adoc +++ b/ch05.adoc @@ -16,7 +16,7 @@ First, string-valued coordinates (<>) will have a dimension for maximum Second, if an auxiliary coordinate variable of a data variable that has been compressed by gathering (<>) does not span the compressed dimension, then its dimensions may be any subset of the data variable's uncompressed dimensions, i.e. any of the dimensions of the data variable except the compressed dimension, and any of the dimensions listed by the **`compress`** attribute of the compressed coordinate variable. Third, in the ragged array representations of data (<>), special methods are needed to connect the data and coordinates. -We recommend that the name of a multidimensional coordinate variable should not match the name of any of its dimensions because that precludes supplying a coordinate variable for the dimension. +It is recommended that the name of a multidimensional coordinate variable should not match the name of any of its dimensions because that precludes supplying a coordinate variable for the dimension. 
This practice also avoids potential bugs in applications that determine coordinate variables by only checking for a name match between a dimension and a variable and not checking that the variable is one dimensional. If the longitude, latitude, vertical or time coordinate is multi-valued, varies in only one dimension, and varies independently of other spatiotemporal coordinates, it is not permitted to store it as an auxiliary coordinate variable. @@ -130,7 +130,7 @@ This faciliates processing of this data by generic applications that don't recog A "reduced" longitude-latitude grid is one in which the points are arranged along constant latitude lines with the number of points on a latitude line decreasing toward the poles. Storing this type of gridded data in two-dimensional arrays wastes space, and results in the presence of missing values in the 2D coordinate variables. -We recommend that this type of gridded data be stored using the compression scheme described in <>. +It is recommended that this type of gridded data be stored using the compression scheme described in <>. Compression by gathering preserves structure by storing a set of indices that allows an application to easily scatter the compressed data back to two-dimensional arrays. The compressed latitude and longitude auxiliary coordinate variables are identified by the `coordinates` attribute. @@ -441,7 +441,7 @@ The `crs_wkt` attribute should comprise a text string that conforms to the WKT s If desired the text string may contain embedded newline characters to aid human readability. However, any such characters are purely cosmetic and do not alter the meaning of the attribute value. It is envisaged that the value of the `crs_wkt` attribute typically will be a single line of text, one intended primarily for machine processing. -Other than the requirement to be a valid WKT string, the CF convention does not prescribe the content of the `crs_wkt` attribute since it will necessarily be context-dependent. 
+Other than the requirement to be a valid WKT string, the CF conventions do not prescribe the content of the `crs_wkt` attribute since it will necessarily be context-dependent.

Where a `crs_wkt` attribute is added to a `grid_mapping`, the extended syntax for the `grid_mapping` attribute enables the list of variables containing coordinate values being referenced to be explicitly stated and the CRS WKT Axis order to be explicitly defined.
The explicit definition of WKT CRS Axis order is expected by the OGC standards for referencing by coordinates.
@@ -512,7 +512,7 @@ Example 5.12 illustrates how certain WKT elements - all of which are optional -
Note: To enhance readability of these examples, the WKT value has been split across multiple lines and embedded quotation marks (") left unescaped - in real netCDF files such characters would need to be escaped.
In CDL, within the CRS WKT definition string, newlines would need to be encoded within the string as `\n` and double quotes as `\"`.
-Also for readability, we have dropped the quotation marks which would delimit the entire `crs_wkt` string.
+Also for readability, the quotation marks which would delimit the entire `crs_wkt` string have been dropped.
This pseudo CDL will not parse directly.

[[british-national-grid-newlyn-datum-in-crs-wkt-format]]
@@ -653,7 +653,7 @@ Similarly, a string-valued scalar coordinate variable has the same meaning and p
Note however that use of this feature with a latitude, longitude, vertical, or time coordinate will inhibit COARDS conforming applications from recognizing them.

Once a name is used for a scalar coordinate variable it can not be used for a 1D coordinate variable.
-For this reason we strongly recommend against using a name for a scalar coordinate variable that matches the name of any dimension in the file.
+For this reason it is strongly recommended not to use a name for a scalar coordinate variable that matches the name of any dimension in the file.
If a data variable has two or more scalar coordinate variables, they are regarded as though they were all independent coordinate variables with dimensions of size one.
If two or more single-valued coordinates are not independent, but have related values (this might be the case, for instance, for time and forecast period, or vertical coordinate and model level number, <>), they should be stored as coordinate or auxiliary coordinate variables of the same size one dimension, not as scalar coordinate variables.
diff --git a/ch06.adoc b/ch06.adoc
index 6d957e9b..21eddb22 100644
--- a/ch06.adoc
+++ b/ch06.adoc
@@ -21,7 +21,7 @@ This is a convenience feature.
==== Geographic Regions

When data is representative of geographic regions which can be identified by names but which have complex boundaries that cannot practically be specified using longitude and latitude boundary coordinates, a labeled axis should be used to identify the regions.
-We recommend that the names be chosen from the list of link:$$https://cfconventions.org/Data/cf-standard-names/docs/standardized-region-names.html$$[standardized region names] whenever possible.
+It is recommended that the names be chosen from the list of link:$$https://cfconventions.org/Data/cf-standard-names/docs/standardized-region-names.html$$[standardized region names] whenever possible.
To indicate that the label values are standardized the variable that contains the labels must be given the **`standard_name`** attribute with the value `region`.

[[northward-heat-transport-in-atlantic-ocean-ex]]
@@ -29,7 +29,7 @@ To indicate that the label values are standardized the variable that contains th
[caption="Example 6.1. "]
.Northward heat transport in Atlantic Ocean
====

-Suppose we have data representing northward heat transport across a set of zonal slices in the Atlantic Ocean.
+Suppose one has data representing northward heat transport across a set of zonal slices in the Atlantic Ocean.
Note that the standard names to describe this quantity do not include location information. That is provided by the latitude coordinate and the labeled axis: diff --git a/ch07.adoc b/ch07.adoc index 740a9340..85b1da52 100644 --- a/ch07.adoc +++ b/ch07.adoc @@ -5,11 +5,11 @@ When gridded data does not represent the point values of a field but instead represents some characteristic of the field within cells of non-zero size, a complete description of the variable should include metadata that describes the domain or extent of each cell, and the characteristic of the field that the cell values represent. The commonest cases have one-dimensional cells along spatiotemporal axes, for instance cells along a time axis for consecutive months whose values contain monthly means. The conventions presented in <>, <> and <> describe cases in which each grid point is associated with a cell consisting of a single one-dimensional interval, a single two-dimensional polygonal area, or in general a single _n_-dimensional volume in the _n_-dimensional space described by its coordinate variables. -As an alternative to _n_-dimensional volumes with bounds, we provide <>, for the case of geospatial applications in which each data value pertains to a single real-world feature, such as a river, watershed or country, represented by one or more points, lines or polygons. +As an alternative to _n_-dimensional volumes with bounds, <> is provided, for the case of geospatial applications in which each data value pertains to a single real-world feature, such as a river, watershed or country, represented by one or more points, lines or polygons. It is possible for a single data value to be the result of an operation whose domain is a disjoint set of intervals or areas. This is true for many types of climatological statistic; for example, the mean January temperature for the years 1971-2000 is computed from the 30 individual months of January, which are a set of discontiguous time-intervals. 
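A statistic over such a disjoint domain can be sketched numerically (the data below are placeholders, not drawn from the conventions): the climatological January mean is the mean over the January slices of a multi-year monthly series:

```python
# Placeholder monthly-mean series for three years (36 values, January first).
monthly = [float(i) for i in range(36)]

# The three Januarys form a discontiguous subset of the time axis.
januaries = [monthly[i] for i in range(0, len(monthly), 12)]  # indices 0, 12, 24
climatological_january_mean = sum(januaries) / len(januaries)
print(climatological_january_mean)  # 12.0 (mean of 0.0, 12.0 and 24.0)
```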
-Climatological statistics are of such importance that we provide special methods for describing their associated computational domains in <>. +Climatological statistics are of such importance that special methods are provided for describing their associated computational domains in <>. Climatological statistics and other kinds of statistic, e.g. zonal means, may be used as a reference with respect to which anomalies are computed. <> gives conventions for relating an anomaly data variable to its reference statistic, and for describing how the latter was computed. @@ -19,7 +19,7 @@ Climatological statistics and other kinds of statistic, e.g. zonal means, may be To delimit the cells, the **`bounds`** attribute may be added to the appropriate coordinate variable(s). The value of **`bounds`** is the name of the variable that contains the vertices of the cell boundaries. -We refer to this type of variable as a "boundary variable." +This type of variable is referred to as a "boundary variable." If cell boundaries are provided, it is recommended that each gridpoint should lie somewhere within or upon the boundaries of its own cell. If cell boundaries are not provided (using the **`bounds`** attribute), an application can make no assumption about the location or extent of the cells. @@ -30,7 +30,7 @@ Nonetheless, the bounds may still be included, for instance because the grid is A cell of truly zero size can be indicated by giving it coincident boundaries. A boundary variable must have one more dimension than its associated coordinate or auxiliary coordinate variable. -We refer to the additional dimension as the "vertex dimension". +The additional dimension is referred to as the "vertex dimension". The vertex dimension must be the most rapidly varying dimension (the last dimension in CDL order), and its size is the maximum number of cell vertices. 
The vertex dimension must be of size two if the associated variable is one-dimensional (<>), and of size greater than two if the associated variable has more than one dimension (<>). @@ -78,7 +78,7 @@ variables: ---- The boundary variable **`time_bnds`** associates a time point **`i`** with the time interval whose boundaries are **`time_bnds(i,0)`** and **`time_bnds(i,1)`**. The instant **`time(i)`** should be contained within the interval, or be at one end of it. -For instance, with **`i=2`** we might have **`time(2)=10.5`**, **`time_bnds(2,0)=10.0`**, **`time_bnds(2,1)=11.0`**. +For instance, with **`i=2`** one might have **`time(2)=10.5`**, **`time_bnds(2,0)=10.0`**, **`time_bnds(2,1)=11.0`**. If the times are increasing e.g. **`time(3)`** = **`11.5`** > **`10.5`** = **`time(2)`**, which implies **`time(i+1)`** > **`time(i)`** for all **`i`** because coordinates must be monotonic, the bounds must also be increasing for all **`i`**, e.g. **`timebnd(2,1)`** >= **`timebnd(2,0)`**. If adjacent intervals are contiguous, the shared endpoint must be identical. For example, if the interval **`i=3`** begins at **`11.0`** days, when interval **`i=2`** ends, the values in **`timebnd(3,0)`** and **`timebnd(2,1)`** must be _exactly_ the same. @@ -126,7 +126,7 @@ The gridpoint location, **`(lat(j,i),lon(j,i))`**, should be contained within th The vertices must be ordered such that, when visiting the vertices in order, the four-sided perimeter of the cell is traversed anticlockwise on the lon-lat surface as seen from above. If i-j-upward is a right-handed coordinate system (like lon-lat-upward), this can be arranged as in <>. Let us call the side of cell **`(j,i)`** facing cell **`(j,i-1)`** the "**`i-1`**" side, the side facing cell **`(j,i+1)`** the "**`i+1`**" side, and similarly for "**`j-1`**" and "**`j+1`**". -Then we can refer to the vertex formed by sides **`i-1`** and **`j-1`** as **`(j-1,i-1)`**. 
+Then the vertex formed by sides **`i-1`** and **`j-1`** can be referred to as **`(j-1,i-1)`**. With this notation, the four vertices are indexed as follows: **`0=(j-1,i-1)`**, **`1=(j-1,i+1)`**, **`2=(j+1,i+1)`**, **`3=(j+1,i-1)`**. [[img-bnd_2d_coords]] @@ -169,7 +169,7 @@ For any term that depends on the vertical dimension, however, the variable names Whenever a **`formula_terms`** attribute is attached to a boundary variable, the formula terms may additionally be identified using a second method: variables appearing in the vertical coordinates' **`formula_terms`** may be declared to be coordinate, scalar coordinate or auxiliary coordinate variables, and those coordinates may have **`bounds`** attributes that identify their boundary variables. In that case, the **`bounds`** attribute of a formula terms variable must be consistent with the **`formula_terms`** attribute of the boundary variable. -Software digesting legacy datasets (constructed prior to version 1.7 of this standard) may have to rely in some cases on the first method of identifying the formula term variables and in other cases, on the second. +Software digesting legacy datasets (constructed prior to version 1.7 of these conventions) may have to rely in some cases on the first method of identifying the formula term variables and in other cases, on the second. Starting from version 1.7, however, the first method will be sufficient. [[specifying-formula_terms-ex]] @@ -228,7 +228,7 @@ In this case, rather than (or in addition to) indicating grid cell area, it may To indicate extra information about the spatial properties of a variable's grid cells, a **`cell_measures`** attribute may be defined for a variable. This is a string attribute comprising a list of blank-separated pairs of words of the form "**`measure: name`**". For the moment, "**`area`**" and "**`volume`**" are the only defined measures, but others may be supported in future. 
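As a non-normative illustration of an `area` measure, the values for longitude-latitude cells could be computed as in the following sketch, which assumes a perfectly spherical earth with an illustrative radius; the conventions do not prescribe any particular method:

```python
import math

def cell_area(lat_bounds_deg, lon_bounds_deg, radius=6_371_000.0):
    """Area in m2 of a longitude-latitude cell on a sphere:
    R**2 * (lon2 - lon1) * (sin(lat2) - sin(lat1)), angles in radians."""
    lat1, lat2 = (math.radians(b) for b in lat_bounds_deg)
    lon1, lon2 = (math.radians(b) for b in lon_bounds_deg)
    return radius**2 * (lon2 - lon1) * (math.sin(lat2) - math.sin(lat1))

# Sanity check: a single cell covering the whole sphere gives 4*pi*R**2.
total = cell_area((-90.0, 90.0), (0.0, 360.0))
assert abs(total - 4.0 * math.pi * 6_371_000.0**2) < 1e-6 * total
```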
-The "name" is the name of the variable containing the measure values, which we refer to as a "measure variable". +The "name" is the name of the variable containing the measure values, which is called a "measure variable". The dimensions of a measure variable must be the same as or a subset of the dimensions of the variable to which it is related, but their order is not restricted, and with one exception: If a cell measure variable of a data variable that has been compressed by gathering (<>) does not span the compressed dimension, then its dimensions may be any subset of the data variable's uncompressed dimensions, i.e. any of the dimensions of the data variable except the compressed dimension, and any of the dimensions listed by the **`compress`** attribute of the compressed coordinate variable. In the case of area, for example, the field itself might be a function of longitude, latitude, and time, but the variable containing the area values would only include longitude and latitude dimensions (and the dimension order could be reversed, although this is not recommended). @@ -277,7 +277,7 @@ variables: [[cell-methods, Section 7.3, "Cell Methods"]] === Cell Methods -To describe the characteristic of a field that is represented by cell values, we define the **`cell_methods`** attribute of the variable. +To describe the characteristic of a field that is represented by cell values, the **`cell_methods`** attribute of the variable is used. This is a string attribute comprising a list of blank-separated words of the form "__name: method__". Each "__name: method__" pair indicates that for an axis identified by __name__, the cell values representing the field have been determined or derived by the specified __method__. For example, if data values have been generated by computing time means, then this could be indicated with **`cell_methods="t: mean"`**, assuming here that the name of the time dimension variable is "t". 
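A minimal parser for the plain "__name: method__" pairs just defined might look like the following sketch; it deliberately ignores the modifiers and parenthesized extra information introduced later in this section:

```python
import re

def parse_cell_methods(cell_methods):
    """Split a cell_methods string into (names, method) pairs for the plain
    'name: method' forms, e.g. 'time: mean' or 'lat: lon: mean'.  Modifiers
    such as 'where'/'over' and parenthesized information are not handled."""
    pairs = []
    for match in re.finditer(r"((?:\w+: )+)(\w+)", cell_methods):
        names = tuple(match.group(1).replace(":", "").split())
        pairs.append((names, match.group(2)))
    return pairs

print(parse_cell_methods("time: mean"))  # [(('time',), 'mean')]
print(parse_cell_methods("lat: lon: mean time: maximum"))
```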
@@ -382,7 +382,7 @@ If there is no standardized information, the keyword **`comment:`** should be om For instance, an area-weighted mean over latitude could be indicated as **`lat: mean (area-weighted)`** or **`lat: mean (interval: 1 degree_north comment: area-weighted)`**. A dimension of size one may be the result of "collapsing" an axis by some statistical operation, for instance by calculating a variance from time series data. -We strongly recommend that dimensions of size one be retained (or scalar coordinate variables be defined) to enable documentation of the method (through the **`cell_methods`** attribute) and its domain (through the **`bounds`** attribute). +It is strongly recommended that dimensions of size one be retained (or scalar coordinate variables be defined) to enable documentation of the method (through the **`cell_methods`** attribute) and its domain (through the **`bounds`** attribute). [[surface-air-temperature-variance-ex]] [caption="Example 7.6. "] @@ -506,7 +506,7 @@ For example, there could be a **`cell_methods`** entry of "**`longitude: mean`** That would indicate a mean over all longitudes. Note, however, that if in addition the data variable had a scalar coordinate variable with a **`standard_name`** of **`region`** and a value of **`atlantic_ocean`**, it would indicate a mean over longitudes that lie within the Atlantic Ocean, not all longitudes. -We recommend that whenever possible, cell bounds should be supplied by giving the variable a dimension of size one and attaching bounds to the associated coordinate variable. +It is recommended that whenever possible, cell bounds should be supplied by giving the variable a dimension of size one and attaching bounds to the associated coordinate variable. 
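The expectations this chapter places on one-dimensional boundary variables — each coordinate lying within or on its own bounds, and contiguous cells sharing identical endpoint values — can be expressed as a short, non-normative check (the helper name is invented, and contiguity is assumed):

```python
def check_bounds(coords, bounds):
    """Check a 1-D coordinate against its (n, 2) boundary variable: every
    point must lie within (or on) its own cell, and adjacent cells, assumed
    contiguous here, must share exactly the same endpoint value."""
    for x, (lo, hi) in zip(coords, bounds):
        if not (min(lo, hi) <= x <= max(lo, hi)):
            return False
    for (_, prev_hi), (next_lo, _) in zip(bounds, bounds[1:]):
        if prev_hi != next_lo:
            return False
    return True

time = [10.5, 11.5, 12.5]
time_bnds = [(10.0, 11.0), (11.0, 12.0), (12.0, 13.0)]
assert check_bounds(time, time_bnds)
assert not check_bounds([10.5], [(11.0, 12.0)])  # point outside its cell
```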
[[climatological-statistics, Section 7.4, "Climatological Statistics"]] === Climatological Statistics @@ -514,8 +514,8 @@ We recommend that whenever possible, cell bounds should be supplied by giving th Climatological statistics may be derived from corresponding portions of the annual cycle in a set of years, e.g., the average January temperatures in the climatology of 1961-1990, where the values are derived by averaging the 30 Januarys from the separate years. Portions of the climatological cycle are specified by references to dates within the calendar year. However, a calendar year is not a well-defined unit of time, because it differs between leap years and other years, and among calendars. -Nonetheless for practical purposes we wish to compare statistics for months or seasons from different calendars, and to make climatologies from a mixture of leap years and other years. -Hence we provide special conventions for indicating dates within the climatological year. +Nonetheless for practical purposes it may be desirable to compare statistics for months or seasons from different calendars, and to make climatologies from a mixture of leap years and other years. +Hence special conventions for indicating dates within the climatological year are provided. Climatological statistics may also be derived from corresponding portions of a range of days, for instance the average temperature for each hour of the average day in April 1997. In addition the two concepts may be used at once, for instance to indicate not April 1997, but the average April of the five years 1995-1999. @@ -528,20 +528,20 @@ The rules and recommendations for attributes of the climatological boundary vari Using the units and calendar of the time coordinate variable, element (i,0) of the climatology boundary variable specifies the beginning of the first subinterval and element (i,1) the end of the last subinterval used to evaluate the climatological statistics with index i in the time dimension. 
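The decomposition of a cell's climatological bounds into a range of years and a within-year portion can be sketched as follows (hypothetical helper; the dates correspond to a climatological January for 1961-1990):

```python
from datetime import datetime

def decompose_climatology_bounds(start, end):
    """Split a cell's climatological bounds into the range of years (y0-y1)
    and the within-year portion: the month/day on which the first
    subinterval starts and the month/day on which the last one ends."""
    return (start.year, end.year), ((start.month, start.day), (end.month, end.day))

# Climatological January for 1961-1990: the first subinterval starts on
# 1961-01-01 and the last one ends on 1990-02-01.
years, within_year = decompose_climatology_bounds(datetime(1961, 1, 1),
                                                  datetime(1990, 2, 1))
print(years, within_year)  # (1961, 1990) ((1, 1), (2, 1))
```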
The time coordinates should be values that are representative of the climatological time intervals, such that an application which does not recognise climatological time will nonetheless be able to make a reasonable interpretation. -For compatibility with the COARDS standard, a climatological time coordinate in the default **`standard`** and **`julian`** calendars may be indicated by setting the datetime reference string in the time coordinate's **`units`** attribute to midnight at 0 `degrees_east` on 1 January in year 0 (i.e., **`since 0-1-1`**). +For compatibility with the COARDS conventions, a climatological time coordinate in the default **`standard`** and **`julian`** calendars may be indicated by setting the datetime reference string in the time coordinate's **`units`** attribute to midnight at 0 `degrees_east` on 1 January in year 0 (i.e., **`since 0-1-1`**). This convention is deprecated because it does not provide any information about the intervals used to compute the climatology, and there may be inconsistencies among software packages in the interpretation of the time coordinates with a reference time of year 0. Use of year 0 for this purpose is impossible in all other calendars, because year 0 is a valid year. A climatological axis may use different statistical methods to represent variation among years, within years and within days. For example, the average January temperature in a climatology is obtained by averaging both within years and over years. This is different from the average January-maximum temperature and the maximum January-average temperature. -For the former, we first calculate the maximum temperature in each January, then average these maxima; for the latter, we first calculate the average temperature in each January, then find the largest one. 
+For the former, the maximum temperature in each January is first calculated, then these maxima are averaged; for the latter, the average temperature in each January is first calculated, then the largest one is identified.

As usual, the statistical operations are recorded in the **`cell_methods`** attribute, which may have two or three entries for the climatological time dimension.
Valid values of the **`cell_methods`** attribute must be in one of the forms from the following list.
The intervals over which various statistical methods are applied are determined by decomposing the date and time specifications of the climatological time bounds of a cell, as recorded in the variable named by the **`climatology`** attribute.
(The date and time specifications must be calculated from the time coordinates expressed in units of "time interval since reference date and time".)
-In the descriptions that follow we use the abbreviations __y__, __m__, __d__, __H__, __M__, and __S__ for year, month, day, hour, minute, and second respectively.
+In the descriptions that follow, the abbreviations __y__, __m__, __d__, __H__, __M__, and __S__ are used for year, month, day, hour, minute, and second respectively.
The suffix __0__ indicates the earlier bound and __1__ the latter.

time: method1 **`within years`**   time: method2 **`over years`**:: __method1__ is applied to the time intervals (mdHMS0-mdHMS1) within individual years and __method2__ is applied over the range of years (y0-y1).
@@ -558,7 +558,7 @@ Analogous situations arise for daily intervals running across midnight from one
When considering intervals within days, if the earlier time of day is equal to the later time of day, then the method is applied to a full 24 hour day.

-__We have tried to make the examples in this section easier to understand by translating all time coordinate values to date and time formats.
+__The examples in this section have been made easier to understand by translating all time coordinate values to date and time formats. This is not currently valid CDL syntax.__ [[climatological-seasons-ex]] @@ -733,13 +733,13 @@ An "anomaly" is the difference between a physical quantity and its statistical n For example, a commonly-used anomaly is the current temperature at a specific location minus the long-term average temperature there. CF offers two conventions for describing anomaly data. -In the remainder of this section and the following two (<> and <>), we describe a general convention for anomalies over any type of coordinate, including details about how the norm was calculated. -In <>, we describe a legacy convention that depends on special standard names. +In the remainder of this section and the following two (<> and <>), a general convention for anomalies over any type of coordinate is described, including details about how the norm was calculated. +In <>, a legacy convention is described that depends on special standard names. It can be used only for simple temporal anomalies, and is insufficiently informative for many use-cases. The generalized definition of an anomaly value __A__ of some physical quantity __q__ is the difference __P__ − __N__ between a particular value __P__ of __q__ and a normal value or norm __N__ of __q__. __N__ is some statistic calculated from the values of __q__ that lie within specified ranges of one or more of the variables (usually spatiotemporal coordinates) on which __q__ depends. -We denote this set of variables as {__c__}; usually there is only one variable in the set. +This set of variables is denoted as {__c__}; usually there is only one variable in the set. __P__ can be, but is not necessarily, one of the values of __q__ from which __N__ is calculated. 
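The relation __A__ = __P__ − __N__ can be illustrated numerically. In this sketch (invented values) the norm is a zonal mean over the longitude axis, so each anomaly row sums to zero:

```python
# P: original data as a function of (lat, lon); values are illustrative.
P = [[1.0, 2.0, 3.0],
     [4.0, 6.0, 8.0]]

# N: the norm, here a zonal (longitude) mean, so one value per latitude.
N = [sum(row) / len(row) for row in P]

# A: the anomaly data variable, A = P - N.
A = [[x - n for x in row] for row, n in zip(P, N)]
print(A)  # [[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0]]
```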
In the same way, a data variable __A__ containing anomalies (an "anomaly data variable") is notionally the difference between a data variable __P__ containing the original data and a data variable __N__ (a "norm data variable") containing the statistical norm.
@@ -772,13 +772,13 @@ This is known because the entries in **`cell_methods`** appear in order of appli
* that __N__ was calculated from the variation of __P__ over dimensions identified by the __name__(s).

Each __name__ must identify an "axis" of the anomaly data variable __A__.
-By "axis" we mean a dimension and its corresponding coordinate variable, or a scalar coordinate variable.
-In either case, we refer to the axis as an "anomaly axis" and to the variable as an "anomaly coordinate variable".
+The word "axis" means a dimension and its corresponding coordinate variable, or a scalar coordinate variable.
+In either case, the axis is referred to as an "anomaly axis" and the variable as an "anomaly coordinate variable".
The anomaly coordinate variable of each axis must have a **`standard_name`** attribute.
Usually there is only one anomaly axis, and usually it is a spatiotemporal axis.
For instance, for a data variable containing anomalies with respect to the zonal mean, __name__ identifies longitude as the anomaly axis e.g. "**`longitude: anomaly_wrt`** __norm__".
-For an anomaly with respect to the area minimum, we need two __name__s e.g. "**`lat: lon: anomaly_wrt`** __norm__".
+For an anomaly with respect to the area minimum, two __name__s are needed, e.g. "**`lat: lon: anomaly_wrt`** __norm__".
As described in <>, the combination of horizontal axes can alternatively be represented by the word **`area`**, thus "**`area: anomaly_wrt`** __norm__".
For an anomaly with respect to a statistic computed over time or climatological time, __name__ identifies the time axis of the anomaly data variable.
@@ -789,7 +789,7 @@ In both cases, __norm__ is an ancillary variable of the anomaly data variable (< If the norm data variable __N__ is present in the dataset, it can be named as __norm__ in the **`cell_methods`** of the anomaly variable. Such would be the case in a dataset that contains both zonal means and anomalies relative to those means. No modification to the metadata of __N__ is required for it to serve as the norm for __A__, and __N__ may still be treated as a data variable in its own right as well. -We describe the use of a norm data variable first (<>) because this case is conceptually more obvious, although it is uncommon for __N__ to be present in the dataset. +The use of a norm data variable is described first (<>) because this case is conceptually more obvious, although it is uncommon for __N__ to be present in the dataset. The second alternative (<>) is where __norm__ in **`cell_methods`** identifies a "norm metadata variable" instead of the norm data variable. A norm metadata variable contains information about the norm axes, but no data of its own. @@ -809,7 +809,7 @@ The anomaly data variable must name the norm data variable in its **`ancillary_v The norm data variable __N__ must have all the same axes as the anomaly data variable __A__, __except__ for the anomaly axes. For each anomaly axis (although usually there is only one), the norm data variable has a coordinate variable (and dimension of the same name) or a scalar coordinate variable (named in its **`coordinates`** attribute). -In either case, we refer to it as a "norm coordinate variable". +In either case, this is referred to as a "norm coordinate variable". The norm coordinate variable must have the same **`standard_name`** as the anomaly coordinate variable. It must also have boundary variables to indicate the coordinate range over which __N__ was calculated from __P__. 
@@ -836,7 +836,7 @@ If the climatological time axis is multivalued, a norm metadata variable is requ
.Distinguishing temporal anomalies with different kinds of norm
====
The anomaly data variable (__A__, **`delta_tas`**) contains daily maxima (along its **`time`** dimension) for 16th-19th July 2023 of the anomaly in **`air_temperature`** with respect to the climatological mean (__N__, **`climatological_tas`**) of the 30-year period 1990-2019.
-In this example, we include the variable **`tas`** (__P__), from which __A__ was calculated, as __P__ − __N__.
+In this example, the variable **`tas`** (__P__), from which __A__ was calculated as __P__ − __N__, is included.
The variable **`tas`** has no metadata that formally identifies it as __P__, and __P__ would not usually be included in the dataset; it is shown here for comparison of its metadata with **`delta_tas`** and **`climatological_tas`**:
----
dimensions:
@@ -1090,13 +1090,13 @@ The auxiliary coordinate variable has a **`select`** attribute naming the climat
The value of element __i__ of the auxiliary coordinate variable is the index (numbering from 0) along the climatological time dimension of the norm corresponding to element __i__ of the anomaly time dimension.

Using this method means that the norm metadata variable can refer to a subset of elements of the climatological time axis if only some of them are relevant, and it can refer repeatedly to elements of climatological time axis if there is more than one anomaly time referring to a given climatological time.
-In the following example, by "timestep __i__" we mean the slice of the data variable along its time dimension with index __i__ (recalling that index numbering starts with 0).
+In the following example, "timestep __i__" means the slice of the data variable along its time dimension with index __i__ (recalling that index numbering starts with 0).
Suppose that __A__ contains monthly anomalies for the months of June, July, and August of 2023 and 2024 with respect to __N__, the 30-year climatological monthly means for 1990-2019. __N__ has a climatological time axis with a dimension of 12, one for each month January through December. The anomaly axis is time, and __A__ has a time dimension of 6 (two years times three months). The first time coordinate of __A__ is June 2023, and the first value of the auxiliary coordinate variable is 5, indicating that timestep 0 of __A__ (June 2023) is the anomaly with respect to the timestep 5 of __N__ (climatological June). -Since only June, July, and August climatological means are needed in this example, we could alternatively give the climatological time axis of __N__ a dimension of 3, with elements for those three months alone. +Since only June, July, and August climatological means are needed in this example, alternatively the climatological time axis of __N__ could be given a dimension of 3, with elements for those three months alone. In that case, the first value of the auxiliary coordinate variable would be 0 for climatological June. In an abstract sense, the norm metadata variable indicates that __A__ is the difference between __P__ and __N'__, where __N'__ is an abstract construction. @@ -1153,9 +1153,9 @@ data: Element **0** of **`month_indices`** is **5**. This means that element **0** of **`time`**, which is 2023-06-16 (with bounds of 2023-06-01 and 2023-07-01, i.e. the whole of June 2023) is paired with element **5** of **`climatological_time`**, which is the climatology for June 1991-2000. -It indicates that **`delta_tas(0,:,:)`**, where by **`:`** we mean the entire range of the dimension, is the difference between **`tas(0,:,:)`** and **`climatological_tas(5,:,:)`**. +It indicates that **`delta_tas(0,:,:)`**, where **`:`** means the entire range of the dimension, is the difference between **`tas(0,:,:)`** and **`climatological_tas(5,:,:)`**. 
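The pairing just described amounts to an index lookup, sketched below with invented numbers; the variable names follow the example (`tas`, `climatological_tas`, `month_indices`, `delta_tas`):

```python
# Climatological monthly means, January..December (illustrative numbers).
climatological_tas = [2.0, 3.0, 6.0, 10.0, 14.0, 18.0,
                      20.0, 19.0, 16.0, 11.0, 6.0, 3.0]
# Monthly means for Jun-Aug 2023 and Jun-Aug 2024.
tas = [19.0, 21.5, 20.0, 18.5, 22.0, 19.5]
# Auxiliary coordinate: for each timestep, the index into climatological time.
month_indices = [5, 6, 7, 5, 6, 7]

delta_tas = [tas[i] - climatological_tas[month_indices[i]]
             for i in range(len(tas))]
print(delta_tas)  # [1.0, 1.5, 1.0, 0.5, 2.0, 0.5]
```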
-If we chose instead to show the climatological time axis with only the three months needed in this example (June, July, and August), the following lines would replace the corresponding ones in the example above:
+If instead the climatological time axis were shown with only the three months needed in this example (June, July, and August), the following lines would replace the corresponding ones in the example above:

----
dimensions:
@@ -1385,7 +1385,7 @@ The single dimension of the part node count variable must equal the total number
For __polygon__ geometries with holes, the geometry container variable must have an **`interior_ring`** attribute that contains the name of a variable that indicates if the polygon parts are interior rings (i.e., holes) or not.
This interior ring variable must contain the value 0 to indicate an exterior ring polygon and 1 to indicate an interior ring polygon.
The single dimension of the interior ring variable must be the same dimension as that of the part node count variable.
-The geometry types included in this convention are listed in Table 7.1.
+The geometry types included in these conventions are listed in Table 7.1.

[[table-geometry-types]]
.Dimensionality, description, and additional required attributes for geometry_types.
diff --git a/ch08.adoc b/ch08.adoc
index 0ff924ca..be56509c 100644
--- a/ch08.adoc
+++ b/ch08.adoc
@@ -4,13 +4,13 @@
 :figure: 0

There are three methods for reducing dataset size: packing, lossless compression, and lossy compression.
-By packing we mean altering the data in a way that reduces its precision (but has no other effect on accuracy).
-By lossless compression we mean techniques that store the data more efficiently and result in no loss of precision or accuracy.
-By lossy compression we mean techniques that either store the data more efficiently and retain its precision but result in some loss in accuracy, or techniques that intentionally reduce data precision to improve the efficiency of subsequent lossless compression.
+Packing means altering the data in a way that reduces its precision (but has no other effect on accuracy).
+Lossless compression means techniques that store the data more efficiently and result in no loss of precision or accuracy.
+Lossy compression means techniques that either store the data more efficiently and retain its precision but result in some loss in accuracy, or techniques that intentionally reduce data precision to improve the efficiency of subsequent lossless compression.
Lossless compression only works in certain circumstances, e.g., when a variable contains a significant amount of missing or repeated data values.
In this case it is possible to make use of standard utilities, e.g., UNIX **`compress`** or GNU **`gzip`**, to compress the entire file after it has been written.
-In this section we offer an alternative compression method that is applied on a variable by variable basis.
+This section presents an alternative compression method that is applied on a variable-by-variable basis.
This has the advantage that only one variable need be uncompressed at a given time.
The disadvantage is that generic utilities that don't recognize the CF conventions will not be able to operate on compressed variables.

@@ -24,11 +24,11 @@ If both attributes are present, the data are scaled before the offset is added.
When scaled data are written, the application should first subtract the offset and then divide by the scale factor.
The units of a variable should be representative of the unpacked data.
-This standard is more restrictive than the <> with respect to the use of the **`scale_factor`** and **`add_offset`** attributes; ambiguities and precision problems related to data type conversions are resolved by these restrictions. +These conventions are more restrictive than the <> with respect to the use of the **`scale_factor`** and **`add_offset`** attributes; ambiguities and precision problems related to data type conversions are resolved by these restrictions. When packed data is written, the **`scale_factor`** and **`add_offset`** attributes must be of the same type as the unpacked data, which must be either **`float`** or **`double`**. Data of type **`float`** must be packed into one of these types: **`byte`**, **`unsigned byte`**, **`short`**, **`unsigned short`**. Data of type **`double`** must be packed into one of these types: **`byte`**, **`unsigned byte`**, **`short`**, **`unsigned short`**, **`int`**, **`unsigned int`**. -When packed data is read, it should be unpacked to the type of the **`scale_factor`** and **`add_offset`** attributes, which must have the same type if both are present. For guidance only, we suggest that packed data which does not conform to the rules of this section regarding the types of the data variable and attributes should be unpacked to **`double`** type, in order to minimise the risk of loss of precision. +When packed data is read, it should be unpacked to the type of the **`scale_factor`** and **`add_offset`** attributes, which must have the same type if both are present. For guidance only, it is suggested that packed data which does not conform to the rules of this section regarding the types of the data variable and attributes should be unpacked to **`double`** type, in order to minimise the risk of loss of precision. 
When data to be packed contains missing values the attributes that indicate missing values (**`_FillValue`**, **`valid_min`**, **`valid_max`**, **`valid_range`**) must be of the same data type as the packed data.
See <> for a discussion of how applications should treat variables that have attributes indicating both missing values and transformations defined by a scale and/or offset.
@@ -57,9 +57,9 @@ The list variable must not have an associated boundary variable.
[caption="Example 8.1. "]
.Horizontal compression of a three-dimensional array
====
-We eliminate sea points at all depths in a longitude-latitude-depth array of soil temperatures.
+In a longitude-latitude-depth array of soil temperatures, sea points at all depths are eliminated.
In this case, only the longitude and latitude axes would be affected by the compression.
-We construct a list `landpoint(landpoint)` containing the indices of land points.
+A list `landpoint(landpoint)` containing the indices of land points is constructed.

----
dimensions:
@@ -79,7 +79,7 @@ variables:
data:
landpoint=363, 364, 365, ...;
----
-Since `landpoint(0)=363`, for instance, we know that `landsoilt(*,0)` maps on to point 363 of the original data with dimensions `(lat,lon)`.
+Since `landpoint(0)=363`, for instance, it can be inferred that `landsoilt(*,0)` maps on to point 363 of the original data with dimensions `(lat,lon)`.
This corresponds to indices `(3,75)`, i.e., `363 = 3*96 + 75`.
====

@@ -87,7 +87,7 @@ This corresponds to indices `(3,75)`, i.e., `363 = 3*96 + 75`.
[caption="Example 8.2. "]
.Compression of a three-dimensional field
====
-We compress a longitude-latitude-depth field of ocean salinity by eliminating points below the sea-floor.
+A longitude-latitude-depth field of ocean salinity is compressed by eliminating points below the sea floor.
In this case, all three dimensions are affected by the compression, since there are successively fewer active ocean points at increasing depths.
---- @@ -682,7 +682,7 @@ For instance, a **`computational_precision**` value of **`"64"**` would specify Geoscientific models and measurements generate false floating-point precision (scientifically meaningless data bits) that wastes storage space. False precision can mislead (by implying noise is signal) and is scientifically pointless. Quantization algorithms can eliminate false precision, usually by rounding the least significant bits of <> floating-point mantissas to zeros. -(Quantization of integer types, although theoretically allowed, is not covered by this convention.) +(Quantization of integer types, although theoretically allowed, is not covered by these conventions.) The quantized results are valid <> values---no special software or decoder is necessary to read them. Importantly, the quantized bits compress more efficiently than random bits. Thus quantization is sometimes referred to as a form of lossy compression although, strictly speaking, quantization only pre-conditions data for more efficient compression by a subsequent compressor. diff --git a/ch09.adoc b/ch09.adoc index e9f85c27..a9c0c4c5 100644 --- a/ch09.adoc +++ b/ch09.adoc @@ -7,7 +7,7 @@ Discrete sampling geometry datasets are characterized by a dimensionality that i === Features and feature types Each type of discrete sampling geometry (point, time series, profile or trajectory) is defined by the relationships among its spatiotemporal coordinates. -We refer to the type of discrete sampling geometry as its **featureType**. +The type of discrete sampling geometry is called its **featureType**. The term {ldquo} **feature** {rdquo} refers herein to a single instance of the **discrete sampling geometry** (such as a single time series). The representation of such features in a CF dataset was supported previous to the introduction of this chapter using a particular convention, which is still supported (that described by section 9.3.1). 
This chapter describes further conventions which offer advantages of efficiency and clarity for storing a collection of features in a single file. @@ -72,7 +72,7 @@ Instance variables provide the metadata that differentiates individual features. The subscripts o and p distinguish the data elements that compose a single feature. For example in a collection of **timeSeries** features, each time series instance, i, has data values at various times, o. In a collection of **profile** features, the subscript, o, provides the index position along the vertical axis of each profile instance. -We refer to data values in a feature as its **elements**, and to the dimensions of o and p as **element dimensions**. +Data values in a feature are called **elements**, and the dimensions of o and p are called **element dimensions**. Each feature can have its own set of element subscripts o and p. For instance, in a collection of timeSeries features, each individual timeSeries can have its own set of times. The notation t(i,o) means there is a set of times with subscripts o for the elements of each feature i.   @@ -96,7 +96,7 @@ Four types of representation are utilized in this chapter: * two **multidimensional array representations**, in which each feature instance is allocated the identical amount of storage space. In these representations the instance dimension and the element dimension(s) are distinct CF coordinate axes (typical of coordinate axes discussed in chapter 4); and * two **ragged array representations**, in which each feature is provided with the minimum amount of space that it requires. -In these representations the instances of the individual features are stacked sequentially along the same array dimension as the elements of the features; we refer to this combined dimension as the **sample dimension**. 
+In these representations the instances of the individual features are stacked sequentially along the same array dimension as the elements of the features; this combined dimension is called the **sample dimension**.

In the multidimensional array representations, data variables have both an instance dimension and an element dimension.
The dimensions may be given in any order.
diff --git a/history.adoc b/history.adoc
index 89f4fa0e..8f0cd44c 100644
--- a/history.adoc
+++ b/history.adoc
@@ -7,6 +7,8 @@

=== Version 1.14-draft

+* {issues}623[Issue #623]: Consistently refer to "CF conventions" and use the impersonal form.
+
=== Version 1.13 (17 December 2025)

* {issues}582[Issue #582]: Conventions for anomaly data
@@ -21,7 +23,7 @@

=== Version 1.12 (04 December 2024)

-* {issues}513[Issue #513]: Include DOI and License information in the conventions document 
+* {issues}513[Issue #513]: Include DOI and License information in the conventions document
* {issues}499[Issue #499]: Formatting of local links in text
* {issues}566[Issue #566]: Fix invalid CRS WKT attribute in example 5.12.
* {issues}527[Issue #527]: Clarify the conventions for boundary variables, especially for auxiliary coordinate variables of more than one dimension, state that there is no default for boundaries, add more information about bounds in section 1.
@@ -47,7 +49,7 @@ * {issues}458[Issue #458]: Clarify the use of compressed dimensions in related variables * {issues}486[Issue #486]: Fix PDF formatting problems and invalid references * {issues}490[Issue #490]: Simple correction to Example 6.1.2 -* {issues}457[Issue #457]: Creation date of the draft Conventions document +* {issues}457[Issue #457]: Creation date of the draft Conventions document * {issues}445[Issue #445]: Updates concerning the Polar Stereographic Grid Mapping * {issues}468[Issue #468]: Update section 2.3 to clarify recommended character set * {issues}147[Issue #147]: Clarify the use of compressed dimensions in related variables @@ -74,7 +76,7 @@ * {pull-requests}408[Pull request #408]: Deleted a sentence on "rotated Mercator" under `Oblique Mercator` grid mapping in Appendix F * {issues}265[Issue #265]: Clarification of the requirements on bounds variable attributes * {issues}260[Issue #260]: Clarify use of dimensionless units -* {issues}410[Issue #410]: Delete "on a spherical Earth" from the definition of the `latitude_longitude` grid mapping in Appendix F +* {issues}410[Issue #410]: Delete "on a spherical Earth" from the definition of the `latitude_longitude` grid mapping in Appendix F * {issues}153[Issue #153]: Reference UGRID conventions in CF === Version 1.10 (31 August 2022)
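The packing rules touched by the ch08.adoc hunks above (when writing, subtract the offset and then divide by the scale factor; when reading, unpack to the type of the `scale_factor` and `add_offset` attributes) can be sketched in Python. This is an illustrative sketch only, not part of the conventions: the helper names `pack` and `unpack` and the example values are hypothetical.

```python
import numpy as np

def pack(unpacked, scale_factor, add_offset, dtype=np.int16):
    # Writing packed data: first subtract the offset, then divide by
    # the scale factor, and store in a smaller integer type.
    return np.round((unpacked - add_offset) / scale_factor).astype(dtype)

def unpack(packed, scale_factor, add_offset):
    # Reading packed data: unpack to the type of scale_factor and
    # add_offset (double here), scaling before adding the offset.
    return packed.astype(np.float64) * scale_factor + add_offset

# scale_factor and add_offset have the same type as the unpacked data
# (double); double data may be packed into short integers, among others.
temps = np.array([271.15, 293.65, 310.95])  # unpacked double data, kelvin
sf, off = 0.01, 273.15
packed = pack(temps, sf, off)               # stored as short
restored = unpack(packed, sf, off)          # recovered to within sf
```

Note that the round trip recovers the data only to the precision implied by `scale_factor`, which is the precision reduction that packing trades for smaller storage.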