Should measurand and associated SU data items always have the same content type #278
Comments
The point that the bracket notation cannot handle some SU values for integers is worth making and we are fortunate that we now recognise that the SU can always be presented as a separate data value. I will add a comment on this to the DDLm chapter. |
Here is the revised text of the relevant part of the draft DDLm chapter. Feel free to comment!
Thank you for the answers. The revised text looks great; however, the revision might not actually be required after all. A more detailed explanation is provided below.
I gave this some more thought and came to the conclusion that expressing small SU values for integers using the parenthesis notation might be possible after all if the integer value syntax is slightly relaxed by allowing integer values to have any number of zeros after the decimal separator (e.g. treating 5, 5.0, 5.00, 5.000 ... as valid expressions of the integer five). For example,
Furthermore, this might already be valid if
Given the above, the combination of
Two related questions:
[1] https://www.iucr.org/resources/cif/spec/version1.1/cifsyntax, Appendix A
Line 1683 in 9a2b576
[3] https://www.iucr.org/__data/assets/text_file/0009/112131/CIF2-ENBF.txt
[4] https://www.iucr.org/__data/assets/pdf_file/0007/16378/dREL_spec_aug08.pdf, section 2.2.2 |
These are good questions that have not been explicitly answered. For fun I will ask the DDLm group to see if they have any ideas. My own feeling is that the string '2.0' for data name '_xyz' of type 'Integer' corresponds to the integer 2, just as both delimited and undelimited strings can be considered numbers. |
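A minimal Python sketch of this value-based reading, in which a string counts as an integer whenever the number it denotes is integral. The helper name `as_integer` is hypothetical and is not taken from any CIF library or from the DDLm specification.

```python
from decimal import Decimal, InvalidOperation


def as_integer(text):
    """Return the int denoted by `text`, or None if the value is not integral.

    Hypothetical helper: the decision is made on the numeric value, not on
    the spelling, so '2' and '2.0' both map to the integer 2.
    """
    try:
        value = Decimal(text)
    except InvalidOperation:
        # Not a number at all (e.g. 'abc').
        return None
    # Reject NaN/Infinity and anything with a non-zero fractional part.
    if not value.is_finite() or value != value.to_integral_value():
        return None
    return int(value)
```

Under this sketch `as_integer('2.0')` and `as_integer('2')` give the same result, while `as_integer('2.5')` is rejected. Whether software should be this lenient is exactly the open question.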
As an additional note, after a lot of discussion the developers of JSON-Schema also came to the same conclusion [1] as @jamesrhester:
This definition also includes scientific notation, which is actually also not covered
https://json-schema.org/understanding-json-schema/reference/numeric.html#integer |
I concur with James that as far as statistics go, there is no inconsistency in items defined to take integer values having standard uncertainties that are non-integral. However, I think that there is an inconsistency between typing these specific items as integers and attributing an uncertainty to them. Do they convey exact counts measured by a device that can produce such? Then they have no uncertainty. Or do they convey estimates of counts? Then there is no good reason to constrain them to be integral; in fact, I would say that doing so would be statistically wrong. The same would apply to most other items. I'm having trouble seeing, semantically, where it would make sense for an item defined to describe an uncertain measurement to be constrained to be integral. |
Thank you @vaitkus for finding that JSON information. I think it is reasonable to view the JSON specification as playing a similar role to the CIF1.1 syntax specification, that is, the only source of type information is the value itself. Our context here is a little different, as we have a dictionary involved, so we can't draw as much guidance as we might like from how JSON has tackled it.

When a dictionary is available that states that a particular string should be an integer, the programmer is fundamentally faced with a choice as to which of '2', '2.000', '0.2E1' to accept. One of the functions of a standard is to clarify such choices. CIF software authors don't always read standards in quite the detail that we discuss them, so if we make the standard align with their most likely choices, it will be more effective: those who do think about these things will work in the same way as the rest. So I think we should state that integers don't contain exponents or decimal points, in keeping with programming language behaviour and intuitive expectations. I will update the DDLm draft to clarify this if we and the DDLm group find this acceptable. If an SU that is not an integer needs to be provided, then the

@jcbollinger makes an interesting point, but we are stuck with a historical decision that these data items are integers. To continue the discussion, measured raw counts must be integers. Due to Poisson counting statistics we can use that measurement to estimate the most likely value as being the measured value with sqrt(N) error bars. And I agree that this estimate is a real number. What has happened is that these two steps (measurement and estimate of true value) have been conflated, and for our IUCr SU formatting rules this only matters for the numbers 2 and 3. Interestingly, the powder dictionary explicitly says (

So we have the following choices, assuming that we treat integers as not having a decimal point:
As a final comment, the |
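The stricter reading proposed above (integers contain no exponent or decimal point) and the sqrt(N) counting estimate can be sketched in Python as follows. The pattern and the function names `is_strict_integer` and `poisson_su` are assumptions made for illustration; they are not taken from the DDLm draft or from any CIF software.

```python
import math
import re

# Assumed lexical rule: an optional sign followed by digits only,
# i.e. no decimal point and no exponent.
STRICT_INTEGER = re.compile(r"[+-]?[0-9]+")


def is_strict_integer(text):
    """True if `text` is spelled as an integer under the assumed rule."""
    return STRICT_INTEGER.fullmatch(text) is not None


def poisson_su(count):
    """Standard uncertainty sqrt(N) of a raw count N under Poisson statistics."""
    if count < 0:
        raise ValueError("a raw count cannot be negative")
    return math.sqrt(count)


if __name__ == "__main__":
    for text in ("2", "2.000", "0.2E1"):
        print(text, is_strict_integer(text))
    for n in (1, 2, 3, 4):
        print(n, poisson_su(n))
```

Under the assumed rule only '2' is accepted out of '2', '2.000' and '0.2E1'. The measured count stays an integer, while its sqrt(N) uncertainty is a real number that is integral only when N is a perfect square.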
Approach (2) seems nicer since it keeps things consistent between dictionaries (same approach). Also, I think that associated |
The |
The only comment from the core DMG approved of this change. The actions to take are:
Issue was resolved by merging PR #315. |
I have noticed that several Measurand data items from the CIF_CORE dictionary have the Integer content type while the associated SU data items have the Real content type. Data items in question:
Also note that while it is possible to assign a real SU value to an integer measurand value using a separate data item, this does not seem to be possible using the alternative parenthesis notation. For example:
can be written as 43(1), but it is unclear how the same could be done for:
Two questions:
_diffr_refln.count_* data items be changed in any way (e.g. by redefining SU items as integers)?
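To make the limitation of the parenthesis notation concrete, here is a minimal Python sketch that splits a value such as 43(1) into a measurand and its SU, assuming the usual convention that the digits in parentheses apply to the last digits of the value. The function name `split_su` and the regular expression are hypothetical and cover only plain decimal values without exponents.

```python
import re
from decimal import Decimal

# Assumed pattern: signed decimal value followed by SU digits in parentheses.
PAREN_SU = re.compile(r"([+-]?\d+(?:\.(\d+))?)\((\d+)\)")


def split_su(text):
    """Split 'value(su)' into (value, su) as Decimals.

    The SU digits are scaled to the last decimal place of the value,
    so '1.234(5)' yields an SU of 0.005 and '43(1)' an SU of 1.
    """
    match = PAREN_SU.fullmatch(text)
    if match is None:
        raise ValueError(f"not in value(su) form: {text!r}")
    value = Decimal(match.group(1))
    decimals = len(match.group(2) or "")
    # Shift the SU digits right by the number of decimals in the value.
    su = Decimal(match.group(3)).scaleb(-decimals)
    return value, su
```

With this convention the SU is always a whole multiple of the last written place of the value, so a measurand written without a decimal point can only carry an integral SU in this notation.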