Converts the cxx11 Variable interface to map to/from stdint types

Created by: germasch

First of all, this is just a proof-of-concept, so I'm not asking for it to actually be merged. For example, it only handles Variables, but not Attributes the same way.

This PR is a proposed solution to the problem discussed in #1150 (closed). It leaves the core of adios2 unchanged, but it changes the cxx11 interface to be more symmetric, so that variables can be read back using the same type that was used to write them.

Even though there's been some discussion already in #1150 (closed), let me repeat some background -- sorry for this being so lengthy, but I'm trying to make this PR kinda self-contained.

  • C++11 inherited from C a kinda messy type system.
    • There are type aliases (same as typedef), which are just new names for the same type. E.g., using size_t = unsigned long (or typedef unsigned long size_t).
    • The integer types are not well defined w.r.t. their actual representation. E.g., a long int is 32 bits on 32-bit Linux, but 64 bits on 64-bit Linux. There are, however, type aliases defined to help alleviate this ambiguity, e.g., uint64_t is an alias for a fundamental type that's unsigned and 64 bits wide. On 64-bit MacOS (which I'll use as an example, because that's what's on my laptop), uint64_t is an alias for unsigned long long, though it could have been an alias for unsigned long, as both of these are 64-bit unsigned integer types. (64-bit Linux has the same type sizes, although there uint64_t is typically an alias for unsigned long.) However, it is important to realize that while unsigned long and unsigned long long are both 64-bit unsigned types, they are distinct types.
  • I will call two types "equivalent" if they have the same signedness and the same number of bits. On my 64-bit MacOS, unsigned long and unsigned long long are equivalent, though on, e.g., 32-bit Linux they are not. They are, however, distinct types (not aliases) as far as the language is concerned.
  • For another example, char and signed char are equivalent on 64 bit MacOS, but they are distinct types.
  • Disregarding that there may be non-native 128-bit integer types, too, there are 8 different representations of integers (signed or unsigned, times 8, 16, 32, or 64 bits). But in C++11, there are 11 distinct integer types (char, signed char, unsigned char, and the signed and unsigned variants of short, int, long, and long long), which map to those representations in a not necessarily standard way.
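
The distinction between "distinct" and "equivalent" types can be checked at compile time. The following is a minimal sketch; is_equivalent is a hypothetical helper written for illustration and is not part of adios2 or the standard library.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical helper (not part of adios2): two integer types are
// "equivalent" if they have the same signedness and the same size.
template <class A, class B>
struct is_equivalent
    : std::integral_constant<bool,
          std::is_integral<A>::value && std::is_integral<B>::value &&
          std::is_signed<A>::value == std::is_signed<B>::value &&
          sizeof(A) == sizeof(B)> {};

// These pairs are distinct types on every platform, whatever their widths.
static_assert(!std::is_same<unsigned long, unsigned long long>::value,
              "unsigned long and unsigned long long are distinct types");
static_assert(!std::is_same<char, signed char>::value,
              "char and signed char are distinct types");

// An alias, by contrast, is the very same type as what it names.
static_assert(std::is_same<std::size_t, decltype(sizeof(int))>::value,
              "size_t is an alias, not a new type");
```

Whether is_equivalent<unsigned long, unsigned long long>::value is true depends on the platform: it holds on 64-bit MacOS and 64-bit Linux, but not on 32-bit Linux.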

Back to adios2: When using the cxx11 interface for defining a variable and writing it to disk with bp3, any of the 11 integer types can be used, or any of their aliases (including the stdint type aliases), and they will be mapped to their corresponding, possibly machine-dependent, actual representation. E.g., on Linux-32, unsigned long will be written as an unsigned 32-bit quantity (because that's what it actually is). On Linux-64/MacOS-64, it will be written as an unsigned 64-bit quantity (again, because that's what it actually is). One lesson to be learned here is that if one wants to write portable files, one should not use unsigned long, but rather uint32_t or uint64_t, depending on what's actually needed. However, making a variable of unsigned long type is supported and works, just with a machine-dependent result.
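
To illustrate why the stdint aliases are the portable choice, here is a small sketch that computes the in-memory width of a type. bit_width_of is a hypothetical helper, not an adios2 function.

```cpp
#include <climits>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: width in bits of a type as laid out in memory.
template <class T>
constexpr std::size_t bit_width_of() { return sizeof(T) * CHAR_BIT; }

// The stdint aliases have the same width wherever they exist.
static_assert(bit_width_of<std::uint32_t>() == 32, "fixed width");
static_assert(bit_width_of<std::uint64_t>() == 64, "fixed width");

// For unsigned long the language only guarantees a minimum of 32 bits;
// the actual width (32 on Linux-32, 64 on Linux-64/MacOS-64) is up to
// the platform.
static_assert(bit_width_of<unsigned long>() >= 32, "minimum width only");
```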

However, InquireVariable / reading is not symmetric with writing. When writing, one can use any 64-bit unsigned type to create a 64-bit unsigned value in the file. When reading that quantity back, though, one cannot use just any 64-bit unsigned type: it has to be the fundamental type underlying uint64_t, or an alias for that type.

In my opinion, this asymmetry is not good. If adios2 allows me to write an unsigned long variable, it should allow me to read the value back into an unsigned long variable. One way to deal with the asymmetry could be to only allow reading and writing of the 8 stdint types. That restriction cannot really be enforced, though, since the stdint types are just aliases, not distinct types: allowing uint64_t to be written would automatically also allow unsigned long long on MacOS-64, while prohibiting unsigned long or size_t. In addition to being difficult to enforce, it also makes it a hassle to use adios2 for certain use cases, like adding checkpoint / restart to existing codes. In this case, it makes a lot of sense to write, e.g., size_t or char variables to a file, even though those types are ambiguous in that different systems will have different representations of such types. However, with the goal being to read the data back on the very same system, portability of checkpoint files isn't much of a concern, while having to deal with converting, e.g., an array of size_t to a stdint type of choice so that it can be unambiguously written/restored could be quite painful.

This PR implements an alternative solution. As before, writing of any, e.g., 64-bit integer type is allowed (e.g., both unsigned long and unsigned long long on MacOS-64). But it is now also allowed to read such a value back into any equivalent 64-bit integer type, i.e., an unsigned 64-bit value can be read back into unsigned long or unsigned long long (or uint64_t or size_t) on MacOS-64.

This is all done at the cxx11 interface level, i.e., the core is unchanged (though I think the core should then probably be changed to support just the 8 stdint types in the first place). A Variable<unsigned long long> is a distinct type from a Variable<unsigned long>, as it needs to be because of how the language works. Internally, however, they will both just be wrappers around core::Variable<uint64_t> (again, on MacOS-64).
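
The idea can be sketched as follows. This is not the code in this PR and not how adios2::TypeInfo is actually written; fixed_width, fixed_width_t, core_sketch::Variable and the CoreType member are all hypothetical names used for illustration, and the sketch assumes 8-bit bytes and covers integer types only.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical mapping: pick the stdint type with the given size in bytes
// and signedness.
template <std::size_t Bytes, bool Signed> struct fixed_width;
template <> struct fixed_width<1, true>  { using type = std::int8_t; };
template <> struct fixed_width<2, true>  { using type = std::int16_t; };
template <> struct fixed_width<4, true>  { using type = std::int32_t; };
template <> struct fixed_width<8, true>  { using type = std::int64_t; };
template <> struct fixed_width<1, false> { using type = std::uint8_t; };
template <> struct fixed_width<2, false> { using type = std::uint16_t; };
template <> struct fixed_width<4, false> { using type = std::uint32_t; };
template <> struct fixed_width<8, false> { using type = std::uint64_t; };

template <class T>
using fixed_width_t =
    typename fixed_width<sizeof(T), std::is_signed<T>::value>::type;

namespace core_sketch {
// Stand-in for the core variable class; only the stdint types are used here.
template <class T> struct Variable {};
}

// Stand-in for the user-facing wrapper: distinct per T, but equivalent
// types share the same underlying core type.
template <class T>
class Variable {
    static_assert(std::is_integral<T>::value,
                  "this sketch covers integer types only");

public:
    using CoreType = core_sketch::Variable<fixed_width_t<T>>;
    explicit Variable(CoreType *variable) : m_Variable(variable) {}

private:
    CoreType *m_Variable;
};
```

With this layout, Variable<unsigned long> and Variable<unsigned long long> stay distinct types, but on a platform where both are 64 bits wide their CoreType is the same, so a value written through one can be looked up and read through the other.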

The mapping needed to do this actually already existed, by means of adios2::TypeInfo. This proof-of-concept only makes the change for Variables, not for Attributes, though I don't think there's any reason that Attributes couldn't be handled the same way.

As far as I can tell, no automated tests fail other than those that were already broken.
