introduce DataType as separate type
Created by: germasch
Alright, so this is going to be lengthy (both the code and this description). I can break this up into pieces that could be merged one at a time. (And actually, there are a couple more things that could/should be done on top of this, too).
I hadn't really planned to dive into ADIOS2, and I'll probably mostly disappear from here again soon, but I feel like I would like to see through what I started. This all began when I hit problems due to adios2 internally being templated for 11 integer types, while it really only could properly distinguish the 8 stdint types ({u,}int{8,16,32,64}_t
. The PRs I provided so far, which moved most of adios2's internals to the stdint-based types have already been merged. Actually, in making those PRs, I had already used something akin to the below, which made it possible to trace where and how the various types are used and what the expected values are.
The primary objective of this PR is to introduce a class DataType
as a replacement for where right now std::string
is used. For most of this series, DataType
actually is just a strongly typed wrapper around std::string
, which serves to document where strings are used to designate an adios2 types vs just some other string vs, e.g., a FFS data type. It then also narrows down DataType
to be only one out of a set of predefined strings (and doing that, it turns out while we tested again "unknown"
, there's actually no way a data type string would ever have been set to "unknown"
. Finally, it actually replaces DataType
to be just a wrapper around an int
, rather than around std::string
. That last part could / should be continued to actually use an enum, and doing this smartly, one could use that same enum in the C interface and in bpls, avoiding some mapping code.
This PR touches a lot of code, though most of it in fairly trivial ways. So I feel I better justify it:
- None of the bindings are affected. (Though it does open opportunities to improve them)
-
DataType
is strongly typed (it converts only explicitly to/from a string where needed) -
DataType
enforces a predefined subset of types, so no typos or accidental new inventions are possible. (This had happened, though not in ways that actually caused problems, e.g., comparing a type string to"int64_t"
, which is not a value it ever could have had. -
DataType
gets us close to being able to rename the type strings from the kinda unpleasant primitive-based names ("unsigned long long int"
) to the much clearer"uint64_t"
. - adios2 uses enum classes for pretty much every other choice, only the data type is represented by
std::string
right now. - While performance is obviously going to be mostly dominated by network and file system, the code passes type strings around a lot, and compares them even more (all the ADIOS2_FOREACH_* loops). Some of them are actually in places which I think are somewhat performance sensitive, like in
adiosMemory.h
. Small string optimization means that for the most part dynamic memory allocation is avoided, at least, though I believestd::string("unsigned long long int")
actually triggers a malloc on gcc. Comparing strings is definitely more complex than comparing integers, too. - As a somewhat arbitrary indication, I ran
size libadios2.2.3.1.dylib
on my mac:
master:
__TEXT __DATA __OBJC others dec hex
16019456 98304 0 6811648 22929408 15de000
after this PR:
__TEXT __DATA __OBJC others dec hex
15056896 98304 0 6815744 21970944 14f4000
That's actually a rather noticeable (1MB) decrease in __TEXT (actual machine code). I believe this mostly because I statically allocate the possible DataTypes, rather than creating the strings on demand all the time, not so much because of the string -> int change.
FWIW, after these changes I can tell that there are only a few places left that use the string representation of types. They almost all use the (successor of) GetType<T>()
, so they are independent of the actual string chosen. The exception is ffs_marshal.c
which as a C file cannot employ that mechanism, and hence would benefit from available enum (Though this PR doesn't go there yet). After that, the string representations are only used in debug messages and in the bindings.
E.g. bpls
output would be affected, and I'm not sure we'd want to go there, but that's another discussion.