sizeof(std::variant)
I was debugging a memory problem with the SOM training of the PHZ pipeline. Even though the input file was only around 100 MiB, memory consumption would grow up to 4 GiB without any evident explanation.
It turns out that Alexandria’s Table class is just too flexible. It can read POD types such as float, double, and int, but also more complex types such as std::vector<int> or NdArray<int>.
The latter is similar to numpy’s ndarray, so it has to keep track of more information than a plain std::vector: shape, strides, underlying container, etc.
Table::Row does this using a boost::variant with all the supported types, which is all fine… except that the variant takes as much memory as its biggest type (like a union), plus a type flag, plus any padding that may be required.
sizeof(NdArray<int>) was 112 bytes or so, blowing up the memory required for each individual cell.
To reduce the memory required by an NdArray I changed this:
class NdArray {
private:
    size_t m_offset;
    std::vector<size_t> m_shape, m_stride_size;
    std::vector<std::string> m_attr_names;
    size_t m_size, m_total_stride;
    std::shared_ptr<ContainerInterface> m_container;
};
Into a sort-of pimpl idiom:
class NdArray {
private:
    struct Details {
        size_t m_offset;
        std::vector<size_t> m_shape, m_stride_size;
        std::vector<std::string> m_attr_names;
        size_t m_size, m_total_stride;
        std::shared_ptr<ContainerInterface> m_container;
    };
    std::unique_ptr<Details> m_details_ptr;
};
Now sizeof(NdArray) is just 8 bytes: a single pointer on a 64-bit platform.
Sure, it complicates the constructors and requires some indirection, but the memory used when reading a catalog is greatly reduced.