sizeof(std::variant)
I was debugging a memory problem with the SOM training of the PHZ pipeline. Even though the input file was only around 100 MiB, the memory consumption would grow up to 4 GiB without any obvious explanation.
It turns out that Alexandria’s Table
class is just too flexible. It can read POD types such as float, double and int,
but also more complex types such as std::vector<int> or NdArray<int>.
The latter is similar to numpy’s ndarray, so it has to keep track of more
information than a plain std::vector: shape, strides, underlying
container, etc.
Table::Row does this using a boost::variant with all the supported types,
which is all fine… except that every variant takes up at least as much memory
as its biggest type (like a union), plus a type flag, plus any padding that may
be required, no matter which type it actually holds.
sizeof(NdArray<int>) was 112 bytes or so, blowing up the memory required for
each individual cell.
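The effect is easy to reproduce in isolation. The sketch below uses std::variant (C++17) instead of boost::variant, and FatArray is a hypothetical stand-in for NdArray<int>, not Alexandria's actual class. The exact sizes depend on the compiler and standard library, so the code only checks relations between them.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-in for a "fat" cell type such as NdArray<int>.
// The members mirror the kind of bookkeeping described above.
struct FatArray {
    std::size_t offset = 0;
    std::vector<std::size_t> shape, strides;
    std::vector<std::string> attr_names;
    std::size_t size = 0, total_stride = 0;
    std::shared_ptr<void> container;
};

// A cell that can hold any of the supported types.
using Cell = std::variant<int, float, double, FatArray>;

// The variant reserves room for its largest alternative,
// even when a given cell only holds an int.
static_assert(sizeof(Cell) >= sizeof(FatArray),
              "a variant is at least as large as its largest alternative");
static_assert(sizeof(Cell) > sizeof(double),
              "the small alternatives pay for the big one");

// Memory taken by the cells themselves (heap allocations not included).
constexpr std::size_t bytes_for_cells(std::size_t n_cells) {
    return n_cells * sizeof(Cell);
}
```

A table made only of int cells still pays sizeof(Cell) for each one, so a catalog with many rows and columns multiplies that overhead quickly.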
To reduce the memory required by an NdArray I changed this:
class NdArray {
private:
  size_t m_offset;
  std::vector<size_t> m_shape, m_stride_size;
  std::vector<std::string> m_attr_names;
  size_t m_size, m_total_stride;
  std::shared_ptr<ContainerInterface> m_container;
};
to a sort of pimpl idiom:
class NdArray {
private:
  struct Details {
    size_t m_offset;
    std::vector<size_t> m_shape, m_stride_size;
    std::vector<std::string> m_attr_names;
    size_t m_size, m_total_stride;
    std::shared_ptr<ContainerInterface> m_container;
  };
  std::unique_ptr<Details> m_details_ptr;
};
Now sizeof(NdArray) is just 8 bytes: a single pointer on a 64-bit platform.
Sure, it complicates the constructors and requires an extra indirection,
but the memory used when reading a catalog is greatly reduced.
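To illustrate what "complicates the constructors" means, here is a minimal sketch of the same pattern. SlimArray and its members are hypothetical names chosen for the example, not Alexandria's real implementation. Because std::unique_ptr is move-only, the copy operations have to clone the details by hand.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical slimmed-down array; illustrative only.
class SlimArray {
public:
    explicit SlimArray(std::vector<std::size_t> shape)
        : m_details_ptr(std::make_unique<Details>()) {
        m_details_ptr->m_shape = std::move(shape);
    }

    // unique_ptr cannot be copied, so copying must deep-copy the details.
    SlimArray(const SlimArray& other)
        : m_details_ptr(std::make_unique<Details>(*other.m_details_ptr)) {}

    SlimArray& operator=(const SlimArray& other) {
        if (this != &other) {
            m_details_ptr = std::make_unique<Details>(*other.m_details_ptr);
        }
        return *this;
    }

    // Moves just transfer the pointer; the moved-from object is left
    // without details and must not be used until it is reassigned.
    SlimArray(SlimArray&&) noexcept = default;
    SlimArray& operator=(SlimArray&&) noexcept = default;

    // Every accessor pays one extra pointer dereference.
    const std::vector<std::size_t>& shape() const {
        return m_details_ptr->m_shape;
    }

private:
    struct Details {
        std::size_t m_offset = 0;
        std::vector<std::size_t> m_shape, m_stride_size;
        std::size_t m_size = 0, m_total_stride = 0;
    };
    std::unique_ptr<Details> m_details_ptr;
};
```

On the mainstream standard library implementations, a std::unique_ptr with the default deleter is the size of a raw pointer, so sizeof(SlimArray) equals sizeof(void*). The price is one heap allocation per array and a dereference on each access.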