Looking at this, nothing in here needs C++. To make the binary smaller, it should just use C.
Also, the asm code is trivial, so using #pragma aux would be easier. It would also be easy to use the high-bandwidth API (i.e. outsb and insb) to add more capabilities.
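Roughly what I have in mind for the #pragma aux route, as a sketch only. The function name, the port number, and the near-pointer/small-model assumption are mine, not taken from the code under review, and the #else branch is just an in-memory stand-in so the call shape can be compiled on a non-Watcom compiler.

```c
#include <stddef.h>

/* Hypothetical port number, for illustration only. */
#define EXAMPLE_PORT 0x300

#if defined(__WATCOMC__) && defined(_M_I86)
/* 16-bit Open Watcom: the compiler loads the registers listed in
 * "parm" and inlines the instruction, so no separate .asm file is
 * needed. Assumes a memory model where DS addresses the buffer.
 * outsb is a 186+ instruction, so build for 186 or later. */
void port_write_bytes(unsigned short port, const void __near *buf,
                      unsigned short count);
#pragma aux port_write_bytes = \
    "cld"                      \
    "rep outsb"                \
    parm [dx] [si] [cx]        \
    modify exact [si cx];
#else
/* Stand-in for other compilers: records the bytes in memory instead
 * of touching hardware. This does NOT perform real port I/O. */
static unsigned char fake_port_log[64];
static size_t fake_port_len;

static void port_write_bytes(unsigned short port, const void *buf,
                             unsigned short count)
{
    const unsigned char *p = (const unsigned char *)buf;
    (void)port; /* unused in the stand-in */
    while (count-- > 0 && fake_port_len < sizeof fake_port_log)
        fake_port_log[fake_port_len++] = *p++;
}
#endif
```

A call would then look like `port_write_bytes(EXAMPLE_PORT, "abc", 3);`. The insb counterpart would be the mirror image, with the destination buffer in ES:DI instead of the source in DS:SI.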