The AAPCS requires the stack pointer to be aligned on a double-word
(8-byte) boundary at every public interface. In addition, Clang 3.6
assumes that the stack pointer is 8-byte aligned on function entry,
at least for ARMv7-M, so a misaligned stack causes hard-to-find
errors in the compiled code.
This is the same implementation as for the Cortex-M4.
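
To illustrate the alignment rule, here is a minimal sketch of how an
initial stack pointer could be rounded down to an 8-byte boundary. The
helper name aligned_stack_top and the STACK_ALIGN macro are hypothetical
and are not taken from the actual implementation; the sketch assumes a
full-descending stack, as used on Cortex-M.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_ALIGN 8u  /* double-word alignment required by the AAPCS */

/* Hypothetical helper (not from the original code): given the start and
 * size of a stack buffer, return the highest 8-byte-aligned address that
 * does not exceed the end of the buffer. On a full-descending stack this
 * value can serve as the initial stack pointer. */
static uintptr_t aligned_stack_top(const void *stack_start, size_t stack_size)
{
    /* The buffer must be large enough to contain an aligned address. */
    assert(stack_size >= STACK_ALIGN);

    uintptr_t top = (uintptr_t)stack_start + stack_size;

    /* Clear the low three bits to round down to a multiple of 8. */
    return top & ~(uintptr_t)(STACK_ALIGN - 1u);
}
```

Rounding down rather than up keeps the result inside the buffer, at the
cost of up to 7 unused bytes at the top of the stack.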