Instead of having to guess the PC where the SP was sampled, always take
both. This allows "seamless" stack decoding for both serial and xflash
dumps, since we don't have to guess which function generated the dump.
Make the core functions (doing the sampling) be ``noinline`` as well,
so that they always have valid frame.