Along those lines, I was recently asked to translate some assembly instructions that atomically increment a global counter (rather than incrementing the counter between calls to mutex_lock and mutex_unlock). A co-worker reminded me, however, that Solaris 10 provides implementations of several low-level atomic operations. See the man pages for:
So rather than propagating more assembly into this port, the engineers are instead considering calls to atomic_add_64. This solution avoids application-level assembly code and should therefore be easier to maintain than the original version, since the same source works on both SPARC and x86, in both 32-bit and 64-bit builds. However, because this implementation uses function calls, it appears to be very slightly slower than inlined assembly code (which can be injected using either a parameterized asm() statement with the GNU compilers or an inline assembly template with the Sun compilers).
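To make the idea concrete, here is a minimal sketch of a shared counter bumped with atomic_add_64 instead of a lock/unlock pair. It is not the project's code: the names counter, worker, and run_counter_demo are hypothetical, I use POSIX threads rather than the Solaris thread calls, and the non-Solaris branch is only a stand-in (a GCC/Clang builtin) so the sketch builds elsewhere.

```c
#include <pthread.h>
#include <stdint.h>

#if defined(__sun)
#include <atomic.h>   /* Solaris 10 and later: atomic_add_64() */
#else
/* Assumption: stand-in for non-Solaris builds, using the GCC/Clang builtin. */
static void atomic_add_64(volatile uint64_t *target, int64_t delta)
{
    (void)__atomic_fetch_add(target, (uint64_t)delta, __ATOMIC_SEQ_CST);
}
#endif

#define MAX_THREADS 16

/* Hypothetical global counter shared by all worker threads. */
static volatile uint64_t counter;

struct worker_arg {
    int iterations;
};

/* Each worker increments the shared counter without taking a mutex. */
static void *worker(void *p)
{
    const struct worker_arg *arg = p;

    for (int i = 0; i < arg->iterations; i++)
        atomic_add_64(&counter, 1);
    return NULL;
}

/*
 * Starts nthreads workers, waits for them, and returns the final
 * counter value. Returns 0 if the arguments are out of range.
 */
uint64_t run_counter_demo(int nthreads, int iterations)
{
    pthread_t tids[MAX_THREADS];
    struct worker_arg arg = { iterations };
    int started = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS || iterations < 0)
        return 0;

    counter = 0;
    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, worker, &arg) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);

    /* All workers have been joined, so a plain read is safe here. */
    return counter;
}
```

If every thread starts, run_counter_demo(4, 10000) should come back with 40000, since no increments are lost. Swapping the atomic_add_64 call for a plain counter++ is an easy way to see updates go missing under contention.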
I'll revisit this topic if the engineers on the project decide that the small speedup from inline assembly is worth the extra complexity and maintenance burden.