Jump to content

Fast sqrt() for 16-bit integers

Recommended Posts

Presuming you are talking about a processor with no support in hardware for division or multiplication:


In the absence of hardware support, the fastest is bitwise calculation akin to long division. The cycle is basicly subtract then replace if less, followed by shifts. The structure is similar to long division, and is about the same number of cycles as a single division. This is one-cycle-per-bit in hardware, but a bit more work in software. Figure two shifts, a mask, and a subtraction, and either another mask or unsubracting, plus loop overhead. You may need a 32 (or 24) bit accumulator for the residuals to catch borrows from the LSb.


The FASTEST CONVERGING practical method is the Babylonian (divide and average; Newton's method), but that requires division, so is likely to be slower in practice for a 16 bit int, even with hardware division (though it may be close)


If exact value isn't needed, or the range is restricted, other methods can be faster, such as a lookup table, but for a 16bit value, the extra code and lookup table overhead are more likely to slow things down.


A bit more detail (what processor, what language, how accurate, etc) can narrow it down. This is a case where the wikipedia page isn't bad: https://en.wikipedia.org/wiki/Methods_of_computing_square_roots

Link to post
Share on other sites

It shouldn't be too hard to tweak the TivaWare isqrt.c to use a 'word' instead of a 'long' - change the types to uint16_t, run the loop 8 times instead of 16, shift by 14 instead of 30 (I think that's right). As always, write a test program on the host to prove the algorithm before it goes onto the micro.

Also, the third C example on this page might help: http://www.codecodex.com/wiki/Calculate_an_integer_square_root
You'll pick the uint16 settings.
Finally, a google search of "Integer Square Roots Jack Crenshaw" turns up lots of pages... some of which suggest that converting to float, using the FP, and converting back can be faster than the loops in these approximations. I haven't measured with the Tiva, so I don't know if this is true. Seems like it might... three instructions + the FP sqrt time? (you mentioned Stellaris, so I'm guessing Cortex M4F processor).
Well, here's my attempt at my first comment - the TivaWare isqrt for 16bits instead of 32.
 *	16-bit integer square root
 *	derived from TivaWare isqrt.c

unsigned short isqrt(unsigned short value)
    int i;
    unsigned short rem, root;

    rem  = 0;
    root = 0;

    // loop over the eight bits in the root
    for ( i = 0; i < 8; i++ ) {
        // shift the root up by one bit
        root <<= 1;

        // move next two bits from the input into the remainder
        rem = ((rem << 2) + (value >> 14));
        value <<= 2;

        // test root is (2n + 1)

        if ( root <= rem ) {
            // root not more than the remainder, so the new bit is one
            rem -= root;
        } else {
            // root is greater than the remainder, so the new bit is zero
    return (root >> 1);

#ifdef TEST
#include <stdio.h>
#include <stdlib.h>

int main(int ac, char *av[])
	int i;
	unsigned short n, r;
	for ( i = 1; i < ac; i++ ) {
		n = strtoul(av[i], NULL, 0);
		r = isqrt(n);
		printf("%-8s %8d %8d\n", av[i], n, r);
	return 0;

And the output of a test run:

$ ./iq 10 20 30 40 50 60 70 80 90 100 1024 2048 4096 8192 16384 32768
10             10        3
20             20        4
30             30        5
40             40        6
50             50        7
60             60        7
70             70        8
80             80        8
90             90        9
100           100       10
1024         1024       32
2048         2048       45
4096         4096       64
8192         8192       90
16384       16384      128
32768       32768      181

Have fun!


[Edit 1/1/2017 to fix incorrect comment in the code. Tsk!]

Link to post
Share on other sites

Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account.

Reply to this topic...

×   Pasted as rich text.   Paste as plain text instead

  Only 75 emoji are allowed.

×   Your link has been automatically embedded.   Display as a link instead

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

  • Create New...