pokmo

Fast sqrt() for 16-bit integers

4 posts in this topic

Hi

 

Does anyone know of a fast method for computing square roots of 16-bit integers? I've had a look at StellarisWare, but is there something more lightweight since the input is only 16-bit?

 

Any thought appreciated.

Share this post


Link to post
Share on other sites

Presuming you are talking about a processor with no support in hardware for division or multiplication:

 

In the absence of hardware support, the fastest is bitwise calculation akin to long division. The cycle is basicly subtract then replace if less, followed by shifts. The structure is similar to long division, and is about the same number of cycles as a single division. This is one-cycle-per-bit in hardware, but a bit more work in software. Figure two shifts, a mask, and a subtraction, and either another mask or unsubracting, plus loop overhead. You may need a 32 (or 24) bit accumulator for the residuals to catch borrows from the LSb.

 

The FASTEST CONVERGING practical method is the Babylonian (divide and average; Newton's method), but that requires division, so is likely to be slower in practice for a 16 bit int, even with hardware division (though it may be close)

 

If exact value isn't needed, or the range is restricted, other methods can be faster, such as a lookup table, but for a 16bit value, the extra code and lookup table overhead are more likely to slow things down.

 

A bit more detail (what processor, what language, how accurate, etc) can narrow it down. This is a case where the wikipedia page isn't bad: https://en.wikipedia.org/wiki/Methods_of_computing_square_roots

Share this post


Link to post
Share on other sites

It shouldn't be too hard to tweak the TivaWare isqrt.c to use a 'word' instead of a 'long' - change the types to uint16_t, run the loop 8 times instead of 16, shift by 14 instead of 30 (I think that's right). As always, write a test program on the host to prove the algorithm before it goes onto the micro.

 
Also, the third C example on this page might help: http://www.codecodex.com/wiki/Calculate_an_integer_square_root
You'll pick the uint16 settings.
 
Finally, a google search of "Integer Square Roots Jack Crenshaw" turns up lots of pages... some of which suggest that converting to float, using the FP, and converting back can be faster than the loops in these approximations. I haven't measured with the Tiva, so I don't know if this is true. Seems like it might... three instructions + the FP sqrt time? (you mentioned Stellaris, so I'm guessing Cortex M4F processor).
 
Well, here's my attempt at my first comment - the TivaWare isqrt for 16bits instead of 32.
 
/*
 *	16-bit integer square root
 *	derived from TivaWare isqrt.c
 */

unsigned short isqrt(unsigned short value)
{
    int i;
    unsigned short rem, root;

    rem  = 0;
    root = 0;

    // loop over the eight bits in the root
    for ( i = 0; i < 8; i++ ) {
        // shift the root up by one bit
        root <<= 1;

        // move next two bits from the input into the remainder
        rem = ((rem << 2) + (value >> 14));
        value <<= 2;

        // test root is (2n + 1)
        root++;

        if ( root <= rem ) {
            // root not more than the remainder, so the new bit is one
            rem -= root;
            root++;
        } else {
            // root is greater than the remainder, so the new bit is zero
            root--;
        }
    }
    return (root >> 1);
}

#ifdef TEST
#include <stdio.h>
#include <stdlib.h>

int main(int ac, char *av[])
{
	int i;
	unsigned short n, r;
	for ( i = 1; i < ac; i++ ) {
		n = strtoul(av[i], NULL, 0);
		r = isqrt(n);
		printf("%-8s %8d %8d\n", av[i], n, r);
	}
	return 0;
}
#endif

And the output of a test run:

$ ./iq 10 20 30 40 50 60 70 80 90 100 1024 2048 4096 8192 16384 32768
10             10        3
20             20        4
30             30        5
40             40        6
50             50        7
60             60        7
70             70        8
80             80        8
90             90        9
100           100       10
1024         1024       32
2048         2048       45
4096         4096       64
8192         8192       90
16384       16384      128
32768       32768      181

Have fun!

 

[Edit 1/1/2017 to fix incorrect comment in the code. Tsk!]

Share this post


Link to post
Share on other sites

Create an account or sign in to comment

You need to be a member in order to leave a comment

Create an account

Sign up for a new account in our community. It's easy!


Register a new account

Sign in

Already have an account? Sign in here.


Sign In Now