43oh

# Fast sqrt() for 16-bit integers

## Recommended Posts

Hi

Does anyone know of a fast method for computing square roots of 16-bit integers? I've had a look at StellarisWare, but is there something more lightweight since the input is only 16-bit?

Any thought appreciated.

##### Share on other sites

Presuming you are talking about a processor with no support in hardware for division or multiplication:

In the absence of hardware support, the fastest is bitwise calculation akin to long division. The cycle is basicly subtract then replace if less, followed by shifts. The structure is similar to long division, and is about the same number of cycles as a single division. This is one-cycle-per-bit in hardware, but a bit more work in software. Figure two shifts, a mask, and a subtraction, and either another mask or unsubracting, plus loop overhead. You may need a 32 (or 24) bit accumulator for the residuals to catch borrows from the LSb.

The FASTEST CONVERGING practical method is the Babylonian (divide and average; Newton's method), but that requires division, so is likely to be slower in practice for a 16 bit int, even with hardware division (though it may be close)

If exact value isn't needed, or the range is restricted, other methods can be faster, such as a lookup table, but for a 16bit value, the extra code and lookup table overhead are more likely to slow things down.

A bit more detail (what processor, what language, how accurate, etc) can narrow it down. This is a case where the wikipedia page isn't bad: https://en.wikipedia.org/wiki/Methods_of_computing_square_roots

##### Share on other sites

It shouldn't be too hard to tweak the TivaWare isqrt.c to use a 'word' instead of a 'long' - change the types to uint16_t, run the loop 8 times instead of 16, shift by 14 instead of 30 (I think that's right). As always, write a test program on the host to prove the algorithm before it goes onto the micro.

You'll pick the uint16 settings.

Finally, a google search of "Integer Square Roots Jack Crenshaw" turns up lots of pages... some of which suggest that converting to float, using the FP, and converting back can be faster than the loops in these approximations. I haven't measured with the Tiva, so I don't know if this is true. Seems like it might... three instructions + the FP sqrt time? (you mentioned Stellaris, so I'm guessing Cortex M4F processor).

Well, here's my attempt at my first comment - the TivaWare isqrt for 16bits instead of 32.

```/*
*	16-bit integer square root
*	derived from TivaWare isqrt.c
*/

unsigned short isqrt(unsigned short value)
{
int i;
unsigned short rem, root;

rem  = 0;
root = 0;

// loop over the eight bits in the root
for ( i = 0; i < 8; i++ ) {
// shift the root up by one bit
root <<= 1;

// move next two bits from the input into the remainder
rem = ((rem << 2) + (value >> 14));
value <<= 2;

// test root is (2n + 1)
root++;

if ( root <= rem ) {
// root not more than the remainder, so the new bit is one
rem -= root;
root++;
} else {
// root is greater than the remainder, so the new bit is zero
root--;
}
}
return (root >> 1);
}

#ifdef TEST
#include <stdio.h>
#include <stdlib.h>

int main(int ac, char *av[])
{
int i;
unsigned short n, r;
for ( i = 1; i < ac; i++ ) {
n = strtoul(av[i], NULL, 0);
r = isqrt(n);
printf("%-8s %8d %8d\n", av[i], n, r);
}
return 0;
}
#endif

```

And the output of a test run:

```\$ ./iq 10 20 30 40 50 60 70 80 90 100 1024 2048 4096 8192 16384 32768
10             10        3
20             20        4
30             30        5
40             40        6
50             50        7
60             60        7
70             70        8
80             80        8
90             90        9
100           100       10
1024         1024       32
2048         2048       45
4096         4096       64
8192         8192       90
16384       16384      128
32768       32768      181

```

Have fun!

[Edit 1/1/2017 to fix incorrect comment in the code. Tsk!]

## Join the conversation

You can post now and register later. If you have an account, sign in now to post with your account. ×   Pasted as rich text.   Paste as plain text instead

Only 75 emoji are allowed.

×   Your previous content has been restored.   Clear editor

×   You cannot paste images directly. Upload or insert images from URL.

×
×
• Blog

• #### Activity

×
• Create New...