
Why is this floating point operation eating up so much Flash?



Hello there,

 

I'm new to hardware-related programming and currently getting my first experience with the MSP430G2553. I'm using CCS 5.3 and building an application that prints to an LCD. For this purpose, I found oPossum's tiny printf here: http://forum.43oh.com/topic/1289-tiny-printf-c-version/

 

But because I also need to print floating point values, I added some lines to this routine. Here are only the added lines of code:

void printf(char *format, ...)
{
	// ....

	volatile double f;

	// ....

				case 'f':
					//format++;
					i = (*format++) - 48;          // number of fractional digits (ASCII digit)
					f = va_arg(a, double);         // varargs promote float to double
					if (f < 0) f = -f, putc('-');
					xtoa((unsigned long) f, dv);   // integral part
					putc('.');
					// dv[9 - i] is 10^i, so this scales the fraction up to i digits
					xtoa((unsigned long) ((f - ((int) f)) * *(dv + (9 - i))), dv + 5); // fractional part
					break;
				case 0: return;
				default: goto bad_fmt;
			}
		} else
			bad_fmt: putc(c);
	}
	va_end(a);
}
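
(For reference, dv in oPossum's printf is, if I copied it right from the linked thread, this descending powers-of-ten table, so dv + (9 - i) points at 10^i:)

	static const unsigned long dv[] = {
		1000000000, 100000000, 10000000, 1000000, 100000,
		10000, 1000, 100, 10, 1
	};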

With that, I am able to do something like printf("%f2", 3.1415f) to get two fractional digits.

 

My total code size in flash according to the .map file with the original printf function was 0x13A2 bytes. With these added lines (case 'f'), it's 0x24F4. That means it's almost double the size! Wow. Mainly the last call of xtoa is responsible for this.  :-o Why? What is happening here, what is being added to my code? And just out of interest: if I change the multiplication in the last call of xtoa to a plus, it's 1396 bytes less.

 

How can I implement this function more efficiently?

 

Thank you!


Floating point operations are not natively supported on this processor, or really on any low-end microcontroller for that matter. However, they can be simulated in integer math with the right code. CCS includes code libraries to make this happen, and they are tightly coupled with the compiler, so CCS pulls them in and uses them seamlessly when it sees you using float types. These code libraries aren't simple, and they bloat the app substantially. How much will differ depending on how many different kinds of manipulations your code performs on floating point numbers.

 

To get around this, you need to implement your math purely in the domain of integer math. I am not an expert on (or even all that familiar with) this topic, so I will defer to others on the details :)
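
For example, a value that is really millivolts can stay a scaled integer the whole time, something like this (just a sketch; it leans on tiny printf's putc(), xtoa() and dv[] table, and print_fixed3 is a made-up name):

	/* Sketch only: the value is kept as an integer number of thousandths, so
	 * the "decimal point" is pure integer math and no FP code gets linked. */
	static void print_fixed3(long milli)          /* 3141 prints as 3.141 */
	{
		long whole, frac;

		if (milli < 0) { putc('-'); milli = -milli; }

		whole = milli / 1000;                     /* integral part        */
		frac  = milli % 1000;                     /* always 3 digits      */

		xtoa((unsigned long) whole, dv);          /* reuse tiny printf    */
		putc('.');
		putc('0' + frac / 100);                   /* pad with zeros       */
		putc('0' + (frac / 10) % 10);
		putc('0' + frac % 10);
	}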

 

Sent from my Galaxy Note II with Tapatalk 4

 


A bit of a rant on double precision...

 

I really don't understand the "rule of thumb" to always use double precision floating point.  I guess the rationale is: "if you don't know exactly what is going on, go with double."  For most signal processing work and graphics, single precision does the job better than fine.  For financial calculations and cryptography, you need extended-precision integers anyway.  Even on 64-bit processors, single precision floating point often runs faster because it loads up the VPU more densely.  For example, on ARMv7-A, single precision can use NEON, double cannot.  The speed difference is darn close to 4x in general programming, or 20x in tight loops.

 

Anyway, that's just a rant.  I've done a ton of mathematical programming in my past, and I can think of fewer than five times where it made sense to use double precision.  No offense to anyone here... just a long explanation of why I default to single precision (float) in my work and why I think more people should, too.
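
One concrete C detail worth knowing (my own illustration, not anything from the code above): unsuffixed constants are double, and so are the standard math functions, so staying in single precision means being explicit about it. (And note that anything passed through a variadic function like printf gets promoted to double regardless.)

	#include <math.h>

	float scale(float x)
	{
		/* 0.5f keeps the expression in single precision; plain 0.5 would
		 * promote it to double and pull in the double support routines. */
		return x * 0.5f + 1.0f;
	}

	float root(float x)
	{
		return sqrtf(x);    /* sqrtf() stays in float; sqrt() is double */
	}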


The rule of thumb comes from several places:

 

1) Many general purpose processors (x86, Motorola, etc.), going back to the mainframe era, share hardware between single and double precision, so the only saving is in accessing memory, which may or may not be a saving depending on the details of the job and the memory architecture. The hardware was double precision, so why not use it? In some cases single precision was actually handled a little slower.

 

2) When I was involved with numerical/scientific computing (lots of integrating systems of differential equations, etc.), the more precision the better. Roundoff was a killer. Plenty of other things came in as well, but that was a starting point in design.

 

3) K&R and Jon Bentley

 

 

These days, there are a ton of processors where single precision makes sense for many jobs, including most DSPs, some general purpose CPUs, and GPUs used as FP engines, since most GPUs support single precision only (often with some compromises, which can require extra code to deal with). More transistors and better fab techniques since the 1970s and 1980s have done a lot to eliminate some of the compromises we used to live with just to get FP in hardware at all. Over the last few years, I've been relearning some of the tools I used back when doing FP in software on the 8080, Z80, and 6502.


"2) When I was involved with numerical/scientific computing (lots of integrating systems of differential equations, etc.), the more precision the better. Roundoff was a killer. Plenty of other things came in as well, but that was a starting point in design."

Good points (not just 2).  I think some of the bigger points are relics of the past, though.

 

Sure, there are some really deep calculations where you want 14 digits of precision instead of 7, but I've found that to be rare.  In my opinion, if you're doing something that needs 14 digits of floating point precision, you are doing something wrong and should re-architect (well, almost all of the time).  If you're doing a one-off computational model, then it's a different issue.  Maybe this is where the legacy is from, when C was a language used for modeling.  It isn't anymore; it's used for production.  Python or Matlab are generally used for modeling.

 

Another possibility is to do the intermediate calculations in single precision against a wider double precision variable.  I did a fast JPEG encoder this way, and it worked great.
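
One way to read that (a sketch of my own, not the encoder code): do the per-sample work in float and only accumulate in double, so rounding error doesn't build up over a long sum.

	/* Illustration only: per-element math in float, running sum in double. */
	double sum_of_squares(const float *x, int n)
	{
		double acc = 0.0;                 /* wide accumulator              */
		int i;
		for (i = 0; i < n; ++i) {
			float s = x[i] * x[i];        /* cheap single-precision work   */
			acc += s;                     /* only the accumulate is double */
		}
		return acc;
	}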


No question that for most applications, in C or C++ or Java, single precision is now the better option. From the early '80s to the early 2000s there were good reasons, on a general purpose processor, to use double unless there was a really good reason for single precision (memory usage was the best argument). Even the earliest DSPs handled single precision faster than double, and only hardcore modelling and utility tools are coded in C or C++ (I don't count Java... the floating point is too crippled to do the work where it would matter) where it would be significant for MOST purposes. 'Relics of the past' is a good summary, and if I didn't make it clear that was where I was going, I apologise.

 

There wasn't enough real estate back in the day to handle both single and double with dedicated hardware, so general purpose processor cores tended to be built for double.

 

The key things are that most applications don't do the types of calculation where the error accumulates enough to be a problem -- an error of 1 part in 100,000 doesn't matter when your end result is 8 or 10 bits, and is barely there at 16 bits -- and modern processors tend toward efficient single precision, either by having an efficient single precision pipe (most general purpose processors, either dedicated or shared with double), or by using single precision hardware to do double precision as needed (some DSPs, most if not all GPUs, where the intended end result is an 8 or 10 bit channel intensity, etc.).
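
To put rough numbers on that: 8-bit output resolves 1 part in 256, about 0.4%, so an accumulated error of 1 part in 100,000 (0.001%) is far below the quantization step; at 16 bits the step is 1 part in 65,536 (about 0.0015%), which is why the error is "barely there" rather than invisible.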

 

I can't think off the top of my head of an application where the '430 would be the appropriate tool and that would need double, but I'm sure there are a few. I have built a few PID models where double was needed due to the relative magnitude of the inputs to the control signal, but in many cases offsetting the control signal makes more sense if practical. (I much prefer a standard tool, but sometimes ya gotta do what ya gotta do...)

 

Side note.... Matlab and Mathematica do a good job for modeling. Python can be OK, but has some issues. (Anyone using Excel needs help.... major flaws, many of which are well documented. That doesn't stop people who should know better from doing modelling in Excel, though... Go figure.)


"Out of curiosity, does changing double to float reduce the code size?"

 

Indeed, it does. :)

 

Flash usage:

  • Without that code (initial post) at all: 0x134e
  • With volatile double: 0x24ac
  • With volatile float: 0x175a

So the code with float adds 1036 bytes to flash; the same code with a double variable adds 4446 bytes.

 

Thanks for all the other replies. I know that I should avoid floating point; maybe I can find another solution to print out floating point values (I can't avoid them completely).
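
One possible way to do that (a sketch, assuming tiny printf's putc() is available; putfloat is just an illustrative name) is to narrow the promoted double back to float once and emit the fractional digits one at a time, so only the small single-precision support routines get linked:

	/* Sketch only: all arithmetic stays in float, so the double multiply
	 * and divide support routines should not be pulled in.               */
	static void putfloat(float g, unsigned digits)
	{
		char buf[10];
		int n = 0;
		unsigned long ip;

		if (g < 0.0f) { putc('-'); g = -g; }

		ip = (unsigned long) g;            /* integral part               */
		g -= (float) ip;                   /* keep only the fraction      */

		do { buf[n++] = '0' + (ip % 10); ip /= 10; } while (ip);
		while (n) putc(buf[--n]);          /* print integral part         */

		putc('.');

		while (digits--) {                 /* one fractional digit per pass */
			unsigned d;
			g *= 10.0f;
			d = (unsigned) g;
			putc('0' + d);
			g -= (float) d;
		}
	}

With that, putfloat(3.1415f, 2) prints 3.14, and a value like 0.05 keeps its leading zero (the dv[9-i] scaling above would print it as 0.5), though the last digit is still truncated rather than rounded.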


The penalty of software floating point. The difference in code size suggests to me that the operations are optimized for speed, not code size. It is roughly 4 times as many primitive operations for a double multiply as for a float multiply, just as when doing long multiplication by hand: it goes by the square of the number of digits. (Really by n log n, but using 16-bit ints as the primitive operation, it doesn't make a difference.) If optimized for code size, I would expect only a few percent difference, but a significant time cost in loop and conditional overhead.
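
Roughly: a float significand is 24 bits, which fits in two 16-bit words, so schoolbook multiplication needs 2 x 2 = 4 partial products; a double significand is 53 bits, four 16-bit words, so 4 x 4 = 16 partial products, i.e. about 4x the work, and if unrolled for speed, roughly that much more code.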

 

I do not miss coding FP routines in software. Not at all.

