AdamUK (author), posted October 29, 2015:
Got my MSP432 a couple of days ago and have been playing around with it, mainly just porting across simple stuff I have previously done on the Arduino. I am getting a bit confused by the FPU and how it operates. I can see from the commands in the status window when I compile and upload that the FPU appears to be enabled, but I do not seem to be getting any benefit from it. For example, the following code actually runs faster on an Arduino (an 8-bit, 16 MHz ATmega328 based Arduino) than on my 32-bit, 48 MHz M4F based MSP432: 1750 ms on the MSP432 v 1654 ms on the Arduino.
    for (int i=1;i<50000;i++)
    {
      result=sqrt(i);
    }
I see that the sqrt() and pow() functions in Energia return a double and that the M4F only has a single-precision FPU. I have tried casting results and/or operands to float, but it makes no difference. I have had a look through the forums and can find nothing on this, so I am guessing it is something really silly. Can anyone cast any light on what's going on here? Are there special functions I should use that keep things in single precision, or something I need to set up first? I haven't tried writing similar code and compiling it in CCS for comparison yet. By the way, I have noticed that long integer math is substantially faster, just from the increased clock rate and not having to emulate the long datatype in software.
Thanks in advance
Adam
AdamUK (author), posted October 29, 2015:
I have done a bit more digging around and found a similar question on here; I had just been using the wrong search terms: http://forum.43oh.com/topic/7431-noob-question-floating-point/
So using the sqrtf() function this is down to 500 ms. This still doesn't seem very fast when it takes 1654 ms on an 8-bit processor at a third of the clock speed having to emulate floating point in software. My guess is that the compiled code from Energia is still producing instructions to emulate floating-point math in software rather than producing code to execute floating-point instructions on the FPU. What am I missing?
asgard20032, posted October 29, 2015:
Could someone explain why an emulated double on the MSP432 would be slower than an emulated double on the Arduino? Maybe it has something to do with wait states and the pipeline? The Arduino is single cycle, zero wait state, with a very simple pipeline, so in a loop the pipeline doesn't slow things down. For the M4, I don't remember if it has pipeline optimization in loops, but the lack of zero wait states is probably what slows things down.
spirilis, posted October 29, 2015:
Could also be differences in the code implementation of sqrtf() ... AVR's libc vs the RedHat newlib used on ARM.
AdamUK (author), posted October 29, 2015:
So what can I do to ensure that the relevant code is being run on the FPU? Looking at the opcodes available for the M4F FPU, I see that there is VSQRT.F32, which is basically sqrtf() in 14 cycles. What is clear is that my code is not running sqrtf() on the FPU, because it is far too slow. Am I missing an additional #include or something else? Perhaps as a sanity check someone would like to put a few lines of code together to time running 50k x sqrtf(float) and see what results they get.
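To put "far too slow" into numbers, the measured times can be converted into average CPU cycles per loop iteration. This is a small sketch with a hypothetical helper name, cycles_per_iteration. It assumes the MSP432 really runs at the stated 48 MHz and rounds the loop count to 50,000.

```c
#include <assert.h>
#include <stdint.h>

/* Average CPU cycles per loop iteration, derived from a wall-clock time.
   cycles = elapsed_ms / 1000 * clock_hz, divided by the iteration count. */
uint64_t cycles_per_iteration(uint64_t elapsed_ms, uint64_t clock_hz,
                              uint64_t iterations)
{
    assert(iterations > 0);
    return (elapsed_ms * clock_hz) / (1000u * iterations);
}
```

With the figures from this thread, cycles_per_iteration(500, 48000000, 50000) gives 480 for the sqrtf() run and cycles_per_iteration(1750, 48000000, 50000) gives 1680 for the sqrt() run, against 14 cycles for VSQRT.F32 on its own.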
asgard20032, posted October 29, 2015:
So sqrtf from AVR libc would be more optimized? Or maybe the sqrtf from AVR libc is hand-written in assembly or very heavily compiler-optimized, while the one from ARM newlib is written in a generic way.
chicken, posted October 29, 2015:
Might be something more trivial, like a wrong/missing compiler switch when Energia invokes gcc.
AdamUK (author), posted October 29, 2015:
Would be nice to get to the bottom of it and get the benefit of the FPU.
igor, posted October 29, 2015:
Have you tried generating an assembler listing of the code produced by Energia and looking through it to see what instructions are being generated? It is a little convoluted to do (I wish there was a switch/button for it in Energia), e.g. http://forum.43oh.com/topic/7485-assembler-listing/
terjeio, posted October 29, 2015:
@AdamUK Just tried with TI's compiler: I did 5,000,000 iterations with sqrtf and timed it manually at around 7 seconds by setting breakpoints in CCS. I don't know how much the debugger slows down execution, but that is still nearly 10 times faster than what you get with Energia.
AdamUK (author), posted October 29, 2015:
That sounds more like it. I guess it is just a matter of figuring out how to compile code for the FPU through Energia now.
chicken, posted October 30, 2015:
Hmm, the compiler switches seem to be OK at first glance. From the Energia verbose output:
    -mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant -fno-exceptions -fno-rtti
(In Energia, verbose output during compiling can be enabled under File > Preferences.)
Maybe worth comparing these to what CCS sets. Also make sure to verify in the debugger that CCS (and Energia) didn't optimize away the sqrt operation.
Rickta59, posted October 30, 2015:
It seems like the TI RTOS used in Energia might add some overhead. Did you try this code? http://forum.43oh.com/topic/6847-sintable-example-of-serial-and-hw-floating-point-energia/
-rick
terjeio, posted October 30, 2015:
CCS did not optimize the sqrtf call away:
    ||$C$L14||:
        .dwpsn file "../main.c",line 117,column 3,is_stmt,isa 1
        LDR A1, [SP, #0] ; [DPU_3_PIPE] |117|
        VMOV S0, A1 ; [DPU_LIN_PIPE] |117|
        VCVT.F32.S32 S0, S0 ; [DPU_LIN_PIPE] |117|
    $C$DW$76 .dwtag DW_TAG_TI_branch
        .dwattr $C$DW$76, DW_AT_low_pc(0x00)
        .dwattr $C$DW$76, DW_AT_name("sqrtf")
        .dwattr $C$DW$76, DW_AT_TI_call
        BL sqrtf ; [DPU_3_PIPE] |117|
        ; CALL OCCURS {sqrtf } ; [] |117|
        VSTR.32 S0, [SP, #4] ; [DPU_LIN_PIPE] |117|
        .dwpsn file "../main.c",line 116,column 27,is_stmt,isa 1
        LDR A1, [SP, #0] ; [DPU_3_PIPE] |116|
        ADDS A1, A1, #1 ; [DPU_3_PIPE] |116|
        STR A1, [SP, #0] ; [DPU_3_PIPE] |116|
        .dwpsn file "../main.c",line 116,column 13,is_stmt,isa 1
        LDR A2, $C$CON10 ; [DPU_3_PIPE] |116|
        LDR A1, [SP, #0] ; [DPU_3_PIPE] |116|
        CMP A2, A1 ; [DPU_3_PIPE] |116|
        BGT ||$C$L14|| ; [DPU_3_PIPE] |116|
        ; BRANCHCC OCCURS {||$C$L14||} ; [] |116|
Disassembly:
    $C$L14:
    0000023e: 9800     ldr r0, [sp]
    00000240: EE000A10 vmov s0, r0
    00000244: EEB80AC0 vcvt.f32.s32 s0, s0
    00000248: F000FAFE bl #0x848
    0000024c: ED8D0A01 vstr s0, [sp, #4]
Edit, added sqrtf disassembly:
    sqrtf():
    00000848: EEB50AC0 vcmpe.f32 s0, #0
    0000084c: B508     push {r3, lr}
    0000084e: EEF1FA10 vmrs apsr_nzcv, fpscr
    00000852: D206     bhs $C$L1
    46 _Feraise(_FE_INVALID);
    00000854: 2001     movs r0, #1
    00000856: F7FFFFA3 bl #0x7a0
    47 return NAN;
    0000085a: 4803     ldr r0, [pc, #0xc]
    0000085c: EE000A10 vmov s0, r0
    00000860: BD08     pop {r3, pc}
    51 return TYPED_SQRT(x);
    $C$L1:
    00000862: EEB10AC0 vsqrt.f32 s0, s0
    00000866: BD08     pop {r3, pc}
Compiler switches:
    -mv7M4 --code_state=16 --float_support=FPv4SPD16 --abi=eabi -me --advice:power="all" -g --float_operations_allowed=all --gcc --define=__MSP432P401R__ --define=TARGET_IS_MSP432P4XX --define=ccs --diag_warning=225 --display_error_number --diag_wrap=off -k