
Newbie question on MSP432 FPU

I got my MSP432 a couple of days ago and have been playing around with it, mainly porting across simple stuff I had previously done on the Arduino.


I am getting a bit confused by the FPU and how it operates. From the commands I see in the status window when I compile and upload, it appears the FPU is being enabled, but I do not seem to be getting any benefit from it. For example, the following code actually runs faster on an Arduino (an 8-bit, 16 MHz ATmega328) than on my 32-bit, 48 MHz M4F-based MSP432: 1654 ms on the Arduino versus 1750 ms on the MSP432:


for (int i=1;i<50000;i++)
I see that the sqrt() and pow() functions in Energia return a double and that the M4F only has a single-precision FPU. I tried casting results and/or operands to float, but it makes no difference. I have had a look through the forums and can find nothing on this, so I am guessing it is something really silly. Can anyone shed any light on what's going on here? Are there special functions I should use that stay in single precision, or is there something I need to set up first? I haven't yet tried writing similar code and compiling it in CCS for comparison.
By the way, I have noticed that long integer math is substantially faster, just from the increased clock rate and not having to emulate the long datatype in software.
Thanks in advance

I have done a bit more digging around and found a similar question on here; I had just been using the wrong search terms.




So using the sqrtf() function this is down to 500 ms. That still doesn't seem very fast when it takes 1654 ms on an 8-bit processor running at a third of the clock speed and emulating floating point in software. My guess is that the code compiled by Energia still emulates floating-point math in software rather than executing floating-point instructions on the FPU.


What am I missing? 


Could someone explain why an emulated double on the MSP432 would be slower than an emulated double on the Arduino?


Maybe it has something to do with wait states and the pipeline?


The Arduino is single-cycle and zero wait state with a very simple pipeline, so in a loop the pipeline doesn't slow things down.


For the M4, I don't remember whether it has any pipeline optimization for loops, but the lack of zero-wait-state operation is probably what slows things down.


So what can I do to ensure that the relevant code is being run on the FPU?


Looking at the opcodes available on the M4F FPU, I see there is VSQRT.F32, which is basically sqrtf() in 14 cycles. What is clear is that my code is not running sqrtf() on the FPU, because it is far too slow. Am I missing an additional #include or something else?
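To put a number on "far too slow", here is a rough back-of-the-envelope check. It assumes the core really is clocked at 48 MHz and that the 500 ms covers roughly 50,000 iterations dominated by the sqrtf() call; the function name is just made up for illustration.

```cpp
// Average CPU cycles spent per loop iteration, given a measured wall time.
// Assumption: clock_hz is the real core clock while the loop runs.
constexpr double cycles_per_iteration(double elapsed_ms, double clock_hz, long iterations) {
    return (elapsed_ms / 1000.0) * clock_hz / static_cast<double>(iterations);
}

// 500 ms for roughly 50,000 iterations at an assumed 48 MHz core clock.
constexpr double measured = cycles_per_iteration(500.0, 48.0e6, 50000);
static_assert(measured > 479.0 && measured < 481.0, "about 480 cycles per iteration");
```

That works out to around 480 cycles per iteration against 14 cycles for a bare VSQRT.F32, so under these assumptions most of the time is going somewhere other than the square root instruction itself.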


Perhaps as a sanity check someone would like to put a few lines of code together to time running 50k x sqrtf(float) and see what results they get. 
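Something along these lines might do as a starting point. It is only a sketch written to run on a PC with std::chrono; on Energia I would expect you to swap the timing calls for millis() or micros() and print the result over Serial. The function name and the volatile sink are my own choices, not anything from the original test.

```cpp
#include <chrono>
#include <cstdint>
#include <math.h>

// Time the calls to sqrtf() in a loop shaped like the one in the first post
// and return the elapsed time in microseconds.
// The volatile sink forces every result to be stored, so the compiler
// cannot throw the sqrtf() calls away as dead code.
std::int64_t time_sqrtf_us(int iterations) {
    volatile float sink = 0.0f;
    auto start = std::chrono::steady_clock::now();
    for (int i = 1; i < iterations; i++) {
        sink = sqrtf(static_cast<float>(i));
    }
    auto stop = std::chrono::steady_clock::now();
    (void)sink;
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count();
}
```

Calling time_sqrtf_us(50000) gives a figure comparable to the 50k loop. The absolute number from a PC means nothing for the MSP432, of course; the point is only the structure of the test.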


Hmm, the compiler switches seem to be OK at first glance. From the Energia verbose output:

-mfloat-abi=hard -mfpu=fpv4-sp-d16 -fsingle-precision-constant  -fno-exceptions -fno-rtti

(In Energia, verbose output during compilation can be enabled under File > Preferences.)


It may be worth comparing these to what CCS sets. Also make sure to verify in the debugger that CCS (and Energia) didn't optimize away the sqrt operation.


CCS did not optimize the sqrtf call away:

	.dwpsn	file "../main.c",line 117,column 3,is_stmt,isa 1
        LDR       A1, [SP, #0]          ; [DPU_3_PIPE] |117| 
        VMOV      S0, A1                ; [DPU_LIN_PIPE] |117| 
        VCVT.F32.S32 S0, S0             ; [DPU_LIN_PIPE] |117| 
$C$DW$76	.dwtag  DW_TAG_TI_branch
	.dwattr $C$DW$76, DW_AT_low_pc(0x00)
	.dwattr $C$DW$76, DW_AT_name("sqrtf")
	.dwattr $C$DW$76, DW_AT_TI_call
        BL        sqrtf                 ; [DPU_3_PIPE] |117| 
        ; CALL OCCURS {sqrtf }           ; [] |117| 
        VSTR.32   S0, [SP, #4]          ; [DPU_LIN_PIPE] |117| 
	.dwpsn	file "../main.c",line 116,column 27,is_stmt,isa 1
        LDR       A1, [SP, #0]          ; [DPU_3_PIPE] |116| 
        ADDS      A1, A1, #1            ; [DPU_3_PIPE] |116| 
        STR       A1, [SP, #0]          ; [DPU_3_PIPE] |116| 
	.dwpsn	file "../main.c",line 116,column 13,is_stmt,isa 1
        LDR       A2, $C$CON10          ; [DPU_3_PIPE] |116| 
        LDR       A1, [SP, #0]          ; [DPU_3_PIPE] |116| 
        CMP       A2, A1                ; [DPU_3_PIPE] |116| 
        BGT       ||$C$L14||            ; [DPU_3_PIPE] |116| 
        ; BRANCHCC OCCURS {||$C$L14||}   ; [] |116| 


0000023e:   9800                ldr        r0, [sp]
00000240:   EE000A10            vmov       s0, r0
00000244:   EEB80AC0            vcvt.f32.s32 s0, s0
00000248:   F000FAFE            bl         #0x848
0000024c:   ED8D0A01            vstr       s0, [sp, #4]

Edit, added sqrtf disassembly:

00000848:   EEB50AC0            vcmpe.f32  s0, #0
0000084c:   B508                push       {r3, lr}
0000084e:   EEF1FA10            vmrs       apsr_nzcv, fpscr
00000852:   D206                bhs        $C$L1
 46                   _Feraise(_FE_INVALID);
00000854:   2001                movs       r0, #1
00000856:   F7FFFFA3            bl         #0x7a0
 47                   return NAN;
0000085a:   4803                ldr        r0, [pc, #0xc]
0000085c:   EE000A10            vmov       s0, r0
00000860:   BD08                pop        {r3, pc}
 51               return TYPED_SQRT(x);
00000862:   EEB10AC0            vsqrt.f32  s0, s0
00000866:   BD08                pop        {r3, pc}
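Read back into C++, the sqrtf disassembly above looks roughly like the following. This is my own reconstruction for illustration, not TI's library source: only the three source lines shown in the listing come from the original, the function name is made up, and standard feraiseexcept() stands in for the _Feraise() call.

```cpp
#include <cfenv>
#include <cmath>

// Sketch of the control flow in the listing: a domain check first, then the
// hardware square root. A NaN input fails the (x < 0) test and falls through
// to the square root, matching the vcmpe/bhs pair in the disassembly.
float sqrtf_with_domain_check(float x) {
    if (x < 0.0f) {
        std::feraiseexcept(FE_INVALID);  // stands in for _Feraise(_FE_INVALID)
        return NAN;
    }
    return std::sqrt(x);                 // the part that becomes vsqrt.f32
}
```

So the FPU instruction is reached, but each call also pays for the branch-and-link, the compare, the push and pop, and the return, on top of the int-to-float conversion and the stack traffic in the unoptimized loop.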

Compiler switches:

-mv7M4 --code_state=16 --float_support=FPv4SPD16 --abi=eabi -me
--advice:power="all" -g --float_operations_allowed=all --gcc --define=__MSP432P401R__
--define=TARGET_IS_MSP432P4XX --define=ccs --diag_warning=225 --display_error_number --diag_wrap=off -k

