43oh

Tiva/CC3200 DMA performance information?



Does anyone have any good sources of performance information for the PL230 uDMA used in Tiva and CC3200 MCUs (or any MCU using the same DMA controller)?

 

I'm used to using MSP430 DMA which is quite clearly specified. Each transfer spends two MCLK cycles accessing the bus (which halts the CPU), plus an extra two cycles each time it's triggered. The synchronisation time required to wake up from various low power modes is also given.
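To make the MSP430 figures concrete, here is a small cost model. It's only a sketch: `msp430_dma_cycles` is a hypothetical helper (not part of any TI library). It assumes no wait states and ignores the wake-up synchronisation time from low power modes.

```c
#include <stdint.h>

/* Rough MSP430 DMA cost model using the figures quoted above:
 * 2 MCLK cycles of bus access per transfer (CPU halted during these),
 * plus 2 cycles each time the channel is triggered.
 * Assumes no wait states; ignores low-power-mode wake-up time. */
static uint32_t msp430_dma_cycles(uint32_t transfers, uint32_t triggers)
{
    return 2u * transfers + 2u * triggers;
}
```

For example, a block of 16 transfers triggered once comes to 2 × 16 + 2 = 34 MCLK cycles under this model.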

 

The Tiva and CC3200 TRMs don't give any details of this type. I'd like to know how many cycles each transfer takes in the best case, with no contention, arbitration or wait states involved. I'd also be interested to know how achievable that is in the real world. Additionally I'd like to know whether all those cycles involve bus access. As I understand it, the Tiva DMA can only access the SRAM when the CPU isn't using it, so I'm wondering how easy it is to run the DMA and CPU in parallel without the DMA being starved of SRAM access.

 

I've looked at ARM's documentation for the PL230 and it's not much better. The only timing information appears to be in the signal timing diagrams:

 

[Image: PL230 signal timing diagram from ARM's documentation]

This diagram suggests it takes 7 cycles per transfer, but the description in the document implies it's set up to arbitrate after each transfer. Perhaps it goes faster if arbitration is less frequent?

 

[Image: PL230 signal timing diagram using waitonreq, arbitrating every two transfers]

This one shows four cycles per transfer using waitonreq and arbitration every two transfers. That's two differences from the configuration in the previous diagram, so I don't know which change is responsible for the different timing. Also, both diagrams have unexplained idle periods at the start, and the waitonreq diagram has one at the end too.

 

Any first-hand info or recommended reading on this would be much appreciated. Thanks!

 

EDIT: I found an interesting appnote from Silicon Labs for their Gecko line (AN0013). It gives timing for a single transfer from ADC to SRAM and explains how arbitration rate affects the flow through the DMA state machine. Still, it leaves a lot of questions unanswered. How long do transfers to/from other peripherals and SRAM->SRAM transfers take? Also it's not clear whether the timings given are purely due to the DMA controller itself or to the specific slave devices it's communicating with (i.e. does this information carry over to other MCUs using the same controller?).

  • 2 weeks later...

I got a CC3200 launchpad and did some testing in comparison to the MSP430F5529 launchpad. The test programs configure three DMA channels to run in sequence. The first channel writes to a GPIO port setting an output high, the next does the measured transfer and the third writes the GPIO again to take the pin low. The CPU is asleep during the transfers, preventing any delay due to contention.
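For anyone wanting to reproduce the pin-toggle trick: on Tiva-style GPIO ports (the CC3200 GPIO module is documented the same way), address bits [9:2] of the GPIODATA window act as a write mask, so a DMA channel can drive one pin with a single word write and leave the rest of the port alone. It's an assumption that DMATest.c does it this way. `gpio_masked_data_addr` is a hypothetical helper, and the port base address should be taken from the TRM for your part.

```c
#include <stdint.h>

/* Tiva/CC3200-style GPIO ports expose GPIODATA through a 256-word window:
 * address bits [9:2] act as a per-pin write mask, so a write to
 * port_base + (pin_mask << 2) only changes the pins set in pin_mask.
 * A DMA channel pointed at this address (with no destination increment)
 * can therefore set or clear one pin without a read-modify-write. */
static uint32_t gpio_masked_data_addr(uint32_t port_base, uint8_t pin_mask)
{
    return port_base + ((uint32_t)pin_mask << 2);
}
```

With the address mask in place, the "pin high" channel can copy 0xFF and the "pin low" channel 0x00 to the same masked address; only the selected pin is affected.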

 

The output pin can be traced on a scope to see how long the transfer took, and is also fed back into two timer capture inputs. This allows timing to be measured automatically and printed to the debug console after each test. The overhead of the extra transfers used to toggle the output pin is measured and cancelled out from the results.
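The timing arithmetic could look something like this sketch. It assumes a free-running 16-bit capture timer clocked at MCLK; both the timer width and the helper names are assumptions, not taken from the test code. Unsigned 16-bit subtraction gives the right interval even if the counter wraps once between the two captures.

```c
#include <stdint.h>

/* Cycles between a rising-edge and a falling-edge capture on a free-running
 * 16-bit timer clocked from MCLK. Subtraction modulo 2^16 handles a single
 * counter wrap between the two captures. */
static uint16_t capture_interval(uint16_t rise, uint16_t fall)
{
    return (uint16_t)(fall - rise);
}

/* Remove the separately measured cost of the two pin-toggle transfers so
 * only the transfer under test remains. Signed result, in case noise makes
 * the overhead exceed the raw interval. */
static int32_t dma_cycles_net(uint16_t rise, uint16_t fall, uint16_t overhead)
{
    return (int32_t)capture_interval(rise, fall) - (int32_t)overhead;
}
```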
 
The programs test a series of DMA lengths from 0 to 32 bytes, counting the MCLK cycles taken for each. The CC3200 DMA is set up in auto-request mode, arbitrating every 16 bytes (hence the slight bump in the middle of the graph). The MSP430 DMA copies a single block, but I tweaked the results to emulate an arbitration after 16 bytes. The CC3200 is running at 80MHz (no choice there), but the F5529 is at 3MHz so there should be no wait states required.
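For reference, this is roughly how that configuration maps onto the uDMA channel control word. The field positions follow the PL230 `channel_cfg` word (DMACHCTL in the Tiva datasheet). `udma_auto_ctl` and the macro names are hypothetical and not taken from the test code. With 32-bit transfers, arbitrating every 16 bytes means ARBSIZE = 2, i.e. every 2^2 = 4 transfers.

```c
#include <stdint.h>

/* Field positions of the uDMA channel control word (PL230 channel_cfg /
 * Tiva DMACHCTL). Size/increment codes: 0 = 8-bit, 1 = 16-bit, 2 = 32-bit;
 * an increment code of 3 would mean "no increment". */
#define UDMA_DSTINC_S    30u
#define UDMA_DSTSIZE_S   28u
#define UDMA_SRCINC_S    26u
#define UDMA_SRCSIZE_S   24u
#define UDMA_ARBSIZE_S   14u  /* arbitrate every 2^ARBSIZE transfers */
#define UDMA_XFERSIZE_S   4u  /* number of transfers minus one, 10 bits */
#define UDMA_MODE_AUTO    2u  /* auto-request transfer mode */

/* Control word for an auto-request memory-to-memory copy in which source,
 * destination and both increments use size_code. Returns 0 if the
 * arguments cannot be encoded. */
static uint32_t udma_auto_ctl(uint32_t size_code, uint32_t n_bytes,
                              uint32_t arb_bytes)
{
    uint32_t elem, n_xfers, arb_xfers, arb_pow = 0u;

    if (size_code > 2u)
        return 0u;
    elem = 1u << size_code;                   /* bytes per transfer */
    n_xfers = n_bytes / elem;
    arb_xfers = arb_bytes / elem;
    if (n_xfers == 0u || n_xfers > 1024u || arb_xfers == 0u)
        return 0u;
    while (arb_pow < 10u && (1u << (arb_pow + 1u)) <= arb_xfers)
        arb_pow++;                            /* floor(log2(arb_xfers)) */

    return (size_code << UDMA_DSTINC_S) | (size_code << UDMA_DSTSIZE_S) |
           (size_code << UDMA_SRCINC_S) | (size_code << UDMA_SRCSIZE_S) |
           (arb_pow << UDMA_ARBSIZE_S) |
           ((n_xfers - 1u) << UDMA_XFERSIZE_S) | UDMA_MODE_AUTO;
}
```

For a 32-byte copy as 32-bit words this gives 0xAA008072: word size and increment on both sides, ARBSIZE 2, XFERSIZE 7, auto-request mode. A 0-byte length can't be encoded (XFERSIZE counts transfers minus one), so the helper returns 0 for it.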
 
Here's what I get:
 
[Image: graph of MCLK cycles taken vs. DMA length (0 to 32 bytes) for CC3200 and MSP430F5529]

 

The MSP430F5529 results are as expected: two cycles to start/stop plus two cycles per transfer in the block.

 

The CC3200 DMA timings exactly match those implied by the ARM documentation. It takes eight cycles for the initial transfer after a channel is triggered. Subsequent transfers take four cycles each, as long as no arbitration occurs. If the channel pauses for arbitration and continues uninterrupted, then the next transfer takes seven cycles.
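Those three numbers are enough for a simple cost model of an uncontended auto-request transfer. This is a sketch fitted to the measurements above, not something from ARM or TI documentation. `cc3200_dma_cycles` is a hypothetical helper, and it assumes no CPU contention, no wait states and no competing channels (so every arbitration point is followed by the same channel continuing).

```c
#include <stdint.h>

/* Cycle model fitted to the CC3200 measurements: 8 cycles for the first
 * transfer after a trigger, 4 for each following transfer, and 7 for the
 * first transfer after an arbitration point when the same channel
 * continues. arb_every is the arbitration interval in transfers
 * (0 = never arbitrate). */
static uint32_t cc3200_dma_cycles(uint32_t n_xfers, uint32_t arb_every)
{
    uint32_t cycles = 0u;

    for (uint32_t i = 0u; i < n_xfers; i++) {
        if (i == 0u)
            cycles += 8u;                     /* initial transfer */
        else if (arb_every != 0u && i % arb_every == 0u)
            cycles += 7u;                     /* resume after arbitration */
        else
            cycles += 4u;                     /* steady-state transfer */
    }
    return cycles;
}
```

For 32 bytes as 32-bit transfers with arbitration every 4 transfers, the model gives 8 + 3×4 + 7 + 3×4 = 39 cycles.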
 
On the SRAM->GPIO test the DMA is configured for 16-bit transfers, which costs an extra four cycles for every 32 bits. The reduced transfer rate is entirely due to the fact that it's performing twice as many transfers; there are no additional wait states required by the GPIO AHB slave. SRAM->SRAM DMA with a 16-bit transfer size takes the same number of cycles.
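Since each extra transfer just costs another four cycles, the penalty for 16-bit transfers is easy to estimate. A minimal sketch, assuming the arbitration points fall at the same byte offsets for both transfer sizes (so their cost cancels) and that the byte count is a multiple of 4. The helper name is hypothetical.

```c
#include <stdint.h>

/* Extra cycles from copying n_bytes as 16-bit rather than 32-bit
 * transfers: twice as many transfers, each costing 4 cycles, with no
 * additional wait states. The 8-cycle first transfer and the 7-cycle
 * post-arbitration transfers occur equally often in both cases. */
static uint32_t extra_cycles_16bit(uint32_t n_bytes)
{
    uint32_t xfers16 = n_bytes / 2u;
    uint32_t xfers32 = n_bytes / 4u;

    return 4u * (xfers16 - xfers32);
}
```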
 

I expect the same will be true for TM4C: as I understand it, the CC3200 "apps" CPU is heavily based on Tiva.

 

In terms of cycles taken the CC3200 DMA is marginally slower than MSP430 DMA, despite the fact it can copy twice as many bits on each transfer. As yet I don't have any results on the effect of running the CPU and DMA in parallel. In theory that should be a win for the CC3200, but it appears to spend a lot of time accessing the channel control struct in SRAM. That may leave it vulnerable to blocking by CPU instruction and data fetches.

 

Also I need to get a comparison of these timings against an optimised memcpy on Cortex-M4. I think that might be faster than using the DMA alone. Power measurements for both cases might be interesting too...
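As a starting point for that comparison, a plain word-copy loop like this one could be timed with the same capture setup. It's a sketch, not an optimised memcpy: it assumes 4-byte-aligned, non-overlapping buffers and a whole number of words, and `copy_words` is a hypothetical name. A compiler targeting Cortex-M4 may turn the unrolled body into LDM/STM sequences, but that needs checking in the disassembly, and the toolchain's own memcpy is worth timing alongside it.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline CPU copy of n_words 32-bit words, unrolled by four.
 * Buffers must be 4-byte aligned and must not overlap. */
static void copy_words(uint32_t *restrict dst, const uint32_t *restrict src,
                       size_t n_words)
{
    while (n_words >= 4u) {
        dst[0] = src[0];
        dst[1] = src[1];
        dst[2] = src[2];
        dst[3] = src[3];
        dst += 4;
        src += 4;
        n_words -= 4u;
    }
    while (n_words--)                         /* 0-3 leftover words */
        *dst++ = *src++;
}
```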


Could you post your code as a learning tool for everyone here who is DMA-deficient?

 

I'm happy to post the code to show how I got the results, but be warned: it ain't pretty!

 

I used direct register access instead of driverlib so I could keep things consistent with the MSP430 version. Also the test setup doesn't use interrupts directly related to the DMA operation. Instead it wakes up when the timer module measuring the DMA transfer captures the falling edge of its input.
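For anyone else going the direct-register route, the main thing DriverLib hides is the channel control table in SRAM. This sketch shows the layout described in the PL230 and Tiva documentation: four words per channel, with the table base 1024-byte aligned. The type and helper names are hypothetical, and the 64-entry size assumes 32 channels with both primary and alternate structures.

```c
#include <stddef.h>
#include <stdint.h>

/* One uDMA channel control structure in SRAM. The controller reads these
 * words each time it services the channel and writes the control word
 * back, which is part of why it competes with the CPU for SRAM. */
typedef struct {
    volatile uint32_t src_end;  /* address of the LAST source item */
    volatile uint32_t dst_end;  /* address of the LAST destination item */
    volatile uint32_t control;  /* DMACHCTL-style control word */
    uint32_t spare;             /* unused by the controller */
} udma_ctl_t;

/* 32 primary structures followed by 32 alternate ones (1024 bytes).
 * The base address given to the controller (DMACTLBASE on Tiva) must be
 * 1024-byte aligned. */
static _Alignas(1024) udma_ctl_t udma_ctl_table[64];

/* End pointers are inclusive: for n_items of elem_bytes each starting at
 * start, the end pointer is start + (n_items - 1) * elem_bytes. For a
 * non-incrementing address (e.g. a GPIO data register) the end pointer is
 * just that address. */
static uint32_t udma_end_ptr(uint32_t start, uint32_t n_items,
                             uint32_t elem_bytes)
{
    return start + (n_items - 1u) * elem_bytes;
}
```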

 

For those reasons it's a pretty bad example of how to write DMA code on CC3200. This post is probably a better place to look for that.

 

Anyway, enough excuses, here it is:

 

DMATest.c

 

(Does anyone get the feeling TI deliberately made direct register access unpleasant on CC3200 to "encourage" everyone to use DriverLib?)

