Posts posted by tripwire

  1. I remember using the DMA for 800kHz bit-banging, with four DMA transfers happening at the same time. I never really tested the limits (hmm, maybe I should when I get some time and access to the logic analyzer)

     

    I took some measurements for CC3200 DMA in this thread. The peak speed is four cycles per transfer (whether byte, halfword or word). The first transfer takes eight cycles, and a transfer following an arbitration check takes seven. So, for example, a 16-byte block of byte-wide transfers with no intermediate arbitration would take 8 + 15 x 4 = 68 cycles. I think Tiva will follow the same pattern, since this matches the timings indicated by the ARM documentation.

  2. If for some reason you need faster I/O access (sometimes it's useful for multiple SPIs or weird protocols), M cores are normally really good at that; I'm actually surprised it's only about 2MHz. On a Tiva you can get half the system clock (so at 80MHz you get a 40MHz bit-banged I/O)

     

    That IO performance on Tiva is impressive. It looks like it's due to the use of an AHB bus for the GPIO on the Tiva rather than the APB bus used on MSP432. That said, I think it can only achieve that rate when not performing read-modify-write (RMW) operations. Toggling one pin with XOR would presumably have a maximum frequency of 13.33MHz (one cycle to read, one to XOR and one to write back, so six cycles per full output period at 80MHz).
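
    For illustration, the XOR case would look something like this (a sketch assuming TivaWare's direct register names from tm4c123gh6pm.h; the pin choice is arbitrary):

      while (1){
        GPIO_PORTF_DATA_R ^= 0x02;  // read-modify-write: load, XOR, store back
      }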

     

    I'm actually curious what the bit-banged speed with the DMA would be on the MSP432

     

    The uDMA on Tiva, CC3200 and MSP432 isn't great in terms of raw performance. It takes a lot of cycles to read the channel data structure and source data before each write to the destination. It's quicker to write using the CPU with values held in registers.

  3. Thanks for the tips, it works now, but in the meantime I installed some updates for CCS and now the speed reaches 2MHz even without the VCORE1 command, which seems strange:

    Yeah, I was surprised it worked at all without VCORE1. The chances are that the MCU would flake out and crash or reset if your program did anything too demanding at 48MHz with VCORE0.

     

    I'm now sitting at 2.19 MHz, so it's looking pretty alright. Could I go much higher than this? Thanks!

     

    You can go a lot higher by using the timer peripherals instead of bit-banging the port with the CPU. You can also output one of the clocks (MCLK or SMCLK, can't remember which) to a particular port pin. That's useful if you just need a fast square wave for some reason.
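
    As a sketch of the clock-output option (assuming the classic-style register names from msp432p401r.h; per the MSP432P401R datasheet pin mux tables, P4.3 can output MCLK as its primary module function):

      P4DIR |= BIT3;    // P4.3 as output
      P4SEL0 |= BIT3;   // select the primary module function: MCLK output
      P4SEL1 &= ~BIT3;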

     

    There's also the option of using DMA to write to a port, but then you have to set 8 pins at a time (i.e. the whole port has its eight bits overwritten).

     

    Similarly, the CPU can toggle a pin faster if you don't make it read the current state of the port. Read-modify-write operations (as they're called) are relatively slow on ARM cores because you have to load values into a CPU register to modify them, then store them back to where they came from.

     

    For instance, you could replace your loop with:

      while (1){
        P6OUT = BIT4;  // whole-port write: P6.4 high, every other P6 pin low
        P6OUT = 0;     // whole-port write: all P6 pins low
      }
    
    or something like this:

      unsigned char P6Bit4Off = P6OUT & ~BIT4;  // read the port once, up front
      unsigned char P6Bit4On = P6Bit4Off | BIT4;

      while (1){
        P6OUT = P6Bit4On;   // single store: P6.4 high, other pins as captured
        P6OUT = P6Bit4Off;  // single store: P6.4 low
      }
    
    In either case the CPU doesn't read the current state of P6OUT during the loop, it just writes to it. The other pins end up as zero in the first case, or retain the value they had before the loop started in the second.
  4. I'm not familiar with the internals of Energia, but from your plain C version I can see a few possible problems. First of all you're not setting the Power Control Manager to VCORE1, which is required for MCLK frequencies above 24MHz.

     

    Second, you're probably getting hit with excess flash wait states. You only need 2 wait states for ordinary flash reads at 48MHz, but the default is 3. See this thread for more information: http://forum.43oh.com/topic/8435-msp432-sram-retention-and-flash-waitstates-check-your-settings/
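
    As a hedged sketch of both fixes using MSP432 DriverLib (the order matters: raise the core voltage and set the wait states before raising MCLK):

      PCM_setCoreVoltageLevel(PCM_VCORE1);    // required for MCLK above 24MHz
      FlashCtl_setWaitState(FLASH_BANK0, 2);  // 2 wait states suffice at 48MHz
      FlashCtl_setWaitState(FLASH_BANK1, 2);
      CS_setDCOCenteredFrequency(CS_DCO_FREQUENCY_48);
      CS_initClockSignal(CS_MCLK, CS_DCOCLK_SELECT, CS_CLOCK_DIVIDER_1);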

  5. Ok, I took a look at the preferences.txt file for my Processing PDE install, and noticed the line "editor.laf.linux=com.sun.java.swing.plaf.gtk.GTKLookAndFeel".

     

    I'm running Windows, and normally my menus look like this:

     

    post-30355-0-86884400-1439413720_thumb.png

     

    I shut down PDE, and changed that preferences line to "editor.laf=com.sun.java.swing.plaf.gtk.GTKLookAndFeel", which gave this result:

     

    post-30355-0-30048000-1439413798_thumb.png

     

    What's annoying is that the default Windows look and feel class isn't specified, so I don't know what it is. Anyway, I think it should be possible to change menu fonts by finding an appropriate look and feel or editing the current one.

     

    EDIT: Looks like it's com.sun.java.swing.plaf.windows.WindowsLookAndFeel. See https://docs.oracle.com/javase/tutorial/uiswing/lookandfeel/plaf.html for more info on Swing Look & Feel.

  6. Another possible interpretation would be that once you buy it, all upgrades are free forever. That would be awesome too. The $500 price is cheap compared to the competition.

     

    I think it might be this. Until now, when you bought a licence you also got one year of subscription, which entitled you to free major upgrades (5->6, 6->7 etc). Minor version updates are always free. Once the subscription ran out you needed to renew to get any major upgrades. It sounds like TI are dropping the subscription aspect, but the initial licence still needs to be paid for.

     

    EDIT: Confirmed at http://www.ti.com/tool/CCSSUB:

     

    Previously, annual subscription was used to determine if a user would receive major upgrades to Code Composer Studio (CCS). When you purchased CCS, it came with 12 months of subscription and it could then be renewed on a yearly basis afterwards. Minor updates were provided regardless of subscription status, but upgrades from CCS v4 to v5, or to v6 required active subscriptions. This is no longer required.

  7. Which font is used in the Energia IDE menu in Windows?

    Asking because I normally disable OS font smoothing on large, medium-DPI monitors, and the menu font looks particularly ugly without smoothing. Is there a way to change the IDE menu font via preferences.txt? If not, I can do font substitution in the registry, but I need to know which font the IDE uses for menus.

     

    As I understand it, the Energia IDE is based on the Arduino IDE, which in turn is based on the Processing IDE (aka PDE). I think the PDE is written in Java using Swing/GTK. It might be that someone has had the same issue with one of the other IDEs and has a fix that will work for Energia too.

  8. This with GPS would be cool.. or you could use the IMU acceleration and integrate it twice.

    Nice work!

     

    Thanks!

     

    Position tracking would indeed be cool, but I think I'm going to stick with distance/speed/altitude logging for this project. That's mainly to keep the power and storage requirements down.

     

    I've got a target of 128 hours for the logging duration which will cover eight days (assuming I'm stationary for at least eight hours per day).

     

    That works out as about 9 bits per second for the 4Mbit flash on the SensorTag (4,194,304 bits over 460,800 seconds of logging is roughly 9.1 bits/s). Currently I'm using about 36 bits per second just for altitude, but I have a plan for getting that down to about 11 and a few viable options to go further from there. I'll post more on that later...

     

    I have no idea what the battery life will be at present, or even what's feasible. I'd really like to stick with using a coin cell because that would let me use the case that's included with the SensorTag.

  9. ...One thing that maybe I misunderstood while listening to a YouTube video was that the newer MSP432 has an on-die DSP?!

     

    EDIT:

    Seems this is FUD. I checked out the webpage quickly and nothing there mentions a DSP.

     

    MSP432 uses a Cortex-M4F core, which includes an instruction set extension designed for signal processing tasks. It has faster MAC and SIMD instructions with support for saturation arithmetic. I think there might be some marketing at work (from ARM) in calling it DSP, but it's nice to have nonetheless.

     

    The DSP instruction set reference is here: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.100166_0001_00_en/ric1417449098079.html
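
    As a minimal sketch of what those instructions buy you (using the standard CMSIS intrinsics, which map straight onto the M4 DSP instructions; the function name is made up for illustration):

      #include <stdint.h>
      #include "msp432p401r.h"  // pulls in the CMSIS core intrinsics for the M4F

      // Two 16x16 multiply-accumulates in a single SMLAD instruction,
      // followed by saturation of the running total.
      int32_t mac_pair(uint32_t samples, uint32_t coeffs, int32_t acc)
      {
          acc = (int32_t)__SMLAD(samples, coeffs, acc);  // (lo*lo) + (hi*hi) + acc
          return __SSAT(acc, 24);                        // clamp to signed 24-bit range
      }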

     

    I suspect, but don't know for sure, that this may have to do with the new MSP432 LaunchPads? I was kind of curious what the real differences were between the new LaunchPads (MSP432) and the older LM4F120 boards.

     

    I don't think the MSP432 is in the same niche as Tiva. MSP432 is lower power and has peripherals derived from the MSP430 family. The peripherals are low-power, but less full-featured than those on the Tiva. The CPU clock speed is lower, and the peripheral clock speeds are even lower than that.

  10. I've been playing with the new CC2650 SensorTag over the past few weeks, and am just getting started on my first proper project using it. It uses the onboard BMP280 sensor to measure atmospheric pressure, works out the altitude and continuously logs the data to the external SPI flash.

     

    I'm trying out TI-RTOS for the first time which is interesting. It's pretty easy to throw together a proof of concept, but I'm not entirely sure I'm doing things right :)

     

    Anyway, I got it working well enough to try some real-world testing over the weekend, so I fitted the coin cell and went for a bike ride. The results were pretty good!

     

    Here's a profile of the route as taken from a mapping site (height in m):

     

    post-30355-0-34348600-1439246093_thumb.png

     

    The red bands are where the altitude drops to zero because the mapping site has no elevation data for a bridge I crossed.

     

    Here's what I got after pulling the data from my SensorTag and feeding it into a spreadsheet (again, height in m):

     

    post-30355-0-61716000-1439250895_thumb.png

     

    The results are a lot better than I was expecting. Measuring altitude from pressure can be inaccurate due to changes in temperature, pressure at sea level and humidity, amongst other things. The dashed line shows the altitude at the start of the trace, which should match up with the end since the route is circular. It's not perfect, but it's only 1.25m out over a period of 150 minutes.
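
    (For reference, the conversion from pressure to altitude is normally the ISA barometric formula, which is exactly where the sea-level-pressure assumption comes in. A sketch:)

      #include <math.h>

      // ISA barometric formula: altitude in metres from pressure in pascals,
      // assuming the standard 101325 Pa at sea level.
      float pressure_to_altitude(float pressure_pa)
      {
          return 44330.0f * (1.0f - powf(pressure_pa / 101325.0f, 0.190295f));
      }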

     

    All the hills are recognisable, but you can see that my trace appears slanted to the right. That's because I'm taking measurements at intervals of time rather than distance. The climbs look less steep than they should because I'm travelling slower so get more data points per unit distance. Likewise the descents look a bit closer to vertical than they ought to. The top of the first 125m hill is extended because I stopped there for a while and the logger kept going.

     

    At some point I'd like to hook it up to a cycle speedometer reed switch so it can record speed and auto-start/stop when moving/stationary. I also want to compress the trace data better: this trip generated about 41KB of data at a 4Hz sample rate, so the flash would be full after ~30 hours. Finally I want to try using EnergyTrace on the SensorTag, because I'm pretty sure that I don't have it set up properly for low power consumption.

     

  11. The 2MHz output signal freezes for 12us in either "1" or "0" state every 1ms.

    WDTHOLD bit does not help (in fact, any WDTCTL actions seem to be ignored).

     

    Are you sure the pauses are being caused by the watchdog?

     

    Perhaps try toggling another LED a few times before the while(1) in your loop function. That will confirm whether the code is resetting and running from the start or not.
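
    Something like this (an Energia-style sketch; RED_LED is just an example second pin, assuming it's free on your board):

      void setup()
      {
          pinMode(RED_LED, OUTPUT);
          // ... your existing pin setup here ...
      }

      void loop()
      {
          // Marker pulses before the tight loop: if these reappear on the
          // scope after every freeze, the MCU is resetting and restarting.
          for (int i = 0; i < 3; i++) {
              digitalWrite(RED_LED, HIGH);
              digitalWrite(RED_LED, LOW);
          }
          while (1) {
              // ... the existing 2MHz toggling here ...
          }
      }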

  12. C has this cool thing where you can use a variable as the size for your arrays. Since variables can only be declared at the start of a block, you mostly use function parameters for this.

    int foo (unsigned int bar)
    {
      int baz[bar]; // a variable-length array: legal in C, not in C++
                    // (note: VLAs can't have initializers, so no "= {0}" here)
      ...
    }
    

     

    FYI, that was introduced in C99. It's not part of the K&R, ANSI C or C90 standards.

  13. I'm getting all these warnings complaining about integer overflows (comparing 8-bit definitions and assigning values to uint8_t)...

     

    Do you mean something like this?

     

    #define SOME_CONSTANT 123
    #define OTHER_CONSTANT 34
    
    uint8_t fun(uint8_t arg)
    {
       if(arg == SOME_CONSTANT)
       {
          arg = OTHER_CONSTANT;
       }
       return arg;
    }
    
    That can result in warnings because the integer constants are of type "int" or larger by default.

     

    The type of an integer constant is the first of the corresponding list in which its value can be represented:

     

    Unsuffixed decimal: int, long int, unsigned long int

    Unsuffixed octal or hexadecimal: int, unsigned int, long int, unsigned long int

    Suffixed by the letter u or U: unsigned int, unsigned long int

    Suffixed by the letter l or L: long int, unsigned long int

    Suffixed by both the letters u or U and l or L: unsigned long int

    You can fix that by casting the constant to the appropriate type as part of the definition:

    #define SOME_CONSTANT ((uint8_t)123)
    #define OTHER_CONSTANT ((uint8_t)34)
    
    ...or by using type suffixes if appropriate (long/unsigned types only).
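
    For example (a made-up constant, using the UL suffix to get an unsigned long):

    #define TIMEOUT_CYCLES 100000UL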
  14. Strange though: performance-wise, this bit-banding does not seem to help anything.

    This code, built around the example msp432p401_cs_03, "configuration for 48MHz":

     

    - first pulse exactly the same time as the second one

    - much slower than I expected: 330ns per pulse, that is 16 clock cycles

    - yes, the device runs at 48MHz, checked at P4.3 and measured around 7.5mA supply current

     

    I got caught out by this too. Looking at the TRM I got the impression that bit-banding is good for performance, but in general that's not the case.

     

    I found that a bit-band write took at least as many cycles as the equivalent (interruptible) read-modify-write instruction sequence. I think it basically just implements the RMW sequence inside the bus controller, so the CPU kicks it off with a single instruction and then waits until it finishes.
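
    To illustrate (assuming the BITBAND_PERI helper macro from the MSP432 device header), these two lines have the same effect and cost roughly the same number of cycles:

      BITBAND_PERI(P1OUT, 4) = 1;  // one store; the bus fabric performs the RMW
      P1OUT |= BIT4;               // compiler emits load, OR, store (interruptible)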

     

    Bit-banding has two clear advantages over the RMW sequence: it's non-interruptible (so it can't race with an interrupt handler touching the same register) and it occupies less space in flash memory.

     

    If you don't need those advantages, it may be possible to optimise for speed by avoiding bit-banding. For example, if you're toggling a GPIO pin then maybe you don't care what value the other pins in the same PxOUT register are set to; in that case you can just write the whole register in a single instruction. Perhaps you do care what the other pins are set to, but know that they hold a fixed (but unknown) value during the toggling; then you can do the read once and just modify/write on each toggle.

  15. TI recently updated the MSP432 Technical Reference Manual to "revision A". Looking at the revision history shows some particularly interesting changes...

     

    First of all, the documented default value of SYS_SRAM_BANKRET has changed from 0x000000FF to 0x00000001. That means that, by default, only bank 0 retains its contents in LPM3 and LPM4. I've seen some mention of this here already, but I'm repeating it because it's quite important to know. The program stack lives at the top of SRAM by default, from bank 7 down. If you don't enable retention on the banks containing the stack then bad things will happen when the MCU wakes from LPM3 or LPM4.
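
    To be safe, retention can be enabled on the remaining banks at startup. A one-line sketch (register name per the TRM; DriverLib's SysCtl_enableSRAMBankRetention() wraps the same register):

      SYSCTL->SRAM_BANKRET |= 0x000000FE;  // retain banks 1-7 as well as bank 0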

     

    The next one is less dangerous to get wrong, but kills performance if you aren't careful. The original release of the TRM said that the flash controller defaults to zero wait state mode on startup (FLCTL_BANKx_RDCTL.WAIT=0). That would make sense because the MCLK speed on startup doesn't require any wait states, and it also matches the MSP430 behaviour. Normally you don't need to even think about wait states unless you set the clock speed above a certain level specified in the datasheet.

     

    Unfortunately the updated TRM now says that the default setting is three wait states (FLCTL_BANKx_RDCTL.WAIT=3)! That's one more than the datasheet says you need for ordinary reads at the maximum MCLK frequency of 48MHz. In other words, to get best performance you need to change that setting irrespective of your chosen MCLK frequency.

     

    For code execution the effect is hidden somewhat by the 128-bit read buffer in the flash controller, meaning that the extra wait states are only applied when crossing a 128-bit boundary. By default this buffer is only used for instruction fetches, so code reading large blocks of constant data from flash (with the CPU or DMA) will suffer badly by comparison.
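
    A register-level sketch of both fixes (bit-field names as in the MSP432 device header; WAIT=2 per the datasheet for 48MHz, and BUFD extends the read buffer to data accesses too):

      FLCTL->BANK0_RDCTL = (FLCTL->BANK0_RDCTL & ~FLCTL_BANK0_RDCTL_WAIT_MASK)
                         | FLCTL_BANK0_RDCTL_WAIT_2 | FLCTL_BANK0_RDCTL_BUFD;
      FLCTL->BANK1_RDCTL = (FLCTL->BANK1_RDCTL & ~FLCTL_BANK1_RDCTL_WAIT_MASK)
                         | FLCTL_BANK1_RDCTL_WAIT_2 | FLCTL_BANK1_RDCTL_BUFD;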

     

    Luckily neither issue causes any problems as long as you're aware and set the correct values in your program. I guess it's all part of the fun of using preview silicon ;)

  16. Could you post your code as a learning tool for everyone here who is DMA deficient ?

     

    I'm happy to post the code to show how I got the results, but be warned: it ain't pretty!

     

    I used direct register access instead of driverlib so I could keep things consistent with the MSP430 version. Also the test setup doesn't use interrupts directly related to the DMA operation. Instead it wakes up when the timer module measuring the DMA transfer captures the falling edge of its input.

     

    For those reasons it's a pretty bad example of how to write DMA code on CC3200. This post is probably a better place to look for that.

     

    Anyway, enough excuses, here it is:

     

    DMATest.c

     

    (Does anyone get the feeling TI deliberately made direct register access unpleasant on CC3200 to "encourage" everyone to use DriverLib?)

  17. I got a CC3200 launchpad and did some testing in comparison to the MSP430F5529 launchpad. The test programs configure three DMA channels to run in sequence. The first channel writes to a GPIO port setting an output high, the next does the measured transfer and the third writes the GPIO again to take the pin low. The CPU is asleep during the transfers, preventing any delay due to contention.

     

    The output pin can be traced on a scope to see how long the transfer took, and is also fed back into two timer capture inputs. This allows the timing to be measured automatically and printed to the debug console after each test. The overhead of the extra transfers used to toggle the output pin is measured and subtracted from the results.
     
    The programs test a series of DMA lengths from 0 to 32 bytes, counting the MCLK cycles taken for each. The CC3200 DMA is set up in auto-request mode, arbitrating every 16 bytes (hence the slight bump in the middle of the graph). The MSP430 DMA copies a single block, but I tweaked the results to emulate an arbitration after 16 bytes. The CC3200 is running at 80MHz (no choice there), but the F5529 is at 3MHz so there should be no wait states required.
     
    Here's what I get:
     
    post-30355-0-55102600-1427563513_thumb.png

     

    The MSP430F5529 results are as expected: two cycles to start/stop plus two cycles per transfer in the block.

     

    The CC3200 DMA timings exactly match those implied by the ARM documentation. It takes eight cycles for the initial transfer after a channel is triggered. Subsequent transfers take four cycles each, as long as no arbitration occurs. If the channel pauses for arbitration and continues uninterrupted, then the next transfer takes seven cycles.
     
    On the SRAM->GPIO test the DMA is configured for 16-bit transfers, which costs an extra four cycles for every 32 bits. The reduced transfer rate is entirely due to the fact that it's performing twice as many transfers; there are no additional wait states required by the GPIO AHB slave. SRAM to SRAM DMA with a 16 bit transfer size takes the same number of cycles.
     

    I expect the same will be true for TM4C: as I understand it, the CC3200 "apps" CPU is heavily based on Tiva.

     

    In terms of cycles taken the CC3200 DMA is marginally slower than MSP430 DMA, despite the fact it can copy twice as many bits on each transfer. As yet I don't have any results on the effect of running the CPU and DMA in parallel. In theory that should be a win for the CC3200, but it appears to spend a lot of time accessing the channel control struct in SRAM. That may leave it vulnerable to blocking by CPU instruction and data fetches.

     

    Also I need to get a comparison of these timings against an optimised memcpy on Cortex-M4. I think that might be faster than using the DMA alone. Power measurements for both cases might be interesting too...

  18. Normally I'd say it's an ARM device so I'd use CMSIS structures and the standard PERIPH_REG_FIELD_Tag macros.  But on closer look these structs aren't really standard CMSIS-style definitions.

     

    Out of interest, which manufacturer(s) actually get this right?

  19. Does anyone have any good sources of performance information for the PL230 uDMA used in Tiva and CC3200 MCUs (or any MCU using the same DMA controller)?

     

    I'm used to using MSP430 DMA which is quite clearly specified. Each transfer spends two MCLK cycles accessing the bus (which halts the CPU), plus an extra two cycles each time it's triggered. The synchronisation time required to wake up from various low power modes is also given.

     

    The Tiva and CC3200 TRMs don't give any details of this type. I'd like to know how many cycles each transfer takes in the best case, with no contention, arbitration or wait states involved. I'd also be interested to know how achievable that is in the real world. Additionally, I'd like to know whether all of those cycles involve bus access. As I understand it the Tiva DMA can only access the SRAM when the CPU isn't using it, so I'm wondering how easy it is to run the DMA and CPU in parallel without the DMA being starved of SRAM access.

     

    I've looked at ARM's documentation for the PL230 and it's not much better. The only timing information appears to be in the signal timing diagrams:

     

    post-30355-0-63867000-1426632629_thumb.png

    This diagram suggests it takes 7 cycles per transfer, but the description in the document implies it's set up to arbitrate after each transfer. Perhaps it goes faster if arbitration is less frequent?

     

    post-30355-0-78024900-1426632866_thumb.png

    This one shows four cycles per transfer using waitonreq and arbitration every two transfers. That's two differences from the configuration in the previous diagram, so I don't know which change is responsible for the different timing. Also, both diagrams have unexplained idle periods at the start, and the waitonreq diagram has one at the end too.

     

    Any first-hand info or recommended reading on this would be much appreciated. Thanks!

     

    EDIT: I found an interesting appnote from Silicon Labs for their Gecko line (AN0013). It gives the timing for a single transfer from ADC to SRAM and explains how the arbitration rate affects the flow through the DMA state machine. Still, it leaves a lot of questions unanswered. How long do transfers to/from other peripherals and SRAM->SRAM transfers take? And is the timing given purely down to the DMA controller itself, or to the specific slave devices it's communicating with (i.e. does this information transfer across to other MCUs using the same controller)?
