



Posts posted by GrumpyOldPizza

  1. Shipping within the United States
       FedEx Ground=$1.00  0000 0001
       FedEx Saver (3-day delivery)=$2.00  0000 0010
       FedEx Express Economy (2-day delivery)=$4.00  0000 0100
       FedEx Overnight PM Delivery=$8.00  0000 1000
       FedEx Overnight AM Delivery=$16.00  0001 0000

     Shipping outside the United States
       International Economy=$4.00  0000 0100
       International Priority=$16.00  0001 0000


    Just saying ...

  2. This is very interesting. We were discussing the rationale for the MSP432, and this interesting video came out from Mike Szczys, the senior editor of Hackaday. He discusses the feud between the 8-bit and 32-bit camps and after watching the video, I can get an appreciation for why TI has created the MSP432.




    The feeling I get is that the MSP432 fills the gap or bridges between an 8-bit device, with their simple and easy to learn peripherals, and a 32-bit design with their complex peripherals that are intimidating and confusing to a person just starting out.


    You get the speed and benefits of a 32-bit CPU core, but easy to program peripherals that also likely keep the die size lower and the chip costs down.


    Think of it more like taking an MSP430 and making the clock speed faster, adding an FPU, and giving it a 32-bit bus. The rest remains simple and easy to learn, or migrate to.


    Thanks for sharing. Interesting presentation.


    Though I don't buy your analysis. The peripherals on MSP432 are not any simpler than, say, those on STM32L4 or NXP LPC1549.


    Actually, what's funny is that the talk brings up one of the key weaknesses of MSP432 (and of the idea of simply taking crufty MSP430 peripherals). ANY other Cortex-M3/M4 controller I know has a way to set a group of GPIO pins on a port atomically without affecting other pins. TM4C uses some address bits to generate an update mask. STM32 has a separate SET/RESET mask; LPC chips have a set of masks that can be used (3 on LPC1549). MSP432 has only a "PxOUT". So no atomic updates unless you get creative with bit-band addressing or use LDREX/STREX/CLREX type atomic primitives. If you are at that level, complexity would not be something that concerns you at all.
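
    A minimal sketch of the bit-band address arithmetic mentioned above, written in Python purely as a worked calculation. The region addresses are the standard Cortex-M3/M4 peripheral bit-band mapping; the register address EXAMPLE_PXOUT is a hypothetical placeholder for illustration, not a value taken from a datasheet.

```python
# Cortex-M3/M4 peripheral bit-band mapping (architecturally defined).
PERIPH_BASE = 0x40000000
PERIPH_BITBAND_BASE = 0x42000000
PERIPH_BITBAND_SIZE = 0x00100000  # only the first 1 MB is aliased


def bitband_alias(reg_addr: int, bit: int) -> int:
    """Return the alias word address that maps to one bit of a peripheral byte."""
    offset = reg_addr - PERIPH_BASE
    if not 0 <= offset < PERIPH_BITBAND_SIZE:
        raise ValueError("address is outside the peripheral bit-band region")
    if not 0 <= bit <= 7:
        raise ValueError("bit must be 0..7 for a byte address")
    # Each bit of the region gets its own 32-bit word in the alias region.
    return PERIPH_BITBAND_BASE + offset * 32 + bit * 4


# Hypothetical address of an 8-bit port output register (assumption).
EXAMPLE_PXOUT = 0x40004C02

if __name__ == "__main__":
    print(hex(bitband_alias(EXAMPLE_PXOUT, 3)))
```

    Writing 0 or 1 to the alias word changes that single bit without a read-modify-write in software. It still touches only one pin per write, so it does not replace a real SET/RESET mask for updating a group of pins at once.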


    I do get the argument that if you come from MSP430 it's a step up. But it's a massive step down from any other ARM Cortex-M3/M4 controller other than the very first ones from Luminary Micro in 2005, like LM3S811.


    EDIT: It seems it's now possible to actually order the XMS432P401RIPZR variant for $6.781 (256k flash, 64k ram, no USB). The XMS means it's still not a production part. Don't know how that compares to the similar ST part (STM32L433RC), which is just hitting the distributors as well.

  3. The biggest flaw with standardizing on an Arduino Driver lib is precisely because it encourages people to NOT learn about the underlying hardware. As soon as they try to connect up a peripheral with a subtle difference, they get completely frustrated and complain about the lack of support for their particular sensor or application. Not everyone is like this, but it is a generalization of a broad swath of Arduino users.



    The goal of the Arduino API is precisely to be usable for the dumbest possible user so that they can get simple things going quickly and then tinker around with the more complex stuff. While it is trivially possible to do more complex stuff (like a flight controller), it is not the goal of the Arduino/Wiring mindset.


    Not learning about the underlying hardware is often what I want. There is no good reason to know about all the details of an I2C peripheral on the 23rd different ARM controller. All I really want is to transmit/receive bytes. 

  4. Maybe the driverlib should look like the Arduino API, then it would be the most efficient implementation available.  Direct support from ROM, so zero wait state and ready to go without any layers on top of what most people are doing with msp430ish chips.


    What you are suggesting is higher level than driverLib, which IMHO is a macro wrapper layer around the device registers.


    The problem with such an approach is that coming up with a good abstraction is very hard. There is no really good one-size-fits-all. A good example is ST's HAL (CubeL4 for example). It assumes some higher level locking that an RTOS could supply. The problem is that if you want to use part of the code in both the ISR and the Task/Threads, this will fall apart, and ... well ... you need something different. The Arduino abstraction is nice, but falls short on asynchronous operations.


    The best I have seen up to now (but admittedly not used) is Nordic Semi's SDK for nRF5, which splits up things into a HAL and into a driver layer.

  5. Of all the components in an MCU, only the CPU itself is hidden by the compiler; changing anything else would not be portable.


    By combining an ARM CPU with the MSP430 peripherals, they allow existing MSP430 program(mer)s to use a faster CPU, without having to do much porting.

    The MSP432 is useful for somebody coming from MSP430, not from any other ARM.


    Well, there I have to play devil's advocate. Isn't part of the value proposition the driverLib? With the explicit goal of abstracting the peripherals enough so that it should not matter?


    Also the CPU cannot be hidden that well by the compiler (but to a large extent). Things like NVIC are really, really different. So at some level there is a massive delta that one could take advantage of.

  6. Great discussion! I never used DMA much, so I wonder what the problem is with using half the DMA channels in this scenario. How often do you actually need more than 8 channels?


    Good question. For hobby projects it's probably a non-issue. But if you want to be aggressive with power savings it is an issue. Typically I'd always attach a DMA channel to any I2C RX, and any UART RX/TX, as well as SPI RX/TX (for sensor reading up to 4MHz). For my hobby use (Autonomous Rovers), that would be 5 to 7. So, still within 8 channels, but not a lot of spares for software triggered DMA. Perhaps it's just that 16 feels like a more appropriate number (32 is definitely too much).


    Couple of use cases to illustrate:


    (1) UART, GPS input. I'd use a ping-pong buffer scheme on RX with a receive timeout. Each of the buffers 16 bytes. This way I get an interrupt only every 16 bytes, or when a sequence of reports is done. At 115200 baud this brings the interrupt rate from about 10000 down to less than 800.


    (2) I2C sensor reading. DMA on RX (with TI also on TX). That means I take 2 interrupts for a typical "send an index, and then read n bytes of data from a sensor". If this is 16 bytes of sensor data, then without DMA you take 19 interrupts.


    (3) TFT on SPI. Here a double buffer scheme is nice. In one buffer you generate data for the new scanline you are working on, while using DMA on the other buffer to send over data/commands for the previous scanline that had already been generated. One can nicely overlap CPU and SPI. Of course that is not beneficial for all operations.


    (4) SD on SPI. If you send more than one 512 byte block and want to use CRC16, then you can let the CPU compute this CRC16 on the next block you are about to send, while DMA takes care of sending the current block without CPU interaction ...



    So a lot of uses, at least for me, mainly centered around communication.
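
    The interrupt-rate numbers in use case (1) can be reproduced with a small calculation. This is only a sketch: it assumes 8N1 framing (10 bits on the wire per byte) and a continuous stream at full line rate, which is the worst case; the function names are made up for illustration.

```python
def uart_bytes_per_second(baud: int, bits_per_frame: int = 10) -> float:
    """Bytes per second on a saturated UART link (8N1 assumed: 10 bits per byte)."""
    return baud / bits_per_frame


def interrupts_per_second(baud: int, bytes_per_interrupt: int,
                          bits_per_frame: int = 10) -> float:
    """Interrupt rate when one interrupt is taken per `bytes_per_interrupt` bytes."""
    return uart_bytes_per_second(baud, bits_per_frame) / bytes_per_interrupt


# One interrupt per received byte versus one per 16-byte DMA buffer.
per_byte = interrupts_per_second(115200, 1)
per_buffer = interrupts_per_second(115200, 16)

if __name__ == "__main__":
    print(per_byte, per_buffer)
```

    With these assumptions the per-byte rate is on the order of 10000 per second and the 16-byte ping-pong scheme stays below 800, matching the figures above. Real GPS traffic is bursty, so the actual averages are lower.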

  7. I think that actually, it depends on what the wording "more efficient" is actually meant to mean. I've seen all kinds of micros claim that they're so efficient, it *seems* they use less power than another MCU, but it's really just a smokescreen.


    If you had an msp430g2 do one single thing, and compared it to any Cortex part, on paper it can be made to *seem* that the Cortex part will use less power. But I bet it won't. Which of course is much different from comparing an M0/M0+ to an M4. Technically though the M0/M0+ should use less power.


    I really do not know the architectures well enough to know one way or another. However, I will say that doing something faster and then sleeping is not an end-all, be-all recipe for lower power usage. It's just one small factor in the mix.


    You are mixing up my statements. I said that an M4 is more efficient than an M0. Somebody asked about M4 vs. M0, and I tried to give an answer to that. It has nothing to do with a 430 CPU. That one is different, uses a different internal bus (not AHB/APB and crossbar), does not have a nested vectored interrupt controller and so on. It's a different class. It has substantially less CPU horsepower, but if you can get away with it, it might consume less power.


    And yes, with sleeping you are right. It's one piece of the puzzle, but it's in a lot of cases the 75% piece. DMA is another one, clock gating ...


    - Thomas

  8. Sounds like there are some misconceptions as to what the MSP432 *is*. I will say however that being more power efficient does not mean that efficiency translates into using less power. An MCU that is more efficient, but does more will very likely use more power - Always.


    Actually if you design your software stack the right way, it does mean exactly that (some caveats apply though). If you need to do some CPU work in response to an external stimulus, then an M4 will do that work faster at the same clock than an M0/M0+. Consequently you can put the CPU to sleep earlier and for a longer amount of time. This longer sleep time is what saves you most of the power. The second main trick to save power is to use DMA for IO transfers, so that you can avoid waking up the CPU from a sleep mode as much as possible (there is a nice application note available from ST, which analyses which clock frequencies are the sweet spot for a given voltage range, assuming that you have to spend a fixed amount of CPU clock cycles; their result was that the upper limit for the voltage range was always the sweet spot ...).
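
    The "finish sooner, sleep longer" argument can be put into a rough duty-cycle model. This is a sketch only: every number below (cycle counts, active and sleep currents, wake-up period) is a hypothetical placeholder chosen for illustration, not a measurement of any real part, and wake-up overhead is ignored.

```python
def average_current_ua(cycles: int, clock_hz: float, active_ua: float,
                       sleep_ua: float, period_s: float) -> float:
    """Average current when `cycles` of work are done once per `period_s`."""
    active_s = cycles / clock_hz
    if active_s > period_s:
        raise ValueError("work does not fit into the period")
    sleep_s = period_s - active_s
    return (active_ua * active_s + sleep_ua * sleep_s) / period_s


# Hypothetical comparison: same clock, same period, same sleep current.
# The "fast" core needs half the cycles but draws more current while active.
slow_core = average_current_ua(cycles=20_000, clock_hz=10e6,
                               active_ua=4000.0, sleep_ua=1.0, period_s=0.1)
fast_core = average_current_ua(cycles=10_000, clock_hz=10e6,
                               active_ua=5000.0, sleep_ua=1.0, period_s=0.1)

if __name__ == "__main__":
    print(f"slow core: {slow_core:.2f} uA, fast core: {fast_core:.2f} uA")
```

    Under these made-up numbers the faster core wins despite its higher active current, because the active time dominates the average. If the active current grows faster than the cycle count shrinks, the result flips, which is one of the caveats.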


    So yes, the M4F core in the MSP432 makes it more efficient than the MSP430 core, which means in theory it should consume less power (yes, AHB and the efficiency of the sleep modes will affect that as well). But MSP432 is missing or hampering the second crucial part, which is adding peripherals that can offload the CPU to do things like batch acquisition (or sleep-walking as Atmel calls it).


    It's also quite telling that the current leader in ULPbench, Ambiq Micro also chose a M4 over a M0 (http://www.extremetech.com/computing/198285-new-microprocessor-claims-10x-energy-improvement-thanks-to-subthreshold-voltage-operation this article contains some of their rationale).


    Back to MSP432. With its peripherals, one could argue that this is not the same target market as, say, an STM32L4, as you need fewer peripherals, less horsepower, and such. But that then raises the question: why upgrade the CPU core from MSP430 at all?


    - Thomas

  9. I think that was the idea.


    I'm a fan of ARM-M parts and I like TI parts but with only ARM-M4 on the portfolio I began to use more STM32. I would like to see a bigger range on ARM-M parts, including lower priced and lower power (ARM-M0 and such).



    Having only the Cortex-M4 is not a big grief for me; probably the other way around. I like the FPU over the M3. The M0 is a nightmare, as a lot of the good debugging features that M3/M4 have get deleted on M0. Even fundamental things like the DWT_CYCCNT ... As far as I understand, the M4 is also more power-efficient than the M0, because it can get work done faster, after which you can put it to sleep longer. The price difference, I really don't know. It's my hobby, and a few bucks either way will not hurt me.


    The MSP430 peripherals, I simply cannot understand. The older Stellaris parts had been designed with HW FIFOs in place, so you could use UART/SPI without DMA. Of course TM4C now has a 32 channel DMA controller, where you can background a lot of the IO handling. And along comes MSP432, which does away with most of the usable HW FIFOs, and, just so that the software cannot compensate, reduces the number of DMA channels to 8. If you have one UART (RX DMA), one SPI (RX/TX DMA) and one I2C (RX DMA), half of your DMA channels are gone ...


    Just looking at the feature set (not the crummy HW implementation), STM32L0 & STM32L4 seem to be the better choice. Not that I want to promote somebody else's product here, it's just that I don't understand where MSP432 fits there (besides the fact that you still cannot buy the chips).


    So in a nutshell, please TI, focus on TM4C, bring some new parts and launchpads, perhaps a Cortex-M0+ if that saves costs. 


    - Thomas

  10. Here are my 2 cents (after having used a bunch of the parts going back to the Luminary Micro days).


    - get rid of the MSP432. It's a mess compared to other Cortex-M4 parts (yes, even low power ones). If you want a good low power part, please use the same peripherals as on TM4C, so that code can be reused.


    - a TM4C129 Launchpad with the dimensions of the TM4C123 launchpad


    - add CMSIS-DAP to the ARM Cortex based launchpads. Nobody really wants to see yet another vendor specific protocol like the LMICDI (not that gdb remote serial was not a nice idea, it's just it did not pan out).


    Not sure what else to say. I fundamentally think TI took the wrong turn with MSP432, which it seems to have left to die a lonely death. Perhaps folks coming from MSP430 see that differently, but it's a massive turnoff for everybody coming from more grown up microcontrollers.



    - Thomas

  11. Just my 2 cents here.


    MSP432 does not seem to go anywhere, because there is STM32L476. Both devices are fairly similar in power consumption and power saving modes. But MSP432 inherited its peripherals from MSP430, which just feel very outdated compared to STM32L476 (actually also a massive step back from TM4C). And there it just starts. The TI product goes only up to 48MHz with 256kB flash, the ST product can go up to 80MHz with 1024kB flash. The ST product has dedicated SAI support as well as a PDM decoder for MEMS microphones ... Ah, and the ST product has a CODE/DATA cache (mislabeled IMHO as ART Accelerator ;-)). The only thing that I can see the TI product having going for it is the 14bit SAR ADC.


    I simply suspect that either TI went back to the drawing board, or gave up.


    Ah, and there is this issue. A TM4C123GH6PM costs $6.23 in units of 1000. A similar STM32F401 is to be had for $3.54. That implies that the TI product is probably price wise nowhere near the competition.

  12. Thanks for posting this. Lots of inspiration.


    Couple of questions. The protocol used via PWM for the ESCs does not seem to be the normal RC servo pulse protocol. Is there any documentation for the modified variant?


    Did you run across some simpler code for handling an HC-06 on Android (haven't programmed too much in Java, so a simple starter would be nice)?


    - Thomas

    Hi everyone,


    Just wanted to share my flight controller I wrote some time ago :)


    Here is a video of it:



    The code is available here: https://github.com/Lauszus/LaunchPadFlightController.


    You can read more about it at my blog: http://blog.tkjelectronics.dk/2015/01/launchpad-flight-controller/.




    Kristian Sloth Lauszus

  13. The LCD ui looks really cool. Good on the CMSIS too.


    Yes, the LCD UI was tricky for more than one reason, but also the most rewarding. Last year we had a screwup because the display did not convey enough information. So this year I asked the two race engineers what *should* be displayed, why, how ... So we went through a whole list of scenarios of how you could diagnose hardware malfunctions or software issues (like how do I know the RPM sensor is working?). It came down to the level of "I cannot see yellow in bright sunlight".


    Given that the HW setup was as simple as I could get it, there were a lot of ideas my kiddos could contribute (like placement of components, wiring, IO port assignment). While that sounds perhaps simple for us adults (and perhaps us engineers), it's something else for a 10 year old and a 12 year old.


    - Thomas

  14. Haven't posted in a while ...


    So there was AVC 2015. Less successful for us (with 2 rovers this year). I was relegated to being the SW guy, while my kids actually built and ran the rovers.


    Anyway, I thought it might be interesting to post a link to the source code that was used (which of course is utterly outdated, probably ;-)).




    There are a couple of interesting pieces that might be of use outside the autonomous rover domain.


    First off, the concept (besides being as cheap as possible) was to take an R/C car, a TI Launchpad, RobG's TFT/SD boosterpack, hook up a GPS, an MPU-9150, an RPM sensor, a 3 channel R/C receiver, and 2 buttons. The TFT is displaying all status information, which is pretty handy before starting the rover, so one can see whether the GPS is actually working ;-)


    Here some of the pieces that might be of interest:


    - the MPU-9150 is sampled at 1kHz, triggered by the INT output and properly timestamped relative to a wallclock; the builtin AK8975 is sampled at 100Hz; i2c is interrupt driven


    - the GPS code supports NMEA as well as UBLOX binary; support for GPS+GLONASS is there; MTK3333, MTK3339, UBLOX6/7/8 are supported; full initialization at runtime so that this can be used without backup battery or external flash that would store the configuration; ah, and there is proper timestamping via the PPS input


    - of course there is full speed logging to a microSDHC card ... this time DMA driven to free up more processor cycles


    - stack checking via the MPU; handy to detect stack overflows (yes, saved my bacon ;-))


    - there is a profiling system in place that buckets cycles spent on various logical tasks (like display, record, navigation ....); very handy to find out how much processor power is still left


    - lots of interesting code; started to play with atomics and bitband accesses


    - the whole system is bare metal, CMSIS based; so if one looks for a CMSIS setup for TM4C123, that might be a good starting point


    - no RTOS in use; things are either interrupt driven, or timer driven (systick callbacks), or via the PendSV exception as kind of a deferred interrupt; did this, so I could half way explain the system to my son who was running one of the rovers ;-))


    Here a few pictures, and a link to my son's entry








    - Thomas




  15. Thomas,


    Correct. Only one interrupt request can be mapped to each channel. But you could arrange the channels as you said in a group, linearly, so that all of the I2C related interrupts in your example can easily be masked/unmasked in one operation.


    The problem with that approach in general is that you cannot re-enable interrupts early. You either have to write a generic ISR shell that masks (and unmasks on return) the proper channels. This is 3 DWORDs you have to write. You also have to look up, based upon the channel number, the mask you want to apply.


    Or you could generate a proper ISR shell for each channel and use hardcoded values to mask & unmask.


    So you are essentially introducing a long latency before you get to the very first useful instruction of your ISR.


    But it's actually worse. Suppose you want to allow an ISR handler to enable/disable channels? Then a hardcoded unmask will not work. You actually need to keep a softcopy of what is supposed to be enabled and restore that ANDed with the bits that you want to re-enable.
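
    The softcopy bookkeeping described above can be modeled in a few lines. This is a toy Python model of the logic only, not VIM driver code: the class, its method names and the single enable word are hypothetical, and a real ISR shell would write the hardware mask registers instead of a Python attribute.

```python
class ChannelMaskModel:
    """Toy model: one channel-enable word plus a software copy of the intent."""

    def __init__(self) -> None:
        self.soft_enabled = 0  # what the application wants enabled
        self.blocked = 0       # channels masked by ISR shells currently running
        self.hw_enabled = 0    # what the (simulated) hardware register holds

    def _sync(self) -> None:
        self.hw_enabled = self.soft_enabled & ~self.blocked

    def enable(self, mask: int) -> None:
        self.soft_enabled |= mask
        self._sync()

    def disable(self, mask: int) -> None:
        self.soft_enabled &= ~mask
        self._sync()

    def isr_enter(self, group_mask: int) -> int:
        """Mask the whole group on entry; return the state to restore on exit."""
        previous = self.blocked
        self.blocked |= group_mask
        self._sync()
        return previous

    def isr_exit(self, previous: int) -> None:
        """Unmask on exit, but only channels the softcopy still wants enabled."""
        self.blocked = previous
        self._sync()


GROUP = 0b0111  # hypothetical group: channels 0..2 share one "priority"

model = ChannelMaskModel()
model.enable(GROUP)
saved = model.isr_enter(GROUP)  # ISR shell masks the group
model.disable(0b0010)           # the handler itself turns channel 1 off
model.isr_exit(saved)           # shell restores softcopy AND group

if __name__ == "__main__":
    print(bin(model.hw_enabled))
```

    A hardcoded unmask of GROUP on exit would have re-enabled channel 1 behind the handler's back; restoring from the softcopy keeps it off.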



    All I am saying is that it's a big pain, compared to NVIC or GIC. I always felt that exactly this, the better interrupt handling was a major plus for Cortex-R/Cortex-M over Cortex-A.


    - Thomas

  16. Hi,


    Full disclosure - I work on Hercules at TI and worked on the new launchpads - thrilled to see the first post not from a TI press release.


    On VIM - you can actually map the interrupt requests to channels arbitrarily through the CHANMAP fields. This is even available through HalCoGen.


    Doesn't give you automatic priority based masking of lower priority interrupts for nesting purposes, but it does let you group as you like and by doing this it should be easier to implement the nesting code as the bit masks can be straight-forward. VIM is used instead of GIC because we try hard to keep compatibility all the way back to the TMS470 family which was introduced in the late 1990s although it was really not a catalog product.



    Anthony, mind checking the TRM for me there? I might have misunderstood the VIM documentation ... but the CHANCTRL[0:23] registers imply that you can map exactly one interrupt per channel. You cannot have multiple interrupt sources for one single channel. So you cannot really group, you can just establish a linear priority sequence.


    - Thomas 

  17. Thanks for info.


    So VADD/VMUL take 1 cycle, but you have to load/store the operands... So unless these instructions can read operands and write results into SRAM without needing extra cycles, each operation will take more than 1 cycle (maybe 3 or 4, right?).


    In the past I have worked with some TI DSPs (like C55XX ones) that had a MAC instruction that could read two operands from RAM, multiply-accumulate them (and shift result, and do stuff with operand pointers) and write back the result to internal RAM in 1 cycle(*). I suppose this is not the case here




    (*): Not really one cycle, it would take 5 cycles, but because of the pipeline you can count it as a 1 cycle instruction unless you are calculating latencies.


    This is all tricky, and to be honest there are some things I have not understood (especially how VFMA is implemented with 3 cycles + 1 cycle latency as opposed to 1 cycle + 3 cycles latency).


    VFMA is a MAC, but it takes 3 cycles. Why? If VMUL and VADD take 1 cycle each, what do the various MAC variants help?


    Anyway, let's say you do matrix operations (which are typically MAC operations). Let's say you multiply a 4 element vector by a 4x4 matrix. Then you have 20+2 loads, 4+1 stores, 16 multiplications, and 12 additions. This is 55 operations, hence 55 cycles, whereby you crammed in 28 floating point operations. Thus about 0.50 Mflops/MHz. (The +2 & +1 is the overhead of the VLDM/VSTM where no data gets transferred).


    The data is from the Cortex-M4 TRM, section 7.2. It also points out a latency of 1 cycle.


    Back to the example above. Say you want to multiply an array of 4 element vectors by a 4x4 matrix, and the matrix is preloaded, then you'd spend 4+1 loads, 4+1 stores, 16 multiplies and 12 adds per vector. That is 38 cycles to do 28 cycles of math, or 0.74 Mflops/MHz.
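
    The two cycle counts above can be checked with a short calculation. It is a sketch that follows the same simplification as the text: every load, store, multiply and add is counted as exactly one cycle, with no stalls or latency; the function name is made up for illustration.

```python
def flops_per_cycle(loads: int, stores: int, multiplies: int, adds: int):
    """Return (cycles, flops, flops per cycle) with one cycle per operation."""
    cycles = loads + stores + multiplies + adds
    flops = multiplies + adds
    return cycles, flops, flops / cycles


# Single 4-vector times 4x4 matrix: 16 matrix + 4 vector loads, +2 VLDM overhead.
single = flops_per_cycle(loads=20 + 2, stores=4 + 1, multiplies=16, adds=12)

# Array of vectors with the matrix preloaded: only the vector is loaded each time.
batched = flops_per_cycle(loads=4 + 1, stores=4 + 1, multiplies=16, adds=12)

if __name__ == "__main__":
    for name, (cycles, flops, rate) in (("single", single), ("batched", batched)):
        print(f"{name}: {cycles} cycles, {flops} flops, {rate:.2f} Mflops/MHz")
```

    Flops per cycle is the same number as Mflops/MHz, so the two cases come out at roughly 0.5 and 0.74 respectively.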

  18. @@BizzaBoy


    Yes, the two cores are 'lockstep', as in both run the same code, one a clock or two behind the other, and both are passed the same values from ISR and such. This is the nature of 'safety' things, where things are double-checked, and any error throws a fault.


    I'd argue that any error or event that warrants processor attention needs addressed regardless how many clock cycles are involved; that is the nature of safety, no?


    Tedious as it may be, the architecture forces you to consider, and deal with, most every possible fault / failure.


    I'd also consider the differences between ARMv4, which you seem to be familiar with, and ARMv7, which is what the R4F cores are more closely related to.


    I am rather familiar with ARMv7-AR (as opposed to ARMv6-M and ARMv7-M and ARMv4T [what a mess ;-)]). I would argue that the code you'd need to implement priority groups in software is rather identical for all non-M profiles.


    Just looking at other Cortex-R4/R5/R7 implementations, like the Spansion FCR4, they all have interrupt controllers that have maskable priorities.


    Again, I am just negatively surprised.


    Here is what I usually do, where I'd need nice grouping. Say I have a bunch of I2C devices and some of them use time triggered reads, and some signal DRDY via a separate interrupt line to the MCU. If I put all of the interrupt sources for the I2C devices at the same priority level, then I can guarantee exclusive I2C access (say to my I2C transaction queue). If the interrupts are on different levels, I need to make sure through other means that I2C transactions get queued properly, as a higher priority ISR could corrupt the transaction queue ... While it is not outlandishly tricky to get that done, the interrupt priority solution is easier and safer from a software point of view.


    - Thomas

  19. Because these devices are meant to be "fail safe", so a generic interrupt controller wouldn't satisfy that, maybe?


    You could write your own interrupt control routines which could, within themselves, mask and / or prioritize things to your liking.


    The 2 cores are in lockstep. I'd assume that the interrupts would get broadcasted properly.


    Doing this by hand as you suggested adds a huge overhead to the ISR shell. I did that ages ago for an ARMv4T device with a VIC, and there the cost was about 20 clock cycles on entry and on exit. That device had only 32 IRQs, hence fewer registers to touch.

  20. I am tempted to use the LAUNCHXL2-RM46 for some rover tinkering. It has a lot of FLASH/RAM, and double precision floating point. But it has this utterly useless VIM as interrupt controller. Why couldn't they just use a GIC with interrupt priority levels and priority masking ...

  21. Good question.


    VADD/VMUL take 1 clock cycle each (to issue). Then there is VFMA (fused multiply add) which takes 3 clock cycles, which implies a latency of 1 clock for the multiply.


    So I'd say you have 1 MFLOP/MHz, assuming perfect code. That however does not take into consideration that you have to load the operands and then store the results again.


    - Thomas

  22. @@BizzaBoy

    I read your post before (when I was googling HC-SR04). I do not understand the part related to WTIMER3AB.


    And I wanted my system even simpler:

    Single sensor. Trigger pulse once per second. Interrupt on both edges. Calculate distance based on timer readouts.


    Good questions. The setup is a tad more complicated than what you may need.


    The code is using TIMER2 to provide a 20Hz lm4f120_timer2a_interrupt() interrupt. There are 2 HC-SR04 in my system. One has its trigger on PE2 and one on PE3. The trigger fires on a falling edge. So this code interleaves the 2 HC-SR04, whereby each HC-SR04 gets a 50ms timeslot.


    Then WTIMER3AB is set up to be capture/compare on both edges. This is via PD2/PD3.


    So I got a trigger for my left HC-SR04 on PE2, and the return pulse on PD2. The right HC-SR04 is triggered via PE3 and the return pulse is on PD3.


    Now there are 2 interrupt handlers for WTIMER3 (WTIMER3A on PD2, WTIMER3B on PD3), one for the left HC-SR04 and one for the right HC-SR04. Let's look at lm4f120_wtimer3a_interrupt(), which deals with the left HC-SR04. It first checks PE2. If that is 0, then there was a trigger pulse active. If it's 1, the pulse seen on PD2 was random noise. Now PD2 is checked. If that is 1, then there was a 0 -> 1 edge, which means the start of the return pulse. If it is 0, it's a 1 -> 0 edge, hence the end of the return pulse. So at the start of the pulse, the 32-bit timer value is stored away in lm4f120_sonar_left_time, and at the end of the pulse the delta time is calculated. I am using 32-bit timers, because the maximum return pulse is 38ms and since the timers are driven by the core clock (80MHz), the capture/compare should not overflow the timer, which means it needs about 22 bits. Seemed easier to do that with a 32-bit timer than with a 16-bit timer and messing with the prescaler.


    Anyway, at the end you have lm4f120_sonar_left_pulse, which is either 0 if there was no echo, or the length of the pulse in core clock ticks.
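
    As a worked check of the timer sizing and a sketch of how a pulse length in ticks could be turned into a distance. The 80MHz clock and 38ms maximum echo come from the description above; the speed of sound (343 m/s, roughly 20 °C air) and the function name pulse_ticks_to_cm are assumptions added for illustration.

```python
CORE_CLOCK_HZ = 80_000_000   # timer clock, from the description above
MAX_ECHO_MS = 38             # longest return pulse, from the description above
SPEED_OF_SOUND_M_S = 343.0   # assumption: dry air at about 20 degrees C

# Largest tick count the capture logic has to represent, and the bits it needs.
max_ticks = MAX_ECHO_MS * CORE_CLOCK_HZ // 1000
bits_needed = max_ticks.bit_length()


def pulse_ticks_to_cm(ticks: int):
    """Convert an echo pulse length in core clock ticks to a distance in cm.

    Returns None for 0, which is used above to signal "no echo".
    """
    if ticks == 0:
        return None
    seconds = ticks / CORE_CLOCK_HZ
    # The sound travels to the obstacle and back, hence the division by 2.
    return seconds * SPEED_OF_SOUND_M_S / 2 * 100


if __name__ == "__main__":
    print(max_ticks, bits_needed)
    print(pulse_ticks_to_cm(80_000))  # a 1 ms echo pulse
```

    The tick count for a full 38ms echo does not fit in 16 bits without a prescaler, which is why a 32-bit wide timer is the simpler choice.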


    In my real system, this was used in a tad more complex way. First off, the interrupt handler would send the data to a task that would process it. It would actually look at the start and end time for the pulse. With a 38ms maximum echo, the rover I used this for would travel about 20cm. Thus the time the sound travels does not represent the true distance at the point in time when the sound pulse was triggered. But I guess that is somewhat esoteric ;-)


    For the simpler case you are after, you could use half of the logic, PD2/PE2, use only WTIMER3A to count, set up the TIMER2 interrupt to be 2Hz, and toggle only PE2.
