L.R.A

WS2812B Matrix


Hi everyone,

I've been trying to control WS2812/WS2812B LED strips with my Tiva LaunchPad, the TM4C1294XL. First I'll explain what I've been doing; later, when I have clean code for you all to read, I'll post it. I'm only using WS2812B LED strips.

I wanted to make a big RGB matrix, so I wanted a lot of outputs with the least processor usage, taking advantage of the ARM peripherals.

First I tried using the SSI module. It worked, but it could be better, and it used a lot of RAM. Here it is working:



 


Then I saw a lot of controllers using DMA transfers: some change PWM duty values and others just change the state of the GPIO. I went for the second approach of sending data to the GPIO.

It's the same method the Teensy uses. The idea is to send 3 values per bit: 0xFF, the data value, then 0x00. This should explain it better:


https://www.pjrc.com/teensy/td_libs_OctoWS2811.html
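
As a rough illustration of the "3 values per bit" idea, here is a minimal host-side sketch of how one LED position on 8 parallel strips could be packed into the byte stream that the DMA writes to the port. The function name `encode_led`, the macro names and the buffer layout are assumptions made for this example, not the code from the post; the only facts relied on are that a WS2812B takes 24 bits per LED, sent in GRB order with the most significant bit first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STRIPS        8   /* one strip per pin of an 8-bit GPIO port */
#define BITS_PER_LED  24  /* WS2812B: 8 bits each of G, R, B */
#define SLOTS_PER_BIT 3   /* 0xFF, data byte, 0x00 */

/* Pack one LED position for all 8 strips.
 * grb[s] holds the 24-bit GRB colour for strip s (hypothetical layout).
 * out receives 24 * 3 = 72 bytes, ready to be written to the port. */
void encode_led(const uint32_t grb[STRIPS],
                uint8_t out[BITS_PER_LED * SLOTS_PER_BIT])
{
    assert(grb != NULL && out != NULL);

    for (int bit = 0; bit < BITS_PER_LED; bit++) {
        uint8_t data = 0;

        /* Collect the same bit (MSB first) from every strip into one byte,
         * so that port pin s carries the bit for strip s. */
        for (int s = 0; s < STRIPS; s++) {
            if (grb[s] & (UINT32_C(1) << (BITS_PER_LED - 1 - bit)))
                data |= (uint8_t)(1u << s);
        }

        out[bit * SLOTS_PER_BIT + 0] = 0xFF; /* every line goes high        */
        out[bit * SLOTS_PER_BIT + 1] = data; /* lines sending a 0 drop here */
        out[bit * SLOTS_PER_BIT + 2] = 0x00; /* every line goes low         */
    }
}
```

Calling `encode_led` once per LED position and appending the 72-byte results gives the buffer for one port.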


 


That approach uses 2 timer interrupts and a GPIO interrupt. The guys at TI E2E taught me that the Tiva PWM module can invert its output using 2 comparators. So what do I do? I just use 1 PWM output and 1 GPIO interrupt. The PWM output toggles (HIGH or LOW) at 0.4 µs, 0.8 µs and 1.25 µs (the end of the PWM period). The GPIO triggers the DMA on both edges, so it always sends the 3 values needed at the right time.
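
To make those three toggle points concrete, the sketch below converts them into PWM counter ticks. The 120 MHz PWM clock used in the usage note is an assumption (the post does not state the clock), and `ns_to_ticks` is a hypothetical helper, not a TivaWare function.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a time in nanoseconds to timer/PWM ticks, rounded to nearest.
 * clock_hz is whatever clock feeds the PWM generator (assumed, not from
 * the original post). */
uint32_t ns_to_ticks(uint32_t clock_hz, uint32_t ns)
{
    assert(clock_hz > 0);
    /* 64-bit intermediate so clock_hz * ns cannot overflow */
    return (uint32_t)(((uint64_t)clock_hz * ns + 500000000u) / 1000000000u);
}
```

With an assumed 120 MHz PWM clock this gives 48 ticks for 0.4 µs, 96 ticks for 0.8 µs and 150 ticks for the 1.25 µs period.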


 


With this I can control 8 outputs for the WS2812B. But wait! The TM4C1294 has 15 GPIO ports! Unfortunately just 4 of them have all 8 pins available on the breakout. So I use the same PWM signal and 3 more GPIO pins for interrupts. With this I control 32 outputs using only 4 GPIO interrupts and 1 PWM module output. So if you use 512 LEDs per output like the Teensy 3.1, you have control over 16384 WS2812B LEDs.


 


 


Now the problems:

This method uses 1-byte values, since each write sends 8 bits, one per GPIO pin. But I need 3 values for each brightness bit (0xFF, 0xXX, 0x00), so I need 24*3 = 72 bytes to control 1 WS2812B on each of the 8 outputs (so a total of 8 WS2812B are being controlled). This method uses a lot of RAM.
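
A small worked calculation of that RAM cost, using only the figures given in this post (72 bytes per LED position per port, 8 outputs per port, 4 ports, 512 LEDs per output). The function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define BYTES_PER_LED (24u * 3u) /* 24 bits, 3 bytes per bit, per 8-pin port */

/* RAM needed for one 8-output port driving leds_per_output LEDs per strip. */
uint32_t port_buffer_bytes(uint32_t leds_per_output)
{
    assert(leds_per_output <= UINT32_MAX / BYTES_PER_LED);
    return leds_per_output * BYTES_PER_LED;
}

/* Total LEDs driven by a number of 8-output ports. */
uint32_t total_leds(uint32_t ports, uint32_t leds_per_output)
{
    return ports * 8u * leds_per_output;
}
```

For 512 LEDs per output this comes to 36864 bytes per port, or 147456 bytes for all 4 ports (16384 LEDs), which is why the 3x expansion matters.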


Second problem: the Tiva DMA can only transfer 1024 items per transfer set. That means it can only control 14 WS2812B per output before the processor needs to set up the transfer again. Since this takes a lot of time (relative to the WS2812B timing), I am going to implement DMA ping-pong mode to solve this. (Already solved.)
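
The 14-LED figure follows from dividing the 1024-item limit by the 72 items each LED position needs. The sketch below spells that out and also estimates how long one full transfer lasts, which is the time the CPU has to reload the other half in ping-pong mode. Names are hypothetical; the numbers (1024 items, 72 bytes per LED, 1.25 µs per bit) come from the post.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_ITEMS_PER_TRANSFER 1024u /* uDMA limit quoted in the post */
#define ITEMS_PER_LED          72u   /* 24 bits * 3 bytes per bit     */
#define BIT_PERIOD_NS          1250u /* one WS2812B bit = 1.25 us     */

/* Whole LED positions that fit in one uDMA transfer. */
uint32_t leds_per_transfer(void)
{
    return MAX_ITEMS_PER_TRANSFER / ITEMS_PER_LED;
}

/* Number of transfers (ping-pong reloads) needed for a strip of `leds`. */
uint32_t transfers_needed(uint32_t leds)
{
    uint32_t per = leds_per_transfer();
    return (leds + per - 1u) / per; /* round up */
}

/* Wire time of a transfer that carries `leds` LED positions. */
uint32_t transfer_time_ns(uint32_t leds)
{
    assert(leds <= leds_per_transfer());
    return leds * 24u * BIT_PERIOD_NS;
}
```

So a 512-LED strip needs 37 transfers, and each full 14-LED transfer lasts 420 µs.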


 


 


TODO:

Write the code to receive new data and update the LEDs, possibly over UART or USB.

Optimize the control with scatter-gather. This would solve both problems I have with the control, but it's really complex and there isn't much information about scatter-gather.


 


 


 


Hope it wasn't too boring to read the explanation 



Continuation of discussion from 43oh http://forum.43oh.com/topic/5809-ws2812-matrix/ (about Tiva, so more germane here)
 

However, I would use Flash instead of RAM to store 2048 0xFF and 0x00s.

 
Unfortunately, I don't think you can use the uDMA from Flash.
 
Page 540 lm4f120h5qr Data sheet
9.2 Functional Description
"The μDMA controller can transfer data to and from the on-chip SRAM. However, because the Flash memory and ROM are located on a separate internal bus, it is not possible to transfer data from the Flash memory or ROM with the μDMA controller."

 

As for using a separate DMA channel to send the 0xFF and 0x00 bytes:

 

It does not seem necessary to synchronize the feeding of the 0xFF/0x00 channel with the feeding of the main data channel.  All you have to do is keep it spewing out alternating bytes in time.  So set it up as a ping-pong (both data sets coming from the same buffer of alternating FFs and 00s).  When it interrupts, just reset the data to the same buffer.

 

As was pointed out, a transfer is limited to at most 1024 items, so that would be at most 512 0xFF and 512 0x00 bytes: 1024 bytes total.

If 1024 bytes is still too much overhead, use less (it's just a tradeoff of memory vs. processor time: the shorter the buffer, the more interrupts you have to service).
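
The memory vs. interrupt-rate tradeoff can be put into numbers. In this scheme each LED bit consumes one 0xFF and one 0x00 from the alternating buffer, and a bit lasts 1.25 µs (figure from the first post), so the time between ping-pong interrupts scales directly with the buffer size. The helper name is made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define BIT_PERIOD_NS 1250u /* one WS2812B bit = 1.25 us */

/* Time one alternating 0xFF/0x00 buffer lasts before it must be reloaded.
 * Each bit period uses two bytes from the buffer (one 0xFF, one 0x00). */
uint32_t refill_interval_ns(uint32_t buffer_bytes)
{
    assert(buffer_bytes >= 2u && buffer_bytes % 2u == 0);
    assert(buffer_bytes <= 1024u); /* uDMA transfer limit */
    return (buffer_bytes / 2u) * BIT_PERIOD_NS;
}
```

A full 1024-byte buffer therefore needs servicing every 640 µs, while a 256-byte buffer needs it every 160 µs.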

 

If one is running several ports, then all the channels handling 0xFF/0x00 can use the same buffer. You might want to feed some of them different amounts of data for just the very first cycle so that the ping-pong interrupts are not all bunched up, e.g. give one of them only 100 bytes to start with, another one 200, etc. Then the interrupts will be spread out, rather than all of them needing service at the same time. On subsequent refreshes they will each get the same amount of data, so they will hopefully remain somewhat spread out. You could do the same with the pixel data (so the pixel data pumps are not as likely to compete for the processor, either with each other or with the 0xFF/0x00 pumps).
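
One possible way to pick those staggered first-cycle sizes is sketched below. It spreads the first transfer counts evenly across the channels and keeps every count even, so each channel still starts on a 0xFF and ends on a 0x00. The function and its spacing rule are assumptions for illustration; the post only suggests "100 bytes, 200 bytes, etc.".

```c
#include <assert.h>
#include <stdint.h>

/* Size of the very first transfer for one of `channels` 0xFF/0x00 channels.
 * Later transfers would all use full_count. Counts are kept even so the
 * 0xFF/0x00 pairing is never split. Hypothetical helper. */
uint32_t first_transfer_count(uint32_t channel, uint32_t channels,
                              uint32_t full_count)
{
    assert(channels > 0 && channel < channels);
    assert(full_count % 2u == 0 && full_count >= 2u * channels);

    if (channel == channels - 1u)
        return full_count;                 /* last channel starts full */

    uint32_t step = (full_count / channels) & ~1u; /* round down to even */
    return step * (channel + 1u);
}
```

With 4 channels and a 1024-byte buffer this yields first transfers of 256, 512, 768 and 1024 bytes, so the completion interrupts start out a quarter of a buffer apart.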

 

The more DMA channels you have going, the more likely they are to compete for memory access, so using just 2 DMA channels per port, rather than 3 might help reduce timing glitches?  (I don't know how tight the timing requirements are on the LEDs.)

 

The data-handling DMA can still churn out 1024 data items before you have to reset it. As long as it can keep up a continuous flow of bytes, it should not matter where it is in its count relative to the other DMA. You could do ping-pong on the pixel data DMA as well, or else you may have to stop the clock for the DMA channel that is sending the 0xFF and 0x00s (otherwise there might not be enough time to update the pointers on the fly). If you need them to expire together, keep the 0xFF/0x00 channel byte count a divisor of the pixel data channel byte count, and stop reloading the 0xFF/0x00 channel after the right number of refreshes.


Right, I was under the impression that the memory limitation was only for transfers between memory locations. 

So, you can use 3 uDMAs and use 2 to send 0xFF and 0x00 without increasing src address, or use scatter-gather and have small array in RAM with 0xFF & 0x00 and just send the same block multiple times.


@igor The timing doesn't need really high precision; it can have an error of 150 ns. I rarely get timing glitches. I did some tests with all 4 GPIO ports (so 32 outputs) being fed from 1 buffer.

I don't really know what you mean by: "It does not seem necessary to synchronize the feeding of the 0xFF/0x00 channel with the feeding of the main data channel."

By the way, what I meant by overhead is CPU occupation. Sorry if I mixed up the terms. I want to keep the CPU as free as I can.

I'm still considering which way to go with this but, besides scatter-gather, I'm torn between using 3 DMA channels, so I only need 2 bytes for the 0xFF and 0x00, or using 2 DMA channels with 512 bytes of 0xFF and 512 bytes of 0x00. Of course with the latter I can reduce the number of bytes, but that would mean the CPU gets interrupted more often.


Yes, using scatter-gather in this case would only be worth it with a lot of LEDs, like 2000.

I believe scatter-gather interrupts at the end. But seriously, that is still like magic to me.


This is how you can use scatter-gather to send 1024 0xFFs and 1024 0x00s. This code uses 256 bytes of RAM for the table and 128 bytes for the task list.

Just add another timer and uDMA to handle data byte and you are all set.

(Timer is running very slow to make changes visible.)


#include "inc/tm4c123gh6pge.h"
#include <stdint.h>
#include <stdbool.h>
#include "inc/hw_types.h"
#include "inc/hw_memmap.h"
#include "inc/hw_gpio.h"
#include "inc/hw_udma.h"
#include "driverlib/sysctl.h"
#include "driverlib/pin_map.h"
#include "driverlib/rom.h"
#include "driverlib/rom_map.h"
#include "driverlib/gpio.h"
#include "driverlib/udma.h"
#include "driverlib/timer.h"
#include "driverlib/interrupt.h"

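void GPIOInit(void);
void uDMAInit(void);
void TimerInit(void);

// uDMA channel control table; the controller requires 1024-byte alignment
#pragma DATA_ALIGN(ui8ControlTable, 1024)
uint8_t ui8ControlTable[1024];

// GPIODATA address with all 8 mask bits set, so a byte write drives PF0-PF7
#define PORTF_OUT (void *) (GPIO_PORTF_BASE + 0x3FC)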

uint8_t startStop[256] = { 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF,
		0x00, 0xFF, 0x00, 0xFF, 0x00, 0xFF, 0x00 };

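// 8 tasks x 256 bytes = 2048 bytes per pass (1024 x 0xFF, 1024 x 0x00).
// The first 7 tasks chain to the next one; the last is UDMA_MODE_BASIC,
// which ends the list.
tDMAControlTable uDMATaskTable[] = {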

uDMATaskStructEntry(256, UDMA_SIZE_8, UDMA_SRC_INC_8, &startStop, UDMA_DST_INC_NONE, PORTF_OUT, UDMA_ARB_1, UDMA_MODE_PER_SCATTER_GATHER),
uDMATaskStructEntry(256, UDMA_SIZE_8, UDMA_SRC_INC_8, &startStop, UDMA_DST_INC_NONE, PORTF_OUT, UDMA_ARB_1, UDMA_MODE_PER_SCATTER_GATHER),
uDMATaskStructEntry(256, UDMA_SIZE_8, UDMA_SRC_INC_8, &startStop, UDMA_DST_INC_NONE, PORTF_OUT, UDMA_ARB_1, UDMA_MODE_PER_SCATTER_GATHER),
uDMATaskStructEntry(256, UDMA_SIZE_8, UDMA_SRC_INC_8, &startStop, UDMA_DST_INC_NONE, PORTF_OUT, UDMA_ARB_1, UDMA_MODE_PER_SCATTER_GATHER),
uDMATaskStructEntry(256, UDMA_SIZE_8, UDMA_SRC_INC_8, &startStop, UDMA_DST_INC_NONE, PORTF_OUT, UDMA_ARB_1, UDMA_MODE_PER_SCATTER_GATHER),
uDMATaskStructEntry(256, UDMA_SIZE_8, UDMA_SRC_INC_8, &startStop, UDMA_DST_INC_NONE, PORTF_OUT, UDMA_ARB_1, UDMA_MODE_PER_SCATTER_GATHER),
uDMATaskStructEntry(256, UDMA_SIZE_8, UDMA_SRC_INC_8, &startStop, UDMA_DST_INC_NONE, PORTF_OUT, UDMA_ARB_1, UDMA_MODE_PER_SCATTER_GATHER),
uDMATaskStructEntry(256, UDMA_SIZE_8, UDMA_SRC_INC_8, &startStop, UDMA_DST_INC_NONE, PORTF_OUT, UDMA_ARB_1, UDMA_MODE_BASIC)

};

int main(void) {

	// clock setup
	// run @80MHz, use 16MHz xtal
	MAP_SysCtlClockSet(
			SYSCTL_SYSDIV_2_5 | SYSCTL_USE_PLL | SYSCTL_OSC_MAIN
					| SYSCTL_XTAL_16MHZ);

	GPIOInit();
	uDMAInit();
	TimerInit();

	MAP_TimerEnable(TIMER0_BASE, TIMER_A);

	while (1) {

		// the channel disables itself once the task list completes, so re-arm it
		if (!MAP_uDMAChannelIsEnabled(UDMA_CHANNEL_TMR0A)) {
			MAP_uDMAChannelScatterGatherSet(UDMA_CHANNEL_TMR0A | UDMA_PRI_SELECT, 8, uDMATaskTable, 0);
			MAP_uDMAChannelEnable(UDMA_CHANNEL_TMR0A);
		}
	}
}

void GPIOInit() {
	MAP_SysCtlPeripheralEnable(SYSCTL_PERIPH_GPIOF);
	MAP_GPIOPinTypeGPIOOutput(GPIO_PORTF_BASE, 0x000000FF);
	MAP_GPIOPadConfigSet(GPIO_PORTF_BASE, 0x000000FF, GPIO_STRENGTH_8MA, GPIO_PIN_TYPE_STD);
}

void uDMAInit() {
	MAP_SysCtlPeripheralEnable(SYSCTL_PERIPH_UDMA);
	MAP_uDMAEnable();
	MAP_uDMAControlBaseSet(ui8ControlTable);
}

void TimerInit() {
	MAP_SysCtlPeripheralEnable(SYSCTL_PERIPH_TIMER0);
	MAP_TimerConfigure(TIMER0_BASE, TIMER_CFG_PERIODIC);
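	// 80,000,000 ticks at 80 MHz = 1 s period (deliberately slow, see above)
	MAP_TimerLoadSet(TIMER0_BASE, TIMER_A, 80000000);
	MAP_TimerMatchSet(TIMER0_BASE, TIMER_A, 20000000);
	// request a uDMA transfer on both the timeout and the match event
	MAP_TimerDMAEventSet(TIMER0_BASE, TIMER_DMA_TIMEOUT_A | TIMER_DMA_CAPMATCH_A);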
}
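
The byte counts quoted for the example above can be checked on a PC with a small model. This is only a host-side sketch: `struct task_model` is a stand-in that assumes each uDMA control structure is four 32-bit words (16 bytes), and `run_task_list` just replays the 256-byte pattern once per task instead of touching any hardware. All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PATTERN_BYTES 256 /* size of the alternating 0xFF/0x00 table */
#define TASKS         8   /* entries in the scatter-gather task list */

/* Host-side stand-in for one uDMA control structure (assumed 4 words). */
struct task_model {
    uint32_t src_end;
    uint32_t dst_end;
    uint32_t control;
    uint32_t spare;
};

struct totals {
    uint32_t ones;  /* number of 0xFF bytes sent */
    uint32_t zeros; /* number of 0x00 bytes sent */
};

/* Build the same alternating pattern as the startStop table. */
void fill_pattern(uint8_t pattern[PATTERN_BYTES])
{
    for (size_t i = 0; i < PATTERN_BYTES; i++)
        pattern[i] = (i % 2 == 0) ? 0xFF : 0x00;
}

/* Replay the pattern once per task and count what would reach the port. */
struct totals run_task_list(const uint8_t pattern[PATTERN_BYTES],
                            unsigned tasks)
{
    struct totals t = {0, 0};
    assert(pattern != NULL);

    for (unsigned task = 0; task < tasks; task++) {
        for (size_t i = 0; i < PATTERN_BYTES; i++) {
            if (pattern[i] == 0xFF)
                t.ones++;
            else if (pattern[i] == 0x00)
                t.zeros++;
        }
    }
    return t;
}
```

Eight tasks over the 256-byte table give 1024 bytes of 0xFF and 1024 bytes of 0x00, and eight 16-byte entries account for the 128-byte task list.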


Ooooh... I think I see what I did wrong. I'll study this properly next week. RobG to the rescue once again :P

By the way, I have it working right now in ping-pong mode with 32 outputs. Still using 3x the RAM. For practical purposes that will probably not be a problem, since I can't really make a matrix that big. I can stop worrying about it for the project, but I can improve it for the challenge.

 

1 more video of me having a bit of fun:


Well, here is a small test to show that the brightness control is actually working. I had to give back the strip I had, so now it's just the 8 LEDs. I'm hoping that tomorrow I'll have a strip of 60 LEDs running this pattern:


@RobG I still haven't had time to get into your scatter-gather example, but may I suggest you add it to the code vault? I really think there should be more examples of the DMA.

Also, I'm thinking of putting together a folder with various peripheral examples for IAR and then sharing it here. I'm probably going to add stuff like that code. Is it OK to add your examples?

I really think those kinds of examples should be saved together. It would help beginners a lot.
