Should be a fun post
I’ve been playing around with WS2812b LEDs lately, and have been using my trusty TI Tiva C Series and Stellaris Launchpads as controllers for them. The first thing I did with these was to use the SPI peripheral to drive the data line on the LED strip. This left me with sample code that keeps track of a 720-byte array of SPI data to send on the bus, an index into that array, and an interrupt handler that gets called whenever the SPI transmit FIFO has room for more data. The problem with this approach is that the SPI FIFO is pretty shallow (8 entries), which means a non-trivial amount of processor time is spent in my interrupt handler just grabbing data from a static memory location and copying it into the SPI data register. This presents us with an ideal opportunity to talk about the micro direct memory access, or uDMA, engine. If you don’t care about my attempts at explaining how DMA works on these microcontrollers and just want to reference my WS2812b-over-SPI-using-uDMA library, feel free to skip everything past this and head over to my github.
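To make that overhead concrete, here is a host-side C model of the interrupt-driven scheme: the 720-byte array, an index into it, and a handler that tops up an 8-entry transmit FIFO each time it fires. Everything in it is hypothetical (a plain struct stands in for the SSI FIFO, and no TivaWare calls are used), and it assumes the best case, where the interrupt only fires once the FIFO has completely drained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-side model of the interrupt-driven scheme. All names are hypothetical;
 * a plain struct stands in for the SSI transmit FIFO. */
#define FRAME_BYTES 720u   /* size of the SPI data array */
#define FIFO_DEPTH  8u     /* SSI transmit FIFO depth */

typedef struct {
    uint8_t  entries[FIFO_DEPTH];
    unsigned count;        /* bytes currently queued in the FIFO */
} tx_fifo_t;

static uint8_t  g_frame[FRAME_BYTES];  /* encoded SPI data for the strip */
static size_t   g_index;               /* next byte of g_frame to send */
static unsigned g_isr_calls;           /* how many times the CPU was interrupted */

/* The "TX FIFO has room" handler: every byte is copied by the CPU. */
static void spi_tx_isr(tx_fifo_t *fifo)
{
    g_isr_calls++;
    while (fifo->count < FIFO_DEPTH && g_index < FRAME_BYTES) {
        fifo->entries[fifo->count++] = g_frame[g_index++];
    }
}

/* Send one frame, assuming the interrupt only fires once the FIFO has fully
 * drained (the best case). Returns the number of interrupts taken. */
static unsigned send_frame(void)
{
    tx_fifo_t fifo = { {0}, 0 };
    g_index = 0;
    g_isr_calls = 0;
    while (g_index < FRAME_BYTES) {
        spi_tx_isr(&fifo);
        fifo.count = 0;    /* the shifter clocks every queued byte out */
    }
    return g_isr_calls;
}
```

In this model send_frame() takes 90 interrupts per frame (720 / 8), and every one of the 720 bytes passes through the CPU. If the interrupt instead fires while the FIFO is still half full, the count roughly doubles.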
First, some basics: what is a DMA engine? In its simplest form, a DMA engine is a peripheral that has access to the address bus and data bus in a chip, and the ability to initiate memory transfers. Like any other peripheral, it has a register interface, which the processor uses to tell the DMA engine where to read from, where to write to, and how to set up the transfer.
In my example, I can set up the DMA engine to read data from a known memory location (where my output data array lives) and write it to a known memory location (the SPI data out register) whenever certain conditions occur (the SPI transmit FIFO has room for more data). This means that my processor is free to interface with other peripherals, crunch numbers, or even enter a sleep state while the DMA engine handles the trivial business of copying all the data.
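At heart, a DMA channel is a copy loop the CPU no longer has to run. The toy C model below captures the behavior just described for one channel: each time the peripheral raises a request, the engine copies up to burst bytes from an incrementing source address into a fixed destination register. The type and function names are hypothetical, and a plain variable stands in for the SPI data register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model of one DMA channel (hypothetical names, no real hardware). */
typedef struct {
    const uint8_t    *src;        /* next byte to read; advances each transfer */
    volatile uint8_t *dst;        /* peripheral data register; never advances */
    size_t            remaining;  /* items left before the transfer is complete */
} dma_channel_t;

/* Called when the peripheral raises a DMA request: move up to `burst` items
 * without any CPU involvement. Returns how many items were moved. */
static size_t dma_service_request(dma_channel_t *ch, size_t burst)
{
    assert(ch != NULL);
    size_t moved = 0;
    while (moved < burst && ch->remaining > 0) {
        *ch->dst = *ch->src++;    /* read from memory, write to the fixed register */
        ch->remaining--;
        moved++;
    }
    return moved;
}
```

With a 5-byte source and a burst of 2, three requests move 2, 2 and 1 bytes, and the "register" ends up holding the last byte written.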
So now let’s look at how to set up the DMA engine itself. Fortunately, we have an incredibly well designed software interface (driverlib) that, if past experience is any indication, will be so intuitive that we can get by entirely by looking at nothing more than sample code!
uDMAChannelControlSet(UDMA_CHANNEL_SSI1TX | UDMA_PRI_SELECT,
                      UDMA_SIZE_8 | UDMA_SRC_INC_8 | UDMA_DST_INC_NONE | UDMA_ARB_4);
uDMAChannelTransferSet(UDMA_CHANNEL_SSI1TX | UDMA_PRI_SELECT, UDMA_MODE_BASIC,
                       pui8SPIData, (void *)(SSI1_BASE + SSI_O_DR), ui16DataSize);
Hrm. Maybe not. The downside to uDMA is that its flexibility and power come at the cost of complexity. The API is pretty straightforward once you read up on the documentation, though, so let’s start digging in!
The first argument to all of these functions is the DMA channel number. It is pretty common to want to use DMA for multiple peripherals at the same time, especially in a system where you’re trying to squeeze as much work out of the processor as possible. Just like the processor, the DMA engine can only transfer one chunk of data at a time, but having multiple DMA channels gives the engine a way to keep track of multiple transactions to multiple peripherals and to decide which one gets the bus next, a process called DMA channel arbitration. In the Stellaris and Tiva C microcontrollers, the DMA engine has 32 separate channels. Each of these channels has four unique hardware peripherals it can be used to interface with, or can be used for a software DMA transfer (moving data from one place in memory to another, as opposed to moving between memory and a peripheral). The datasheet for the microcontroller found on the Launchpad contains a table of all the possible configurations for all the possible DMA channels (Table 9-1 in the TM4C123GH6PM datasheet).
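Which channel wins arbitration is decided by a fixed priority rule. The C sketch below models that rule as I read it in the uDMA chapter of the TM4C123GH6PM datasheet: any channel can be flagged high priority (in driverlib, via uDMAChannelAttributeEnable() with UDMA_ATTR_HIGH_PRIORITY), high-priority channels beat default-priority ones, and within the same level the lowest channel number wins. The function name is hypothetical, and it only models the decision, not the hardware.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CHANNELS 32

/* Pick which pending channel gets the bus next. Bit n of each mask refers to
 * channel n. High-priority channels beat default-priority ones; within a
 * level, the lowest channel number wins. Returns -1 if nothing is pending. */
static int dma_arbitrate(uint32_t pending, uint32_t high_prio)
{
    uint32_t candidates = pending & high_prio;
    if (candidates == 0u) {
        candidates = pending;     /* no high-priority requests: fall back */
    }
    for (int ch = 0; ch < NUM_CHANNELS; ch++) {
        if (candidates & (UINT32_C(1) << ch)) {
            return ch;
        }
    }
    return -1;
}
```

With channels 3 and 25 both requesting, channel 3 wins unless channel 25 (SSI1 TX) has been flagged high priority.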
Once we’ve determined which DMA channel we want to use and what peripheral we want to use that channel for, we need to set up how the DMA transaction is going to work. The channel itself is represented by the UDMA_CHANNEL_* macros (UDMA_CHANNEL_SSI1TX is channel 25); in TivaWare, which of a channel’s possible peripherals it is actually tied to is selected separately with uDMAChannelAssign() (for example, uDMAChannelAssign(UDMA_CH25_SSI1TX)). The transaction setup is done by calling the uDMAChannelControlSet function. The first argument to this function is the channel we want to configure OR’d with UDMA_PRI_SELECT or UDMA_ALT_SELECT, which chooses between the channel’s primary and alternate control structures; basic mode only uses the primary structure, while modes like ping-pong alternate between the two. The second argument is a bitwise OR of the data size, source increment amount, destination increment amount, and arbitration size. Data size is pretty simple: it refers to how much data we’re going to move on each transfer. This is normally dictated by the peripheral you’re trying to interact with. The SPI data out register is 8 bits wide, so we use the UDMA_SIZE_8 macro. The source increment amount tells the DMA engine how much, if at all, it should increment the source pointer on each transfer. Since we’re reading from an 8-bit array, we use the UDMA_SRC_INC_8 macro to make the DMA engine advance the address it’s reading from by 8 bits (one byte) each time it performs an 8-bit transfer. The destination increment amount tells the DMA engine how much to increment the write address on each transfer. We’re writing into the SPI data out register and want every DMA transfer to land in that same register, so we use the UDMA_DST_INC_NONE macro to keep the DMA engine from incrementing the destination address. Finally, the arbitration size. This one is kind of tricky: it tells the DMA engine how many transfers it should execute before performing bus arbitration. The SPI peripheral has a transmit FIFO that is 8 entries deep, so an arbitration size of up to 8 makes sense for our scenario (the snippet above uses UDMA_ARB_4, which refills half of the FIFO per arbitration round). With an arbitration size of 8, if we were using multiple DMA channels in parallel, the DMA engine would write 8 bytes of data from our source array into the SPI TX FIFO, then check to see if any other channels had data ready to transmit.
That way, the DMA engine won’t be wasting time waiting for the SPI TX FIFO to drain when it could be using that time to transfer data for other DMA channels. This can be tricky, though: it opens up the possibility that another channel, running a slow transfer with a large arbitration size, holds the bus for longer than it takes our SPI TX FIFO to drain. In our example, the SPI data line would then sit at a sustained 0 until that slower transfer finished, which the WS2812b LEDs would interpret as the end of a frame! If we were using this in a system where we were worried about such an event, we could set the arbitration size as large as our entire SPI transmit array, which would guarantee our DMA transfer wouldn’t get interrupted (at the cost of latency to every other DMA channel in the system).
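How long can another channel hold the bus before our SPI line runs dry? A back-of-the-envelope C helper: a full FIFO of depth bytes lasts depth × 8 bit times, and a channel that grabs the bus for N transfers blocks us for roughly N times its per-transfer cost. The function names are hypothetical, and the 3.2 MHz SPI clock and 50 ns per-transfer figure used in the example are assumptions chosen for illustration, not measured values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Time (ns) for a full TX FIFO of `depth_bytes` to shift out at `bitrate_hz`. */
static uint64_t fifo_drain_ns(uint32_t depth_bytes, uint32_t bitrate_hz)
{
    assert(bitrate_hz > 0u);
    return (uint64_t)depth_bytes * 8u * 1000000000u / bitrate_hz;
}

/* Would another channel holding the bus for `arb_items` transfers of
 * `ns_per_item` each outlast even a full FIFO? If so, the SPI shifter runs
 * out of data and, per the concern above, the data line sits low. */
static bool starves_spi(uint32_t arb_items, uint32_t ns_per_item,
                        uint32_t depth_bytes, uint32_t bitrate_hz)
{
    return (uint64_t)arb_items * ns_per_item > fifo_drain_ns(depth_bytes, bitrate_hz);
}
```

At an assumed 3.2 MHz, a full 8-byte FIFO buys 20 µs. A channel holding the bus for 1024 transfers at an assumed 50 ns each (51.2 µs) would outlast it, while one holding it for 8 transfers (0.4 µs) would not.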
The last function called, uDMAChannelTransferSet, is used to set up more details about the DMA transfer. Again, the first argument is the channel we want to configure OR’d with the control structure select (UDMA_PRI_SELECT for us). The second argument is the DMA mode you want to use. This article just covers basic mode (UDMA_MODE_BASIC), in which the engine performs a single transfer of the requested length each time it’s armed and then stops; the other modes, such as ping-pong and scatter-gather, chain transfers together. The third argument is the source address. For our example, this is the array we want to transmit. The fourth argument is the destination address, which for us is the SSI data register (SSI1_BASE + SSI_O_DR). The final argument is the number of items that should be transferred before the DMA engine considers the transaction complete. Since we’re moving 8-bit items, this is simply the size of the array we’re transmitting in bytes. A single basic transfer is limited to 1024 items, which our 720-byte array fits within.
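To make the two calls a little less magical, here is a host-side C sketch of the 16-byte control-table entry they end up filling in for a basic-mode transfer. The field positions follow my reading of the channel control word (DMACHCTL) description in the TM4C123GH6PM datasheet, and the type, macro and function names are made up for illustration, so double-check against your datasheet before relying on them. Two details worth noticing: the engine stores end pointers (the last byte it will touch) rather than start pointers, and the transfer count is stored as count minus one in a 10-bit field, which is where the 1024-item limit comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one uDMA control-table entry: source end pointer,
 * destination end pointer, control word, and an unused word. */
typedef struct {
    uint32_t src_end;
    uint32_t dst_end;
    uint32_t control;
    uint32_t unused;
} udma_entry_t;

/* Control word fields (positions as read from the datasheet; verify). */
#define CTL_DSTINC_SHIFT   30u  /* 0=byte, 1=half-word, 2=word, 3=no increment */
#define CTL_DSTSIZE_SHIFT  28u  /* 0=8-bit, 1=16-bit, 2=32-bit */
#define CTL_SRCINC_SHIFT   26u
#define CTL_SRCSIZE_SHIFT  24u
#define CTL_ARBSIZE_SHIFT  14u  /* re-arbitrate every 2^ARBSIZE transfers */
#define CTL_XFERSIZE_SHIFT  4u  /* holds (item count - 1), 10 bits wide */
#define CTL_MODE_BASIC      1u

#define INC_BYTE 0u
#define INC_NONE 3u
#define SIZE_8   0u

static uint32_t log2_u32(uint32_t n)
{
    uint32_t r = 0;
    while (n > 1u) { n >>= 1; r++; }
    return r;
}

/* End pointer = address of the last byte touched; unchanged if no increment.
 * The increment encoding (0, 1, 2) doubles as a shift for the item size. */
static uint32_t end_pointer(uint32_t start, uint32_t count, uint32_t inc)
{
    return (inc == INC_NONE) ? start : start + (count << inc) - 1u;
}

/* Fill one entry roughly the way ControlSet + TransferSet would, combined. */
static udma_entry_t udma_entry_basic(uint32_t src, uint32_t src_inc,
                                     uint32_t dst, uint32_t dst_inc,
                                     uint32_t size, uint32_t arb, uint32_t count)
{
    assert(count >= 1u && count <= 1024u);                  /* 10-bit field */
    assert(arb >= 1u && arb <= 1024u && (arb & (arb - 1u)) == 0u);
    udma_entry_t e;
    e.src_end = end_pointer(src, count, src_inc);
    e.dst_end = end_pointer(dst, count, dst_inc);
    e.control = (dst_inc << CTL_DSTINC_SHIFT) | (size << CTL_DSTSIZE_SHIFT) |
                (src_inc << CTL_SRCINC_SHIFT) | (size << CTL_SRCSIZE_SHIFT) |
                (log2_u32(arb) << CTL_ARBSIZE_SHIFT) |
                ((count - 1u) << CTL_XFERSIZE_SHIFT) | CTL_MODE_BASIC;
    e.unused = 0u;
    return e;
}
```

For the configuration in the snippet above (8-bit items, incrementing source, fixed destination, UDMA_ARB_4, basic mode) and a 720-byte array, the control word works out to 0xC000ACF1, the destination end pointer is just the data register’s address, and the source end pointer sits 719 bytes past the start of the array.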
Setting up the initial DMA transfer is the most difficult part of this. Once those two functions are called, all that’s left is to enable the DMA channel (uDMAChannelEnable(UDMA_CHANNEL_SSI1TX)), and the DMA engine will start moving data. This assumes the one-time setup has already been done: the uDMA peripheral clocked with SysCtlPeripheralEnable(SYSCTL_PERIPH_UDMA), the controller turned on with uDMAEnable() and given its control table with uDMAControlBaseSet(), and the SSI peripheral told to raise transmit DMA requests with SSIDMAEnable(SSI1_BASE, SSI_DMA_TX). An interesting gotcha to watch out for is that when the DMA transfer is complete, the interrupt handler for the peripheral assigned to the DMA channel will be called, as opposed to the DMA interrupt (assuming you have interrupts enabled). So in our case, once the SPI data array has been fully transmitted by the DMA engine, the SSI1 interrupt handler will be called. From here, I can either set up the interrupt handler to inform the main code that another frame is ready to be transmitted, or I can just set up the DMA transfer to start again, which would cause any changes to the SPI data array to propagate to the LED strip using barely any processor cycles.
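Here’s a host-side C model of that restart-from-the-interrupt approach. It’s a sketch with hypothetical names: a struct stands in for the channel, run() plays the role of the SSI raising DMA requests, and ssi_isr() stands in for the real handler, which on hardware would call uDMAChannelTransferSet() and uDMAChannelEnable() again. The behavior it captures is that a basic-mode transfer switches its channel off when it completes, so nothing else goes out until the handler re-arms it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a basic-mode channel. */
typedef struct {
    size_t remaining;   /* items left in the current basic-mode transfer */
    bool   enabled;     /* channel enable; cleared by "hardware" on completion */
} dma_chan_t;

static unsigned g_frames_sent;

/* Stand-in for the SSI interrupt handler: on real hardware this would
 * re-issue uDMAChannelTransferSet() and uDMAChannelEnable(). */
static void ssi_isr(dma_chan_t *ch, size_t frame_len)
{
    g_frames_sent++;
    ch->remaining = frame_len;
    ch->enabled = true;
}

/* Feed `requests` peripheral DMA requests through the model. */
static void run(dma_chan_t *ch, size_t frame_len, size_t requests)
{
    for (size_t i = 0; i < requests; i++) {
        if (!ch->enabled) {
            continue;                  /* a disabled channel ignores requests */
        }
        ch->remaining--;               /* one byte moved into the TX FIFO */
        if (ch->remaining == 0) {
            ch->enabled = false;       /* basic mode stops at the end */
            ssi_isr(ch, frame_len);    /* completion arrives on the SSI interrupt */
        }
    }
}
```

Starting from an armed 720-byte transfer, 2160 requests produce three completed frames, and the channel is left armed for the fourth. The CPU only runs once per frame, not once per FIFO refill.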
As always, the source code for this example can be found on my github. Currently, I only have a simple version of the library running, which is hard coded to use the SSI1 peripheral, DMA channel 25, and pin PF1 for the SPI output. I’m working on splitting this into a generic version of the library, which will allow running multiple SSI peripherals in parallel, a choice of TX pin, and the option to specify a callback function that is called every time a frame has finished being sent to the LEDs.
I’ve got a pretty cool new library just about ready to publish on github. I’ve been playing around with WS2812b LEDs lately, and have been using my trusty TI Tiva C Series and Stellaris Launchpads.
The first thing I did with these was to hijack the SPI peripheral to drive the data line on the LED strip. I thought myself rather clever for realizing that the proprietary one-wire protocol the 2812s use could be implemented with the SSI peripheral given the right combination of bit packing and clock frequency, but a bit of googling showed me that this is pretty common nowadays, especially on the Arduino platform. Ah well… if you haven’t seen it before, it’s still new to you, right? I still feel accomplished about it, but it’s nothing groundbreaking enough to post about (hence why I haven’t mentioned it here, despite having it working since last April).
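One common bit-packing scheme, sketched below in C, spends four SPI bits on each LED data bit: a 0 becomes 1000 and a 1 becomes 1110. This is an assumption for illustration, not necessarily the packing my library uses. At an assumed SPI clock of 3.2 MHz each SPI bit lasts 312.5 ns, so a 0 is high for about 312 ns, a 1 for about 937 ns, and every data bit takes 1.25 µs; those numbers should fall inside the WS2812b’s published tolerances, but check them against your strip’s datasheet. The function names are hypothetical. At four SPI bytes per color byte, a 60-LED strip needs 60 × 3 × 4 = 720 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPI_BYTES_PER_COLOR 4u   /* 8 data bits x 4 SPI bits = 32 SPI bits */

/* Expand one color byte (MSB first) into 4 SPI bytes.
 * Assumed patterns: data 0 -> 1000, data 1 -> 1110. */
static void ws2812_encode_byte(uint8_t value, uint8_t out[SPI_BYTES_PER_COLOR])
{
    for (unsigned i = 0; i < SPI_BYTES_PER_COLOR; i++) {
        /* Two data bits per SPI byte: high nibble first, then low nibble. */
        uint8_t hi = (value & 0x80u) ? 0xEu : 0x8u;
        uint8_t lo = (value & 0x40u) ? 0xEu : 0x8u;
        out[i] = (uint8_t)((hi << 4) | lo);
        value = (uint8_t)(value << 2);
    }
}

/* Encode `count` pixels, given as green/red/blue triples (the WS2812b's wire
 * order), into `out`, which must hold count * 3 * SPI_BYTES_PER_COLOR bytes.
 * Returns the number of bytes written. */
static size_t ws2812_encode(const uint8_t *grb, size_t count, uint8_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < count * 3u; i++) {
        ws2812_encode_byte(grb[i], &out[n]);
        n += SPI_BYTES_PER_COLOR;
    }
    return n;
}
```

Because each SPI byte carries exactly two LED data bits, the encoder never has to split a pattern across byte boundaries, which keeps it simple enough to run every time the color array changes.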
What to do next, then? Well, my library for the 2812s relied on waiting for the SSI interrupt to signal that the bus was ready for another byte, filling the byte, and waiting for the next TX interrupt to come in. It was OK for parallel processing in the sense that it at least didn’t just sit in a spinloop while the data was transferring, but I still felt my solution was rather inefficient. Enter the uDMA engine. My sample code is now at the point where the uDMA engine is set up to run constantly over the transmit array. This means you can update the output color array at any point in software, and the uDMA engine will cause the change to “automatically” propagate onto the LED strip with very little software overhead. The only processor overhead for this method is an interrupt service routine executing each time the LED strip is ready to be refreshed, which consists of about eight lines of code.
My next step is going to be integrating my uDMA sample code into a proper uDMA library. I really like the writeup Adafruit has on interrupts, and I think it would be cool to do a similar writeup on what DMA is and how it works. So step one will be getting a proper uDMA-based library up on github. Step two: use that library as the basis for a post on what DMA is, how it works, and how to set up the uDMA engine on a Stellaris/Tiva C microcontroller.
On a more personal note, I’ve gotta say it was unspeakably enjoyable getting this up and running. From starting to look at datasheets to having a constantly changing rainbow pattern sent out to the LED strip via uDMA, it took about two hours. Two hours, start to finish, to implement something pretty damn cool. I’m still finding my footing in the post-silicon validation job I took back in March for an ARM-based server chip, and it was so, so nice to just open up a well-written datasheet, grab some documentation for a software library written by folks who specialize in customer-facing software libraries, and knock out some application code. I miss that.
Welcome to my web page. There isn’t much here yet. Eventually, I’ll use this as a location for project summaries and videos, Eagle/KiCad part libraries I use, and maybe even a place to purchase some of the random boards I’ve designed that have been useful.
For now, I’m working on making writeups for my projects, which can be found in the menu along the top of this page.