Intro to DMA


I’ve been playing around with WS2812b LEDs lately, and have been using my trusty TI Tiva C Series and Stellaris Launchpads as controllers for them. The first thing I did with these was to use the SPI peripheral to drive the data lines on the LED strip. This left me with sample code that keeps track of a 720-byte array of SPI data to send on the bus, an index into that array, and an interrupt handler that gets called whenever the SPI output buffer has room for more data. The problem with this approach is that the SPI buffer is pretty shallow (8 bytes), which means a non-trivial amount of processor time is spent in my interrupt handler just grabbing data from a static memory location and copying it into the SPI output register. This presents us with an ideal opportunity to talk about the micro direct memory access, or uDMA, engine. If you don’t care about my attempts at explaining how DMA works on these microcontrollers and just want to use my WS2812b-over-SPI-with-uDMA library, feel free to skip everything past this and head over to my GitHub.
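To put a rough number on that overhead, here is a host-side model of the interrupt-driven approach, written in plain C so it runs on a PC rather than on the Launchpad. It is a sketch under stated assumptions, not the actual library code: `tx_fifo_t`, `spi_tx_isr`, and `send_frame` are hypothetical names, the interrupt is assumed to fire whenever the 8-entry FIFO is half empty or less, and the SPI shifter is modeled as draining one byte per tick.

```c
#include <stddef.h>
#include <stdint.h>

#define FIFO_DEPTH  8    /* SSI TX FIFO depth */
#define FRAME_BYTES 720  /* size of the SPI data array from the text */

/* Hypothetical model of the SSI transmit FIFO (a small ring buffer). */
typedef struct {
    uint8_t buf[FIFO_DEPTH];
    size_t head, count;
} tx_fifo_t;

/* What the interrupt-driven approach has to track: the array and an index. */
typedef struct {
    const uint8_t *data;
    size_t len;
    size_t index;
    size_t isr_calls;  /* how many times the CPU had to service the FIFO */
} spi_tx_state_t;

static int fifo_put(tx_fifo_t *f, uint8_t b) {
    if (f->count == FIFO_DEPTH)
        return 0;  /* full */
    f->buf[(f->head + f->count) % FIFO_DEPTH] = b;
    f->count++;
    return 1;
}

static int fifo_get(tx_fifo_t *f, uint8_t *b) {
    if (f->count == 0)
        return 0;  /* empty */
    *b = f->buf[f->head];
    f->head = (f->head + 1) % FIFO_DEPTH;
    f->count--;
    return 1;
}

/* Model of the TX interrupt handler: the CPU copies bytes until the FIFO is full. */
static void spi_tx_isr(spi_tx_state_t *s, tx_fifo_t *f) {
    s->isr_calls++;
    while (s->index < s->len && fifo_put(f, s->data[s->index]))
        s->index++;
}

/* Send one frame. Each tick the interrupt fires if the FIFO is half empty
   or less (and data remains), then the shifter sends one byte, which is
   recorded in `out`. Returns the number of bytes that reached the wire. */
static size_t send_frame(spi_tx_state_t *s, uint8_t *out) {
    tx_fifo_t f = {{0}, 0, 0};
    size_t sent = 0;
    uint8_t b;
    while (s->index < s->len || f.count > 0) {
        if (s->index < s->len && f.count <= FIFO_DEPTH / 2)
            spi_tx_isr(s, &f);
        if (fifo_get(&f, &b))
            out[sent++] = b;
    }
    return sent;
}
```

Under those assumptions, a 720-byte frame costs 179 trips through the handler (one that fills the empty FIFO with 8 bytes, then 178 that top it up 4 bytes at a time), and every one of them is the CPU doing nothing but copying bytes.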

First, some basics: what is a DMA engine?  In its simplest form, a DMA engine is a peripheral that has access to the address bus and data bus in a chip, and the ability to initiate memory transfers.  It has a register interface just like any other peripheral that can be used by the processor to tell the DMA engine where to read from, where to write to, and how to set up the transfer.


In my example, I can set up the DMA engine to start reading data from a known memory location (where my output data array lives) and write it to a known memory location (the SPI data out register) whenever certain conditions occur (the SPI transmit FIFO has room for more data). This means that my processor is free to interface with other peripherals, crunch numbers, or even enter a sleep state while the DMA engine handles the trivial business of copying all the data.
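Stripped down, a DMA channel is a small descriptor: where to read, where to write, how far to step each pointer after every item, and how many items to move. The model below is a hypothetical, host-runnable illustration of that idea; the `dma_desc_t` struct and `dma_run` function are made up for this sketch and are not part of driverlib. Real hardware also waits for the peripheral’s request before each transfer, which this model leaves out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one DMA channel's configuration:
   where to read, where to write, and how to walk the addresses. */
typedef struct {
    const volatile uint8_t *src;
    volatile uint8_t *dst;
    size_t count;  /* number of items to move */
    int src_inc;   /* bytes to advance src after each item (0 = fixed address) */
    int dst_inc;   /* bytes to advance dst after each item (0 = fixed address) */
} dma_desc_t;

/* What the engine does once started; the CPU is not involved in this loop. */
static void dma_run(const dma_desc_t *d) {
    const volatile uint8_t *s = d->src;
    volatile uint8_t *t = d->dst;
    for (size_t i = 0; i < d->count; i++) {
        *t = *s;
        s += d->src_inc;
        t += d->dst_inc;
    }
}
```

Setting `dst_inc` to 0 is the software analogue of UDMA_DST_INC_NONE: every byte is written to the same address, which for a real SPI data register means each write pushes one more byte into the TX FIFO.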

So now let’s look at how to set up the DMA engine itself.  Fortunately, we have an incredibly well designed software interface (driverlib) that, if past experience is any indication, will be so intuitive that we can get by entirely by looking at nothing more than sample code!


uDMAChannelControlSet(UDMA_CHANNEL_SSI1TX | UDMA_PRI_SELECT, UDMA_SIZE_8 | UDMA_SRC_INC_8 | UDMA_DST_INC_NONE | UDMA_ARB_4);

uDMAChannelTransferSet(UDMA_CHANNEL_SSI1TX | UDMA_PRI_SELECT, UDMA_MODE_BASIC, pui8SPIData, (void *)(SSI1_BASE + SSI_O_DR), ui16DataSize);

Hrm. Maybe not. The downside to uDMA is that its flexibility and power come at the cost of complexity. The API is pretty straightforward once you read up on the documentation, though, so let’s start digging in!

The first argument to all of these functions is the DMA channel number. It is pretty common to want to use DMA for multiple peripherals at the same time, especially in a system where you’re trying to squeeze as much work out of the processor as possible. Just like the processor, the DMA engine can only transfer one chunk of data at a time, but having multiple DMA channels lets the engine keep track of multiple transactions to multiple peripherals and decide which one to service next, a process called channel arbitration. In the Stellaris and Tiva C microcontrollers, the DMA engine has 32 separate channels. Each of these channels can be tied to one of four unique hardware peripherals, or can be used for a software DMA transfer (moving data from one place in memory to another, as opposed to moving it between memory and a peripheral). The datasheet for the microcontroller found on the Launchpad contains a table of all the possible configurations for every DMA channel (Table 9-1 in the TM4C123GH6PM datasheet).

[Figure: DMA channel assignments, from Table 9-1 of the TM4C123GH6PM datasheet]
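How the engine picks between channels is worth a quick sketch. According to the TM4C123GH6PM datasheet, the uDMA uses fixed priorities: channel 0 has the highest priority, priority drops as the channel number rises, and each channel also has a high-priority bit that lifts it above all default-priority channels (driverlib sets it with uDMAChannelAttributeEnable(..., UDMA_ATTR_HIGH_PRIORITY)). The function below is a hypothetical host-side model of that rule, not driverlib code; it works on bitmasks of pending and high-priority channels.

```c
#include <stdint.h>

/* Pick the next channel to service, given a bitmask of channels with
   pending requests and a bitmask of channels marked high priority.
   High-priority channels win first; within a group, the lowest channel
   number wins. Returns -1 if nothing is pending. */
static int dma_arbitrate(uint32_t pending, uint32_t high_prio) {
    uint32_t candidates = pending & high_prio;
    if (candidates == 0)
        candidates = pending;
    if (candidates == 0)
        return -1;
    for (int ch = 0; ch < 32; ch++)
        if (candidates & (UINT32_C(1) << ch))
            return ch;
    return -1;
}
```

For example, if channels 11 and 25 both have requests pending, channel 11 wins unless channel 25 (SSI1 TX in our case) has been marked high priority.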

 

Once we’ve determined which DMA channel we want to use and which peripheral it will serve (represented by the UDMA_CHANNEL_* macros), we need to set up how the DMA transaction is going to work. This is accomplished by calling the uDMAChannelControlSet function. The first argument to this function is the channel we want to configure, or’d with UDMA_PRI_SELECT, which tells the engine to use the channel’s primary control structure rather than its alternate one (the alternate structure, UDMA_ALT_SELECT, comes into play in the ping-pong and scatter-gather modes). The second argument is a bitwise or of the data size, source increment amount, destination increment amount, and arbitration size. Data size is pretty simple; it refers to how much data we’re going to move on each transfer. This is normally dictated by the peripheral you’re trying to interact with. We’re sending 8-bit SPI frames, so we use the UDMA_SIZE_8 macro. The source increment amount tells the engine how much, if at all, it should increment the source pointer by on each transfer. Since we’re reading from an 8-bit array, we use the UDMA_SRC_INC_8 macro so that the DMA engine advances the address it’s reading from by one byte each time it performs an 8-bit transfer. The destination increment amount tells the DMA engine how much to increment the write address by on each transfer. We want every DMA transfer to land in the same SPI data out register, so we use the UDMA_DST_INC_NONE macro to keep the DMA engine from incrementing the destination address.

Finally, the arbitration size. This one is kind of tricky: it tells the DMA engine how many transfers it should execute before performing bus arbitration again. The SPI peripheral has a transmit FIFO that is 8 entries deep, and it asks the DMA engine for a burst whenever at least 4 of those entries are empty, so an arbitration size of 4 (UDMA_ARB_4) fits our scenario. If we were using multiple DMA channels in parallel, this would cause the DMA engine to write 4 bytes of data from our source array into the SPI TX FIFO, then check to see if any other channels had data ready to transfer.
That way, the DMA engine won’t be wasting time waiting for the SPI TX FIFO to drain when it could be using that time to transfer data for other DMA channels. This can be tricky, though: a slow DMA transfer on another channel with a large arbitration size could hold the engine for longer than it takes our SPI TX FIFO to drain. In our example, the WS2812b LEDs would then see a sustained 0 until the other transfer finished, which they would interpret as an end of frame! If we were using this in a system where we were worried about such an event, we could mark our channel as high priority (uDMAChannelAttributeEnable(UDMA_CHANNEL_SSI1TX, UDMA_ATTR_HIGH_PRIORITY)) and keep the arbitration sizes of the other channels small, at the cost of latency to every other DMA channel in the system. Simply raising our own arbitration size isn’t the answer, since a burst larger than the free space in the TX FIFO would overrun it.
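The sample code above uses UDMA_ARB_4, and a small model shows why the arbitration size shouldn’t exceed what the SSI can accept in one burst. This is a hypothetical host-side simulation built on simplifying assumptions: the SSI asks for a burst whenever at least 4 of its 8 TX FIFO slots are free (as the TM4C123 datasheet describes), the DMA engine completes the whole burst without re-checking the FIFO, writes to a full FIFO are lost, and the shifter sends one byte per tick. `simulate_overruns` is a made-up name.

```c
#include <stddef.h>

#define TX_FIFO_DEPTH   8  /* SSI TX FIFO entries */
#define BURST_THRESHOLD 4  /* assumed: burst request when this many slots are free */

/* Simulate DMA moving `len` bytes into the TX FIFO with the given
   arbitration (burst) size. Each tick, the SSI requests a burst if enough
   slots are free, then shifts one byte out. Returns how many bytes were
   written into a full FIFO and therefore lost. */
static size_t simulate_overruns(size_t len, size_t arb_size) {
    size_t remaining = len, fifo = 0, lost = 0;
    while (remaining > 0 || fifo > 0) {
        if (remaining > 0 && TX_FIFO_DEPTH - fifo >= BURST_THRESHOLD) {
            size_t burst = arb_size < remaining ? arb_size : remaining;
            for (size_t i = 0; i < burst; i++) {
                if (fifo < TX_FIFO_DEPTH)
                    fifo++;
                else
                    lost++;  /* modeled as dropped */
            }
            remaining -= burst;
        }
        if (fifo > 0)
            fifo--;  /* shifter sends one byte */
    }
    return lost;
}
```

Under these assumptions an arbitration size of 4 never loses a byte, while 8 starts dropping data as soon as the first top-up burst lands in a half-full FIFO.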

The last function called, uDMAChannelTransferSet, is used to set up the rest of the details of the DMA transfer. Again, the first argument is the channel we want to configure, or’d with UDMA_PRI_SELECT. The second argument is the DMA mode you want to use. This article only covers the basic DMA mode (UDMA_MODE_BASIC), which is a straight, one-shot transfer to or from a memory buffer or register. The third argument is the source address; for our example, this is the array we want to transmit. The fourth argument is the destination address, which for us is the SSI data register (SSI1_BASE + SSI_O_DR). The final argument is the number of items that should be transferred before the DMA engine considers the transaction complete. Since we chose an 8-bit data size, this is the size of the array we’re transmitting, in bytes. A single transfer can move at most 1024 items, so our 720-byte array fits.
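Under the hood, both calls end up describing the transfer in a 32-bit channel control word. The packing below follows my reading of the DMACHCTL field layout in the TM4C123GH6PM datasheet (destination increment in bits 31:30, destination size in 29:28, source increment in 27:26, source size in 25:24, arbitration size as a power of two in 17:14, transfer count minus one in 13:4, and mode in 2:0); treat the bit positions as an assumption to verify against the datasheet. The `dma_control_word` helper and the enum names are hypothetical and not part of driverlib, which builds this word for you.

```c
#include <stdint.h>

/* Assumed field encodings (see lead-in; verify against the datasheet). */
enum { INC_8 = 0, INC_16 = 1, INC_32 = 2, INC_NONE = 3 };
enum { SIZE_8 = 0, SIZE_16 = 1, SIZE_32 = 2 };
enum { MODE_STOP = 0, MODE_BASIC = 1 };

/* Pack a channel control word into *out. arb_log2 is the arbitration size
   as a power of two (2 means 4 transfers); count is 1..1024 items.
   Returns 1 on success, 0 if a field is out of range. */
static int dma_control_word(uint32_t dst_inc, uint32_t dst_size,
                            uint32_t src_inc, uint32_t src_size,
                            uint32_t arb_log2, uint32_t count,
                            uint32_t mode, uint32_t *out) {
    if (count < 1 || count > 1024 || arb_log2 > 10 || mode > 7 ||
        dst_inc > 3 || src_inc > 3 || dst_size > 2 || src_size > 2)
        return 0;
    *out = (dst_inc << 30) | (dst_size << 28) |
           (src_inc << 26) | (src_size << 24) |
           (arb_log2 << 14) | ((count - 1) << 4) | mode;
    return 1;
}
```

For our SSI1 setup (no destination increment, 8-bit source increment, 8-bit items, arbitration size 4, 720 items, basic mode) this packs to 0xC000ACF1. The 10-bit count field is also where the 1024-item limit per transfer comes from.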

Setting up the initial DMA transfer is the most difficult part of this. Once those two functions are called (and assuming the uDMA controller itself has been enabled with uDMAEnable, given its channel control table with uDMAControlBaseSet, and that DMA has been enabled on the SSI side with SSIDMAEnable(SSI1_BASE, SSI_DMA_TX)), all you have to do is enable the DMA channel (uDMAChannelEnable(UDMA_CHANNEL_SSI1TX)), and the DMA engine will start moving data. An interesting gotcha to watch out for is that when the DMA transfer is complete, the interrupt handler for the peripheral assigned to the DMA channel will be called, as opposed to the DMA interrupt (assuming you have interrupts enabled). So in our case, once the SPI data array has been fully transmitted on the SPI peripheral by the DMA engine, the SPI interrupt handler will be called. From there, I can either have the interrupt handler inform the main code that another frame is ready to be transmitted, or I can just set up the DMA transfer to start again, which would cause any changes to the SPI data array to propagate to the LED strip without using any precious processor cycles.
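The two completion strategies described above can be sketched as a host-side state machine. Everything here is hypothetical and simplified: `dma_channel_t`, `led_driver_t`, `dma_arm`, `dma_tick`, and `ssi_isr` are made-up names, and `dma_arm` stands in for calling uDMAChannelTransferSet and uDMAChannelEnable again. What the model does reflect is the behavior from the text: when the last item moves, the channel stops and disables itself, and the peripheral’s handler runs rather than a DMA handler. On real hardware the "has the transfer finished?" check would be uDMAChannelModeGet(UDMA_CHANNEL_SSI1TX | UDMA_PRI_SELECT) returning UDMA_MODE_STOP.

```c
#include <stdbool.h>
#include <stddef.h>

#define FRAME_BYTES 720

typedef enum { MODE_STOP, MODE_BASIC } dma_mode_t;

/* Hypothetical model of a channel's state as the handler sees it. */
typedef struct {
    dma_mode_t mode;   /* MODE_STOP once the transfer has finished */
    bool enabled;      /* cleared by the engine when the transfer completes */
    size_t remaining;  /* items left in the current transfer */
} dma_channel_t;

typedef struct {
    dma_channel_t ch;
    bool auto_restart;         /* policy 1: re-arm straight from the handler */
    volatile bool frame_done;  /* policy 2: tell the main loop instead */
    unsigned frames_sent;
} led_driver_t;

/* Stand-in for uDMAChannelTransferSet(..., UDMA_MODE_BASIC, ...) + uDMAChannelEnable(). */
static void dma_arm(dma_channel_t *ch, size_t count) {
    ch->mode = MODE_BASIC;
    ch->remaining = count;
    ch->enabled = true;
}

/* Model of the SSI interrupt handler, where DMA completion shows up. */
static void ssi_isr(led_driver_t *d) {
    if (d->ch.mode != MODE_STOP)
        return;  /* some other SSI interrupt source */
    d->frames_sent++;
    if (d->auto_restart)
        dma_arm(&d->ch, FRAME_BYTES);  /* strip keeps refreshing from the array */
    else
        d->frame_done = true;          /* main loop decides when to resend */
}

/* One DMA transfer. When the last item moves, the channel stops, disables
   itself, and the peripheral's interrupt (not a DMA interrupt) is raised. */
static void dma_tick(led_driver_t *d) {
    if (!d->ch.enabled || d->ch.remaining == 0)
        return;
    if (--d->ch.remaining == 0) {
        d->ch.mode = MODE_STOP;
        d->ch.enabled = false;
        ssi_isr(d);
    }
}
```

Re-arming from the handler keeps the strip refreshing continuously; setting a flag instead hands the decision back to the main loop. Either way, basic mode needs a fresh transfer setup for every frame, because the channel reads as stopped once the transfer completes.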

As always, the source code for this example can be found on my GitHub. Currently, I only have a simple version of the library running, which is hard-coded to use the SSI1 peripheral, DMA channel 25, and pin PF1 for the SPI output. I’m currently working on turning this into a generic version of the library, which will allow running multiple SSI peripherals in parallel, a choice of TX pin, and the option to specify a callback function that is called every time a frame has finished being sent to the LEDs.

1 Comment

  1. Hello Njneer,
    I’ve been reading your blog, and I want to ask you about the library you mentioned at the end, the one you are working on that will allow you to read multiple SSI peripherals in parallel via uDMA. I’ve been having some problems with that in my current project. Could you give me some directions for the uDMA and SSI configuration, just so I can see if I am on the right track? No real code, just some directions, if you have time for that. Thank you in advance! If you don’t want to spend your time that way, thank you at least for this post, since it is very well written and gives developers a great starting point for uDMA.

    Kind regards,
    Jelena
