Christmas Ornament 2015

Christmas Ornament 2015 Writeup

Each year, Silicon Labs (my current employer) puts together a Christmas ornament to give to all our employees. I was asked to help out this year, and had the idea to use finger joints to make a mini-present out of circuit boards. When we were prototyping it, I couldn’t help but notice the bare copper board looked a lot like the brick blocks from the NES Super Mario Brothers game. So I started playing around with making another ornament in my free time.

There were a few major features I wanted the ornament to have: capacitive sensing for touch interaction, audio playback, a set of lamp LEDs inside the ornament, and an acrylic mock coin with an embedded LED above the ornament.

Capacitive Sensing

I was very, very happy with how easy the capacitive sensing portion of the project was.  A different group at Silicon Labs makes a microcontroller that’s designed to handle cap sense applications, and they release a library that made firmware integration really easy.  It’s a precompiled binary, which I’m not overly fond of, but the resulting API required only four function calls to make everything work, which was an overall pleasant experience:

//Configures all peripherals controlled by capsense
CSLIB_initHardware();
//Initialize the capsense firmware
CSLIB_initLibrary();
//Gather new capsense data
CSLIB_update();
//Check for a press
if (CSLIB_isSensorDebounceActive(0)) {
    /* do stuff */
}

Audio Playback

The easiest way to play audio from the MCU was by utilizing the PWM peripheral.  The PWM channel was set up to run in 8 bit mode, using the system clock (24.5 MHz) as its source.  A timer was then set up to trigger an interrupt at 22.1 kHz (the sampling frequency of the audio clips from the original NES Super Mario Brothers).  On each timer interrupt, the next PWM value was loaded from memory into the register that controls the PWM active width.  The digital PWM signal was then routed to a line buffer, which boosts the drive strength to ~25 mA.  I routed the audio output from the MCU to each of the four inputs on the line buffer I was using, and placed solder pads by the line buffer that can be used to enable each of the three extra channels (in case louder volume is desired).  For my ornament, I only ended up using the one channel, but theoretically all four channels can be used to significantly amplify the audio signal if the ornament will be placed in a loud environment.

Once the PWM signal has been generated and amplified, it is passed through an LC circuit that acts as a bandpass filter to get the digital PWM signal to more accurately represent an analog signal for the speaker.
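The timing figures in this section can be sanity-checked with a few lines of arithmetic.  This is only a sketch: it assumes the PWM counter and the sample timer are both clocked straight from the 24.5 MHz system clock with no prescaler, which the writeup doesn’t spell out.

```python
# Back-of-the-envelope timing for 8 bit PWM audio, using the figures from the text.
SYSCLK_HZ = 24_500_000    # system clock feeding the PWM counter
PWM_BITS = 8              # PWM resolution
SAMPLE_RATE_HZ = 22_100   # rate of the sample-update timer interrupt

# An 8 bit PWM counter wraps every 256 clocks, which sets the carrier frequency.
pwm_carrier_hz = SYSCLK_HZ / (1 << PWM_BITS)

# System clocks between two sample updates (assumes an unprescaled timer).
clocks_per_sample = SYSCLK_HZ / SAMPLE_RATE_HZ

# How many PWM periods are emitted for each audio sample.
pwm_periods_per_sample = pwm_carrier_hz / SAMPLE_RATE_HZ

print(f"PWM carrier: {pwm_carrier_hz / 1000:.1f} kHz")
print(f"Clocks per sample: {clocks_per_sample:.1f}")
print(f"PWM periods per sample: {pwm_periods_per_sample:.2f}")
```

Under those assumptions the PWM carrier sits at roughly 95.7 kHz, a little over four PWM periods per audio sample, which is the component the output filter has to remove.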

Audio Storage

The FM970 MCU I used has 32k of flash on chip, which equates to a bit more than one second of audio (when using 1 byte per sample and a 22.1 kHz sampling frequency).  This is obviously not nearly enough storage, so I decided to use an external I2C EEPROM to add more storage.  This presented me with three additional challenges.
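For reference, the storage math works out as follows.  This is a minimal sketch using only the numbers quoted above; the helper names are made up for illustration, and it ignores the flash taken up by the firmware itself.

```python
import math

FLASH_BYTES = 32 * 1024   # on-chip flash quoted in the text
SAMPLE_RATE_HZ = 22_100   # samples per second
BYTES_PER_SAMPLE = 1      # 8 bit unsigned samples


def seconds_of_audio(storage_bytes, sample_rate_hz=SAMPLE_RATE_HZ,
                     bytes_per_sample=BYTES_PER_SAMPLE):
    """Playback time that fits in the given amount of storage."""
    return storage_bytes / (sample_rate_hz * bytes_per_sample)


def bytes_for_clip(seconds, sample_rate_hz=SAMPLE_RATE_HZ,
                   bytes_per_sample=BYTES_PER_SAMPLE):
    """Storage needed for a clip of the given length."""
    return math.ceil(seconds * sample_rate_hz * bytes_per_sample)


print(f"{seconds_of_audio(FLASH_BYTES):.2f} s fit in on-chip flash")
print(f"{bytes_for_clip(1.0)} bytes needed per second of audio")
```

The same two helpers can be used to size the external EEPROM for a given set of clips.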

EEPROM writing

I wasn’t able to find any simple lab tools to just take an array of data and write it over I2C (at least, not on a DIY budget).  To fix this, I added UART pins to the MCU, and came up with a simple protocol that can be used to send data from a host PC to the MCU over UART, which the MCU then translates to I2C writes to the EEPROM:

# Protocol:
# [0] denotes LSB. Host will assume idle on start, and send 0xFE when it is
# ready to send a message. Device will respond with 0xED when it is ready to be
# fed. Host will then send header, then data, then footer (0xF00D). Device will respond
# with 0xBEEF when it thinks the message is done, or 0xBAAD followed by one byte
# of error code if it got the message but saw something squirrely.
# Idle->Active: Send 0xFE, wait for 0xED from device
# Header: chunk size[0]
# chunk size[1]
# chunkLocation[0]
# chunkLocation[1]
# chunkLocation[2]
# chunkLocation[3]
# Data: data[0]
# ...
# data[chunk size - 1]
# Footer: 0xF0, 0x0D
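The host side of this protocol might look something like the sketch below.  It is not the author’s script: the function names are hypothetical, `port` is assumed to be any object with `write(bytes)` and `read(n)` methods (a pyserial `Serial` object would fit), and the two-byte replies are assumed to arrive high byte first, matching the footer order.

```python
import struct

HOST_READY = 0xFE                # host -> device: "I have a message"
DEVICE_READY = 0xED              # device -> host: "go ahead"
FOOTER = bytes([0xF0, 0x0D])     # end-of-message marker
ACK_OK = bytes([0xBE, 0xEF])     # assumed byte order: high byte first
ACK_ERR = bytes([0xBA, 0xAD])    # followed by one error-code byte


def build_chunk(location, data):
    """Return header + data + footer for one chunk.

    The header is the chunk size (2 bytes) and chunk location (4 bytes),
    both least significant byte first, as in the protocol notes.
    """
    if len(data) > 0xFFFF:
        raise ValueError("chunk size must fit in 16 bits")
    header = struct.pack("<HI", len(data), location)
    return header + bytes(data) + FOOTER


def send_chunk(port, location, data):
    """Run one idle -> active -> message -> reply exchange."""
    port.write(bytes([HOST_READY]))
    if port.read(1) != bytes([DEVICE_READY]):
        raise IOError("device did not answer 0xED")
    port.write(build_chunk(location, data))
    reply = port.read(2)
    if reply == ACK_OK:
        return
    if reply == ACK_ERR:
        code = port.read(1)
        raise IOError(f"device reported error code 0x{code.hex()}")
    raise IOError(f"unexpected reply: {reply!r}")
```

Keeping the framing in its own function makes it easy to check the byte layout without any hardware attached.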


EEPROM File System

I wanted to be able to generically determine where the files were, so I came up with a simple file system to allow the MCU to determine at runtime where all the files on the EEPROM lived and how to interact with them.  I allocated the first page of the EEPROM as a file table, using the following protocol to pick things up from there:

# FS Protocol:
# header is always one EEPROM page, EEPROM_ALIGNMENT bytes
# each block has a 32 bit start address and a 32 bit size.
# byte 00: number of entries in header block (files in system)
# byte 01: reserved
# byte 02: reserved
# byte 03: reserved
# byte 04: block 0 start address (byte 0, LSB)
# byte 05: block 0 start address (byte 1)
# byte 06: block 0 start address (byte 2)
# byte 07: block 0 start address (byte 3, MSB)
# byte 08: block 0 size in bytes (byte 0, LSB)
# byte 09: block 0 size in bytes (byte 1)
# byte 10: block 0 size in bytes (byte 2)
# byte 11: block 0 size in bytes (byte 3, MSB)
# byte 12: block 1 start address (byte 0, LSB)
# byte 13: block 1 start address (byte 1)
# byte 14: block 1 start address (byte 2)
# byte 15: block 1 start address (byte 3, MSB)
# byte 16: block 1 size in bytes (byte 0, LSB)
# byte 17: block 1 size in bytes (byte 1)
# byte 18: block 1 size in bytes (byte 2)
# byte 19: block 1 size in bytes (byte 3, MSB)
# byte EEPROM_ALIGNMENT: block 0 byte 0
# etc etc etc etc
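A file table in this layout can be generated and read back with a few lines of Python.  This is a sketch, not the author’s generator script: the page size of 64 bytes is a placeholder for EEPROM_ALIGNMENT, the function names are hypothetical, and starting every file on a page boundary with zero padding is an assumption the notes above don’t confirm.

```python
import struct

EEPROM_ALIGNMENT = 64   # assumed page size; use the real EEPROM's page size
ENTRY_OFFSET = 4        # byte 0 is the file count, bytes 1-3 are reserved
ENTRY_SIZE = 8          # 32 bit start address + 32 bit size, LSB first


def _align(n):
    """Round n up to the next multiple of EEPROM_ALIGNMENT."""
    return -(-n // EEPROM_ALIGNMENT) * EEPROM_ALIGNMENT


def build_image(files):
    """Pack a list of bytes objects into header page + file data."""
    max_entries = (EEPROM_ALIGNMENT - ENTRY_OFFSET) // ENTRY_SIZE
    if len(files) > max_entries:
        raise ValueError("too many files for a one-page header")
    header = bytearray([len(files), 0, 0, 0])
    body = bytearray()
    address = EEPROM_ALIGNMENT          # block 0 starts right after the header
    for blob in files:
        header += struct.pack("<II", address, len(blob))
        padded = _align(len(blob))
        body += bytes(blob) + bytes(padded - len(blob))
        address += padded
    header += bytes(EEPROM_ALIGNMENT - len(header))
    return bytes(header + body)


def read_table(image):
    """Return [(start_address, size), ...] the way the MCU would at runtime."""
    count = image[0]
    return [struct.unpack_from("<II", image, ENTRY_OFFSET + ENTRY_SIZE * i)
            for i in range(count)]
```

For example, `read_table(build_image([b"abc", b"\x01" * 70]))` gives the start address and size of each of the two files.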


Once I had a file system protocol in place, I wrote a few Python scripts: one to take a 16 bit .wav file and convert it to 8 bit unsigned values (with a 22.1 kHz sampling frequency) and store those values as a .hex file, a script to take a number of hex files and use them to generate a single hex file containing a formatted file system, and a script to take a file system and transfer it over UART to the MCU.  This gave me a quick, simple workflow to take audio files and transfer them in a playable format to the MCU.
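The first of those steps, going from 16 bit signed samples to 8 bit unsigned ones, can be sketched as below.  This is an illustration rather than the original script: it assumes a mono .wav that is already at the target sample rate (no resampling is done), the function names are hypothetical, and writing the result out as a .hex file is left out.

```python
import array
import sys
import wave


def pcm16_to_u8(samples):
    """Convert signed 16 bit samples to unsigned 8 bit.

    Keeps the top 8 bits of each sample and shifts the midpoint from 0 to 128.
    """
    return bytes((s >> 8) + 128 for s in samples)


def convert_wav(path):
    """Read a 16 bit mono .wav file and return 8 bit unsigned sample data."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected a 16 bit mono .wav file")
        raw = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()   # .wav sample data is little-endian
    return pcm16_to_u8(samples)
```

Dropping the low byte is the simplest possible conversion; dithering or rounding would give slightly lower noise at the cost of more code.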

Real time EEPROM reading

The final challenge to playing audio was making some customizations to an I2C driver.  I was using a sample driver from a Silicon Labs app note that assumed all transactions would take place at once.  You give it an address, a buffer, and a size, and it would fill the buffer with data as fast as possible.  I had to make a slight modification to allow the driver to instead start a read transaction, read a single byte, and hold the bus idle (as opposed to ending the transaction) until another API call informed the driver that the data had been digested and the next byte could be read.  Nothing horrible, but not trivial.

LED Routing

With cap sense working and audio playing, the only remaining feature was getting the LEDs to turn off and on at the right times.  My idea for the enclosure was to have portions of the sides of the cube made with bare FR4 (no copper, no soldermask), so that if I turned on an LED inside the cube, it would give a lampshade effect, allowing light to only pass through the portions of the cube that didn’t have copper.

In order to get electrical signals from the MCU at the bottom of the cube to the LEDs at the top of the cube, I added four exposed pads to each panel of the cube.  I then used small bits of stiff wire to connect the pads on the top and bottom of the cube to the pads on one of the sides.  This gave both electrical conductivity and structural integrity to the cube.  Four LEDs provide the internal “lamp” light, with an additional LED used to illuminate the coin on top of the cube.  To make sure the LEDs don’t draw too much current from the GPIOs, a transistor is used to switch them on and off.


Each board was milled with finger joints along the edges, which allowed the sides of the cube to be press fit together.  The top, front, and bottom of the cube are held in place with the solder joints mentioned above.  Two of the remaining sides were held in place with superglue.  The last side was meant to be press fit, so that it could be removed to access the debug headers and batteries.  The press fit did not hold over repeated uses, though, so an alternate method had to be used to keep the last panel in place.  I ended up placing small pieces of solder wick on the exposed pads in order to create a simple hinge to hold the panel in place.

Once the board was assembled and the components soldered, the last step was to create a coin to suspend above the cube.  This was done using a laser cutter to etch and cut a piece of 1/4 inch clear acrylic and a drill to create a mounting hole for the LED.

Once the LED was soldered in place, a Christmas ornament hook was used to suspend the box from a branch of the tree.  Overall, the effect was exactly what I was going for, and I was very pleased with the final product.

Binary Watch Production

I’ve been spending a decent bit of my free time lately on my binary watch project.  I have a full writeup that I added to the projects section (which goes into more detail about the hardware and software used, and contains information about my github and storefront), but I figured I’d also do a post on the production flow I’ve been using, since I think that’s kind of cool (thus reaffirming that I am an embedded systems nerd : )

To start with, I made the conscious decision at the design phase of the project to use a QFN package for the microcontroller in order to save space and force myself to improve my soldering skills and tools.  This package has two big hurdles to overcome.  For one, it does not have leads extending from the package like I’m used to with, say, a TSSOP part.


Instead, the exterior of the package has exposed pieces of metal on the outside and underside of the package that the solder must adhere to.  The second big hurdle is that the package has a very large ground pad in the center.  Since the ground pad sits underneath the package where a soldering iron can’t reach it, it is nearly impossible to get solder to flow on it without using either a hot air rework station or a reflow oven.


To overcome these hurdles, I ordered a stencil along with my circuit board when sending the files off to China for fabrication.  The stencil is basically just a 13 x 15 cm piece of metal that has holes cut to align with the exposed solder pads on the board.  The stencil allows you to very quickly put solder paste on the entire board, by using a spatula like piece of metal to spread paste over all the pads on the board at once.  At only $20, this was a very good deal in terms of money spent to time saved.  For my first attempt (while waiting for the boards and stencils to arrive), I picked up a stencil for just the QFN part and used a toothpick to apply solder paste to the remaining pads.  I had an older version of the PCB sitting around that worked well for practice.  The process took about 45 minutes, and it was very obvious at the end which paste was applied by the stencil and which was applied by hand.


It’s important to note here that it’s perfectly fine if the solder paste is all over the place.  The circuit board is covered with a solder resistant mask, so when the board heats up, the solder will flow to the nearest exposed pad.  Once the board was covered in paste, it took another half hour or so to manually place all of the components.  I was worried at first about making sure that all of the (very tiny) pins on the QFN package were aligned with the solder pads, but it turns out that the ground pad is so large that once the solder paste gets hot and liquefies, it exerts enough force on the part to pull it into the right orientation (assuming it’s not off too horribly to start).

Once the board was covered in solder paste and the parts were placed, it was time to get the board hot enough to flow the solder.  To do this, I picked up a few new toys.  From craigslist, I found a toaster oven that needed a new home.  From Amazon, I picked up a PID controller that will read the temperature in the oven via a Type K thermocouple and carefully pulse the oven off and on to achieve and maintain a stable temperature.  I then picked up a relay and heat sink that were rated high enough to handle the high amperage that the heating coils of the toaster oven pull.  Put these together, and you have a basic reflow oven for under $75 that you can put together in about an hour.

Oven running and carefully controlled, it was time to drop the board in and see how it handled the heat.


I dropped the board onto the rack, then shot for a target temperature of 180 degrees Celsius.  Once I observed the paste start to flow on the pins of the QFN part, I started a two minute timer.  At first, I pulled out the board immediately, not realizing that the paste that was sandwiched between the board and the QFN’s center ground pad would need much more time to heat up before it would flow, hence the additional two minutes.

The board looked pretty good, but I wanted to be certain that nothing went wrong with those tiny QFN pads.  I plugged in a USB microscope that I got off ebay for ~$15, and was able to get a closer look at the pads.


I don’t think those are supposed to be bridged like that…


Neither should those… and that capacitor is probably supposed to be touching that pad.

Easy fixes to make, but important fixes to make.  Fire up the soldering iron with a tiny tip attached, and reflow the pads.

Once the components were all in place, it was time to solder the back of the board.  Since there were just a few 0603 components on the back, I just soldered everything by hand.


By the end of the night, I had a fully working watch that took me about three hours to assemble.  Nice for a one-off, but not so great if you’re looking at making, say, eight of them at once.


Flash forward about a month and a half, and I had received my batch of boards and real stencil.  I quickly found that it took an unpleasant amount of time to get the stencil perfectly aligned to the board, and it was difficult to keep that alignment when swiping the spatula and solder paste across the stencil.  Fortunately, I’ve been playing around with a CNC router lately, so I made a very simple mount for them.


With this, the board and stencil are automatically aligned, allowing me to very quickly apply paste to multiple boards.  I settled on 8 boards for my trial run, which took me about 45 minutes to get paste and parts on.  Next, into the oven.


I still had to do some manual cleanup on the pins (especially the ones near the RTC crystal… for the next batch, I’m definitely going to use less paste on that section to try and prevent bridges).  After cleanup, it was time to program the boards.  I’m using a PIC microcontroller here, so I was able to utilize the Programmer-To-Go functionality of the PICkit 3 to just power the programmer and hit the button to load firmware to a new board.


All told, it took me about an hour and a half to get eight boards done.  This is not including the bottom of the board, which will probably add an extra half hour to the process.

Overall, I am pretty happy with my process flow.  This is the first time I’ve tried to do this many boards with parts this small out of my garage, and I’m happy with the quality and time commitment required to do batches of ~10 boards.  Plus, I got to take apart a toaster oven, and that’s always fun :)


New hardware

Coming soon:

toaster oven  + controller + relay + heat sink + solder paste = easy soldering

Should be a fun post :)

Intro to DMA


I’ve been playing around with WS2812b LEDs lately, and have been using my trusty TI Tiva C Series and Stellaris Launchpads as a controller for them.  The first thing I did with these was to use the SPI peripheral to drive the data lines on the LED strip.  This left me with sample code that will keep track of a 720 byte array of SPI data to send on the bus, an index into that array, and an interrupt handler that will get called whenever the SPI output buffer has room for more data.  The problem with this approach is that the SPI buffer is pretty shallow (8 bytes), which means a non-trivial amount of processor time is spent in my interrupt handler just grabbing data from a static memory location and transferring it into the SPI output register.  This presents us with an ideal opportunity to talk about the micro direct memory access, or uDMA, engine.  If you don’t care about my attempts at explaining how DMA works on these microcontrollers and just want to reference my WS2812b over SPI using uDMA library, feel free to ignore everything past this and head over to my github.

First, some basics: what is a DMA engine?  In its simplest form, a DMA engine is a peripheral that has access to the address bus and data bus in a chip, and the ability to initiate memory transfers.  It has a register interface just like any other peripheral that can be used by the processor to tell the DMA engine where to read from, where to write to, and how to set up the transfer.


In my example, I can set up the DMA engine to start reading data from a known memory location (where my output data array lives) and write it to a known memory location (the SPI data out register) whenever certain conditions occur (the SPI transmit buffer is empty).  This means that my processor is free to interface with other peripherals, crunch numbers, or even enter a sleep state while the DMA engine handles the trivial business of copying all the data.
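To put a rough number on the work being handed off, the count of interrupts needed per frame without DMA can be estimated from the array size and FIFO depth.  This is a best-case sketch that assumes the handler refills the whole 8 entry FIFO every time it runs; the function name is made up for illustration.

```python
import math

FRAME_BYTES = 720   # size of the SPI output array from the text
FIFO_DEPTH = 8      # SPI transmit FIFO depth from the text


def refill_interrupts(frame_bytes=FRAME_BYTES, bytes_per_interrupt=FIFO_DEPTH):
    """Interrupts needed to push one frame if each one moves a fixed byte count."""
    return math.ceil(frame_bytes / bytes_per_interrupt)


print(refill_interrupts())        # best case: full FIFO refilled each time
print(refill_interrupts(720, 1))  # worst case: one byte moved per interrupt
```

With the DMA engine doing the copying, that drops to a single interrupt when the whole transfer completes.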

So now let’s look at how to set up the DMA engine itself.  Fortunately, we have an incredibly well designed software interface (driverlib) that, if past experience is any indication, will be so intuitive that we can get by entirely by looking at nothing more than sample code!


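uDMAChannelControlSet(UDMA_CHANNEL_SSI1TX | UDMA_PRI_SELECT, UDMA_SIZE_8 | UDMA_SRC_INC_8 | UDMA_DST_INC_NONE | UDMA_ARB_8);
uDMAChannelTransferSet(UDMA_CHANNEL_SSI1TX | UDMA_PRI_SELECT, UDMA_MODE_BASIC, pui8SPIData, (void *)(SSI1_BASE + SSI_O_DR), ui16DataSize);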

Hrm. Maybe not.  The downside to uDMA is that, while it is very flexible and powerful, this unfortunately leads to complexity.  The API is pretty straightforward once you read up on the documentation, so let’s start digging in!

The first argument to all of these functions is the DMA channel number.  It is pretty common to want to use DMA for multiple peripherals at the same time, especially in a system where you’re trying to crank out as much work from the processor as possible.  Just like the processor, the DMA engine can only transfer one chunk of data at a time, but having multiple DMA channels allows the engine to keep track of multiple transactions to multiple peripherals; deciding which channel gets serviced next is called DMA channel arbitration.  In the Stellaris and Tiva C microcontrollers, the DMA engine has 32 separate channels.  Each of these channels has four unique hardware peripherals it can be used to interface with, or can be used to do a software DMA transfer (moving data from one place in memory to another, as opposed to moving between memory and a peripheral).  The datasheet for the microcontroller found on the launchpad contains a table of all the possible configurations for all the possible DMA channels (Table 9-1 in the TM4C123GH6PM datasheet).

DMA channels


Once we’ve determined which DMA channel we want to use and what peripheral we want to use that channel for (represented by the UDMA_CHANNEL_* macros), we need to set up how the DMA transaction is going to work.  This is accomplished by calling the uDMAChannelControlSet function.  The first argument to this function is the channel we want to configure or’d with UDMA_PRI_SELECT or UDMA_ALT_SELECT, which picks the channel’s primary or alternate control structure (basic mode only needs the primary one).  The second argument is a binary or of the data size, source increment amount, destination increment amount, and arbitration size.

Data size is pretty simple; it refers to how much data we’re going to move on each transfer.  This is normally dictated by the peripheral you’re trying to interact with.  The SPI data out register is 8 bits wide, so we use the UDMA_SIZE_8 macro.

The source increment amount tells us how much, if at all, we should increment the source pointer by on each transaction.  Since we’re reading from an 8 bit array, we use the UDMA_SRC_INC_8 macro to cause the DMA engine to increment the address it’s reading from by 8 bits each time it performs an 8 bit transfer.

The destination increment amount tells the DMA engine how much to increment the write address by on each transfer.  We’re writing into the SPI data out register, and want each DMA transfer to write into that register, so we use the UDMA_DST_INC_NONE macro to keep the DMA engine from incrementing the destination address.

Finally, the arbitration size.  This one is kind of tricky: it tells the DMA engine how many transfers it should execute before performing bus arbitration.  The SPI peripheral has a transmit FIFO that is 8 entries deep, so it makes sense to use an arbitration size of 8 for our scenario.  If we were using multiple DMA channels in parallel, this would cause the DMA engine to write 8 bytes of data from our source array into the SPI TX FIFO, then check to see if any other channels had data ready to transmit.  That way, the DMA engine won’t be wasting time waiting for the SPI TX FIFO to drain when it could be using that time to transfer data for other DMA channels.

This can be tricky though, as it allows for the possibility of a slow DMA transfer on another channel, with a large arbitration size, taking longer to complete than it takes our SPI TX FIFO to drain.  In our example, that would cause the WS2812b LEDs to see a sustained 0 until the slower DMA transfer completed, which would be interpreted as an end of frame!  If we were using this in a system where we were worried about such an event, we could set the arbitration size to be the length of our entire SPI transmit array, which would guarantee our DMA transfer wouldn’t get interrupted (at the cost of latency to every other DMA channel in the system).

The last function called, uDMAChannelTransferSet, is used to set up more details about the DMA transfer.  Again, the first argument is the channel we want to configure or’d with UDMA_PRI_SELECT or UDMA_ALT_SELECT to pick the primary or alternate control structure.  The second argument is the DMA mode you want to use.  This article just covers the basic DMA mode, which is just a straight transfer to or from single, static memory locations or registers.  The third argument is the source address.  For our example, this is going to be the array we’re wanting to transmit.  The fourth argument is the destination address, which for us is the SSI transmit register.  The final argument is the number of items (bytes, for our 8 bit transfers) that should be transferred before the DMA engine considers the transaction complete.  For us, this is the size of the array we’re transmitting.

Setting up the initial DMA transfer is the most difficult part of this.  Once those two functions are called, all you have to do is enable the DMA channel (uDMAChannelEnable(UDMA_CHANNEL_SSI1TX)), and the DMA engine will start moving data.  An interesting gotcha to watch out for with this is that when the DMA transfer is complete, the interrupt handler for the peripheral assigned to the DMA channel will be called, as opposed to the DMA interrupt (assuming you have interrupts enabled).  So in our case, once the SPI data array has been fully transmitted on the SPI peripheral by the DMA engine, the SPI interrupt will be called.  From here, I can either set up the interrupt handler to inform the main code that another frame is ready to be transmitted, or I can just set up the DMA transfer to start again, which would cause any changes to the SPI data array to propagate to the LED strip without having to use any precious processor cycles.

As always, the source code for this example can be found on my github.  Currently, I only have a simple version of the library running, which is hard coded to use the SSI1 peripheral, DMA channel 25, and pin PF1 for the SPI output.  I’m currently working on splitting this into a generic version of the library, which will allow for running multiple SSI peripherals in parallel, customization of choice in TX pin, and the option to specify a callback function used every time a frame is done being sent to the LEDs.

Tiva C Series/Stellarisware WS2812b library

I’ve got a pretty cool new library just about ready to publish on github.  I’ve been playing around with WS2812b LEDs lately, and have been using my trusty TI Tiva C Series and Stellaris Launchpads.

The first thing I did with these was to hijack the SPI peripheral to drive the data lines on the LED strip.  I thought myself rather clever for realizing that the proprietary one wire protocol the 2812s use could be functionally implemented using the SSI protocol given the right combination of bit packing and frequency setting, but a bit of googling showed me that this is pretty normal nowadays, especially on the Arduino platform.  Ah well… If you haven’t seen it, it’s novel to you?  Still feel accomplished for that, but nothing groundbreaking enough to post about (hence why I haven’t mentioned it here, despite having it working since last April).
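For anyone who hasn’t seen the trick, here is one common way the bit packing can work.  This is a sketch of a widely used encoding, not necessarily the one in this library: it assumes each WS2812b data bit is stretched into three SPI bits (110 for a 1, 100 for a 0) with the SPI clock set near 2.4 MHz so that three SPI bits span one 1.25 µs LED bit, and the function name is made up for illustration.

```python
def encode_ws2812(pixels):
    """Expand (r, g, b) tuples into SPI bytes, 3 SPI bits per LED data bit.

    A 1 becomes 0b110 (long high pulse) and a 0 becomes 0b100 (short high
    pulse).  Colors go out in green, red, blue order, MSB first.
    """
    bitstream = 0
    nbits = 0
    for r, g, b in pixels:
        for byte in (g, r, b):
            for i in range(7, -1, -1):
                symbol = 0b110 if (byte >> i) & 1 else 0b100
                bitstream = (bitstream << 3) | symbol
                nbits += 3
    # 24 data bits * 3 = 72 SPI bits = 9 whole bytes per pixel, so no padding.
    return bitstream.to_bytes(nbits // 8, "big")


frame = encode_ws2812([(255, 0, 0), (0, 255, 0), (0, 0, 255)])
print(len(frame), "SPI bytes for 3 pixels")
```

The SPI data line also needs to idle low between frames so the strip sees its reset gap, which is a matter of peripheral configuration rather than packing.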

What to do next then?  Well, my library for the 2812s relied on waiting for the SSI interrupt to signal another byte was ready for the bus, filling the byte, and waiting for the next TX done signal to come in.  It was ok for parallel processing in the sense that it at least didn’t just sit in a spinloop while the data was transferring, but I still felt my solution was rather inefficient.  Enter the uDMA engine.  My sample code is now to the point that the uDMA engine is set up to be constantly running on the transmit array.  This means you can update the output color array at any point in software, and the uDMA engine will cause the change to “automatically” propagate onto the LED strip, with an extremely minimal amount of software overhead.  The only processor overhead for this method is an interrupt subroutine executing each time the LED strip is ready to be refreshed, which consists of about eight lines of code.

My next step is going to be integrating my uDMA sample code into a proper uDMA library.  I really like the writeup Adafruit has on interrupts, and I think it would be cool to do a similar writeup on what DMA is and how it works.  So step one will be getting a proper uDMA based library up on github.  Step two, use that library as the basis for a post on what DMA is, how it works, and how to set up the uDMA engine on a Stellaris/Tiva C microcontroller.

On a more personal note, I’ve gotta say it was unspeakably enjoyable getting this up and running.  From starting to look at datasheets to having a constantly changing rainbow pattern sent out to the LED strip via uDMA, it took about two hours to get everything up and running.  Two hours, start to finish, to implement something pretty damn cool.  I’m still working on finding my footing in the post-silicon validation job I took back in March for an ARM based server chip, and it was so, so nice to just open up a well written datasheet, grab some documentation for a software library written by folks who specialize in customer facing software libraries, and knock out some application code.  I miss that :(

The pipeline is a flowin

Couldn’t have timed that better if I tried… only a few days after sending out the layout for my next project, the prototype boards for the smart business card project came in.  The shipping took about 2 weeks longer for these than the last order I placed at DirtyPCBs, but they made up for it by sending about eight extra boards, which was a nice touch :)

Hopefully I’ll have one populated and functional by the end of next week, at which point I’ll start working on a proper writeup.  In the meantime, here’s a few pics of the bare PCBs.


The board is composed of a bottom piece, where all the components reside,


and a top piece, which has holes milled out for all the high profile parts.


Each piece has a large exposed trace around the edge, which will be used to solder the boards together.  For now, I’m using the design Mathieu Stephan came up with, which he posted about on his blog.

Once I get this up and running, I’m planning on making a few design tweaks and personalizing it a bit for myself.  If nothing else, I’m hoping this project will be a good introduction for me to the ATMega32 line of microcontrollers, which I haven’t yet worked with.

Done with the old

I finished pulling in the old writeups I have for existing projects.  That means the marble maze and frequency analyzers are now up.  Now, it’s time to start with the new writeups.  I have one project close enough to done that I can make a writeup for it, with a second hopefully getting to that point by the end of September.  Fun times.

Hello world!

Welcome to my web page.  There isn’t much here yet.  Eventually, I’ll use this as a location for project summaries and videos, Eagle/KiCad part libraries I use, and maybe even a place to purchase some of the random boards I’ve designed that have been useful.

For now, I’m working on making writeups for my projects, which can be found in the menu along the top of this page.