The Cosmic Inquirer: Short-Term Memory Loss

Yesterday I ran into problems fitting the firmware into a RAM small enough for the entire design to fit onto the FPGA. Some possible lines of attack:

1. Tweak the RAM size a little more, so that everything (just barely) fits. Two problems with this: (a) we will be very vulnerable to stack overflows; (b) there may not be any room for additional input-capture datapaths (for multiple PMTs, not to mention the timing input).
2. Create a ROM (for program text) and a separate RAM (for working memory), hoping that this will map onto the FPGA resources more efficiently. Risks: (a) I'm not sure yet how easy this is to set up; (b) it may make updating the firmware slightly more difficult; (c) it may not save that much space.
3. Simplify our software, taking out unnecessary features, such as its ability to handle simultaneous command-line input from both the JTAG debug port (stdin) and the main serial port (uart_0). Cons: I always hate to remove potentially useful features, if we can avoid it...
4. Reduce resource usage of the input-capture datapath. This can be done by, for example, pre-compressing the pulse data prior to buffering, using a maximum pulse-width assumption. The present resource usage should not be too large (mainly just a couple K of RAM for the FIFO, I think), but I should check this to make sure. Cons: This requires pretty extensive changes to the gelware that we have already spent a long time developing & testing; it will set us back a couple of weeks at least.
5. Reconfigure the Nios CPU in a way that might reduce its own resource usage, and/or the size of the program code compiled for it. This is a relatively quick change, but I don't yet know how much space I can save in this way.
6. Change the set of libraries used in the C code, to reduce the space they take up. However, using fewer libraries makes our own programming job more difficult.

I think I will start by attempting 1, 5, and 2, in roughly that order.

Darrel and David should be here later and can finish the output stub. Maybe by then I will have worked around the space issue sufficiently for now, so that we can integrate all the code together and test.

I realized that, right now, with the waveform data not yet compressed, everything ought to work fine even for very long input pulses. At 1 Hz on the Tektronix wave generator, the minimum pulse width is 10 us, which should translate to a pulse width, as measured by our system, of about 5,000 of our 2-ns time units. This will allow us at least to test the icdp_driver using pulses that are rare enough so that they don't overwhelm the throughput capacity of the system, generating diagnostic messages at a watchable rate.
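The unit conversion behind that 5,000 figure can be written out as a tiny helper. This is only a sketch: the function name and the `TICK_NS` constant are made up for illustration, and the only facts taken from above are the 2-ns time unit and the 10 us minimum pulse width.

```c
#include <stdint.h>

/* Period of one time unit of the input-capture time base, in nanoseconds.
 * The 2-ns value comes from the text above; the name is hypothetical. */
#define TICK_NS 2u

/* Convert a pulse width in nanoseconds to whole time units (rounds down).
 * Hypothetical helper, not part of the actual driver. */
uint32_t pulse_width_ticks(uint32_t width_ns)
{
    return width_ns / TICK_NS;
}
```

Calling `pulse_width_ticks(10000u)` for the 10 us (10,000 ns) minimum pulse gives 5000 time units, which matches the estimate above.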

OK, memory size 72K didn't quite fit... Next to try: 70K. However, let me first look at simplifying the CPU and setting up separate instruction and data memories, since that might give us enough slack to avoid stack overflow hazards.

I'm going to try the following configuration:

Nios II/s, 64K ROM, 32K RAM. (This ROM size is plenty large to hold the current program code, while the RAM is plenty large to hold the stack & heap.)
Instruction master connected to ROM, but not to RAM. (So hopefully all program code will be located in ROM, leaving the maximum RAM free for user data.)
Data master connected to both ROM and RAM.

I'm hoping this will help the fitter, since much of the ROM can be implemented in LUTs rather than using up the scarce RAM resources on the FPGA.

OK, system generation was successful; now let's try rebuilding the firmware.

It compiled, but the linker failed because it is trying to locate the .text section in the RAM region. Let's see if we can change the linker settings.

This is problematic, because the IDE crashes whenever I select "C/C++ Build" in the Properties dialog.

Trying the newer Eclipse environment instead. It pops up a bunch of Vista administrator authorizations, annoying! Also, there was a syntax error in a shell script while doing "make clean."

Oh well, the hello world template compiled anyway. And, the default BSP seems to locate the program text in the ROM memory region as desired. Changing ".rodata" to also be in that region. Now regenerating the BSP and rebuilding. Now we have 24K free for stack+heap in the RAM. Sweet. Ah, but damn! It doesn't fit on the device. I'm not sure that splitting the ROM and RAM even helped, especially since the unused ROM may be taking up space...

Now trying a 24K RAM (8K smaller), which will hopefully still leave us with 16K of application RAM. Still doesn't fit.

Let's try reducing the ROM size. According to the linker, program size is 55K, but that was just for the Hello World template. Let's throw our own code in there...

I had to update the interrupt module to use the newer interrupt API, since the new tool makes it difficult to get at the old API.

New size of program code is 67K, so it spills out of the 64K ROM, and leaves only 13K for working memory. So, no point now in making the ROM smaller.

Now making the ROM 68K and the RAM 20K, which should leave us with 12K. No fit.

David finished his code and we started working together to figure out what to do.

Since splitting up the ROM and RAM didn't help the fitting, I merged them back together.

After a while, we finally figured out that we could pass the "-Os" (optimize for code size) option to gcc in the "Nios II BSP Properties" for the BSP project (FEDM_ctrl_fw_bsp) and the "Nios II Application Properties" for the user project (FEDM_ctrl_fw). This helped quite a bit, and now we have about 5K of breathing room.

The software runs now. The interrupts are working, and the driver seems to be somewhat working, although we have some debugging to do.

We have an issue that although sometimes we get outputs that are *nearly* right, like:

6                      <--- number of thresholds crossed
0 5016                 <--- rise/fall times for threshold #1
1 5018                 <--- etc...
1 5005
6 5001
5 4998
-1000000956 -999995970

where the values near 5,000 correspond well to the actual trailing-edge times of our input pulse, the value of the last threshold always seems to be way off. Furthermore, more often we get almost all bad values, like:

6
0 16782369
16777218 16782355
16777219 16782349
16777220 16782349
16777222 16782342
360729523 360729524

In this example, bit 24 seems to have been toggled in the first 5 lines, and the last line (for threshold 6) has some other problems. That is perhaps unsurprising, since the last line always has extra problems, and we also noticed problems with threshold 6 previously, in our gelware-only testing, when the pulses were very wide.
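One quick way to test the bit-24 hypothesis is to clear that bit in the mangled values and see whether plausible numbers come out. The sketch below does just that; the helper names are made up for illustration and are not part of the driver, and it is a diagnostic aid rather than a fix.

```c
#include <stdint.h>

/* Bit position suspected of being spuriously set (2^24 = 16777216). */
#define SUSPECT_BIT 24u

/* Returns 1 if the suspect bit is set in a captured time value, else 0.
 * Hypothetical helper for inspecting packet dumps. */
int has_suspect_bit(uint32_t t)
{
    return (int)((t >> SUSPECT_BIT) & 1u);
}

/* Returns the time value with the suspect bit cleared. */
uint32_t clear_suspect_bit(uint32_t t)
{
    return t & ~(UINT32_C(1) << SUSPECT_BIT);
}
```

For the second line of the bad packet, 16777218 and 16782355 come out as 2 and 5139, which is in the same range as the nearly-right packet shown earlier.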

Anyway, we need to go through and systematically debug this by inspecting different outputs, etc.

By the way: We really ought to add a reset function to the datapath, so that we don't have to reload the design every time it gets hung up.

Right now it's hanging up whenever the FIFO momentarily gets full, although I think this is just due to multiple interrupts arriving in such quick succession that some of them go undetected. We can fix this with a simple change to the event-handler routine (keep reading as long as have_data is high).
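The shape of that change, roughly: instead of reading one word per interrupt, the handler drains the FIFO until have_data goes low, so a missed interrupt can't leave data stranded. The sketch below uses a simulated FIFO so that it stands alone; `sim_fifo`, `fifo_have_data`, `fifo_read`, and `handle_fifo_event` are all hypothetical names, and the real handler would read the datapath's registers instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Simulated stand-in for the datapath FIFO (hypothetical; the real
 * hardware would be accessed through its registers). */
typedef struct {
    const uint32_t *words; /* queued data words */
    size_t count;          /* total number of words queued */
    size_t next;           /* index of the next word to read */
} sim_fifo;

/* Models the have_data flag: nonzero while unread words remain. */
int fifo_have_data(const sim_fifo *f)
{
    return f->next < f->count;
}

/* Pops one word; call only while fifo_have_data() is nonzero. */
uint32_t fifo_read(sim_fifo *f)
{
    return f->words[f->next++];
}

/* Event handler: keep reading as long as have_data is high, rather than
 * assuming one interrupt per word. Stops early if the output buffer is
 * full. Returns the number of words read. */
size_t handle_fifo_event(sim_fifo *f, uint32_t *out, size_t out_cap)
{
    size_t n = 0;
    while (fifo_have_data(f) && n < out_cap) {
        out[n++] = fifo_read(f);
    }
    return n;
}
```

With three words queued and a single call to `handle_fifo_event`, all three are read in one pass, even if only one interrupt was actually detected.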

It occurred to me that I had problems previously when compiling code that used long longs in this environment, when I had lots of local variables. This could potentially be the cause of the problems. Rewrote the driver code to more closely imitate the style I used successfully last time.

OK, the datapath is no longer hanging up now, but data values are still getting mangled in pretty much the same way even though the code is now different. So I don't think the C code is at fault. I think something must be going awry earlier in the datapath. We should remember that we never actually checked anything besides the low byte before. So there is lots of stuff still to check.

From latest run, first packet:

6
0 1051
11 1042
16 1036
24 1028
33 1021
-50709543 -50709542

I tried some more code changes, but this is just making things worse. Therefore, I think we're having stack overflow problems. Need to work some more on reducing memory usage.

Tuesday, August 2, 2011
