Wednesday, October 12, 2011

Filter On Board

Plan for today: Continue implementing optional on-board coincidence filtering.

Ray and I couldn't visit CLC this morning, and he's busy tomorrow and I'm busy Friday, but he's going to go over there by himself on Friday.

Samad came by and we talked a little about the battery life analysis.  He is going to come by again next Tuesday so we can investigate the power draw for the DE3 board when running the GPS app.

Juan came by and recommended SVN for version control in Quartus (which I had asked him to investigate).  It costs money to get a private hosted server, but we could set up our own server on the xserve.  I asked Juan to investigate what ports are needed and ask Antony how hard it would be to get the ports open.

Juan is going to see if he can run Quartus under VirtualBox on the center Mac, and access the Quartus license server through wireless via our router.  However, when he tried accessing the Internet through the wireless connection, it was only working sporadically for some reason.  We added the DNS hosts, but it still had problems.

Juan found that SourceForge supports private SVN repositories, so we are probably just going to use that.

I finished my changes to the pulse-buffer module in the firmware (pulsebuf.{h,c}) to support on-board coincidence filtering.  Compilation of this new version of the code is enabled by defining the FILTER_COINS preprocessor symbol in pulsebuf.h, and we can revert to the old version of the code easily by commenting out that line (leaving FILTER_COINS undefined).  The new version adds only about 1K to the code size.  It still fits!

Can't do a test right now, because I think Ray may still be in the middle of collecting coincidence data.  Speaking of that, the SciLab run I started yesterday to count coincidences in that million-line dataset is still running!  It's at 23,657 coincidences so far, but it is just CRAWLING!!!  This just goes to show how doing the coincidence filtering in real-time on the board will be much better...
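For comparison, counting coincidences offline does not have to crawl. Here is a minimal Python sketch (not the Scilab script, and not the firmware's algorithm) that makes one linear pass over two time-sorted lists of pulse timestamps. The function name and the 20-step window in the example are hypothetical; these notes don't specify a coincidence window.

```python
def count_coincidences(times_a, times_b, window):
    """Count pulses in times_a that have at least one pulse in times_b
    within +/- window time steps. Both lists must be sorted ascending."""
    count = 0
    j = 0  # index of the earliest pulse in times_b that could still match
    for t in times_a:
        # Skip pulses in B that are too early to match t (or any later pulse in A).
        while j < len(times_b) and times_b[j] < t - window:
            j += 1
        # After skipping, times_b[j] >= t - window; check the upper bound too.
        if j < len(times_b) and times_b[j] <= t + window:
            count += 1
    return count


# Made-up timestamps (in 5 ns time steps) and a made-up window of 20 steps.
pmt1 = [100, 5000, 9000, 20000]
pmt2 = [103, 7000, 9010, 30000]
print(count_coincidences(pmt1, pmt2, window=20))
```

Because the index into the second list only ever moves forward, the cost grows linearly with the number of pulses rather than with the product of the two list lengths.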

Anyway, I'll test my new code next time I'm here...  Probably next Monday PM.

Tuesday, October 11, 2011

Sigh Lab

Ray texted me that one of the scintillators is producing pulses 1,000x wider than the other, but I don't understand how that's possible?!?!?  I didn't see that behavior.  Ask him to show me what he is seeing when he gets in.  Aha, turned out one of the scope inputs was set to the wrong impedance (1 Mohm instead of 50 ohm).

Sigh, stupid SciLab ran out of memory again, even when processing only 1/4th of the data file, a mere 2 million lines.  Let's try cutting the file in half again...

Cutting at Wed Oct 05 22:41:00 2011.

The new data file "node0.uart-cut.1st-egth.trnscr" has 931,499 lines.  Changing Scilab script (anal-pulses.sce) to preallocate space for 1,000,000 data records.

In case this is going to work this time, let's generate some summary data.

The length of this (1/8th) run is 3,518,106,163,688 time steps (5 ns each), which is 17,590.53081844 seconds or 293.17551364067 minutes or 4.886258560678 hours.

Start time:  Wed Oct 05 17:47:49 2011 + 629 ms
End time:    Wed Oct 05 22:40:59 2011 + 978 ms

Length of run (according to when data was logged on the server) was thus 4 hours, 53 minutes, 10.349 seconds, or in other words 293.1724833 minutes, or 17,590.349 seconds.  So the server's idea of the length of the run differed from the board's idea of the length of the run by only 0.182 seconds out of ~17,590, or in other words, by only about 0.001% or 10 parts per million.  Not too shabby... Sometime, I might want to figure out what accounts for most of the discrepancy (board clock drift or communication delays), but it's not a priority at the moment.
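The comparison between the board's tick count and the server's log timestamps is easy to re-check with a short Python sketch. The function names here are my own; the 5 ns tick, the tick count, and the two timestamps are the ones quoted above.

```python
from datetime import datetime, timedelta

TICK_SECONDS = 5e-9  # one board time step = 5 ns


def parse_log_time(stamp):
    """Parse a server timestamp such as 'Wed Oct 05 17:47:49 2011 + 629 ms'."""
    base, ms = stamp.split(" + ")
    t = datetime.strptime(base.strip(), "%a %b %d %H:%M:%S %Y")
    return t + timedelta(milliseconds=int(ms.replace("ms", "").strip()))


def compare_run_lengths(ticks, start_stamp, end_stamp):
    """Return (board seconds, server seconds, difference, difference in ppm)."""
    board_s = ticks * TICK_SECONDS
    server_s = (parse_log_time(end_stamp) - parse_log_time(start_stamp)).total_seconds()
    diff_s = board_s - server_s
    return board_s, server_s, diff_s, diff_s / server_s * 1e6


board_s, server_s, diff_s, ppm = compare_run_lengths(
    3_518_106_163_688,
    "Wed Oct 05 17:47:49 2011 + 629 ms",
    "Wed Oct 05 22:40:59 2011 + 978 ms",
)
print(f"board: {board_s:.3f} s, server: {server_s:.3f} s, "
      f"diff: {diff_s:.3f} s ({ppm:.1f} ppm)")
```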

PMT #1 had 718,080 pulses over the 17,590 secs., for an average pulse rate of 40.82 pps.
PMT #2 had 213,413 pulses over the 17,590 secs., for an average pulse rate of 12.13 pps.

Ray is using the LeCroy scope to collect histogram data for the pulse height in the coincidences.

We are thinking maybe we really should start doing coincidence detection on the board, so that we can lower the thresholds without exhausting our data rate.  I may start working on this code.

Went back and reviewed my notes on this from an earlier blog post.  Started making the changes, but in the context of an "#ifdef FILTER_COINS" preprocessor option, so that we can easily revert to the old version if we have trouble fitting the new code.

When I left, I had just modified pb_add() in pulsebuf.c (in Q:\software_v4\FEDM_ctrl_fw\), and was about to modify pb_get(), which is where most of the work of the new algorithm will be.

The SciLab code is still running; over 11,000 coincidences so far on this dataset...

Monday, October 10, 2011

Memory Lapse

Ray and I are planning to go down to CLC Wednesday morning to re-establish that connection.  Hope someone there still remembers us!

Used 'wc -l' in Cygwin to count lines in the data file ("C:\SHARED\Server Code\logs 2011-10-07\node0.uart-cut.trnscr").  It is 8,619,504.  Increased array size to 8,620,000.  Now Scilab complains it needs 146,540,153 memory, and we only set stacksize to 100 million previously.  Let's then increase stacksize to 150,000,000.  Now SciLab complains it can't allocate that much memory!  OK, let's go back to 100 million, and split the data file in half.

Splitting into data before and after Thu Oct 06 15:56:00 2011, which is roughly in the middle of the data file.  Now have two files:
  • node0.uart-cut.1st-half.trnscr - 4,143,039 lines
  • node0.uart-cut.2nd-half.trnscr - 4,476,465 lines
I'm processing the first half now.  I'm worried, though, that I might still run out of memory partway through the script.   Yes, indeed it did.  Let's see if we can increase the stack size to 125,000,000.  Nope.

Let's now extract just the 1st 1/4th of the data, before Thu Oct 06 04:20:33 2011:
  • node0.uart-cut.1st-qrtr.trnscr - 1,984,492 lines
The analysis script anal-pulses.sce is running now, and hasn't run out of memory yet.  But it hasn't finished yet.  Check on it tomorrow...
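For the record, cuts by timestamp like the ones above can be made without loading the whole file into memory. Below is a minimal Python sketch that streams the transcript line by line. It assumes every line begins with a server timestamp in the form seen elsewhere in these notes (e.g. "Thu Oct 06 15:56:00 2011 + 123 ms: ..."); the function names are hypothetical, and this is not necessarily how the cuts above were actually made.

```python
from datetime import datetime

STAMP_FORMAT = "%a %b %d %H:%M:%S %Y"


def line_time(line):
    """Return the datetime at the start of a transcript line (ms part ignored)."""
    return datetime.strptime(line.split(" + ")[0], STAMP_FORMAT)


def split_transcript(src_path, cutoff, before_path, after_path):
    """Stream src_path, writing lines stamped before `cutoff` to before_path
    and all other lines to after_path. Returns (n_before, n_after)."""
    n_before = n_after = 0
    with open(src_path) as src, \
         open(before_path, "w") as before, \
         open(after_path, "w") as after:
        for line in src:  # one line in memory at a time
            if line_time(line) < cutoff:
                before.write(line)
                n_before += 1
            else:
                after.write(line)
                n_after += 1
    return n_before, n_after
```

A call such as split_transcript("node0.uart-cut.trnscr", datetime(2011, 10, 6, 15, 56), "node0.uart-cut.1st-half.trnscr", "node0.uart-cut.2nd-half.trnscr") would then produce the two halves and report their line counts.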

Friday, October 7, 2011

Fried Day

My brain is fried... Long day.

The run we started on Wednesday was still running.  Guess the crashing problems were caused by the heartbeat function!  I wonder if it just needs its own reentrancy structure - try that change sometime.

Anyway, I stopped the run and cropped the data file for analysis.

One of the Senior Design students (Michael Dean) came by and I spent a while with David taking him through the Quartus design.  We also went over the Scilab script.  I don't think the script will actually complete on the latest dataset, because it is too big.  Maybe just extract a limited-time-period excerpt from the data file?  Worry about that next week.

Wednesday, October 5, 2011

Up the Creek...

Paddle assembly #1 (in gun case #2), which was responding more weakly than the other one, has been removed and replaced by the other paddle assembly (not yet tested).  Call this paddle assembly #3.  I put masking-tape labels on paddles #2+3 to tell them apart.

Let's now begin a new data-collection run, to see if the new paddle is performing any better than the old one.  Let these notes also serve as a guide for students as to how to start a new run.

First, I moved the previous set of server log & transcript files (COSMICi.server.log, COSMICi.node0.log, node0.auxio.trnscr, node0.uart.trnscr) to "C:\SHARED\Server Code\logs 2011-10-05" (labeling them with today's date, as is my convention).

Now, restarting "C:\SHARED\Server Code\COSMICi_server.py".  The Python interpreter's Windows console ("C:\Python31\python.exe") and the main TikiTerm window "COSMICi Server Console [Main Window]" open as expected, and the server starts generating heartbeats.

Now switching on the Wi-Fi board.  The TikiTerm windows for the 3 automatic connections from the board open up:
  1. MAIN connection window ("Main Server Connection #0 from 192.168.0.6:49646") for log messages and other commands from the Wi-Fi board to the server.
  2. AUXIO connection window ("Node #0 AUXIO Bridge #0") for auxiliary text output from the Wi-Fi script for diagnostic messages, user command prompt, etc.
  3. UART connection window ("Node #0 UART Bridge #0") for data bridged directly from the FEDM UART.
Now, power up the cooling fan, currently powered at ~4.5 V.  The thermoelectric plate is outputting ~ 2-3 mA (wonder why?).

Next, power up the FEDM at 6 V (5 V is also OK).  Current is 1.86 A.  An old version of the firmware is programmed in.  Before going on, I want to update it.  The Quartus license server is still running.  Starting Quartus.  Selecting Q:\COSMICi_FEDM.qpf.  Starting Programmer.  Selecting mode "Active Serial" to reprogram EPROM.  Switching JTAG connector from U7 to J1.  Selecting "New_with_Nios_trim.pof" programming file ("New_with_Nios_trim" being the current project revision).  Selecting "Program/Configure" and "Verify" options, and clicking "Start".  We get the expected startup message from the latest firmware:

FEDM_STARTING,v0.6
DAC_LEVELS,-0.300,-2.500,-0.400,-0.500,-0.600,-0.700
FEDM_READY
BADUMP,1
BADUMP,2

Now Juan is here, and we plugged in the detectors.  The new paddle assembly (#3), in gun case #2, plugged into SMA connector #1, is now responding more strongly than the others!

Juan is going to set up the Macs so that VirtualBox runs only in one dedicated account which we all share.

I shared the FEDM_design Dropbox folder with Juan.  He is going to look into how to do version management in Quartus development.

Samad came by and we verified that the current draw of the detector is about 20 mA, which is what we had expected based on the quiescent power dissipation figure from the manual for the bases.

We also probed the voltages on the DE3 power supply.  Pins 2 (black) and 3 (green) have to be connected to cause the supply to turn on.  We determined the voltages supplied on the other pins: Altogether, the levels available are +3.3V, +5V, +12V, and -12V.  Samad and I discussed that in stage 1 of the design we might be able to just use this supply to power all our electronics.

After Samad left, I hooked up both the Wi-Fi board and the FEDM to the +5V output from the DE3 supply.  That is working fine.

Meanwhile, since the pulse rate on the new paddle is high, we are getting hanging problems again.  I ran the firmware under the debugger, and during a hang I suspended execution and observed that it was in the output code under the heartbeat callback.  The "reentrant" stdio routines aren't really re-entrant!  Perhaps because uC/OS-II isn't running.  Anyway, I commented out the heartbeat setup and am running again.  If that is the only problem, maybe it won't crash now!

Tuesday, October 4, 2011

Board, Interrupted

Came in at 3:00 pm on Tuesday to see how run was going, only to find it had stopped at some point.

Looking back at the transcript file (C:\SHARED\Server Code\node0.uart.trnscr), the run started at 5:14 pm, with a 250 mV threshold and 100 mV ladder steps.  (Note that DAC level 2 is unset.)

Mon Oct 03 17:14:32 2011 + 209 ms: < DAC_LEVELS,-0.250,-2.500,-0.350,-0.450,-0.550,-0.650
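As a quick sanity check on a DAC_LEVELS message like the one above, here is a small Python sketch that converts the levels to millivolts and computes the ladder steps while skipping DAC #2. The function names are hypothetical, and treating the second value as DAC #2 is my reading of the message, not something the firmware documents here.

```python
def parse_dac_levels(line):
    """Return the DAC levels in millivolts from a 'DAC_LEVELS,...' message.
    Accepts either the bare message or a full transcript line containing '< '."""
    msg = line.strip().split("< ", 1)[-1]
    fields = msg.split(",")
    if fields[0] != "DAC_LEVELS":
        raise ValueError("not a DAC_LEVELS message")
    return [round(float(v) * 1000) for v in fields[1:]]


def ladder_steps(levels_mv, skip=(1,)):
    """Differences between successive levels, ignoring the given indices
    (index 1 is assumed to be DAC #2, which is unset)."""
    used = [v for i, v in enumerate(levels_mv) if i not in skip]
    return [b - a for a, b in zip(used, used[1:])]


levels = parse_dac_levels(
    "Mon Oct 03 17:14:32 2011 + 209 ms: < DAC_LEVELS,-0.250,-2.500,-0.350,-0.450,-0.550,-0.650"
)
print(levels)                # [-250, -2500, -350, -450, -550, -650]
print(ladder_steps(levels))  # [-100, -100, -100, -100]
```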

The end of the run, when the firmware hung, was 6:31 pm - around the time I left?

Mon Oct 03 18:31:19 2011 + 668 ms: < PULSE,1,30100,804013730552,1,(0,5)
Mon Oct 03 18:31:19 2011 + 669 ms: < PULSE,2,56923,804017705582,1,(0,4)
I'm wondering if something I did when I was leaving caused the run to halt.  Maybe I closed Eclipse while it was being used for STDOUT?

Anyway, let's try restarting.  I started a Quartus compile before I left, and it succeeded, so we can just reload everything from there.

BTW, I hooked the Peltier cooler to the ammeter function on the multimeter, and am getting readings like -0.045 A (-45 mA).  Actually, that was when the FPGA was in the crashed state; now that we're running again, the reading is decreasing (in its absolute value).  At this moment, it's -41.2 mA.

It makes sense that the reading is negative, because the device is passively responding to an outwards heat flow, so the bottom side of the plate is warmer than the top side.  To actively cool the system, we have to push current in the opposite direction from the way it flows naturally, so that the bottom side of the plate then becomes *cooler* than the top side.

Pulse rates for that ~77-minute run (5:14 pm to 6:31 pm) were:

PMT#1:  30,100 pulses / 77 minutes = 390.9 ppm =   6.52 pps
PMT#2:  56,923 pulses / 77 minutes = 739.3 ppm = 12.32 pps
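The rate arithmetic can be re-derived from the last PULSE line for each PMT with a small Python sketch. It assumes the second and third fields of a PULSE message are the PMT number and a running pulse count; that is inferred from the totals quoted here, not from a documented message format. The function names are hypothetical.

```python
def parse_pulse(line):
    """Return (pmt_number, running_count) from a PULSE transcript line.
    The field meanings are inferred, not taken from a format spec."""
    msg = line.strip().split("< ", 1)[-1]
    fields = msg.split(",")
    if fields[0] != "PULSE":
        raise ValueError("not a PULSE message")
    return int(fields[1]), int(fields[2])


def pulse_rates(count, minutes):
    """Return (pulses per minute, pulses per second) for a run."""
    per_minute = count / minutes
    return per_minute, per_minute / 60.0


last_lines = [
    "Mon Oct 03 18:31:19 2011 + 668 ms: < PULSE,1,30100,804013730552,1,(0,5)",
    "Mon Oct 03 18:31:19 2011 + 669 ms: < PULSE,2,56923,804017705582,1,(0,4)",
]
for line in last_lines:
    pmt, count = parse_pulse(line)
    per_min, per_sec = pulse_rates(count, minutes=77)
    print(f"PMT#{pmt}: {count} pulses, {per_min:.1f} per minute, {per_sec:.2f} per second")
```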

Stopped the current run, because Ray wants to replace the paddle assembly (scintillator + PMT) for SMA channel #1 (which is currently the one in Gun Case #2) with the third one to see if that improves things.

Monday, October 3, 2011

Work-Around

Today David & Darryl came in and we tested the modifications David made on Friday to stub out the broken DAC #2.  All works fine.

We were seeing too many events at a 250 mV (1st) threshold, causing the system to hang, so we upped the 1st threshold to 300 mV.

There is a weird problem where the third comparator output isn't transitioning on the scope display, but it seems to be working fine in the actual code, so we are not worried about it too much at the moment.  (The 2nd one also isn't transitioning, but that is expected due to the bad DAC.)

We manually checked all the DAC levels to be sure they are still good, and they are all fine except for DAC #2 which is (still) stuck at about +70 mV.  But that doesn't matter now that we are skipping it.

Started a new data-collection run in the perpendicular hodoscope configuration (finally!).  We are running with the cooling fan only (thermoelectric plate disconnected) to prevent condensation/drips.  TO DO:  Hook up plate terminals to a resistor, measure voltage across resistor; this then gives a measurement of heat flow through the plate.  :)