Saturday, January 21, 2012

Sat., Jan. 21st

Since time is short, today I am starting the code review of Darryl & Juan's new code from home.  Currently working in the Dropbox, since I don't have remote access to the office PC with its "Q:\" network share.

First, I renamed tcdp_driver.{c,h} to tsdp_driver.{c,h} for consistency with how this module is named in the files themselves.  Either name would have been OK (tc="time capture" or ts="time sync") but at minimum, we just need to be self-consistent, please!

Going through tsdp_driver.h, I noticed that the students duplicated the DoubleWord and time_val typedefs in the new tsdp_driver.h file.  That is poor coding practice; since those definitions are needed in more than one module, they need to be pulled out into their own module, to eliminate the redundancy.  This is easily done!  If you don't do this, then you are just setting yourself up for extra work later if you ever need to change the definition, and you'll have major, hard-to-debug problems if one version of the definition gets changed and you forget to make the same change in the other one!  So, I created a new module "timeval.h," included by both tcdp_driver.h and tsdp_driver.h, and moved the DoubleWord and time_val typedefs into there.

The TSDP_status data type looks useful.  I added a comment pointing out that the pulse count will roll over every 20 days.  I also renamed the struct to tsdp_status (all lowercase) to distinguish it from the typedef.

There seems to be a redundancy between the extern int timing_pulse_count and the pulse_accum field in the TSDP_status structure.  I'm not sure yet why both are needed, or where TSDP_status is used.  We'll see. - NOTE ADDED LATER:  It looks like neither of these is even used anywhere yet!

tsdp_init() declaration looks useful.
tsdp_reset() looks useful.
removed obsolete comments about icdp_have_data() and icdp_notify().
tsdp_run(), _pause(), _handle_have_data(), and _handle_sync_error() declarations all look fine.

Now going through tsdp_driver.c.

Again there is inconsistency between the "TS" and "TC" names; changing all to "TS".

report_buf_full is commented out, but it should have been replaced with something like report_sync_err to ensure that sync errors get reported to the server.

get_next_word() is mostly repeated code from icdp_driver.c.  If we are really going to use the same interface to both datapaths, it might make sense to try to abstract out the common functionality to reduce code size.  However, doing this gets a bit tricky since each datapath has its own control bits.  You'd have to do a parameterized macro, or pass additional arguments, or do a class hierarchy.  Probably best to leave it as-is for now.

In get_next_word(), the parens were missing after the invocation of the HAVE_DATA macro.  These are needed to convey that it is a function-like macro and not a constant.  Fixed that.

get_longword() is completely identical, textually, to the one in icdp_driver.c, and this common functionality should probably be abstracted away.  E.g., you could pass a function pointer argument pointing to the appropriate get_next_word() method to use.  But really, the "right" approach to express the shared functionality here would be to use a class hierarchy, with different classes representing the different types of datapaths, in which shared methods like get_longword() could be inherited from a common base class, while things that are different like bit numbers, base addresses, etc. could live in a more specific derived class.  However, this would represent a major refactoring of the code, and is probably overkill for now.

tcdp_pull_pulse() is completely wrong, because it is written as if it were pulling an entire pulseform with a variable number of levels crossed, whereas really, all we need is a single leading-edge time.  This kind of simplification was the whole point of creating the new datapath; otherwise, we could have just added another channel to the existing datapath!  This goes along with my earlier critique of a few days ago, which is that I had asked (& expected) the students working on the gelware to simplify the new timing-sync datapath to remove all the unnecessary information from it, including the number of levels crossed.  Also, this C code doesn't even match what the datapath actually does, because the datapath only provides the leading-edge time, and this code is looking for both leading- and trailing-edge times.

I don't know why it's so hard to get students to just THINK, and do what makes sense!  This goes for both the gelware and firmware groups!  Designing the new datapath was supposed to have been an EASY task, certainly it was easier than designing the old, more complicated one was to begin with!  All they had to do was to figure out what kinds of simplifications were appropriate to the new timing-sync application, in contrast to the pulse-form capture application needed for the PMT input channels, and then DO those simplifications!  ALL of them!

Going forwards, we have a couple of choices here.  We can (a) revise the datapath gelware to get rid of the extra unnecessary information (number of levels crossed, which is always 1), or (b) leave that garbage there, and just skip over and ignore it (or else verify that's it's 1, as a sort of pointless error-check) in the C code.  In either case, the C code has to be changed to no longer pull the falling edges, since they are not present, and not relevant.  Also, the result should not even go into a PulseForm data structure, since that is inappropriate for the timing-sync pulses.  PulseForm wasn't even defined in tsdp_driver.h (you had commented it out, and I deleted it), nor does it belong in there!  Ugh.

In tsdp_notify()... WTF?  Why the heck are you still doing pb_add()?  That was only relevant for the actual cosmic-ray pulse data, which needs to be buffered in RAM so that it can be streamed out to the server!  There are too many timing-sync pulses (2,000/sec.) to possibly do that with them!  If you put them in the RAM buffer, you'll just quickly fill up the buffer!  And they aren't even the same kind of information as the cosmic ray pulses, so it wouldn't make sense to mix them into the pulse buffer anyway!  You need to THINK about what this timing-sync data is for, and use it for that!!!

Finally, you commented out tsdp_handle_buf_full()... It's true that you don't have BUF_FULL interrupts per se, since there is no FIFO buffer any more, but in its place, you have the SYNC_ERROR interrupt, for if the datapath gets stalled and loses a pulse!  You need to do something about that!

In interrupt_timing.c - you didn't define a separate INTERRUPT_MASK for the tsdp!  You are still using the one for the original icdp, which is not applicable to the new PIO!

in tsdp_isr(), cur_mask is not needed at all, since you only have one HAVE_DATA bit.

In conclusion: The job of updating the firmware to integrate support for the new timing-sync datapath is not finished by far, and substantial changes are still needed before the code will be complete, apparently correct, and ready for testing.  I made a few minor corrections as I went along in the code review, but the major changes still need to be done.  Here is a list of the most important changes needed:
  • [ ] Ideally, the gelware should be changed to remove all of the "number of thresholds crossed" stuff, which is completely unnecessary and inappropriate to even include in the timing-sync edge-capture datapath.
  • [ ] tsdp_pull_pulse() needs to be changed to do what's appropriate for the new datapath.
  • [ ] tsdp_notify() needs to USE the received timing-sync edge information in a sensible way.  For example, remember the data for the most recent pulse, together with the number of timing-sync pulses received, and (elsewhere in the code) tag each individual cosmic-ray pulse with this information as it comes in.
  • [ ] Sync errors need to be handled appropriately.  Presently they are not handled at all.
If we are going to meet our goal of having the timing-sync datapath fully integrated & tested by the end of the month, these code changes need to be completed within the next 3 working days (by Wednesday) so that I can have a few more days to review, test & debug if necessary.  Students: If you can't complete all these changes by end of Wednesday, I will have to go ahead and do them myself.  Let me know ASAP which it will be.  We have no more time to wait.  If necessary, ask questions if you don't understand what you need to do.  

    No comments:

    Post a Comment