The strange thing is, when this happened (this last time, at least), the Wi-Fi board rebooted itself and opened new server connections... But by then, the DE3 board firmware was totally hung up (probably in the serial library).
Now, I could try to deal with this by adding capabilities to detect the serial peer going down, and try to buffer up data so that we can stream it out when the connection is re-established.... But (a) that's a lot of extra complexity, and (b) the Wi-Fi board shouldn't be rebooting itself in the first place.
What else can I try? I can't put the Wi-Fi board inside a shielded enclosure (in case RF noise is causing the problem), because that would block the Wi-Fi signal, since it uses an embedded antenna.
Perhaps it is cosmic rays causing the crashes? But it will be hard to block those, too...
The mystifying thing, though, is that this spontaneous rebooting never happened until just recently, which makes me think that some change to the Wi-Fi script is triggering it... I added a little bit of stuff to the script (to handle the new pass-thru commands to the CTU), but I doubled the main stack size in case it was overflowing, and that didn't help with the problem... And the auxiliary stacks should already be plenty big enough (1,000 entries).
Here are the stack sizes I am using now:
- AT+SET 42="256" - Program counter stack.
- AT+SET 40="1000" - Space for simple variable stack frames.
- AT+SET 41="1000" - Space for complex variable stack frames.
Aha! Some insights gained from watching the RX (data in) and RTS (flow control out) signals carefully on the scope:
- RTS is deasserted (raised) briefly, a fixed delay after the end of a transmission; this is consistent with the EZURiO docs (this delay is set by the _UARTRCVTMO() function, and defaults to 255*4 = 1020 bit periods).
- It stays high for an amount of time that varies somewhat (possibly because of other threads) but is (normally) at least a certain minimum amount of time. This is also consistent with the EZURiO docs; this delay is set by the _UARTSLEEPCOUNT() function, and defaults to 400 bit periods.
- While RTS remains high (deasserted), usually the Nios UART core does not send any data - indicating that it is indeed paying attention to this signal. It waits until RTS goes low (is asserted) before sending data.
- However, occasionally (I saw this at least once) the Nios UART will have already started a transmission when RTS goes high, but it does at least manage to stop sending shortly before the end of the sleep count.
- Finally, one time I happened to see the RTS glitch high for an extremely brief interval (possibly as small as one bit period) while the Nios UART was sending, and by the next second, the module had crashed.
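To get a feel for what those default delays mean on the scope, here is a minimal sketch that converts bit periods to milliseconds. The 1020 and 400 bit-period defaults come from the notes above; the baud rates in the loop are assumptions picked only for illustration, and the helper name is made up.

```python
def bit_periods_to_ms(bit_periods: int, baud: int) -> float:
    """Convert a count of UART bit periods to milliseconds at a given baud rate."""
    return bit_periods * 1000.0 / baud


RCV_TMO_BITS = 255 * 4    # default receive timeout noted above: 1020 bit periods
SLEEP_COUNT_BITS = 400    # default sleep count noted above

# The baud rates below are assumptions, not the rates actually configured.
for baud in (57600, 115200):
    tmo_ms = bit_periods_to_ms(RCV_TMO_BITS, baud)
    sleep_ms = bit_periods_to_ms(SLEEP_COUNT_BITS, baud)
    print(f"{baud} baud: RTS rises {tmo_ms:.2f} ms after the last byte, "
          f"stays high at least {sleep_ms:.2f} ms")
```

Since both delays are counted in bit periods, they scale with the baud rate: halving the rate doubles both times.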
This all makes sense, because this crashing problem started after I turned down the baud rate of the GPS->Nios connection, which resulted in more (& larger) gaps in the echoed data stream from the Nios->WiFi. The potential problem always existed, but it only began manifesting after these big gaps became present, since they allowed the possibility that the RTS might deassert at about the same time that the next data burst was starting.
This raises several alternative possibilities regarding how to proceed:
- Try turning down the baud rate of the Nios --> WiFi connection to match the rate (57,600) of the other connection - this should reduce the gaps in the data stream during which the RTS may possibly be raised.
- Try turning up the _UartSleepCount(), giving the Nios more time to respond to the raised RTS by halting the data flow. (However, not knowing exactly how the Wi-Fi board's receive buffer is working, I am uncertain whether this would really solve the problem.)
- Try turning down the _UartRcvTmo(), so that the RTS pulses will happen sooner, and hopefully not be as likely to overlap with the start of the next transmission burst. However, this seems like an unreliable method, and it may in fact lead to more RTS pulses (since more of the transmission gaps will be large enough), and more problems.
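The trade-off in the last option can be sketched with a toy model. This is only a rough illustration under stated assumptions: the gap lengths are invented sample data, the "danger window" around the RTS rising edge is a hypothetical parameter, and the function name is mine. It assumes one RTS pulse per gap that reaches the receive timeout, and treats a gap as risky when the next burst starts within the window of the moment RTS rises.

```python
def count_rts_events(gaps_bits, rcv_tmo_bits, window_bits):
    """Return (pulses, near_edge) for a list of idle gaps, all in bit periods.

    pulses    - gaps long enough for the receive timeout to expire (RTS raised)
    near_edge - gaps ending within window_bits of the RTS rising edge, i.e. the
                next burst starts at about the same time RTS is deasserted
    """
    pulses = sum(1 for gap in gaps_bits if gap >= rcv_tmo_bits)
    near_edge = sum(1 for gap in gaps_bits
                    if abs(gap - rcv_tmo_bits) <= window_bits)
    return pulses, near_edge


# Invented gap lengths between data bursts, in bit periods.
sample_gaps = [200, 600, 1010, 1100, 1500, 3000]

for tmo in (1020, 500):
    pulses, near_edge = count_rts_events(sample_gaps, tmo, window_bits=20)
    print(f"timeout {tmo}: {pulses} RTS pulses, {near_edge} near the edge")
```

With these sample gaps, dropping the timeout from 1020 to 500 bit periods raises the pulse count from 3 to 5, which is the concern noted above: more gaps qualify, so there are more chances for a collision even if any one of them is less likely to line up.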
Emailed EZURiO to report this apparent firmware bug, so hopefully they can fix it in a future version of the firmware.