[syslinux] truncated files on write with tftpd-hpa

Thu May 1 15:52:46 PDT 2003

I've been seeing a problem lately that I thought I'd run by this list to
see if it's familiar to anyone.

We've been running tftpd from tftp-hpa (originally version 0.29, more
recently version 0.34) on Digital UNIX 4.0F, serving files from an NFS
filesystem exported by a very busy server.

The daemon is started as:
    /noc/bin/tftpd -v -p -l -m /noc/etc/tftp.filename-translations \
        -u noc-tftp -s /noc/tftp

Occasionally, when one of our switches uploads its config with TFTP, we
end up with a 0-byte file.

In syslog, we see something like this:

| May  1 16:23:09 [server] tftpd[14250]: WRQ from [client] filename switch.config remapped to /switch.config
| May  1 16:23:10 [server] tftpd[14255]: WRQ from [client] filename switch.config remapped to /switch.config
| May  1 16:23:10 [server] tftpd[14255]: tftpd: read: Connection refused

If I watch the file while this is happening, I can see it grow from 0
bytes to some size, then get truncated back to 0.

I don't know too much about TFTP, but my theory is this:  something
(probably NFS) makes the tftp daemon slow to respond to the first WRQ,
so the switch times out and sends another one.  In the meantime, the
daemon handles the first WRQ: it opens the file, sends an ACK, and the
switch sends its data.  Then the daemon handles the second WRQ: it
opens the file again and sends an ACK, but gets an error (because the
switch has already finished the transfer) and closes the file.
Unfortunately, both opens are done with O_TRUNC, so the second open
truncates the file that the first transfer just wrote.
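
To show the O_TRUNC effect in isolation, here is a minimal Python sketch.
It is not tftpd code: the function name, file name and payload are made
up for illustration, and it only mimics the open/write/close order
described above on a local temporary file.

```python
import os
import tempfile


def simulate_duplicate_wrq(payload):
    """Return (size after the first transfer, size after the second open)."""
    # Same flag combination the theory above assumes for each WRQ.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "switch.config")

        # First WRQ: open, receive all the data, close.
        fd = os.open(path, flags, 0o644)
        os.write(fd, payload)
        os.close(fd)
        size_after_first = os.path.getsize(path)

        # Retransmitted WRQ: open again, the transfer fails, close.
        # Nothing is written, but O_TRUNC has already emptied the file.
        fd = os.open(path, flags, 0o644)
        os.close(fd)
        size_after_second = os.path.getsize(path)

    return size_after_first, size_after_second


if __name__ == "__main__":
    print(simulate_duplicate_wrq(b"hostname example-switch\n"))
```

The second element of the result is 0 regardless of the payload, which
matches the "grows, then gets truncated back to 0" behavior.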

A tcpdump seems to support this idea:

| 17:23:07.090411 [client].1036 > [server].69: 29 WRQ "widener-le-sw.config"
| 17:23:08.532809 [client].1036 > [server].69: 29 WRQ "widener-le-sw.config"
| 17:23:09.451769 [server].1334 > [client].1036: udp 4
| 17:23:09.458605 [client].1036 > [server].1334: udp 516
| [...]
| 17:23:09.634386 [client].1036 > [server].1334: udp 516
| 17:23:09.634386 [server].1334 > [client].1036: udp 4
| 17:23:09.643175 [client].1036 > [server].1334: udp 174
| 17:23:09.661730 [server].1334 > [client].1036: udp 4
| 17:23:10.651003 [server].1335 > [client].1036: udp 4
| 17:23:10.656862 [client] > [server]: icmp: [client] udp port 1036 unreachable
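
For reference, the gaps in that trace can be worked out with a few lines
of Python.  The timestamps are copied from the tcpdump output above; the
variable names are only descriptive labels, not anything tcpdump or
tftpd reports.

```python
from datetime import datetime


def gap(start, end):
    """Seconds elapsed between two HH:MM:SS.ffffff timestamps."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


first_wrq = "17:23:07.090411"    # original WRQ from the switch
second_wrq = "17:23:08.532809"   # retransmitted WRQ
first_ack = "17:23:09.451769"    # server port 1334 answers the first WRQ
last_ack = "17:23:09.661730"     # final ACK of the successful transfer
late_ack = "17:23:10.651003"     # server port 1335 answers the second WRQ

if __name__ == "__main__":
    print("switch retransmitted after %.3f s" % gap(first_wrq, second_wrq))
    print("server first answered after %.3f s" % gap(first_wrq, first_ack))
    print("late ACK came %.3f s after the transfer ended" % gap(last_ack, late_ack))
```

So the switch gave up waiting well before the server's first reply, and
the reply to the duplicate WRQ arrived about a second after the real
transfer had already completed.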

Does that sound reasonable, or does anyone have some other explanation?
Has anyone seen this behavior before?

It seems to me like the problems I'm seeing could be avoided on the
tftpd side in one (or both) of two ways:
    1) not truncating the file until the first data packet is received
    2) not responding to a second WRQ from the same host+port within
       some period (since the TIDs should be chosen such that they are
       unlikely to repeat, if I read the RFCs correctly).
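
A rough Python sketch of both ideas, purely to make them concrete.  This
is not how tftp-hpa is structured; the class names, the 10-second window
and the in-memory table are all assumptions chosen for illustration.

```python
import os
import time


class DuplicateWrqFilter:
    """Idea 2: ignore a repeated WRQ from the same (host, port) for a while."""

    def __init__(self, window_seconds=10.0, clock=time.monotonic):
        self.window = window_seconds   # hypothetical value, not from any RFC
        self.clock = clock
        self.last_seen = {}            # (host, port) -> time the WRQ was accepted

    def should_accept(self, host, port):
        now = self.clock()
        # Drop entries older than the window so the table stays small.
        self.last_seen = {key: t for key, t in self.last_seen.items()
                          if now - t < self.window}
        if (host, port) in self.last_seen:
            return False               # retransmitted WRQ, already being served
        self.last_seen[(host, port)] = now
        return True


class DeferredTruncateFile:
    """Idea 1: open without O_TRUNC; truncate only when data first arrives."""

    def __init__(self, path):
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        self.truncated = False

    def write_block(self, data):
        if not self.truncated:
            # First DATA packet: only now discard the old contents.
            os.ftruncate(self.fd, 0)
            self.truncated = True
        os.write(self.fd, data)

    def close(self):
        os.close(self.fd)
```

With either change, a late duplicate WRQ that never receives a data
packet would leave the previously uploaded file intact.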

But, like I said, I'm no TFTP or tftp-hpa expert, which is why I'm
emailing the SYSLINUX list.

--Alan Sundell