[syslinux] tftpd: don't use AI_ADDRCONFIG to resolve addresses to bind(2)

Uwe Kleine-König uwe at kleine-koenig.org
Sun Feb 5 12:56:23 PST 2017


Hello Ron,

On Sun, Feb 05, 2017 at 05:05:16PM +1030, Ron wrote:
> On Sat, Feb 04, 2017 at 11:28:08PM +0100, Uwe Kleine-König wrote:
> > On Sat, Feb 04, 2017 at 03:46:46PM +1030, Ron wrote:
> > > > [...]
> > > That would seem to be a pretty good summation of how we're failing to
> > > converge here ...
> > 
> > I mixed too many things that IMHO improve the code but actually only
> > care about one of those. So I suggest we restart the discussion with
> > focusing on that one thing only.
> 
> Just repeating the same things, while ignoring the options I've shown
> you that do properly fix the problem(s) you're claiming to care about,
> isn't actually advancing this toward a workable solution in any way.

Note I repeated less than before. I hope this will simplify the
discussion and stop both of us arguing about stuff that doesn't matter
much.

> My previous replies to you were already focussed on the part of your
> patch that removed AI_ADDRCONFIG, and why it was not needed at best,
> and harmful at worst.

Yes, you told me in the situations you care about the modification
doesn't help you. I seem to care about different situations where the
patch is beneficial. So if the patch doesn't make things worse for you
(and all others out there) and it improves the situation for me, IMHO we
should apply the patch.

I didn't understand yet, in which situations it is harmful as you claim
above. The best claim matching this is: It papers over other problems.
Is it that why you want to keep AI_ADDRCONFIG? I don't understand though
what this buys for you. Consider you have a network problem on the
machine tftpd is supposed to run at. The result is that eth0 doesn't
have an ipv4 address. You notice that because a tftp-client tries to
contact the tftpd server and doesn't get an answer. So what to do next?
I assume your next step is logging into the tftpd-machine and check if
tftpd is running. (If instead you try ping $tftpdmachine, or check the
network config of the tftpd-machine, it doesn't actually help you that
tftpd isn't running as it is already obvious then that the network
configuration is at fault and not tftpd.) You see it doesn't run, check
the log and see

	cannot resolve local IPv4 bind address: 0.0.0.0, Name or service not known

. Is it obvious now what the problem is? Maybe yes if the network is
still broken. But if not, it's harder to understand.

If instead tftpd would have started successfully, you see this after
login and the IMHO obvious next step is to check the network config.
If you ask me, that's not any harder to debug.

> I can read the actual code, and understand how gai works, and I'm pretty
> sure Mike understood all of that too when he first reported this bug.

I'm not sure about Mike, but that doesn't matter here.

> I'd already long ago checked that there wasn't a real bug being triggered
> somewhere here, and that the code itself really was working as expected,
> and you haven't indicated anything to the contrary here.

I did. I showed two examples where the use of AI_ADDRCONFIG breaks more
than necessary or expected.

> In the subset of cases where gai is used to resolve a string into a
> (set of) network address(es), it is not wrong to tell it that it's
> useless to return any address (family) which can't possibly work with
> the current machine configuration/state.

Using AI_ADDRCONFIG suppresses options that actually work. After all I
can bind to 0.0.0.0 even if no interface but lo has an ipv4 address.
After that I can connect to 127.0.0.1:69 and talk to tftpd. Or I can
hotplug a network interface, configure that and talk to tftpd from a
remote machine.

I understood that none of these is a scenario you depend on. But with
your refusal to see this patch as useful you assume that the above is
not a use case for anybody.

> That doesn't become less wrong if your expectation or interpretation of
> how it should work is different to reality and the specification of how
> it is defined to work.  If you explicitly say "bind to 0.0.0.0", you're
> saying you want to service global IPv4 requests.

The semantic of binding to 0.0.0.0 is not only "serve on global ipv4
addresses that currently exist". The semantic also includes: "serve on
lo" and "serve on interfaces that come up after the socket is bound".

> If you have no global IPv4 interfaces, that should fail and warn the
> admin of a problem, not silently ignore that what they explicitly
> requested isn't going to work.

The admin has a problem if a server that is supposed to serve files via
tftp to the world doesn't have a ipv4 address. That alone is a problem
big enough to notice. It doesn't help the admin that tftpd isn't running
or there is a cryptic error message in the log. If the network interface
is ready only later in the boot process everything works as expected as
soon as the interface is up. You say it's bad in this situation that the
admin doesn't notice there is a problem. I wonder if it's not a valid
approach to claim that there isn't a real problem. After all the world
can access the files now.

> If what you meant to request was "bind to whatever *is* there, I don't
> care what", then just don't pass an explicit address.
> 
> If you're allergic to that possibly also binding to IPv6 addresses,
> then pass the -4 flag too.

Do you consider it obvious that

	tftpd -a 0.0.0.0

is semantically different to

	tftpd -4

? I don't. Sure, I can change my setup accordingly now that I know. But
this feels much more like a kluge than making the two commands above
behave identically. And sure, we can even document this behaviour and
claim it to be a feature. But that's not intuitive and I bet people will
stumble about this in the future and wonder.

> We don't need to disable the sanity check for users who do configure
> an explicit address for you to get the behaviour you say you want
> from the current code, without any change to it at all.

I failed to understand this. If someone uses:

	tftpd -a 1.2.3.4

my patch doesn't make tftpd magically work. The difference is that tftpd
in the "no ipv4 address available" case with my patch says:

	failed to bind to 1.2.3.4

instead of

	cannot resolve local IPv4 bind address: 1.2.3.4

without the patch. I say this is an improvement.
Maybe I didn't understand you correctly here.

> If what you care about is "faster boot", then the answer to that isn't
> "speculatively start things that will fail (or just be useless) if they
> lose a race", it's to actually not waste time and resources doing that
> until their known prerequisites have been satisfied.

tftpd when started with -a 0.0.0.0 with eth0 coming up later isn't
failing or useless. It does exactly what it is supposed to do when
AI_ADDRCONFIG is dropped. That is, tftpd binds to 0.0.0.0.

> Where it is expected that the available interfaces and/or configured
> addresses might change dynamically long after boot, then whatever you
> have doing that needs to notify, or reload, or (re)start, (or stop),
> anything you are or want to be (not) running, based on that new state
> of the system.

If you enlarge the problem I want to solve, my patch isn't suitable any
more. That is applicable to all improvements and so it's not sensible to
use it against an improvement. So it also wasn't helpful to retitle the
Debian bug to 'please add a restart hook for hotplugged interfaces' from
'tftpd-hpa fails to start properly if network is unavailable'. If
handling hotplugging is your wish, I suggest opening a separate bug for
that without highjacking this one.

Making a software behave better in some situations without making it
worse in any other is a good enough reason to consider the change. That
is still true even if after patch application there are still some
situations that could be improved.

> If you want that, propose a patch to add a hook.  Otherwise, what you
> say you want is already possible without needing the kludge you did
> send.
> 
> The only bit I'm still struggling to understand, is why you are still
> pushing this patch hard instead of using what is already available that
> has exactly the behaviour you say you desire, or looking at a more
> complete and working solution for dynamic interfaces in general.

I do this because tftpd doesn't behave as I expect. I struggled because
I didn't understood what the error message want to tell me. And I assume
that others struggle in the same way in this situation. And then there
is that idea of open source, where you can actually improve the stuff
you work with ... So my suggested patch might eventually save others
spending time understanding tftpd-hpa as it doesn't fail for them with
strange error messages.

> The options here and the actions of the code don't look very complicated
> to me, so you don't need to "simplify" it on my behalf.  I'm just not
> seeing you show any new problem that isn't contrived and that doesn't
> already have a good and/or already working solution which doesn't depend
> on needing this patch.
> 
> What problem isn't satisfied by the options I've shown above and earlier?

It makes the behaviour of tftpd match what I expect. And I believe
others expected (or will expect) the same.

I think it doesn't make sense to continue discussing between the two of
us. If this mail isn't enough for you to see my point I suggest we try
to find a few other people (ideally hpa among them) to give their
opinion. I added the people that commented the Debian bug so far
explicitly to Cc:. Maybe you can speak up. (If you're missing context,
reading https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=771441#161
should be enough.)

Best regards
Uwe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 488 bytes
Desc: not available
URL: <http://www.zytor.com/pipermail/syslinux/attachments/20170205/8139f98b/attachment.sig>


More information about the Syslinux mailing list