[syslinux] Bug#771441: [PATCH tftpd-hpa] tftpd: don't use AI_CANONNAME and AI_ADDRCONFIG to resolve addresses for bind

Ron ron at debian.org
Thu Feb 2 21:10:32 PST 2017


On Thu, Feb 02, 2017 at 02:22:47PM +0100, Uwe Kleine-König wrote:
> On Thu, Feb 02, 2017 at 04:08:49PM +1030, Ron via Syslinux wrote:
>
> > The use of AI_PASSIVE here is a placebo.  That flag has no effect unless
> > address was NULL, and if that was true, neither of the hunks here would
> > actually be executed in the first place.
> 
> Right. Today it only has an effect if the first argument to getaddrinfo
> is NULL. The intension (IIUC) is that it should be used when you plan to
> feed the result to bind (opposed to connect).

What do you mean by "Today"?  Both SuSv4 and the Linux man page are
unequivocal about the _only_ use of that flag being to special case
a NULL address (meaning 'this machine') to return either the wildcard
address or the LOOPBACK address.

That you'd use the wildcard address to bind a service to all addresses
of 'this machine', or the loopback address to connect to a service on
'this machine' is illustrative.  There's no deeper distinction or
fundamental difference related to what functions you might later pass
the address(es) you obtain to.


> > Using AI_CANONNAME here should be harmless at worst.  So the only actual
> > change is to drop AI_ADDRCONFIG - the flag which limits getaddrinfo to
> > returning only the address families that are actually supported by the
> > configured interfaces on the system.  And ordinarily that would seem to
> > be a fairly uncontroversially Good Thing to do, for both connecting and
> > listening sockets.
> 
> The downside of using AI_ADDRCONFIG is that it makes binding to 0.0.0.0
> (or ::) fail when no interface is up yet.

That seems to be where we disagree.  I don't see it as a 'downside' that
if you explicitly say "I want to bind to IPv4 addresses" (or IPv6), and
you don't actually have any, that this should fail early and loudly to
warn you about either misconfiguration, or some other more serious
failure, occurring.

If you passed a name instead of a numeric address, you'd only get the
address families the machine actually supported.  If you pass a numeric
address in a particular family, you get a sanity check that it's valid.

If you (personally) don't care about that, just don't pass an explicit
address.

The downsides of not using AI_ADDRCONFIG can't be remedied so easily.


> If we can agree in principle I can rework the patch to make one change
> per patch:

I'm far less concerned about the format of the patch, than the details
of what it's actually hoping to achieve.

If all of the reasons for doing this are just different ways of saying
"if we do less sanity checking, then misconfiguration and broken tools
won't annoy me as often" - that's not very compelling.

Doubly not so when there already is a way you can configure this for
your own use which does already bypass them.  Simply disabling that
for everyone and every configuration isn't a good answer.


> > So unless upstream sees this differently, I still think we'd need to see
> > some stronger rationale for why that isn't a Good Thing in this particular
> > case than just "Dropping that flag hides a real bug in NetworkManager".
> 
> This is not the (only) reason for me. This is mostly only how it showed
> up for some people, but still there are more IMHO good reasons to fix
> it:
> 
>  - inconsistent behaviour when no interface is up: -a 0.0.0.0 fails,
>    -a :: fails, not passing -a doesn't fail and makes tftpd bind to all
>    interfaces.

I don't see how that is 'inconsistent'?  If you ask for an explicit
address (family) you get a sanity check that (support for it) is
available.  If you say you don't care, then tftpd doesn't either.

>  - The "no interface is up" also happens with ifupdown with no auto
>    interface is used (only hotplug)

Not defaulting to auto (and systemd not respecting it for a while) were
both bugs that broke lots of services on lots of people's machine.

And I already explained to you in #d-d that the established 'solution'
for that, for services which don't monitor netlink events, is to add a
hook which restarts them when interfaces appear or disappear, if you
really want them to bind to interfaces which might genuinely be expected
to be hotplugged.

Because if you're not actually using wildcard addresses, then this will
_still_ fail, even with your patch, if that interface isn't already up.

>  - The "no interface is up" also happens if your laptop has no network
>    connection during boot

This doesn't need a link up state to work, it just needs a local
interface to be brought up with the needed address (family) assigned
to it.  If your only address is assigned by DHCP or something similar,
and you aren't blocking waiting on that - then as above lots of services
are going to fail to start if your boot sequence blindly tries to start
them anyway.

That isn't the fault of, or a bug in, those services.  Your system is
just misconfigured for what you actually want it to do.

>  - It's more robust to try what the admin requested. It is possible even
>    if no interface is up to bind to 0.0.0.0. So I suggest to do that and
>    not try to know it better than the admin.

See above.  Your patch is *taking away* the ability of the admin to
have the choice to specify exactly what behaviour they want ...

The behaviour that you want is already supported.  You just need to
explicitly "request" that, rather than redefine the historical
default to now mean that as well, taking away any other option that
some other admin might want to request.

>  - The error message
> 	cannot resolve local IPv4 bind address: 0.0.0.0, Name or service not known
>    is misleading.

Well, you can file that bug against glibc :)  We're just reporting
what getaddrinfo and gai_strerror returned ...


> > Because it could hide or introduce real problems in other cases too, and
> > if the bug in NM is fixed, then the only reason I'm so far aware of for
> > you proposing this patch (based on the discussion on #d-d) also goes away
> > too ...
> 
> See above. Which problems do you see introduced by my patch?

You're assuming that the only configuration anyone would want is to bind
to any address, without any checking of what address (families) are
available, even if the user explicitly specifies them - and that silently
ignoring any error with being able to do that is a Good Thing.

If you're running a toy service on your laptop, that might be ok.
If you're really using your laptop as a 'portable' server for some
special use case, you probably don't want it binding to random wifi
hotspots wherever you may go with it.
If you're running a Real Server, then silently ignoring real problems
is pretty much the opposite of what you'd want happening in most cases.

The lessons of:
https://tools.ietf.org/html/draft-thomson-postel-was-wrong-00
aren't entirely inapplicable here.


> IMHO "we don't do it right because it might paper over other problems"
> is a poor reason for not patching. ("I don't need seatbelt or a helmet
> because if my head gets hurt there is a different problem.")

I don't think defining "do it right" as "I'd rather you disable checking
in the software for everyone than configure it locally to do what I want"
is very helpful here.  Especially not if the only reason you want that
is because NM has a bug, or you don't want to configure it to properly
support using this service with hotplugged interfaces.

By your analogy, what you're saying with this patch is "I want you to
remove the seatbelt completely, even though I already have the option
to choose not to wear it myself".


> > Assuming that at some point the NM bug will be fixed, why would we still
> > want to make this change in this code?
> 
> See above. FTR: The NM bug is already reported upstream and a first
> (though not fixing) patch was suggested by them.

Cool, though it would be nice if there was a (possibly RC?) debian bug
also tracking that.

I think if you want to change the behaviour of this program, you're also
going to have to get your proposed patch past upstream before I apply it.
When I first adopted this package, the first thing I did was get all of
the patches we were carrying upstreamed, and I'm not keen to diverge
from them again over something like this.

Since they hadn't chimed in on it yet I've given you some feedback on
why _I_ don't think this is the right solution - but ultimately it is
them you'll need to convince otherwise, not me.

I am grateful for you digging into this, confirming that the core of
what is really making people most unhappy here is an NM bug, and
reporting that bug to them.  And as I said in #d-d, I will take a
patch to include an NM hook if you really do want support for NM
managed interfaces that really are expected to be hotplugged some
time long after boot.  If people really want support for late hotplug
in ifupdown we can add that too, but so far nobody has ever reported
that as being a problem for them ("use auto or expect pain is fairly
well known these days).


I do still consider the question of what the default config should
be an open one - it's something I inherited from the previous maintainer
too - but I'll follow up on that separately.  It's bad enough that we've
already got multiple issues conflated together in this one bug, so it
would be nice to at least still keep them to separate email (sub)threads.

  Cheers,
  Ron




More information about the Syslinux mailing list