|Johannes Kimmel e61557048b
vx46 translates statelessly between many IPv4-only endpoints and a single IPv6-only VXLAN endpoint.
VXLAN supports one-to-many communication on a single interface and has limited support for learning new endpoint addresses automatically. It also runs over UDP and can therefore be used to build layer 2 networks over the internet.
However, VXLAN's data center origin places some restrictions on its usage, especially with IPv4-only endpoints behind NAT:
Source port randomization
This is a performance optimization that is enabled by default. It allows packets to be spread over multiple links in an infrastructure with multipath routing. While a client endpoint behind NAT can reach another endpoint that has no firewall or NAT this way, the reverse path won't have a proper entry in the NAT table, so packets can't make it back. This is also an issue for IPv6 clients behind stateful firewalls. In both cases, return traffic is only forwarded once a packet has been sent from behind the firewall or NAT and an entry could be made in the corresponding table. But since source ports are chosen randomly and destination ports are fixed, there will always be entries missing in the firewall or NAT.
Fixed destination port
This is the main restriction that prohibits running multiple clients behind NAT. Since the clients need to share a single IP, they are differentiated by port. NAT implementations try to preserve port numbers, but when several clients connect to a common service, the only differentiator left is the source port, which the translation therefore has to rewrite. But since VXLAN endpoints statically configure a destination port instead of learning it from incoming packets, answers do not travel back along the exact path they came from and therefore cannot be translated back by the NAT.
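The fixed destination port problem can be illustrated with a toy model of a port-translating NAT. This is only a sketch: the `ToyNat` class, the addresses, and the rewritten port range are made up for illustration, and real NAT implementations track far more state.

```python
# Toy model of a port-translating NAT (illustration only, hypothetical names).

VXLAN_PORT = 8472  # port number used in the examples of this document


class ToyNat:
    def __init__(self, public_ip):
        self.public_ip = public_ip
        self.by_public_port = {}  # public port -> (private ip, private port)
        self.next_free = 40000    # arbitrary start for rewritten ports

    def outbound(self, private_ip, private_port):
        """Translate an outgoing packet and return its public (ip, port)."""
        client = (private_ip, private_port)
        # Reuse an existing mapping for this client if there is one.
        for public_port, owner in self.by_public_port.items():
            if owner == client:
                return self.public_ip, public_port
        # Try to preserve the port; rewrite it if another client holds it.
        public_port = private_port
        if public_port in self.by_public_port:
            public_port = self.next_free
            self.next_free += 1
        self.by_public_port[public_port] = client
        return self.public_ip, public_port

    def inbound(self, public_port):
        """Return the private endpoint for a reply, or None if it is dropped."""
        return self.by_public_port.get(public_port)


nat = ToyNat("203.0.113.7")
a = nat.outbound("192.168.0.10", VXLAN_PORT)  # first client keeps its port
b = nat.outbound("192.168.0.11", VXLAN_PORT)  # second client gets rewritten

# The server replies to the statically configured port, so replies meant
# for either client hit the first client's mapping.
reply_for_a = nat.inbound(VXLAN_PORT)
reply_for_b = nat.inbound(VXLAN_PORT)
```

In this model the second client is only reachable on its rewritten public port, which a VXLAN endpoint with a static destination port never uses.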
The first problem is relatively easy to solve.
It is possible to restrict the source port to a single port by passing the `srcport 8472 8473` option when creating the tunnel interface.
The `srcport` option expects two port numbers that form an interval from which the source port is randomly chosen.
The lower bound is inclusive, the upper bound is exclusive.
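The interval semantics can be modelled in a few lines. This is a sketch of the half-open interval described above, not kernel code; the function name `srcport_candidates` is made up for illustration.

```python
def srcport_candidates(low, high):
    """Return the ports in the half-open interval [low, high)."""
    return list(range(low, high))


# "srcport 8472 8473" leaves exactly one candidate port.
single = srcport_candidates(8472, 8473)

# A wider interval leaves several ports to choose from.
several = srcport_candidates(8472, 8476)
```

Because the upper bound is exclusive, pinning the source port to 8472 requires 8473 as the second argument.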
The second problem is not as easily solved with the standard tools.
There is the option to add VXLAN endpoints manually and configure the destination port with the `bridge fdb add dev $vxlan ... dst $clientip port $dstport` command.
This, however, means that an external control plane is necessary to learn new endpoints and the built-in learning mechanism can't be used.
See the next section on how embedding connection information within an IPv6 address can be used to exploit the address learning for IPv4 clients behind NAT.
Theory of operation
Linux's VXLAN implementation is able to automatically learn new endpoint addresses, but not ports. In practice this is not an issue for IPv6 clients, because NAT66 is thankfully not widespread, so it is possible to assume that ports are left untouched when connecting via IPv6.
If we assume an IPv6 server and provide a translation mechanism (`vx46`!) that embeds the necessary connection information of an IPv4 endpoint within an IPv6 address, the IPv6 server VXLAN endpoint can learn everything necessary to form a connection.
IPv4 -> IPv6 translation
The key insight is that all the connection information of an IPv4 endpoint, source IPv4 and port, can be encoded in the lower parts of an IPv6 address.
On an incoming IPv4 UDP packet, `vx46` will do the following:
- Extract the source IPv4 and source port
- Construct an IPv6 address from the following 3 parts:
  - The first 10 bytes of a `/80` prefix which is available and routed to the translator
  - The 4 bytes of the client's IPv4 address
  - The 2 bytes of the client's source port
- Send a packet to the upstream IPv6 VXLAN endpoint using the constructed IPv6 source address and the payload of the incoming IPv4 packet, which includes all the VXLAN tunnel data
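The address construction can be sketched as follows. This is an illustration rather than the actual `vx46` code: the function name `embed_ipv4_endpoint` and the example prefix and client address are made up, and storing the port in network byte order (big-endian) is an assumption.

```python
import ipaddress
import struct


def embed_ipv4_endpoint(prefix, client_ip, client_port):
    """Build: 10 prefix bytes + 4 IPv4 bytes + 2 port bytes."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 80:
        raise ValueError("expected a /80 prefix")
    head = network.network_address.packed[:10]
    ipv4 = ipaddress.IPv4Address(client_ip).packed
    port = struct.pack("!H", client_port)  # assumption: big-endian port
    return ipaddress.IPv6Address(head + ipv4 + port)


# Documentation prefix and address, chosen only for the example.
addr = embed_ipv4_endpoint("2001:db8:0:0:1::/80", "192.0.2.1", 8472)
```

The resulting address would be used as the source address of the packet sent to the upstream endpoint, so the endpoint's address learning records the client's IPv4 address and port as part of an ordinary IPv6 address.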
IPv6 -> IPv4 translation
On each incoming IPv6 UDP packet, `vx46` will do the following:
- Record which destination IPv6 was used
- From that, extract bytes 10 to 13 (counting from 0) as the original client IPv4 and bytes 14 and 15 as the client port
- Send an IPv4 packet back to the original client IPv4 and port number along with the payload from the incoming IPv6 packet
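The reverse direction can be sketched in the same way. Again this is an illustration and not the actual `vx46` code: `extract_ipv4_endpoint` is a made-up name, and the big-endian port is an assumption.

```python
import ipaddress
import struct


def extract_ipv4_endpoint(ipv6_destination):
    """Recover (IPv4 address, port) from the last 6 bytes of the address."""
    packed = ipaddress.IPv6Address(ipv6_destination).packed
    client_ip = ipaddress.IPv4Address(packed[10:14])
    (client_port,) = struct.unpack("!H", packed[14:16])  # assumption: big-endian
    return str(client_ip), client_port


# Example address: a /80 documentation prefix followed by 192.0.2.1 and port 8472.
endpoint = extract_ipv4_endpoint("2001:db8:0:0:1:c000:201:2118")
```

No per-client state is needed for this step, since everything required to reach the client is contained in the destination address itself.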
Isolation from the upstream IPv6 tunnel endpoint
`vx46` uses a single wildcard socket that binds on the VXLAN port. It therefore overlaps with the port of the actual tunnel endpoint and needs to run on a separate machine or be isolated by other means (VRF, network namespaces, container...).
Reachable port for IPv4
How else are the clients supposed to connect?
An unused IPv6 `/80` prefix that is routed to the machine
Addresses in this range are used for communicating with the upstream IPv6 VXLAN endpoint. The lower bytes in the range are used to embed the IPv4 client connection information (IPv4 address and port number).
Enable IPv6 non-local binds
This allows `vx46` to send and receive packets from addresses that aren't configured explicitly on any interface (otherwise that would be a lot of addresses to configure, around 2^48).
ip route add local $prefix/80 dev lo
sysctl -w net.ipv6.ip_nonlocal_bind=1
An upstream IPv6 VXLAN endpoint to forward packets to
vx46 -prefix $prefix -upstream $upstream