v0.0.1

2023-08-11 13:35:34 +02:00 · 2023-08-11 13:35:34 +02:00 · b073feaf9e
parent 89b710befc
commit b073feaf9e
4 changed files with 216 additions and 1 deletions
--- a/README.md
+++ b/README.md
@@ -1,3 +1,97 @@
 # vx46

-VXLAN translator between many IPv4-only endpoints and a single IPv6-only endpoint
+`vx46` translates **statelessly** between many IPv4-only endpoints and a single IPv6-only VXLAN endpoint.
+
+## Problem statement
+
+VXLAN supports one-to-many communication on a single interface and has limited support for learning new endpoint addresses automatically.
+It also runs over UDP and can therefore be used to build layer 2 networks over the internet.
+
+However, VXLAN's data center origin places some restrictions on its usage, especially with IPv4-only endpoints behind NAT:
+
+1. Source port randomization
+
+    This is a performance optimization, enabled by default, that lets infrastructure with multipath routing spread packets over multiple links.
+    While a client endpoint behind NAT can reach another endpoint that has no firewall or NAT this way, the reverse path has no matching entry in the NAT table, so replies can't make it back.
+    This is also an issue for IPv6 clients behind stateful firewalls.
+    In both cases a port is only opened once a packet has been sent from behind the firewall or NAT and an entry has been created in the corresponding table.
+    But since source ports are chosen randomly and destination ports are fixed, the firewall or NAT table will always be missing entries.
+
+2. Fixed destination port
+
+    This is the main restriction that prohibits running multiple clients behind NAT.
+    Since the clients need to share a single IP, they are differentiated by the port.
+    NAT implementations try to preserve port numbers, but when several clients connect to a common service, the source port is the only differentiator left and is therefore rewritten by the translation.
+    But since VXLAN endpoints use a statically configured destination port instead of learning it from incoming packets, replies are not sent back to the rewritten port and therefore cannot be translated back by the NAT.
+
+## Solution
+
+The solution for the first problem is relatively easy.
+It is possible to restrict the source port to a single port by passing the `srcport 8472 8473` option when creating the tunnel interface.
+The `srcport` option expects 2 port numbers that form an interval from which the source port is randomly chosen.
+The lower bound is inclusive, the upper bound is exclusive.
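The half-open interval can be illustrated with a small model. This is only a sketch of the interval semantics described above, not the kernel's actual selection code: `pickPort` and the hash values are hypothetical, and scaling a 32-bit flow hash into the interval is an assumption made for illustration.

```go
package main

import "fmt"

// pickPort is a hypothetical model: it maps a 32-bit flow hash onto the
// half-open interval [lo, hi), so hi itself is never returned.
// It assumes lo < hi.
func pickPort(hash uint32, lo, hi uint16) uint16 {
	span := uint64(hi - lo)
	// Scale the hash into [0, span) and offset it by the lower bound.
	return lo + uint16((uint64(hash)*span)>>32)
}

func main() {
	// With `srcport 8472 8473` the interval holds exactly one port,
	// so every hash value yields 8472.
	for _, h := range []uint32{0, 0xdeadbeef, 0xffffffff} {
		fmt.Println(pickPort(h, 8472, 8473))
	}
}
```

Under this model an interval of width one is what pins the source port, which matches the `srcport 8472 8473` option above.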
+
+The second problem is not as easily solved with the standard tools.
+There is the option to add VXLAN endpoints manually *and* configure the destination port with the `bridge fdb add dev $vxlan ... dst $clientip port $dstport` command.
+This, however, means that an external control plane is necessary to learn new endpoints and the built-in learning mechanism can't be used.
+See the next section on how embedding connection information within an IPv6 address can be used to exploit the address learning for IPv4 clients behind NAT.
+
+## Theory of operation
+
+Linux's VXLAN implementation is able to automatically learn new endpoint **addresses**, but not **ports**.
+In practice this is not an issue for IPv6 clients, because NAT66 is thankfully not widespread and it is therefore possible to assume that ports are left untouched when connecting via IPv6.
+
+If we assume an IPv6 server and provide a translation mechanism (`vx46`!) that embeds the necessary connection information of an IPv4 endpoint **within an IPv6 address**, the IPv6 server VXLAN endpoint can learn everything necessary to form a connection.
+
+### IPv4 -> IPv6 translation
+
+The key insight is that all the connection information of an IPv4 endpoint, source IPv4 and port, can be encoded in the lower parts of an IPv6 address.
+On each incoming IPv4 UDP packet, `vx46` will do the following:
+
+1. Extract the source IPv4 and source port
+2. Construct an IPv6 address from the 3 following parts:
+    1. The first 10 bytes of a `/80` prefix which is available and routed to the translator
+    2. The 4 bytes of the client's IPv4 address
+    3. The 2 bytes of the client's source port
+3. Send a packet to the upstream IPv6 VXLAN endpoint using the constructed IPv6 source address and the payload of the incoming IPv4 packet, which includes all the VXLAN tunnel data
+
+### IPv6 -> IPv4 translation
+
+On each incoming IPv6 UDP packet, `vx46` will do the following:
+
+1. Record which **destination** IPv6 was used
+2. From that, extract bytes 10 to 13 (counting from 0) as the original client IPv4 and bytes 14 and 15 as the client port
+3. Send an IPv4 packet back to the original client IPv4 and port number along with the payload from the incoming IPv6 packet
+
+## Running `vx46`
+
+### Requirements
+
+1. Isolation from the upstream IPv6 tunnel endpoint
+
+    Currently `vx46` uses a single wildcard socket that binds to the VXLAN port.
+    It therefore conflicts with the port of the actual tunnel endpoint and needs to run on a separate machine or be isolated by other means (VRF, network namespaces, containers, ...).
+
+2. Reachable port for IPv4
+
+    How else are the clients supposed to connect?
+
+3. An unused IPv6 `/80` prefix that is routed to the machine
+
+    Addresses in this range are used for communicating with the upstream IPv6 VXLAN endpoint.
+    The lower bytes in the range are used to embed the IPv4 client connection information (IPv4 address and port number).
+
+4. Enable IPv6 non-local binds
+
+    This allows `vx46` to send and receive packets from addresses that aren't configured explicitly on any interface (otherwise that would be a lot of addresses to configure: $2^{32+16} = 2^{48}$).
+    ```sh
+    ip route add local $prefix/80 dev lo
+    sysctl -w net.ipv6.ip_nonlocal_bind=1
+    ```
+5. An upstream IPv6 VXLAN endpoint to forward packets to
+
+### Example
+
+```sh
+vx46 -prefix $prefix -upstream $upstream
+```
--- a/go.mod
+++ b/go.mod
@@ -0,0 +1,7 @@
+module git.freifunk-franken.de/jkimmel/vx46
+
+go 1.20
+
+require golang.org/x/net v0.14.0
+
+require golang.org/x/sys v0.11.0 // indirect
--- a/go.sum
+++ b/go.sum
@@ -0,0 +1,4 @@
+golang.org/x/net v0.14.0 h1:BONx9s002vGdD9umnlX1Po8vOZmrgH34qlHcD1MfK14=
+golang.org/x/net v0.14.0/go.mod h1:PpSgVXXLK0OxS0F31C1/tv6XNguvCrnXIDrFMspZIUI=
+golang.org/x/sys v0.11.0 h1:eG7RXZHdqOJ1i+0lgLgCpSXAp6M3LYlAo6osgSi0xOM=
+golang.org/x/sys v0.11.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
--- a/main.go
+++ b/main.go
@@ -0,0 +1,110 @@
+package main
+
+import (
+	"encoding/binary"
+	"flag"
+	"fmt"
+	"log"
+	"math"
+	"net"
+	"net/netip"
+
+	"golang.org/x/net/ipv6"
+)
+
+func vx46(natprefix netip.Addr, upstreamAddr netip.Addr, port uint16, mtu uint16) error {
+	upstream := netip.AddrPortFrom(upstreamAddr, port)
+	p, err := net.ListenUDP("udp", &net.UDPAddr{Port: int(port)})
+	if err != nil {
+		return err
+	}
+
+	defer p.Close()
+	// the destination address of incoming ipv6 packets carries the client ipv4 and port
+	if err := ipv6.NewPacketConn(p).SetControlMessage(ipv6.FlagDst, true); err != nil {
+		return err
+	}
+
+	var b [math.MaxUint16]byte
+	var oob [20480]byte // from /proc/sys/net/core/optmem_max
+	for {
+		n, oobn, _, ingressSrcAddrPort, err := p.ReadMsgUDPAddrPort(b[:], oob[:])
+		if err != nil {
+			return err
+		}
+
+		cm := ipv6.ControlMessage{}
+		if err := cm.Parse(oob[:oobn]); err != nil {
+			return err
+		}
+
+		ingressDstAddr, ok := netip.AddrFromSlice(cm.Dst)
+		if !ok {
+			continue
+		}
+
+		var ingressDstAddrPort netip.AddrPort
+		ingressDstAddrPort = netip.AddrPortFrom(ingressDstAddr.Unmap(), port)
+		ingressSrcAddrPort = netip.AddrPortFrom(ingressSrcAddrPort.Addr().Unmap(), ingressSrcAddrPort.Port())
+
+		var egressDstAddrPort netip.AddrPort
+		var egressOOB []byte
+
+		if ingressSrcAddrPort.Addr().Is4() {
+			// embed the "client" ipv4 into the src address for the packet to the upstream vxlan ipv6 endpoint
+			// the destination is the upstream vxlan endpoint
+			inaddr4 := ingressSrcAddrPort.Addr().As4()
+			egressSrcAddr := natprefix.As16()
+			copy(egressSrcAddr[10:14], inaddr4[:])
+			binary.BigEndian.PutUint16(egressSrcAddr[14:16], ingressSrcAddrPort.Port())
+
+			cm := ipv6.ControlMessage{Src: net.IP(egressSrcAddr[:])}
+			egressOOB = cm.Marshal()
+			egressDstAddrPort = upstream
+		} else {
+			inaddr6 := ingressDstAddrPort.Addr().As16()
+			egressDstAddrPort = netip.AddrPortFrom(
+				netip.AddrFrom4([4]byte(inaddr6[10:14])), // extract client ipv4
+				binary.BigEndian.Uint16(inaddr6[14:16]),  // extract client port
+			)
+		}
+
+		outn, outoobn, err := p.WriteMsgUDPAddrPort(b[:n], egressOOB, egressDstAddrPort)
+		if err != nil {
+			return err
+		}
+		if outn != n {
+			return fmt.Errorf("dropped bytes: %d sent of %d", outn, n)
+		}
+		if outoobn != len(egressOOB) {
+			return fmt.Errorf("dropped oob bytes: %d sent of %d", outoobn, len(egressOOB))
+		}
+	}
+}
+func main() {
+	natprefixStr := flag.String("prefix", "", "local IPv6 base address for a /80 to use for communication with upstream")
+	upstreamStr := flag.String("upstream", "", "IPv6 address of the upstream VXLAN endpoint")
+	portInt := flag.Uint("port", 8472, "port for vxlan communication")
+	mtuInt := flag.Uint("mtu", 1422, "buffer size")
+
+	flag.Parse()
+
+	natprefix, err := netip.ParseAddr(*natprefixStr)
+	if err != nil {
+		log.Fatalf("Invalid prefix: %s", err)
+	}
+	upstream, err := netip.ParseAddr(*upstreamStr)
+	if err != nil {
+		log.Fatalf("Invalid upstream: %s", err)
+	}
+	if *portInt > math.MaxUint16 {
+		log.Fatalf("port out of range: %d", *portInt)
+	}
+	port := uint16(*portInt)
+
+	if *mtuInt > math.MaxUint16 {
+		log.Fatalf("mtu out of range: %d", *mtuInt)
+	}
+	mtu := uint16(*mtuInt)
+
+	if err := vx46(natprefix, upstream, port, mtu); err != nil {
+		log.Fatal(err)
+	}
+}