Load balancing using nftables

Vasanth S
Jan 10, 2021

nftables is the successor to iptables and has an improved syntax. In this write-up we will see how to load balance two WAN connections using a Linux machine (a Raspberry Pi, for example). Note that the load balancing in this article is on the client side (outgoing connections), not server load balancing. Although the example uses two WAN connections, it can be extended to any number of connections.

The key idea

We dynamically mark packets to select the outgoing route for each new connection; the connection tracking module (conntrack) then keeps the connection on the same route until it ends.

Keywords

mark is an arbitrary number that exists only inside the kernel's packet path; it never arrives from outside and never leaves with the packet.

mark is per packet. For all packets of the same flow to carry the same mark, the mark has to be stored in the connection tracking entry (ct mark).

In nftables, tables and chains can have any name, which is not the case in iptables. An arbitrarily named chain is tied to its role by the keyword ‘type’ (together with the hook and priority). For example:

table ip my_nat {
    chain my_filter {
        type filter hook prerouting priority -150; policy accept;
    }
}
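The key idea (restore the mark from conntrack, choose an uplink only for the first packet of a flow, then save the mark back) can be modelled in a few lines of shell. This is only a toy sketch: CT_TABLE, PKT_MARK, restore_mark and process_packet are names made up for the illustration, and it alternates between the two marks instead of choosing at random so that the output is repeatable.

```shell
#!/bin/sh
# Toy model only: CT_TABLE stands in for the kernel's conntrack table and
# holds "flow=mark" entries; PKT_MARK stands in for the packet mark.
CT_TABLE=""
PKT_MARK=0
NEXT=100

# Load the mark stored for a flow into PKT_MARK, or 0 if the flow is unknown
# (the role played by "meta mark set ct mark").
restore_mark() {
    PKT_MARK=0
    for entry in $CT_TABLE; do
        case "$entry" in
            "$1"=*) PKT_MARK="${entry#*=}" ;;
        esac
    done
}

process_packet() {
    restore_mark "$1"
    if [ "$PKT_MARK" -eq 0 ]; then
        # First packet of a new flow: choose an uplink. The real ruleset
        # chooses at random; alternating keeps this sketch deterministic.
        PKT_MARK=$NEXT
        if [ "$NEXT" -eq 100 ]; then NEXT=101; else NEXT=100; fi
        # Remember the choice (the role played by "ct mark set meta mark").
        CT_TABLE="$CT_TABLE $1=$PKT_MARK"
    fi
    echo "flow $1 -> mark $PKT_MARK"
}

process_packet flowA
process_packet flowB
process_packet flowA
```

The second flowA packet gets the same mark as the first one, which is exactly what keeps a connection on one WAN link.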

Prerequisites

A working Linux system with three interfaces: one for the local network and two for the WAN networks. The interfaces can even be WiFi; configuring them is not covered here, all that matters is that each of the three has an IP address assigned. Since the LAN-side interface serves the clients, it is probably best to give it a fixed private (non-routable) IP address and run a DHCP server on it for the clients to connect.

Install nftables, e.g. sudo apt install nftables

Flush your existing iptables rules in all tables, e.g. sudo iptables -F -t filter, sudo iptables -F -t nat, sudo iptables -F -t mangle

The three interfaces for the example are

eth1: WAN connection 1, IP 192.168.0.109, default gw 192.168.0.1
eth2: WAN connection 2, IP 192.168.29.2, default gw 192.168.29.1
eth0: LAN connection, IP 172.17.0.1

The goal is to spread the load by dynamically choosing eth1 or eth2 for every new connection.

Deployment view

Setting up /etc/nftables.conf

The nftables service uses the file /etc/nftables.conf, which is where we will place our rules for policy-based routing.
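Before handing the file to the service, it may be worth checking it for syntax errors first. A minimal sketch, assuming a systemd-based distribution where the unit is called nftables (as on Debian or Raspberry Pi OS); load_ruleset is a helper name made up for this example.

```shell
#!/bin/sh
# load_ruleset is a hypothetical helper name, not part of nftables.
load_ruleset() {
    conf="${1:-/etc/nftables.conf}"

    # Stop early if the file is missing or unreadable.
    if [ ! -r "$conf" ]; then
        echo "cannot read $conf" >&2
        return 1
    fi

    # -c only parses and validates the file; the live ruleset is untouched.
    sudo nft -c -f "$conf" || return 1

    # Apply the rules now ...
    sudo nft -f "$conf" || return 1

    # ... and on every boot (assumes the systemd unit is named "nftables").
    sudo systemctl enable nftables
}
```

Call it as load_ruleset to use the default path, or pass another file to try out a draft, e.g. load_ruleset ./draft.conf.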

#!/usr/sbin/nft -f

flush ruleset

table ip my_nat {
    chain my_filter {
        type filter hook prerouting priority -150; policy accept;
        iif lo accept;

        iifname eth1 jump my_input_public;
        iifname eth2 jump my_input_public;
        iifname eth0 ip daddr 172.17.0.0/16 jump local_sys;

        # restore the mark saved for this connection (0 for a new connection)
        meta mark set ct mark;
        #ct state new counter queue num 0 comment "Queue monitor must be running..."
        # new connection: mod 10 yields 0..9, so 4 of 10 draws give 100 and 6 of 10 give 101
        ct state new meta mark set numgen random mod 10 map { 0-3: 100, 4-9: 101 } comment "Without Queue monitor..."

        # save the mark so that later packets of the connection reuse it
        ct mark set meta mark;
        counter comment "<- Pre routing";
    }

    # packets arriving on the WAN interfaces: only replies to our own connections
    chain my_input_public {
        ct state {established,related} counter accept;
        ct state invalid log level alert prefix "Incoming invalid:" counter drop;
        ct state new log level alert prefix "Incoming:" counter drop;
    }

    # LAN traffic addressed to the local network
    chain local_sys {
        ct state {established,related} counter accept
        ct state invalid counter drop
        ct state new log counter accept;
    }

    chain output {
        type filter hook output priority filter;

        ct state {established,related} counter accept
        ip saddr 192.168.29.2 ct state new meta mark set 100 counter;
        ip saddr 192.168.0.109 ct state new meta mark set 101 counter;
        # assign either one for locally initiated connections
        meta mark eq 0 ct state new meta mark set numgen random mod 2 map { 0: 100, 1: 101 } counter;

        # mark 50: traffic that stays local and must not be SNATed
        ip daddr 127.0.0.1 ct state new meta mark set 50 counter;
        ip daddr 172.17.0.0/16 ct state new meta mark set 50 counter;
        ct mark set meta mark;
    }

    chain my_postrouting {
        type nat hook postrouting priority -100;
        ct mark set meta mark; counter comment "<- Post routing";
        meta mark > 50 jump my_snat_postrouting;
        counter comment "<- Post routing";
        meta mark eq 50 accept;
        log counter drop;
    }

    chain my_snat_postrouting {
        counter comment "<- Out Post routing";
        meta mark eq 100 counter;
        meta mark eq 101 counter;
        meta mark eq 100 snat to 192.168.29.2;
        meta mark eq 101 snat to 192.168.0.109;
        log counter drop;
        #snat to mark map { 100 : 192.168.29.2, 101 : 192.168.0.109 };
    }
}

The above configuration is mostly self-explanatory :). However, some items are worth mentioning:

  • The ‘chain my_filter’ is attached to the prerouting hook, and it should be noted that this chain applies to all interfaces. So in the above example, a packet received on eth1 still goes through this chain.
  • In the above example, packets from eth1 and eth2 are sent to a separate chain and handled specially because these interfaces face the public internet. We accept only packets that belong to an existing flow, that is, one initiated by us. Any unsolicited connection request is logged and dropped, since its connection tracking state will be new.
  • In ‘chain my_filter’ note the use of ‘meta mark set ct mark’. When a reply packet comes from an outside server (say via the eth1 interface), the packet itself carries no mark, so it has to be loaded with the mark belonging to its connection; this rule loads the packet mark from the connection tracking mark. The counterpart is the rule ‘ct mark set meta mark’, which stores the current mark in the connection tracking entry. This only needs to be done for the first packet (although in the above example it is done for all packets due to laziness on my part).
  • The expression ‘numgen random mod 2 map { 0: 100, 1: 101 }’ means that each time it is evaluated, a random number is generated and divided by 2, and the remainder (0 or 1) is used as the key into the map ‘{ 0: 100, 1: 101 }’, so either 100 or 101 is returned.
  • The rule ‘meta mark eq 50 accept’ means that a packet whose mark equals 50 is accepted, and since accept is a terminal verdict, the remaining rules in the chain are not evaluated for it. If the mark is not equal to 50, evaluation continues with the next rule without doing anything.
  • The keyword ‘counter’ is an extremely useful feature (at least for me, it is what makes nftables my favorite compared to iptables). After adding counter to an existing rule, ‘sudo nft list ruleset -n -a’ shows at any time how many packets and bytes hit each rule.
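The same numgen mechanism gives weighted balancing in the prerouting chain, where ‘mod 10’ yields a number from 0 to 9 and the map sends the keys 0 to 3 to mark 100 and the remaining keys to mark 101. The sketch below imitates that mapping in plain shell to show the resulting split of roughly 40% to 60%; mark_for is a name invented for this illustration, and awk's random generator merely stands in for the kernel's.

```shell
#!/bin/sh
# Imitates the key ranges of the prerouting map: 0-3 -> 100, 4 and above -> 101.
mark_for() {
    case "$1" in
        [0-3]) echo 100 ;;
        [4-9]) echo 101 ;;
        *) echo "out of range: $1" >&2; return 1 ;;
    esac
}

# Draw 1000 pseudo-random numbers in 0..9 (like "numgen random mod 10")
# and count how often each mark is chosen.
awk 'BEGIN { srand(); for (i = 0; i < 1000; i++) print int(rand() * 10) }' |
{
    n100=0
    n101=0
    while read -r n; do
        if [ "$(mark_for "$n")" = 100 ]; then
            n100=$((n100 + 1))
        else
            n101=$((n101 + 1))
        fi
    done
    echo "mark 100: $n100  mark 101: $n101"
}
```

Changing the key ranges changes the weights, which is handy when one WAN link is faster than the other.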

Policy based routing

In addition to the above nftables setup, two separate IP routing tables have to be defined. Each routing table has its own default gateway, and for the above configuration the table is selected based on the fwmark (other methods are possible [1]).

# Set up the default gateway of each table. The gateway determines the
# destination MAC address: the source MAC is this device's MAC address,
# but for remote servers the destination MAC is the default gateway's
# MAC address, and it is the gateway's job to forward the packet to
# its destination.
# Each table must use the uplink whose address the SNAT rule for that
# mark uses: mark 100 -> 192.168.29.2 (eth2), mark 101 -> 192.168.0.109 (eth1).
ip route add table 100 default dev eth2 via 192.168.29.1
ip route add table 101 default dev eth1 via 192.168.0.1
# select table based on mark on the packet
ip rule add fwmark 100 table 100
ip rule add fwmark 101 table 101
# for locally initiated connections
ip rule add from 192.168.29.2 table 100
ip rule add from 192.168.0.109 table 101
# other configs
echo 1 > /proc/sys/net/ipv4/ip_forward
echo 0 > /proc/sys/net/ipv4/conf/all/rp_filter
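Since the setup can be extended to more than two connections, it may help to generate the routing commands from a small table instead of typing them by hand. The sketch below only prints the commands. UPLINKS and emit_routing_cmds are names made up for this example, and each mark is paired with the interface that owns the SNAT address for that mark in /etc/nftables.conf (100 with 192.168.29.2 on eth2, 101 with 192.168.0.109 on eth1).

```shell
#!/bin/sh
# One uplink per line: "<mark> <device> <gateway> <source address>".
# The mark doubles as the routing table number.
UPLINKS='100 eth2 192.168.29.1 192.168.29.2
101 eth1 192.168.0.1 192.168.0.109'

# Print (do not run) the route and rule commands for every uplink.
emit_routing_cmds() {
    echo "$UPLINKS" | while read -r mark dev gw src; do
        [ -n "$mark" ] || continue
        echo "ip route add table $mark default dev $dev via $gw"
        echo "ip rule add fwmark $mark table $mark"
        echo "ip rule add from $src table $mark"
    done
}

emit_routing_cmds
```

After reviewing the output, it can be applied with emit_routing_cmds | sudo sh. An additional uplink also needs its own mark in the numgen maps and its own snat rule in the nftables configuration.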

Reference:

[1] https://tldp.org/HOWTO/Adv-Routing-HOWTO/index.html

[2] https://www.system-rescue.org/networking/Load-balancing-using-iptables-with-connmark/
