Our cluster contains several hundred nodes around the world, and to join them into a single cluster we place certain connectivity requirements on each node.
Most nodes are connected to a Science DMZ at 10G-100G speeds. To utilize this speed, they should support jumbo frames (9000 MTU) to all other nodes in the cluster. The Science DMZ architecture also assumes the nodes sit close to the border router, outside of firewalls and ACLs that significantly lower connection speed and add restrictions that can make a node unusable to us.
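A common way to verify jumbo-frame support end to end is a non-fragmenting ping sized to fill a 9000-byte packet (the hostname below is a placeholder, not an actual cluster node):

```shell
# 8972-byte ICMP payload + 20-byte IP header + 8-byte ICMP header = 9000 bytes.
# -M do sets "don't fragment", so this fails unless every hop on the path
# supports a 9000-byte MTU (Linux iputils ping).
ping -M do -s 8972 -c 3 node.example.org   # placeholder hostname
```

If the path MTU is smaller, the ping reports "message too long" instead of a reply.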
Kubernetes builds a virtual (overlay) network on top of the physical network, which lets all nodes see each other directly via established IPIP tunnels. Many services also use ports on the nodes themselves. Because we have so many locations, we can't work with each campus to handle the network policy for us. Instead, we ask the local admins to open the node completely to the world, and use the cluster-wide network policies provided by the Calico network plugin to handle the firewall.
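On a node where Calico runs with IPIP enabled, the overlay shows up as a `tunl0` interface (Calico's default tunnel device name); a quick way to confirm the tunnel exists:

```shell
# Show the IPIP tunnel device Calico creates by default (named tunl0);
# -d prints driver details, which include "ipip" for this interface type.
ip -d link show tunl0
```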
Calico Network Policies
Calico is the network plugin we use for Kubernetes. It provides the GlobalNetworkPolicy resource to manage pre-DNAT network policy on hosts, which lets us protect the hosts from the world while still allowing all needed connectivity between them. With some exceptions, hosts have a few ports and ICMP open to the world, and allow full connectivity only between the cluster hosts defined by our NetworkSet. Because the cluster is constantly changing, it is not feasible to manage the local university policies by sending emails to each admin; central management is the only approach that works.
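As a rough illustration (not the cluster's actual policy), a pre-DNAT host-protection setup might look like the sketch below. The names, labels, CIDR, and port numbers are all placeholders: a GlobalNetworkSet enumerates the cluster nodes' IPs, and a GlobalNetworkPolicy on the host endpoints allows full access from those IPs while exposing only a few ports and ICMP to everyone else.

```shell
# Illustrative only: names, labels, CIDRs, and ports are placeholders.
# First, a GlobalNetworkSet listing cluster node subnets, matched by label:
calicoctl apply -f - <<'EOF'
apiVersion: projectcalico.org/v3
kind: GlobalNetworkSet
metadata:
  name: cluster-nodes
  labels:
    role: cluster-nodes
spec:
  nets:
    - 192.0.2.0/24   # example node subnet
EOF

# Then a pre-DNAT policy on host endpoints: full access from cluster nodes,
# a few ports and ICMP from everywhere else, implicit deny otherwise.
calicoctl apply -f - <<'EOF'
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: host-protection
spec:
  selector: has(node-role)   # assumes host endpoints carry this label
  order: 10
  preDNAT: true              # evaluated before kube-proxy's DNAT rewrites ports
  applyOnForward: true       # required when preDNAT is true
  ingress:
    - action: Allow
      source:
        selector: role == 'cluster-nodes'
    - action: Allow
      protocol: TCP
      destination:
        ports: [22, 443]
    - action: Allow
      protocol: ICMP
EOF
```

Note that pre-DNAT policies may only contain ingress rules, and filtering before DNAT is what allows the policy to match the original destination ports rather than the NAT-rewritten ones.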
You can review the current policy applied to the cluster. It adds iptables rules to the nodes to filter incoming traffic, in addition to the many other rules Kubernetes uses to route packets between hosts. These rules are often incompatible with manually managed iptables rules or with firewalld enabled on the node, so we ask that those be turned off.
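To see what this translates to in practice, you can list the cluster-wide policies and inspect the Calico-managed iptables chains on a node (these commands require calicoctl, cluster access, and root on the node; exact chain contents vary by cluster):

```shell
# List the cluster-wide Calico policies currently applied
calicoctl get globalnetworkpolicy

# On a node: Calico programs its own iptables chains, all prefixed "cali-"
sudo iptables-save | grep '^-A cali-' | head
```

Because Calico owns these "cali-" chains, hand-written iptables rules or firewalld's rule management will conflict with it, which is why we ask for them to be disabled.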