Stretching a layer 2 overlay across multiple KubeVirt clusters

This example demonstrates how to connect KubeVirt virtual machines running in different Kubernetes clusters to an L2 EVPN/VXLAN overlay using OpenPERouter, extending the KubeVirt single cluster example.

Overview #

The setup creates both Layer 2 and Layer 3 VNIs, with OpenPERouter automatically creating a Linux bridge on the host. Two KubeVirt virtual machines are connected to this bridge via Multus secondary interfaces of type bridge.

Architecture #

  • L3VNI (VNI 100): Provides routing capabilities and connects to external networks
  • L2VNI (VNI 110): Creates a Layer 2 domain for VM-to-VM communication
  • Linux Bridge: Automatically created by OpenPERouter for VM connectivity
  • VM Connectivity: VMs connect to the bridge using Multus network attachments

KubeVirt Multi Cluster L2 Integration

Example Setup #

Note: Some operating systems have fs.inotify.max_user_instances set too low to support larger kind clusters. This leads to:

  • virt-handler failing with CrashLoopBackOff, logging panic: Failed to create an inotify watcher
  • nodemarker failing with CrashLoopBackOff, logging too many open files

If that happens in your setup, you may want to increase the limit with:

sysctl -w fs.inotify.max_user_instances=1024
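
To make the change persist across reboots, you can (on most distributions) also drop it into a sysctl configuration file, for example:

echo "fs.inotify.max_user_instances=1024" | sudo tee /etc/sysctl.d/99-inotify.conf
sudo sysctl --system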

The full example can be found in the project repository and can be deployed by running:

make docker-build demo-multi-cluster

The example configures both an L2 VNI and an L3 VNI, with the L2 VNI belonging to the L3 VNI’s routing domain. The VMs in the two clusters are connected to a single L2 overlay, and they can additionally both reach and be reached from the broader L3 domain.

Configuration #

1. OpenPERouter Resources #

Create the L3VNI and L2VNI resources (in each cluster):

apiVersion: openpe.openperouter.github.io/v1alpha1
kind: L3VNI
metadata:
  name: red
  namespace: openperouter-system
spec:
  vni: 100
  vrf: red
---
apiVersion: openpe.openperouter.github.io/v1alpha1
kind: L2VNI
metadata:
  name: layer2
  namespace: openperouter-system
spec:
  hostmaster:
    autocreate: true
    type: bridge
  l2gatewayip: 192.170.1.1/24
  vni: 110
  vrf: red

Key Configuration Points:

  • The L2VNI references the L3VNI via the vrf: red field, enabling routed traffic
  • hostmaster.autocreate: true creates a br-hs-110 bridge on the host
  • l2gatewayip defines the gateway IP for the VM subnet
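
After applying these resources, a quick sanity check (a sketch; the API group comes from the manifests above, and the bridge name from hostmaster.autocreate) is to confirm the CRDs are registered and that the bridge shows up on the nodes:

# VNI resources registered by OpenPERouter
kubectl api-resources --api-group=openpe.openperouter.github.io

# On a node: the bridge created because of hostmaster.autocreate
ip -d link show br-hs-110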

2. Network Attachment Definition #

Create a Multus network attachment definition for the bridge:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: evpn
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "evpn",
      "type": "bridge",
      "bridge": "br-hs-110",
      "macspoofchk": false,
      "disableContainerInterface": true
    }
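
Since the overlay spans both clusters, the same attachment definition has to exist in each of them. A minimal sketch, assuming kubectl contexts named cluster-a and cluster-b and the manifest saved as evpn-nad.yaml (both names are illustrative):

kubectl --context cluster-a apply -f evpn-nad.yaml
kubectl --context cluster-b apply -f evpn-nad.yaml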

3. Virtual Machine Configuration #

Create two virtual machines with network connectivity (one VM in each cluster):

This VM should be created in cluster A:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-1
spec:
  runStrategy: Always
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-1
    spec:
      domain:
        devices:
          interfaces:
          - bridge: {}
            name: evpn
          disks:
          - disk:
              bus: virtio
            name: containerdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
        resources:
          requests:
            memory: 2048M
        machine:
          type: ""
      networks:
      - name: evpn
        multus:
          networkName: evpn
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.5.2
        name: containerdisk
      - cloudInitNoCloud:
          networkData: |
            version: 2
            ethernets:
              eth0:
                addresses:
                - 192.170.1.3/24
                gateway4: 192.170.1.1
        name: cloudinitdisk

This VM should be created in cluster B:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-2
spec:
  runStrategy: Always
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-2
    spec:
      domain:
        devices:
          interfaces:
          - bridge: {}
            name: evpn
          disks:
          - disk:
              bus: virtio
            name: containerdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
        resources:
          requests:
            memory: 2048M
        machine:
          type: ""
      networks:
      - name: evpn
        multus:
          networkName: evpn
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.5.2
        name: containerdisk
      - cloudInitNoCloud:
          networkData: |
            version: 2
            ethernets:
              eth0:
                addresses:
                - 192.170.1.30/24
                gateway4: 192.170.1.1
        name: cloudinitdisk

VM Configuration Details:

  • Uses the evpn network attachment for bridge connectivity
  • Cloud-init configures the VM’s IP address and default gateway
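
Once a VM is up, you can confirm the secondary interface and its address were picked up by inspecting the VMI status (a sketch, assuming the default namespace; the address is reported by the guest agent shipped in the container disk):

# The evpn interface should appear with the cloud-init assigned address
kubectl get vmi vm-1 -o jsonpath='{.status.interfaces}'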

Validation #

VM-to-VM Connectivity #

Test connectivity between the two VMs:

# Connect to VM console
virtctl console vm-1

# It may take about 2 minutes to start up and get to the login.
# Once it does, log in with username "fedora" and password "fedora".

# Test ping to the other VM
ping 192.170.1.30

Expected output:

PING 192.170.1.30 (192.170.1.30): 56 data bytes
64 bytes from 192.170.1.30: seq=0 ttl=64 time=0.470 ms
64 bytes from 192.170.1.30: seq=1 ttl=64 time=0.321 ms

Packet Flow Verification #

Monitor packet flow on the router to verify traffic:

# Check packets flowing through the router (run inside the router's network namespace)
tcpdump -i any -n

Expected packet flow:

13:56:16.151606 pe-110 P   IP 192.170.1.3 > 192.170.1.30: ICMP echo request
13:56:16.151610 vni110 Out IP 192.170.1.3 > 192.170.1.30: ICMP echo request
13:56:16.152073 vni110 P   IP 192.170.1.30 > 192.170.1.3: ICMP echo reply
13:56:16.152075 pe-110 Out IP 192.170.1.30 > 192.170.1.3: ICMP echo reply

VM-to-External L3 Connectivity #

With 192.168.20.2 being the IP of the host on the red network:

# Run this from the VM console (vm-1)
curl 192.168.20.2:8090/clientip

Expected output (vm-1):

192.170.1.3:37144

The VM is able to reach a host on the layer 3 network, and its source IP is preserved.

When monitoring traffic (the trace below captures a ping from the VM to the same host):

18:01:50.406263 pe-110 P   IP 192.170.1.3 > 192.168.20.2: ICMP echo request, id 21760, seq 0, length 64
18:01:50.406267 br-pe-110 In  IP 192.170.1.3 > 192.168.20.2: ICMP echo request, id 21760, seq 0, length 64
18:01:50.406284 br-pe-100 Out IP 192.170.1.3 > 192.168.20.2: ICMP echo request, id 21760, seq 0, length 64
18:01:50.406287 vni100 Out IP 192.170.1.3 > 192.168.20.2: ICMP echo request, id 21760, seq 0, length 64
18:01:50.406295 toswitch Out IP 100.65.0.0.39370 > 100.64.0.1.4789: VXLAN, flags [I] (0x08), vni 100
IP 192.170.1.3 > 192.168.20.2: ICMP echo request, id 21760, seq 0, length 64

Traffic Flow Analysis: the packet comes in from the veth interface towards the bridge associated with VNI 110 (the default gateway), is then routed to the bridge corresponding to the layer 3 domain, and is finally VXLAN-encapsulated towards the switch.
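
Since the VMs are also reachable from the broader L3 domain (see the Overview above), the check can be run in the opposite direction as well, for example from the external host on the red network (a sketch; 192.168.20.2 is the host used above):

# From the host on the red network, the VM address should answer
ping -c 3 192.170.1.3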

Troubleshooting #

Common Issues #

  1. Bridge not created: Verify the L2VNI has hostmaster.autocreate: true
  2. VM cannot reach gateway: Check that the VM’s IP is in the same subnet as l2gatewayip
  3. No VM-to-VM connectivity: Ensure both VMs are connected to the same bridge (br-hs-110)

Debug Commands #

# Check bridge creation
ip link show br-hs-110

# Verify network attachment
kubectl get network-attachment-definitions

# Check VM network interfaces
virtctl console vm-1
ip addr show
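
If the two clusters cannot reach each other at layer 2, it can also help to look at the MAC addresses learned on the host bridge (a standard iproute2 check; the remote VM's MAC should appear here once EVPN has propagated it):

# MAC entries learned on the host bridge
bridge fdb show br br-hs-110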

L2 VNI as KubeVirt dedicated migration network for cross cluster live migration #

This example shows another use case that can be realized with this multi-cluster testbed. All its dependencies are pre-provisioned when you start the demo, i.e. when you run the following command:

make docker-build demo-multi-cluster

This provisions the testbed described in the Architecture section above, with one VM pre-provisioned in each cluster: vm-1 in cluster A and vm-2 in cluster B.

It also has a dedicated migration network set up in both clusters, which is itself implemented using an L2 VNI. The corresponding network manifests ship with the demo.
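
For reference, a minimal sketch of how a dedicated migration network is typically pointed at from the KubeVirt CR, assuming a Multus network named migration attached to that L2 VNI (the name and namespace here are illustrative; the demo's actual manifests may differ):

apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    migrations:
      # Multus NetworkAttachmentDefinition carrying the live migration traffic
      network: migration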

Preparing the VM to be migrated #

To achieve cross-cluster live migration, you need to provision the VM in both clusters, but it only runs on one of them. In the “destination” cluster it will be stuck in the WaitingForSync phase until the source and destination clusters coordinate to migrate the VM across.

Hence, you need to provision the following manifest in your destination cluster:

---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-1
spec:
  runStrategy: WaitAsReceiver   # this is MANDATORY
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-1
    spec:
      tolerations:
      - key: node-role.kubernetes.io/control-plane
        operator: Exists
        effect: NoSchedule
      domain:
        devices:
          interfaces:
          - bridge: {}
            name: evpn
          disks:
          - disk:
              bus: virtio
            name: containerdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
        resources:
          requests:
            memory: 2048M
        machine:
          type: ""
      networks:
      - multus:
          networkName: evpn
        name: evpn
      terminationGracePeriodSeconds: 0
      volumes:
      - containerDisk:
          image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.6.2
        name: containerdisk
      - cloudInitNoCloud:
          networkData: |
            version: 2
            ethernets:
              eth0:
                addresses:
                - 192.170.1.3/24
                gateway4: 192.170.1.1
        name: cloudinitdisk

NOTE: the VM’s runStrategy is set to WaitAsReceiver; this is mandatory on the receiving side.
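
Before moving on, you can check that the receiver side is indeed waiting (a sketch, run against the destination cluster; the phase name is the one mentioned above):

# Should report WaitingForSync until the migration starts
kubectl get vmi vm-1 -o jsonpath='{.status.phase}'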

The next preparation step is to create the destination cluster migration CR. Provision the following manifest for that:

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: vmim-target
  namespace: default
spec:
  receive:
    migrationID: abcdef
  vmiName: vm-1

The migrationID attribute can be pretty much any value, but it is important that the source migration object uses the same migration ID.

Once the object is provisioned in the destination cluster, we need to find out how the migration controllers on the two clusters will connect. For that, inspect the status of the migration CR on the destination cluster:

kubectl get vmim vmim-target -ojsonpath="{.status.synchronizationAddresses}" | jq
[
  "192.170.10.128:9185"
]

Issuing the cross cluster live migration command #

Now we know which IP and port to use in our source cluster migration CR; provision the following manifest in the source cluster (pay special attention to the connectURL attribute):

apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: vmim-source
  namespace: default
spec:
  sendTo:
    connectURL: "192.170.10.128:9185"
    migrationID: abcdef
  vmiName: vm-1

Once this CR is provisioned in the source cluster, the VM will be migrated from the source to the destination cluster. Its state will be preserved, and established TCP connections will remain alive in the destination cluster.
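
To follow the migration from the outside, you can watch the migration objects and the VMI on both sides (a sketch, assuming kubectl contexts named cluster-a for the source and cluster-b for the destination):

# Both migration objects should eventually report Succeeded
kubectl --context cluster-a get vmim vmim-source
kubectl --context cluster-b get vmim vmim-target

# The VMI ends up Running in the destination cluster
kubectl --context cluster-b get vmi vm-1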