This example demonstrates how to connect KubeVirt virtual machines running in different Kubernetes clusters to an L2 EVPN/VXLAN overlay using OpenPERouter, extending the KubeVirt single cluster example.
Overview #
The setup creates both Layer 2 and Layer 3 VNIs, with OpenPERouter
automatically creating a Linux bridge on the host. Two KubeVirt virtual
machines are connected to this bridge via Multus secondary interfaces of type
bridge.
Architecture #
- L3VNI (VNI 100): Provides routing capabilities and connects to external networks
- L2VNI (VNI 110): Creates a Layer 2 domain for VM-to-VM communication
- Linux Bridge: Automatically created by OpenPERouter for VM connectivity
- VM Connectivity: VMs connect to the bridge using Multus network attachments
Example Setup #
Note: Some operating systems have their inotify.max_user_instances set too low to support larger kind clusters. This leads to:
- virt-handler failing with CrashLoopBackOff, logging panic: Failed to create an inotify watcher
- nodemarker failing with CrashLoopBackOff, logging too many open files
If that happens in your setup, you may want to increase the limit with:
sysctl -w fs.inotify.max_user_instances=1024
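To make the limit persist across reboots, you can optionally drop it into a sysctl configuration file; the file name below is just an example:
# Persist the inotify limit across reboots (file name is arbitrary)
echo "fs.inotify.max_user_instances=1024" | sudo tee /etc/sysctl.d/99-inotify.conf
sudo sysctl --system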
The full example can be found in the project repository and can be deployed by running:
make docker-build demo-multi-cluster
The example configures both an L2 VNI and an L3 VNI. The L2 VNI belongs to the L3 VNI’s routing domain. VMs are connected into a single L2 overlay. Additionally, the VMs are able to reach the broader L3 domain, and are reachable from the broader L3 domain.
Configuration #
1. OpenPERouter Resources #
Create the L3VNI and L2VNI resources (in each cluster):
apiVersion: openpe.openperouter.github.io/v1alpha1
kind: L3VNI
metadata:
  name: red
  namespace: openperouter-system
spec:
  vni: 100
  vrf: red
---
apiVersion: openpe.openperouter.github.io/v1alpha1
kind: L2VNI
metadata:
  name: layer2
  namespace: openperouter-system
spec:
  hostmaster:
    autocreate: true
    type: bridge
  l2gatewayip: 192.170.1.1/24
  vni: 110
  vrf: red
Key Configuration Points:
- The L2VNI references the L3VNI via the vrf: red field, enabling routed traffic
- hostmaster.autocreate: true creates a br-hs-110 bridge on the host
- l2gatewayip defines the gateway IP for the VM subnet
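After applying these manifests, you can optionally confirm that the resources were accepted in each cluster. The resource names below are inferred from the CRD kinds shown above and may differ in your OpenPERouter version:
# List the VNI resources in the OpenPERouter namespace (run against each cluster)
kubectl get l3vnis,l2vnis -n openperouter-system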
2. Network Attachment Definition #
Create a Multus network attachment definition for the bridge:
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: evpn
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "evpn",
      "type": "bridge",
      "bridge": "br-hs-110",
      "macspoofchk": false,
      "disableContainerInterface": true
    }
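The attachment must exist in each cluster, in the namespace where the VMs will run. A minimal sketch of applying it, assuming the manifest above is saved as nad-evpn.yaml and your kubeconfig contexts are named cluster-a and cluster-b (adjust to your environment):
# Apply the NetworkAttachmentDefinition to both clusters (file and context names are examples)
kubectl --context cluster-a apply -f nad-evpn.yaml
kubectl --context cluster-b apply -f nad-evpn.yaml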
3. Virtual Machine Configuration #
Create two virtual machines with network connectivity (one VM in each cluster):
This VM should be created in cluster A:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-1
spec:
  runStrategy: Always
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-1
    spec:
      domain:
        devices:
          interfaces:
            - bridge: {}
              name: evpn
          disks:
            - disk:
                bus: virtio
              name: containerdisk
            - disk:
                bus: virtio
              name: cloudinitdisk
        resources:
          requests:
            memory: 2048M
        machine:
          type: ""
      networks:
        - name: evpn
          multus:
            networkName: evpn
      terminationGracePeriodSeconds: 0
      volumes:
        - containerDisk:
            image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.5.2
          name: containerdisk
        - cloudInitNoCloud:
            networkData: |
              version: 2
              ethernets:
                eth0:
                  addresses:
                    - 192.170.1.3/24
                  gateway4: 192.170.1.1
          name: cloudinitdisk
This VM should be created in cluster B:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-2
spec:
  runStrategy: Always
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-2
    spec:
      domain:
        devices:
          interfaces:
            - bridge: {}
              name: evpn
          disks:
            - disk:
                bus: virtio
              name: containerdisk
            - disk:
                bus: virtio
              name: cloudinitdisk
        resources:
          requests:
            memory: 2048M
        machine:
          type: ""
      networks:
        - name: evpn
          multus:
            networkName: evpn
      terminationGracePeriodSeconds: 0
      volumes:
        - containerDisk:
            image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.5.2
          name: containerdisk
        - cloudInitNoCloud:
            networkData: |
              version: 2
              ethernets:
                eth0:
                  addresses:
                    - 192.170.1.30/24
                  gateway4: 192.170.1.1
          name: cloudinitdisk
VM Configuration Details:
- Uses the evpn network attachment for bridge connectivity
- Cloud-init configures the VM's IP address and default gateway
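A minimal sketch of creating the VMs and waiting for them to come up, assuming the manifests above are saved as vm-1.yaml and vm-2.yaml and the same kubeconfig context names as before:
# Create one VM per cluster (file and context names are examples)
kubectl --context cluster-a apply -f vm-1.yaml
kubectl --context cluster-b apply -f vm-2.yaml
# Wait for the corresponding virtual machine instances to become ready
kubectl --context cluster-a wait vmi vm-1 --for=condition=Ready --timeout=5m
kubectl --context cluster-b wait vmi vm-2 --for=condition=Ready --timeout=5m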
Validation #
VM-to-VM Connectivity #
Test connectivity between the two VMs:
# Connect to VM console
virtctl console vm-1
# It may take about 2 minutes to start up and get to the login.
# Once it does, log in with username "fedora" and password "fedora".
# Test ping to the other VM
ping 192.170.1.30
Expected output:
PING 192.170.1.30 (192.170.1.30): 56 data bytes
64 bytes from 192.170.1.30: seq=0 ttl=64 time=0.470 ms
64 bytes from 192.170.1.30: seq=1 ttl=64 time=0.321 ms
Packet Flow Verification #
Monitor packet flow on the router to verify traffic:
# Check packets flowing through the bridge
tcpdump -i br-hs-110 -n
Expected packet flow:
13:56:16.151606 pe-110 P IP 192.170.1.3 > 192.170.1.30: ICMP echo request
13:56:16.151610 vni110 Out IP 192.170.1.3 > 192.170.1.30: ICMP echo request
13:56:16.152073 vni110 P IP 192.170.1.30 > 192.170.1.3: ICMP echo reply
13:56:16.152075 pe-110 Out IP 192.170.1.30 > 192.170.1.3: ICMP echo reply
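You can also verify that the traffic leaves the node VXLAN-encapsulated. A possible capture, filtering on the VXLAN UDP port that appears in the logs below:
# Capture encapsulated traffic on the underlay (4789 is the default VXLAN port)
tcpdump -i any -n 'udp port 4789'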
VM-to-External L3 Connectivity #
With 192.168.20.2 being the IP of the host on the red network:
# Run this from the VM console (vm-1)
curl 192.168.20.2:8090/clientip
Expected output (vm-1):
192.170.1.3:37144
The VM is able to reach a host on the layer 3 network, and its source IP is preserved.
When monitoring traffic on the router (shown here for a ping from the VM to 192.168.20.2):
18:01:50.406263 pe-110 P IP 192.170.1.3 > 192.168.20.2: ICMP echo request, id 21760, seq 0, length 64
18:01:50.406267 br-pe-110 In IP 192.170.1.3 > 192.168.20.2: ICMP echo request, id 21760, seq 0, length 64
18:01:50.406284 br-pe-100 Out IP 192.170.1.3 > 192.168.20.2: ICMP echo request, id 21760, seq 0, length 64
18:01:50.406287 vni100 Out IP 192.170.1.3 > 192.168.20.2: ICMP echo request, id 21760, seq 0, length 64
18:01:50.406295 toswitch Out IP 100.65.0.0.39370 > 100.64.0.1.4789: VXLAN, flags [I] (0x08), vni 100
IP 192.170.1.3 > 192.168.20.2: ICMP echo request, id 21760, seq 0, length 64
Traffic Flow Analysis: the packet enters from the pe-110 veth interface into the bridge associated with VNI 110 (which hosts the default gateway), is then routed into the bridge corresponding to the layer 3 domain (VNI 100), and is finally VXLAN-encapsulated towards the switch.
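To inspect the control plane side of this flow, and assuming you can exec into the router pod and reach its FRR instance (pod and container names depend on your deployment), the EVPN routes and the routes installed in the red VRF can be listed with:
# Show EVPN (type-2 MAC/IP and type-5 prefix) routes known to FRR
vtysh -c "show bgp l2vpn evpn"
# Show the routes installed in the red VRF
vtysh -c "show ip route vrf red"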
Troubleshooting #
Common Issues #
- Bridge not created: Verify the L2VNI has hostmaster.autocreate: true
- VM cannot reach gateway: Check that the VM's IP is in the same subnet as l2gatewayip
- No VM-to-VM connectivity: Ensure both VMs are connected to the same bridge (br-hs-110)
Debug Commands #
# Check bridge creation
ip link show br-hs-110
# Verify network attachment
kubectl get network-attachment-definitions
# Check VM network interfaces
virtctl console vm-1
ip addr show
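A few additional checks can help, assuming the namespace and bridge names match the examples above:
# Check that the OpenPERouter pods are healthy
kubectl get pods -n openperouter-system
# Inspect the MACs learned on the host bridge (the remote VM's MAC should appear once traffic flows)
bridge fdb show br br-hs-110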
L2 VNI as a dedicated KubeVirt migration network for cross-cluster live migration #
This example is another use case we can realize with this multi-cluster testbed. All of its dependencies are pre-provisioned when you start the demo, i.e. when you run the following command:
make docker-build demo-multi-cluster
This will provision the testbed described in the Architecture section above, with one VM pre-provisioned in each cluster: vm-1 in cluster A, and vm-2 in cluster B.
It also sets up a dedicated migration network in both clusters, which happens to be implemented using an L2 VNI! The corresponding network manifests can be found in the project repository.
Preparing the VM to be migrated #
To achieve cross-cluster live migration, you need to provision the VM in both clusters, but it only runs on one of them. In the "destination" cluster it will be stuck in the WaitingForSync phase until the source and destination clusters coordinate to migrate the VM across.
Hence, you need to provision the following manifest in your destination cluster:
---
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vm-1
spec:
  runStrategy: WaitAsReceiver # this is MANDATORY
  template:
    metadata:
      labels:
        kubevirt.io/vm: vm-1
    spec:
      tolerations:
        - key: node-role.kubernetes.io/control-plane
          operator: Exists
          effect: NoSchedule
      domain:
        devices:
          interfaces:
            - bridge: {}
              name: evpn
          disks:
            - disk:
                bus: virtio
              name: containerdisk
            - disk:
                bus: virtio
              name: cloudinitdisk
        resources:
          requests:
            memory: 2048M
        machine:
          type: ""
      networks:
        - multus:
            networkName: evpn
          name: evpn
      terminationGracePeriodSeconds: 0
      volumes:
        - containerDisk:
            image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.6.2
          name: containerdisk
        - cloudInitNoCloud:
            networkData: |
              version: 2
              ethernets:
                eth0:
                  addresses:
                    - 192.170.1.3/24
                  gateway4: 192.170.1.1
          name: cloudinitdisk
NOTE: the run strategy on this VM is WaitAsReceiver.
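Before moving on, you can optionally confirm that the receiver VM is waiting rather than running. A quick check, assuming the manifest above was applied in the destination cluster:
# The VM object exists but should not be running; its status should reflect the waiting/receiver state
kubectl get vm vm-1
kubectl get vm vm-1 -o jsonpath='{.status.printableStatus}'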
The next preparation step is to create the destination cluster migration CR. Provision the following manifest for that:
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: vmim-target
  namespace: default
spec:
  receive:
    migrationID: abcdef
    vmiName: vm-1
The migrationID attribute is important in the sense that the source migration object must use the same migration ID; the value itself can be pretty much anything.
Once the object is provisioned in the destination cluster, we need to find out how the migration controllers of the two clusters can connect to each other. For that, inspect the status of the migration CR on the destination cluster:
kubectl get vmim vmim-target -ojsonpath="{.status.synchronizationAddresses}" | jq
[
"192.170.10.128:9185"
]
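The returned address is what the source cluster will dial. You can optionally capture it in a shell variable so it can be templated into the source migration CR below; the jsonpath mirrors the field printed above:
# Grab the first synchronization address exposed by the destination cluster
SYNC_ADDR=$(kubectl get vmim vmim-target -o jsonpath='{.status.synchronizationAddresses[0]}')
echo "${SYNC_ADDR}"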
Issuing the cross cluster live migration command #
Now we know which IP and port to use in our source cluster migration CR; provision the following manifest in the source cluster (pay special attention to the connectURL attribute):
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstanceMigration
metadata:
  name: vmim-source
  namespace: default
spec:
  sendTo:
    connectURL: "192.170.10.128:9185"
    migrationID: abcdef
    vmiName: vm-1
Once this CR is provisioned in the source cluster, the VM will be migrated from the source to the destination cluster. Its state will be preserved, and established TCP connections will remain alive in the destination cluster.
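A simple way to observe this is to keep a ping running towards the migrating VM while the migration happens, and to watch the migration objects on both sides. The commands below reuse the addresses and object names from this example:
# From vm-2 (cluster B), keep pinging vm-1 across the overlay during the migration
virtctl console vm-2
ping 192.170.1.3
# Watch the migration progress in the source and destination clusters
kubectl get vmim vmim-source -w
kubectl get vmim vmim-target -w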