Module 4: Enhancing Traffic Resilience and Security for Virtual Machines with OpenShift Service Mesh
All assets for this module are in the lab-4 folder. Change into this directory:
cd $HOME/virt-ossm-workspace/lab-4
Task 1: Expose the Business Dashboard VM
In the previous module you deployed and secured the Travel Booking application, but it is not yet reachable from outside the cluster.
Traditionally, making the dashboard accessible from the Internet would require opening tickets for firewall rules, load balancers, external DNS and so on. In this environment, however, networking can be defined within OpenShift and Service Mesh.
Open the Service Mesh Ingress Gateway URL at http://istio-ingressgateway-istio-system.apps.cluster-guid.guid.sandbox.opentlc.com/
No page is returned. This is expected, because no services have been exposed from the mesh yet.
Exposing the Travel Booking dashboard to the outside via the Ingress Gateway requires the addition of the following Custom Resources (CRs) in OpenShift:
- Gateway: a load balancer operating at the edge of the mesh that receives incoming or outgoing HTTP/TCP connections.
- VirtualService: defines traffic routing, including traffic separation (multiple versions, context-based routing, etc.).
- DestinationRule: defines policies (load balancing, connection pool and outlier detection settings, failover, TLS security) that apply to traffic intended for a service after routing has occurred.
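To make these resources more concrete, here is a minimal sketch in shell that prints a Gateway and a VirtualService of the kind expose-control-vm.sh presumably applies. We have not looked inside the script, so treat everything here as an assumption. The function name render_dashboard_exposure, the resource names control-gateway and control, the backing service control-vm.travel-control.svc.cluster.local, its port 8080 and the gateway selector label are all hypothetical. Only the hostname pattern is taken from the Ingress Gateway URL above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: prints a Gateway plus a VirtualService that would expose
# the dashboard through the mesh ingress gateway. Resource names, the backing
# service (control-vm), its port (8080) and the selector label are assumptions.
render_dashboard_exposure() {
  local domain=${1:-}
  if [ -z "$domain" ]; then
    echo "usage: render_dashboard_exposure <apps-domain>" >&2
    return 1
  fi
  # Same hostname pattern as the Ingress Gateway URL used in this module.
  local host="istio-ingressgateway-istio-system.${domain}"
  cat <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: control-gateway
  namespace: travel-control
spec:
  selector:
    istio: ingressgateway   # assumed label of the default ingress gateway pods
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "${host}"
---
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: control
  namespace: travel-control
spec:
  hosts:
    - "${host}"
  gateways:
    - control-gateway       # route only traffic entering through this Gateway
  http:
    - route:
        - destination:
            host: control-vm.travel-control.svc.cluster.local
            port:
              number: 8080
EOF
}
```

Assuming the service name and port match your environment, you could apply the output with: render_dashboard_exposure apps.cluster-guid.guid.sandbox.opentlc.com | oc apply -f -. Because the function prints YAML instead of calling oc directly, you can inspect the configuration before anything is applied.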
Execute the script expose-control-vm.sh in the Terminal to deploy these configurations:
./expose-control-vm.sh apps.cluster-guid.guid.sandbox.opentlc.com
Refresh the URL of the Ingress Gateway at http://istio-ingressgateway-istio-system.apps.cluster-guid.guid.sandbox.opentlc.com and you should be served with the familiar Travel Booking Dashboard. It may take a few seconds for the Istio Gateway configuration to take effect.
Congratulations, your solution is now public!
Task 2: Deploying a new application (VM) version with a Canary Release
Often you will be required to deploy and maintain multiple versions of an application to provide new features to a subset of customers.
- In this task you will release a new version, v2, of the cars-vm component and route 10% of the traffic to it, while the remainder continues to use version v1.
- If everything goes well, you will gradually increase the share of traffic sent to version v2 up to 80%.
- To achieve this, you deploy a new VM named cars-vm-v2-a with the label version=v2.
- Next, you configure a DestinationRule that defines one subset per version of the cars-vm service, so that traffic can be directed to both versions:
kind: DestinationRule
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: cars
  namespace: travel-agency
  labels:
    module: m4
spec:
  host: cars-vm.travel-agency.svc.cluster.local
  subsets:
    - labels:
        version: v1
      name: v1
    - labels:
        version: v2
      name: v2
- Finally, you define the VirtualService that splits the traffic so that 90% reaches v1 and 10% goes to the new cars-vm-v2-a VM in subset v2:
kind: VirtualService
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: cars
  namespace: travel-agency
  labels:
    module: m4
spec:
  hosts:
    - cars-vm.travel-agency.svc.cluster.local
  gateways:
    - mesh
  http:
    - route:
        - destination:
            host: cars-vm.travel-agency.svc.cluster.local
            subset: v1
          weight: 90
        - destination:
            host: cars-vm.travel-agency.svc.cluster.local
            subset: v2
          weight: 10
You could also use Kiali to define the configurations. In this case we are using a script.
Now execute the following script, which applies all of the above configurations:
./multipleversions-for-car-vm-in-the-mesh.sh 90 10
After you have verified that the new version is stable, increase the share of traffic routed to version v2 to 80%:
./multipleversions-for-car-vm-in-the-mesh.sh 20 80
The Istio Config view in Kiali has been updated (see the cars VirtualService), and the Graph should soon show 80% of the traffic flowing to version v2.
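The two arguments passed to multipleversions-for-car-vm-in-the-mesh.sh appear to be the v1 and v2 weights. The function below is a hedged sketch of that idea and is not the lab script; its name, render_cars_virtualservice, is hypothetical. It prints the cars VirtualService shown earlier in this task with the weights you pass in. It also rejects weights that do not add up to 100, which is a design choice that keeps the split readable as percentages.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the lab script): prints the cars VirtualService
# with the given v1/v2 weights so the canary split can be reviewed or applied.

# Succeeds if $1 is a non-empty string of digits.
is_uint() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

render_cars_virtualservice() {
  local v1=${1:-} v2=${2:-}
  if ! is_uint "$v1" || ! is_uint "$v2"; then
    echo "usage: render_cars_virtualservice <v1-weight> <v2-weight>" >&2
    return 1
  fi
  # Force base 10 so inputs such as 08 are not read as octal.
  v1=$((10#$v1)); v2=$((10#$v2))
  if [ $((v1 + v2)) -ne 100 ]; then
    echo "weights must add up to 100, got $((v1 + v2))" >&2
    return 1
  fi
  local host=cars-vm.travel-agency.svc.cluster.local
  cat <<EOF
kind: VirtualService
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: cars
  namespace: travel-agency
  labels:
    module: m4
spec:
  hosts:
    - ${host}
  gateways:
    - mesh
  http:
    - route:
        - destination:
            host: ${host}
            subset: v1
          weight: ${v1}
        - destination:
            host: ${host}
            subset: v2
          weight: ${v2}
EOF
}
```

For example, render_cars_virtualservice 20 80 | oc apply -f - would update only the VirtualService. The DestinationRule with subsets v1 and v2, and the cars-vm-v2-a VM, must already exist.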
Task 3: Implementing a Circuit Breaker
The new metrics visualisation in Kiali and Grafana helps business teams better understand the request load the solution receives and make appropriate operational decisions. The overall goal now is to make the application more resilient.
Let’s assume the following scenario: The previous release of the new version In order to cope with this, the platform team is confronted with the following requirements:
Good news: you can use the Circuit Breaker feature of OpenShift Service Mesh to meet these resilience requirements.
About Circuit Breaker
The circuit breaker is an important pattern for building resilient microservice applications, especially in environments with high traffic volumes and many destinations across which requests can be load balanced. Much like a breaker in an electric circuit, circuit breaking lets the service mesh monitor the health of every destination and stop using one of the version=v2 VMs if it starts failing, limiting the impact of failures and latency spikes on end users.
- First, you deploy an additional VM named cars-vm-v2-b.
- This VM is also exposed as part of the cars-vm service. Because it carries the same version=v2 label, it belongs to the v2 subset alongside cars-vm-v2-a, which gives version v2 high availability.
Apply the following resource to deploy the new VM:
oc apply -f cars-vm-v2-b.yaml -n travel-agency
Once the new VM is up and running, configure the circuit breaker pattern.
If either of the two version=v2 VMs has a problem, the service mesh stops forwarding traffic to it until it has recovered.
Now apply the circuit-breaker.sh script:
./circuit-breaker.sh
You will notice that, in the case of 5xx errors, the service mesh ejects the VM causing the issue for 3 minutes.
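This module does not show the contents of circuit-breaker.sh. In Istio, this behaviour is usually configured through outlier detection on a DestinationRule. Here is a minimal sketch that prints the cars DestinationRule from Task 2 with an outlierDetection block added. Only the 180-second ejection time comes from this module. The values consecutive5xxErrors: 1, interval: 10s and maxEjectionPercent: 50 are assumptions, and the function name render_cars_circuit_breaker is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: the cars DestinationRule with outlier detection added.
# Only the default 180s ejection time comes from the lab text; other values
# are assumptions.
render_cars_circuit_breaker() {
  local ejection_seconds=${1:-180}
  case "$ejection_seconds" in
    ''|*[!0-9]*) echo "ejection time must be whole seconds" >&2; return 1 ;;
  esac
  cat <<EOF
kind: DestinationRule
apiVersion: networking.istio.io/v1alpha3
metadata:
  name: cars
  namespace: travel-agency
spec:
  host: cars-vm.travel-agency.svc.cluster.local
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 1    # eject an endpoint after one 5xx (assumed)
      interval: 10s              # time between ejection sweeps (assumed)
      baseEjectionTime: ${ejection_seconds}s
      maxEjectionPercent: 50     # cap on the share of hosts ejected (assumed)
  subsets:
    - labels:
        version: v1
      name: v1
    - labels:
        version: v2
      name: v2
EOF
}
```

Placing outlierDetection under the top-level trafficPolicy applies it to every subset, so both the v1 and the v2 endpoints are monitored. A trafficPolicy defined on an individual subset would override it for that subset.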
Let’s test the circuit breaker by forcing an issue in the cars-vm-v2-b VM.
You can now see that, once the service mesh detects the 5xx failures, the failing version=v2 endpoint is removed and no more requests flow to it.
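As configured, this exclusion lasts 180 seconds (3 minutes), after which the endpoint is retried. If it fails again, it is excluded again, and each repeated ejection lasts longer, because Istio multiplies the base ejection time by the number of times the endpoint has been ejected.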
If you restart the workload by running systemctl --user start cars.service inside the VM, traffic for v2 will again be load balanced between the two VMs once the ejection period has expired.
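To make the "consecutive 5xx" rule concrete, here is a toy model in shell. It is not Envoy's implementation, and the function name first_ejection is hypothetical. It replays a sequence of status codes returned by one endpoint and reports the response at which a consecutive-5xx threshold would eject that endpoint. Any successful response resets the count, so an endpoint that fails only intermittently may never reach a higher threshold.

```shell
#!/usr/bin/env bash
# Toy model of a consecutive-5xx ejection rule (not Envoy's implementation).
# Usage: first_ejection <threshold> <status-code>...
# Prints the 1-based index of the response that trips the rule, or "never".
first_ejection() {
  local threshold=${1:-}
  case "$threshold" in
    ''|*[!0-9]*) echo "threshold must be a positive integer" >&2; return 1 ;;
  esac
  if [ "$threshold" -eq 0 ]; then
    echo "threshold must be a positive integer" >&2
    return 1
  fi
  shift
  local streak=0 n=0 code
  for code in "$@"; do
    n=$((n + 1))
    if [ "$code" -ge 500 ] && [ "$code" -le 599 ]; then
      streak=$((streak + 1))   # another 5xx in a row
    else
      streak=0                 # any non-5xx response resets the count
    fi
    if [ "$streak" -ge "$threshold" ]; then
      echo "$n"
      return 0
    fi
  done
  echo never
}
```

For example, first_ejection 3 200 503 503 200 503 503 503 prints 7. The 200 at position 4 resets the count, so the rule only trips on the third 5xx of the second run.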
Congratulations on helping the Travel Agency company make its solution as resilient as Netflix!
Task 4: Restricting Access to services with Authorization Policies
Although security features such as traffic encryption are applied by default in the mesh, other practices, such as rules defining a service’s visibility and who can access it, are not. This can have a two-fold effect:
- Malicious services deployed in the cluster by a third party can gain access to a sensitive service.
- In a very large cluster, the number of possible destinations can make the configuration of each istio-proxy sidecar very large, causing evictions and possible cluster instability.
To counter these issues, you can apply AuthorizationPolicy resources and visibility restrictions based on the principal (the service identity) carried in the certificates exchanged over mutual TLS.
About Authorization Policies
The authorization policy enforces access control on inbound traffic in the server-side Envoy proxy. Each Envoy proxy runs an authorization engine that authorizes requests at runtime. When a request comes to the proxy, the authorization engine evaluates the request context against the current authorization policies and returns the authorization result, either ALLOW or DENY. Operators specify Istio authorization policies using YAML notation.
First, apply a default deny-all policy to both namespaces, which is a best practice:
echo "apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-nothing
  namespace: travel-agency
spec:
  {}" | oc apply -f -
echo "apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-nothing
  namespace: travel-control
spec:
  {}" | oc apply -f -
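Why does an empty spec block all traffic? An AuthorizationPolicy without an action field is an ALLOW policy. Istio denies a request to a workload that has at least one ALLOW policy unless one of those policies matches, and allow-nothing has no rules, so it matches nothing. The toy function below models only this ALLOW step. It ignores CUSTOM and DENY policies and reduces rule matching to comparing the caller's principal with a list. The function name authz_decide and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Toy model of Istio's ALLOW-policy evaluation for one workload.
# Usage: authz_decide <principal> <allow-policies-present: yes|no> [allowed-principal...]
# CUSTOM/DENY policies and non-principal match conditions are deliberately ignored.
authz_decide() {
  if [ $# -lt 2 ]; then
    echo "usage: authz_decide <principal> <yes|no> [allowed-principal...]" >&2
    return 2
  fi
  local principal=$1 has_allow=$2
  shift 2
  if [ "$has_allow" != yes ]; then
    echo ALLOW   # no ALLOW policy selects the workload: allowed by default
    return 0
  fi
  local p
  for p in "$@"; do
    if [ "$p" = "$principal" ]; then
      echo ALLOW # an ALLOW rule matches the caller's principal
      return 0
    fi
  done
  echo DENY      # ALLOW policies exist but none match (e.g. allow-nothing)
}
```

For example, with only allow-nothing in place, authz_decide cluster.local/ns/travel-portal/sa/default yes prints DENY. Passing no instead of yes (no ALLOW policy at all) prints ALLOW.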
Now the services of the Travel Booking application can no longer communicate with each other, as they no longer have permission to do so (see the failures in the Kiali Graph).
You can confirm the effect by accessing the Travel Booking Dashboard, which now returns RBAC: access denied.
Next, apply (in the Terminal) the following fine-grained AuthorizationPolicy resources:
echo "apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: authpolicy-istio-ingressgateway
  namespace: istio-system
spec:
  selector:
    matchLabels:
      app: istio-ingressgateway
  rules:
    - to:
        - operation:
            paths: [\"*\"]" | oc apply -f -
echo "apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-selective-principals-travel-control
  namespace: travel-control
spec:
  action: ALLOW
  rules:
    - from:
        - source:
            principals: [\"cluster.local/ns/istio-system/sa/istio-ingressgateway-service-account\"]" | oc apply -f -
echo "apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-selective-principals-travel-agency
  namespace: travel-agency
spec:
  action: ALLOW
  rules:
    - from:
        - source:
            principals: [\"cluster.local/ns/travel-agency/sa/default\",\"cluster.local/ns/travel-portal/sa/default\"]" | oc apply -f -
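The echo commands above need escaped quotes, which makes them easy to get wrong. As an alternative, here is a minimal sketch that uses a heredoc to build the same kind of ALLOW policy; the function name render_allow_principals is hypothetical. It takes a policy name, a namespace and a list of service accounts written as namespace/serviceaccount. Each entry is turned into the cluster.local/ns/<namespace>/sa/<serviceaccount> principal format used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints an ALLOW AuthorizationPolicy for the given
# namespace that admits only the listed service accounts (namespace/sa pairs).
render_allow_principals() {
  if [ $# -lt 3 ]; then
    echo "usage: render_allow_principals <policy-name> <namespace> <ns/sa>..." >&2
    return 1
  fi
  local name=$1 namespace=$2
  shift 2
  local list="" entry ns sa
  for entry in "$@"; do
    ns=${entry%%/*}
    sa=${entry#*/}
    if [ -z "$ns" ] || [ -z "$sa" ] || [ "$ns" = "$entry" ]; then
      echo "expected <namespace>/<serviceaccount>, got: $entry" >&2
      return 1
    fi
    # Principal format as in the policies above; cluster.local is the trust domain.
    list="${list:+$list, }\"cluster.local/ns/${ns}/sa/${sa}\""
  done
  cat <<EOF
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: ${name}
  namespace: ${namespace}
spec:
  action: ALLOW
  rules:
    - from:
        - source:
            principals: [${list}]
EOF
}
```

For example, render_allow_principals allow-selective-principals-travel-agency travel-agency travel-agency/default travel-portal/default | oc apply -f - should produce a policy equivalent to the third command above.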
After a short period you should regain access to the Travel Booking Dashboard, and the Kiali dashboard will show that communication between the services has been restored.
However, communication from travel-control to the travel-agency services remains blocked: it is unnecessary, and the applied AuthorizationPolicy rules do not permit it.
You can test this by executing the following command in the terminal:
oc -n travel-control exec $(oc -n travel-control get po -l app=control-vm|awk '{print $1}'|tail -n 1) -- curl -o - -I travels-vm.travel-agency.svc.cluster.local:8000/travels/London
You should receive a response indicating that the operation is forbidden:
HTTP/1.1 403 Forbidden
content-length: 19
content-type: text/plain
date: Mon, 24 Mar 2025 16:10:11 GMT
server: envoy
x-envoy-upstream-service-time: 1