Roadmap to the Adaptive Edge – How to Balance Cost Efficiency, Usability, and Performance Tradeoffs

July 5, 2022

The network edge presents new challenges in optimizing application deployments, as the deeper an application is pushed into the internet, the greater the associated costs to run those workloads. Recently, as part of Kubernetes on EDGE Day at this year’s KubeCon + CloudNativeCon event in Europe, Section Senior Product Manager Tom McCollough covered a set of optimization considerations and strategies that allow workloads to be as close to users as possible, bounded by a business value construct. At the same time, he discussed the importance of maintaining simplicity in the usability of the system, so that developers can specify how deeply they want their application to run without overallocation of resources.

This blog post presents an excerpt from Tom’s presentation, discussing the use of Cloud Native Computing Foundation (CNCF) core technologies, including Kubernetes and Prometheus as a base, to establish a framework for evaluating projects within the CNCF landscape and their suitability for edge use cases. Tom also examines Kubernetes Federation techniques, multi-cluster orchestration systems, and traffic direction and service discovery strategies against selection criteria to assist architects in making “good-fit” decisions.

Building the Roadmap: Where to Start

When we talk about balancing performance against cost, we’re talking about running hundreds of instances of an application at the edge to deliver, on average, shorter distance, lower latency, fewer dropped shopping carts, and so on. But there’s a cost associated with that – not just in terms of cycles, but in terms of operating all those clusters.

While you might love to run clusters everywhere, the cost is likely too high for that to be a realistic option. So, how do you get the effect of running everywhere without actually running everywhere? First, let’s establish our bearings by considering some of the “what ifs” at play:

  • What if we could be adaptive?
  • What if we could move the workload to where it needs to be?
  • What if we could maintain a set of running workloads that are optimal for particular needs – optimal according to distance from an end user, optimal according to budget, etc.?
  • What if we could continuously adapt that set of running workload locations – adapt it according to signals such as health signals, utilization signals, etc.?
  • What if we could use predictive analytics to know ahead of time that we can shut down a workload in one location, start it up in another, and continuously revisit this set of locations and update it as time goes by?

That’s the idea. Consider this as a rough outline for the roadmap you’re building for this first solution for an adaptive optimized edge.
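To make the budget-bounded side of those what-ifs concrete, here is a minimal Python sketch of selecting a set of edge locations: prefer the locations closest to users, but stop when the budget is exhausted. The location names, costs, and distances are invented for illustration; a real optimizer would draw them from live signals.

```python
# Hypothetical sketch: choose a set of edge locations under a budget,
# greedily preferring the locations that are closest to end users.
# All names and numbers are illustrative assumptions.

def choose_locations(candidates, budget):
    """candidates: list of (name, monthly_cost, avg_user_distance_km).
    Greedily pick the lowest-distance locations that fit the budget."""
    chosen, spent = [], 0
    for name, cost, dist in sorted(candidates, key=lambda c: c[2]):
        if spent + cost <= budget:
            chosen.append(name)
            spent += cost
    return chosen, spent

locations = [
    ("tokyo", 400, 30),
    ("frankfurt", 300, 25),
    ("virginia", 250, 40),
    ("sydney", 500, 60),
]
print(choose_locations(locations, 1000))
# → (['frankfurt', 'tokyo', 'virginia'], 950) — sydney doesn't fit the budget
```

A production engine would revisit this selection continuously as the signals change, rather than computing it once.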

Surveying the Landscape

Now, let’s survey the existing CNCF landscape and how it can be leveraged here. We’ll use Kubernetes, running a containerized workload, as our starting point.

There are still other pieces we’re going to need:

  1. We’re going to need to be able to run our workload on multiple clusters.
  2. We’re going to need to be able to move workload from one cluster to another.
  3. We’re going to need a way to solve the problem related to signals and optimization.
  4. And then finally, we’re going to need to be able to direct traffic. As a workload moves from one location to another, based on the signals, we will need to redirect the traffic to follow along.

Multi-Cluster – The multi-cluster problem is well addressed in the CNCF landscape. There are a number of solutions out there; some of them are forks of work from the Kubernetes Federation special interest group. Karmada is just one suitable example we can use, as it allows you to run and manage multiple clusters without the cognitive load of having to think about managing each cluster individually. That solution is ready to go.

Moving Workload – With regard to moving a workload, Open Cluster Management is another tool that lets you manage workloads running on multiple clusters, and administer the clusters themselves. But it adds something different – a placement decision. This gives you the ability to customize where your workload runs. It’s code you write yourself and then provide to Open Cluster Management to tell it where a workload should be prioritized. Using this capability, plus the addition of signals (which we’ll get into more below), you can develop the ability to place a workload.

The thing about Open Cluster Management’s placement decision, however, is that it operates on the initial deployment of the workload. It doesn’t do what we want, which is to revisit that decision on a cadence. And since it doesn’t revisit the decision, you’re going to have to somehow add that capability to Open Cluster Management yourself – rerun the placement, and then actually move the workload.
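The missing piece might look something like the following sketch: on each tick of a cadence, re-score every cluster from current signals and move the workload if a better home emerges. The scoring inputs and the function names here are stand-ins for illustration, not Open Cluster Management APIs.

```python
# Hedged sketch of re-running a placement decision on a cadence, instead of
# only at initial deployment. Signals and names are illustrative stand-ins.

def score(cluster):
    # Higher is better: healthy clusters win; lower utilization breaks ties.
    return (1 if cluster["healthy"] else 0, 1.0 - cluster["utilization"])

def revisit_placement(current, clusters):
    """Return the cluster the workload should run on after re-scoring.
    In a real controller, a changed answer would trigger the actual move."""
    best = max(clusters, key=score)
    return best["name"]

clusters = [
    {"name": "edge-a", "healthy": True,  "utilization": 0.9},
    {"name": "edge-b", "healthy": True,  "utilization": 0.2},
    {"name": "edge-c", "healthy": False, "utilization": 0.1},
]
print(revisit_placement("edge-a", clusters))
# → edge-b: healthy and far less utilized, so the workload should move
```

Running this on a timer (and only acting when the answer changes) is the cadence-driven behavior the stock placement decision lacks.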

Signals and Optimization – This is where we’re left empty-handed. While the CNCF landscape does have an optimization section, it’s devoted to optimizing within a single Kubernetes cluster – all the examples are about optimizing for cost (how to run one cluster in an optimal way, minimizing compute resources, etc.). What we want to do is optimize across multiple clusters – and we also want to take performance into account. There’s nothing in the landscape that provides that.

So, what now? Unfortunately, you’ll need to develop a custom solution.

Directing Traffic – Finally we come to directing traffic, and now we’re back to having a good representation within the CNCF landscape.

CoreDNS is a programmable DNS solution that can route traffic from an end user to the nearest location where that traffic can be served. However, DNS has common issues that come into play. For instance, an ISP, as a piece of the DNS stack, may decide to cache a single DNS answer for its entire user base – and therefore not provide the necessary per-user location granularity; you really do want each end user to be routed to their closest location, not to the closest location of some other end user in that same ISP’s network.
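A toy model makes the caching problem obvious: once a shared resolver caches the answer computed for its first user, every later user behind that resolver gets the same edge location, however far away they actually are. The users and locations below are invented for illustration.

```python
# Toy illustration of the shared-resolver caching problem. The mapping of
# user to "correct" nearest edge location is an illustrative assumption.

edge_for_user = {"paris-user": "frankfurt", "osaka-user": "tokyo"}

cache = {}
def resolve(user, domain="app.example.com"):
    if domain not in cache:                 # first lookup populates the cache
        cache[domain] = edge_for_user[user]
    return cache[domain]                    # everyone afterward gets that answer

print(resolve("paris-user"))   # frankfurt – correct for this user
print(resolve("osaka-user"))   # frankfurt – cached; should have been tokyo
```

This is exactly the granularity loss described above: the second user is misrouted because the resolver, not the user, is the unit of caching.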

And then of course, there’s the issue of time to live (TTL). We want to use the capability we’re developing here to build highly available services, but if a TTL isn’t being honored (say you’ve asked for a minute and an ISP is caching for an hour), that stale answer is the best you’re going to get; in the worst case, your switchover time for directing traffic would be an hour. In short, DNS has issues.
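The arithmetic behind that worst case is simple, and worth spelling out: your switchover time is bounded by the TTL resolvers actually honor, not the TTL you requested. The numbers below match the minute-versus-hour example above.

```python
# Back-of-the-envelope worst case: switchover time is bounded by the TTL
# the resolver actually honors, not the TTL you asked for.

requested_ttl_s = 60      # you ask resolvers to re-check every minute
honored_ttl_s = 3600      # a misbehaving ISP caches for an hour anyway

worst_case_switchover_s = max(requested_ttl_s, honored_ttl_s)
print(worst_case_switchover_s / 60, "minutes")  # → 60.0 minutes
```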

A Kubernetes global load balancer, however, brings anycast into play for us. Anycast is a capability built into IP that’s purpose-built to route traffic from an end user to the closest location where that workload is being served. A Kubernetes global load balancer can use the Border Gateway Protocol (BGP) to control IP’s anycast capability, which is the perfect solution for what we want to do. You can learn more about this particular strategy in our recent series on Building for the Inevitable Next Cloud Outage.
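A minimal model shows what anycast buys us: traffic flows to the nearest location currently advertising the route, and withdrawing an advertisement (the kind of thing BGP-driven tooling automates) immediately fails traffic over to the next-nearest location – no TTL to wait out. The distances here are illustrative assumptions.

```python
# Minimal model of anycast routing: pick the closest location that is still
# advertising the shared address. Distances (km) are illustrative.

def nearest_advertiser(user_distances, advertising):
    """Return the closest location still advertising the anycast address."""
    live = {loc: d for loc, d in user_distances.items() if loc in advertising}
    return min(live, key=live.get)

distances = {"frankfurt": 25, "virginia": 90, "tokyo": 150}

print(nearest_advertiser(distances, {"frankfurt", "virginia", "tokyo"}))
# → frankfurt (closest advertiser)
print(nearest_advertiser(distances, {"virginia", "tokyo"}))
# → virginia (frankfurt withdrew its route; failover is immediate)
```

Contrast this with the DNS case: here the switchover is governed by route propagation, not by whatever TTL a resolver chooses to honor.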

Charting Your Course to the Adaptive Edge

Now, if you take a step back, what you have developed is a high-level roadmap for all the pieces that you need, pulled from the CNCF landscape, to build an optimized adaptive edge. With this you will be able to balance performance against cost to get an optimal solution. You’ll get the effect of running everywhere without actually running everywhere – and without the cost of running everywhere.

That said, while we’ve been talking about performance against cost, it’s also important to note that every business is unique. Businesses want to put themselves forward in a distinctive way – chart their own course, so to speak. Maybe you want to prioritize something else at your business. Maybe, for example, you want to minimize your carbon footprint: you could build into your optimization engine a preference for green data centers. Or, say compliance is important: you could build in a preference for PCI-compliant (or otherwise certified) data centers. Ultimately, your business can customize the optimization engine as needed to suit your purposes.
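One way to picture that customization is as extra weighted terms in the placement score. Everything here – the weights, the attributes, the clusters – is invented for illustration; the point is only that business preferences slot in alongside latency and cost.

```python
# Sketch of a business-customized placement score. Weights and cluster
# attributes are illustrative assumptions, not a prescribed scheme.

WEIGHTS = {"latency": -1.0, "cost": -0.5, "green": 20.0, "pci": 50.0}

def business_score(cluster):
    return (WEIGHTS["latency"] * cluster["latency_ms"]
            + WEIGHTS["cost"] * cluster["cost"]
            + WEIGHTS["green"] * cluster["green"]    # 1 if a green data center
            + WEIGHTS["pci"] * cluster["pci"])       # 1 if PCI compliant

clusters = [
    {"name": "a", "latency_ms": 20, "cost": 100, "green": 0, "pci": 1},
    {"name": "b", "latency_ms": 30, "cost": 80,  "green": 1, "pci": 1},
]
best = max(clusters, key=business_score)
print(best["name"])
# → b: slightly worse latency, but greener and cheaper, so it wins overall
```

Raising or lowering a weight is how a business charts its own course: the same engine, tuned to different priorities.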

Section’s Adaptive Edge Engine intelligently and continuously tunes and reconfigures your edge delivery network to ensure your edge workloads are running the optimal compute for your application. If you’re as excited about discovering the adaptive edge as we are, we’d love to chat with you further. Get in touch with us today.
