One of the most important topics in relation to edge computing is scheduling. In the future, it is perfectly feasible that every 5G base station will have a data center at its base. While these edge data centers will enable a massive amount of compute, there won’t be enough to run every application in the world at every tower at once. We need a system that is capable of scheduling workloads to run in the right place at the right time. This is a hugely challenging problem that needs our attention.
The types of edge workload scheduling models break down as follows:
- Static Scheduling: As we see in today’s Content Delivery Networks (CDNs), static scheduling is relatively straightforward with pre-set locations and predetermined configurations;
- Dynamic Scheduling: Scheduling that is latency or volume driven. This is where the most opportunity exists in relation to edge computing;
- Enforcement Scheduling: Scheduling that is constrained by external requirements, such as location or data protection rules (e.g. GDPR, PCI compliance).
Of these models, dynamic (or demand-based) scheduling driven by latency or volume presents the biggest challenges and the brightest opportunities. Several factors drive dynamic scheduling. For instance, a developer may decide that certain responses should meet specific latency thresholds: “Geography isn’t important, I just want to be within 10 milliseconds of my users.” A volume-driven request might look something like this: “I’m getting lots of traffic from Europe and while I don’t typically have traffic there, I want the system to detect a certain set of special circumstances and trigger the launch of various components in the right geographies within Europe.”
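The two developer requests quoted above can be read as simple placement rules. The sketch below is only an illustration under stated assumptions: the `Site` type, the site names, the latency figures and the `min_requests` threshold are all hypothetical, and a real scheduler would work from live measurements rather than fixed numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """A hypothetical edge location and its measured latency to a user group."""
    name: str
    region: str
    latency_ms: float


def sites_within_latency(sites, max_latency_ms):
    """Latency-driven rule: keep every site that meets the threshold, nearest first."""
    eligible = [s for s in sites if s.latency_ms <= max_latency_ms]
    return sorted(eligible, key=lambda s: s.latency_ms)


def regions_to_launch(requests_per_region, deployed_regions, min_requests):
    """Volume-driven rule: regions with enough traffic but no running deployment."""
    return sorted(
        region
        for region, count in requests_per_region.items()
        if count >= min_requests and region not in deployed_regions
    )


# Illustrative data only; none of these names or numbers come from a real system.
sites = [
    Site("fra-1", "eu-central", 7.5),
    Site("lon-2", "eu-west", 12.0),
    Site("ams-1", "eu-west", 9.0),
]
nearby = sites_within_latency(sites, max_latency_ms=10)
launch = regions_to_launch(
    {"eu-west": 1200, "eu-central": 40, "us-east": 5000},
    deployed_regions={"us-east"},
    min_requests=1000,
)
```

Here the latency rule ignores geography entirely and keeps only sites under 10 milliseconds, while the volume rule flags regions where traffic has appeared but nothing is deployed yet.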
Edge Workload Scheduling Challenges
Scheduling offloaded service requests to cloud computing can exert a considerable drain on networks. When many service requests are offloaded, it becomes important to work out how to schedule them across different edge compute stations so that performance is guaranteed while costs are kept to a minimum. Some compromise between the two is inevitable in finding the optimum balance between cost and performance.
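To make the cost versus performance trade-off concrete, here is a minimal greedy sketch: each request goes to the cheapest station that still has capacity and meets the request's latency bound. This is a simple heuristic chosen for illustration, not an optimal scheduler and not a method described in this article; the station names, costs, capacities and latencies are all assumed values.

```python
def schedule_requests(requests, stations):
    """Assign each request to the cheapest station that meets its latency bound.

    requests: list of (request_id, {station_name: latency_ms}, max_latency_ms)
    stations: dict of station_name -> {"cost": float, "capacity": int}
    Returns (assignment, total_cost). Requests that fit nowhere map to None.
    """
    remaining = {name: s["capacity"] for name, s in stations.items()}
    assignment = {}
    total_cost = 0.0
    for request_id, latencies, max_latency_ms in requests:
        # Performance constraint first: only stations that are fast enough
        # and still have free capacity are candidates.
        candidates = [
            name for name, latency in latencies.items()
            if latency <= max_latency_ms and remaining.get(name, 0) > 0
        ]
        if not candidates:
            assignment[request_id] = None
            continue
        # Cost second: pick the cheapest of the acceptable stations.
        chosen = min(candidates, key=lambda name: stations[name]["cost"])
        remaining[chosen] -= 1
        total_cost += stations[chosen]["cost"]
        assignment[request_id] = chosen
    return assignment, total_cost


# Hypothetical stations and requests, for illustration only.
stations = {
    "tower-a": {"cost": 3.0, "capacity": 1},
    "tower-b": {"cost": 1.0, "capacity": 1},
    "regional": {"cost": 0.5, "capacity": 10},
}
requests = [
    ("r1", {"tower-a": 4, "tower-b": 8, "regional": 30}, 10),
    ("r2", {"tower-a": 5, "tower-b": 9, "regional": 35}, 10),
    ("r3", {"tower-a": 6, "tower-b": 7, "regional": 25}, 50),
]
assignment, total_cost = schedule_requests(requests, stations)
```

The latency-sensitive requests end up on the more expensive towers, while the tolerant request falls back to the cheaper regional site. Because the greedy pass handles requests in arrival order, a different order can produce a different total cost, which hints at why the general problem is hard.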
Scheduling becomes even more complicated when we take into consideration some of the complex dynamics of the edge. For example, in the case of the device edge, terminal devices move between locations and the service environment changes over time. Making coherent and consistent dynamic scheduling decisions in the face of uncertain request patterns and a shifting environment is a considerable challenge. Furthermore, what has been described as the Fourth Industrial Revolution is already upon us. Experts have predicted that there will be up to 50 billion IP-enabled IoT devices alone by 2020. This is contributing to a massive explosion in data and data center traffic. As the number of terminal edge devices and mobile services grows, the service request scheduling problem becomes yet more complicated.
Another major challenge with dynamic scheduling is the associated cost of startup and teardown. To better illustrate this concept, consider this race car scenario. Prior to competition, race cars are transported to tracks by (relatively) slow-moving trucks. Once the car arrives, there is an entire crew that prepares the car for competition. When the light turns green at the start of the race, the car goes fast. When it comes to dynamic scheduling, we need to build strategies to expedite the setup and teardown so that it is instantaneous… like Star Trek, beaming the race car into the track at the precise moment the light turns green, and beaming it out once the checkered flag falls.
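A rough way to quantify the race car point is to ask what share of an instance's lifetime goes to setup and teardown rather than to serving requests. The sketch below is plain arithmetic; the function name and the timings are assumptions chosen for illustration, not measurements.

```python
def overhead_fraction(startup_s, teardown_s, useful_s):
    """Share of an instance's lifetime spent on setup and teardown."""
    total = startup_s + teardown_s + useful_s
    return (startup_s + teardown_s) / total


# Hypothetical timings: a slow-starting instance and a fast-starting one,
# each serving a 60-second burst of traffic.
slow_start = overhead_fraction(startup_s=45.0, teardown_s=15.0, useful_s=60.0)
fast_start = overhead_fraction(startup_s=0.5, teardown_s=0.5, useful_s=60.0)
```

With these assumed numbers, the slow-starting instance spends half of its lifetime on overhead, while the fast-starting one spends under 2%. The shorter the burst of demand, the more the startup and teardown cost dominates.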
Traditional approaches to optimization born out of centralized systems, such as dynamic programming and combinatorial optimization, will become more problematic at the edge, because they require high-complexity solutions with long execution times.
One of the more established potential solutions already in play is the Kubernetes Horizontal Pod Autoscaler. In the context of Kubernetes, there are two things you typically want to scale as a user: pods and nodes. The Horizontal Pod Autoscaler automatically adjusts the number of pods in a replication controller, deployment or replica set based on observed CPU utilization or, with custom metrics support, on other application-provided metrics. It decides when to scale by continuously measuring a preset metric and acting when that metric crosses a set threshold. For example, you might measure the average CPU consumption of your pods and trigger a scale operation once it surpasses 80%. One metric doesn’t fit every type of application, however. For a message queue, the number of messages in the waiting state might be the right metric; for a memory-intensive application, memory consumption might be. The threshold will likewise vary by application type. Scaling down when workload usage drops can also be implemented without disrupting the processing of existing requests.
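The Kubernetes documentation describes the autoscaler's core calculation as a ratio: the desired replica count is the current count multiplied by the current metric value over the target value, rounded up, with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below reproduces only that calculation in Python. It is not Kubernetes code, and it leaves out minimum and maximum replica limits, stabilization windows and pod readiness handling. The function name and the sample numbers are assumptions.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Replica count from the documented HPA ratio rule (simplified).

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# CPU example: 4 pods averaging 95% against an 80% target.
cpu_scale_up = desired_replicas(4, current_metric=95, target_metric=80)

# Queue example: 2 pods with 250 waiting messages each against a target of 100.
queue_scale_up = desired_replicas(2, current_metric=250, target_metric=100)

# Small deviations inside the tolerance band leave the replica count alone.
within_tolerance = desired_replicas(4, current_metric=84, target_metric=80)

# Load has dropped: 6 pods averaging 30% against an 80% target.
scale_down = desired_replicas(6, current_metric=30, target_metric=80)
```

The same function covers both the CPU and the message queue cases because only the metric and its target change, which is the sense in which the threshold varies by application type.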
Another solution being explored is the creation of new algorithms, such as the Dynamic Service Request Scheduling (DSRS) algorithm created by engineers at the Beijing Information Science and Technology University, which aims to make decisions around request scheduling that optimize cost while guaranteeing a certain level of performance. They claim, “the DSRS algorithm can be implemented in an online and distributed way”.
Although the compute capacity will be enormous, edge data centers are not equipped to run all of the world’s applications at all times; nor is it economically viable for businesses to consider running all servers at all times. Therefore, whoever works out the most efficient solution for running workloads in the right place at the right time will truly enable the next wave of edge computing.