[Originally published on IBM Cloud Blogs]
A look at the areas that commonly need to be addressed when designing an edge architecture.
Edge solution architectures are new and very varied. The variations result from all the different components that make up an edge topology – from far edge devices to compute servers to network to cloud.
IT architectural decisions are well known — they capture key design issues and the rationale behind chosen solutions. These are typically conscious design decisions concerning a software system as a whole or one or more of its core components and connectors in any given view. The same holds when designing edge solutions. Often, solution architects have to decide which protocol is best suited to receive data from a particular device, whether to use 5G or Private 5G, or the type of edge server to use. These and many other facets of the architecture need to be considered and decided upon.
It is important to note that every architectural decision (AD) should be documented along with the rationale, implications and any counterarguments. This blog post will explore the areas that commonly need to be addressed when designing an edge architecture. To reiterate, edge encompasses everything from far edge devices all the way to compute and storage in the cloud.
Please make sure to check out all the installments in this series of blog posts on edge computing.
Classifying architectural decisions
In an edge architecture, we have identified four broad areas where design decisions have to be made, as depicted in Figure 1:
- Device: Device forms, types of apps deployed on given devices and protocols to use
- Cluster: The number and size of clusters/servers needed to support the various devices
- Network: Choosing the most optimal network — 4G, 5G or private 5G — and type of RAN (Radio Access Network)
- Cloud: Cloud model — private, public or hybrid — and what data to store
We think the networking aspect is the most challenging and critical because of 5G, which has promoted software-defined networking (SDN) in a big way. That, in turn, has increased East-West network traffic in an edge architecture.
Each of these areas has many more decision points, which can become quite overwhelming to someone attempting to design an end-to-end edge solution. The following definitions and Figure 2 below begin to illustrate the magnitude of these decision points.
There are audio devices, video cameras, heat, light and vibration sensors, pressure gauges and telemetry devices. Anywhere from a few to hundreds of these edge devices can be deployed in an edge topology. In this section, we include only edge devices that contain some compute and storage.
While the use cases vary by industry, edge solutions should support the most common data protocols, including (but not limited to) Z-Wave, ZigBee, KNX, Bluetooth LE, HomeConnect, Modbus, ONVIF, EnOcean, BACnet, OPC UA, LoRa, Siemens S7, Kafka, Streams, etc.
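Among these, MQTT is one of the most widely used publish/subscribe protocols at the edge. As a small illustration of what choosing a protocol entails, the sketch below implements MQTT's topic-filter matching rules (`+` matches exactly one level, `#` matches all remaining levels) in plain Python; the topics themselves are hypothetical examples.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """Check an MQTT topic against a subscription filter.

    Per the MQTT spec: '+' matches exactly one topic level;
    '#' (valid only as the last level) matches all remaining levels.
    """
    p_levels = pattern.split("/")
    t_levels = topic.split("/")
    for i, p in enumerate(p_levels):
        if p == "#":
            return True                    # matches everything from here on
        if i >= len(t_levels):
            return False                   # topic ran out of levels
        if p != "+" and p != t_levels[i]:
            return False                   # literal level mismatch
    return len(p_levels) == len(t_levels)

# Example: route camera events but not HVAC telemetry
print(topic_matches("store/+/camera/#", "store/42/camera/front/motion"))  # True
print(topic_matches("store/+/camera/#", "store/42/hvac/temp"))            # False
```

A real deployment would delegate this to a broker, but the rules above are what an architect signs up for when standardizing on MQTT topics.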
All far edge devices that have compute and storage would be registered as edge nodes with an edge hub.
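To make the node/hub relationship concrete, here is a toy sketch of an edge hub registry, assuming nothing beyond the text above; the class and field names are invented for illustration, and a real hub would also handle authentication, heartbeats and policy.

```python
from dataclasses import dataclass, field

@dataclass
class EdgeNode:
    node_id: str
    arch: str                 # e.g. "arm64", "amd64"
    memory_mb: int
    services: list = field(default_factory=list)

class EdgeHub:
    """Toy registry of far edge devices with compute and storage."""
    def __init__(self):
        self.nodes = {}

    def register(self, node: EdgeNode) -> None:
        self.nodes[node.node_id] = node

    def eligible(self, min_memory_mb: int):
        """Nodes with enough memory to host a given workload."""
        return [n for n in self.nodes.values() if n.memory_mb >= min_memory_mb]

hub = EdgeHub()
hub.register(EdgeNode("cam-01", "arm64", 512))
hub.register(EdgeNode("nuc-01", "amd64", 8192))
print([n.node_id for n in hub.eligible(1024)])  # ['nuc-01']
```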
Can the data be used as is to infer anomalies or detect fraud and fraudsters or does the data have to be cleansed? How is dark data handled? These data-based decisions get complicated with the increase in artificial intelligence (AI) and machine learning (ML) applications being deployed on edge devices.
Architects have to decide which data is relevant and whether to store that relevant data in the Enterprise data store or in the cloud for machine learning model building and retraining. This is almost a corollary to the previous architectural decision (AD).
All the data from the edge devices may not be useful and does not need to be stored. Besides, transmitting all the data takes time and storing all the data is expensive.
It should be noted that certain industries require enterprises to store all the data for compliance and audit reasons.
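The data decisions above can be expressed as an edge-side routing policy: forward anomalies upstream, archive what compliance requires, drop the rest. The sketch below is illustrative only; the thresholds, field names and the `compliance_mode` flag are assumptions, not part of any real product.

```python
def route_reading(reading: dict, compliance_mode: bool = False) -> str:
    """Decide what to do with one sensor reading at the edge.

    Returns 'forward' (send upstream for ML/alerting),
    'archive' (retain for audit) or 'drop'.
    """
    value = reading.get("value")
    if value is None or not isinstance(value, (int, float)):
        # dark/garbage data: keep only if regulations demand it
        return "archive" if compliance_mode else "drop"
    lo, hi = reading.get("normal_range", (0, 100))
    if not (lo <= value <= hi):
        return "forward"               # anomaly: worth transmitting
    # normal readings are cheap to discard, unless audit rules apply
    return "archive" if compliance_mode else "drop"

print(route_reading({"value": 250, "normal_range": (0, 100)}))  # forward
print(route_reading({"value": 42}, compliance_mode=True))       # archive
print(route_reading({"value": 42}))                             # drop
```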
Related to the device types discussed above, if there are many IoT-type sensors in the mix that do not have any compute or storage, then the edge architecture will include one or more edge clusters/servers. In another scenario, if the app is too large to run on the edge device, it can be deployed on the edge cluster/server instead.
Architects must decide whether an edge cluster is required and, if so, what size it should be. For example, a QSR (Quick Service Restaurant) could be well served by a small-form-factor computer like an Intel NUC (Next Unit of Computing) running an ML app ingesting data from three indoor cameras.
In other situations, the architect may have to decide on the number of clusters to deploy. In a large department store or warehouse, in addition to security cameras, there could be inventory scanning devices, robots, POS (point-of-sale) systems, etc., that need to be monitored. That might be more than what a single cluster can support.
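A first-order sizing estimate can be sketched as simple arithmetic: sum the per-device load, reserve some headroom, and divide by cluster capacity. The loads and capacity figures below are invented for illustration; real sizing must also consider failover, GPU needs and network locality.

```python
import math

def clusters_needed(device_loads: list, cluster_capacity: float,
                    headroom: float = 0.2) -> int:
    """Rough count of edge clusters for a site.

    device_loads: estimated load per device in any consistent unit
    (e.g. vCPU). headroom reserves capacity for spikes and upgrades.
    """
    usable = cluster_capacity * (1.0 - headroom)
    total = sum(device_loads)
    return max(1, math.ceil(total / usable))

# A department store: 20 cameras at 0.5 vCPU, 5 POS systems at 0.2 vCPU,
# 3 robots at 2 vCPU, on clusters offering 8 vCPU each
loads = [0.5] * 20 + [0.2] * 5 + [2.0] * 3
print(clusters_needed(loads, cluster_capacity=8.0))  # 3
```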
Where should the edge hub reside — on-premises or in the public cloud? Industry regulations (as in healthcare and banking) and data sovereignty issues will influence this decision. In most cases, one edge hub should suffice, but there may be a need for an additional hub depending on the deployment topology.
This overlay networking abstraction model opened new avenues for efficient routing of traffic — not just to and from the cloud/data center (known as North-South traffic), but between the deployed applications, among the components of the applications, and to other cloud services (classified as East-West traffic).
The south-bound access interface of the edge computing domain is a critical decision that architects have to address because it impacts not only the access points, but the edge computing platform architecture itself. The architecture of south-traffic management is multi-dimensional and must be decided with two things in mind: security and the future evolution of the sensor networks. The figure below shows the various edge decision points:
Access termination technology: The architect must decide whether to choose 4G/5G access when coverage ranges from tens of meters to miles; the choice also depends on the terrain and facility construction. If the distance is less than 50m (and spectrum for 4G and 5G becomes an issue), then WiFi is a strong contender for the access protocol.
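The distance-based trade-off above can be sketched as a coarse decision helper. This is a heuristic only, with invented inputs; real designs also weigh wall materials, device density, spectrum licensing and cost.

```python
def pick_access(distance_m: float, licensed_spectrum: bool,
                mobility_required: bool) -> str:
    """Very coarse heuristic for access termination technology."""
    if distance_m < 50 and not mobility_required:
        return "WiFi"              # short range; avoids 4G/5G spectrum issues
    if licensed_spectrum:
        return "private 5G"        # dedicated on-site spectrum available
    return "4G/5G (public)"        # carrier coverage for long range/mobility

print(pick_access(30, licensed_spectrum=False, mobility_required=False))  # WiFi
print(pick_access(800, licensed_spectrum=True, mobility_required=True))   # private 5G
```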
IoT, sensor and Industry 4.0 deployments have large numbers of interfaces with legacy support. Architects have to account for constrained compute and networking, low-power operation and unique communication interfaces. These edge devices do not support 4G/5G protocols at this point, and even WiFi is not widely deployed. This leads to another critical decision for the architect: gateway devices.
Edge gateway devices: It is critical to establish a unified stream from the highly heterogeneous interfaces supported by the edge devices/sensors. From REST to AMQP to XMPP to MQTT to CoAP to very device-specific protocols, architects have to decide on the most efficient protocol to use in edge computing deployments. With all this aggregation, data may need to be transformed before being consumed by the edge computing platform. The IoT/sensor application protocols are defined as lightweight and are mostly used with low-power infrastructure protocols. Different technology groups and industry organizations define multiple infrastructure protocols (PHY/Link/Network Layer stacks) meeting their use-case requirements. Bluetooth, Zigbee, HomePlug, 802.11, etc., are a few of the dominant stacks extensively deployed in the field.
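The transformation step above can be sketched as a gateway that normalizes heterogeneous payloads into one canonical envelope before handing them to the edge platform. The per-protocol parsers here are placeholders, not real codecs, and the envelope fields are assumptions.

```python
import json
import time

def normalize(source: str, raw: bytes) -> dict:
    """Toy gateway transform: map device-specific payloads into a
    single canonical message shape for the edge platform."""
    if source == "modbus":
        # e.g. a single 16-bit holding-register value, big-endian
        value = int.from_bytes(raw[:2], "big")
    elif source == "mqtt-json":
        value = json.loads(raw)["value"]
    else:
        raise ValueError(f"unsupported source: {source}")
    return {"source": source, "value": value, "ts": time.time()}

print(normalize("modbus", b"\x00\x2a"))         # value: 42
print(normalize("mqtt-json", b'{"value": 7}'))  # value: 7
```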
Before virtualization became mainstream, most of the traffic in the network was North-South. As more applications have been migrated to the cloud and we have seen the rise of remote data centers, the East-West traffic within the facility has surpassed the North-South traffic. Figure 3, at a very high level, depicts this East/West network traffic flow.
The unique characteristics of emerging East-West traffic have led to its exponential growth and warrant consideration by architects: namely, the "API-ification" of the control plane and Layer 7 traffic.
- API-ification of the control plane: The control plane of the application domain and communication plane for the non-human devices (IoT, streaming devices, M2M, etc.) are implemented with APIs.
- Layer 7 traffic: Applications are developed and deployed with microservices and function as software units. Instead of using L2/L3 networking concepts (switches/ToR hierarchy), the software architecture has evolved to use L7 plane (with APIs and enterprise bus supporting Pub-Sub concepts, etc.). Instead of scaling the network with L2/L3 networking tools, the industry has moved towards “Flat Network” topology for East-West traffic. The success of cloud hyperscalers lies in exposing the control and observability plane of the networking with APIs for closed loop (automation). This programmable networking plane provides end-to-end visibility and the ability to apply dynamic attributes.
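The L7 pub-sub style described above can be illustrated with a minimal in-process message bus. This toy stands in for a real broker (Kafka, an MQTT broker, an enterprise bus); the topic names and payloads are invented.

```python
from collections import defaultdict

class Bus:
    """Minimal in-process pub/sub bus illustrating L7-style messaging
    between microservices, instead of L2/L3 switch-level plumbing."""
    def __init__(self):
        self.subs = defaultdict(list)

    def subscribe(self, topic: str, handler) -> None:
        self.subs[topic].append(handler)

    def publish(self, topic: str, msg) -> None:
        # every subscriber gets the message; publisher knows no addresses
        for handler in self.subs[topic]:
            handler(msg)

bus = Bus()
seen = []
bus.subscribe("inventory.scanned", seen.append)        # analytics service
bus.subscribe("inventory.scanned", lambda m: None)     # independent consumer
bus.publish("inventory.scanned", {"sku": "A-100", "qty": 3})
print(seen)  # [{'sku': 'A-100', 'qty': 3}]
```

Note how scaling out is a matter of adding subscribers, not re-cabling a switch hierarchy — the essence of the "Flat Network" argument above.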
When combined, the programmable networking of the edge computing platform integrated with the SDN architecture of deployed clouds/data centers is leading to a "Flat Network" strategy across geographical boundaries. The edge computing paradigm will evolve with its own dynamics, and the programmable unified control plane (of SDN) will facilitate its adoption. Extending the network slicing defined in 5G from the edge devices to the distributed application plane, supporting application-specific Quality of Service (QoS) and Service Assurance, and securing network segmentation will all be possible because of programmable networking. The programmable networking approach will contribute greatly by simplifying operations, reducing costs and increasing agility.
It is worth mentioning that technology is available today that can take a WiFi network (a non-3GPP network) and connect it seamlessly to a 5G core. That becomes relevant when making architectural decisions in an Industry 4.0 domain.
Imagine a plant floor has WiFi for sensors, employee mobiles, laptops and other equipment. Today, 3GPP Rel 16 of 5G offers a way in which these WiFi devices can connect to 5G SBA (service-based architecture) core with N3IWF (Non 3GPP Inter Working Function) for device authentication and exposure to the application domain. N3IWF facilitates the routing of user traffic to a UPF (User Plane Function) gateway. This architecture converges/facilitates exposure of the edge computing platform to multi-access networks like 4G/5G, WiFi and other evolving standards.
To architect an efficient and cost-effective edge computing ecosystem, the application domain has to adopt an API-based control plane and a software-defined "flat" overlay network. Edge computing architects have to design for a secure, large-scale, distributed architecture while running applications on resource-constrained platforms. The microservices deployed on the edge computing platform for different functionalities — from analytics to mobile access — are going to use an overlay network to communicate with each other. The architect must ask: "Which deployment domain is best suited?" While Kubernetes is the dominant deployment model, one also has to consider Function-as-a-Service (FaaS).
While we alluded to the new application deployment model, there is also the question of the cloud deployment model, which can be public, private or hybrid. The cloud deployment model, in many cases, will be dictated by the industry. For example, healthcare and financial institutions will typically have very specific requirements on how and where data can be stored.
Architects have to decide on the cloud model. While a hybrid model might seem most appropriate, the decision has to be made based on cost and latency when considering whether to use a physical data center, an on-premises cloud, a public cloud or a combination. If a hybrid model is chosen, the next decision point invariably turns to data: architects have to consider where to store different types of data. Generic data may be stored in the cloud, but sensitive data may need to be encrypted and stored on-premises.
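The data-placement decision above can be sketched as a small policy function. The classification labels and destination strings are invented for illustration; real placement rules come from the relevant regulations and corporate policy.

```python
def place_data(record: dict) -> str:
    """Toy data-placement policy for a hybrid cloud model."""
    cls = record.get("classification", "generic")
    if cls in ("pii", "phi", "financial"):
        # regulated/sensitive data stays under direct control
        return "on-prem (encrypted at rest)"
    if record.get("ml_training"):
        # bulk training data benefits from cheap cloud object storage
        return "public cloud object storage"
    return "public cloud"

print(place_data({"classification": "phi"}))                         # on-prem
print(place_data({"classification": "generic", "ml_training": True}))
```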
Since the cloud offers up a lot of resources, including compute, data and storage, that is where machine learning (ML) models are trained and re-trained. So, the questions of data storage, deployment of ML applications and training of ML models need to be addressed from a cloud perspective.
As we mentioned at the start of this blog, there are so many decision points that it can be overwhelming. We have attempted to distill things down and make them more manageable when it comes to architecting an edge solution. We would be remiss not to mention security — explicitly — as the most important architectural decision. Recent ransomware attacks give credence to the edge computing paradigm, sometimes also referred to as "cloud-out."
Security has to be addressed across all layers, especially the network layer. The architect has to weigh the cost of a distributed edge hub topology — with each hub controlling its own set of devices (e.g., a store) — against the risk of a centralized topology wherein one edge hub in a data center controls all the devices in all remote locations.
Using Perfect Forward Secrecy (PFS), as a product like IBM Edge Application Manager (IEAM) does, protects the far edge devices, but data encryption technology should also be deployed in the edge hub.
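The idea behind PFS is that each session derives its secret from fresh, throwaway key pairs, so a later compromise of long-term keys cannot recover past traffic. The toy finite-field Diffie-Hellman exchange below illustrates only that mechanism; the tiny prime is deliberately insecure, and real deployments use vetted groups or elliptic-curve DH inside TLS.

```python
import secrets

# NOT cryptographically safe -- illustrative parameters only
P = 0xFFFFFFFB   # a small prime (2**32 - 5)
G = 5

def ephemeral_keypair():
    """Fresh private/public pair generated per session (the 'ephemeral'
    part that gives forward secrecy)."""
    priv = secrets.randbelow(P - 2) + 1
    return priv, pow(G, priv, P)

# One session: hub and device each generate throwaway keys
hub_priv, hub_pub = ephemeral_keypair()
dev_priv, dev_pub = ephemeral_keypair()

# Each side combines its private key with the peer's public key
hub_secret = pow(dev_pub, hub_priv, P)
dev_secret = pow(hub_pub, dev_priv, P)
assert hub_secret == dev_secret   # both derive the same session key

# Discarding hub_priv/dev_priv after the session means no stored
# long-term key can ever reconstruct hub_secret later.
```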
It is important to take a holistic approach towards security, covering all aspects from network to the applications. To that end, architects should be aware of ZTNA (Zero Trust Network Access) and SASE (Secure Access Service Edge). Links can be found in the references section.
The IBM Cloud architecture center offers up many hybrid and multicloud reference architectures, including ML/AI frameworks. Look for the IBM Edge Computing reference architecture and the Network Automation architecture.
This post talked about edge-related architectural decisions and how one has to think about addressing them across the various layers. Some architectural decisions, like security, are much more important than others. And as evidenced in this post, a lot goes into deciding the network architecture of an edge solution. With the adoption of microservices architecture and container technology in modern applications, the percentage of East-West network traffic will continue to grow. With that in mind, East-West traffic security should not be overlooked.
We would like to know what you think. Do let us know if you have encountered other architectural decisions when designing edge solutions.
Special thanks to Kavita Bade, Ray Goyette and Sanil Nambiar for reviewing the article.