Hybrid Cloud Mesh


Is your cloud infrastructure ready to support AI workloads?

By Raul Gonzalez posted Mon April 15, 2024 09:20 AM


During the AI transformation journey, one of the most important decisions to make is deciding which cloud provider will run the AI workloads. This is not a trivial question: most companies don't run applications across multiple hyperscalers, so the chosen cloud may impact other applications.

Vendor Lock-in Problems

Let me tell you a story from one of our customers. This company ran all their workloads in Azure, mainly because of its authentication features (LDAP), and, since they were already using this cloud, they continued deploying the rest of their applications in Azure.

This company had had a very bad experience migrating their apps from on-prem to the cloud, so they decided to follow a 'single cloud' approach.

A few months ago this company started deploying AI applications for internal use. However, they were not getting the performance they wanted, and the cost of running these applications was going through the roof, so they started investigating other options: maybe moving everything back on-prem?

Reasons to move to another cloud: GKE support for TPUs

Azure is very good with LDAP authentication, which is why the customer chose it, but Google has put a lot of effort into AI technologies. For instance, GKE (Google Kubernetes Engine) now offers support for TPUs (Tensor Processing Units). For this reason, the customer wanted to use GKE for their AI workloads.

For those who don't know, TPUs are processing units, similar to GPUs and CPUs, that are optimized for AI and ML workloads in terms of cost and performance. The fact that GKE supports this technology makes it very appealing for companies to run their AI workloads in that environment.
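As a rough illustration of what this looks like in practice, a GKE Pod can request TPU chips through the `google.com/tpu` resource on a TPU node pool. The accelerator type, topology, chip count, and image below are illustrative values, not a recommendation:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tpu-training-job        # illustrative name
spec:
  nodeSelector:
    # Schedule onto a TPU slice node pool (example values; pick your own accelerator)
    cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice
    cloud.google.com/gke-tpu-topology: 2x4
  containers:
  - name: trainer
    image: us-docker.pkg.dev/my-project/ml/trainer:latest   # hypothetical image
    resources:
      limits:
        google.com/tpu: 8       # number of TPU chips requested by the container
```

The scheduler then places the Pod only on nodes that expose the requested TPU resources.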


Single cloud vs Multi cloud

The customer was facing this dilemma:

  • Deploy the AI workloads in the current cloud (Azure), not taking advantage of GKE's AI features
  • Move everything to Google Cloud, meaning they would have to go through the same pain they had during the on-prem-to-Azure migration
  • Deploy the AI workloads in GKE, meaning they would have to manage the connectivity between clouds somehow

The first option was not a real option: their competitors were already using this technology, and the customer was already feeling the slight advantage those competitors were gaining from AI.

The customer was seriously considering the second option, moving everything to Google Cloud, and had started planning resource allocation and costs for the project, because the pain of managing multi-cloud connectivity seemed even more daunting.

Each cloud network works differently

Seamless hybrid cloud connectivity

Luckily for the customer, we could offer a solution that lets them work in multiple clouds at the same time, providing seamless connectivity between the clouds and even on-prem environments. This solution is called Hybrid Cloud Mesh, and it creates Virtual Application Networks (VANs) between environments, allowing DevOps teams to build their own connections between apps (aka connectivity on demand).

The first question that came back from the customer when we introduced this solution was, 'Is this another L3 overlay network?', and this is a very common question. In a world where apps are moving away from IP addresses, it doesn't make much sense to keep forcing those apps to work through L3 VPNs, L3 network overlays, FQDNs, etc. These technologies are complex to manage, require changes to the applications to run through the L3 gateways and, basically, they are not application centric.

Hybrid Cloud Mesh

With Hybrid Cloud Mesh, we are breaking the barriers between NetOps/CloudOps and DevOps, empowering DevOps teams to create their own app-to-app connectivity while still leaving control of the network infrastructure with the NetOps teams.

Solution

Provisioning Hybrid Cloud Mesh allowed them to run their workloads where it made more sense for them: some workloads in Azure, some in GKE and, also very important, some on-prem. Another functionality available in Hybrid Cloud Mesh is the ability to create VANs between on-prem and cloud environments.

Think about how easy it was for them to run workloads on-prem for testing: once the workloads were ready for production, they just needed to move them to the cloud where they wanted them to run. All of this migration happened with no network changes, because Hybrid Cloud Mesh had already created the connections between on-prem and the clouds, all with a zero trust approach.

Example of Zero Trust Connectivity on Demand Policy
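Hybrid Cloud Mesh has its own policy model, but the zero trust idea (deny everything by default and open only the app-to-app paths a team explicitly requests) can be sketched with a standard Kubernetes NetworkPolicy. All names and labels below are hypothetical:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend-to-inference   # hypothetical policy name
  namespace: ai-apps                  # hypothetical namespace
spec:
  # Select the AI inference pods; once selected, all ingress to them
  # is denied unless explicitly allowed below (deny by default)
  podSelector:
    matchLabels:
      app: inference
  policyTypes:
  - Ingress
  ingress:
  # Only the frontend app may reach the inference pods, and only on one port
  - from:
    - podSelector:
        matchLabels:
          app: frontend
    ports:
    - protocol: TCP
      port: 8080
```

The same deny-by-default, allow-on-request pattern is what connectivity on demand applies across clouds and on-prem environments.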

Question

With Hybrid Cloud Mesh, this customer now has the infrastructure ready to support and run their AI workloads. But is your cloud infrastructure ready for the future challenges of AI?
