Guest post from Drew Johnson, VP of Engineering & Operations, Aeris Communications.
As IoT application deployments go global, customers demand more—greater flexibility and choice, better alignment to business objectives, and meaningful improvements to IoT program capabilities, cost, and security. IoT solution providers are looking for efficient ways to bring these benefits to their end customers, and hosting solutions in the cloud is key.
AWS, Google and Microsoft are the “big three” public cloud providers, and among them Google is typically considered the laggard, especially for IoT applications. But after an extensive evaluation of these providers, we at Aeris (an IoT technology partner with over a decade of experience deploying IoT projects for Fortune 100 companies) decided to use Google Cloud Platform (GCP).
What we have running in the cloud is the Fusion IoT Network — the first intelligent multimode 5G-ready solution, including LTE-M, NB-IoT, LTE and 2G/3G. Fusion IoT enables organizations implementing IoT solutions to use global wireless networks with one connectivity subscription on one platform, eliminating multi-network administrative overhead. Fusion IoT Network is built to scale. With operational reach in 180 countries, it handles more than a billion transactions a day for over 1000 customers worldwide.
We chose Google Cloud Platform (GCP) for our production environment due to several factors, including Kubernetes maturity, VPC IP networking capabilities, open-source APIs, costs, and governance. The rest of this article details our evaluation of GCP along these vectors.
Most Mature Managed Kubernetes
We wanted to transition from a Cloud 1.0 approach, where we used managed VMs, to a Cloud 2.0 approach, where we use managed containers and Kubernetes. Cloud 2.0 is primarily about leveraging managed Kubernetes, which provides superior compute density for cost savings as well as orchestrated flexibility for more accurate deployments and performance.
Our analysis showed that GCP’s Google Kubernetes Engine (GKE) is the best, most mature managed Kubernetes platform, which isn’t especially surprising since Google invented Kubernetes and continues to make regular contributions and updates to it in the open source community. We knew we would get Kubernetes updates more quickly if we went with GCP.
VPC/IP Networking Capabilities
Fusion IoT is a cloud-native IoT network, and we wanted to move to a cloud provider with the best IP networking capabilities. Google’s Global Virtual Private Cloud (VPC) IP networking capabilities are the best we have seen of any cloud provider. Aeris is a global provider and we found that the Google IP network backbone provided lower latency for our use cases compared to what we were using previously.
Additionally, we found Google’s project and shared network capability compelling and differentiated. GCP lets us create discrete, isolated VPC networks within a project and then share networks into the project for shared private access. This allowed us to create exact replicas of our network architecture for every phase of the deployment lifecycle, from our development environments to quality assurance environments, and all the way to our production environments. This is a powerful approach that provides better security, deployment accuracy, faster troubleshooting, and separation of concerns between our IP networking team and our automation for infrastructure as code.
The Google IP network, coupled with the Aeris cellular IoT network, will allow us to give our customers unprecedented performance, security, and flexibility.
Open-Source APIs for PaaS
Aeris, like many enterprises, is trending toward multi-cloud support. However, multi-cloud support can make it harder to leverage the full capabilities of each cloud provider. Care must be taken not to get locked into any particular cloud’s proprietary Platform as a Service (PaaS) services. On the other hand, running the equivalent services ourselves can be very resource-intensive. The emerging interoperable standards for PaaS are being driven by adoption of open source APIs. Other cloud providers are also doing this, but Google is taking the lead in this area. Google founded many open source technologies, and its APIs have become the de facto standard APIs that those technologies use.
For example, Google is the creator of the Beam API for stream processing. Previously, Aeris had to run and manage our own Flink cluster underneath the Beam API. With GCP, we kept the same Beam API but leveraged GCP’s Dataflow underneath. In a similar manner, GCP has Dataproc, a managed Spark and Hadoop service that lets us take advantage of open source data tools for batch processing, querying, and machine learning. We could just plug Dataproc in underneath our standard open source APIs, and we improved performance without any significant rework on our side.
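To make the runner swap concrete, here is a minimal sketch of how a Beam pipeline could be pointed at either a self-managed Flink cluster or Dataflow purely through pipeline options. The flag names (`--runner`, `--flink_master`, `--project`, `--region`, `--temp_location`) are standard Beam pipeline options, but the target names, host, project, region, and bucket below are hypothetical placeholders, not Aeris’ actual configuration.

```python
from typing import List


def runner_args(target: str) -> List[str]:
    """Return Beam pipeline option flags for a hypothetical deployment target.

    The pipeline code itself would not change; only these flags do.
    """
    if target == "self-managed-flink":
        return [
            "--runner=FlinkRunner",
            "--flink_master=flink.example.internal:8081",  # hypothetical address
        ]
    if target == "gcp-dataflow":
        return [
            "--runner=DataflowRunner",
            "--project=example-project",                # hypothetical project ID
            "--region=us-central1",                     # hypothetical region
            "--temp_location=gs://example-bucket/tmp",  # hypothetical bucket
        ]
    raise ValueError(f"unknown target: {target}")


# With the Beam SDK installed, the flags would be consumed roughly like this:
#   import apache_beam as beam
#   from apache_beam.options.pipeline_options import PipelineOptions
#   options = PipelineOptions(runner_args("gcp-dataflow"))
#   with beam.Pipeline(options=options) as pipeline:
#       ...  # same transforms regardless of runner

if __name__ == "__main__":
    for target in ("self-managed-flink", "gcp-dataflow"):
        print(target, runner_args(target))
```

The point of the sketch is that the choice of execution engine lives in configuration, so moving from a self-run cluster to a managed service does not require rewriting the transforms.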
Google has significant resources running and improving their PaaS services. They have created dashboards and tools which provide our Site Reliability Engineers greater insights. The implementations are solid and will continue to improve over time with much less effort from our team. Overall, this approach has improved reliability and reduced effort without causing lock-in. It is a big win for us.
Costs
GCP offers what it calls Sustained Use Discounts, a pricing strength that Google derives from its analytics and machine learning capabilities. Other cloud providers typically charge for VMs either pay-as-you-go or as a Reserved Instance (RI), where you commit to a one- to three-year contract for a better discount. The problem with Reserved Instances is that needs change over time: when you buy one, you commit to paying for it even if you turn it off early. Effectively tracking and managing RI utilization takes a lot of resources. GCP has a similar Committed Use Discount. However, GCP also has the Sustained Use Discount, which kicks in automatically once GCP analytics determine that you have used a resource for more than N days in the month. You automatically get a discount, and if you turn off an instance, you’re no longer charged for it.
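The arithmetic behind a usage-based discount like this can be sketched in a few lines. The tier schedule below is an assumption for illustration only: it models each successive quarter of the month being billed at 100%, 80%, 60%, and 40% of the base rate. Actual tiers vary by machine family and change over time, so check current GCP pricing before relying on any number here.

```python
# Assumed incremental tiers (illustrative only): (share of month, rate multiplier).
TIERS = [(0.25, 1.0), (0.25, 0.8), (0.25, 0.6), (0.25, 0.4)]


def billed_fraction(usage: float) -> float:
    """Fraction of a full month's base price billed for `usage` in [0, 1]."""
    if not 0.0 <= usage <= 1.0:
        raise ValueError("usage must be between 0 and 1")
    billed = 0.0
    remaining = usage
    for width, multiplier in TIERS:
        used = min(remaining, width)  # portion of usage that falls in this tier
        billed += used * multiplier
        remaining -= used
        if remaining <= 0:
            break
    return billed


def effective_discount(usage: float) -> float:
    """Discount relative to paying the base rate for the same usage."""
    if usage == 0:
        return 0.0
    return 1.0 - billed_fraction(usage) / usage


if __name__ == "__main__":
    for usage in (0.25, 0.5, 0.75, 1.0):
        print(f"usage={usage:.2f} discount={effective_discount(usage):.1%}")
```

Under these assumed tiers, a VM that runs all month is billed about 70% of the base price, while one that is switched off a quarter of the way through simply stops accruing charges. Nothing is prepaid, which is the contrast with a commitment-based model.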
There are at least three other important features in the area of cost management. The GCP Project Organization feature provides superior capabilities for cost control and visibility across different product lines and product deployment lifecycle phases. In addition, Custom Instance Types in GCP provide more flexibility at better cost than other cloud providers’ rigid instance types. This can be very significant if you need, for example, instance types with large memory but relatively low CPU. Even further, GCP’s Recommendation Engine regularly looks at our VM utilization and recommends a smaller instance type wherever one would save us money. It’s on our side!
What this means for us is that we don’t have to spend so many resources managing our cloud usage – Google is doing it for us so we can focus on adding functionality to our offering instead of cloud usage management.
Governance
When we deployed our products with another cloud provider, we ended up with 14 different accounts cobbled together in a common bill. It was nearly impossible to manage effectively. We were forced into this approach in order to get the access and cost visibility and control we needed. Even so, we had many incidents in which changes intended for development environments impacted production. The GCP Project Organization structure, which includes a folder hierarchy, gives us just the access and cost controls we need. We organize our deployments by a combination of lifecycle and product area. We have a major folder branch between non-production and production projects. In non-production, we can have multiple development, quality, and continuous integration environment projects. Each project environment can have its own cost and access controls. The cost of a production environment is only about one-third of the total cost to deliver, so we needed to make sure that non-production environments could be managed very efficiently, as this can have a huge impact on the total cost.
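A folder hierarchy like the one described can be modeled as a small tree whose costs roll up from projects to folders. The sketch below is illustrative only: the folder and project names are hypothetical, and the monthly figures are made-up numbers chosen so that production comes out at one-third of the total, mirroring the ratio mentioned above.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Project:
    name: str
    monthly_cost: float  # hypothetical figure, arbitrary currency units


@dataclass
class Folder:
    name: str
    children: List[Union["Folder", Project]] = field(default_factory=list)

    def total_cost(self) -> float:
        """Sum the cost of every project beneath this folder."""
        total = 0.0
        for child in self.children:
            if isinstance(child, Folder):
                total += child.total_cost()
            else:
                total += child.monthly_cost
        return total


# Top-level split by lifecycle, then by product area (all names hypothetical).
non_production = Folder("non-production", [
    Folder("product-a", [
        Project("product-a-dev", 40.0),
        Project("product-a-qa", 30.0),
        Project("product-a-ci", 30.0),
    ]),
])
production = Folder("production", [
    Folder("product-a", [Project("product-a-prod", 50.0)]),
])
org = Folder("organization", [non_production, production])

if __name__ == "__main__":
    share = production.total_cost() / org.total_cost()
    print(f"production share of total cost: {share:.0%}")
```

Because each project sits under exactly one folder, access policies and cost reports can be attached at whichever level of the tree matches the question being asked, for example "all non-production" or "product A in production".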
We have more than a hundred developers working on our products who need access to environments and VMs. We also have contractors and partners who often need access. The SSH key mechanism used by other cloud providers does not provide the level of security and management we need, and managing those keys is yet another major resource drain. Using OS Login with GCP gives us better security with much lower management overhead. It gives us more granular access control at the user level and provides centralized access management. GCP is also building security more deeply into the fabric of its cloud. For example, its Cloud SQL Proxy provides encryption for data in transit between the application and Cloud SQL with almost no additional effort.
Although we still offer cloud-agnostic solutions, we made the transition to GCP for the Fusion IoT Network because GCP provided the best possible solution for Aeris’ transition to Cloud 2.0. We moved away from managed VMs to managed Kubernetes and gained networking capabilities better aligned with our business, superior open-source APIs, and a host of other features that allow us greater flexibility and performance with operational savings and tighter security. GCP may be seen as the poor stepson when compared with AWS and Azure in the market share rankings, but for our team at Aeris, it was the perfect fit.