Part 1

Cross-cluster communication sounds simple—until you try to make it work across on-premises environments, public clouds, and heterogeneous services. Behind every Nutanix cluster lies a network of components working in harmony to power the Nutanix Cloud Platform. But when clusters need to talk to each other, the complexity skyrockets. This post explores how we approached that challenge, the solution we built to make it secure, efficient, and scalable—and how we contributed this capability back to the community through open source.

A Nutanix cluster consists of multiple services working together to deliver the rich functionality of the Nutanix Cloud Platform. These services form various cluster components, which you can learn more about in The Nutanix Bible.

Nutanix software can be deployed on-premises or in the public cloud. Most customers operate multiple Nutanix deployments, and many run hybrid environments spanning on-premises and one or more public clouds using products such as the Nutanix Acropolis Operating System (AOS), Prism Central (PC), Nutanix Central (NC), and Nutanix Cloud Clusters (NC2). For simplicity, we’ll refer to any of these product clusters as a “Nutanix Cluster”: the product type only changes which control and data path services span clusters, not how those services operate.

To enable advanced features like Disaster Recovery (DR), multi-cluster management, and entity aggregation, services running in one cluster often need to communicate with equivalent or complementary services in another cluster.

Figure 1: Different communication patterns between Nutanix cluster deployments.

Designing Cross-Cluster Communication

When designing cross-cluster service-to-service communication, we identified several key requirements:

  • Heterogeneous Services: Different cluster types are at varying stages of modernization, so services may run on VMs or Kubernetes. These heterogeneous services must integrate and communicate seamlessly.
  • Deployment Agnostic: Services should work regardless of whether clusters run on-premises or in the public cloud.
  • Secure and Standardized: Communication must use a standard discovery mechanism and follow standard security guidelines. On-premises services should remain inherently secure and avoid direct exposure to external networks.
  • Low Overhead: Communication should minimize CPU and memory usage, and management overhead should be low (e.g., no VPN, minimal firewall changes).

Exploring Solutions

A Naïve Approach

One option is letting each service manage its own communication. This works for small, contained systems (e.g., services on the same host or VM) but quickly becomes unscalable. Key issues include:

  • No standardization for security, service discovery, or configuration.
  • Services must track cluster-level changes (node additions/removals, IP updates, certificate rotations).
  • Cross-cluster communication—especially between on-premises and cloud—requires opening firewall ports, introducing security risks and complex management.

Pros: Simple, low resource overhead
Cons: Not scalable, lacks standardization, complex to manage

A Modern Approach

Service meshes like Istio are common in Kubernetes-based architectures. However, when we started, service meshes were nascent and resource-heavy (a sidecar proxy for every pod!). They also typically operate within a single cluster, making multi-cluster communication challenging. Adding firewall rules further complicates things, especially since Nutanix clusters mix native and Kubernetes services.

Pros: Standardization
Cons: High resource overhead, complex management

A Reverse Approach

We needed a custom solution that met all requirements—a middle ground between the two extremes. Reverse connections emerged as a promising idea.

A reverse connection starts like any other connection: a client initiates an outbound connection to a server. The two ends then flip roles. The original server caches the connection and uses it to send requests, acting as a client, while the original client serves the requests that arrive over that same connection. This ensures:

  • On-premises clusters never accept inbound connections or expose services externally.
  • All communication uses outbound connections to public-cloud clusters or accessible networks.
  • No VPN or firewall rule changes are required.
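To make the role flip concrete, here is a minimal sketch in Python using plain TCP sockets on localhost. It is not the Envoy implementation and leaves out TLS, authentication, and reconnection. The names on_prem_initiator, CloudAcceptor, and demo, and the newline-delimited request format, are assumptions made up for this illustration.

```python
import socket
import threading


def on_prem_initiator(cloud_addr, handler):
    """Dial OUT to the cloud side, then flip roles: serve requests that
    arrive over that same outbound connection."""
    with socket.create_connection(cloud_addr) as conn, \
            conn.makefile("r", encoding="utf-8", newline="\n") as reader:
        for line in reader:  # one newline-terminated request per line
            reply = handler(line.rstrip("\n"))
            conn.sendall((reply + "\n").encode("utf-8"))
        # The loop ends when the cloud side closes the connection.


class CloudAcceptor:
    """Accept one inbound connection, cache it, and use it as a client."""

    def __init__(self):
        # Port 0 asks the OS for any free port (hypothetical demo setup).
        self._listener = socket.create_server(("127.0.0.1", 0))
        self.address = self._listener.getsockname()
        self._conn = None
        self._reader = None

    def accept_reverse_connection(self):
        # Cache the accepted socket instead of reading a request from it.
        self._conn, _ = self._listener.accept()
        self._reader = self._conn.makefile("r", encoding="utf-8", newline="\n")

    def request(self, payload):
        # The side that accepted the connection now sends the request.
        self._conn.sendall((payload + "\n").encode("utf-8"))
        return self._reader.readline().rstrip("\n")

    def close(self):
        if self._conn is not None:
            self._reader.close()
            self._conn.close()
        self._listener.close()


def demo():
    cloud = CloudAcceptor()
    worker = threading.Thread(
        target=on_prem_initiator,
        args=(cloud.address, lambda req: "on-prem handled: " + req),
        daemon=True,
    )
    worker.start()
    cloud.accept_reverse_connection()
    replies = [cloud.request("list-vms"), cloud.request("get-status")]
    cloud.close()
    worker.join(timeout=5)
    return replies


if __name__ == "__main__":
    for reply in demo():
        print(reply)
```

In this sketch the on-premises side never opens a listening socket. The only listener lives on the cloud side, yet every request flows from the cloud side to the on-premises handler.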

However, expecting every service pair to implement this protocol is impractical. Services differ in deployment models and programming languages, and we don’t want each service to monitor cluster-level changes. This called for a centralized solution with minimal overhead.

Fortunately, every Nutanix node already runs a custom Envoy proxy process for Prism traffic. This became the ideal place to implement reverse connection logic.

Figure 2: The Vision

Our solution channels all cross-cluster traffic through a reverse tunnel, enabling:

  • Strong standardization (TLS versions, cipher suites)
  • Centralized monitoring of cluster changes (node additions/removals, IP updates)
  • Minimal resource overhead by reusing existing services

Since Envoy didn’t support this feature, we took on the challenge and built it ourselves. After two internal iterations, we contributed the implementation to the open-source Envoy project through GitHub Issue #33320. What started as an internal need evolved into a feature that benefits the entire Envoy community. The collaboration was incredible: the community helped refine the design and align it with Envoy’s philosophy.

Cross-cluster communication isn’t just a Nutanix problem—it’s a challenge for any distributed system operating across hybrid environments. By contributing this feature to Envoy, we’ve helped create a standardized, secure approach that others can adopt without reinventing the wheel. This means better interoperability, stronger security practices, and reduced complexity for the broader cloud-native ecosystem. If you’re curious about the details or want to see how reverse connections are implemented, check out the associated code and discussions on GitHub. It’s a great example of how real-world challenges can drive meaningful contributions to open source.

Next up

In Part 2, we’ll dive into the implementation details of how Envoy reverse connections work. Stay tuned!

 

©2026 Nutanix, Inc. All rights reserved. Nutanix, the Nutanix logo and all Nutanix product and service names mentioned are registered trademarks or trademarks of Nutanix, Inc. in the United States and other countries. Kubernetes is a registered trademark of The Linux Foundation in the United States and other countries. All other brand names mentioned are for identification purposes only and may be the trademarks of their respective holder(s).