[Acropolis] Acropolis Hypervisor (AHV) I/O Failover & Load Balancing

Posted on 2015-10-28 09:03:33


Many customers and partners have expressed interest in Acropolis since it was officially launched at .NEXT in June earlier this year, and since then many questions have been asked about resiliency and availability.

In this post I will cover how I/O failover occurs and how AHV load-balances I/O in the event of a failover to ensure optimal performance.

Let’s start with an Acropolis node under normal circumstances. The iSCSI initiator for QEMU connects to the iSCSI redirector, which directs all I/O to the local stargate instance running within the Nutanix Controller VM (CVM), as shown below.

[Figure: AHVMPdefault.jpg — normal I/O path: QEMU → iSCSI redirector → local stargate]
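To make the steady-state path concrete, here is a minimal sketch in Python of the redirector's decision. All names and portal addresses are hypothetical; this is an illustration of the idea, not Nutanix code:

```python
# Minimal sketch (hypothetical names, not Nutanix code) of the normal-path
# decision: the redirector points every iSCSI login at the local stargate.

LOCAL_STARGATE = "cvm-local:3261"        # portal address is an assumption

def choose_portal(remote_stargates, local_healthy=True):
    """Return the stargate portal an iSCSI login should be redirected to."""
    if local_healthy:
        return LOCAL_STARGATE            # steady state: I/O stays node-local
    return remote_stargates[0]           # failover case, expanded in a later sketch
```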

I/O will always be serviced by the local stargate unless a CVM upgrade, shutdown, or failure occurs. If one of these occurs, QEMU will lose its connection to the local stargate, as shown below.

[Figure: AHVMPfailedlocal.jpg — QEMU loses its connection to the failed local stargate]

When this loss of connectivity to stargate occurs, QEMU reconnects to the iSCSI redirector and establishes a connection to a remote stargate, as shown below.

[Figure: AHVMPremote.jpg — I/O redirected to a remote stargate]

The process of re-establishing an iSCSI connection is near-instant, and you will likely not even notice it has occurred.

Once the local stargate is back online (and stable for 300 seconds), I/O will be redirected back locally to ensure optimal performance.

[Figure: AHVMPfailback.jpg — I/O fails back to the local stargate]

In the unlikely event that the remote stargate goes down before the local stargate is back online, the iSCSI redirector will redirect traffic to another remote stargate.
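The failover and failback behaviour described above can be modelled in a short sketch. All names here are hypothetical and the logic is a simplification: stay local while the local stargate is healthy, redirect to a random remote stargate on failure, pick another remote if that one also fails, and only fail back once the local stargate has been stable for 300 seconds.

```python
import random
import time

FAILBACK_STABLE_SECS = 300   # local stargate must stay healthy this long before failback

class Redirector:
    """Toy model of the per-node iSCSI redirector (hypothetical, not Nutanix code)."""

    def __init__(self, local, remotes):
        self.local = local                        # local stargate portal
        self.remotes = remotes                    # stargate portals on other nodes
        self.failed_over = False
        self.local_healthy_since = time.monotonic()

    def report_local_health(self, healthy):
        """Health-check callback tracking continuous local-stargate uptime."""
        if not healthy:
            self.failed_over = True
            self.local_healthy_since = None
        elif self.local_healthy_since is None:
            self.local_healthy_since = time.monotonic()

    def choose_portal(self, failed_remotes=frozenset()):
        """Pick the portal for a (re)connecting initiator.

        failed_remotes holds remote portals that have also gone down, so a
        second failure is simply redirected to yet another remote stargate.
        Assumes at least one remote stargate is still up.
        """
        if not self.failed_over:
            return self.local
        since = self.local_healthy_since
        if since is not None and time.monotonic() - since >= FAILBACK_STABLE_SECS:
            self.failed_over = False              # stable for 300 s: fail back locally
            return self.local
        candidates = [r for r in self.remotes if r not in failed_remotes]
        return random.choice(candidates)
```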

Next, let’s talk about load balancing.

Unlike traditional 3-tier infrastructure (i.e., SAN/NAS), Nutanix solutions do not require multipathing, as all I/O is serviced by the local controller. As a result, there is no multipathing policy to choose, which removes another layer of complexity and a potential point of failure.

However, in the event the local CVM is unavailable for any reason, I/O for all the VMs on the node must still be serviced in the most efficient manner. Acropolis does this by redirecting I/O on a per-vDisk basis to a random remote stargate instance, as shown below.

[Figure: pervmpathfailover.jpg — per-vDisk path failover to random remote stargates]

Acropolis can do this because every vDisk is presented via iSCSI as its own target/LUN, which means each has its own TCP connection. What this means is that a business-critical application such as MS SQL, Exchange, or Oracle with multiple vDisks will be serviced by multiple controllers concurrently.
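Since the redirection decision is made per iSCSI target (i.e., per vDisk), the load balancing can be pictured as a random assignment of each vDisk to a surviving stargate. A rough sketch, with all vDisk and portal names hypothetical:

```python
import random
from collections import Counter

def fail_over_vdisks(vdisks, remote_stargates):
    """Assign each vDisk (its own iSCSI target/TCP connection) to a random remote stargate."""
    return {vd: random.choice(remote_stargates) for vd in vdisks}

# e.g. an MS SQL VM with several vDisks on a node whose local CVM is down
vdisks  = ["sql-os", "sql-data1", "sql-data2", "sql-logs"]      # hypothetical names
remotes = ["cvm-b:3261", "cvm-c:3261", "cvm-d:3261"]            # hypothetical portals
mapping = fail_over_vdisks(vdisks, remotes)
print(mapping)                    # each vDisk may land on a different CVM
print(Counter(mapping.values()))  # the VM's I/O is spread across multiple controllers
```

Because the choice is made per target rather than per node, no single remote CVM has to absorb all of a failed node's I/O.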

As a result, all VM I/O is load-balanced across the entire Acropolis cluster, which ensures no single CVM becomes a bottleneck and VMs enjoy excellent performance even in a failure or maintenance scenario.

As I’m sure you can now see, Acropolis provides excellent resiliency and performance even during maintenance or failure scenarios.

Related Posts:

1. Scaling Hyper-converged solutions – Compute only.

2. Advanced Storage Performance Monitoring with Nutanix

3. Nutanix – Improving Resiliency of Large Clusters with Erasure Coding (EC-X)

4. Nutanix – Erasure Coding (EC-X) Deep Dive

5. Acropolis: VM High Availability (HA)

6. Acropolis: Scalability

7. NOS & Hypervisor Upgrade Resiliency in PRISM

Posted on 2016-4-3 17:55:36

Thanks for sharing

Posted on 2018-9-28 20:00:26
Thanks for sharing