
@PSUStevens blog

You are reading the blog of @PSUStevens.

No Storage Replication Adapters Installed

In this blog post, I'm going to cover a puzzling VMware Site Recovery Manager error that was more challenging to fix than I expected.

PSUStevens



As part of my job, I regularly demonstrate Pure Storage features and how VMware solutions integrate seamlessly with our storage platform. Before a demo, I usually confirm that the software I intend to use is in good working order, because I can't take that for granted. The demo lab is shared by many of my colleagues, so something may have stopped working, or been left in a wonky state, while they were using it. If I cannot confirm beforehand that all is well, Murphy’s Law can rear its ugly head at the most inconvenient time.

Hello Murphy!

Recently, I was scheduled to demo the integration of Pure Storage replication features with VMware Live Site Recovery (LSR). For those unfamiliar with LSR, it orchestrates failing over VMs to another site, physical or virtual, either for testing or when an actual DR failover is required. Live Site Recovery was previously known as Site Recovery Manager (SRM).

A few days before my scheduled customer meeting, I performed a test failover to confirm there were no issues. On demo day, I logged into the lab and the LSR virtual appliance and clicked the “Test” button to show how the integration works. WHAM! It failed! No matter what I did to save my demo, it wouldn’t work. I apologized profusely to the customer and the account team and sheepishly rescheduled for another day.

Time to Investigate

When the failure occurred, there was a strange error about “No space left on device.” See a snippet of the error below:

Failed to create snapshots of replica devices. 
SRA command 'testFailoverStart' failed. 
Internal error: Unhandled exception. 
System.IO.IOException: No space left on device : '/srm/sra/log/testFailoverStart_2025-02-09-18-04-27-1254563-b6a5c7f8-d576-429e-a363-b0498c6cb701.log' at System.IO.RandomAccess...

So, the following day, I started investigating. None of our storage arrays were anywhere near full, so I could rule them out. I logged into both SRM appliances and noticed this error on SRM02, the “Production Site.”

Unable to find SRA at the paired site

I saw this error on SRM03, the “DR Site.”

No storage replication adapters installed

I logged into SRM03 and began poking around.

I clicked the option to “Rescan Adapters,” but that failed every time. I SSH’d into the appliance and looked for the directory mentioned in the error message, but I couldn’t find it anywhere on the appliance. Next, I logged into the appliance’s administrative interface on port 5480 and tried rescanning the adapter there, again without success. Then, I deleted the SRA and reinstalled it. No change.

I was really annoyed at this point. Next, I suspected the SRA might be having trouble communicating with vCenter, so I changed some passwords on the vCenter appliances. I went back to SRM03 and rescanned the adapter.

Success!!!

I quickly ran a failover test. Failure!!! Ahhh, come on!!

At this point, I had been working on this for several hours over the weekend and decided to take a break and try again the following day with a clear mind.

The Next Day

The next day, I picked up where I left off. This time, though, I did another round of web searches to see if I had missed something in my earlier searches.

That’s when the following link showed up in my search:

VLSR - SRM services stop due to unmanaged SRA log files https://knowledge.broadcom.com/external/article/313050

It turns out this was the answer. Here’s a summary of that article:

When you install a new SRA, it is stored in a directory on a dedicated “support” partition on the SRM appliance. Whenever a failover or failover test runs, the SRA writes its logs into that directory. If the SRA doesn’t clean up after itself, that space can fill up. That isn’t surprising in our demo lab, where I run several test failovers in every demo, so I can see how the logs could pile up faster than the SRA cleans them out.

I SSH’d into SRM03 and ran df -h to confirm:

root@srm [ ~ ]# df -h
Filesystem                      Size  Used Avail Use% Mounted on
devtmpfs                        4.0M     0  4.0M   0% /dev
tmpfs                           4.9G   44K  4.9G   1% /dev/shm
tmpfs                           2.0G  856K  2.0G   1% /run
tmpfs                           4.0M     0  4.0M   0% /sys/fs/cgroup
/dev/sda4                        14G  5.5G  7.4G  43% /
tmpfs                           4.9G  488K  4.9G   1% /tmp
/dev/sda2                       238M   36M  190M  16% /boot
/dev/mapper/support_vg-support  3.9G  555M  3.2G  15% /opt/vmware/support
/dev/loop0                      378M  358M     0 100% /opt/vmware/support/logs/srm/SRAs

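The last line of the output says it all: the 378 MB filesystem mounted at /opt/vmware/support/logs/srm/SRAs (/dev/loop0) was 100% full, even though the support partition itself was only 15% used. I now also knew where the SRA logs were stored on the appliance.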
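To see which SRA directory is consuming that space, a couple of small shell helpers can narrow it down. This is a minimal sketch, assuming the mount point shown in the df output above; the function names sra_fs_usage and sra_dir_sizes are hypothetical, and sort -h assumes GNU coreutils on the appliance.

```shell
#!/usr/bin/env bash
# Sketch: check how full the SRA log filesystem is and which SRA
# directory is using the space. Path taken from the df output above.
SRA_LOG_ROOT="/opt/vmware/support/logs/srm/SRAs"

# Show usage for the filesystem that holds the given directory.
sra_fs_usage() {
  df -h "${1:-$SRA_LOG_ROOT}"
}

# Show the size of each sha256_* SRA directory, smallest to largest.
# If no sha256_* directories exist, du's error is discarded and nothing prints.
sra_dir_sizes() {
  local root="${1:-$SRA_LOG_ROOT}"
  du -sh "$root"/sha256_* 2>/dev/null | sort -h
}
```

After sourcing the file as root, run sra_fs_usage and then sra_dir_sizes; the largest sha256_ directory is listed last.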

The Answer

  1. SSH into the appliance as the admin user.
  2. Once logged in, escalate to the root user with su root.
  3. Change to the /opt/vmware/support/logs/srm/SRAs directory.
  4. You will find at least one directory whose name starts with sha256_ followed by a long string of characters. This is where the current SRA stores its logs. Any other sha256_ directories hold logs from previously installed SRAs.
  5. Go through these directories and clean up old log files. I deleted a ton of old log files, which brought the SRAs filesystem down to 25% used.
  6. Finally, log in to the appliance’s administrative interface and rescan adapters. This should now succeed.
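If you would rather script steps 3 through 5 than delete files by hand, the sketch below is one way to do it. It is an illustration, not the procedure from the Broadcom KB: it assumes the SRA logs end in .log, that logs older than KEEP_DAYS (a hypothetical seven-day default) are no longer needed, and that it runs as root. The function names list_old_sra_logs and prune_old_sra_logs are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of steps 3-5: find and prune old SRA log files on an SRM appliance.
# Assumptions (not from the KB): logs end in ".log", and anything older
# than KEEP_DAYS is safe to delete. Run as root.
SRA_LOG_ROOT="/opt/vmware/support/logs/srm/SRAs"
KEEP_DAYS=7

# Dry run: print .log files anywhere under a sha256_* directory
# that were last modified more than N days ago.
list_old_sra_logs() {
  local root="${1:-$SRA_LOG_ROOT}" days="${2:-$KEEP_DAYS}"
  find "$root" -path "$root/sha256_*" -type f -name '*.log' -mtime +"$days" -print
}

# Same match as the dry run, but print each file and then delete it.
prune_old_sra_logs() {
  local root="${1:-$SRA_LOG_ROOT}" days="${2:-$KEEP_DAYS}"
  find "$root" -path "$root/sha256_*" -type f -name '*.log' -mtime +"$days" -print -delete
}
```

Running list_old_sra_logs first and reviewing its output before calling prune_old_sra_logs keeps the deletion deliberate; afterwards, rescan adapters as in step 6.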

I confirmed SRM02 could see the SRA from SRM03 and vice versa. I ran a test failover, which worked as expected. Problem solved!


Additional Info

The KB article I found suggests implementing a cron job to ensure the logs are cleaned up regularly. If this were a production environment, I would have implemented that suggestion. I decided not to because I will be installing a newer version of SRM, err, Live Site Recovery (LSR), in the near future.
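For a production appliance, a root crontab entry in the spirit of the KB’s recommendation might look like the fragment below. This is a hypothetical sketch, not the job from the Broadcom article: the 03:00 schedule, the seven-day retention, and the *.log filter are assumptions, and it presumes cron is available to root on the appliance. Check the KB before relying on it.

```
# Hypothetical root crontab entry (add with `crontab -e` as root).
# Daily at 03:00: delete SRA .log files older than 7 days under sha256_* dirs.
0 3 * * * find /opt/vmware/support/logs/srm/SRAs -path '/opt/vmware/support/logs/srm/SRAs/sha256_*' -type f -name '*.log' -mtime +7 -delete
```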


Summary

In this post, I covered an annoying SRM issue whose fix wasn’t obvious, and the steps I took to resolve it. Hopefully, you found this post helpful.

Thanks for following along.
