Background
When you’re running your own hardware you need to carefully consider disk redundancy. Apart from healthy current backups, disk redundancy is one of the most important things you’ll ever do when running a server setup.
This article introduces some concepts around RAID, which is what people use for disk redundancy. It focuses specifically on the Broadcom LSI MegaRAID controllers, which seem to be universal in data centres for owned / bare metal hardware. Finally, the article moves on to help you actively monitor RAID. Having RAID set up doesn’t help if your disks are failing in the data centre and you’re blissfully unaware. When a disk in the array fails, the busy administrator needs to be acutely aware that they must go and change the disk. This requires a lot of insight into which disk is where and what its status is.
Why did we write this article? There are many concepts, and finding good and pertinent information on Google is a mess. Not only does Broadcom have poor documentation, but other articles on the internet are overwhelming with technical jargon and lingo, whereas the busy administrator just wants to skip to “getting it to work”.
RAID Levels
Let’s start with an important point here, namely a quick summary of RAID levels:
RAID Level | Minimum Number of Disks | Use | Comment |
---|---|---|---|
RAID 0 | 2 | Performance (striping) | No redundancy |
RAID 1 | 2 | Redundancy | Mirrored |
RAID 5 | 3 | Performance + Redundancy | Most common because of its blend of benefits. One disk can fail. |
RAID 6 | 4 | Performance + Redundancy | Two disks can fail at the same time. Clearly more expensive. Most reliable if you really think two disks can fail at once. |
RAID 10 | 4 | Performance + Redundancy | Combo of RAID 0 and 1. One disk can fail, and more if the failures land in different mirror pairs. |
The most common RAID level in our opinion is RAID 5, because it gives up the least disk capacity (one disk’s worth) while still offering redundancy and performance benefits. Please comment if you have an opinion about this.
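To make the capacity trade-off concrete, here is a minimal sketch of the arithmetic behind the table. It assumes all disks in the array are the same size and ignores controller overhead; the function name `usable_disks` is ours, not part of any tool.

```shell
# usable_disks LEVEL N
# Prints how many disks' worth of capacity is usable in an array of
# N equal-sized disks at the given RAID level.
usable_disks() {
    level=$1
    n=$2
    case "$level" in
        0)  echo "$n" ;;            # striping only, nothing lost
        1)  echo 1 ;;               # every disk holds the same data
        5)  echo $((n - 1)) ;;      # one disk's worth of parity
        6)  echo $((n - 2)) ;;      # two disks' worth of parity
        10) echo $((n / 2)) ;;      # striped mirror pairs
        *)  echo "unknown RAID level: $level" >&2; return 1 ;;
    esac
}
```

With four 4 TB disks, `usable_disks 5 4` prints 3 (12 TB usable), while `usable_disks 6 4` and `usable_disks 10 4` both print 2 (8 TB usable).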
Checking your system for RAID
Let’s move on to a few commands that allow you to check which RAID controller is installed.
Here we are checking four systems named P, 7, M, and W.
# lspci | grep -i raid
02:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS 2108 [Liberator] (rev 05)

# lspci | grep -i raid
17:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02)

# lspci | grep -i raid
03:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02)

# lspci | grep -i raid
03:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS 2108 [Liberator] (rev 05)
For reference, all four of these servers were ordered across two different data centres over a long period of time.
From this output we can learn a few things:
- All cards are Broadcom LSI MegaRAID products
- Two are SAS 2108 “Liberator” models and two are SAS-3 3108 “Invader” models
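If you have more than a handful of servers, reading the lspci line by eye gets tedious. The following is a minimal sketch that pulls the PCI address, chip and codename out of lines shaped like the ones above. The function name `megaraid_models` is hypothetical, and the pattern assumes the exact "MegaRAID <chip> [<codename>]" layout shown in this article.

```shell
# Reads `lspci` output on stdin and prints "<PCI address> <chip> <codename>"
# for every MegaRAID controller line it finds. Other lines are ignored.
megaraid_models() {
    sed -n 's/^\([^ ]*\) RAID bus controller: .*MegaRAID \(.*\) \[\(.*\)].*$/\1 \2 \3/p'
}
```

On a server you would run `lspci | megaraid_models`. For the first system above it prints `02:00.0 SAS 2108 Liberator`.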
Communicating with the RAID cards
Next you will want to use a hardware utility to interrogate the drives. This is where it gets very confusing. The utility is called “MegaCLI” and finding the current version can be a nightmare. To make it easy, here is the link:
https://www.broadcom.com/support/download-search?dk=megacli
As of 06 Apr 2023 this search produces no fewer than 62 results. You have to expand the Management Software and Tools section as per the screenshot below:
Yep. I made a square around the date. Imagine, it’s 2023 and you’re working on your mission critical hardware but you have to download a utility dating from 2014. This is really true, have a big swig of coffee.
Next, Broadcom have made it really difficult to download: you can’t just click the link, you have to accept terms first. So if you just want to wget it from a server, you can’t. Thanks Broadcom.
Then you have to decompress the files.
From there the path for RPM-based Linux distributions is easy, but Ubuntu is more complicated. The package only ships as an RPM, so you first convert it to a .deb with alien:
sudo alien -k --scripts MegaCli-8.07.14-1.noarch.rpm
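The conversion and the install that follows it can be wrapped in one small function. This is a minimal sketch, not a tested installer: the function name is hypothetical, it assumes a Debian/Ubuntu system with sudo and apt-get, and it assumes alien names its output so that it matches `megacli_*.deb` in the current directory.

```shell
# Hypothetical helper: convert the MegaCli RPM to a .deb and install it.
# Assumes a Debian/Ubuntu system with apt-get and sudo.
install_megacli_on_ubuntu() {
    rpm_file=$1

    # alien is not installed by default on Ubuntu.
    command -v alien >/dev/null 2>&1 || sudo apt-get install -y alien || return 1

    # -k keeps the RPM's version number; --scripts carries over the
    # package's install scripts. alien writes the .deb to the current directory.
    sudo alien -k --scripts "$rpm_file" || return 1

    # Assumption: the generated file matches megacli_*.deb.
    sudo dpkg -i megacli_*.deb
}
```

Run it from the directory holding the decompressed download, for example `install_megacli_on_ubuntu MegaCli-8.07.14-1.noarch.rpm`.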
Potential Errors
On some systems you might encounter some of these errors:
# /opt/MegaRAID/MegaCli/MegaCli64
/opt/MegaRAID/MegaCli/MegaCli64: error while loading shared libraries: libncurses.so.5: cannot open shared object file: No such file or directory

It appears that MegaCli from 2014 is linked against ncurses 5, whereas this system ships ncurses 6:

# rpm -qi ncurses
Name : ncurses
Version : 6.1
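Rather than discovering missing libraries one error at a time, you can ask the loader for the full list up front. The sketch below filters `ldd` output for entries marked "not found"; the function name `missing_libs` is ours. The package names in the comments are assumptions based on common distribution repositories and may differ or be absent on your release.

```shell
# Reads `ldd` output on stdin and prints each library the loader cannot find.
missing_libs() {
    awk '/not found/ { print $1 }'
}

# Typical use on the server (not run here):
#   ldd /opt/MegaRAID/MegaCli/MegaCli64 | missing_libs
#
# Possible fixes, package names assumed from common distro repositories:
#   CentOS / RHEL 8:  dnf install ncurses-compat-libs
#   Ubuntu:           apt-get install libncurses5
```

If the function prints nothing, every shared library the binary needs was resolved.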
CentOS RPM instructions
# rpm -Uvh MegaCli-8.07.14-1.noarch.rpm
Verifying... ################################# [100%]
Preparing... ################################# [100%]
package MegaCli-8.07.14-1.noarch is already installed
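The "already installed" message above is harmless, but if you script the install across several servers you may prefer to check first. A minimal sketch, assuming the package name is MegaCli as reported in the rpm output above; the function name is hypothetical.

```shell
# Hypothetical helper: install the MegaCli RPM only when the package is missing.
ensure_megacli_installed() {
    rpm_file=$1
    if rpm -q MegaCli >/dev/null 2>&1; then
        # rpm -q exits 0 when the package is present.
        echo "MegaCli is already installed"
    else
        rpm -Uvh "$rpm_file"
    fi
}
```

Call it as root with the path to the package, for example `ensure_megacli_installed MegaCli-8.07.14-1.noarch.rpm`.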
Reference
https://phoenixnap.com/kb/how-to-set-up-hardware-raid-megacli