Recently a Western Digital NAS Red 3TB HDD failed in my server. The server's HDDs are connected through a MegaRAID SAS 9271-8i hardware RAID controller, and I had a spare 3TB HDD standing by. But when I tried to mark the failed HDD for removal, the controller kept automatically going into rebuild mode, in both WebBIOS and the CLI. It was annoying. It turned out that LSI Auto-Rebuild was enabled. Below are the MegaCli commands I used to replace the failed HDD.
(The failed HDD isn’t even a year old. I purchased it in May 2017.)
Turn off LSI Auto-Rebuild first.
/opt/MegaRAID/MegaCli/MegaCli64 -AdpAutoRbld -Dsbl -a0
See which RAID array has the failed HDD.
/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -a0
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :RAID10
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 8.185 TB
Sector Size : 512
Is VD emulated : Yes
Mirror Data : 8.185 TB
State : Degraded
Strip Size : 256 KB
Number Of Drives per span:2
Span Depth : 3
Default Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Enabled
Encryption Type : None
Bad Blocks Exist: No
PI type: No PI
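With several virtual drives the full listing gets long, so here is a minimal sketch that boils it down to one line per virtual drive. The helper name ld_states is my own invention, not part of MegaCli, and it assumes the output keeps the "Virtual Drive:" and "State" lines in the form shown above.

```shell
#!/bin/sh
# Path taken from the commands in this post.
MEGACLI=/opt/MegaRAID/MegaCli/MegaCli64

# Hypothetical helper (not part of MegaCli): reads -LDInfo output on stdin
# and prints "Virtual Drive <n>: <state>" for each virtual drive.
# Assumes the "Virtual Drive:" and "State" lines look like the output above.
ld_states() {
    awk -F: '
        /^Virtual Drive/ { split($2, a, " "); vd = a[1] }
        /^State/         { gsub(/^[ \t]+/, "", $2); print "Virtual Drive " vd ": " $2 }
    '
}

# Usage, on the server itself:
# "$MEGACLI" -LDInfo -Lall -a0 | ld_states
```

Any virtual drive that is not reported as Optimal is the one to look at.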
Locate the failed HDD.
/opt/MegaRAID/MegaCli/MegaCli64 -PDList -a0
In my case it was located in [252:2] (enclosure 252, slot 2).
Enclosure Device ID: 252
Slot Number: 2
Drive's position: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: N/A
Device Id: 9
WWN: *****
Sequence Number: 3
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA
Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Sector Size: 512
Logical Sector Size: 512
Physical Sector Size: 4096
Firmware state: Failed
Commissioned Spare : No
Emergency Spare : No
Device Firmware Level: 0A82
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x4433221101000000
Connected Port Number: 1(path0)
Inquiry Data: WD-*****WDC WD30EFRX-68EUZN0 82.00A82
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified
Drive Temperature : N/A
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
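-PDList prints a block like this for every drive on the adapter, so finding the failed one by eye is tedious. A minimal sketch that reduces each drive to its [enclosure:slot] address and firmware state follows. The helper name pd_states is hypothetical, and it assumes the three field names appear exactly as in the output above.

```shell
#!/bin/sh
# Path taken from the commands in this post.
MEGACLI=/opt/MegaRAID/MegaCli/MegaCli64

# Hypothetical helper (not part of MegaCli): reads -PDList output on stdin
# and prints "[enclosure:slot] <firmware state>" for each physical drive.
# Assumes the field names match the output shown above.
pd_states() {
    awk -F': *' '
        /^Enclosure Device ID/ { enc = $2 }
        /^Slot Number/         { slot = $2 }
        /^Firmware state/      { print "[" enc ":" slot "] " $2 }
    '
}

# Usage, on the server itself:
# "$MEGACLI" -PDList -a0 | pd_states
```

The address in brackets is the value the -PhysDrv option expects.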
Set the HDD offline.
/opt/MegaRAID/MegaCli/MegaCli64 -PDOffline -PhysDrv [252:2] -a0
Mark the HDD missing.
/opt/MegaRAID/MegaCli/MegaCli64 -PDMarkMissing -PhysDrv [252:2] -a0
Prepare the missing HDD for removal.
/opt/MegaRAID/MegaCli/MegaCli64 -PDPrpRmv -PhysDrv [252:2] -a0
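The offline, mark-missing, and prepare-for-removal steps always run in that order against the same drive, so they can be wrapped in one small function. This is only a sketch: prepare_removal is a name I made up, and it assumes MegaCli reports a failed step through a non-zero exit status, which is worth confirming on your own system before relying on it.

```shell
#!/bin/sh
# Path taken from the commands in this post; override MEGACLI to test.
MEGACLI=${MEGACLI:-/opt/MegaRAID/MegaCli/MegaCli64}

# Hypothetical helper: runs the three steps in order for one drive and
# stops at the first step that fails.
# Assumption: MegaCli returns a non-zero exit status on failure.
#   $1 = drive address as enclosure:slot, e.g. 252:2
#   $2 = adapter number (defaults to 0)
prepare_removal() {
    drive=$1
    adapter=${2:-0}
    for action in -PDOffline -PDMarkMissing -PDPrpRmv; do
        "$MEGACLI" "$action" -PhysDrv "[$drive]" "-a$adapter" || return 1
    done
}

# Usage, on the server itself:
# prepare_removal 252:2
```

Quoting the bracketed address also keeps the shell from treating [252:2] as a filename pattern.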
Shut down the server and replace the failed HDD.
Turn LSI Auto-Rebuild back on.
/opt/MegaRAID/MegaCli/MegaCli64 -AdpAutoRbld -Enbl -a0
The RAID controller should start the rebuilding process. You can monitor the progress. I used ‘watch’ to refresh status every five seconds.
watch -n 5 "/opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -physdrv[252:2] -a0"
Once I pulled the failed HDD, I connected it to an external enclosure, hooked it up to my desktop, and ran SMART tests. It did indeed have failures.
Too bad I have to pay for shipping to RMA it with Western Digital. I just didn’t expect a NAS drive to fail so quickly.
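If you want to run the same kind of check, here is a minimal sketch using smartctl from smartmontools. The tool choice is an assumption. /dev/sdX is a placeholder for whatever device name the enclosure gets on your machine, and the -d sat option assumes a USB enclosure that passes SATA commands through, which not every enclosure does.

```shell
#!/bin/sh
# Assumption: smartmontools is installed; override SMARTCTL to test.
SMARTCTL=${SMARTCTL:-smartctl}

# Start a long (extended) self-test on the given device.
# -d sat assumes a USB enclosure with SATA pass-through.
start_long_test() {
    "$SMARTCTL" -d sat -t long "$1"
}

# Show overall health, the attribute table, and the self-test log.
show_results() {
    "$SMARTCTL" -d sat -H -A -l selftest "$1"
}

# Usage (replace /dev/sdX with the real device, run as root):
# start_long_test /dev/sdX
# show_results /dev/sdX    # after the test has had time to finish
```

The self-test runs on the drive itself, so the results only show up in the log once it completes.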