LSI MegaCli replace failed drive

Recently a Western Digital Red NAS 3TB HDD failed in my server. The server's HDDs are connected to a MegaRAID SAS 9271-8i hardware RAID controller, and I had a spare 3TB HDD standing by. But whenever I tried to prepare the failed HDD for removal, it kept going into rebuild mode automatically. This happened in both WebBIOS and the CLI, which was annoying. It turned out that LSI Auto-Rebuild was enabled. Below are the MegaCli commands I used to replace the failed HDD.

(This HDD isn’t even a year old; I purchased it in May 2017.)

Turn off LSI Auto-Rebuild first.

/opt/MegaRAID/MegaCli/MegaCli64 -AdpAutoRbld -Dsbl -a0
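
To confirm the change took effect, MegaCli can also display the current Auto-Rebuild setting with -AdpAutoRbld -Dsply. The wrapper below is my own sketch, not part of MegaCli: set_auto_rebuild and the MEGACLI variable (set MEGACLI=echo for a dry run that only prints the commands) are hypothetical names, and it assumes adapter 0 unless you pass another number.

```shell
# Hypothetical helper: toggle LSI Auto-Rebuild, then display the setting.
# MEGACLI defaults to the path used in this post; override it for a dry run.
MEGACLI="${MEGACLI:-/opt/MegaRAID/MegaCli/MegaCli64}"

set_auto_rebuild() {
  local mode="$1" adapter="${2:-0}" flag
  case "$mode" in
    on)  flag="-Enbl" ;;
    off) flag="-Dsbl" ;;
    *)   echo "usage: set_auto_rebuild on|off [adapter]" >&2; return 2 ;;
  esac
  # Change the setting first; only display it if the change succeeded.
  "$MEGACLI" -AdpAutoRbld "$flag" "-a$adapter" &&
    "$MEGACLI" -AdpAutoRbld -Dsply "-a$adapter"
}
```

Run set_auto_rebuild off before replacing the drive and set_auto_rebuild on afterwards.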

Check which RAID array contains the failed HDD. In my case the RAID10 virtual drive reported State : Degraded.

/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -a0
Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name :RAID10
RAID Level : Primary-1, Secondary-0, RAID Level Qualifier-0
Size : 8.185 TB
Sector Size : 512
Is VD emulated : Yes
Mirror Data : 8.185 TB
State : Degraded
Strip Size : 256 KB
Number Of Drives per span:2
Span Depth : 3
Default Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Current Cache Policy: WriteBack, ReadAhead, Cached, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy : Enabled
Encryption Type : None
Bad Blocks Exist: No
PI type: No PI
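
With several virtual drives, scanning this output by eye gets tedious. The awk filter below is a sketch that assumes the "Virtual Drive:" and "State :" line layout shown above; find_degraded_vds is a hypothetical helper name. Fed the output above, it would print VD 0: Degraded.

```shell
# Hypothetical helper: read `MegaCli64 -LDInfo -Lall -a0` output on stdin and
# print every virtual drive whose State is anything other than Optimal.
find_degraded_vds() {
  awk -F': *' '
    { sub(/\r$/, "") }                                 # tolerate CRLF output
    /^Virtual Drive:/ { split($2, a, " "); vd = a[1] } # "0 (Target Id" -> "0"
    /^State *:/ {
      state = $2
      sub(/[ \t]+$/, "", state)
      if (state != "Optimal") print "VD " vd ": " state
    }
  '
}
```

Usage: /opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -a0 | find_degraded_vds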

Locate the failed HDD.

/opt/MegaRAID/MegaCli/MegaCli64 -PDList -a0

In my case it was at [252:2] (Enclosure Device ID 252, Slot Number 2), showing Firmware state: Failed.

Enclosure Device ID: 252
Slot Number: 2
Drive's position: DiskGroup: 0, Span: 0, Arm: 0
Enclosure position: N/A
Device Id: 9
WWN: *****
Sequence Number: 3
Media Error Count: 0
Other Error Count: 0
Predictive Failure Count: 0
Last Predictive Failure Event Seq Number: 0
PD Type: SATA

Raw Size: 2.728 TB [0x15d50a3b0 Sectors]
Non Coerced Size: 2.728 TB [0x15d40a3b0 Sectors]
Coerced Size: 2.728 TB [0x15d400000 Sectors]
Sector Size: 512
Logical Sector Size: 512
Physical Sector Size: 4096
Firmware state: Failed
Commissioned Spare : No
Emergency Spare : No
Device Firmware Level: 0A82
Shield Counter: 0
Successful diagnostics completion on : N/A
SAS Address(0): 0x4433221101000000
Connected Port Number: 1(path0)
Inquiry Data: WD-*****WDC WD30EFRX-68EUZN0 82.00A82
FDE Capable: Not Capable
FDE Enable: Disable
Secured: Unsecured
Locked: Unlocked
Needs EKM Attention: No
Foreign State: None
Device Speed: 6.0Gb/s
Link Speed: 6.0Gb/s
Media Type: Hard Disk Device
Drive: Not Certified
Drive Temperature : N/A
PI Eligibility: No
Drive is formatted for PI information: No
PI: No PI
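
-PDList prints an entry like this for every drive, so finding the failed one means scrolling. This awk sketch assumes the field names shown above (Enclosure Device ID, Slot Number, Device Id, Firmware state) and prints the [enclosure:slot] and Device Id of each drive whose state begins with Failed; find_failed_pds is a hypothetical name. For the entry above it would print [252:2] device id 9.

```shell
# Hypothetical helper: read `MegaCli64 -PDList -a0` output on stdin and print
# "[enclosure:slot] device id N" for every drive in a Failed firmware state.
find_failed_pds() {
  awk -F': *' '
    { sub(/\r$/, "") }                       # tolerate CRLF output
    /^Enclosure Device ID:/ { enc = $2 }
    /^Slot Number:/         { slot = $2 }
    /^Device Id:/           { devid = $2 }
    /^Firmware state:/ && $2 ~ /^Failed/ {
      printf "[%s:%s] device id %s\n", enc, slot, devid
    }
  '
}
```

Usage: /opt/MegaRAID/MegaCli/MegaCli64 -PDList -a0 | find_failed_pds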

Set the HDD offline.

/opt/MegaRAID/MegaCli/MegaCli64 -PDOffline -PhysDrv [252:2] -a0

Mark the HDD missing.

/opt/MegaRAID/MegaCli/MegaCli64 -PDMarkMissing -PhysDrv [252:2] -a0

Prepare the missing HDD for removal.

/opt/MegaRAID/MegaCli/MegaCli64 -PDPrpRmv -PhysDrv [252:2] -a0
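
The three steps above always run in the same order against the same [enclosure:slot], so they can be wrapped in one function that stops at the first failing step. This is a hypothetical helper (prepare_for_removal, with the same MEGACLI dry-run override), sketched under the assumption of adapter 0 by default. It quotes the drive argument because an unquoted [252:2] is also a shell glob pattern and could be expanded if a matching file name exists in the current directory.

```shell
# Hypothetical helper: offline -> mark missing -> prepare for removal.
MEGACLI="${MEGACLI:-/opt/MegaRAID/MegaCli/MegaCli64}"

prepare_for_removal() {
  local drive="$1" adapter="${2:-0}" step
  # Expect the [enclosure:slot] form used by MegaCli, e.g. "[252:2]".
  case "$drive" in
    \[*:*\]) ;;
    *) echo "usage: prepare_for_removal '[E:S]' [adapter]" >&2; return 2 ;;
  esac
  for step in -PDOffline -PDMarkMissing -PDPrpRmv; do
    "$MEGACLI" "$step" -PhysDrv "$drive" "-a$adapter" || {
      echo "step $step failed for $drive" >&2
      return 1
    }
  done
}
```

Usage: prepare_for_removal "[252:2]"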

Shut down the server and replace the failed HDD.

Turn LSI Auto-Rebuild back on.

/opt/MegaRAID/MegaCli/MegaCli64 -AdpAutoRbld -Enbl -a0
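
If the rebuild does not start on its own, MegaCli can be told explicitly which missing position the new drive fills: -PdGetMissing lists the missing array/row positions, -PdReplaceMissing assigns the new drive to one of them, and -PDRbld -Start begins the rebuild. This step is not part of the procedure above, so treat it as an untested sketch; manual_rebuild is a hypothetical name, and the array and row values must come from your own -PdGetMissing output.

```shell
# Hypothetical helper: put a replacement drive into a missing array/row slot
# and start the rebuild. ARRAY and ROW must be read from -PdGetMissing first.
MEGACLI="${MEGACLI:-/opt/MegaRAID/MegaCli/MegaCli64}"

manual_rebuild() {
  local drive="$1" array="$2" row="$3" adapter="${4:-0}"
  if [ -z "$drive" ] || [ -z "$array" ] || [ -z "$row" ]; then
    echo "usage: manual_rebuild '[E:S]' ARRAY ROW [adapter]" >&2
    return 2
  fi
  "$MEGACLI" -PdReplaceMissing -PhysDrv "$drive" "-Array$array" "-row$row" "-a$adapter" &&
    "$MEGACLI" -PDRbld -Start -PhysDrv "$drive" "-a$adapter"
}
```

Usage: run /opt/MegaRAID/MegaCli/MegaCli64 -PdGetMissing -a0, then manual_rebuild "[252:2]" <array> <row> with the values it reports.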

The RAID controller should then start rebuilding onto the new HDD. To monitor the progress, I used ‘watch’ to refresh the status every five seconds.

watch -n 5 "/opt/MegaRAID/MegaCli/MegaCli64 -PDRbld -ShowProg -PhysDrv [252:2] -a0"
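
watch is fine for keeping an eye on things, but if you would rather have a script that returns once the rebuild is over, you can poll the drive's Firmware state instead. This sketch assumes -PDInfo prints the same "Firmware state:" line as the -PDList output above, and that a drive which finished rebuilding reports a state beginning with Online; firmware_state and wait_for_rebuild are hypothetical names.

```shell
MEGACLI="${MEGACLI:-/opt/MegaRAID/MegaCli/MegaCli64}"

# Hypothetical helper: print the "Firmware state" value from -PDInfo/-PDList
# output on stdin (first match only).
firmware_state() {
  awk -F': *' '/^Firmware state:/ { sub(/\r$/, "", $2); print $2; exit }'
}

# Hypothetical helper: poll every INTERVAL seconds until the drive reports
# Online (returns 0) or Failed (returns 1); returns 2 if no state is found.
wait_for_rebuild() {
  local drive="$1" adapter="${2:-0}" interval="${3:-60}" state
  while :; do
    state=$("$MEGACLI" -PDInfo -PhysDrv "$drive" "-a$adapter" | firmware_state)
    [ -n "$state" ] || { echo "could not read state of $drive" >&2; return 2; }
    echo "$(date '+%F %T') $drive: $state"
    case "$state" in
      Online*) return 0 ;;
      Failed*) return 1 ;;
    esac
    sleep "$interval"
  done
}
```

Usage: wait_for_rebuild "[252:2]" 0 60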

Once I pulled the failed HDD, I connected it to an external enclosure, hooked it up to my desktop, and ran SMART tests. It did indeed have failures.
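
Pulling the drive isn't the only way to read its SMART data. smartmontools can usually address a disk behind a MegaRAID controller with its -d megaraid,N device type, where N is the Device Id from -PDList (9 in the output above) and the block device is one the controller exposes; /dev/sda in the usage line is an assumption. For SATA drives some setups need -d sat+megaraid,N instead. megaraid_smart is a hypothetical wrapper, with SMARTCTL=echo as a dry-run override.

```shell
# Hypothetical helper: run smartctl against a disk behind a MegaRAID controller.
# Arguments: DEVICE_ID (from -PDList), BLOCKDEV, then any smartctl options.
SMARTCTL="${SMARTCTL:-smartctl}"

megaraid_smart() {
  if [ "$#" -lt 2 ]; then
    echo "usage: megaraid_smart DEVICE_ID BLOCKDEV [smartctl options...]" >&2
    return 2
  fi
  local device_id="$1" blockdev="$2"
  shift 2
  "$SMARTCTL" "$@" -d "megaraid,$device_id" "$blockdev"
}
```

Usage: megaraid_smart 9 /dev/sda -a for a full report, or megaraid_smart 9 /dev/sda -t long to start a long self-test.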

Too bad I have to pay for shipping to RMA it with Western Digital. I just didn’t expect a NAS drive to fail so quickly.
