Wednesday, March 30, 2005

Drew Curtis' FARK.com - Fark farked. Again.


Here's the scoop:

The Adaptec 2110S RAID card in Fark's database server is dying.
Now the machine won't even POST half the time, or when it does
run, locks up during the SCSI BIOS init, or crashes often with
filesystem errors.

Based on some experience with one at work, and some info on Google
suggesting it was upward compatible, we sent an Adaptec 2120S
to replace it. It looks like the upward compatible thing was
a bunch of crap though, and it won't see the old array -- so
we're going to have to reformat and reinstall everything. (Or
rather Servint's going to do it for me based on my partitioning
directions, since I can't make the 9 hour drive anytime in the
next few weeks...)

I was hoping for Saturday afternoon, but that's not going to happen.
We are now scheduled for Tuesday, so expect some downtime.

---

Several more attempts were made to get the 2120S to see the array.
Not having a copy of the bootable CD that might do it, we just decided
to wipe and reformat and reinstall everything (after taking three
final backups). That seemed to go fine, until the card started
throwing tons of SCSI write errors and timeouts while restoring the
backup, hinting at a bad drive, a bad cable, or bad drive firmware.
After failing to find newer drive firmware from Fujitsu, we decided
to try replacing ALL the drives. It's taking a few hours to get the
array initialized, then the OS will be reinstalled and the backup
restored on top of it.

...and now that that's done, we've hit another hard drive firmware
issue, this time one with a known fix -- we're waiting on Seagate
to send the patch now. No write errors this time, just timeouts.

OK. Drives are patched. We're still hitting timeouts but it's
always when restoring the exact same file, consistently -- which
makes no sense at all (it restores fine on another machine from
the same source archive). Anyway, we skipped that non-critical
file and are restoring everything else now -- the OS is restored,
the database about 10% in as of 5:15 pm ET. With luck we'll be
up around midnight.

We may be up sooner than that; the database is importing faster
than expected (finally something goes right for once). But it
may run slow because connectivity is a bit slower in their
test lab for some reason, maybe an Ethernet duplex mismatch or
something.

---

Mike



Too many words. Fark isn't completely down, photoshops y boobies still abound.

(Check me out, that was slick...)