The boring life of Jerod Poore, Crazymeds' Chief Citizen Medical Expert.

Registration Problems & Expected Service Interruptions

1) There may be a problem with reCaptcha, the zen-like word pairs used in the registration process to help prevent spambots. While I have received only a couple of e-mails from people having a problem with reCaptcha, new registration is significantly down. I have no idea if any problem with reCaptcha is related to the problems I'm having with SMTP and other TCP services. If you are unable to register and did the usual stuff (made sure cookies and java are enabled, cleared your cache, etc.), just drop me a line and I'll manually register you. The link to contact me at sign-in / registration won't work until I can fix the problems the forum is still having with e-mail.

Then again the reCaptcha program could have turned mean and/or crazy. It happens.

2) Dil from tech support got back to me about fixing the monumental fuck-up I did Thursday night. It's going to take a few serious changes to the account. There will be at least one reboot of the system on top of the two (or more) that already happened since Friday evening. Expect an indeterminate number of interruptions between now and Sunday lasting a few minutes. As HTTP was never involved with the problem I hope it stays that way.

Edit as of 4:15 Mountain time: Vlad found the idiotic thing I did and it was very simple to fix. The site will be available until the next majorly stupid thing I do.

I am so the master of fucking things up


One knack from my past life as an Information Systems Technical Wizard that I haven't lost is my ability to fuck things up in new and exciting ways. Some twenty years ago when that was my actual job title (see card) I managed to find a way to cripple a System/38 that IBM thought was impossible. Lucky for them we were just across the street so they could come over and see it with their own eyes. It took about three days working around the clock to fix it.

I was the master of the worst case scenario. If not causing them, then at least imagining them. It's too bad I totally flipped out in Melbourne because I was coming up with stuff for insurance software that either would have been one of those "hindsight is 20-20" deals if I hadn't thought of it beforehand, or an event with the parameters I foresaw came to pass during the design & prototyping phase.

This time around I managed to disable FTP and telnet while trying to get fucking SMTP to work. That takes talent. Especially since I wasn't intentionally doing anything with FTP, telnet or any of the ports associated with them. OK, I was doing stuff with port 25, and one can telnet to port 25, but I don't normally telnet to port 25. The only TCP stuff I fooled around with was, to my knowledge, having to do with SMTP.

Gaaack!

I don't even know if the last change I made is what hosed it, as I had to reboot my PC because of an unrelated problem. Any of the numerous changes I made to xinetd- and SMTP-related configuration files and scripts could have hosed it.

Fortunately HTTP and MySQL are unaffected, which means all the Crazy Meds stuff is unaffected.

This all happened yesterday. I was too fried to do anything about it then. I opened a trouble ticket for an ID-ten-T keyboard issue this morning. I hope it's resolved soon.

Down, Not Across: the Sequel

Whatever the fuck is wrong with the e-mail on Crazy Meds' new server, it is making me way crazier. Can't think straight after only five hours of working on it crazier. It is really depressing. Before I was crippled by my brain cooties I would be able to stay at work until whatever wouldn't work got fixed. Sometimes that meant staying until two or three in the morning, after getting to work between six and seven the previous day. I'd arrive so early that I always had to sign in with the guard who was at the end of his shift, so when he saw me signing out after midnight he knew something had gone thoroughly ass over tits.

I don't give a damn that it's been 17 years since I last did admin work on a small *nix system. I'm still pissed that I can't make this work.

Fucking SMTP. Simple my ass.

It refuses to listen. It might have problems after that, but that's the current problem. It just won't fucking listen to the port it's supposed to. I can't telnet to it from the local host and mail is going nowhere from either the command line or from the forum.
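For anyone who wants to run the same "is anything listening?" test without telnet, here is a minimal sketch in Python. It only checks whether a TCP connection to a host and port succeeds; it says nothing about whether Qmail would then behave. The function name port_is_listening is made up for illustration, and 127.0.0.1 port 25 is simply the local SMTP port described above.

```python
import socket


def port_is_listening(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical helper: roughly what 'telnet host port' tells you,
    minus the conversation afterwards.
    """
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Port 25 on the local host is the case described in the post.
    print("SMTP listening on localhost:", port_is_listening("127.0.0.1", 25))
```

A False here matches the symptom above: nothing is accepting connections on the port, so the problem comes before any SMTP conversation even starts.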

Asking the experts on teh InterTubes isn't going to do me much good. I have a rather primitive form of Qmail. It's 1.03 (or whatever the latest version is), but I don't have any of the stuff that apparently makes everyone's lives easier, like some updated TCP software. So after biting the bullet and looking to install all of that crap, the install fails.

Why?

Because I don't have a fucking compiler.

I don't even have the fucking exec commands.

It's about seven hours since I started working on it today. I can't think straight. I haven't been able to think straight for at least an hour. Eight years ago I could go 18 hours without a break before I'd start getting loopy. I could do that for ten days in a row. Three days of five hours of banging my head against the wall is severely messing with me.

Did I mention having to have the septic tank pumped along with some other plumbing work done because the basement got filled up with backwash from the tank? I'm glad there was nothing in there I wanted to keep. I still have to keep my eye on it. I'm lucky the plumber hasn't moved, retired or died. He's worked on this lemon of a house since it was built, and it still manages to surprise him. He's never before had to use the third roll of snaking coil for an interior application. He still hasn't, but he was just about to when the clog was finally dealt with. Although that could have been only the main clog. Who the fuck knows what is up with the maze of PVC, galvanized steel and token copper pipes that gives the screen saver a run for its money.

Just what we need, more down time.

Actually it's not too bad. Expect one or more of what I hope will be brief interruptions in the availability of the forum and even the entire site today (Friday, 17 July) and/or Sunday (19 July). This is to deal with the response time being so freaking long.

I'm still working on the e-mail notifications. Fortunately that should have no impact on forum and site availability.

What Fresh Hell Is This?

I know. Page load error. Connection interrupted.

The entire domain host is down.

A traceroute to the machine directly upstream from them times out. Because of all the wackiness last night I have a whole bunch of traceroutes. The San Diego colo for americanis.net is immediately upstream from the domain host I use. A current traceroute to the machine immediately before my domain host times out. A traceroute to the americanis.net box upstream from that one is successful, albeit taking another route. Traceroutes to crazymeds.us, the Crazy Meds' IP address, and various locations within my domain host all time out at different locations after taking different routes each.

If anyone knows what's happening with americanis.net, that might be the answer to the current problem.

I'm not going to bother opening a trouble ticket with this one. When their own pages aren't displaying I think they're aware there's a problem.

Weirdest. Domain Name Problem. Ever.

Or maybe not. I think it's pretty whiskey tango foxtrot.

The first thing the domain host tech support people want to eliminate is any ID-ten-T at the keyboard problem, with emphasis on eliminate, as they can reach the site using http://crazymeds.us. I've done enough tech support myself to go along with them. Cleared the cookies, browser & DNS caches, browser history, zapped all of the temporary files (which I do on a regular basis anyway), tried multiple browsers, etc.

Nope. Just as I wrote in the trouble ticket. I could reach the site only via the IP address, some people could reach http://crazymeds.us but not http://www.crazymeds.us, some people had no problem with either.

From home when I ran a traceroute to crazymeds.us it told me crazymeds.us didn't exist. A traceroute to the IP address ended at the domain host, where the IP address resolved to a generic host name for the domain host (vps.domainhostname.com). An nslookup of the IP address returned the same generic name.

From domaintools.com a traceroute attempted to go to crazymeds.us, but timed out at the specific server that's one machine upstream from Crazy Meds at the domain host. I tried nslookup of the IP address at different places and got different results, either crazymeds.us or the generic domain host name.

From domaintools.com I could ping crazymeds.us. I didn't think to ping from home.

Even weirder: from the bash shell on the server itself the damn thing didn't recognize its own name.

# traceroute crazymeds.us
traceroute: unknown host crazymeds.us

# nslookup xxx.xx.xx.xxx
;; connection timed out; no servers could be reached
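The same "does this box know the name?" question can be asked from a script. This is a minimal sketch, assuming Python is available on the machine; the helper name resolve is invented for illustration, and it uses whatever resolver the system is configured with, so it can only confirm the symptom, not explain it.

```python
import socket


def resolve(hostname):
    """Return the sorted, de-duplicated IP addresses for hostname.

    Returns an empty list when the system resolver cannot resolve the
    name, which is the 'unknown host' situation shown above.
    """
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    for name in ("crazymeds.us", "www.crazymeds.us"):
        print(name, "->", resolve(name) or "no answer")
```

Running it from home, from the server, and from somewhere else would show the same patchwork of results described in this entry, just in a form that is easier to paste into a trouble ticket.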

After a few hours I was able to access the site again via the name. I wrote to tech support to let them know. Whatever it was, it fixed itself. That sort of thing happens. Here's the weirdest part of all: about an hour after I wrote them I got an e-mail back because they couldn't get to it! How often does that happen?

Weird shit like that happening to any site, even Google or Yahoo, for five or ten minutes, sure. All the time. But for over three hours? Maybe longer because I didn't try anything until 2:00 pm.

I'd love to know what caused it. The tech support guy thinks it had to do with the firewall. Unless I accidentally changed something, which is entirely possible, I don't block any access to the DNS port.

In the meantime I'm working on improving the response time and I'm still trying to get the e-mail notifications of PMs, new posts, etc. working. I'm not getting much in the way of an error message when IPB has issues with e-mail. It tells me there's an error. Gee, thanks. Plesk likes to scatter error logs all over the place, but I'm not having much luck in finding any details as to why the e-mail isn't happening.

Fuck if I know what happened

I've been trying to get the mail notification functions to work, so I haven't tried to log onto the forum until around 2:00 p.m. my time. That's when I saw the entire site wasn't available.

The domain name isn't resolving. I can reach it with http://216.97.239.204/. The forum won't work that way though. Some people have had some luck reaching the site via http://crazymeds.us with no www. The forum may or may not be accessible that way. I can't reach the site with just the crazymeds.us address.

I can ping crazymeds.us and www.crazymeds.us but I can't get a traceroute past vpl20-sd.lunarpages.com before the traceroute times out. The traceroute fails for both the domain name and the IP address.

I've tried restarting Apache and even a reboot, neither made a difference. Restarting Apache from the command line gives me:

[error] (EAI 3)Temporary failure in name resolution: Cannot resolve host name crazymeds.us --- ignoring!

I haven't done anything with the DNS entries since the initial setup. When I look at them they seem fine and named service is running. Everything was working well last night.

Overnight I was bombarded with a tremendous amount of spam that other people were trying to relay through the site, but as far as I can tell it wasn't delivered. I had over 6,000 e-mails in the queue and I've already deleted them. I hope nothing got through. I thought I had that door closed, but I guess not.

In any event I've opened up a ticket. This time I know to check my e-mail before passing out, which is probably going to be early.


Edit 10:08 pm Mountain time: I wanted to pass out early, but that ain't happening. We're back up. Many of you may not have noticed a thing. The problem was literally all over the map as to where Crazy Meds was available and where it wasn't. I'm curious as to what caused it and all, but I'm not losing any sleep over it. I'll post some more of the weirdness later.

It's about fucking time.

The forum seems to be alive again.

There is no guarantee as to how long this will last.

I've seen a little funkiness here and there. Report any problems on the site problems board, or here if you can't log on.

E-mail notification isn't working yet, but I expected that. Due to the high volume of attempts to use my new server as a spambot, starting before I logged on for the first time, I've practically shut down the mail system.

I'll be tweaking stuff here and there. Response time is a bit slow at the moment, so I'm trying to track that down as well. Otherwise, have a party.

Where in hell is Major Kong?

So, what have I managed to accomplish since nine o'clock this morning?

Fuck All.

It looks as if the tech people reinstalled MySQL after all. I got a shitload of files all over the place, including what looks like a full dump of the databases along with a duplicate of the entire directory tree of where the MySQL data are stored. My storage skyrocketed from 2% to 5%. I guess I can keep it for as long as required.

I can't find the log entries that tell me exactly what the fucking problem is with the database. I've been trying to get better logging from MySQL and that's been pretty frustrating as well. At this point I don't know if there aren't enough data to give me something to look at or if I still don't have it right. Please, try to connect to the forum. The more failures the better. I guess.

phpMyAdmin is just fucked, though. I'd love to reinstall phpMyAdmin with a newer version, but from everything I've read, nobody seems to have had any luck getting newer versions to run with Plesk.

Fucking Russians and their fucking Plesk. You'd think in Siberia they'd have nothing better to do than get the fucking code right. Instead they were probably drinking. Or, worse yet, making sure their software would work only with some shitty knock-off of phpMyAdmin that their down-on-his-luck cousin coded.

No wonder Plesk does all sorts of random shit all the time. If it's not phpMyAdmin fucking up it's the log management function crashing. Or something else that gets pissed off because it can't find the SMTP or MySQL or DNS services running. Even though if you look at them through the Virtuozzo panel or the Linux shell the fucking services are running.

Hence the title of this entry.


The Night Shift is Taking a Look

Just got an e-mail from one of the junior system admins. He's stumped as well and is referring the problem to Parallels.

I let him know that re-installing MySQL is an option, as importing the database isn't that big of a deal now that I know about breaking up those two problematic tables into smaller chunks and have already done so.

I don't know where the Plesk software people are. It was originally developed in Novosibirsk, and they're GMT +7:00. The original Parallels is based in Switzerland, but the company that gobbled them up and assumed their name is in the Seattle area.

The Schadenfreude begins to your right.-------------------------->

It makes everything more tolerable. Really.

The Pros from Bangalore are looking at it.

The tech support team are looking into fixing MySQL. I wouldn't be surprised if I broke it, because I'm really good at breaking stuff in new and inventive ways.

I exported discrete chunks of the big files that gave me trouble from the old server. Previously I just overwrote the existing file with each export. I kept the export of the entire database. This is all just in case MySQL needs to be re-installed.

Down, Not Across: Suddenly a Viable Option

I figured out the problem regarding the blank screen. root owned everything. php didn't want to run if a root-owned file was being invoked by someone who isn't root. Understandable. Easy enough to fix.
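A quick way to see how much of a directory tree is owned by root is to walk it and compare owner IDs. This is a minimal sketch, not what was actually run on the server; the function name paths_owned_by and the example path are made up for illustration, and uid 0 is the usual numeric ID for root on Linux.

```python
import os


def paths_owned_by(top, uid=0):
    """Return a sorted list of files and directories under top owned by uid.

    uid defaults to 0, which is root on a typical Linux system.
    """
    matches = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat looks at the entry itself rather than following symlinks.
            if os.lstat(path).st_uid == uid:
                matches.append(path)
    return sorted(matches)


if __name__ == "__main__":
    # Hypothetical document root; substitute the real forum directory.
    for path in paths_owned_by("/var/www/html"):
        print(path)
```

The list it prints is the set of candidates for a change of ownership to whichever account the web server actually runs PHP as.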

So now I can see what the actual problem is.

Somehow mySQL was installed in such a way that nobody has permission to do a fucking thing.

Not root.
Not the mySQL admin account that was already installed.
Not the IPB user I created who is the DB admin for the IPB database.
Not even the mysql account in linux.

And after going through the fixes involving grant tables and everything else the only thing that changed is Plesk won't recognize mySQL as running. Just as it earlier stopped recognizing SMTP as running.

Which means I see fuck all through Plesk.

And mySQL is still fucked up about denying access to everybody.

Why do I think the solution is re-installing mySQL? Because my life has to keep getting worse?

Of course it does.

Down, not across. Looking better all the time.

Then again, maybe not

I don't know what the fuck happened, other than my writing about it being a relatively painless process. I should have known better.

It was working this afternoon. Really, it was.

I did the easy stuff, checking permissions and the global config file. Beyond that I'm too out of it. We're on the new server and that's the way it is. It's not a server issue or anything like that. I can't bring up the admin panel, but I can bring up phpMyAdmin.

I'll figure out something tomorrow.

Crazy Meds Talk Forum is up and running

We're on our slick new server. How long it will take to propagate all over the InterTubes is anyone's guess.

Forum performance is much better.

Guests are once again allowed to view the forum, and all of the old features are back.

I'll still be tweaking stuff here and there, so expect a few hiccoughs. Otherwise we're good to go.

All in all this was a relatively painless migration, accomplished in less than 48 hours and not once did I have to remind myself of alt.sysadmin.recovery's motto:

"Down, not across."

It's been 17 years since I was the sysadmin of a box running *nix (SCO Unix, if you must know) and it's been close to ten years since I last had a shell account on a *nix system where I could do much of anything. So this really happened a lot quicker and quieter than I had originally thought.

I still need to sleep a lot for the next couple of days. I'm not used to 'pushing' myself like this on all of these meds.

I'm fried

I'm really fried now. I'm still plugging away at trying to figure out what's up with the missing 30,000 or so posts. I don't know what the correct number is. The table in the database on the current server has ~44,000 rows, but the stats show ~38,000 posts. Either way, I get the same 13,895 rows no matter how I try to export the table.

Something is funky, because it's the same number of rows when I export the entire database or just that one table. Plus the missing rows are all in the middle somewhere. The very first and very last posts all show up.

There's a utility called bigdump. I can't process how it works. The other recommendation from IPB is no longer supported. I'm going to try exporting 10K rows at a time. That I can do without thinking.
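Exporting a fixed number of rows at a time comes down to stepping an offset through the table. The sketch below only builds the ranges and the matching SELECT statements; it does not talk to MySQL. The table name ibf_posts and key column pid are assumptions for the sake of the example, and 44,000 is the approximate row count mentioned above.

```python
def chunk_ranges(total_rows, chunk_size=10_000):
    """Yield (offset, limit) pairs that cover total_rows with no gaps or overlap."""
    for offset in range(0, total_rows, chunk_size):
        yield offset, min(chunk_size, total_rows - offset)


def export_queries(table, total_rows, chunk_size=10_000, order_by="id"):
    """Build one SELECT per chunk.

    Ordering by a unique key keeps the chunks stable between queries.
    The names are interpolated directly, so only pass trusted identifiers.
    """
    return [
        f"SELECT * FROM {table} ORDER BY {order_by} LIMIT {limit} OFFSET {offset};"
        for offset, limit in chunk_ranges(total_rows, chunk_size)
    ]


if __name__ == "__main__":
    # Hypothetical table and key names; ~44,000 rows as in the post.
    for query in export_queries("ibf_posts", 44_000, order_by="pid"):
        print(query)
```

Adding up the rows that come back from each chunk and comparing the total with the table's row count is a cheap way to notice if a chunk comes up short, which is exactly the kind of silent loss described here.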

I'm seeing people on the forum. The thing is offline. I'm the only one who should be able to poke around it now.

At least some other stuff was taken care of.

While I still can't access HTML files via the IP address, the site shows up as expected via the VPS preview using the domain name. The reason why it all stopped working after turning on the VPS software is that document root needed to be changed from /var/www/html to an entirely new and custom directory. Surprise!

Plus the VirtualHost stanza now takes up more than a full screen.

As is often the case, when the support person took a look at the DNS entries they were just fine. That was a fast self-correction too, as she got back to me in about 20 minutes.

named and syslog are running as processes, the services just don't appear to be running according to the VPS software's panel. Entries are being written to my logs. Lots of entries, as I have Apache set to debug for now.

Because I'm seeing the index.html page through the preview I'm going to assume, for now, that named is running and there's a bug in the VPS panel.

If I don't get a response about the nameserver problem by tomorrow I'll e-mail support directly.

Edit 7:18 p.m. Mountain Time: exporting 10-30K rows at a time from the two tables that were giving me problems worked. The IPB database has been migrated to the new server.

Since they sent their usual follow-up e-mail, I asked the Bangalore support team about the nameserver issue. I don't know if it's their area, but they've been pretty quick with any problem I've had.

So Far, So Bad

What do I find this morning?

1) The only reply to my ticket regarding the name server issue is that they received the ticket.

2) The DNS information on the new server is gone.

3) After I re-entered the information everything reads *crazymeds.us. instead of *crazymeds.us. Plus I'm not sure if I made the entries correctly.

4) named refuses to start.

5) HTML access via the IP address is still gone.

Edit, 10:13 Mountain time

6) I was finally able to transfer the forum's database. We're missing only 30,000 posts or so.

As of 8:00 p.m. Mountain Time

I'm trying again to import the forum MySQL database. The phpMyAdmin balked the first time. I'm just going to let it run and check it again later.

Somehow in getting all of the Plesk stuff set up I managed to clobber HTML access to the IP address. It's not an Apache thing because it didn't even get to Apache. I think it's some nameserver funkiness or something, because sometimes when referencing specific pages via the IP address I would hit the existing crazymeds.us domain.

So I'm trying to update the DNS entries with the new nameserver information. The new IP address is all set up, but the old domain refuses to let go of the existing nameservers. I opened a ticket.

The meds have told me I've had enough for today. Although it seems like nothing has happened, a lot has. I'll pick this up tomorrow.

Let the Migration Begin!

I'm starting the migration to the new server now.

What that means is the Crazy Meds Talk forum won't be available until whenever I'm done.

The static pages on medications, etc. will be available until I update the DNS entry with the new IP address. Then they'll be back up after what I hope will be the brief time it takes to propagate said change. How much longer after that until the forum is ready is anyone's guess.

I'll post any updates as required.

Let's make it the weekend.

I asked the IPB tech people about what it takes to move from one server to another. After reviewing the instructions it doesn't look too difficult. Not easy, just not impossible.

So here's the current plan: I'll take the forum offline sometime the evening of 3 July and run a bunch of housekeeping. I'll give myself the entire weekend to get things up and running. Whenever it's done, it's done. Check here for progress reports.

If we're not up and running by a decent hour on 5 July, I'm throwing in the towel, buying the graphic control panel and paying someone at IPB to do the install. If they can't do the install before 10 July I just hope I'll have managed to save all of the data that make up the forum because I really, really don't want a version 10.