"And in the subsequent five years, I got through fifteen of them." Jim's job wasn't defined by the list, however, just as I was about to learn that mine wasn't defined by my somewhat generic "marketing manager" title. As Jim pointed out, "When there were problems to be solved, whoever could solve them did, regardless of what their official title was."
CableFest '99
"We've got some work to do at our data center on Saturday," Cindy informed all of us in the marketing group toward the end of my first week on the job. "Bring warm clothes, because I understand it can get a bit chilly in there." It was our formal invitation to "volunteer" at Google's CableFest '99.
I was no expert on computer hardware. I had read an article or two about servers, hubs, and routers, but I pronounced "router" as if it rhymed with "tooter" instead of "outer." Given my profound lack of technical expertise and my bad computer karma, why would any company allow me in the same room as its computational nerve center? That requires a bit of explanation.
In late 1999, Google began accelerating its climb to market domination. The media started whispering about the first search engine that actually worked, and users began telling their friends to give Google a try. More users meant more queries, and that meant more machines to respond to them. Jim and Schwim worked balls-to-the-wall to add capacity. Unfortunately, computers had suddenly become very hard to get. At the height of the dot-com madness, suppliers were so busy with big customers that they couldn't be bothered helping Google fend off the hellhounds of demand snapping at its heels. A global shortage of RAM (memory) made things worse, and Google's system, which had never been all that robust, started wheezing asthmatically.
Part of the problem was that Google had built its system to fail.
"Build machines so cheap that we don't care if they fail. And if they fail, just ignore them until we get around to fixing them." That was Google's strategy, according to hardware designer Will Whitted, who joined the company in 2001. "That concept of using commodity parts and of being extremely fault tolerant, of writing the software in a way that the hardware didn't have to be very good, was just brilliant." But only if you could get the parts to fix the broken computers and keep adding new machines. Or if you could improve the machines' efficiency so you didn't need so many of them.
The first batch of Google servers had been so hastily assembled that the solder points on the motherboards touched the metal of the trays beneath them, so the engineers added corkboard liners as insulation. It looked cheap and flimsy, but it prevented the CPUs (central processing units) from shorting out. Next, Larry focused on using space more efficiently and cutting out as many expensive parts as possible. He, Urs, and a couple of other engineers dumped out all the components on a table and took turns arranging the pieces on the corkboard tray like a jigsaw puzzle. Their goal was to squeeze in at least four motherboards per tray. Each tray would then slide into a slot on an eight-foot-tall metal rack. Since servers weren't normally connected to displays, they eliminated space-hogging monitor cards. Good riddance—except that when something died, the ops staff had no way to figure out what had gone wrong, because they couldn't attach a monitor to the broken CPU. Well, they could, but they'd have to stick a monitor card in while the machine was live and running, because Larry had removed the switches that turned the machines off.
"Why would you ever want to turn a server off?" he wondered. Perhaps because plugging a monitor card into an active computer could easily short out the motherboard, killing the whole machine.
After the engineers crammed four boards onto each tray, the one in the back couldn't be reached from the front. To fix it, the technician would have to pull the tray out of the rack, but the trays were packed so tightly