Know your Uniform Resource Locator

Posted on March 5th, 2010 in Data Visualization | No Comments »

I work for a company that does a fair amount of web crawling (no, not that one), and recently the engineering side and the business side have been going back and forth over the finer details of URL validation. On a whim, I created this diagram (among others) to facilitate that discussion.

A diagram of the various parts of a URL

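
For anyone who would rather poke at the parts from a terminal, here is a minimal sketch in Bash. It is not how our crawler does validation, and the diagram may label the pieces differently; it simply applies the generic regular expression from RFC 3986, Appendix B. The function name split_url and the example.com URL are made up for illustration.

```bash
#!/bin/bash

# Split a URL into its five generic components using the regular
# expression given in RFC 3986, Appendix B.
split_url() {
    local url=$1
    local re='^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?'

    [[ $url =~ $re ]] || return 1

    # Capture groups 2, 4, 5, 7 and 9 hold the components.
    printf 'scheme=%s\n'    "${BASH_REMATCH[2]}"
    printf 'authority=%s\n' "${BASH_REMATCH[4]}"
    printf 'path=%s\n'      "${BASH_REMATCH[5]}"
    printf 'query=%s\n'     "${BASH_REMATCH[7]}"
    printf 'fragment=%s\n'  "${BASH_REMATCH[9]}"
}

# Hypothetical example URL, chosen only to exercise every component.
split_url 'http://www.example.com:8080/path/to/page.html?key=value#section'
```

Note that this only splits a string into pieces; it does not decide whether a URL is valid. The authority part still contains the host and port together, so a real validator would need to break it down further.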

Meetings are like goldfish

Posted on February 25th, 2010 in Random Observations | No Comments »

They expand to fit the space allotted.

2009 Dice Career Fair in Seattle

Posted on March 24th, 2009 in Career, Rants | 1 Comment »

Fun fact: Last month I was laid off from my job at a startup, along with about half the company. Now, I’ve been sending out resumes and interviewing, and although I’ve never been to a career fair before, I figured I might as well go just in case. Why not, right? Well, I’ll tell you why not.

The thing ran from 11 AM to 3 PM, and I got there around 10:45 only to see a very long line forming. The line stretched longer and longer leading up to 11; at least 100 people were there when it opened.

As we slowly shuffled up to the registration table, I realized that there were only four companies at this so-called “career fair”. FOUR. The e-mail about the event had listed the companies, but I had assumed it was a representative sample, not the entire list.

Isn’t there some agreed-upon or understood minimum number of participating companies for these things? When the entire career fair could fit inside a hotel room, as opposed to a ballroom, I think it’s time to call it quits.

The utterly depressing thing is that almost every candidate there was a middle-aged man in a suit. These are guys that have families to feed, all competing over what amounts to scraps… the desperation was pretty palpable.

Anyway, I just left.

Quick tip: Converting DMG to ISO

Posted on December 6th, 2008 in Bash, Code, Mac OS X | No Comments »

Save this as dmg2iso and run from the terminal:

#!/bin/bash

if [ -z "${1}" ]; then
    echo "Usage: ${0##*/} <file>"
    exit 1
fi

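# Strip a trailing .dmg (if any) so the argument works with or without it.
FILE=${1%.dmg}
# Quote the paths so file names containing spaces are not split into words.
hdiutil makehybrid "${FILE}.dmg" -o "${FILE}"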
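
If the two parameter expansions in that script look cryptic, this small sketch shows what they do on their own. The sample values are made up and stand in for $0 and $1; nothing here calls hdiutil.

```bash
#!/bin/bash

# Hypothetical sample values, standing in for $0 and $1 in dmg2iso.
script_path="/usr/local/bin/dmg2iso"
image_arg="Install Disc.dmg"

# "##*/" removes the longest prefix matching "*/", leaving the basename.
script_name=${script_path##*/}

# "%.dmg" removes a trailing ".dmg" if there is one; otherwise the
# value is left unchanged.
image_base=${image_arg%.dmg}
no_ext=plain
no_ext_base=${no_ext%.dmg}

echo "$script_name"
echo "$image_base"
echo "$no_ext_base"
```

This is why the usage message shows just the script's name, and why the script accepts its argument with or without the .dmg extension.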

Zend_Search_Lucene: Not enterprise-ready

Posted on November 7th, 2008 in PHP, Zend Framework | 4 Comments »

Zend Framework has been attracting more and more attention from the PHP community lately, and while it lacks certain things (like code generation) that other frameworks (like Rails) have implemented to great effect, Zend Framework 2.0 is slowly taking shape and it looks like it will be the framework of choice for startups and enterprises alike. (Yes, it will even have code generation.)

But despite having several “enterprise-ready” components, I’ve found that one in particular is not: Zend_Search_Lucene, Zend Framework’s native PHP implementation of Apache Lucene (which is itself written in Java).

Don’t get me wrong; Zend_Search_Lucene is great for a small site or blog. However, from extensive personal experience, it is not appropriate for a site with a medium or large index. I think this should be noted upfront in the documentation.

Against my better judgment, the company I work for migrated our previous search solution to Zend_Search_Lucene. On pretty heavy-duty hardware, indexing a million documents took several hours, and searches were relatively slow. The indexing process consumed vast amounts of memory, and the indexes frequently became corrupted (using 1.5.2). A single wildcard search could bring the web server to its knees, so we disabled that feature. Memory usage for searches was also very high, so we had to reduce the number of Apache child processes, and requests per second dropped sharply as a result.

We have since moved to Solr (a Lucene-based Java search server) and the difference is dramatic. Indexing now takes around 10 minutes and searches are lightning fast. What a difference a language makes.