Are you parsing the raw HTML? Fetching it with an XML parser might help. That's how our bot system searches websites for content.
That wouldn't really fix the issues I'm working with.
Now I'm getting a weird error. The output shows the character I want as the first one, and I've already removed all the spaces, yet it isn't grabbed unless I grab the second character...
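
One possible cause, and this is only a guess since the actual code and input aren't shown here, is an invisible character sitting in front of the visible one, such as a byte order mark (U+FEFF) or a zero-width space (U+200B). Neither counts as whitespace, so removing spaces leaves them in place. A minimal Python sketch for checking this; the `sample` string and the `describe_leading_chars` helper are hypothetical and only there for illustration:

```python
import unicodedata


def describe_leading_chars(text, count=3):
    """Return (repr, code point, Unicode name) for the first few characters."""
    return [
        (repr(ch), f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text[:count]
    ]


# Hypothetical input: a byte order mark hiding in front of the visible text.
sample = "\ufeffHello"

# Print what is really at the start of the string, invisible characters included.
for entry in describe_leading_chars(sample):
    print(entry)

# str.strip() only removes whitespace, and U+FEFF / U+200B are not whitespace,
# so they have to be stripped explicitly.
cleaned = sample.lstrip("\ufeff\u200b")
```

If the first entry printed is something other than the character you expected, that would explain why index 1 gives you the "first" visible character. If the first entry is the expected character, the problem is somewhere else.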