A little geek-talk for today. 🙂 Here are some interesting facts learned by a novice webmaster.

I signed up with my hosting company this March. Since then, I’ve learned a few things about running a website by reading my Apache log files. Aside from identifying all kinds of Microsoft IIS virus calls, I’ve also noticed the interesting trails made by “spiders”. A spider is a program usually run by a search engine such as AltaVista or Google; it crawls a site page by page so the search engine can index and cache what it finds, which on my site means every .htm page.

At the very beginning I thought someone was just going through my site page by page. Wow! How interesting! A fan! 🙂 Soon I noticed that none of the images were being loaded. Since 80% of my site is made up of images, that seemed rather odd. So I did a search on my favorite search engine and learned the truth about the “user agent” (sounds like The Matrix, huh?). Meanwhile, I learned the beauty of the internet: everyone can say what he wants, and every host has the right to deny any kind of access he doesn’t like. Soon robots.txt came to the rescue. When any of the well-known search engines comes knocking on my door, my website hands the stranger my copy of robots.txt, which tells it to go away. It does. What a beautiful thing! 🙂
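
For the curious, here is a rough sketch of what a polite spider does with that file, using Python's standard urllib.robotparser module. The robots.txt contents and the "ExampleBot" name below are made up for illustration; they are not my real file or any real crawler.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that asks every crawler to stay away entirely.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "ExampleBot" is an invented user agent name.
    print(may_fetch(ROBOTS_TXT, "ExampleBot", "/index.htm"))  # prints False
```

Nothing forces a spider to run a check like this. It is a convention, and it works only because the well-behaved ones choose to follow it.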

But there are search engines that don’t follow this convention. They do not ask for robots.txt; instead, they come in uninvited and start indexing my pages anyway. It is a little rude. So I learned another way: .htaccess, which can hold a deny list of IP addresses. So I sent those rude IPs to an HTTP 403 error.
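
Spotting the rude ones by eye gets tedious, so here is a minimal sketch of how the hunt could be automated in Python. It assumes the usual Apache common/combined log layout, uses invented sample lines with documentation-only IP addresses, and flags any client that fetched several .htm pages without ever asking for robots.txt. The function names and the three-page threshold are my own assumptions, and the heuristic is crude: a human who browses three pages would be flagged too, so check for missing image requests before denying anyone. The "Deny from" lines it prints are the older Apache 2.2 style of access control.

```python
import re
from collections import defaultdict

# Start of an Apache common/combined log line: client address,
# two ignored fields, the bracketed timestamp, then the request line.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)'
)


def rude_clients(log_lines, min_pages=3):
    """Return sorted IPs that fetched at least min_pages .htm pages
    without ever requesting /robots.txt."""
    pages = defaultdict(int)
    asked_robots = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip anything that is not an access-log entry
        ip, path = match["ip"], match["path"]
        if path == "/robots.txt":
            asked_robots.add(ip)
        elif path.endswith(".htm"):
            pages[ip] += 1
    return sorted(
        ip for ip, count in pages.items()
        if count >= min_pages and ip not in asked_robots
    )


def deny_lines(ips):
    """Format Apache 2.2-style 'Deny from' directives for .htaccess."""
    return [f"Deny from {ip}" for ip in ips]


# Invented sample data; the addresses are reserved documentation ranges.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2003:10:00:00 -0500] "GET /robots.txt HTTP/1.0" 200 26',
    '198.51.100.5 - - [01/Jan/2003:10:01:00 -0500] "GET /index.htm HTTP/1.0" 200 512',
    '198.51.100.5 - - [01/Jan/2003:10:01:01 -0500] "GET /pic.jpg HTTP/1.0" 200 9000',
    '203.0.113.7 - - [01/Jan/2003:10:05:00 -0500] "GET /index.htm HTTP/1.0" 200 512',
    '203.0.113.7 - - [01/Jan/2003:10:05:01 -0500] "GET /about.htm HTTP/1.0" 200 300',
    '203.0.113.7 - - [01/Jan/2003:10:05:02 -0500] "GET /links.htm HTTP/1.0" 200 280',
]

if __name__ == "__main__":
    for directive in deny_lines(rude_clients(SAMPLE_LOG)):
        print(directive)  # prints: Deny from 203.0.113.7
```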

Today, a new IP did just that. I dutifully put it on my deny list and then checked where it came from.

NAME: QWEST-63-148-99-224

So I felt a little uneasy. Arlington, Virginia! Does this have anything to do with Big Brother??? So I went looking again and found out I’m not alone. Quite a few people have complained about this IP, and this particular quote made me smile:

# Tell the folks at Cyveillance to take a long walk off of a
# short pier. If they can’t be bothered to read and honor
# the robots.txt file, the heck with them.

RewriteCond %{REMOTE_ADDR} ^$
RewriteRule ^.*$ [R,L]

For further details, go to, look up the Apache
web-server documentation, and read up on the mod_rewrite module and
the RewriteCond and RewriteRule commands.

That’s why I HEART the internet. 🙂