Bot Detection with PHP

After a long hiatus, we have a new tutorial online that shows how to detect bots with PHP. Check it out:

Bot Detection with PHP – Tutorial

http://www.insanevisions.com/tutorials/bot_detection.phps

In the ongoing work on my PHP CMS, AdaptCMS, one thing I have worked
on in nearly every version is stats. Every time someone loads a page,
info is stored (date, IP, browser, OS, username, referrer, etc.). With
each version I either find a new way to make the stats more accessible
or a new bit of important info to record. Recently I noticed that some
websites I was visiting were displaying the bots currently on the site
(“Googlebot”, “MSN Bot”, etc.), and I realized this was something new
and important to keep track of.

Bots, or crawlers, are basically programs that search engines send
crawling around the internet. This is how your pages get onto search
engines (well, a major way). Bot detection isn’t super vital, but if
your CMS or website already has practically everything else, then bot
detection is the next thing to add.

The Function

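<?php

// Substrings that identify known bots/crawlers in the User-Agent header.
$bot_list = array("Teoma", "alexa", "froogle", "Gigabot", "inktomi",
"looksmart", "URL_Spider_SQL", "Firefly", "NationalDirectory",
"Ask Jeeves", "TECNOSEEK", "InfoSeek", "WebFindBot", "girafabot",
"crawler", "www.galaxy.com", "Googlebot", "Scooter", "Slurp",
"msnbot", "appie", "FAST", "WebBug", "Spade", "ZyBorg", "rabaz",
"Baiduspider", "Feedfetcher-Google", "TechnoratiSnoop", "Rankivabot",
"Mediapartners-Google", "Sogou web spider", "WebAlta Crawler");

// Returns the name of the matching bot, or null for a regular visitor.
function detect_bot() {
    global $bot_list;

    // The header is optional, so fall back to an empty string.
    $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    $thebot = null;

    foreach ($bot_list as $bot) {
        // strpos() replaces the original ereg() call, which was removed in PHP 7.
        if (strpos($agent, $bot) !== false) {
            $thebot = $bot;
        }
    }

    return $thebot;
}

?>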

Really, bot detection isn’t as difficult as you might guess. The server variable $_SERVER['HTTP_USER_AGENT'] holds a lot of useful info on the visitor, such as the browser they are using and, if the visitor is a bot/crawler, the name of the bot.
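
To make that concrete, here is a minimal sketch of what the header can look like and how a name from the list shows up inside it. The sample strings are illustrative values of the kind these clients send, not captured from a live log, and $samples is a name made up for this example.

```php
<?php

// Illustrative User-Agent strings (assumed samples, not taken from a real log).
$samples = array(
    "browser"   => "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
    "googlebot" => "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "msnbot"    => "msnbot/1.1 (+http://search.msn.com/msnbot.htm)",
);

foreach ($samples as $label => $agent) {
    // strpos() returns false when the substring is absent, so compare with !== false.
    $is_google = strpos($agent, "Googlebot") !== false;
    echo $label . ": " . ($is_google ? "contains Googlebot" : "no Googlebot") . "\n";
}
```

Only the second sample mentions Googlebot, which is exactly the kind of substring the function looks for.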

First, let me say that this is a portion of the function that will be
in AdaptCMS, and if you want to put the bot list inside the function,
feel free. Second, let me explain what’s going on in the function.
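
If you would rather keep the list inside the function, one way to arrange it is sketched below. The name detect_bot_in and its $agent parameter are assumptions made for this example (they are not part of AdaptCMS), and the list is shortened to keep the sketch readable. Passing the user agent in as an argument also makes the function easy to try out without a real request.

```php
<?php

// Hypothetical variant: the bot list lives inside the function and the
// user agent is passed in instead of being read from $_SERVER.
function detect_bot_in($agent) {
    // Shortened list for illustration; use the full list in practice.
    $bot_list = array("Googlebot", "Slurp", "msnbot", "Baiduspider");

    foreach ($bot_list as $bot) {
        // Case-sensitive substring match against the user agent.
        if (strpos($agent, $bot) !== false) {
            return $bot;
        }
    }

    // No entry matched, so treat the visitor as a regular browser.
    return null;
}

// On a live page you would pass the real header, if it is set.
$agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
$bot = detect_bot_in($agent);
```

Unlike the version above, this sketch returns on the first match instead of the last one, which only matters when a user agent contains more than one name from the list.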

Breaking Down the Code

foreach ($bot_list as $bot) {

I don’t think I need to explain a basic array or function declaration,
so let’s begin with the foreach and the matching check. The foreach
simply walks through the $bot_list array (which contains a list of
bots/crawlers) one entry at a time, assigning the current entry to $bot
on each pass, as you will see in the next bit of code.

if (strpos($_SERVER['HTTP_USER_AGENT'], $bot) !== false) {

This check was originally written with ereg, which searches a string
for a regular-expression match. ereg was removed in PHP 7, so the line
above uses strpos, which does the same job for plain names like these.
In this case we want to find out whether $_SERVER['HTTP_USER_AGENT']
contains any mention of a bot. $bot_list is the list we want to go
through, foreach hands us its entries one by one, and strpos tells us
whether the current entry appears in the user agent. If so, we set the
variable $thebot to the name of the bot, which we return at the end.
Here is an example of how to use it on your website:

<?php

include "bot_detection.php";

// detect_bot() gives back the bot's name, or nothing for a regular visitor.
$bot = detect_bot();

if ($bot) {
    echo "Hey, you're a bot! What's up " . $bot . "?";
}

?>
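
Since the point of all this is stats, the result can also be counted rather than echoed. The sketch below tallies hits per bot for a batch of user agent strings. The helper names match_bot and count_bot_hits, the shortened list, and the sample strings are all assumptions for illustration; in a real CMS the strings would come from wherever page loads are logged.

```php
<?php

// Hypothetical helper: returns the first list entry found in $agent, or null.
function match_bot($agent, $bot_list) {
    foreach ($bot_list as $bot) {
        if (strpos($agent, $bot) !== false) {
            return $bot;
        }
    }
    return null;
}

// Hypothetical helper: counts hits per bot name across many user agents.
function count_bot_hits($agents, $bot_list) {
    $hits = array();
    foreach ($agents as $agent) {
        $bot = match_bot($agent, $bot_list);
        if ($bot !== null) {
            // First sighting starts the count at 1; later ones increment it.
            $hits[$bot] = isset($hits[$bot]) ? $hits[$bot] + 1 : 1;
        }
    }
    return $hits;
}

// Shortened list and made-up sample traffic, for illustration only.
$bot_list = array("Googlebot", "Slurp", "msnbot");
$agents = array(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "msnbot/1.1 (+http://search.msn.com/msnbot.htm)",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0",
);

$hits = count_bot_hits($agents, $bot_list);
print_r($hits);
```

Regular browsers are skipped, so the resulting array only holds the bots that actually showed up.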

Conclusion

Most of the tutorials I’ve written so far have been on broad subjects,
so it’s nice to talk about something out of the ordinary, and
especially something that should be standard in CMSs. This little
function is something you can use on your website or CMS to keep track
of the bots/crawlers visiting it. Happy Coding!
