Optimizing phpBB 2.0 for Search Engines, Page Rank, and Security
DeveloperSide.NET Articles
Search Engines -- Getting your Forums Indexed
Problem: Search Engine bots (spiders, crawlers) will not index pages that contain Session IDs (sid) in URLs.
Example of URLs that will not be indexed...
http://forums.domain.com/index.php?sid=c80e688fbf4ec5347f170b3e4r2067b7
http://forums.domain.com/viewtopic.php?t=1689&sid=c80e688fbf4ec5347f170b3e4r2067b7
Example of URLs that will be indexed...
http://forums.domain.com/index.php
http://forums.domain.com/viewtopic.php?t=1689
*Note that Session IDs are normally stored in cookies; otherwise they are transferred via the URL. For Session IDs to be visibly present in the URL, cookies have to be turned off under your browser's settings.
Solution: Selectively remove Session IDs from URLs.
Method one: Remove Session IDs for specific Search Engine bots by recognizing their 'User-Agent' HTTP header strings.
Example of 'User-Agent' strings that are received on every HTTP request...
- Google -- "Googlebot/2.1 (+http://www.google.com/bot.html)"
- MSN -- "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"
- Yahoo -- "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
Benefits: This method does not remove Session IDs for non-logged-in users (guests), so guests can still post.
Downside:
(1) While the 'User-Agent' strings of the major Search Engine bots are known, some lesser-known bots will be missed.
(2) The 'User-Agent' string of a bot can change from time to time; an updated list must be kept.
Edit file 'includes/sessions.php'
Replace function...
(last function in file)
function append_sid($url, $non_html_amp = false)
{
    global $SID;
    if ( !empty($SID) && !preg_match('#sid=#', $url) )
    {
        $url .= ( ( strpos($url, '?') != false ) ? ( ( $non_html_amp ) ? '&' : '&amp;' ) : '?' ) . $SID;
    }
    return $url;
}
with function...
function append_sid($url, $non_html_amp = false)
{
    global $SID;
    // Substrings that identify Search Engine bots; matched case-insensitively.
    $bots = array('Googlebot', 'msnbot', 'Slurp', 'almaden.ibm.com', 'zyborg', 'Jeeves', 'crawler', 'spider');
    // The header may be absent, so fall back to an empty string.
    $agent = isset($_SERVER['HTTP_USER_AGENT']) ? $_SERVER['HTTP_USER_AGENT'] : '';
    $is_bot = false;
    foreach ( $bots as $bot )
    {
        if ( stristr($agent, $bot) !== false ) { $is_bot = true; break; }
    }
    if ( !empty($SID) && !preg_match('#sid=#', $url) && !$is_bot )
    {
        $url .= ( ( strpos($url, '?') != false ) ? ( ( $non_html_amp ) ? '&' : '&amp;' ) : '?' ) . $SID;
    }
    return $url;
}
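The matching rule can be tried outside phpBB before editing 'includes/sessions.php'. The following is a minimal standalone sketch in Python, not part of phpBB: the function name is_known_bot is hypothetical, the marker list is copied from the replacement function, and the case-insensitive comparison is a choice made for this sketch.

```python
# Substrings copied from the append_sid() replacement in this article.
BOT_MARKERS = ("Googlebot", "msnbot", "Slurp", "almaden.ibm.com",
               "zyborg", "Jeeves", "crawler", "spider")


def is_known_bot(user_agent):
    """Return True if any marker occurs in the User-Agent string.

    The comparison is case-insensitive (an assumption of this sketch).
    A missing header (None) is treated as an empty string.
    """
    ua = (user_agent or "").lower()
    return any(marker.lower() in ua for marker in BOT_MARKERS)


if __name__ == "__main__":
    samples = [
        "Googlebot/2.1 (+http://www.google.com/bot.html)",
        "msnbot/1.0 (+http://search.msn.com/msnbot.htm)",
        "SomeNewBot/0.1",  # hypothetical lesser-known bot, not in the list
    ]
    for ua in samples:
        print(is_known_bot(ua), ua)
```

The last sample illustrates downside (1): a bot whose 'User-Agent' contains none of the listed substrings is treated like a normal guest and still receives Session IDs.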
Method two: Remove Session IDs for all non-logged in (guest/anonymous) users.
Benefits: All Search Engine bots will be able to crawl and index the forum.
Downside: Users will need to be registered and logged-in to have the ability to post.
Under Administration Panel -- Forum Admin -- Permissions : switch all forums to "Registered"
Edit file 'includes/sessions.php'
Replace line...
$SID = 'sid=' . $session_id;
with line...
if ( $userdata['session_user_id'] != ANONYMOUS ){ $SID = 'sid=' . $session_id; } else { $SID = ''; }
Search Engines -- robots.txt
Problem: Search Engine bots will try to index all available pages/links under the forum. Some of these pages/links have no value, can be harmful to page rank, and should not be indexed.
Solution: Create a 'robots.txt' file in the phpBB forum root directory that lists the pages/links not to be indexed.
Method:
There are only 3 pages/links that are beneficial to page rank, that should be indexed...
- /index.php
- /viewforum.php
- /viewtopic.php
Every other page/link should be disallowed.
List the phpBB root-level directory/file structure and disallow everything except the above 3 pages/links...
Contents of 'robots.txt'...
User-agent: *
Disallow: /admin/
Disallow: /db/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /templates/
Disallow: /common.php
Disallow: /config.php
Disallow: /faq.php
Disallow: /groupcp.php
Disallow: /login.php
Disallow: /memberlist.php
Disallow: /modcp.php
Disallow: /posting.php
Disallow: /privmsg.php
Disallow: /profile.php
Disallow: /search.php
Disallow: /viewonline.php
The first line specifies a match for all Search Engines.
The following lines state that any link that starts with the given text should not be indexed.
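Before uploading the file, the rules can be checked locally. The following is a minimal sketch using Python's standard urllib.robotparser module; the host forums.domain.com is the placeholder used in this article, the helper name build_parser and the sample paths are hypothetical, and the rule set is copied from the 'robots.txt' contents given here.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt in this article, one directive per line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /db/
Disallow: /images/
Disallow: /includes/
Disallow: /language/
Disallow: /templates/
Disallow: /common.php
Disallow: /config.php
Disallow: /faq.php
Disallow: /groupcp.php
Disallow: /login.php
Disallow: /memberlist.php
Disallow: /modcp.php
Disallow: /posting.php
Disallow: /privmsg.php
Disallow: /profile.php
Disallow: /search.php
Disallow: /viewonline.php
"""


def build_parser(text):
    """Parse robots.txt text and return a ready-to-query parser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

if __name__ == "__main__":
    base = "http://forums.domain.com"  # placeholder host from the article
    for path in ("/index.php", "/viewforum.php?f=2", "/viewtopic.php?t=1689",
                 "/profile.php?mode=viewprofile&u=5", "/admin/index.php"):
        verdict = "allowed" if parser.can_fetch("*", base + path) else "blocked"
        print(verdict, path)
```

Because each Disallow rule is a prefix match, '/profile.php' also blocks every URL that starts with it, such as '/profile.php?mode=viewprofile&u=5'.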
Page Rank
Problem: The most valuable (local) text for page rank is located in the title of the page. phpBB adds text to the title of a page that takes up valuable space.
Example:
- URL /index.php : "SITENAME :: Index"
- URL /viewforum.php : "SITENAME :: View Forum - forum name here"
- URL /viewtopic.php : "SITENAME :: View topic - topic text here"
Solution: Remove the unnecessary text.
Remove general "SITENAME" text from all pages...
Edit file 'templates/subSilver/overall_header.tpl'
Replace line...
<title>{SITENAME} :: {PAGE_TITLE}</title>
with line...
<title>{PAGE_TITLE}</title>
Replace the index page "Index" text with site name or keyword text...
Edit file 'language/lang_english/lang_main.php'
Replace line...
$lang['Index'] = 'Index';
with line...
$lang['Index'] = 'Your-site-name Forums or keyword text';
Remove "View Forum - " text...
Edit file 'viewforum.php'
Replace line...
$page_title = $lang['View_forum'] . ' - ' . $forum_row['forum_name'];
with line...
$page_title = $forum_row['forum_name'];
Remove "View topic - " text...
Edit file 'viewtopic.php'
Replace line...
$page_title = $lang['View_topic'] .' - ' . $topic_title;
with line...
$page_title = $topic_title;
Cosmetic Changes
Remove the intrusive phpBB logo...
Edit file 'templates/subSilver/overall_header.tpl'
Delete or comment out (with <!-- -->) line...
<td><a href="{U_INDEX}"><img src="templates/subSilver/images/logo_phpBB.gif" border="0" alt="{L_INDEX}" vspace="1" /></a></td>
Hyperlink the sitename back to your main site...
Edit file 'templates/subSilver/overall_header.tpl'
Replace line...
<span class="maintitle">{SITENAME}</span>
with line...
<span class="maintitle"><a href="http://www.example.com" style="text-decoration:none">{SITENAME}</a> - Forums</span>
Remove FAQ, Memberlist, and Usergroups links from header...
Edit file 'templates/subSilver/overall_header.tpl'
Find...
<td align="center" valign="top" nowrap="nowrap"><span class="mainmenu"> <a href="{U_FAQ}" class="mainmenu"><img src="templates/subSilver/images/icon_mini_faq.gif" width="12" height="13" border="0" alt="{L_FAQ}" hspace="3" />{L_FAQ}</a></span><span class="mainmenu"> <a href="{U_SEARCH}" class="mainmenu"><img src="templates/subSilver/images/icon_mini_search.gif" width="12" height="13" border="0" alt="{L_SEARCH}" hspace="3" />{L_SEARCH}</a> <a href="{U_MEMBERLIST}" class="mainmenu"><img src="templates/subSilver/images/icon_mini_members.gif" width="12" height="13" border="0" alt="{L_MEMBERLIST}" hspace="3" />{L_MEMBERLIST}</a> <a href="{U_GROUP_CP}" class="mainmenu"><img src="templates/subSilver/images/icon_mini_groups.gif" width="12" height="13" border="0" alt="{L_USERGROUPS}" hspace="3" />{L_USERGROUPS}</a>
Replace with...
<td align="center" valign="top" nowrap="nowrap"><span class="mainmenu"> <a href="{U_SEARCH}" class="mainmenu"><img src="templates/subSilver/images/icon_mini_search.gif" width="12" height="13" border="0" alt="{L_SEARCH}" hspace="3" />{L_SEARCH}</a>
Spammers and Bots
Tell search engines not to index the memberlist and user profile pages, and not to follow the links on them...
Edit files memberlist.php and includes/usercp_viewprofile.php
Find line...
// Generate page
Right under this line add...
$template->assign_vars(array('META'=>'<meta name="robots" content="noindex,nofollow">'));
Remove the Newest User link from the main page...
Edit file templates/subSilver/index_body.tpl
Find...
<td class="row1" align="left" width="100%"><span class="gensmall">{TOTAL_POSTS}<br />{TOTAL_USERS}<br />{NEWEST_USER}</span>
Edit to...
<td class="row1" align="left" width="100%"><span class="gensmall">{TOTAL_POSTS}<br />{TOTAL_USERS}</span>
Do not display un-activated or zero-post members in your memberlist...
Edit file memberlist.php
Find...
$sql = "SELECT username, user_id, user_viewemail, user_posts, user_regdate, user_from, user_website, user_email, user_icq, user_aim, user_yim, user_msnm, user_avatar, user_avatar_type, user_allowavatar
FROM " . USERS_TABLE . "
WHERE user_id <> " . ANONYMOUS . "
ORDER BY $order_by";
Edit to...
$sql = "SELECT username, user_id, user_viewemail, user_posts, user_regdate, user_from, user_website, user_email, user_icq, user_aim, user_yim, user_msnm, user_avatar, user_avatar_type, user_allowavatar
FROM " . USERS_TABLE . "
WHERE user_id <> " . ANONYMOUS . "
AND user_active = 1
AND user_posts > 0
ORDER BY $order_by";
Delete all spambot and unwanted users...
Enter the mysql shell...
mysql -u root -p
Select the phpBB database...
use phpbb2;
Display and delete all un-activated users, older than 2 days...
SELECT username, user_id, FROM_UNIXTIME(user_regdate) FROM phpbb_users WHERE user_active=0 AND user_id > 2 AND FROM_UNIXTIME(user_regdate) < DATE_SUB(NOW(),INTERVAL 2 DAY) ORDER BY user_regdate;
DELETE FROM phpbb_users WHERE user_active=0 AND user_id > 2 AND FROM_UNIXTIME(user_regdate) < DATE_SUB(NOW(),INTERVAL 2 DAY);
Display and delete all users with zero posts and a website...
SELECT username, FROM_UNIXTIME(user_lastvisit), FROM_UNIXTIME(user_regdate), user_website FROM phpbb_users WHERE user_id > 2 AND user_posts = 0 AND BIN(user_website) IS NOT NULL ORDER BY user_lastvisit;
DELETE FROM phpbb_users WHERE user_id > 2 AND user_posts = 0 AND BIN(user_website) IS NOT NULL;
Display and delete all users that have not logged-in in 1 year...
SELECT username, user_id, FROM_UNIXTIME(user_lastvisit) FROM phpbb_users WHERE user_id > 2 AND (UNIX_TIMESTAMP()-user_lastvisit) > 31536000 ORDER BY user_lastvisit;
DELETE FROM phpbb_users WHERE user_id > 2 AND (UNIX_TIMESTAMP()-user_lastvisit) > 31536000;
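The constant 31536000 in the one-year query is 365 days expressed in seconds. The following is a minimal Python sketch of the same comparison, useful for picking a different cutoff; the names ONE_YEAR and is_inactive are hypothetical, and the timestamps are plain Unix seconds as stored in user_lastvisit.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60
ONE_YEAR = 365 * SECONDS_PER_DAY  # 31536000, the constant used in the query


def is_inactive(last_visit, now=None, max_age=ONE_YEAR):
    """Mirror the SQL test (UNIX_TIMESTAMP() - user_lastvisit) > max_age."""
    if now is None:
        now = int(time.time())
    return (now - last_visit) > max_age


if __name__ == "__main__":
    print(ONE_YEAR)
    # Seconds for a 180-day cutoff, if a shorter window is wanted.
    print(180 * SECONDS_PER_DAY)
```

If user_lastvisit holds 0 for accounts that have never logged in (an assumption about the schema default, worth checking with the SELECT first), those accounts also satisfy the condition and would be deleted.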
Ban all *@*.ru, *@*.biz, *@*.info email addresses [this has to be done directly; will not work from the phpbb admin interface]...
INSERT INTO phpbb_banlist (ban_email) VALUES ('*@*.ru');
INSERT INTO phpbb_banlist (ban_email) VALUES ('*@*.biz');
INSERT INTO phpbb_banlist (ban_email) VALUES ('*@*.info');
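To preview which addresses such patterns would catch, the wildcard can be modelled outside phpBB. The following is a rough Python sketch that assumes '*' stands for any run of characters and that matching is case-insensitive; phpBB's own matching code may differ in detail, and the names wildcard_to_regex and is_banned are hypothetical.

```python
import re

# Patterns copied from the INSERT statements in this article.
BAN_PATTERNS = ["*@*.ru", "*@*.biz", "*@*.info"]


def wildcard_to_regex(pattern):
    """Translate a '*' wildcard pattern into a compiled regular expression.

    Everything except '*' is escaped, so '.' only matches a literal dot.
    """
    parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile(".*".join(parts), re.IGNORECASE | re.DOTALL)


def is_banned(email, patterns=BAN_PATTERNS):
    """Return True if the whole address matches any ban pattern."""
    return any(wildcard_to_regex(p).fullmatch(email) for p in patterns)


if __name__ == "__main__":
    for address in ("someone@mail.ru", "USER@SITE.BIZ", "user@example.com"):
        print(is_banned(address), address)
```

Whole-address matching means '*@*.info' catches 'a@b.info' but not an address that merely contains '.info' somewhere in the middle of the domain.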