Auto Correct Broken and Mis-Encoded Links to Your Site

A share of the links that other sites post to your site will contain common typing or markup mistakes, and visitors who follow them end up on a 404 / “Not Found” page.

Using Apache’s mod_rewrite and its RewriteRule directive, you can auto-correct the majority of these badly formed incoming links by stripping the broken part and redirecting to the correct URL.

A normal link to your website on another website looks like this:

<p>Some text with a <a href="http://www.example.com/page">link anchor text here</a>.</p>

There are several ways this link can be malformed.

Incorrect HTML character encoding/escaping:

http://www.example.com/page&quot;&gt;...

Two links combined:

http://www.example.com/pagehttp://other.website/path

Dots, commas, quotes, parentheses, or angle brackets at the end:

http://www.example.com/page.
http://www.example.com/page,
http://www.example.com/page"
http://www.example.com/page'
http://www.example.com/page)
http://www.example.com/page(
http://www.example.com/page<
http://www.example.com/page>

Whitespace at the end:

http://www.example.com/page <-- a space here

Link, line-break, paragraph, or list tags at the end:

http://www.example.com/page</a>
http://www.example.com/page<br>
http://www.example.com/page</p>
http://www.example.com/page</li>

Variations of the above:

http://www.example.com/page<a>
http://www.example.com/page<a/>
http://www.example.com/page<a
http://www.example.com/page</br>
http://www.example.com/page<br />
http://www.example.com/page<p>
http://www.example.com/page<p/>
http://www.example.com/page<li>
...

Fix Broken Incoming Links

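To automatically correct the common link mishaps above, place the following rules in either the website’s VirtualHost block or its .htaccess file. The mod_rewrite module must be loaded. In .htaccess context Apache strips the leading slash from the path before matching, so the redirect target may need a RewriteBase / directive (or a /$1 substitution) there.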

# enable the rewrite engine (needed once, before the rules below)
RewriteEngine On

# match on some common link mishaps: link">... escaped as link&quot;&gt;abcdefg
RewriteRule ^(.*)\s*(&quot;)+(&gt;)* $1 [R=permanent,L]

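# match on some common link mishaps: two links merged
# (Apache may collapse the // after the scheme into a single / before matching, so accept one or more)
RewriteRule ^(.*)\s*https?:/+ $1 [R=permanent,L]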

# match on some common link mishaps: ending tags and variations such as <br> <br/> <br /> </br> ... <a <a> <a > </a </a> ...
RewriteRule (.*)\s*</?a\ ?/?>?$ $1 [R=permanent,L]
RewriteRule (.*)\s*</?br\ ?/?>?$ $1 [R=permanent,L]
RewriteRule (.*)\s*</?li\ ?/?>?$ $1 [R=permanent,L]
RewriteRule (.*)\s*</?p\ ?/?>?$ $1 [R=permanent,L]

# match on some common link mishaps: links ending with one or more of . , " ' ) ( > < or any whitespace character
RewriteRule (.*)[\.,"'\)\(><\s]+$ $1 [R=permanent,L]

# match on some common link mishaps: multiple ending / (more than 1 ending forward slash)
RewriteRule (.*)//+$ $1/ [R=permanent,L]
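
Before deploying rules like these, it can help to check offline what each broken URL would end up as. The following is a minimal sketch in Python, not part of the Apache configuration: the patterns are hand-ported approximations of the rules above, the names RULES, redirect_once and resolve are made up for this illustration, and Python's re module is not identical to the regex engine Apache uses. It simulates following the redirects until no rule matches any more.

```python
import re

# Hand-ported approximations of the RewriteRule patterns above, in order.
# Each entry: (pattern, suffix appended to the captured prefix).
RULES = [
    (re.compile(r'^(.*)\s*(&quot;)+(&gt;)*'), ''),    # escaped "> debris
    (re.compile(r'^(.*)\s*https?:/+'), ''),           # two links merged
    (re.compile(r'(.*)\s*</?a ?/?>?$'), ''),          # trailing <a> variants
    (re.compile(r'(.*)\s*</?br ?/?>?$'), ''),         # trailing <br> variants
    (re.compile(r'(.*)\s*</?li ?/?>?$'), ''),         # trailing <li> variants
    (re.compile(r'(.*)\s*</?p ?/?>?$'), ''),          # trailing <p> variants
    (re.compile(r'''(.*)[.,"')(><\s]+$'''), ''),      # trailing punctuation or whitespace
    (re.compile(r'(.*)//+$'), '/'),                   # multiple trailing slashes
]


def redirect_once(path):
    """Return the target of the first matching rule, or None if none match."""
    for pattern, suffix in RULES:
        match = pattern.search(path)
        if match:
            return match.group(1) + suffix
    return None


def resolve(path, max_hops=10):
    """Follow simulated redirects until no rule matches (or max_hops is hit)."""
    for _ in range(max_hops):
        target = redirect_once(path)
        if target is None or target == path:
            return path
        path = target
    return path


if __name__ == '__main__':
    samples = [
        '/page&quot;&gt;...',
        '/pagehttp://other.website/path',
        '/page</a>',
        '/page<br />',
        '/page...',
        '/page//',
    ]
    for sample in samples:
        print(repr(sample), '->', repr(resolve(sample)))
```

Because the capture groups are greedy, some inputs (for example several trailing dots) take more than one simulated hop to clean up, which is why resolve loops instead of applying a single rule.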
