Auto-Correct Broken and Mis-Encoded Links to Your Site

Posted: 2015-04-06 22:35:48

Some of the links to your site posted around the internet will contain common typing or markup mistakes that send the visitor to a 404 / “Not Found” page.

With Apache’s mod_rewrite and its RewriteRule directive, you can auto-correct most of these malformed incoming links by stripping the broken part and redirecting to the correct URL.

A normal link to your website on another website looks like this:

<p>Some text with a <a href="http://www.example.com/page">link anchor text here</a>.</p>

There are several ways this link can be malformed.

Incorrect HTML character encoding/escaping:

http://www.example.com/page&quot;&gt;...

Two links combined:

http://www.example.com/pagehttp://other.website/path

Dots, commas, quotes, parentheses, or angle brackets at end:

http://www.example.com/page.
http://www.example.com/page,
http://www.example.com/page"
http://www.example.com/page'
http://www.example.com/page)
http://www.example.com/page(
http://www.example.com/page<
http://www.example.com/page>

Whitespace at end:

http://www.example.com/page <-- a space here

Link, line break, paragraph, list tags at end:

http://www.example.com/page</a>
http://www.example.com/page<br>
http://www.example.com/page</p>
http://www.example.com/page</li>

Variations of the above:

http://www.example.com/page<a>
http://www.example.com/page<a/>
http://www.example.com/page<a
http://www.example.com/page</br>
http://www.example.com/page<br />
http://www.example.com/page<p>
http://www.example.com/page<p/>
http://www.example.com/page<li>
...

Fix Broken Incoming Links

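To automatically correct the common link mishaps above, place the following rules in either the website’s VirtualHost block or its .htaccess file. Each rule strips the malformed part and issues a permanent (301) redirect to the cleaned URL.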

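# switch the rewrite engine on; the rules below are ignored without it
RewriteEngine On

# match on some common link mishaps: link">... escaped as link&quot;&gt;abcdefg
# (.*?) is non-greedy so that \s* can also strip whitespace in front of the debris
RewriteRule ^(.*?)\s*(&quot;)+(&gt;)* $1 [R=permanent,L]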

# match on some common link mishaps: two links merged
# (:/+ rather than :// because Apache may collapse repeated slashes before the pattern is tested)
RewriteRule ^(.*?)\s*https?:/+ $1 [R=permanent,L]

# match on some common link mishaps: trailing tags and variations such as <br> <br/> <br /> </br> ... <a <a> <a > </a </a> ...
# (.*?) is non-greedy so that \s* can also strip whitespace in front of the tag
RewriteRule (.*?)\s*</?a\ ?/?>?$ $1 [R=permanent,L]
RewriteRule (.*?)\s*</?br\ ?/?>?$ $1 [R=permanent,L]
RewriteRule (.*?)\s*</?li\ ?/?>?$ $1 [R=permanent,L]
RewriteRule (.*?)\s*</?p\ ?/?>?$ $1 [R=permanent,L]

# match on some common link mishaps: links ending with one or more of . , " ' ) ( > < or any whitespace character
# (.*?) is non-greedy so the whole trailing run is removed in a single redirect
RewriteRule (.*?)[\.,"'\)\(><\s]+$ $1 [R=permanent,L]

# match on some common link mishaps: two or more trailing forward slashes, reduced to exactly one
RewriteRule (.*?)//+$ $1/ [R=permanent,L]
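
Where the Rules Go

For orientation, here is a minimal sketch of how the rules might sit inside a VirtualHost block. The port, ServerName and DocumentRoot values are placeholders chosen for illustration, not taken from any real configuration, and mod_rewrite has to be loaded for the directives to be recognized at all.

<VirtualHost *:80>
    # placeholder host name and document root; substitute your own
    ServerName www.example.com
    DocumentRoot /var/www/example

    # rewrite rules are ignored unless the engine is switched on in this context
    RewriteEngine On

    # ... the RewriteRule lines from above go here, in the same order ...
</VirtualHost>

In a .htaccess file the <VirtualHost> wrapper is dropped and the directives are written at the top level. The one difference to keep in mind is that in VirtualHost context the pattern is tested against the path with its leading slash (/page.), while in .htaccess the leading slash is removed first (page.). None of the patterns above depend on that slash, so the same rules should work in both places. A quick check after reloading Apache is to request a deliberately broken URL, for example with curl -I 'http://www.example.com/page.', and look for a 301 status and a Location header ending in /page.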