Jul 04, 2009, 04:53 AM
Topic: [PLUGIN] SEO Strict URLs (Read 24910 times)
JeremyL
Full Member
« on: Sep 22, 2006, 12:51 AM »

SEO Strict URLs
Enforces the use of strict urls to prevent duplicate content.

Official Repository Release: http://www.modxcms.com/SEO-Strict-URLs-1.0-1337.html
New support thread: http://modxcms.com/forums/index.php/topic,12452.0.html

ApoXX has taken this plugin to a new level, and it now seems stable enough for an official release. As I understand it, it will become part of the core in an upcoming release.

Note that the code only works on 0.9.5+

If you find any bugs please let us know.

Current Features:
Quote
# 301 Redirect from /index.php?id=8 to /alias.html
# 301 Redirect from /page, /page/ to /page.html
# 301 Redirect from a non-www domain.com URL to the www.domain.com URL (requires an .htaccess edit)
# If you switch Friendly URLs off, it redirects /page.html and /page to /index.php?id=48; this enforces the options that have been selected.
# 301 Redirect /{site_start}, /{site_start}.html to the root folder / (no page in the URL)
# Menu links pointing to "www.domain.com/{site_start}" are changed to "www.domain.com" (can turn off)
# If a page is marked as a folder in the MODx system, it shows as /dir/ and not /dir.html (can turn off)
# 301 Redirect /folder to /folder/ to mimic Apache; this only happens when the document is marked as a folder (can turn off)
# Individual URL and link rewriting. For example, you can set an RSS feed to use feed.rss rather than feed.rss.html. Menu links can also point directly to the folder name instead of going through an extra level of redirects.
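The redirect rules above amount to picking one canonical path per document and 301-redirecting every other form of it. A minimal sketch of that decision in plain PHP, assuming the parent path, alias and suffix are already known (the plugin itself gets them from `$modx->makeUrl()`, `$modx->aliasListing` and `$modx->config`); `strictPath()` and `needsRedirect()` are hypothetical helper names, not part of the plugin or the MODx API:

```php
<?php
// Hypothetical helper: the strict (canonical) path for one document.
// $path is the parent path without leading or trailing slash ('' at top level).
function strictPath(string $baseUrl, string $path, string $alias, string $suffix,
                    bool $isSiteStart, bool $isFolder, bool $makeFolders): string
{
    if ($isSiteStart) {
        return $baseUrl;                        // /{site_start}.html -> /
    }
    $prefix = $baseUrl . ($path !== '' ? $path . '/' : '');
    if ($isFolder && $makeFolders) {
        return $prefix . $alias . '/';          // containers look like directories
    }
    return $prefix . $alias . $suffix;          // ordinary pages get the FURL suffix
}

// A 301 is only needed when the requested path (query string ignored) differs.
function needsRedirect(string $requestUri, string $strict): bool
{
    $path = explode('?', $requestUri, 2)[0];
    return $path !== $strict;
}
```

For example, with makeFolders on, a container aliased `archive` under `news` resolves to `/news/archive/`, so a request for `/news/archive.html` would be redirected there.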

ToDo
Quote
# Any suggestions?

Change Log
Quote
Version 1.0 (summary of all versions up to 1.0)
--------------
# FIXED: All known bugs as of 02/24/2007
# ADDED: Ability to enforce alternate URL extensions such as .cc or .xml

Version 0.7
--------------
# ADDED: If a page is marked as a folder in the modx system, it shows as /dir/ and not /dir.html (can turn off)
# ADDED: 301 Redirect /folder to /folder/ to mimic apache, only happens when file is marked as folder (can be turned off)

Version 0.6
-------------
# FIXED BUG: When MODx is installed in a subfolder, calling /index.html causes an infinite redirect
# FIXED BUG: When MODx is installed in a subfolder, every redirect adds an extra folder to the redirect URL


Install:


Plugin name: SEO Strict URLs
Description: <strong>1.0.0</strong> Enforces the use of strict urls to prevent dup content

Plugin configuration:
Code:
&editDocLinks=Edit document links;int;1 &makeFolders=Rewrite containers as folders;int;1 &override=Enable manual overrides;int;0 &overrideTV=Override TV name;string;seoOverride

On Install: Check the OnWebPageInit & OnWebPagePrerender boxes in the System Events tab.

For overriding documents, create a new template variable (TV) named seoOverride (or whatever name was defined in the configuration) with the following options:

Input Type: DropDown List Menu
Input Option Values: Disabled==-1||Base Name==0||Append Extension==1||Folder==2
Default Value: -1
NOTE: You must set "Enable manual overrides" in plugin configuration to 1 to enable this TV
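How the four override values change a single URL, shown as a hypothetical stand-alone function. `overrideUrl()` is not part of the plugin; the plugin makes the same choices in its `switch` on the TV value and in the regex replacements it runs during OnWebPagePrerender:

```php
<?php
// Hypothetical illustration of the seoOverride TV values:
//   -1 Disabled, 0 Base Name, 1 Append Extension, 2 Folder
// $default is the URL the plugin would build without any override.
function overrideUrl(int $option, string $prefix, string $alias, string $suffix, string $default): string
{
    switch ($option) {
        case 0:  return $prefix . $alias;            // e.g. feed.rss, no suffix appended
        case 1:  return $prefix . $alias . $suffix;  // always a page, never a folder
        case 2:  return $prefix . $alias . '/';      // always a folder
        default: return $default;                    // -1 (Disabled): normal rules apply
    }
}
```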

Code:
// Strict URLs
// version 1.0
// Enforces the use of strict URLs to prevent duplicate content.
// By Jeremy Luebke @ www.xuru.com
// Contributions by Brian Stanback @ www.stanback.net

// On Install: Check the "OnWebPageInit" & "OnWebPagePrerender" boxes in the System Events tab.
// Plugin configuration: &editDocLinks=Edit document links;int;1 &makeFolders=Rewrite containers as folders;int;1 &override=Enable manual overrides;int;0 &overrideTV=Override TV name;string;seoOverride

// For overriding documents, create a new template variable (TV) named seoOverride with the following options:
//    Input Type: DropDown List Menu
//    Input Option Values: Disabled==-1||Base Name==0||Append Extension==1||Folder==2
//    Default Value: -1
// NOTE: You must set "Enable manual overrides" in plugin configuration to 1 to enable this TV

//  # Include the following in your .htaccess file
//  # Replace "example.com" &  "example\.com" with your domain info
//  RewriteCond %{HTTP_HOST} .
//  RewriteCond %{HTTP_HOST} !^www\.example\.com [NC]
//  RewriteRule (.*) http://www.example.com/$1 [R=301,L]

// Begin plugin code
$e = &$modx->event;

if ($e->name == 'OnWebPageInit')
{
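   $myProtocol = (isset($_SERVER['HTTPS']) && $_SERVER['HTTPS'] == 'on') ? 'https' : 'http';  // isset() avoids a notice when HTTPS is unset
   $s = $_SERVER['REQUEST_URI'];
   $parts = explode("?", $s, 2);  // [0] = path, [1] = query string (if any)
   if (!isset($parts[1])) $parts[1] = '';

   $documentIdentifier = ($modx->documentIdentifier) ? $modx->documentIdentifier : $modx->config['error_page'];  // Set error page document ID if page is not found
   $alias = $modx->aliasListing[$documentIdentifier]['alias'];
   $isfolder = 0;    // Initialise flags so they are defined on every code path
   $isoverride = 0;
   if ($makeFolders) $isfolder = (count($modx->getChildIds($documentIdentifier, 1)) > 0) ? 1 : 0;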

   if ($override && $overrideOption = $modx->getTemplateVarOutput($overrideTV, $documentIdentifier))
   {
      switch ($overrideOption[$overrideTV])
      {
         case 0:
            $isoverride = 1;
            break;
         case 1:
            $isfolder = 0;
            break;
         case 2:
            $makeFolders = 1;
            $isfolder = 1;
      }
   }

   if ($isoverride)
   {
      $strictURL = preg_replace('/[^\/]+$/', $alias, $modx->makeUrl($documentIdentifier));
   }
   elseif ($isfolder && $makeFolders)
   {
      $strictURL = preg_replace('/[^\/]+$/', $alias, $modx->makeUrl($documentIdentifier)) . "/";
   }
   else
   {
      $strictURL = $modx->makeUrl($documentIdentifier);
   }

   $myDomain = $myProtocol . "://" . $_SERVER['HTTP_HOST'];
   $newURL = $myDomain . $strictURL;
   $requestedURL = $myDomain . $parts[0];

   if ($documentIdentifier == $modx->config['site_start'])
   {
      if ($requestedURL != $modx->config['site_url'])
      {
         // Force redirect of site start
         header("HTTP/1.1 301 Moved Permanently");
         $qstring = ltrim(preg_replace("#(^|&)(q|id)=[^&]+#", '', $parts[1]), '&');  // Strip conflicting id/q from query string, then any leading '&'
         if ($qstring) header('Location: ' . $modx->config['site_url'] . '?' . $qstring);
         else header('Location: ' . $modx->config['site_url']);
         exit;
      }
   }
   elseif ($parts[0] != $strictURL)
   {
      // Force page redirect
      header("HTTP/1.1 301 Moved Permanently");
      $qstring = ltrim(preg_replace("#(^|&)(q|id)=[^&]+#", '', $parts[1]), '&');  // Strip conflicting id/q from query string, then any leading '&'
      if ($qstring) header('Location: ' . $strictURL . '?' . $qstring);
      else header('Location: ' . $strictURL);
      exit;
   }
}
elseif ($e->name == 'OnWebPagePrerender')
{
   if ($editDocLinks)
   {
      $myDomain = $_SERVER['HTTP_HOST'];
      $furlSuffix = $modx->config['friendly_url_suffix'];
      $baseUrl = $modx->config['base_url'];
      $o = &$modx->documentOutput; // get a reference of the output

      // Reduce site start to base url
      $overrideAlias = $modx->aliasListing[$modx->config['site_start']]['alias'];
      $overridePath = $modx->aliasListing[$modx->config['site_start']]['path'];
      $o = preg_replace("#((href|action)=\"|$myDomain)($baseUrl)?($overridePath/)?$overrideAlias$furlSuffix#", '${1}' . $baseUrl, $o);

      if ($override)
      {
         // Replace manual override links
         $sql = "SELECT tvc.contentid as id, tvc.value as value FROM " . $modx->getFullTableName('site_tmplvars') . " tv ";
         $sql .= "INNER JOIN " . $modx->getFullTableName('site_tmplvar_templates') . " tvtpl ON tvtpl.tmplvarid = tv.id ";
         $sql .= "LEFT JOIN " . $modx->getFullTableName('site_tmplvar_contentvalues') . " tvc ON tvc.tmplvarid = tv.id ";
         $sql .= "LEFT JOIN " . $modx->getFullTableName('site_content') . " sc ON sc.id = tvc.contentid ";
         $sql .= "WHERE sc.published = 1 AND tvtpl.templateid = sc.template AND tv.name = '$overrideTV'";
         $results = $modx->dbQuery($sql);
         while ($row = $modx->fetchRow($results))
         {
            $overrideAlias = $modx->aliasListing[$row['id']]['alias'];
            $overridePath = $modx->aliasListing[$row['id']]['path'];
            switch ($row['value'])
            {
               case 0:
                  $o = preg_replace("#((href|action)=\"($baseUrl)?($overridePath)?|$myDomain$baseUrl$overridePath/?)$overrideAlias$furlSuffix#", '${1}' . $overrideAlias, $o);
                  break;
               case 2:
                  $o = preg_replace("#((href|action)=\"($baseUrl)?($overridePath)?|$myDomain$baseUrl$overridePath/?)$overrideAlias$furlSuffix/?#", '${1}' . rtrim($overrideAlias, '/') . '/', $o);
                  break;
            }
         }
      }

      if ($makeFolders)
      {
         // Replace container links
         foreach ($modx->documentListing as $id)
         {
            if (count($modx->getChildIds($id, 1)))
            {
                  $overrideAlias = $modx->aliasListing[$id]['alias'];
                  $overridePath = $modx->aliasListing[$id]['path'];
                  $o = preg_replace("#((href|action)=\"($baseUrl)?($overridePath)?|$myDomain$baseUrl$overridePath/?)$overrideAlias$furlSuffix/?#", '${1}' . rtrim($overrideAlias, '/') . '/', $o);
            }
         }
      }
   }
}
« Last Edit: Feb 26, 2007, 03:17 PM by JeremyL »

Boby
Full Member
« Reply #1 on: Sep 22, 2006, 02:33 AM »

Looks interesting, but I think this should be optional:
Quote
If a page is marked as a folder in the modx system, it should be only seen as /dir/ and not /dir.html (can turn off)

Think about documents with comments: they need to be marked as folders because comments are stored as child documents.

All the rest seems pretty cool ;)

...my Photo Gallery on Flickr...

PaulGregory
MODx's midnight runner
Committed to MODx
« Reply #2 on: Sep 22, 2006, 06:18 AM »

@Boby: That's probably covered by the "(can turn off)" bit.

@JeremyL: Please remind people that they need to check the "OnWebPageInit" box in the System Events tab.

I haven't changed the .htaccess - as far as I can comprehend, this is only needed for the "Redirect from non domain.com url to www.domain.com url" bit, which I don't need.

I can confirm the main features work as advertised, and I have identified 2 additional features that could be considered plus points.

1) If you switch Friendly URLs off, it redirects /page.html and /page to /index.php?id=48 - so essentially, this enforces the options that have been selected.
2) Visiting www.domain.com redirects to www.domain.com/index.html - a good and useful anti-duplication feature.

And there are 2 negative results - although to be fair these are both predictable from the above post information.

3) I have a page with the alias feed.xml that I want to refer to as feed.xml rather than feed.xml.html; obviously this plugin redirects it. I imagine I could either hack the plugin to truncate $strictURL before any second "." to solve that, or (possibly safer) replace .xml.html with .xml and .css.html with .css. If this is something other people will want to do, it would be worth including in the main plugin.

4) This broke my form. I assume this is because anything extra in the URL is considered "wrong", which is the same as the BUG in the ToDo list. This is therefore a fairly major oversight and really needs resolving ASAP. I think you need to do two things:
i) collect all the &x=y values, miss out id=, and add them back onto the $newURL.
ii) strip the &x=y bits from $s before running the comparison.
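The two steps above could look roughly like this. It is a sketch that assumes the request URI and the strict URL are plain strings; `redirectTarget()` is a hypothetical helper, and it drops `q=` as well as `id=`, as the plugin code in the first post does:

```php
<?php
// Hypothetical helper: returns the URL to 301 to, or null if no redirect is needed.
function redirectTarget(string $requestUri, string $strictUrl): ?string
{
    $parts = explode('?', $requestUri, 2);
    $path  = $parts[0];          // ii) compare the path only
    $query = $parts[1] ?? '';

    parse_str($query, $params);  // i) collect the x=y pairs...
    unset($params['id'], $params['q']);  // ...leaving out id= (and q=)
    $qstring = http_build_query($params);

    if ($path === $strictUrl) {
        return null;             // already strict; GET values like ?start=2 survive
    }
    return $qstring !== '' ? $strictUrl . '?' . $qstring : $strictUrl;
}
```

One caveat: `parse_str()` turns dots and spaces in parameter names into underscores, so a regex-based strip is safer if such names matter.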

But it's a good start, and neither obstacle is insurmountable.

EDIT: Have reread the bug text "BUG: Make blog.html?start=2 look like blog2.html. Currently blog.html?start=2 does not work." I disagree. Getting blog2.html to load the blog page and pass it start=2 is a massively complicated thing to do and is a feature not a bug fix. Getting blog.html?start=2 to work is the priority, as per 4) above.

EDIT 2: To clarify, this works fine with POST requests, but obviously GET requests are killed.
« Last Edit: Sep 22, 2006, 06:33 AM by PaulGregory »

No, I don't know what OpenGeek's saying half the time either.
MODx Documentation: The Wiki | My Wiki contributions | Main MODx Documentation
Forum: Where to post threads about add-ons | Forum Rules
Like MODx? donate (and/or share your resources)
Like me? See my Amazon wishlist
MODx "Most Promising CMS" - so appropriate!
JeremyL
Full Member
« Reply #3 on: Sep 22, 2006, 09:00 AM »

@PaulGregory - What version are you testing on?


Quote
Think about documents with comments, they need to be marked as directory because comments are stored in as child document.

Yea, that's the reason I was going to add the "can turn off" bit: I knew some people wouldn't want this. But remember that many people will try to link to the directory a page is in. So if the blog pages are in /blog/, they will link there out of habit. It will 301 redirect to /blog.html, but I would personally rather have a blog post look like an index page in a folder than have people linking to the wrong place.

Quote
need to check the "OnWebPageInit" box in the System Events tab.

Done

Quote
1) If you switch Friendly URLs off, it redirects /page.html and /page to /index.php?id=48 - so essentially, this enforces the options that have been selected.
2) Visiting www.domain.com redirects to www.domain.com/index.html - a good and useful anti-duplication feature.

1) Great idea. I'll add it to the list. Not sure if a 404 will be thrown up first and prevent the feature though. Will take some testing.
2) I was actually doing it the opposite way: "# 301 Redirect /index, /index.html to root folder / (no page in url)". Search engines see www.domain.com/ as the root of the whole website, and they see /index.html as a different document. In my opinion it would be a bad idea to 301 redirect the root of the website to a document. Instead, /index.html will redirect back to the root www.domain.com/.

I'll try and come up with some good solutions for #3 & #4 when I get home this afternoon from work.
« Last Edit: Sep 22, 2006, 09:08 AM by JeremyL »

PaulGregory
MODx's midnight runner
Committed to MODx
« Reply #4 on: Sep 22, 2006, 10:15 AM »

The one I tested it on happened to be 0.9.5 rev 1392.

But you misunderstand. 1 & 2 are not feature requests, they are features that I have discovered already work whilst testing.

So you can take out "If you switch Friendly URLs off, it redirects /page.html and /page to /index.php?id=48 - this enforces the options that have been selected." from the ToDo list and move it to the features list.

Quote
2) I was actually doing it the opposite way "# 301 Redirect /index, /index.html  to root folder / (no page in url)". SE's see www.domain.com/ as the root of the whole website. They see /index.html as a different document. It would be a bad idea to 301 redirect the root of the website to a document in my opinion. Instead /index.html will redirect back to the root www.domain.com/.

Ah, I see that on your to-do now. But whatever your future intention, the current situation is that it goes domain.com > domain.com/index.html.

To make the site_home page go the other way, you'd probably have to build the target strict URL of the site_home, and change the headers accordingly. You have to be careful here as it's easy to end up with an infinite loop; indeed in testing a possible solution my browser detected the infinite loop and no longer follows 301 redirects for my test domain. I'm hoping a reboot will solve that but in the meantime have reverted the change I made.

--

I got 3 to work fine for me just adding two simple replaces after the target URL is built:
Code:
   $strictURL = $modx->makeUrl($modx->documentIdentifier);
   $strictURL = str_replace(".css.html",".css",$strictURL);
   $strictURL = str_replace(".xml.html",".xml",$strictURL);
It's not pretty, but on further consideration it's probably faster than any generically useful approach. So it is probably worth including this hack as a tip in the documentation, but not including it in the actual plugin.
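If more than two extensions ever need this treatment, the same hack can be written as one regular expression over a whitelist. This is only a sketch; `stripDoubleSuffix()` and its default whitelist are hypothetical, and it gives up a little of the speed mentioned above in exchange for not editing code per extension:

```php
<?php
// Hypothetical helper: drop the FURL suffix when the alias already ends in a
// whitelisted extension, e.g. /feed.xml.html -> /feed.xml.
function stripDoubleSuffix(string $url, string $suffix = '.html', array $exts = ['css', 'xml']): string
{
    $quoted  = array_map(function ($e) { return preg_quote($e, '/'); }, $exts);
    $pattern = '/\.(' . implode('|', $quoted) . ')' . preg_quote($suffix, '/') . '$/';
    return preg_replace($pattern, '.$1', $url);
}
```

Because only listed extensions are touched, dotted aliases such as script.aculo.us.html keep their suffix.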

Incidentally, it's probably worth pointing out that I hardwire references to feed.xml and stylesheet.css where I use them because [~11~] adds the .html (which I don't want)!

Finally,
Quote
# 301 Redirect /folder to /folder/ to mimic apache
That too should be optional, I'd prefer /news to /news/

JeremyL
Full Member
« Reply #5 on: Sep 22, 2006, 10:53 AM »

Quote
But you misunderstand. 1 & 2 are not feature requests, they are features that I have discovered already work whilst testing.

That's what I get for being in a hurry this morning and skimming the post. :D

Quote
Incidentally, it's probably worth pointing out that I hardwire references to feed.xml and stylesheet.css where I use them because [~11~] adds the .html (which I don't want)!

Yea, that's what I need to think about. I actually consider this a bug in the FURL logic. Maybe a bug report on 0.9.5 might get it fixed before launch. I need to study what non html pages are being rewritten and what extensions are being forced on them.

Quote
To make the site_home page go the other way, you'd probably have to build the target strict URL of the site_home, and change the headers accordingly. You have to be careful here as it's easy to end up with an infinite loop; indeed in testing a possible solution my browser detected the infinite loop and no longer follows 301 redirects for my test domain. I'm hoping a reboot will solve that but in the meantime have reverted the change I made.

I was actually thinking of a simple conditional statement to check and see if the URI was www.domain.com/index* and doing the redirect and if it was domain.com/ then the main plugin code would be skipped.

*Deleted an unfinished sentence. Not sure how that happened, and I can't remember what I was saying. Oh well.*
« Last Edit: Sep 22, 2006, 11:33 AM by JeremyL »

PaulGregory
MODx's midnight runner
Committed to MODx
« Reply #6 on: Sep 22, 2006, 11:21 AM »

I assume the rest of your post will come along in a moment.

MODx's FURL system - it just needs an option of "Don't add default extension to aliases with a . in them". I might see if I can suggest an update to the code before 0.9.5 hits. It has to be an option because previous versions have promoted the use of . in aliases (one page is called script.aculo.us.html, for example). Actually, I'd vote in favour of incorporating your plugin features into MODx. They really help MODx live up to the claim of SEO CMS.
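A rough sketch of the proposed option, assuming a boolean setting that is off by default so existing dotted aliases keep their suffix. `furl()` and `$skipDotted` are hypothetical names, not MODx's actual URL builder:

```php
<?php
// Hypothetical FURL builder: with $skipDotted on, an alias that already
// contains a "." (feed.xml) gets no default suffix. Off by default, because
// older sites use dotted aliases such as script.aculo.us that expect ".html".
function furl(string $alias, string $suffix, bool $skipDotted = false): string
{
    if ($skipDotted && strpos($alias, '.') !== false) {
        return $alias;
    }
    return $alias . $suffix;
}
```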

Quote
I was actually thinking of a simple conditional statement to check and see if the URI was www.domain.com/index* and doing the redirect and if it was domain.com/ then the main plugin code would be skipped. I think that
Well, you can't literally check for index.* because that might not be the name of the home page. That's why I tried building a comparison string based on $modx->makeUrl(1), although again the home page might not be 1; I just couldn't find the proper variable for site_home. I think this line will work; I just got something simple wrong and can't do any more tests for a bit.

JeremyL
Full Member
« Reply #7 on: Sep 22, 2006, 11:36 AM »

Scratch this post, i found it.
« Last Edit: Sep 22, 2006, 11:48 AM by JeremyL »

OpenGeek
MODx Co-Founder
Foundation
« Reply #8 on: Sep 22, 2006, 11:46 AM »

The configuration setting is called site_start and is available programmatically as $modx->config['site_start'] or via templates as [(site_start)]
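Using site_start rather than a hard-coded ID or alias, the home-page check might look like this. It is a self-contained sketch: `$config` stands in for `$modx->config`, and `startPageRedirect()` is a hypothetical helper. Redirecting only when the requested URL differs from site_url is what keeps the 301 from looping:

```php
<?php
// Hypothetical start-page check. Only site_start and site_url are read.
// Returns the URL to 301 to, or null when no redirect is needed.
function startPageRedirect(array $config, int $docId, string $requestedUrl): ?string
{
    if ($docId !== (int) $config['site_start']) {
        return null;                  // not the home page: normal rules apply
    }
    // Comparing against the final target means the redirected request
    // (which is exactly site_url) is left alone, so there is no loop.
    return $requestedUrl === $config['site_url'] ? null : $config['site_url'];
}
```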

Jason Coward
MODx Co-Founder
xPDO Founder
CTO @ Collabpad
work productively.
work intelligently.
work together.
Light is just a vibration of a note too. Everything is. You've got to keep that in mind.
  Frank Zappa
JeremyL
Full Member
« Reply #9 on: Sep 22, 2006, 12:16 PM »

I updated the code. It now redirects to www.domain.com/ and not /index.html.

The internal links that MODx seems to be writing point to /index.html. I now need to add something to parse the document so that internal links point to the right URL (domain.com).

I think I'll need to brainstorm and study some code to figure out where these urls in the menus are written and where that can be rewritten. Any recommendations are welcome.
« Last Edit: Sep 22, 2006, 12:29 PM by JeremyL »

JeremyL
Full Member
« Reply #10 on: Sep 22, 2006, 12:47 PM »

OK, I have been trying every event I could think of that might work to get the menu to display www.domain.com/. The code below is what I'm using to run the baseline test. Does anyone have a better idea of where I could either intercept the URL writing in the menus or rewrite it after it has already been written?

Code:
$e = &$modx->Event;

switch ($e->name)
{
  case "OnWebPageComplete":
    $o = &$modx->documentOutput; // get a reference of the output
    $o = str_replace("http://www.domain.com/index.html","http://www.domain.com",$o);
    break;
  default :
    return; // stop here - this is very important.
    break;
}
« Last Edit: Sep 22, 2006, 12:59 PM by JeremyL »

PaulGregory
MODx's midnight runner
Committed to MODx
« Reply #11 on: Sep 22, 2006, 12:54 PM »

Ah, but the link in the html is most likely to /index.html, rather than the full thing.
It will be easiest to add this code in at the point that a FURL is created, as there we have the document ID number that can be compared to site_home.

OpenGeek
MODx Co-Founder
Foundation
« Reply #12 on: Sep 22, 2006, 01:09 PM »

The URL rewriting is done towards the end of the parsing process, and the only event you could use to alter the URLs is OnWebPagePrerender. But they would already be rewritten at that point, so you'd have to replace the already rewritten URLs.
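Since the links are already rewritten by the time OnWebPagePrerender fires, the fix has to be a search-and-replace on the output. A minimal sketch, assuming relative href/action attributes such as /index.html; `rewriteStartLinks()` is hypothetical, and inside a plugin `$html` would be `$modx->documentOutput`:

```php
<?php
// Hypothetical post-render rewrite: links to the start page become the base URL.
// Only exact href="..."/action="..." matches change; /index.html.bak is left alone.
function rewriteStartLinks(string $html, string $baseUrl, string $startAlias, string $suffix): string
{
    $target = preg_quote($baseUrl . $startAlias . $suffix, '#');
    return preg_replace('#((?:href|action)=")' . $target . '"#', '${1}' . $baseUrl . '"', $html);
}
```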

JeremyL
Full Member
« Reply #13 on: Sep 22, 2006, 01:49 PM »

Quote
The url rewriting is done towards the end of the parsing process, and the only event you could use to alter them is OnWebPagePrerender.  But they would already rewritten at that point, so you'd have to replace the already rewritten URLs

The problem is that they are being written for the index page to www.domain.com/index.html.

I need that one link written to www.domain.com (no page reference).

I figured out why my code wasn't working, though. I was searching for the full domain/URL, but all that was in the href attributes was the page. I'll update the code later and then get some feedback. I think the solution may create future problems, but I need to have it tested.
« Last Edit: Sep 22, 2006, 01:58 PM by JeremyL »

JeremyL
Full Member
« Reply #14 on: Sep 22, 2006, 07:35 PM »

OK, I have updated the code again. I fixed the issue with $_GET URLs: the query string is now added back to the end of the strict URLs. I still want to end up making blog.html?start=2 show a static URL; I will figure out a structure soon, I hope.

Current Features
Quote
# 301 Redirect from /index.php?id=8 to /alias.html
# 301 Redirect from /page, /page/ to /page.html
# 301 Redirect from non domain.com url to www.domain.com url (requires .htaccess edit)
# If you switch Friendly URLs off, it redirects /page.html and /page to /index.php?id=48 - this enforces the options that have been selected.
# 301 Redirect /{site_start}, /{site_start}.html  to root folder / (no page in url)
# Menu links pointing to "www.domain.com/{start_page}" are changed to "www.domain.com" (Can turn off)
« Last Edit: Sep 22, 2006, 07:40 PM by JeremyL »

Copyright © 2005-2008 MODxCMS, All rights reserved.