MODx Bug/Feature Tracker and Feature Requests

FS#874 - Alias not saved if character encoding is set to anything other than UTF-8 (bug in save_content.processor.php)

Attached to Project — MODx
Opened by Jelle Jager (TobyL) - Thursday, 24 May 2007, 12:21AM
Last edited by Jason Coward (opengeek) - Thursday, 14 June 2007, 02:08AM
Task Type Bug Report
Category Core Distribution
Status Requires testing
Assigned To Jason Coward (opengeek), Jelle Jager (TobyL)
Operating System All
Severity Critical
Priority Urgent
Reported Version 0.9.6-RC3
Due in Version 0.9.6.1
Due Date Undecided
Percent Complete 90%

Details

In stripAlias() (line 570) the test for the character set is missing its { } braces. If the test fails (i.e. the character set is not UTF-8), $replace_array is never initialized, so the call strtr($alias, $replace_array); triggers a PHP warning.
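
The failure mode described above can be sketched roughly as follows. This is a hypothetical reconstruction, since the actual stripAlias() body is not quoted in this report; the function name stripAliasGuarded and the array entries are assumptions made for illustration only.

```php
<?php
// Hypothetical reconstruction: the real stripAlias() body is not quoted in this
// report, and the array entries below are illustrative only.
//
// Buggy shape (no braces, so only the assignment is conditional):
//     if ($charset == 'UTF-8')
//         $replace_array = array(...);
//     $alias = strtr($alias, $replace_array); // undefined array for other charsets

function stripAliasGuarded($alias, $charset) {
    // Braces keep the table and the strtr() call that needs it together.
    if (strtoupper($charset) == 'UTF-8') {
        // Keys are the UTF-8 byte sequences for e-acute and u-umlaut.
        $replace_array = array("\xC3\xA9" => 'e', "\xC3\xBC" => 'u');
        $alias = strtr($alias, $replace_array);
    }
    return $alias;
}
```

With the braces in place, a non-UTF-8 alias passes through this step untouched instead of reaching strtr() with an undefined array.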

Committed correction in branches/0.9.6/ @ 2758

Comment by Jason Coward (opengeek) - Thursday, 31 May 2007, 11:01AM
  • Field changed: Status (Requires testing → Researching)
  • Field changed: Due in Version (0.9.6 → 0.9.6.1)
  • Field changed: Priority (Normal → Urgent)
  • Field changed: Percent Complete (100% → 60%)
  • Task reassigned to Jelle Jager (TobyL), Jason Coward (opengeek)
This is apparently not fixed properly in 0.9.6 final; it appears to only be working with UTF-8 encodings, and is still not saving the alias when latin1 or other encodings are employed. See http://modxcms.com/forums/index.php/topic,15170.msg99113.html#msg99113 for more information.

Comment by Jelle Jager (TobyL) - Monday, 04 June 2007, 11:23AM
  • Field changed: Status (Researching → Requires testing)
  • Field changed: Percent Complete (60% → 90%)
The previous fix was too hasty (apologies). I have now removed the UTF-8 test altogether. The test belonged to a commented-out utf_decode statement and is essentially unnecessary.

Committed in /branches/0.9.6/ @2776

Comment by Jason Coward (opengeek) - Thursday, 14 June 2007, 02:08AM
This still wasn't working for me on latin1 encodings; my solution is in /branches/0.9.6 @2782/3

Comment by David Molliere (davidm) - Tuesday, 18 September 2007, 03:00AM
I hadn't seen this bug report, but Laurentc published a fix for the Latin1 issue (which works ONLY for Latin1).

Don't know if it helps, but for reference it's there:
http://modxcms.com/forums/index.php/topic,16292.msg104080.html#msg104080


Comment by Laurent (laurentc) - Friday, 21 September 2007, 02:49AM
I think the problem has been corrected, but at David's request, and with reference to this post:
http://modxcms.com/forums/index.php/topic,18371.0.html

here is what I did:

As TobyL said, no $replace_array is initialized for the strtr() function when the character set is not UTF-8.
So, at line 859 of save_content.processor.php, I added an "else" branch with a $replace_array initialized with all the ISO French special characters and some others.
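
A minimal sketch of what such an "else" table might look like, assuming ISO-8859-1 input. This is not the content of the attached file; the function name replaceLatin1Accents and the choice of entries are assumptions for illustration.

```php
<?php
// Illustrative only: not the contents of the attached save_content.processor.php.
function replaceLatin1Accents($alias) {
    // Keys are single ISO-8859-1 bytes, written as escapes so that the
    // encoding of this source file does not matter.
    $replace_array = array(
        "\xE0" => 'a', "\xE2" => 'a',                               // a-grave, a-circumflex
        "\xE7" => 'c',                                              // c-cedilla
        "\xE8" => 'e', "\xE9" => 'e', "\xEA" => 'e', "\xEB" => 'e', // e variants
        "\xEE" => 'i', "\xEF" => 'i',                               // i-circumflex, i-diaeresis
        "\xF4" => 'o',                                              // o-circumflex
        "\xF9" => 'u', "\xFB" => 'u', "\xFC" => 'u',                // u variants
    );
    return strtr($alias, $replace_array);
}
```

A table like this only covers the one encoding it was written for, which matches the note above that the fix works only for Latin1.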


  save_content.processor.php
Comment by Jason Coward (opengeek) - Wednesday, 24 October 2007, 12:19PM
It seems to me that we would have to have a file in the proper encoding with an array of each encoding's special characters with appropriate replacements for every encoding to fully resolve this issue. Can someone think of a practical way to accomplish this? Or an alternative approach that might work across encodings?

Comment by Carl Holmberg (carlholmberg) - Wednesday, 31 October 2007, 02:36PM
Could something like this maybe work?
--
function stripAlias($alias) {
    global $modx;

    $charset = strtoupper($modx->config['modx_charset']);
    $alias = strip_tags($alias);

    if ($charset == 'UTF-8') {
        // The following requires PHP 4.4.0 or 5.1.0 or later
        // Maybe some more codes should be added... but this
        // essentially removes unicode chars not converted by htmlentities
        $alias = @preg_replace('/[\p{Po}\p{Sm}\p{M}]+/', '', $alias);
    }

    if (function_exists('iconv')) {
        $alias = iconv($charset, 'ASCII//TRANSLIT', $alias);
    } else {
        $alias = htmlentities($alias, ENT_QUOTES, $charset);
        $alias = preg_replace('/&([a-z])[a-z]+;/i', '$1', $alias);
    }

    $alias = preg_replace('/[^\w\d%-]/', '', $alias);
    $alias = trim($alias, '-');
    return $alias;
}
--
I haven't tested that much though...

Comment by Carl Holmberg (carlholmberg) - Thursday, 01 November 2007, 09:22AM
Sorry for the previous – I have a better version now...

--
function stripAlias($alias) {
    global $modx;

    $charset = strtoupper($modx->config['modx_charset']);
    $alias = strip_tags($alias);

    if (function_exists('iconv')) {
        $alias = iconv($charset, 'ASCII//TRANSLIT', $alias);
    } else {
        $alias = htmlentities($alias, ENT_QUOTES, $charset);
        $alias = preg_replace('/&([a-z])[a-z]+;/i', '$1', $alias);
        $alias = preg_replace('/[\xC0-\xF7]{1}[\x80-\xBF]/i', '', $alias);
    }

    $alias = preg_replace('/[^\w\d%-]/', '', $alias);
    $alias = trim($alias, '-');
    return $alias;
}
--
In the conditional statement, if iconv is installed it is used for the transliteration to ASCII, which gives a good conversion of e.g. æ to ae. If iconv isn't installed, the function first converts with htmlentities and keeps only the first character after '&'; after that it removes any multibyte characters that remain (htmlentities doesn't convert all of them). The last lines with preg_replace and trim are basically from the current version, though they aren't identical.
I don't think that having large conversion arrays for every possible language is a feasible solution.
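
The htmlentities fallback described above can be tried in isolation with a small helper. The name entityFallback is hypothetical and not part of MODx; the two statements inside are the ones from the function above.

```php
<?php
// Hypothetical helper isolating the non-iconv branch of the function above.
function entityFallback($alias, $charset) {
    // Turn accented characters into named entities, e.g. e-acute -> &eacute;
    $alias = htmlentities($alias, ENT_QUOTES, $charset);
    // Keep only the first letter of each named entity: &eacute; -> e
    return preg_replace('/&([a-z])[a-z]+;/i', '$1', $alias);
}
```

Note that this branch keeps a single letter per entity, so æ (entity &aelig;) becomes "a" here rather than the "ae" mentioned for the iconv branch.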

Comment by Jason Coward (opengeek) - Sunday, 18 November 2007, 10:22AM
I don't understand how this solves the problem. It seems to me that this would just convert characters from whatever encoding to ASCII using transliteration. How would that be valid in a UTF-8 site?

Comment by Aidan Haskins (shuriken) - Saturday, 26 July 2008, 12:39AM
Wouldn't it be possible to just validate alpha+num characters across the whole charset?

Using something like this: /[^\.%\p{L}\p{N} _-]/u
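
A rough sketch of how that pattern might be applied, assuming the alias is valid UTF-8 (the /u modifier requires it). The function name stripAliasUnicode and the choice to return an empty string on invalid input are assumptions for illustration, not part of MODx.

```php
<?php
// Hypothetical wrapper around the pattern suggested above.
// Only meaningful for UTF-8 input, because of the /u modifier.
function stripAliasUnicode($alias) {
    $alias = strip_tags($alias);
    // Remove everything except dots, percent signs, letters, digits,
    // spaces, underscores and hyphens.
    $result = preg_replace('/[^\.%\p{L}\p{N} _-]/u', '', $alias);
    // preg_replace() returns null when the subject is not valid UTF-8.
    return $result === null ? '' : $result;
}
```

Because the pattern fails on byte strings that are not valid UTF-8, a site running latin1 would presumably need to convert the alias to UTF-8 before this step.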