MODx Bug/Feature Tracker and Feature Requests
FS#874: Alias not saved if the character encoding is set to anything other than UTF-8 (bug in save_content.processor.php)
Attached to Project: MODx
Opened by Jelle Jager (TobyL) - Thursday, 24 May 2007, 12:21AM
Last edited by Jason Coward (opengeek) - Thursday, 14 June 2007, 02:08AM
| Task Type | Bug Report |
|---|---|
| Category | Core Distribution |
| Status | Requires testing |
| Assigned To | Jason Coward (opengeek), Jelle Jager (TobyL) |
| Operating System | All |

| Severity | Critical |
|---|---|
| Priority | Urgent |
| Reported Version | 0.9.6-RC3 |
| Due in Version | 0.9.6.1 |
| Due Date | Undecided |
| Percent Complete | 90% |
Details
In stripAlias() (line 570) the test for the character set is missing its { } braces. If the test fails (i.e. the character set is not UTF-8), $replace_array is never initialized, so the call strtr($alias, $replace_array); raises a PHP warning.
Committed correction in branches/0.9.6/ @ 2758
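The failure mode described here can be sketched as follows. This is a hypothetical reconstruction, not the actual source of save_content.processor.php: the function names and the small replacement table are made up for illustration, and the assumption that the brace-less test ended up governing the $replace_array assignment is one plausible reading of the report.

```php
<?php
// Hypothetical reconstruction: names and the replacement table are
// illustrative, not the real contents of save_content.processor.php.

// Buggy shape: a brace-less if governs only the next statement. With the
// statement it was written for commented out, the next statement is the
// assignment of the replacement table.
function buildReplaceArrayBuggy($charset) {
    if (strtoupper($charset) == 'UTF-8')
        // $alias = utf8_decode($alias);
        $replace_array = array('&' => 'and', "'" => '');
    // For any other charset $replace_array was never assigned.
    return isset($replace_array) ? $replace_array : null;
}

// Fixed shape: the table is always initialized before strtr() uses it.
function stripAliasFixed($alias) {
    $replace_array = array('&' => 'and', "'" => '');
    return strtr($alias, $replace_array);
}

var_dump(buildReplaceArrayBuggy('UTF-8') !== null);      // bool(true)
var_dump(buildReplaceArrayBuggy('ISO-8859-1') !== null); // bool(false)
echo stripAliasFixed('rock&roll'), "\n";                 // rockandroll
```

The buggy helper only reports whether the table was set, so the sketch runs cleanly on any PHP version instead of triggering the warning itself.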
- Field changed: Status (Requires testing → Researching)
- Field changed: Due in Version (0.9.6 → 0.9.6.1)
- Field changed: Priority (Normal → Urgent)
- Field changed: Percent Complete (100% → 60%)
- Task reassigned to Jelle Jager (TobyL), Jason Coward (opengeek)
This is apparently not fixed properly in 0.9.6 final; it appears to work only with UTF-8 encodings, and the alias is still not saved when latin1 or other encodings are used. See http://modxcms.com/forums/index.php/topic,15170.msg99113.html#msg99113 for more information.
- Field changed: Status (Researching → Requires testing)
- Field changed: Percent Complete (60% → 90%)
The previous fix was too hasty (apologies). I have now removed the UTF-8 test altogether. The test belonged to a commented-out utf_decode statement and is essentially unnecessary.
Committed in /branches/0.9.6/ @2776
I don't know if it helps, but for reference, here are the related forum threads:
http://modxcms.com/forums/index.php/topic,16292.msg104080.html#msg104080
http://modxcms.com/forums/index.php/topic,18371.0.html
Here is what I did:
As TobyL said, no $replace_array is initialized for use with the strtr() function if the character set is not UTF-8.
So, at line 859 of save_content.processor.php, I added an "else" condition with a $replace_array initialized with all the ISO French special characters and some others.
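As a rough idea of what such an "else" table could contain, here is a minimal sketch. The poster's actual list is not shown in the thread, so the function name and the selection of characters below are assumptions; the keys are written as ISO-8859-1 byte escapes so the example does not depend on the encoding of the source file.

```php
<?php
// Hypothetical table: the poster's real list is not shown in the thread.
// Keys are single ISO-8859-1 bytes, values are their ASCII stand-ins.
function isoReplaceArray() {
    return array(
        "\xE0" => 'a', // a with grave
        "\xE2" => 'a', // a with circumflex
        "\xE7" => 'c', // c with cedilla
        "\xE8" => 'e', // e with grave
        "\xE9" => 'e', // e with acute
        "\xEA" => 'e', // e with circumflex
        "\xEE" => 'i', // i with circumflex
        "\xF4" => 'o', // o with circumflex
        "\xF9" => 'u', // u with grave
        "\xFB" => 'u', // u with circumflex
    );
}

// "cafe-francais" with accented e and cedilla c, as ISO-8859-1 bytes
$alias = "caf\xE9-fran\xE7ais";
echo strtr($alias, isoReplaceArray()), "\n"; // cafe-francais
```

A table like this only covers the characters someone thought to list, which is the maintenance cost raised later in the thread.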
--
function stripAlias($alias) {
    global $modx;
    $charset = strtoupper($modx->config['modx_charset']);
    $alias = strip_tags($alias);
    if ($charset == 'UTF-8') {
        // The following requires PHP 4.4.0 or 5.1.0 or later
        // Maybe some more codes should be added... but this
        // essentially removes unicode chars not converted by htmlentities
        $alias = @preg_replace('/[\p{Po}\p{Sm}\p{M}]+/', '', $alias);
    }
    if (function_exists('iconv')) {
        $alias = iconv($charset, 'ASCII//TRANSLIT', $alias);
    } else {
        $alias = htmlentities($alias, ENT_QUOTES, $charset);
        $alias = preg_replace('/&([a-z])[a-z]+;/i', '$1', $alias);
    }
    $alias = preg_replace('/[^\w\d%-]/', '', $alias);
    $alias = trim($alias, '-');
    return $alias;
}
--
I haven't tested it much, though...
--
function stripAlias($alias) {
    global $modx;
    $charset = strtoupper($modx->config['modx_charset']);
    $alias = strip_tags($alias);
    if (function_exists('iconv')) {
        $alias = iconv($charset, 'ASCII//TRANSLIT', $alias);
    } else {
        $alias = htmlentities($alias, ENT_QUOTES, $charset);
        $alias = preg_replace('/&([a-z])[a-z]+;/i', '$1', $alias);
        $alias = preg_replace('/[\xC0-\xF7]{1}[\x80-\xBF]/i', '', $alias);
    }
    $alias = preg_replace('/[^\w\d%-]/', '', $alias);
    $alias = trim($alias, '-');
    return $alias;
}
--
In the conditional statement, if iconv is installed, it is used to transliterate to ASCII, which gives good conversions, e.g. æ to ae. If iconv isn't installed, the code first converts with htmlentities and keeps only the first letter after the '&'; it then removes any multibyte characters that remain (htmlentities doesn't convert all of them). The final preg_replace and trim lines are basically from the current version, though they aren't identical.
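To make the fallback branch concrete, here is a small standalone sketch of just the htmlentities step. The helper name and the sample strings are assumptions for illustration; the inputs are given as ISO-8859-1 bytes. The iconv branch is left out on purpose, since its transliteration output can vary between platforms.

```php
<?php
// Illustration only: isolates the two fallback lines from stripAlias().
// The helper name and sample inputs are assumptions, not thread content.
function entityFallback($alias, $charset) {
    // Turn accented characters into named entities, e.g. &eacute;
    $alias = htmlentities($alias, ENT_QUOTES, $charset);
    // Keep only the first letter of each named entity
    return preg_replace('/&([a-z])[a-z]+;/i', '$1', $alias);
}

// Accented "creme brulee" as ISO-8859-1 bytes
echo entityFallback("cr\xE8me br\xFBl\xE9e", 'ISO-8859-1'), "\n"; // creme brulee
// The ae ligature becomes &aelig; and is reduced to a single "a"
echo entityFallback("\xE6ther", 'ISO-8859-1'), "\n";               // ather
```

The second call shows the difference from the iconv branch: the entity route keeps only one letter of a ligature, so æ becomes "a" rather than "ae".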
I don't think that having large conversion arrays for every possible language is a feasible solution.
Using something like this: /[^\.%\p{L}\p{N} _-]/u
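A minimal sketch of how the suggested pattern might be applied, assuming the alias is already valid UTF-8. The function name and sample input are hypothetical. With the /u modifier, preg_replace() returns null for input that is not valid UTF-8, so the sketch falls back to the unmodified string in that case.

```php
<?php
// Hypothetical helper built around the pattern suggested above.
// Assumes $alias is UTF-8; letters and digits from any script are kept.
function stripAliasUnicode($alias) {
    $alias = strip_tags($alias);
    // Drop everything except dots, percent signs, letters, numbers,
    // spaces, underscores and hyphens.
    $clean = preg_replace('/[^\.%\p{L}\p{N} _-]/u', '', $alias);
    if ($clean === null) {
        // Invalid UTF-8: leave the alias untouched rather than lose it.
        $clean = $alias;
    }
    return trim($clean, '-');
}

// Input is accented "Creme brulee: 100% <b>bio</b>!" written as UTF-8 bytes;
// the accented letters survive, the tags, colon and exclamation mark do not.
echo stripAliasUnicode("Cr\xC3\xA8me br\xC3\xBBl\xC3\xA9e: 100% <b>bio</b>!"), "\n";
```

Unlike a per-language replacement table, this keeps accented letters in the alias instead of converting them to ASCII, so whether it fits depends on whether non-ASCII aliases are acceptable.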