Author Topic: HTMLPurifier?  (Read 9415 times)
tillda
Testers
*
Posts: 89



WWW
« on: Aug 18, 2006, 08:29 PM »

Does anyone have a free hour to implement an HTMLPurifier plugin? =)

Check: http://hp.jpsband.org/

The only problem is that, AFAIK, we don't yet have event ordering implemented, and this plugin is only useful when it really is the LAST plugin to run.
Logged

Ambush Commander
Jr. Member
*
Posts: 19


« Reply #1 on: Aug 20, 2006, 01:04 PM »

Well, I could try noodling around with Modx CMS, and see if I could bang out a plugin (I guess I ought to do it for all the major blogs/CMS out there to speed adoption).

Tracked you guys down from my referrer log. ;)
Logged
rthrash
Foundation
*
Posts: 8,791



WWW
« Reply #2 on: Aug 20, 2006, 03:58 PM »

Welcome Ambush! :D

Please feel free to join the fray! Your code looks really great. If we need to insert a new event to tag off of, I don't see why we couldn't manage to squeeze that one in. ;)
Logged

MODx is a framework that allows web professionals to turn over sites to end-users for daily maintenance without worrying. Community participation and questions are encouraged, especially when you help us help you, read the wiki, and review snippet parameters – even if you have to look at the source. Searching the forums helps, too.
Ryan Thrash
MODx Co-Founder
Principal @ Collabpad
work productively.
work intelligently.
work together.
Dr. Scotty Delicious
Coding Team
*
Posts: 1,131


Raise the Jolly Roger!


WWW
« Reply #3 on: Aug 21, 2006, 12:41 AM »

Quote
Well, I could try noodling around with Modx CMS, and see if I could bang out a plugin (I guess I ought to do it for all the major blogs/CMS out there to speed adoption).

Tracked you guys down from my referrer log. ;)

That would be awesome!
Thanks Ambush Commander!

-sD-
Logged

We pillage, we plunder, we rifle and loot. Drink up me 'earties, Yo Ho!
We kidnap and ravage and don't give a hoot. Drink up me 'earties, Yo Ho!
Yo Ho, Yo Ho! A pirate's life for me.

http://scottydelicious.com
http://piratemachine.org
http://treasurechestcart.com

My Amazon Wishlist Wink

Download WebLoginPE 1.3.0: A complete web user management snippet for MODx.
TreasureChest Documentation: http://www.treasurechestcart.com/documentation.html.
TreasureChest. Easy MODx eCommerce!
Ambush Commander
Jr. Member
*
Posts: 19


« Reply #4 on: Aug 21, 2006, 02:24 PM »

Okay, after wrangling with the download (it doesn't work in Firefox or Opera regardless of firewall; you may want to investigate that), I've got a copy of the package. However, the events that the documentation covers seem to apply only to filters that run during page serves. Now, while it's possible to hook in HTMLPurifier at that point, the library is not fast, and it would be better to run it on form submission (or, if you have a caching system that I don't know about, that works too). Any pointers?
Logged
doze
Coding Team
*
Posts: 2,907


....Boom!


« Reply #5 on: Aug 21, 2006, 03:01 PM »

Quote
..However, the events that the documentation covers seem to apply only to filters that run during page serves. Now, while it's possible to hook in HTMLPurifier at that point, the library is not fast, and it would be better to run it on form submission (or, if you have a caching system that I don't know about, that works too). Any pointers?

You can use the OnDocFormSave event to intercept document saving and alter the posted data. See the example in the thread "Re: Is it possible to change values when a document is saved in the manager?"

Also, you can see all the available events by creating a new plugin and viewing the "System Events" tab. I know the documentation isn't very detailed at this stage, but there is a reason for that. Development on this project has moved extremely fast, and the documentation simply has not kept pace. There is also a total rewrite going on behind the scenes, and you can imagine that there's not a lot of spare energy for writing documentation that will be outdated in the near future. With the next release there will be (along with other documentation) extensive developer documentation built straight from the source, so it's all being worked on. The future version uses the XPDO ORM layer (also built by a MODx core team member), which should give you some idea of where this project is going.

I don't know which event or events would be best for the HTMLPurifier plugin, but I know that OnDocFormSave is not the best, because at that point none of the chunks/snippets/etc. have returned their output yet. I guess OnCacheUpdate could be one place to "purify" cached pages. OnParseDocument would run on every page render, but as you say, that might be too much overhead. So maybe someone with more knowledge of the inner workings can give you a better answer.

Anyway, here's a list of system events from 0.9.2.1. Some new ones are coming in 0.9.5, and Ryan even said that a new event for this purpose could be squeezed in if needed :) (although I think there are already enough events to choose from).

Template Service Events

OnDocPublished
OnDocUnPublished
OnLoadWebDocument
OnParseDocument
OnWebPageInit
OnWebPagePrerender

Cache Service Events

OnBeforeCacheUpdate
OnBeforeSaveWebPageCache
OnCacheUpdate
OnLoadWebPageCache

Web Access Service Events

OnBeforeWebLogin
OnBeforeWebLogout
OnWebAuthentication
OnWebChangePassword
OnWebCreateGroup
OnWebDeleteUser
OnWebLogin
OnWebLogout
OnWebSaveUser

Manager Access Events
OnBeforeManagerLogin
OnBeforeManagerLogout
OnManagerAuthentication
OnManagerChangePassword
OnManagerCreateGroup
OnManagerDeleteUser
OnManagerLogin
OnManagerLogout
OnManagerPageInit
OnManagerSaveUser

Parser Service Events

OnFileManagerUpload
OnPageNotFound
OnPageUnauthorized
OnSiteRefresh

Chunks
OnBeforeChunkFormDelete
OnBeforeChunkFormSave
OnChunkFormDelete
OnChunkFormPrerender
OnChunkFormRender
OnChunkFormSave

Documents
OnBeforeDocFormDelete
OnBeforeDocFormSave
OnCreateDocGroup
OnDocFormDelete
OnDocFormPrerender
OnDocFormRender
OnDocFormSave

Modules
OnBeforeModFormDelete
OnBeforeModFormSave
OnModFormDelete
OnModFormPrerender
OnModFormRender
OnModFormSave

Plugins
OnBeforePluginFormDelete
OnBeforePluginFormSave
OnPluginFormDelete
OnPluginFormPrerender
OnPluginFormRender
OnPluginFormSave

RichText Editor
OnRichTextEditorInit
OnRichTextEditorRegister

Snippets

OnBeforeSnipFormDelete
OnBeforeSnipFormSave
OnSnipFormDelete
OnSnipFormPrerender
OnSnipFormRender
OnSnipFormSave

System Settings

OnFriendlyURLSettingsRender
OnInterfaceSettingsRender
OnMiscSettingsRender
OnSiteSettingsRender
OnUserSettingsRender

Template Variables
OnBeforeTVFormDelete
OnBeforeTVFormSave
OnTVFormDelete
OnTVFormPrerender
OnTVFormRender
OnTVFormSave

Templates

OnBeforeTempFormDelete
OnBeforeTempFormSave
OnTempFormDelete
OnTempFormPrerender
OnTempFormRender
OnTempFormSave

Users
OnBeforeUserFormDelete
OnBeforeUserFormSave
OnUserFormDelete
OnUserFormPrerender
OnUserFormRender
OnUserFormSave

Web Users
OnBeforeWUsrFormDelete
OnBeforeWUsrFormSave
OnWUsrFormDelete
OnWUsrFormPrerender
OnWUsrFormRender
OnWUsrFormSave
Logged

MODxWiki || Please, list wiki worthy material here!
Dr. Scotty Delicious
Coding Team
*
Posts: 1,131


Raise the Jolly Roger!


WWW
« Reply #6 on: Aug 21, 2006, 03:12 PM »

This would be perfect for Replix, which seems to insert <br /> tags as if they are going out of style!

-sD-
Logged

Ambush Commander
Jr. Member
*
Posts: 19


« Reply #7 on: Aug 21, 2006, 08:45 PM »

Quote
I don't know which event or events would be best for the HTMLPurifier plugin, but I know that OnDocFormSave is not the best, because at that point none of the chunks/snippets/etc. have returned their output yet.

That might be a good thing.  While I've tried to make HTMLPurifier as permissive as possible, there are certain HTML elements it will never support: FORM (and friends), OBJECT, EMBED, IFRAME, etc. Since snippets and chunks are highly trusted, we may want to let them bypass the filtering process. Their syntax is primarily compatible, although the ampersands may be a PITA to handle (they'll all get escaped).

What precisely is the expected user input, and what kinds of HTML do snippets and chunks use? If snippets/chunks need to bypass the filter, we'd want to put HTMLPurifier before them, but if their output is basically the same, we can put HTMLPurifier after, perhaps on the cache event.

Besides all that, I'm still not precisely sure how the plugin structure works (from what I gather, it's a snippet that's directly copypasted onto your index.php).
Logged
tillda
Testers
*
Posts: 89



WWW
« Reply #8 on: Aug 21, 2006, 10:31 PM »

Sorry if I've misunderstood... is there a speed problem with hooking it on OnWebPageInit?

It probably wasn't the best idea to hand this to someone outside the team. Creating a MODx plugin is really simple, though; check out this code for a Texy plugin (Texy is a Textile/Markdown alternative).

Code:
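// The event that triggered this plugin.
$e = &$modx->Event;

switch ($e->name) {
    case "OnWebPagePrerender":
        include_once($modx->config["base_path"].'/assets/plugins/texy/texy.php');
        // Plain "new": the old "&new" form is a parse error in PHP 7 and later.
        $texyengine = new Texy();
        // Run the rendered page through Texy and write the result back.
        $doc = $modx->documentOutput;
        $doc = $texyengine->process($doc);
        $modx->documentOutput = $doc;
        break;

    default: // any other event: do nothing
        return;
}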

Basically you create a case for events and modify $modx->documentOutput inside.
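
To make the pattern above easier to try outside MODx, here is a minimal sketch. FakeEvent, FakeModx and runPlugin are hypothetical stand-ins made up for this example (they are not MODx classes), and the filter is just any callable; inside a real plugin, $modx is supplied by MODx itself.

```php
<?php
// Hypothetical stand-ins for the objects a plugin sees; NOT the real MODx classes.
class FakeEvent {
    public $name;
    public function __construct($name) { $this->name = $name; }
}

class FakeModx {
    public $Event;
    public $documentOutput;
    public function __construct($eventName, $output) {
        $this->Event = new FakeEvent($eventName);
        $this->documentOutput = $output;
    }
}

// The plugin body wrapped in a function so it can run without MODx.
// $filter is whatever should transform the page (Texy, a purifier, ...).
function runPlugin($modx, callable $filter) {
    $e = $modx->Event;
    switch ($e->name) {
        case 'OnWebPagePrerender':
            // Read the rendered page, transform it, write it back.
            $modx->documentOutput = $filter($modx->documentOutput);
            break;
        default:
            // Any other event: leave the output untouched.
            return;
    }
}

// Usage: strtoupper stands in for a real filter.
$modx = new FakeModx('OnWebPagePrerender', '<p>hello</p>');
runPlugin($modx, 'strtoupper');
```

After the call, $modx->documentOutput holds the filtered page; with any other event name the output is left alone.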
Logged

Ambush Commander
Jr. Member
*
Posts: 19


« Reply #9 on: Aug 22, 2006, 02:20 PM »

Perhaps so, considering the state of the documentation. People are posting code willy nilly, but where precisely does it all go? The plugin directory? A new module? I don't see the word Plugin mentioned at all in the Content Manager, is it equivalent to module?

Sorry about my ignorance. It takes me a little while to grasp third-party applications, especially big ones. I've never seen anything like Modx before (and that, in a way, is a good thing ;-)
Logged
doze
Coding Team
*
Posts: 2,907


....Boom!


« Reply #10 on: Aug 22, 2006, 02:30 PM »

Log in to the manager, then go to Resources > Manage Resources > Plugins tab > New Plugin.
Logged

Ambush Commander
Jr. Member
*
Posts: 19


« Reply #11 on: Aug 22, 2006, 03:01 PM »

Hmm... this is quite frustrating.

I've tried this:

Code:
$e = &$modx->Event;

// Only on this event
if ($e->name == 'OnDocFormSave') {
    set_include_path('/Documents and Settings/Edward/My Documents/My Webs/htmlpurifier/library'
. PATH_SEPARATOR . get_include_path());
    include_once('HTMLPurifier.php');
    $purifier = new HTMLPurifier();
    $_POST['tvcontent'] = $purifier->purify($_POST['tvcontent']);
}

return $purifier;

But it doesn't seem to work. A good test of HTMLPurifier is assigning a lang attribute to one of the elements. It should be copied over to xml:lang as per XHTML compatibility guidelines. I'm not getting this behavior, so I have to assume that the plugin isn't working.

Plus, the thing itself is extremely hacky: what if input comes in from another vector? To be quite honest, I don't know what I should be doing. If someone else wants to take a stab at it, be my guest, but I am stumped.

A few details about my library for anyone who wants to step up to the plate: it's extremely easy to use. Add the directory containing the library files to your include path, include HTMLPurifier.php, instantiate an HTMLPurifier object, and then call purify() on whatever you need. The above code shows the theoretical flow pattern.

HTMLPurifier will remove anything that's not in its list of allowed elements; the notable exclusions are OBJECT, EMBED, IFRAME and FORM (I like to call these defective by design). It will also remove anything not in its list of allowed attributes, which means that any scripting added earlier will be stripped if you process too late.

It's not meant to process complete documents (because, of course, other parts of the document may need scripting, forms, etc.). So it shouldn't be run on complete pages.

Finally, the library currently only supports UTF-8. I am working to also allow other major charsets (notably iso-8859-1), but if you don't switch to UTF-8, expect some weird char encoding issues.
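
As a rough sketch of the two constraints above (fragments only, UTF-8 only), something like the following could sit in front of the purify() call. StubPurifier is a made-up stand-in so the example runs without the library, and its regex is only an illustration, not how HTMLPurifier works; isUtf8() and purifyFragment() are hypothetical helper names.

```php
<?php
// Hypothetical stand-in so the sketch runs without the library installed.
// With the real library on the include path, this would instead be:
//   include_once('HTMLPurifier.php');
//   $purifier = new HTMLPurifier();
class StubPurifier {
    public function purify($html) {
        // Crude illustration only: drop <form> blocks.
        return preg_replace('#<form\b.*?</form>#is', '', $html);
    }
}

// True when $s is valid UTF-8; PCRE's /u modifier fails on invalid byte sequences.
function isUtf8($s) {
    return preg_match('//u', $s) === 1;
}

// Purify one user-supplied fragment (not a whole page), refusing non-UTF-8 input.
function purifyFragment($purifier, $fragment) {
    if (!isUtf8($fragment)) {
        throw new InvalidArgumentException('Fragment must be UTF-8 encoded');
    }
    return $purifier->purify($fragment);
}

// Usage: only the untrusted fragment goes through the purifier.
$clean = purifyFragment(new StubPurifier(), '<p>ok</p><form action="/x"><input></form>');
```

Rejecting badly encoded input up front is one possible choice; converting it to UTF-8 first would be another.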
Logged
davidm
Marketing & Design Team
*
Posts: 6,423


The best way to predict the future is to invent it


WWW
« Reply #12 on: Sep 05, 2006, 10:58 AM »

*Bump*

Can anyone answer Ambush here?

I'd really LOVE to have the HTML purifier integrated with MODx!
Logged

blog.nodeo.net: For a free, modern and open web! :: | New! The modxcms.fr forums: help build the French-language MODx site! New!

MODx is the ideal tool for developers and web designers looking for a highly flexible, high-performance content management framework that is still easy for end users to pick up.

Config: Apache 2.2.8 - MySQL 5.0.45 - PHP 5.2.5 | Debian 4.0 (Etch)

Sites built with MODx: nodeo.net | gican.asso.fr | michelez-notaires.com | amadom.gerondicap.com | sworld.com | soleil.info
 and 3 more in progress :)
PaulGregory
MODx's midnight runner
Moderator
*
Posts: 1,095

MODx's midnight runner


WWW
« Reply #13 on: Sep 05, 2006, 11:26 AM »

Looks like he came very close with the last posting, but he seems to have
a) assumed that the content is called "tvcontent".
b) assumed that you can write to $_POST['tvcontent'] (which I've never seen done before).
Logged

No, I don't know what OpenGeek's saying half the time either.
MODx Documentation: The Wiki | My Wiki contributions | Main MODx Documentation
Forum: Where to post threads about add-ons | Forum Rules
Like MODx? donate (and/or share your resources)
Like me? See my Amazon wishlist
MODx "Most Promising CMS" - so appropriate!
PaulGregory
MODx's midnight runner
Moderator
*
Posts: 1,095

MODx's midnight runner


WWW
« Reply #14 on: Sep 05, 2006, 12:02 PM »

EDIT: Version 3, edited with fuller instructions, and actually tested and working:

Right. It's academic whether you can write to $_POST, as the content has already been transferred to $content by the time the plugin point comes around. While I don't actually have the HTML Purifier code (I can't get to the site), I have tested the basic principle of the plugin by creating a test snippet that tinkered with the content before saving. The following should therefore work.
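
A small sketch of why writing to $_POST has no effect at that point, assuming the save flow is as described above (value copied into $content first, plugin fired afterwards). simulateSave() and the 'field' key are made up for illustration; this is not the manager's actual code.

```php
<?php
// Hypothetical stand-in for the manager's save flow; names are invented.
function simulateSave($plugin) {
    global $content;
    $content = $_POST['field'];   // the posted value is copied first
    $plugin();                    // then the plugin event fires
    return $content;              // this is what would be saved
}

// A plugin that edits $_POST: too late, $content already holds a copy.
function pluginWritingToPost() {
    $_POST['field'] = 'cleaned';
}

// A plugin that edits the global $content: this is the value that gets saved.
function pluginWritingToGlobal() {
    global $content;
    $content = 'cleaned';
}

$_POST['field'] = '<form>raw</form>';
$savedViaPost = simulateSave('pluginWritingToPost');     // still '<form>raw</form>'

$_POST['field'] = '<form>raw</form>';
$savedViaGlobal = simulateSave('pluginWritingToGlobal'); // 'cleaned'
```

PHP assigns strings by value, so once the copy has been made only changes to $content itself reach the save.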

Save the HTML Purifier stuff in "assets/plugins/htmlpurifier".

Create a plugin. Name doesn't matter, but call it HTMLPurifier. Use this code:

Code:
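$e = &$modx->Event;
if ($e->name == 'OnBeforeDocFormSave') {
    // Prepend the library directory so the existing include path is kept.
    set_include_path('../assets/plugins/htmlpurifier/library/' . PATH_SEPARATOR . get_include_path());
    include_once('HTMLPurifier.php');
    $purifier = new HTMLPurifier();
    // The manager has already copied the posted content into $content.
    global $content;
    $content = $purifier->purify($content);
}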

Make sure OnBeforeDocFormSave is ticked on the System Events tab.

Save.

Try editing documents: deliberately put in crappy HTML and form tags, and see if they disappear on saving.
« Last Edit: Sep 05, 2006, 05:23 PM by PaulGregory » Logged
