Topic: HTMLPurifier?


#1: 18-Aug-2006, 08:29 PM

Testers
tillda
Posts: 89

WWW
Does anyone have a free hour to implement an HTMLPurifier plugin? =)

Check: http://hp.jpsband.org/

The only problem is that, AFAIK, we haven't implemented event ordering yet, and this plugin is only useful when it really is the LAST plugin to run.

#2: 20-Aug-2006, 01:04 PM

Ambush Commander
Posts: 20

Well, I could try noodling around with Modx CMS, and see if I could bang out a plugin (I guess I ought to do it for all the major blogs/CMS out there to speed adoption).

Tracked you guys down from my referrer log. ;)

#3: 20-Aug-2006, 03:58 PM

Foundation

rthrash
Posts: 11,348

WWW
Welcome Ambush! :D

Please feel free to join the fray! Your code looks really great. If we need to insert a new event to tag off of, I don't see why we couldn't manage to squeeze that one in. ;)
MODx is a content management framework that allows web professionals to turn over sites to end-users for daily maintenance without worrying. Please help us help you when asking for assistance and read the wiki. Searching the forums from the top level helps, too.
Ryan Thrash
MODx Co-Founder
Principal @ Collabpad
work productively.
work intelligently.
work together.

#4: 21-Aug-2006, 12:41 AM

Coding Team

Dr. Scotty Delicious
Posts: 1,192

D.F.P.A.

WWW
Quote
Well, I could try noodling around with Modx CMS, and see if I could bang out a plugin (I guess I ought to do it for all the major blogs/CMS out there to speed adoption).

Tracked you guys down from my referrer log. ;)

That would be awesome!
Thanks Ambush Commander!

-sD-

#5: 21-Aug-2006, 02:24 PM

Ambush Commander
Posts: 20

Okay, after wrangling with the download (it doesn't work in Firefox or Opera regardless of firewall; you may want to investigate that), I've got a copy of the package. However, the events that the documentation covers seem to be only filters that run during page serves. Now, while it's possible to hook in HTMLPurifier at that point, the library is not fast, and it would be better to use it on form submission (or, if you have a caching system that I don't know about, that works too). Any pointers?
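
The caching alternative mentioned above could look roughly like the sketch below: purified output is stored on disk under a hash of the input, so the slow filter runs once per distinct input. The helper name `cachedPurify`, the `$purify` callable and the cache directory layout are all assumptions made for illustration; none of them come from MODx or HTMLPurifier.

```php
<?php
// Sketch only: cache purified output on disk so the slow filter runs once per distinct input.
// $purify is a stand-in for whatever calls the real library, for example
//   function ($html) use ($purifier) { return $purifier->purify($html); }
// cachedPurify() and the cache directory layout are hypothetical.
function cachedPurify(string $html, callable $purify, string $cacheDir): string
{
    // One cache file per distinct input, named after the MD5 of the raw HTML.
    $file = rtrim($cacheDir, '/\\') . DIRECTORY_SEPARATOR . md5($html) . '.html';

    if (is_file($file)) {
        $cached = file_get_contents($file);
        if ($cached !== false) {
            return $cached; // cache hit: skip the expensive call
        }
    }

    // Cache miss: run the filter once and remember the result.
    $clean = $purify($html);
    file_put_contents($file, $clean, LOCK_EX);
    return $clean;
}
```

Because the key is a hash of the input, editing a document naturally produces a new cache entry; stale entries would still need to be cleaned up by some other means.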

#6: 21-Aug-2006, 03:01 PM

Coding Team

doze
Posts: 4,099

....Boom!

Quote
..However, the events that the documentation covers seem to only cover filters during pageserves. Now, while it's possible to hook in HTMLPurifier at that point in time, the library is not fast, and it would be better if it was used on form submission. (or, if you have a caching system that I don't know about, that works too).  Any pointers?

You can use the OnDocFormSave event to intercept document saving and alter the posted data. For an example, see "Re: Is it possible to change values when a document is saved in the manager?"

Also, you can see all the available events by creating a new plugin and viewing the "System Events" tab. I know the documentation isn't very detailed at this stage, but there is a reason for that. The pace of development on this project has been extremely fast, and the documentation simply has not kept up. There is also a total rewrite going on behind the scenes, and you can imagine that there's not a lot of extra energy for writing documentation that will be outdated in the near future. With the next release there will be (along with other documentation) extensive developer documentation built straight from the source, so it's all being worked on. The future version uses the XPDO ORM layer (also built by a MODx core team member), which should give you some idea of where this project is going.

I don't know which event or events would be best for an HTMLPurifier plugin, but I know that OnDocFormSave is not the best, because at that point none of the chunks/snippets/etc. have returned their output. I guess OnCacheUpdate could be one place to "purify" cached pages. OnParseDocument would run at every page render, but as you say, that might be too much overhead. So maybe someone with more knowledge of the inner workings can give you a better answer.

Anyway, here's a list of system events from 0.9.2.1. Some new ones are coming in 0.9.5, and Ryan even said that a new event for this purpose could be squeezed in if needed :) (although I think there are already enough events to choose from).

Template Service Events

OnDocPublished
OnDocUnPublished
OnLoadWebDocument
OnParseDocument
OnWebPageInit
OnWebPagePrerender

Cache Service Events

OnBeforeCacheUpdate
OnBeforeSaveWebPageCache
OnCacheUpdate
OnLoadWebPageCache

Web Access Service Events

OnBeforeWebLogin
OnBeforeWebLogout
OnWebAuthentication
OnWebChangePassword
OnWebCreateGroup
OnWebDeleteUser
OnWebLogin
OnWebLogout
OnWebSaveUser

Manager Access Events
OnBeforeManagerLogin
OnBeforeManagerLogout
OnManagerAuthentication
OnManagerChangePassword
OnManagerCreateGroup
OnManagerDeleteUser
OnManagerLogin
OnManagerLogout
OnManagerPageInit
OnManagerSaveUser

Parser Service Events

OnFileManagerUpload
OnPageNotFound
OnPageUnauthorized
OnSiteRefresh

Chunks
OnBeforeChunkFormDelete
OnBeforeChunkFormSave
OnChunkFormDelete
OnChunkFormPrerender
OnChunkFormRender
OnChunkFormSave

Documents
OnBeforeDocFormDelete
OnBeforeDocFormSave
OnCreateDocGroup
OnDocFormDelete
OnDocFormPrerender
OnDocFormRender
OnDocFormSave

Modules
OnBeforeModFormDelete
OnBeforeModFormSave
OnModFormDelete
OnModFormPrerender
OnModFormRender
OnModFormSave

Plugins
OnBeforePluginFormDelete
OnBeforePluginFormSave
OnPluginFormDelete
OnPluginFormPrerender
OnPluginFormRender
OnPluginFormSave

RichText Editor
OnRichTextEditorInit
OnRichTextEditorRegister

Snippets

OnBeforeSnipFormDelete
OnBeforeSnipFormSave
OnSnipFormDelete
OnSnipFormPrerender
OnSnipFormRender
OnSnipFormSave

System Settings

OnFriendlyURLSettingsRender
OnInterfaceSettingsRender
OnMiscSettingsRender
OnSiteSettingsRender
OnUserSettingsRender

Template Variables
OnBeforeTVFormDelete
OnBeforeTVFormSave
OnTVFormDelete
OnTVFormPrerender
OnTVFormRender
OnTVFormSave

Templates

OnBeforeTempFormDelete
OnBeforeTempFormSave
OnTempFormDelete
OnTempFormPrerender
OnTempFormRender
OnTempFormSave

Users
OnBeforeUserFormDelete
OnBeforeUserFormSave
OnUserFormDelete
OnUserFormPrerender
OnUserFormRender
OnUserFormSave

Web Users
OnBeforeWUsrFormDelete
OnBeforeWUsrFormSave
OnWUsrFormDelete
OnWUsrFormPrerender
OnWUsrFormRender
OnWUsrFormSave
New MODx wiki! Please help up with documentation efforts! || Old Wiki

"He can have a lollipop any time he wants to. That's what it means to be a programmer."

#7: 21-Aug-2006, 03:12 PM

Coding Team

Dr. Scotty Delicious
Posts: 1,192

D.F.P.A.

WWW
This would be perfect for Replix, which seems to insert <br /> tags as if they are going out of style!

-sD-

#8: 21-Aug-2006, 08:45 PM

Ambush Commander
Posts: 20

Quote
I don't know what would be the best event/events to implement the HTMLPurifier plugin, but I know that OnDocFormSave is not the best, because at this point none of the chunks/snippets/etc. have returned their output.

That might be a good thing.  While I've tried to make HTMLPurifier as permissive as possible, there are certain HTML elements it will never support: FORM (and friends), OBJECT, EMBED, IFRAME, etc. Since snippets and chunks are highly trusted, we may want to let them bypass the filtering process. Their syntax is primarily compatible, although the ampersands may be a PITA to handle (they'll all get escaped).

What precisely is expected user input, and what kinds of HTML do snippets and chunks use? If snippets/chunks need to bypass the filter, we'd want to put HTMLPurifier before them, but if their output is basically the same, we can put HTMLPurifier after, perhaps on the cache event.
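
One way to let trusted snippet and chunk calls bypass the filter, and to keep their ampersands from being escaped, would be to swap them for placeholders before purifying and restore them afterwards. The sketch below is only an illustration: `purifyAroundTags` is a hypothetical helper, `$purify` stands in for the real `purify()` call, and the tag pattern is an assumption about which MODx tags need protecting.

```php
<?php
// Sketch only: hide MODx-style tags from the filter, purify the rest, then put the tags back.
// The pattern is an assumption covering [[snippet]], [!snippet!], {{chunk}} and [*tv*] calls.
// purifyAroundTags() is a hypothetical helper; $purify stands in for the real purify() call.
function purifyAroundTags(string $html, callable $purify): string
{
    $held = [];
    $pattern = '/\[\[.*?\]\]|\[!.*?!\]|\{\{.*?\}\}|\[\*.*?\*\]/s';

    // Replace each tag with a plain alphanumeric token the filter has no reason to touch.
    $masked = preg_replace_callback($pattern, function (array $m) use (&$held) {
        $key = 'MODXTAG' . count($held) . 'X';
        $held[$key] = $m[0];
        return $key;
    }, $html);

    $clean = $purify($masked);

    // Restore the original tags, untouched, in the purified output.
    return $held ? strtr($clean, $held) : $clean;
}
```

A token that ends up inside an attribute the filter removes would be lost along with that attribute, so this only suits tags that sit in ordinary text content.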

Besides all that, I'm still not precisely sure how the plugin structure works (from what I gather, it's a snippet that's directly copypasted onto your index.php).

#9: 21-Aug-2006, 10:31 PM

Testers
tillda
Posts: 89

WWW
Sorry for my misunderstanding... is there a problem with speed when hooking it onto OnWebPageInit?

It probably wasn't the best idea to hand this to someone outside the team, since creating a MODx plugin is really simple. Check out this code for the Texy plugin (Texy is a Textile/Markdown alternative).

Code:
$e = &$modx->Event;

switch ($e->name) {
    case "OnWebPagePrerender":
        // Load Texy and run the rendered page through it
        include_once($modx->config["base_path"].'/assets/plugins/texy/texy.php');
        $texyengine = new Texy();
        $doc = $modx->documentOutput;
        $doc = $texyengine->process($doc);
        $modx->documentOutput = $doc;
        break;

    default: // not our event: stop here
        return;
}

Basically you create a case for events and modify $modx->documentOutput inside.
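
To make that pattern easier to try outside the manager, here is a minimal sketch with a stand-in `$modx` object. `runOutputPlugin` and the stub are hypothetical and exist only for the demonstration; inside MODx, the body of the switch is the plugin code and `$modx` is supplied by the framework.

```php
<?php
// Sketch only: the event-switch pattern, runnable outside MODx thanks to a stub.
// runOutputPlugin() is hypothetical; $filter is any callable that rewrites the page output.
function runOutputPlugin(object $modx, callable $filter): void
{
    $e = $modx->Event;
    switch ($e->name) {
        case 'OnWebPagePrerender':
            // Rewrite the rendered page just before it is sent to the browser.
            $modx->documentOutput = $filter($modx->documentOutput);
            break;
        default:
            // Any other event: leave the output untouched.
            return;
    }
}

// Stand-in for the real $modx object, for demonstration only.
$modx = new stdClass();
$modx->Event = new stdClass();
$modx->Event->name = 'OnWebPagePrerender';
$modx->documentOutput = '  <p>Hello</p>  ';

runOutputPlugin($modx, 'trim');
echo $modx->documentOutput, "\n"; // prints "<p>Hello</p>"
```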

#10: 22-Aug-2006, 02:20 PM

Ambush Commander
Posts: 20

Perhaps so, considering the state of the documentation. People are posting code willy nilly, but where precisely does it all go? The plugin directory? A new module? I don't see the word Plugin mentioned at all in the Content Manager, is it equivalent to module?

Sorry about my ignorance. It takes me a little while to grasp third-party applications, especially big ones. I've never seen anything like Modx before (and that, in a way, is a good thing) ;-)

#11: 22-Aug-2006, 02:30 PM

Coding Team

doze
Posts: 4,099

....Boom!

Log in to the manager, then go to Resources > Manage Resources > Plugins tab > New Plugin.

#12: 22-Aug-2006, 03:01 PM

Ambush Commander
Posts: 20

Hmm... this is quite frustrating.

I've tried this:

Code:
$e = &$modx->Event;

// Only on this event
if ($e->name == 'OnDocFormSave') {
    set_include_path('/Documents and Settings/Edward/My Documents/My Webs/htmlpurifier/library'
. PATH_SEPARATOR . get_include_path());
    include_once('HTMLPurifier.php');
    $purifier = new HTMLPurifier();
    $_POST['tvcontent'] = $purifier->purify($_POST['tvcontent']);
}

return $purifier;

But it doesn't seem to work. A good test of HTMLPurifier is assigning a lang attribute to one of the elements. It should be copied over to xml:lang as per XHTML compatibility guidelines. I'm not getting this behavior, so I have to assume that the plugin isn't working.

Plus, the thing itself is extremely hacky: what if input comes in from another vector? To be quite honest, I don't know what I should be doing. If someone else wants to take a stab at it, be my guest, but I am stumped.

A few details about my library for anyone who wants to step up to the plate: it's extremely easy to use. Add the directory containing the library files to your path, include HTMLPurifier.php, instantiate an HTMLPurifier object, and then call purify() on whatever you need. The above code shows the theoretical flow pattern.

HTMLPurifier will remove anything that's not in its list of allowed elements; the notable exclusions are OBJECT, EMBED, IFRAME and FORM (I like to call these defective by design). It will also remove anything not in its allowed attributes, which means that any scripting added later will be stripped if you run the filter too late.

It's not meant to process complete documents (because, of course, other parts of the document may need scripting, forms, etc.). So it shouldn't be run on complete pages.

Finally, the library currently only supports UTF-8. I am working to also allow other major charsets (notably iso-8859-1), but if you don't switch to UTF-8, expect some weird char encoding issues.
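
For content that is not yet UTF-8, one option is to convert it before it reaches the filter. This is a minimal sketch, assuming the input is ISO-8859-1 whenever it is not already valid UTF-8; `toUtf8` is a hypothetical helper, not part of HTMLPurifier, and it relies on PHP's mbstring extension.

```php
<?php
// Sketch only: make sure a string is UTF-8 before handing it to a UTF-8-only filter.
// toUtf8() is a hypothetical helper; the ISO-8859-1 default for $assumed is an assumption.
// Requires PHP's mbstring extension.
function toUtf8(string $text, string $assumed = 'ISO-8859-1'): string
{
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text; // already valid UTF-8, leave it alone
    }
    // Not valid UTF-8: treat the bytes as the assumed legacy charset and convert.
    return mb_convert_encoding($text, 'UTF-8', $assumed);
}
```

The check-then-convert order matters: converting text that is already UTF-8 as if it were ISO-8859-1 would double-encode every accented character.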

#13: 5-Sep-2006, 10:58 AM

Marketing & Design Team

davidm
MODx evangelist
Posts: 7,073

The best way to predict the future is to invent it

WWW
*Bump*

Can anyone answer Ambush here?

I'd really LOVE to have the HTML Purifier integrated with MODx!
.: nodeo.net : For a free, modern and open web! :: david-molliere.net : Follow my experiments and posts on CMSs and other web applications "live" :.

*** modxcms.fr forums: Help build the French-language MODx site! ***

! New !  Live: don't miss the modxcms.fr news on Twitter   ! New !

MODx is the ideal tool for developers and web designers looking for a highly flexible, high-performance content management framework that is still easy for end users to pick up.

Config: Apache 2.2.8 - MySQL 5.0.67 - PHP 5.2.8 | Debian 4.0 (Etch)

Sites built with MODx: | pargade-notaires.fr | soleil.info | gican.asso.fr | michelez-notaires.com | amadom.gerondicap.com | jocelyne-violet.net

#14: 5-Sep-2006, 11:26 AM

PaulGregory
MODx's midnight runner
Posts: 1,097

MODx's midnight runner

WWW
Looks like he came very close with the last posting, but he seems to have
a) assumed that the content is called "tvcontent".
b) assumed that you can write to $_POST['tvcontent'] (which I've never seen done before).
No, I don't know what OpenGeek's saying half the time either.
MODx Documentation: The Wiki | My Wiki contributions | Main MODx Documentation
Forum: Where to post threads about add-ons | Forum Rules
Like MODx? donate (and/or share your resources)
Like me? See my Amazon wishlist
MODx "Most Promising CMS" - so appropriate!

#15: 5-Sep-2006, 12:02 PM

PaulGregory
MODx's midnight runner
Posts: 1,097

MODx's midnight runner

WWW
EDIT: Version 3, edited with fuller instructions, and actually tested and working:

Right. It's academic whether you can write to $_POST, as the content has already been transferred to $content by the time the plugin point comes around. While I don't actually have the HTML Purifier code (I can't get to the site), I have tested the basic principle of the plugin by creating a test snippet that tinkered with the content before saving. The following should therefore work.

Save the HTML Purifier stuff in "assets/plugins/htmlpurifier".

Create a plugin. Name doesn't matter, but call it HTMLPurifier. Use this code:

Code:
$e = &$modx->Event;
if ($e->name == 'OnBeforeDocFormSave') {
    set_include_path('../assets/plugins/htmlpurifier/library/');
    include_once('HTMLPurifier.php');
    $purifier = new HTMLPurifier();
    global $content;
    $content = $purifier->purify($content);
}

Make sure OnBeforeDocFormSave is ticked on the System Events tab.

Save.

Try editing documents: deliberately put crappy HTML and form tags in, and see if they disappear on saving.
« Last Edit: 5-Sep-2006, 05:23 PM by PaulGregory »

#16: 5-Sep-2006, 04:42 PM

Ambush Commander
Posts: 20

Hmm... it doesn't work... (at least for me, anyway). And yes, I did check the proper system event (the heading for that set of events was greyed, if that means anything).

What troubles are you having getting to the site?

#17: 5-Sep-2006, 05:22 PM

PaulGregory
MODx's midnight runner
Posts: 1,097

MODx's midnight runner

WWW
Am now able to get to your site, so I'm testing it and find that there's an issue with the include.
New code:
Code:
$e = &$modx->Event;
if ($e->name == 'OnBeforeDocFormSave') {
    set_include_path('../assets/plugins/htmlpurifier/library/');
    include_once('HTMLPurifier.php');
    $purifier = new HTMLPurifier();
    global $content;
    $content = $purifier->purify($content);
}

Ambush, if your plugin wasn't erroring on save, then something else is wrong. Yes, there's gray text, I don't know what it means.

Works with my basic test; having something in bold and some form tags. Bold tags stay, form tags vanish. Woo!
(I'm sure there's more to HTMLPurifier than that, but it proves that something is happening).
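
Note that the plugin code above replaces PHP's include path outright. A slightly more defensive variant, sketched here with a hypothetical helper named `prependIncludePath`, puts the library directory in front of the existing path instead, so anything else that depends on the include path keeps working. The directory in the usage line is the one suggested in this thread.

```php
<?php
// Sketch only: prepend a directory to the include path rather than replacing it.
// prependIncludePath() is a hypothetical helper, not part of MODx or HTMLPurifier.
function prependIncludePath(string $dir): string
{
    $paths = explode(PATH_SEPARATOR, get_include_path());
    if (!in_array($dir, $paths, true)) {
        // Put the new directory first so its files win over same-named files elsewhere.
        array_unshift($paths, $dir);
        set_include_path(implode(PATH_SEPARATOR, $paths));
    }
    return get_include_path();
}

prependIncludePath('../assets/plugins/htmlpurifier/library');
```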
« Last Edit: 5-Sep-2006, 05:27 PM by PaulGregory »

#18: 5-Sep-2006, 05:29 PM

Ambush Commander
Posts: 20

Even with the fixed include path it doesn't work.

#19: 5-Sep-2006, 07:03 PM

PaulGregory
MODx's midnight runner
Posts: 1,097

MODx's midnight runner

WWW
There was a ); missing in one version - please make sure that you've repasted the whole thing, and that OnBeforeDocFormSave is checked, not OnDocFormSave.

When you say "it doesn't work", are you getting an error message when you save a document?

#20: 5-Sep-2006, 07:05 PM

Ambush Commander
Posts: 20

Nope. Doesn't work as in nothing happens. Is it working on your end?