Topic: Performance of MODx for larger sites

tomtom
« on: Jun 27, 2006, 04:15 PM »

Hi all,
I am thinking about using MODx for a medium-sized project and therefore did some performance tests.
I populated the modx_site_content table with a lot of entries, and the results suggest the system scales quite badly for sites with many pages.

I haven't looked at the code in detail yet, but this looks like a conceptual problem, doesn't it?
E.g. why does every page request have to load the aliases, document IDs and document relations of every page in the system?
Is MODx just not designed to handle larger sites, or is there something I am doing wrong or could change to get acceptable performance?

Any ideas, experiences or suggestions?


Here are some figures from a first quick test:

System:
Linux Debian Sarge
PHP 5.1.4
MySQL 4.1
Apache 2
Default install of MODx 0.9.2.1


Code:
 Rows  Table size  siteCache.idx.php  PHP memory  MySQL (s)  PHP (s)  Total (s)
   20     59.2 KB           143.0 KB      1.3 MB     0.0171   0.2073     0.2244
   50    110.5 KB           148.8 KB      1.4 MB     0.0187   0.2449     0.1595
  500      1.2 MB           235.8 KB      2.1 MB     0.0618   0.2310     0.2928
 5000      4.3 MB             1.1 MB     10.2 MB     0.3139   0.7982     1.1121
10000      8.7 MB             2.1 MB     19.1 MB     0.6684   1.9220     2.5904
15000     13.2 MB             2.9 MB     27.8 MB     0.9567   2.4612     3.4180
20000     17.6 MB             3.9 MB     38.9 MB     1.5259   2.8582     4.3841

Rows and table size refer to modx_site_content. Every run made 12 MySQL requests, and the document was retrieved from the database.
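As a quick check on the figures above, taking the 500-row and 20,000-row runs as endpoints shows PHP memory and total request time growing roughly linearly with the number of documents. The helper `perThousand` is made up for this sketch; only the input numbers come from the table.

```php
<?php
// Marginal cost per additional 1,000 documents, taken between the
// 500-row and 20,000-row measurements in the table above.
function perThousand(float $atLow, float $atHigh, int $rowsLow, int $rowsHigh): float
{
    return ($atHigh - $atLow) / ($rowsHigh - $rowsLow) * 1000;
}

$memPerK  = perThousand(2.1, 38.9, 500, 20000);      // MB of PHP memory
$timePerK = perThousand(0.2928, 4.3841, 500, 20000); // seconds, total time

printf("%.2f MB and %.3f s per extra 1,000 documents\n", $memPerK, $timePerK);
// prints "1.89 MB and 0.210 s per extra 1,000 documents"
```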


Regards,
Tom
« Last Edit: Jun 28, 2006, 07:11 AM by PaulGregory »

xwisdom
« Reply #1 on: Jun 27, 2006, 05:19 PM »

Many thanks for sharing your findings with us.

We are aware of some of the limitations when it comes to very large sites. Work is currently being done to make the system perform better on such websites.

Some users currently have the system running with over 2000 pages and 100+ users. The trick IMO is to tune your MODx site with document caching, among other techniques mentioned in the forums.

xWisdom
MODx Co-Founder

tomtom
« Reply #2 on: Jun 28, 2006, 03:18 AM »

Hi xwisdom,

Quote
The trick IMO is to tune your modx site by using document caching among other things mentioned in the forums.

Document caching (of MODx) won't help too much if you look at the figures.  Just the time PHP takes is already too much.

Regards,
Tom

Djamoer
« Reply #3 on: Jun 28, 2006, 05:42 AM »

Quote
Document caching (of MODx) won't help too much if you look at the figures.  Just the time PHP takes is already too much.

I'm very interested in discussing this further with you.
Can you explain a bit more about how you got those numbers? I assume the first load of a page requires a lot more work than the second load if you use the MODx page caching system.

Let me outline the big picture of how MODx works; if you could then double-check the numbers for us, that would be great.

Basically, the MODx parser always loads the cache file shown in your benchmark results, which is why I can understand that the memory PHP requires grows rapidly with the number of pages you loaded into the database. It has to load every page's ID, alias, etc. into the cache, so with 20,000 pages the cache holds more than 40,000 lines of code. We may need to start thinking about a better way to cache this document information without sacrificing performance on every new page request.
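To make the single-cache-file cost concrete, here is a minimal sketch, assuming a hypothetical cache that stores only an alias => id map as a PHP `return` file (the real siteCache.idx.php holds more than this, and the function names are invented). Every request that includes such a file pays for parsing and holding the whole map, whichever page was asked for.

```php
<?php
// Hypothetical monolithic cache: every page's alias => id pair in one PHP file.
function writeAliasCache(string $file, int $pages): void
{
    $map = [];
    for ($id = 1; $id <= $pages; $id++) {
        $map["page-$id"] = $id; // made-up aliases for the demo
    }
    file_put_contents($file, '<?php return ' . var_export($map, true) . ';');
}

function loadAliasCache(string $file): array
{
    // include evaluates the file and hands back its return value
    return include $file;
}

$sizes = [];
$counts = [];
foreach ([500, 5000, 20000] as $pages) {
    $file = tempnam(sys_get_temp_dir(), 'sitecache'); // fresh file per size
    writeAliasCache($file, $pages);
    $sizes[$pages] = filesize($file);
    $before = memory_get_usage();
    $map = loadAliasCache($file);
    $counts[$pages] = count($map);
    printf("%5d pages: %7d bytes on disk, ~%d KB in memory\n",
        $pages, $sizes[$pages], intdiv(memory_get_usage() - $before, 1024));
    unset($map);
    unlink($file);
}
```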

In the second phase, MODx has to determine the output for the current page request. It uses the cached document arrays above to work out which document ID needs to be loaded for the front end. I believe this is one of the performance penalties in MODx, considering the amount of array data that has to be processed over and over again.
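On the alias-to-ID step: if the cached array has to be searched, each lookup walks the data linearly, whereas keying the array by alias turns each lookup into a single hash probe. A sketch with made-up names and data, not the MODx parser's actual code:

```php
<?php
// Hypothetical document map as a site cache might hold it: id => alias.
$aliases = [];
for ($id = 1; $id <= 20000; $id++) {
    $aliases[$id] = "page-$id";
}

// Linear: array_search() compares against each alias in turn.
function resolveByScan(array $aliases, string $alias): ?int
{
    $id = array_search($alias, $aliases, true);
    return $id === false ? null : $id;
}

// Keyed: flip the map once (alias => id) so each lookup is one hash probe.
function resolveByKey(array $byAlias, string $alias): ?int
{
    return $byAlias[$alias] ?? null;
}

$byAlias = array_flip($aliases);

$t = microtime(true);
for ($i = 0; $i < 200; $i++) {
    resolveByScan($aliases, 'page-19999');
}
$scan = microtime(true) - $t;

$t = microtime(true);
for ($i = 0; $i < 200; $i++) {
    resolveByKey($byAlias, 'page-19999');
}
$keyed = microtime(true) - $t;

printf("200 lookups: scan %.4f s, keyed %.4f s\n", $scan, $keyed);
```

Flipping the map still means loading all 20,000 entries, so this only addresses the per-lookup cost, not the memory growth.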

In the third phase, the system checks the cache directory. If it finds a cache file for the requested document, it loads the page from that file; if not, it loads the page from the database and parses the whole page again. The bottleneck I see in this approach is the number of cache files that keep accumulating in the directory, and Linux is known for having problems with lots of files in one directory. With your benchmark, we would have approximately 20,000 files. That becomes a pain when clearing the site cache, because the system has to remove all 20,000 files at once. I'm not sure how PHP's file functions cope with reading that many files, but I hope it won't be much of a problem.
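One common way around very large cache directories is to shard the files into subdirectories. The sketch below assumes a hypothetical file-naming scheme and two-character md5 prefixes; it is not how MODx names or stores its cache files.

```php
<?php
// Hypothetical layout: put each page's cache file in one of 256
// subdirectories named after the first two hex digits of md5(document id).
function cachePathFor(string $cacheRoot, int $docId): string
{
    $shard = substr(md5((string) $docId), 0, 2);
    return "$cacheRoot/$shard/page_$docId.cache.php"; // file name is made up
}

// How 20,000 documents would spread across the subdirectories.
$perShard = [];
for ($id = 1; $id <= 20000; $id++) {
    $shard = basename(dirname(cachePathFor('/tmp/modx-cache', $id)));
    $perShard[$shard] = ($perShard[$shard] ?? 0) + 1;
}
printf("%d directories, at most %d files each\n", count($perShard), max($perShard));
```

Each directory stays small (about 78 files on average here), but clearing the whole cache still means deleting all 20,000 files.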

The last phase parses the cached (or freshly parsed) output again, to process uncached snippets that have not been cached yet. I believe this won't be a problem.

So my conclusion: it's no wonder the PHP processing time is so high and increases so drastically with the number of pages on the site. The amount of memory needed is also to be expected. The only thing that is quite unexpected is the MySQL processing time; I have no idea why it increases so much more drastically than the PHP processing time and memory allocation.

Could you confirm for me: when you loaded the page, did you benchmark an uncached page or a cached one? It should make a difference to the MySQL processing time. I believe a cached page needs only 3-5 MySQL requests, but I'm not 100% sure.

PS: I might be wrong, and I'm open to any suggestions, so we can improve the current core code. Do you have any experience in optimizing code, tomtom?

tomtom
« Reply #4 on: Jun 28, 2006, 09:04 AM »

Hi Djamoer,

Quote
Can you explain a little bit more on how you get all those number?

OK, this is what I did:
1. Created a few “root” pages and some child pages (about 20 in all).
2. Disabled caching by default ($modx->documentObject['cacheable'] = 0;).
3. Ran “Refresh Site” in the manager.
4. Inserted “echo memory_get_usage();” at the end of the siteCache.idx.php file, as I suspected this file was one of the reasons for the high memory consumption.
5. Loaded one parent page of the site in the browser (http://modx.tld/test.html).
6. Noted down the results.

After that, I populated the modx_site_content table with dummy pages, all belonging to the same parent page, and repeated steps 3 to 6.

And so on.
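The procedure above (echoing memory_get_usage() from siteCache.idx.php) can be wrapped in a small helper so every run reports the same numbers. A minimal sketch; `measure()` is a hypothetical helper, and the closure only stands in for real page work:

```php
<?php
// Hypothetical helper: run some work, report wall time and memory.
function measure(callable $work): array
{
    $startMem  = memory_get_usage();
    $startTime = microtime(true);
    $result    = $work();
    return [
        'seconds'  => microtime(true) - $startTime,
        'memDelta' => memory_get_usage() - $startMem, // still held by $result
        'peak'     => memory_get_peak_usage(),
        'result'   => $result,
    ];
}

// Stand-in for loading a document map of 20,000 pages.
$stats = measure(function (): array {
    $map = [];
    for ($id = 1; $id <= 20000; $id++) {
        $map[$id] = ['alias' => "page-$id", 'parent' => 1];
    }
    return $map;
});

printf("%.4f s, %d KB retained, %.1f MB peak\n",
    $stats['seconds'], intdiv($stats['memDelta'], 1024), $stats['peak'] / 1048576);
```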

Quote
The reason is because it needs to load all the pages id, alias, and etc into the cache, which is if you have 20,000 pages, it will have more than 40,000 line of code inside the cache. We might need to start thinking a better way to cache this document information without sacrificing the system performance on every new page request.

IMO that’s the most important thing to improve if we want to use MODx for larger sites.


Quote
The only thing that is quite unexpected is the mysql processing time.

I found that this query in the function “getActiveChildren” is solely responsible for the poor MySQL performance:

Code:
$sql = "SELECT DISTINCT $fields FROM $tblsc sc
      LEFT JOIN $tbldg dg on dg.document = sc.id
      WHERE sc.parent = '$id' AND sc.published=1 AND sc.deleted=0
      AND ($access)
      ORDER BY $sort $dir;";

Btw this query is also executed when retrieving a cached document!


Quote
Do you have any experience in optimizing code tomtom?

Not in particular, though I have some years of experience with PHP and MySQL.
But I am really interested in MODx because I love the concept in general.
It is the best CMS written in PHP I’ve seen so far.

So I would really like to help out a little bit if I can...
Thanks for all this information so far.

Regards,
Tom

tomtom
« Reply #5 on: Jun 28, 2006, 10:52 AM »

A first concrete improvement suggestion:

The "DropMenu" snippet calls $modx->getActiveChildren(). A useful improvement there would be the ability to query only for children that are not hidden from menus. At the moment the DropMenu snippet filters these out afterwards in PHP.
Filtering in MySQL could be a big speed improvement for parent documents with many hidden children, which is probably quite common for parents containing news items.
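The suggestion amounts to pushing the menu filter into the WHERE clause. A sketch, assuming the `hidemenu` column of `modx_site_content` holds the hide-from-menu flag; `childrenQuery()` and `filterVisible()` are hypothetical and leave out the document-group join of the real getActiveChildren() query:

```php
<?php
// Hypothetical builder modelled on the getActiveChildren() query quoted
// earlier (document-group access check omitted). With $excludeHidden the
// menu filter runs in MySQL, so hidden children are never fetched.
function childrenQuery(int $parentId, bool $excludeHidden): string
{
    $where = ["sc.parent = $parentId", 'sc.published = 1', 'sc.deleted = 0'];
    if ($excludeHidden) {
        $where[] = 'sc.hidemenu = 0'; // assumed hide-from-menu column
    }
    return 'SELECT sc.id, sc.pagetitle, sc.hidemenu FROM modx_site_content sc WHERE '
        . implode(' AND ', $where);
}

// The current approach described above: fetch all children, filter in PHP.
function filterVisible(array $rows): array
{
    return array_values(array_filter($rows, function (array $row): bool {
        return (int) $row['hidemenu'] === 0;
    }));
}

echo childrenQuery(12, true), "\n";
```

Typing `$parentId` as `int` also removes the need to quote it, unlike the `'$id'` interpolation in the quoted query.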

Djamoer
« Reply #6 on: Jun 28, 2006, 11:34 AM »

Thanks for the input, tomtom. Here are my comments on your reply.

The amount of memory being used is way above what it should be, which is why we need a better way to cache the site ID array. Personally, though, I would suggest using a third-party caching system that caches in memory on the server itself: a site with 20,000 pages usually belongs to a large corporation or community, which usually has its own dedicated server anyway.

As for DropMenu, I assume you tested with a basic installation and the default sample site, which has a DropMenu snippet listing all the documents in the system. In my opinion, that's one of the reasons we have a caching mechanism. If you call [[DropMenu]] without making it uncached, the result is cached on the server: only the first visitor to load the page hits the database, and then the system caches the result. With that, I believe the MySQL processing time for every other visitor will be below what you benchmarked. Another solution would be to cache each document together with its hidden-menu status and let the DropMenu snippet read the data from there, but that would add even more memory to the cache system (maybe an SQL solution would work better).
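The caching argument can be illustrated with a remember-once wrapper: the first request builds the menu (and runs the query), later requests reuse the stored HTML until the cache is cleared. `MenuCache` is a hypothetical class, and its array store only lives for one PHP process; a real setup would need the MODx page cache or a server-side memory cache to persist between requests.

```php
<?php
// Hypothetical cache wrapper. The array store lives only for this PHP
// process; persisting between requests needs a real backend.
final class MenuCache
{
    private $store = [];

    public function remember(string $key, callable $build): string
    {
        if (!array_key_exists($key, $this->store)) {
            $this->store[$key] = $build(); // only the first caller pays
        }
        return $this->store[$key];
    }

    public function clear(): void
    {
        $this->store = []; // e.g. after the site structure changes
    }
}

$queries = 0;
$buildMenu = function () use (&$queries): string {
    $queries++; // stands in for the getActiveChildren() query
    return '<ul><li>Home</li><li>News</li></ul>';
};

$cache = new MenuCache();
for ($request = 0; $request < 100; $request++) {
    $html = $cache->remember('dropmenu:1', $buildMenu);
}
echo "$queries query for 100 requests\n"; // prints "1 query for 100 requests"
```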

I believe we need to tackle the cache memory usage. The rest can be solved with best practices, using the default cache system. So my conclusion: the current MODx system does scale quite well for a large website, although I believe we can improve it by 30-60% by optimizing the current core code. Tomtom, do you have the time and willingness to contribute to this? It sounds to me like you know quite well how to trace MODx logic, even though you're new to the MODx core code. If you have a good idea you can implement in the current core code, feel free to post it in this forum; I'm sure xwisdom or OpenGeek would be more than happy to review it.

Sincerely,

ppaulousek
« Reply #7 on: Jun 28, 2006, 11:36 AM »

I'm not an expert at all, but I also guess that DropMenu is a very "hungry" snippet.
(Maybe it's also responsible for executing the "activeChildren" call for cached pages?)
I would very much like to see tomtom's results without DropMenu being involved!

rthrash
« Reply #8 on: Jun 28, 2006, 12:53 PM »

FYI, DropMenu will be re-authored for the next release to be much less resource-hungry. ;)

Ryan Thrash
MODx Co-Founder

Djamoer
« Reply #9 on: Jun 29, 2006, 08:28 AM »

Quote
I'm not an expert at all, but I also guess that DropMenu is a very "hungry" snippet.
(Maybe it's also responsible for executing the "activeChildren" call for cached pages?)
I would very much like to see tomtom's results without DropMenu being involved!

I'm also interested in seeing the results without the DropMenu snippet, and maybe a test with and without caching, so we can compare them side by side and see how beneficial our caching system is.

As for DropMenu, I believe we really need to find a way to stop it from calling back to the database even when the page has been cached. Maybe we can optimize it with a better MySQL query. I'm no SQL expert, but someone might be able to come up with a good solution. SOMEONE? ;) Jason? Raymond? Ryan? ;)

OpenGeek
« Reply #10 on: Jun 29, 2006, 12:30 PM »

Optimization and component flexibility are almost always at odds in the world of web application development, just as are the finer qualities of optimization itself, i.e. memory footprint vs. speed of execution.  I have done a lot of experimenting over the past 6 months focusing on these issues in regard to the current MODx code.  The ability to create a menu quickly can be addressed by removing all need for SQL, and making all the metadata of each MODx document available in a preloaded array, similar to how it works now.  But as already pointed out, with large sites, this does not scale well.  But neither does executing queries to search for individual resources in the database when building internal links, or menus, etc.  Further, knowledge of and access to the site structure is a very common and important requirement for a lot of add-on components, even beyond the menu builder.

My solution 6 months ago, when developing the initial rewrite of MODx using OO principles, was to compromise by allowing a site to be divided into contexts.  These contexts could preload a map of their section of the site, similar to how siteCache does now, but including all necessary metadata for building common menus and links within that section.  In this way, if you have 20,000 documents, you could organize those into logical sections and define a context for each, so that the memory load/speed of execution balance is maintained by only preloading the document metadata array for that context (perhaps plus any defined without context, or within a global context).
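A rough sketch of the context idea, under the assumption that each document row carries a context key and that a shared `global` context always loads; the data layout and function names are invented for illustration and are not the 1.0 implementation:

```php
<?php
// Hypothetical: group the document map by context so a request preloads
// only its own section plus a shared "global" context.
function partitionByContext(array $docs): array
{
    $byContext = [];
    foreach ($docs as $doc) {
        $byContext[$doc['context']][$doc['id']] = $doc['alias'];
    }
    return $byContext;
}

function mapForRequest(array $byContext, string $context): array
{
    // Array union keeps the integer ids as keys; ids are unique across contexts.
    return ($byContext[$context] ?? []) + ($byContext['global'] ?? []);
}

// Made-up site: 50 global pages, the rest spread over four sections.
$docs = [];
for ($id = 1; $id <= 20000; $id++) {
    $context = $id <= 50 ? 'global' : 'section' . ($id % 4);
    $docs[] = ['id' => $id, 'context' => $context, 'alias' => "page-$id"];
}

$byContext = partitionByContext($docs);
$map = mapForRequest($byContext, 'section1');
printf("%d of %d documents preloaded\n", count($map), count($docs));
```

With four equal sections, a request preloads roughly a quarter of the map instead of all 20,000 entries.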

Let me know your thoughts on this as I work on completing the 1.0 rewrite; if you do have alternative suggestions or ideas in this regard, let's discuss them now.

And yes, we can definitely optimize the SQL in DropMenu to do all of the selection, but depending on table indexing and other SQL factors, it may be faster to just load the data less exclusively and loop through like it does.  And caching does address it very well if you don't mind refreshing your entire site cache whenever your site structure changes; this is truly the best way to deal with it.

Jason Coward
MODx Co-Founder
xPDO Founder

tomtom
« Reply #11 on: Jun 29, 2006, 02:13 PM »

Hi all,
sorry, I'm short on time at the moment.

Perhaps I will find some time for further testing at the weekend;
otherwise I will do it next week.


One quick note:
Quote
we can definitely optimize the SQL in DropMenu to do all of the selection, but depending on table indexing and other SQL factors, it may be faster to just load the data less exclusively and loop through like it does

In my particular case, with 20,000 pages loaded and without menu caching, excluding hidden elements in the query improved MySQL time by more than one second!

Thanks a lot for all this input.

Regards,
Tom

tillda
« Reply #12 on: Jul 09, 2006, 07:58 AM »

Friends, have we made any particular database optimizations? Have we taken a closer look at how the indexes are set up, or is it still just as it was x years ago? Indexes should make lookup time depend only logarithmically on the number of database entries, so I would not expect the MySQL time to increase so quickly...
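The logarithmic behaviour can be illustrated by counting comparisons: a full scan touches every row, while a binary search over sorted keys needs at most floor(log2 n) + 1 steps. MySQL's B-tree indexes are not binary search, but they give lookups the same kind of logarithmic growth. A sketch with hypothetical helper names:

```php
<?php
// Comparisons needed to find $key by scanning every element in order.
function linearSteps(array $sorted, int $key): int
{
    $steps = 0;
    foreach ($sorted as $value) {
        $steps++;
        if ($value === $key) {
            break;
        }
    }
    return $steps;
}

// Comparisons needed by binary search over sorted keys.
function binarySteps(array $sorted, int $key): int
{
    $lo = 0;
    $hi = count($sorted) - 1;
    $steps = 0;
    while ($lo <= $hi) {
        $steps++;
        $mid = intdiv($lo + $hi, 2);
        if ($sorted[$mid] === $key) {
            break;
        }
        if ($sorted[$mid] < $key) {
            $lo = $mid + 1;
        } else {
            $hi = $mid - 1;
        }
    }
    return $steps;
}

foreach ([500, 5000, 20000] as $n) {
    $ids = range(1, $n);
    printf("%5d rows: scan %5d, binary %2d\n", $n, linearSteps($ids, $n), binarySteps($ids, $n));
}
```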

ZAP
« Reply #13 on: Jul 27, 2006, 05:18 PM »

The other part of this discussion is how large numbers of documents affect the performance of the manager...

So it sounds as if modifying the DropMenu snippet to use a specific MySQL query that eliminates hidden docs, instead of the getActiveChildren API, makes a huge difference when creating those menus. There seems to be no reason not to do that, then. Perhaps there should be more parameters that can be passed to that function to allow for more specific queries?

I hear that the new WayFarer menu snippet will be awesome...