Large community usage?

  • I am curious about how well this software performs for large communities. In particular, I am looking for examples using 4.x with 1,000 or more people online at any given moment.

    • Official Post

    Hello @PhoenixDragon,


    The entire software is designed to run smoothly on low-budget shared hosting, an environment where even the smallest performance gotcha will horribly backfire. Additionally, we pay attention to designing features that scale nicely; you won't find a single feature that will suddenly overwhelm your system just because it has to deal with large amounts of data.


    Large and very active forums are:
    http://zeldauniverse.net/forums/
    https://forum.fcbayern.com
    https://www.bisaboard.de
    http://www.dogforum.de

  • I'm not concerned about shared hosting, as my current setup requires a dedicated server. I am, however, concerned about load, and I need software that will be as responsive, if not more so, with a continuous 1,000+ users online 24/7.

    • Official Post

    I'm not concerned about shared hosting, as my current setup requires a dedicated server.

    It is not so much about shared hosting, but about designing software that runs well even with a low amount of resources. The individual resource usage per request is pretty low, which is extremely important for sites with a high concurrent visitor count, as it enables you to serve more users with the same hardware.


    I would be happy to provide you with more information if you could tell me a bit about your current setup. Especially the number of average concurrent and median peak PHP requests would be interesting, as it tells an in-depth story of what is actually happening. A high "users online" count is a nice-looking number, but unfortunately it does not expose details such as the time frame (e.g. 1,000 users counted across the last X minutes) or the behavior (e.g. an average of 3 requests per minute).

  • It is not so much about shared hosting, but about designing software that runs well even with a low amount of resources. The individual resource usage per request is pretty low, which is extremely important for sites with a high concurrent visitor count, as it enables you to serve more users with the same hardware.

    Good. I understood your point; I'm just curious about the extent of it.


    I would be happy to provide you with more information if you could tell me a bit about your current setup. Especially the number of average concurrent and median peak PHP requests would be interesting, as it tells an in-depth story of what is actually happening. A high "users online" count is a nice-looking number, but unfortunately it does not expose details such as the time frame (e.g. 1,000 users counted across the last X minutes) or the behavior (e.g. an average of 3 requests per minute).


    Assume I've done the math already and 1,000+ visitors at every given moment of every given second is accurate, with the minimum being 1,000+ requests at any given moment. Assume there are also 56 million posts to query, in threads that vary from 1 page to 300 pages with the average being 30+ pages. Assume an average of 3,000 new posts per day, and assume the record high was 190,000 people online.



    A simple yes or no will suffice, but the question is: do you feel confident your software could manage such a load without locking up? If you cannot look past my numbers, assume the question is generic, in that I wish to know whether you feel your software could handle a community of any size. If that isn't good enough, assume the largest community that you know of using your software were 4x larger. Do you feel your software could manage it without issue?

    • Official Post

    Assume I've done the math already and 1,000+ visitors at every given moment of every given second is accurate, with the minimum being 1,000+ requests at any given moment.

    Thank you for clarifying this. I was simply asking because this term is anything but a fixed metric, and especially with vBulletin quite a lot of people have tampered with the time period for their users-online statistics.


    Assume there are also 56 million posts to query, in threads that vary from 1 page to 300 pages with the average being 30+ pages.

    This is a non-issue; the number of posts doesn't impact performance.


    Regarding your average number of pages, there is an active German thread on our forums which is at 536 pages at the time of writing; you can try it out yourself. The FC Bayern forum (linked above) is another good example, as their threads usually average about 40 pages. We solve the issue of very long threads by using two separate queries to fetch the posts: the first only selects the ids, and the second fetches all the data using the result from the first query. This ensures a constantly low query runtime no matter how large your thread is; I can go into detail if you're interested.


    A simple yes or no will suffice, but the question is: do you feel confident your software could manage such a load without locking up?

    In all honesty, I can't even recall ever stumbling across a forum that huge (with the notable exception of gaiaonline.com), so I can only make assumptions at this point. Forums with 5 or 10 million posts aren't that uncommon today, but 56 million is a whole different story. The number of posts is a complete non-issue, as everything is designed to run with larger amounts of data, and there are many forums running Burning Board with more than a million posts - without ever experiencing degraded performance. Every relevant query runs against database indices, preventing any deal-breaking full table scans or similar.


    Regarding the number of concurrent requests: there are no deadlocks or other issues when it comes to a high request count, so technically speaking it can scale indefinitely without ever locking up. We offer both a memcached integration (as a replacement for the file-based cache) and elasticsearch (as a replacement for MySQL FULLTEXT search, which does anything but scale) to remove any potential bottlenecks that may show up when working with large amounts of data.
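
    To make the memcached idea concrete, here is a minimal cache-aside sketch in Python (WoltLab's actual cache layer is PHP; the pymemcache client, the key name and the 5-minute lifetime are my own assumptions for illustration):

    ```python
    from pymemcache.client.base import Client

    cache = Client(("localhost", 11211))

    def cached_forum_list(load_from_db) -> bytes:
        """Serve the rendered forum list from memcached when possible."""
        cached = cache.get("forum_list")   # network lookup, no disk I/O
        if cached is not None:
            return cached                  # cache hit: database untouched
        data = load_from_db()              # cache miss: compute once ...
        cache.set("forum_list", data, expire=300)  # ... reuse for 5 minutes
        return data
    ```

    The main advantage over a file-based cache is that multiple web servers can share one cache instead of each warming its own copy.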


    In general there are a few key principles we've followed that ensure proper scaling:

    • Avoid table locks; InnoDB's row-level locking helps a lot in solving this.
    • Queries make use of indices to avoid full table scans.
    • Caching of complex data that is more or less static but requires some time to compute; for example, the forum list uses cached data to show the structure and the last post.
    • Denormalization of data when necessary; e.g. the list of the first three users that liked a post is cached on each post row in the database to avoid running an extra query per post (see the sketch after this list).
    • Queries that would return large results are performed using two separate queries (for example, posts on a thread page) to prevent MySQL from having to work with too much data. This design allows a fast lookup for the relevant object ids, and we can then fetch all additional data by using exactly these ids, making full use of table indices and primary keys.
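
    As an illustration of the denormalization point above, here is a rough sketch in Python with SQLite; the table and column names are invented and do not reflect WoltLab's actual schema:

    ```python
    import json
    import sqlite3

    def add_like(conn: sqlite3.Connection, post_id: int, username: str) -> None:
        # Normalized source of truth: one row per like.
        conn.execute(
            "INSERT INTO post_like (post_id, username) VALUES (?, ?)",
            (post_id, username),
        )
        # Denormalized copy: keep the first three likers on the post row
        # itself, so rendering a page of 20 posts needs no per-post query.
        likers = [row[0] for row in conn.execute(
            "SELECT username FROM post_like WHERE post_id = ? "
            "ORDER BY rowid LIMIT 3",
            (post_id,),
        )]
        conn.execute(
            "UPDATE post SET cached_likers = ? WHERE post_id = ?",
            (json.dumps(likers), post_id),
        )
    ```

    The write path does a little more work, but reads - which vastly outnumber writes on a forum - stay cheap.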

    To sum it all up: based on our experience with actively visited forums having several million posts, we can say that there is absolutely no evidence of deadlocks or slowdowns. zeldauniverse.net's forums contain 5.6 million posts and run perfectly fine, and I'm confident that this would hold even with 10 times the data. But as I said earlier, this is only an assumption/educated guess based on our combined experience of building forum software for almost 15 years.

  • We solve the issue of very long threads by using two separate queries to fetch the posts:

    How many queries does your software have, and what is the average time to complete that query for one individual?


    We offer both a memcached integration (as a replacement for the file-based cache) and elasticsearch (as a replacement for MySQL FULLTEXT search, which does anything but scale) to remove any potential bottlenecks that may show up when working with large amounts of data.

    Both of these are standard, without relying on a third-party add-on?

    • Official Post

    How many queries does your software have, and what is the average time to complete that query for one individual?

    It requires two separate queries:

    • Read the post ids for the current page of the thread.
    • Read all necessary data (including users via LEFT JOIN) for the post ids previously fetched with the query above.

    Both queries use only indices (including foreign keys) to retrieve the data, resulting in a static index lookup by MySQL.
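
    Reconstructed from the description above, the pair of queries could look roughly like this (a sketch only - table and column names are invented, not WoltLab's actual schema, and SQLite stands in for MySQL):

    ```python
    import sqlite3

    POSTS_PER_PAGE = 20

    def fetch_thread_page(conn: sqlite3.Connection, thread_id: int, page: int):
        # Query 1: resolve the page to post ids using only an index on
        # (thread_id, time) - no wide row data is touched yet.
        ids = [row[0] for row in conn.execute(
            "SELECT post_id FROM post WHERE thread_id = ? "
            "ORDER BY time LIMIT ? OFFSET ?",
            (thread_id, POSTS_PER_PAGE, (page - 1) * POSTS_PER_PAGE),
        )]
        if not ids:
            return []
        # Query 2: fetch the full rows, joining the author, strictly by
        # primary key, so the database never handles more than one page.
        placeholders = ",".join("?" * len(ids))
        return conn.execute(
            "SELECT p.*, u.username FROM post p "
            "LEFT JOIN user u ON u.user_id = p.user_id "
            f"WHERE p.post_id IN ({placeholders})",
            ids,
        ).fetchall()
    ```

    The point of the split is that the sort/limit step only ever handles small ids, while the expensive row data is fetched by primary key.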


    Both of these are standard, without relying on a third-party add-on?

    Memcached is built into the core (the next major release introduces Redis as an additional alternative), and elasticsearch is available as a paid plugin offered and maintained by us.

  • This is the second time I've been informed that a feature is pending in the next major release. Your next major release is 4.2, is this correct?

  • That is correct; we've outlined the major changes in this post: Outlook on the future of WoltLab Community Framework 2.2

    Unfortunately, that thread hasn't outlined any of the features which I've been told will be in the next release. However, I did notice this post dated January 2016:


    Outlook on the future of WoltLab Community Framework 2.2



    We're still heavily working on the planned features, and until they have sufficiently stabilized in terms of feature set and functionality we would like to wait before announcing more details. Although we want to keep everyone posted about our progress, there is little value in announcing features that are later dropped or drastically changed - a false impression we would like to avoid.

    Since it has been 6 months, can you please elaborate further? Will the features you've previously mentioned (CDN support and Redis) be included in 4.2, or are you speaking of a later release?

    • Official Post

    Will the features you've previously mentioned (CDN support and Redis) be included in 4.2, or are you speaking of a later release?

    Redis has already been implemented, but the CDN support is still on our list. There has been no work done towards this task yet, mainly because there are other components that need to be stabilized first. I'm afraid this is everything I can tell you at this point :/

  • Redis has already been implemented, but the CDN support is still on our list. There has been no work done towards this task yet, mainly because there are other components that need to be stabilized first. I'm afraid this is everything I can tell you at this point :/

    Thank you
