Few technical questions

  • Currently we are using Discourse, but we have some usability problems with it, so we are on the lookout for new forum software. WoltLab looks and feels great, so here are a few questions/concerns:

    1) Do you use, or plan to use, infinite scrolling in the main parts of your software?

    2) Your publicly available repositories seem to use prepared statements; do you use those in the forum software as well? Recently there have been papers claiming that they are not the optimal solution in high-volume applications, like this one: https://ieeexplore.ieee.org/document/9854303

    If you are using them, are there plans to switch to better-performing alternatives?

    3) How many users does the largest community running the WoltLab forum have?

  • Hello,

    1) Do you use, or plan to use, infinite scrolling in the main parts of your software?

    The software is in constant development, and we introduced live updates for notifications and threads in the last release. We’re planning to extend this even further to create more instantaneous page responsiveness, including the ability to load content in-place. How this is done is not set in stone, though, and depends on whichever method delivers the best user experience.

    2) Your publicly available repositories seem to use prepared statements; do you use those in the forum software as well? Recently there have been papers claiming that they are not the optimal solution in high-volume applications, like this one: https://ieeexplore.ieee.org/document/9854303

    If you are using them, are there plans to switch to better-performing alternatives?

    We make use of real prepared statements in all parts of our software. The security benefit of prepared statements, which effectively make an application immune to SQL injection attacks, far outweighs the performance drawback of the extra round trips to the database. For repeated queries, such as inserts or updates using multiple datasets, prepared statements are faster and more efficient than “traditional” queries.
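    A generic sketch of the repeated-query case (Python with an in-memory SQLite database; this is an illustration, not WoltLab's actual code): executemany prepares the statement once and then binds each dataset in turn, which is the reuse pattern described above.

```python
import sqlite3

# In-memory database purely for demonstration purposes.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")

# executemany() prepares the INSERT once, then binds each dataset,
# analogous to reusing one prepared statement for many rows.
data = [(i, f"post {i}") for i in range(100)]
conn.executemany("INSERT INTO posts (id, body) VALUES (?, ?)", data)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
print(count)  # 100
```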

    Furthermore, focusing on database queries alone just doesn’t work outside of lab environments. In general, the entire communication with the database takes up less than 10% of the request. The heavy lifting of a request happens in PHP, which means that even if we improved the latency to the database by falling back to “traditional” queries, the performance uplift would be negligible.

    3) How many users does the largest community running the WoltLab forum have?

    That’s not really a useful metric, because neither the number of registered users nor the number of concurrent users tells the whole story. Unless a software has severe bottlenecks, you can simply “throw more hardware at the problem” and be fine. Generally speaking, our software is optimized to run fast on limited web hosting accounts, but also to scale up to communities with tens of millions of posts.

    Developing such a piece of software for over two decades teaches you a thing or two about efficiency, and about what works when scaled up and what doesn’t. For example, emails are not sent synchronously; they are instead pushed to a background queue for asynchronous delivery. This not only makes the software highly fault-tolerant towards transient network or email server issues, but also solves the problem of dealing with large email volumes.
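    A minimal sketch of the background-queue idea (generic Python, not WoltLab's implementation): the request handler only enqueues the message, and a separate worker performs the slow delivery.

```python
import queue
import threading

outbox = queue.Queue()
sent = []

def worker():
    # Drain the queue; a real worker would perform SMTP delivery here
    # and retry on transient network or mail server errors.
    while True:
        message = outbox.get()
        if message is None:  # sentinel value shuts the worker down
            break
        sent.append(message)  # stand-in for the actual email delivery
        outbox.task_done()

t = threading.Thread(target=worker)
t.start()

# The "request" finishes immediately; it never blocks on the mail server.
for i in range(3):
    outbox.put(f"notification {i}")

outbox.put(None)
t.join()
print(len(sent))  # 3
```

    Because the queue decouples enqueueing from delivery, a slow or temporarily unreachable mail server delays only the worker, not the user-facing request.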

  • Thank you for the fast response. Live updates are fine, but infinite scrolling is actually the reason for the switch. Infinite scrolling feels so disorganised.

    For repeated queries, such as inserts or updates using multiple datasets, prepared statements are faster and more efficient than “traditional” queries

    This is actually wrong according to the paper. They show that traditional multi-SQL queries are way faster for 1000 operations/datasets; quote: "Standard multi-SQL method is faster than prepared statements by 83.19% for INSERT, 96.19% for SELECT, 76.87% for UPDATE and 85.87% for DELETE SQL queries". This does not sound like "negligible".

    I have been in the "forum business" for a long time and had never heard of WoltLab before, hence my concerns. Hearing that you have been in the space for 20 years is reassuring, but you should then definitely work a bit on the marketing :)

  • This does not sound like "negligible".

    Relative numbers neither tell the whole story nor provide enough context. When executing a query you have essentially two metrics:

    1. How long does it take the database to evaluate and execute the query?
    2. How long does it take until the result arrives back at the caller?

    The use of prepared statements has an impact on (2), because the extra round trip adds latency. However, it does not impact the execution speed of (1), because that scales only with the work the database has to do.

    For the sake of demonstration, I tried out both cases on my local development machine using an in-development version of our software.

    With prepared statements:

    Execution time: 0.0529s (79.017% PHP, 20.983% SQL) | SQL queries: 20 | Memory-Usage: 4.36 MB

    Without prepared statements:

    Execution time: 0.0528s (72.917% PHP, 27.083% SQL) | SQL queries: 20 | Memory-Usage: 4.31 MB

    (I can enable the emulation of prepared statements in the database driver, which effectively performs the substitution on the client side, so the database server only ever sees a “traditional” query.)

    I used nginx 1.23, PHP 8.0 and MySQL 8.0 for the test. The numbers above are the mean of 10 runs (recorded after 10 extra runs to warm up all caches).

    It nicely illustrates the problem of statistical error caused by run-to-run variance when dealing with sub-millisecond differences. Both requests above took a total of ~53 ms to complete, and across all runs I observe either no difference or sub-1% differences in run time.

    There is, and always will be, a performance benefit to not using prepared statements. But in real-world scenarios the difference disappears into the statistical error, because it is so small.
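    The warm-up-then-average methodology described above can be sketched like this (generic Python; the timed lambda is just a stand-in for the full page request):

```python
import statistics
import time

def benchmark(fn, warmup=10, runs=10):
    # Discard the warm-up runs so all caches are hot, then report the
    # mean and standard deviation of the measured runs.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.mean(samples), statistics.stdev(samples)

# Stand-in workload; a real benchmark would time the whole request.
mean, stdev = benchmark(lambda: sum(range(100_000)))
print(f"mean {mean * 1000:.3f} ms, stdev {stdev * 1000:.3f} ms")
```

    When the standard deviation is of the same order as the difference you are trying to measure, you are looking at exactly the run-to-run variance problem described above.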

  • That was too technical for me :) maybe the paper is wrong, considering the discrepancies between your results and theirs.

    What I can say is that the forum seems super fast. We will be in touch for a more detailed consultation/test if we decide to go with WoltLab.

  • I’m sorry, it looks like I got carried away ;)

    My point was that the headline claim of the paper is an oversimplification. Let’s say that you want to buy some milk from your local store:

    1. You go to the store and declare that you would like to make a purchase and pay with your credit card.
    2. The store acknowledges this and you head back home.
    3. You grab your credit card and the list of things you want to purchase and head to the store again.
    4. The store processes the transaction and once the payment has been made you can leave the store with your purchased goods.

    In a “traditional” query, steps 1 through 3 are a single step, and this is where the performance advantage comes from.

    However, and this is why such examples always suck: unlike in the real world, steps 1 through 3 are extremely fast compared to step 4.

    Now imagine that step 4 takes 10 minutes and steps 1 through 3 take only 1 minute in total. By combining steps 1 through 3 into a single step you can shave off 30 seconds. You are now facing 10:30 minutes versus 11 minutes for your purchase. But what if step 4 takes 30 minutes while steps 1 through 3 still take only 1 minute? Now you have 30:30 minutes versus 31 minutes.
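    The arithmetic of this hypothetical can be checked in a few lines of Python:

```python
# Minutes: steps 1-3 (travel) vs. step 4 (in-store). Combining
# steps 1-3 saves 30 seconds of travel in this example.
travel, combined_travel = 1.0, 0.5

for in_store in (10, 30):
    separate = travel + in_store
    combined = combined_travel + in_store
    saved_pct = (separate - combined) / separate * 100
    print(f"step 4 = {in_store} min: {separate} vs {combined} min, "
          f"{saved_pct:.1f}% saved")
```

    The longer the in-store step, the smaller the relative saving from combining the trips, even though the absolute saving stays at 30 seconds.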

    My point is that for queries against a meaningful amount of data, the time it takes to communicate with the database server is dwarfed by the time it takes to process the query. The astronomical numbers in the paper would be true for a shopping trip to buy a single bottle of milk. However, the usual trip to the store will involve much larger purchases; the time inside the store increases, while the time to travel to and from the store remains virtually constant.

  • The astronomical numbers in the paper would be true for a shopping trip to buy a single bottle of milk

    The paper does not talk about buying a single bottle of milk; that case is obviously slower because of the extra round trip of a prepared statement. Rather, it is talking about buying 1000 items, meaning 1000 trips to the server. The astronomical difference is achieved because no matter how optimised the prepared statement execution phase is, 1000 round trips are very slow. So this might, as noted in the paper, be more obvious in API use cases than in regular app use cases. I think you missed the keyword "multi" in multi-SQL (example).
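    A rough sketch of the contrast being described, using Python with an in-memory SQLite database (there is no real network here, so this only shows the shape of the two approaches, not the paper's timings; 400 rows are used to stay within SQLite's default parameter limit):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
rows = [(i, f"item{i}") for i in range(400)]

# One statement per row: against a networked database server this
# would mean 400 separate round trips.
for row in rows:
    conn.execute("INSERT INTO items VALUES (?, ?)", row)

# "Multi-SQL" batching: one multi-row INSERT carries every dataset
# in a single statement, i.e. a single round trip.
conn.execute("DELETE FROM items")
placeholders = ", ".join(["(?, ?)"] * len(rows))
flat = [value for row in rows for value in row]
conn.execute(f"INSERT INTO items VALUES {placeholders}", flat)

count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count)  # 400
```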

  • […] because no matter how optimised the prepared statement execution phase is, 1000 round trips are very slow.

    First things first: they are not that slow. Second, while I agree that they add latency, their impact is far from being as significant as the paper implies. Even if they added +100% to the travel time to the store, if you still spend 99% of the total time inside the store, it will have no meaningful impact on your entire journey.
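    The amortisation argument in numbers, as a trivial back-of-the-envelope calculation:

```python
# If travel is 1% of the journey and in-store time is 99%, doubling
# the travel time adds only 1% to the whole journey.
travel, store = 1.0, 99.0
baseline = travel + store      # 100.0
doubled = 2 * travel + store   # 101.0
increase_pct = (doubled - baseline) / baseline * 100
print(f"total journey grows by {increase_pct:.1f}%")  # 1.0%
```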

    The bottom line is that the use of prepared statements does not add meaningful latency to the response times of our software. The software is blazingly fast with and without prepared statements, as shown in a previous message. And at the end of the day this is what matters, both to you and the user.

  • And at the end of the day this is what matters, both to you and the user

    I agree :)

    But I also don't agree, in the sense that you are talking about one user, while the effect becomes visible at orders of magnitude more. For example, when considering 100 simultaneous user connections, or 10,000 users... it clearly affects the server resources by at least double. Another example whose results are the same as in the paper (100%, or double): https://orangematter.solarwinds.com/2014/11/19/ana…nt-performance/

    Granted, this is an API example, but a high-volume forum can be viewed as one, I think.

  • Granted, this is an API example, but a high-volume forum can be viewed as one, I think.

    Yes, but this is also an incredibly simple use case; it doesn’t get any simpler than that.

    That’s like saying that it’s faster and more efficient to walk two steps instead of getting into a car, starting the engine, driving the distance of two steps, stopping the engine and getting out of the car. That is both totally true and irrelevant at the same time.

    You’ve suggested using something other than prepared statements for performance reasons in the context of our software. And in the context of our software, the use of prepared statements has no meaningful impact, because the time is spent doing actual work in other parts of the request. That’s the problem with micro-optimizations: once you’re past a certain point, diminishing returns are a real thing.

    The huge advantage of prepared statements is that they make your application invulnerable to SQL injection. Without them you would have to resort to manual escaping, and then it’s irrelevant whether you get it right 999 out of 1,000 times; the one time you miss the proper escaping is the one that will ruin your day. In our case, the extra layer of security gained from prepared statements is worth the <1% performance penalty of using them in our software.
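    The classic demonstration of that difference (a generic Python/SQLite sketch; the table and the malicious input are made up): string interpolation lets attacker input become SQL, while parameter binding keeps it as plain data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

malicious = "x' OR '1'='1"

# Naive interpolation: the input is spliced into the SQL text, so the
# OR '1'='1' clause matches every row in the table.
unsafe = conn.execute(
    f"SELECT name FROM users WHERE name = '{malicious}'"
).fetchall()

# Parameter binding: the input is passed as data and never parsed as
# SQL, so only the literal string would match.
safe = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe), len(safe))  # 1 0
```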
