IFDB Database Performance Issues

IFDB has been getting slower and slower lately, but intermittently.

IFDB is a classic “LAMP stack” application, running on Linux, Apache, MySQL (MariaDB, actually), and PHP.

I have fairly limited experience operating a LAMP stack app, but my rough understanding is that, when it gets intermittently slow, it’s usually the database that’s to blame. (And, indeed, it’s the DB process that’s running hot on the machine.)

Back in October, I turned on slow-query logging, and filed a dozen bugs on queries that could be potentially improved.

https://github.com/iftechfoundation/ifdb-suggestion-tracker/issues?q=is%3Aissue+is%3Aopen+label%3Aperformance

I have not, since then, had time to work on them. PRs would be welcome for any of these, or even just advice.

A couple of recurring issues:

  1. We have some views, but the views themselves don’t have indexes, so it seems like it’s just doing a full table scan of the underlying data. Is that how views work? Could I create an index for these views??
  2. Is there a clever way in MariaDB to do full-text search indexes? IFDB search is just always doing full table scans of the games table, which seems bad.
8 Likes

I’ve worked very little with MariaDB or MySQL, but I’m a database professional.

I doubt you can put an index on a view. You put an index on the corresponding colum(s) in the underlying tables. But picking which indexes to add requires a bit of database knowledge as well.

Full text indexes: Full-Text Index Overview - MariaDB Knowledge Base

3 Likes

If I could have a look at the table definitions, index definitions and view definitions, I could come up with some advice to address the slow queries.

Note that a full table scan isn’t always a bad choice. If the query optimizer expects to hit more than ~10% of the rows in a table, or the table is small, it will typically opt for a FTS even if there is an index. If the query optimizer is bad, it may opt to use the index just because it’s there, even though a FTS would be faster.

1 Like

You can find all the code for IFDB at https://github.com/iftechfoundation/ifdb

There’s instructions in the README to download a DB archive and generate a schema from that, so you can run the entire stack in a Docker environment.

For your convenience, I’ve attached a zip of the generated schema file.

ifdb-schema.zip (6.9 KB)

I also have worked very little with MariaDB/MySQL in the past, especially query analysis/optimization. Most of my experience is on PostgreSQL.

Normally I’d ask Postgres to EXPLAIN ANALYZE a query, and I’d know how to read it. But you’ll see in the slow-query bugs I filed https://github.com/iftechfoundation/ifdb-suggestion-tracker/issues?q=is%3Aissue+is%3Aopen+label%3Aperformance that most of them seem a little ambiguous about whether they’re actually doing full scans; I don’t have experience reading them, so it’s hard to tell.

For example, let’s look at this issue: https://github.com/iftechfoundation/ifdb-suggestion-tracker/issues/401

The query is very simple.

select
  numRatingsInAvg, numRatingsTotal, avgRating, numMemberReviews
from
  gameRatingsSandbox0
where
  gameid = 'ebqzkpo6d9s3wbwm';

gameRatingsSandbox0 is a view, based on the reviews table and the users table. It’s a very complicated select clause, with a bunch of nested subqueries. It’s defined on line 1540 of the SQL file I’ve attached to this post. The view definition is 150 lines long.
:dizzy_face:

The slow query log says Full_scan: yes which, I suppose, must mean what it says: it’s doing a full scan of the reviews table in order to compute this data. But the reviews table has a key on gameid, so it would be quite surprising if this little query is actually reading all of the data in the reviews table in order to compute its results. Even querying all of the data in the view seems like a surprising query plan for something like this.

Assuming it is doing a full scan of the reviews table, what would I do about it? Write application code (in PHP?) to hand-roll a materialized view? (Does MariaDB have a feature that could do that for me automatically?)

Or is MariaDB automatically caching the whole thing in memory, so a “full-scan” isn’t quite as bad as it looks? :thinking:

Could I at least add some sort of index or primary key to this view? (But how do you even add a primary key to a view…??)

1 Like

For what it’s worth (not much, I suspect), as well as general performance degradation, I’ve noticed IFDB seems to specifically go hnnnng when I’m messing around with it when the clock ticks over to midnight UTC. I guess it’s probably additional contention from some daily backup/maintenance job.

5 Likes

Oh that view is vicious. It looks to me like the query optimizer should have a chance to use the index on reviews.gameid.

This puzzles me: The select expression on line 1664 (hasReview) is not an aggregate function, nor a group by expression. It’s similar to the group by-expression on 1678-81, but not identical. This shouldn’t be allowed?

A view isn’t a table. The data isn’t ever stored in the view. Thus, it’s not possible to create an index on a view.

This looks like a great candidate for creating a materialized view. While MariaDB doesn’t support this, you can emulate it:

Create a table gameRatingsSandbox0MV with the same columns as the view. Make gameid the primary key.

Every night, do this:

truncate table gameRatingsSandbox0MV;

insert into gameRatingsSandbox0MV
select * from gameRatingsSandbox0;

Now, change the queries which read from gameRatingsSandbox0 to read from gameRatingsSandbox0MV instead.

If populating this table is terribly slow, and this leads to the site not working properly for an unacceptable amount of time, you could either:

(A) create a secondary table just like the MV table above, truncate and populate that table, then truncate the main table and copy the data from the secondary table to the main table in one go,

or (B) add a column ISCURRENT, which you set to 0 for the new data you add, then when you’re done, you do:

delete from gameRatingsSandbox0MV where iscurrent = 1;
update gameRatingsSandbox0MV set iscurrent = 1 where iscurrent = 0;
commit;

(This would require the primary key to be gameid + iscurrent)

Depending on how MariaDB handles transactions, this (alternative B) should give you a very brief time or no time at all that users can’t query this table and get proper results back.

4 Likes

You can drop the index on userid on reviews, as it doesn’t serve a purpose. It takes up space and makes inserts and updates slower. All selects that can use this index, can use the index on userid + gameid instead.

The query in Slow Query: Cross recommendations · Issue #410 · iftechfoundation/ifdb-suggestion-tracker · GitHub may benefit from an index on userid + rating and an index on gameid + rating.

3 Likes

And of course, if you add an index on gameid + rating, you can drop the index on gameid.

Unfortunately, MariaDB doesn’t offer any out-of-the-box mechanism for materializing views. https://jira.mariadb.org/browse/MDEV-12163

Here’s an article on how to do materialized views in MySQL using stored procedures. Materialized Views with MySQL | FromDual My general attitude toward stored procedures is, uh, reticence :grimacing: but this could be the right time for it.

Probably the most important view to materialize is that giant gamesRatingsSandbox0 view, which computes the “sort by reviews” (starsort) score for all games. It’s used all over the place.

Still not sure I’m likely to work on this any time soon, but we’ll see if I can find a few minutes for this.

1 Like

I’m thinking that getting rid of personalized recommendations on the front page would be useful (I accidentally posted this on the wrong thread earlier). The front page gets loaded more than anything else, and I’m not convinced the recommendations are useful.

I think it would be more helpful just to make global randomized recommendations that are the same for all visitors. Most people have played <10% of the top 100 games, so just storing the top 100 games once a week and recommending them might work better.

Or replacing them with a feed of the three most recent competitions.

I don’t know if that would give significant time savings.

4 Likes

Or put personalized recommendations on a separate page so it’s only loaded if people want it.

4 Likes

Mariadbd has started running at 100% CPU all the time, or at least every time I look. Looking at my mail logs, this started happening Dec 13th. It’s been intermittent – I see alerts Dec 13, 14, 18, 19, and 20 (today).

This looks like a big shift, not an incremental increase over time. Either something changed in the code, or the site started getting a lot more hits from somewhere. It may be worth checking the logs to distinguish those cases.

5 Likes

Traffic is up, but not wildly so.

image

1 Like

Is this something that a small cache (memcache, Redis) might help with? (I’ve not looked at the ways the queries are structured, just spitballing.)

2 Likes

I think the usual enterprise answer is to use something like Elasticsearch but that might be too big a hammer to reach for.

2 Likes

Yes, I did say MariaDB doesn’t support Materialized views, but you can emulate it. It’s just a table to cache the data that the view shows.

I have no idea why you wouldn’t want to use stored procedures. One of the advantages of stored procedures compared to other programs, is that they can read and write data without causing any network traffic or overhead on the server.

2 Likes

Don’t be afraid for stored procedures. They are a valuable tool.

1 Like

The outgoing requests to IFWiki on game pages can also be reduced, but I don’t know what percentage that is of the overall load time: Avoid querying IFWiki whenever a game page is loaded · Issue #273 · iftechfoundation/ifdb-suggestion-tracker · GitHub

(BTW I think the issue should be tagged with “performance”)

1 Like

It wouldn’t be silly to push a change turning off a specific feature (recommendations, ifwiki requests, etc) and watch what happens to the mariadb process.

1 Like

I think I just find stored procedures hard to read, debug, and understand.

They’re like invisible goto statements. You can write application code that generates SQL statements to execute, but under the hood there’s a stored procedure doing magic that you may not be familiar with.

Stored procedures make it difficult to answer questions like these:

  • How did this table get updated?
  • Why didn’t this table get updated by stored procedure X?
  • This table was updated by a stored procedure incorrectly. What exactly did it do?

Still, clearly, stored procedures are the best way to implement materialized views, and we need a materialized view, so, I’ll do it. I’ll see if I can make some time for this in the next month or so…

4 Likes