I tried to find the most positive reviewers on IFDB

bkirwi · October 8, 2024, 4:35am

Neat! I don’t know much about the database schema etc., but if you’re hitting some specific issue I may be able to help.

mathbrush · October 8, 2024, 4:46am

I’m using the ifdb backup from the ifarchive. It says what version of MariaDB it was made with, and I just downloaded that version.

We have a table ‘reviews’ that contains columns userid, rating (a number from 1 to 5 or NULL), and gameid.

We have a table ‘games’ that contains columns id and title.

We have a table ‘users’ that contains columns name and id.

There are other columns, of course.

What I’d like to do is to take the average of each game, then subtract the it from the player’s score to find the deviation from standard, and then add that up to get the player’s net overpositivity or overnegativity.

I can SELECT AVG(reviews.rating) OVER (partition by reviews.gameid) as avgrating to get the average rating for a game added to each review row. I’d then like to find AVG(reviews.rating-avgrating) grouped by userid.

The trouble is, I don’t know how to create a new column from one table and use it in the same command set. I could create a new table that’s the same as the old one but with one more column, or just add one more column to reviews, and then do the sort, but I don’t know how to do that either. I’ve done a lot of messing around with SQL but creating new tables isn’t one thing I’ve done.

dfabulich · October 8, 2024, 5:08am

If you want to fool around with it yourself, the quickest way to do it is with Github Codespaces.

Once you’ve set up the Codespace, you can use the phpMyAdmin GUI to query the database, or ./query-docker.sh to use the MariaDB CLI.

dfabulich · October 8, 2024, 5:30am

re: creating tables, when I want to create a table, I generally like to look at examples.

After you ./prepare-dev-environment.sh, you can open the initdb/00-init.sql file to see all the tables we create and all the data we’re inserting. (Except a few extras are in sql/patch-schema.sql, restoring tables containing private data that we scrubbed before uploading to the IF Archive.)

Note that 00-init.sql file is 90MB long because it contains all the data as well as the table creation lines, so, when I just want to skim the table definitions, I like to do this on the Terminal:

grep -v "^INSERT INTO" initdb/00-init.sql > schema.sql

That gives me a short and sweet schema.sql file, easily skimmed, with a bunch of examples of creating tables.

Once you create a table to your liking, you can populate it like this:

INSERT INTO briantable SELECT blah FROM blahblah;

I’d also like to call your attention to the gameRatingsSandbox0_mv table. It’s a materialized view (“mv”), a cached copy of the data in the gameRatingsSandbox0 view, which computes a bunch of game-ratings statistics for every game.

MariaDB [ifdb]> select * from gameRatingsSandbox0_mv where gameid='aearuuxv83plclpl' \G
*************************** 1. row ***************************
          gameid: aearuuxv83plclpl
          rated1: 4
          rated2: 0
          rated3: 6
          rated4: 34
          rated5: 201
 numRatingsInAvg: 245
 numRatingsTotal: 245
numMemberReviews: 23
       avgRating: 4.7469
    stdDevRating: 0.658725436227564
        starsort: 4.636534249345067
1 row in set (0.010 sec)

You may find the starsort column more useful than the Bayesian estimator you’ve been using.

I think you might especially benefit from reviewing the definition of the gameRatingsSandbox0 view in sql/patch-schema.sql. It’s a very complicated SQL query, with three layers of subselects. (My advice to understand it: run it yourself, inside out. First run the innermost subselect, perhaps adding an extra WHERE gameid='whatever' so you’re just looking at one game. Then run the next layer of subselect, to see those results, then the next layer, and then the next layer.)

I mention it because you might be able to get away without using temporary tables if you use subselects instead. (But temporary tables might be simpler anyway.)