How Metis Optimized Queries Executed by GORM - Part 2

GORM is an ORM for GO. It can connect to PostgreSQL and other SQL engines. It supports transactions, relations, eager loading, lazy loading, functions, and other common operations. The previous part covered some of the queries from Part 1 of this series. This time we are going to implement more sophisticated scenarios to see how GORM can deal with them. Let’s go.

Given two people, list what movies they appeared in together

Let’s find common movies for two people. That’s the application code doing that:

var first []entities.TitlePrincipal
db.
	Model(&entities.TitlePrincipal{}).
	Where("nconst = ?", actor1).
	Find(&first)


var second []entities.TitlePrincipal
db.
	Model(&entities.TitlePrincipal{}).
	Where("nconst = ?", actor2).
	Find(&second)


var firstTconsts []string


for _, item := range first {
	firstTconsts = append(firstTconsts, item.Tconst)
}


var secondTconsts []string


for _, item := range second {
	secondTconsts = append(secondTconsts, item.Tconst)
}


var result []entities.TitleBasic
db.
	Model(&entities.TitleBasic{}).
	Where("tconst IN ?", firstTconsts).
	Where("tconst IN ?", secondTconsts).
	Find(&result)


return result

We get titles for the first person, then for the second one, and finally we get titles matching both. The code generates the following queries:

SELECT
  *
FROM
  imdb.title_principals
WHERE
  nconst = $1 /*traceparent='00-5d75853696868445b9550836b39af973-10c475875b500e28-01'*/

And then the query for getting all the titles

SELECT
  *
FROM
  imdb.title_basics
WHERE
  tconst IN (
    $1,
    $2,
    ...
    $12,
    $13
  )
  AND tconst IN (
    $14,
    $15,
    ...
    $544,
    $545
  ) /*traceparent='00-5d75853696868445b9550836b39af973-10c475875b500e28-01'*/

Metis indicates a missing index:

‍

We have two issues here. First, we generate a very big query. Not all ORMs can deal with that. Let’s fix that first:

var first []entities.TitlePrincipal
db.
	Model(&entities.TitlePrincipal{}).
	Where("nconst = ?", actor1).
	Find(&first)


var second []entities.TitlePrincipal
db.
	Model(&entities.TitlePrincipal{}).
	Where("nconst = ?", actor2).
	Find(&second)


knownTitles := map[string]struct{}{}


for _, item := range first {
	knownTitles[item.Tconst] = struct{}{}
}


var finalTconsts []string


for _, item := range second {
	if _, exists := knownTitles[item.Tconst]; exists {
		finalTconsts = append(finalTconsts, item.Tconst)
	}
}


var result []entities.TitleBasic
db.
	Model(&entities.TitleBasic{}).
	Where("tconst IN ?", finalTconsts).
	Find(&result)


return result

We get this query now:

SELECT
  *
FROM
  imdb.title_basics
WHERE
  tconst IN ($1) /*traceparent='00-5afead8f3e591a8d55a7fded030cc334-1bca0a48a3a10b1f-01'*/

These are the insights:

‍

We can now further improve the query performance by adding an index for title_principals. This is the code to do that:

CREATE INDEX IF NOT EXISTS title_principals_nconst_idx ON imdb.title_principals(nconst) INCLUDE (tconst)

Here are the insights after adding the index:

‍

We can see all is good now. However, we can get the same result with just one SQL query:

var result []entities.TitleBasic
db.
	Model(&entities.TitleBasic{}).
	Joins("JOIN imdb.title_principals AS TP1 ON TP1.tconst = title_basics.tconst").
	Joins("JOIN imdb.title_principals AS TP2 ON TP2.tconst = title_basics.tconst").
	Where("TP1.nconst = ?", actor1).
	Where("TP2.nconst = ?", actor2).
	Find(&result)
return result

This is the generated query:

SELECT
  title_basics.tconst,
  title_basics.titletype,
  title_basics.primarytitle,
  title_basics.originaltitle,
  title_basics.isadult,
  title_basics.startyear,
  title_basics.endyear,
  title_basics.runtimeminutes,
  title_basics.genres
FROM
  imdb.title_basics
  JOIN imdb.title_principals AS TP1 ON TP1.tconst = title_basics.tconst
  JOIN imdb.title_principals AS TP2 ON TP2.tconst = title_basics.tconst
WHERE
  TP1.nconst = $1
  AND TP2.nconst = $2 /*traceparent='00-4c9aa715d7d7535dfadcd63e6840ae11-d0f3731f8b6fac30-01'*/

And we can now see these insights:

This is really good. The codebase is much easier to follow now.

List all of the cast and crew in a given movie

Let’s now find all the people involved in the movie. Here is the first solution. We can just send the raw query:

var result []entities.NameBasic

db.Raw("SELECT DISTINCT NB.*\n"+
	"FROM imdb.title_basics AS TB\n"+
	"LEFT JOIN imdb.title_principals AS TP ON TP.tconst = TB.tconst\n"+
	"LEFT JOIN imdb.title_crew AS TC ON TC.tconst = TB.tconst\n"+
	"LEFT JOIN imdb.name_basics AS NB ON \n"+
	"       NB.nconst = TP.nconst \n"+
	"       OR TC.directors = NB.nconst\n"+
	"       OR TC.directors LIKE NB.nconst || ',%'::text\n"+
	"       OR TC.directors LIKE '%,'::text || NB.nconst || ',%'::text\n"+
	"       OR TC.directors LIKE '%,'::text || NB.nconst\n"+
	"       OR TC.writers = NB.nconst\n"+
	"       OR TC.writers LIKE NB.nconst || ',%'::text\n"+
	"       OR TC.writers LIKE '%,'::text || NB.nconst || ',%'::text\n"+
	"       OR TC.writers LIKE '%,'::text || NB.nconst\n"+
	"WHERE TB.tconst = ?", tconst).
	Scan(&result)


return result

This is the query:

SELECT DISTINCT
  NB.*
FROM
  imdb.title_basics AS TB
  LEFT JOIN imdb.title_principals AS TP ON TP.tconst = TB.tconst
  LEFT JOIN imdb.title_crew AS TC ON TC.tconst = TB.tconst
  LEFT JOIN imdb.name_basics AS NB ON NB.nconst = TP.nconst
  OR TC.directors = NB.nconst
  OR TC.directors LIKE NB.nconst || ',%'::text
  OR TC.directors LIKE '%,'::text || NB.nconst || ',%'::text
  OR TC.directors LIKE '%,'::text || NB.nconst
  OR TC.writers = NB.nconst
  OR TC.writers LIKE NB.nconst || ',%'::text
  OR TC.writers LIKE '%,'::text || NB.nconst || ',%'::text
  OR TC.writers LIKE '%,'::text || NB.nconst
WHERE
  TB.tconst = $1 /*traceparent='00-cafd685027549f951151d2748b7bfeec-c6466989569b4703-01'*/

We get the following insights:

‍

‍

This runs in nearly 100 seconds. We can split the query into UNIONs to see if it’s faster:

var result []entities.NameBasic


db.Raw("SELECT DISTINCT NB.*\n"+
	"   FROM imdb.title_principals AS TP\n"+
	"   JOIN (\n"+
	"   SELECT tconst, directors, writers\n"+
	"   FROM imdb.title_crew\n"+
	"   WHERE tconst = @tconst\n"+
	"   ) AS TC ON TC.tconst = TP.tconst\n"+
	"   LEFT JOIN imdb.name_basics AS NB ON NB.nconst = TP.nconst\n"+
	"UNION\n"+
	"   SELECT DISTINCT NB.*\n"+
	"   FROM imdb.title_principals AS TP\n"+
	"   JOIN (\n"+
	"   SELECT tconst, directors, writers\n"+
	"   FROM imdb.title_crew\n"+
	"   WHERE tconst = @tconst\n"+
	"   ) AS TC ON TC.tconst = TP.tconst\n"+
	"   LEFT JOIN imdb.name_basics AS NB ON TC.directors LIKE NB.nconst || ',%'::text\n"+
	"UNION\n"+
	"   SELECT DISTINCT NB.*\n"+
	"   FROM imdb.title_principals AS TP\n"+
	"   JOIN (\n"+
	"   SELECT tconst, directors, writers\n"+
	"   FROM imdb.title_crew\n"+
	"   WHERE tconst = @tconst\n"+
	"   ) AS TC ON TC.tconst = TP.tconst\n"+
	"   LEFT JOIN imdb.name_basics AS NB ON TC.directors LIKE '%,'::text || NB.nconst || ',%'::text\n"+
	"UNION\n"+
	"   SELECT DISTINCT NB.*\n"+
	"   FROM imdb.title_principals AS TP\n"+
	"   JOIN (\n"+
	"   SELECT tconst, directors, writers\n"+
	"   FROM imdb.title_crew\n"+
	"   WHERE tconst = @tconst\n"+
	"   ) AS TC ON TC.tconst = TP.tconst\n"+
	"   LEFT JOIN imdb.name_basics AS NB ON TC.directors LIKE '%,'::text || NB.nconst\n"+
	"UNION\n"+
	"   SELECT DISTINCT NB.*\n"+
	"   FROM imdb.title_principals AS TP\n"+
	"   JOIN (\n"+
	"   SELECT tconst, directors, writers\n"+
	"   FROM imdb.title_crew\n"+
	"   WHERE tconst = @tconst\n"+
	"   ) AS TC ON TC.tconst = TP.tconst\n"+
	"   LEFT JOIN imdb.name_basics AS NB ON TC.writers = NB.nconst\n"+
	"UNION\n"+
	"   SELECT DISTINCT NB.*\n"+
	"   FROM imdb.title_principals AS TP\n"+
	"   JOIN (\n"+
	"   SELECT tconst, directors, writers\n"+
	"   FROM imdb.title_crew\n"+
	"   WHERE tconst = @tconst\n"+
	"   ) AS TC ON TC.tconst = TP.tconst\n"+
	"   LEFT JOIN imdb.name_basics AS NB ON TC.writers LIKE NB.nconst || ',%'::text\n"+
	"UNION\n"+
	"   SELECT DISTINCT NB.*\n"+
	"   FROM imdb.title_principals AS TP\n"+
	"   JOIN (\n"+
	"   SELECT tconst, directors, writers\n"+
	"   FROM imdb.title_crew\n"+
	"   WHERE tconst = @tconst\n"+
	"   ) AS TC ON TC.tconst = TP.tconst\n"+
	"   LEFT JOIN imdb.name_basics AS NB ON TC.writers LIKE '%,'::text || NB.nconst || ',%'::text\n"+
	"UNION\n"+
	"   SELECT DISTINCT NB.*\n"+
	"   FROM imdb.title_principals AS TP\n"+
	"   JOIN (\n"+
	"   SELECT tconst, directors, writers\n"+
	"   FROM imdb.title_crew\n"+
	"   WHERE tconst = @tconst\n"+
	"   ) AS TC ON TC.tconst = TP.tconst\n"+
	"   LEFT JOIN imdb.name_basics AS NB ON TC.writers LIKE '%,'::text || NB.nconst", sql.Named("tconst", tconst)).
	Scan(&result)


return result

Let’s see the insights:

‍

This is a major improvement, however, queries are still slow. Let’s try implementing the same in the application code:

knownNames := map[string]struct{}{}


var crewViaTitlePrincipalsEntities []entities.TitlePrincipal
db.
	Model(&entities.TitlePrincipal{}).
	Where("tconst = ?", tconst).
	Find(&crewViaTitlePrincipalsEntities)


var allMatchingNames []string


for _, item := range crewViaTitlePrincipalsEntities {
	if _, exists := knownNames[item.Nconst]; !exists {
		knownNames[item.Nconst] = struct{}{}
		allMatchingNames = append(allMatchingNames, item.Nconst)
	}
}


var crewViaTitleCrewEntities []entities.TitleCrew
db.
	Model(&entities.TitleCrew{}).
	Where("tconst = ?", tconst).
	Find(&crewViaTitleCrewEntities)


for _, item := range crewViaTitleCrewEntities {
	for _, director := range strings.Split(item.Directors, ",") {
		if _, exists := knownNames[director]; !exists {
			knownNames[director] = struct{}{}
			allMatchingNames = append(allMatchingNames, director)
		}
	}


	for _, writer := range strings.Split(item.Writers, ",") {
		if _, exists := knownNames[writer]; !exists {
			allMatchingNames = append(allMatchingNames, writer)
		}
	}
}


var result []entities.NameBasic
db.
	Model(&entities.NameBasic{}).
	Where("nconst IN ?", allMatchingNames).
	Find(&result)


return result

We first take the crew from the title_principals table. Next, we extract the crew from the title_crew. Finally, we split directors and writers, and we calculate the intersection of both datasets to get the names.

This generates the following trace in Metis:

‍

It’s all green! We can see that we have three queries sent, but they are blazingly fast because of indexes. This time it was much better to do processing in the application instead of on the SQL level.

However, we can do even better with this SQL:

var result []entities.NameBasic

db.Raw("WITH RECURSIVE numbers AS (\n"+
	"   SELECT 1 AS number\n"+
	"   UNION ALL\n"+
	"   SELECT number + 1 AS number FROM numbers WHERE number < 1500\n"+
	"),\n"+
	"split_associations AS (\n"+
	"     SELECT SPLIT_PART(TC.directors, ',', N.number) AS nconst\n"+
	"     FROM imdb.title_crew AS TC\n"+
	"     CROSS JOIN numbers AS N\n"+
	"     WHERE tconst = @tconst AND directors IS NOT NULL AND CHAR_LENGTH(directors) - CHAR_LENGTH(REPLACE(directors, ',', '')) + 1 >= N.number\n"+
	"   UNION\n"+
	"     SELECT SPLIT_PART(TC.writers, ',', N.number) AS nconst\n"+
	"     FROM imdb.title_crew AS TC\n"+
	"     CROSS JOIN numbers AS N\n"+
	"     WHERE tconst = @tconst AND writers IS NOT NULL AND CHAR_LENGTH(writers) - CHAR_LENGTH(REPLACE(writers, ',', '')) + 1 >= N.number\n"+
	"),\n"+
	"all_associations AS (\n"+
	"   SELECT SA.nconst\n"+
	"   FROM split_associations AS SA\n"+
	"   UNION\n"+
	"   SELECT TP.nconst\n"+
	"   FROM imdb.title_principals AS TP\n"+
	"   WHERE TP.tconst = @tconst\n"+
	")\n"+
	"SELECT NB.*\n"+
	"FROM imdb.name_basics AS NB\n"+
	"JOIN all_associations AS AA ON AA.nconst = NB.nconst", sql.Named("tconst", tconst)).
	Scan(&result)

This is just one query that does the job. Here are the insights:

However, the query is hard to follow and maintain. We should always consider this aspect before pushing the code to production.

Find the most prolific actor in a given period

Let’s now find the person who starred in the most movies in a given period. Let’s do that using the application code:

var titlesMatchingPeriodEntities []entities.TitleBasic
db.
	Model(&entities.TitleBasic{}).
	Where("startyear >= ?", startYear).
	Where("startyear <= ?", endYear).
	Find(&titlesMatchingPeriodEntities)


var titlesMatchingPeriod = lo.Map(titlesMatchingPeriodEntities, func(x entities.TitleBasic, index int) string {
	return x.Tconst
})


var principals []entities.TitlePrincipal


for _, chunk := range lo.Chunk(titlesMatchingPeriod, 10000) {
	var chunkResult []entities.TitlePrincipal
	db.
		Model(&entities.TitlePrincipal{}).
		Where("tconst IN ?", chunk).
		Find(&chunkResult)


	principals = append(principals, chunkResult...)
}


counts := make(map[string]int)
for _, principal := range principals {
	counts[principal.Nconst] = counts[principal.Nconst] + 1
}


var countsWithKeys []lo.Tuple2[string, int]
for _, key := range lo.Keys(counts) {
	countsWithKeys = append(countsWithKeys, lo.T2(key, counts[key]))
}


sort.SliceStable(countsWithKeys, func(i, j int) bool {
	return countsWithKeys[i].B > countsWithKeys[j].B
})


topResults := lo.Subset(countsWithKeys, 0, 1)


var result []entities.NameBasic
db.
	Model(&entities.NameBasic{}).
	Where("nconst IN ?", lo.Map(topResults, func(x lo.Tuple2[string, int], index int) string {
		return x.A
	})).
	Find(&result)


return result

We first find titles in a given period. Next, we take the crew for each of them. Finally, we need to group the identifiers, get the top one, and get the details. This is a bit lengthy and is rather slow because we get all the movies in a given period. Also, notice that we need to call the final query many times for different parameters because we can’t submit just one query with tens of thousands of parameters. That’s a lot of data. Let’s see the insights:

‍

Metis shows that we read 60 million rows and return just one. That’s a lot to process. What’s more, we then take all those identifiers and send them in many queries which is slow. Can we do better? Yes, we can send a raw query:

var result []entities.NameBasic

db.Raw("SELECT NB.nconst, MAX(NB.primaryname) AS primaryname, MAX(nb.birthyear) AS birthyear, MAX(NB.deathyear) AS deathyear, MAX(nb.primaryprofession) AS primaryprofession, COUNT(*) AS number_of_titles\n"+
	"FROM imdb.title_basics AS TB\n"+
	"RIGHT JOIN imdb.title_principals AS TP ON TP.tconst = TB.tconst\n"+
	"RIGHT JOIN imdb.name_basics AS NB ON NB.nconst = TP.nconst\n"+
	"WHERE TB.startyear >= @startyear AND TB.startyear <= @endyear\n"+
	"GROUP BY NB.nconst\n"+
	"ORDER BY number_of_titles DESC\n"+
	"LIMIT 1", sql.Named("startyear", startYear), sql.Named("endyear", endYear)).
	Scan(&result)

return result

Metis shows the following:

‍

We can optimize this query a bit by using CTE:

var result []entities.NameBasic

db.Raw("WITH best_actor AS (\n"+
	"       SELECT TP.nconst, COUNT(*) AS number_of_titles\n"+
	"       FROM imdb.title_basics AS TB\n"+
	"       LEFT JOIN imdb.title_principals AS TP ON TP.tconst = TB.tconst\n"+
	"       WHERE TB.startyear >= @startyear AND TB.startyear <= @endyear AND TP.nconst IS NOT NULL\n"+
	"       GROUP BY TP.nconst\n"+
	"       ORDER BY number_of_titles DESC\n"+
	"       LIMIT 1\n"+
	")\n"+
	"SELECT BA.nconst, BA.number_of_titles, NB.primaryname, nb.birthyear, NB.deathyear, nb.primaryprofession\n"+
	"FROM best_actor AS BA\n"+
	"JOIN imdb.name_basics AS NB ON NB.nconst = BA.nconst", sql.Named("startyear", startYear), sql.Named("endyear", endYear)).
	Scan(&result)

return result

Metis shows the following:

‍

There are two tables that we can configure indexes on: title_principals and title_basics. Once we do that, we get the best performance.

Find the most prolific actors in a given genre

Let’s now find actors who did the most movies in a given genre. This is a very similar scenario to the previous one. Let’s start with the application code again:

var mostProlificActorInGenreInApp = func() []entities.NameBasic {
var titlesMatchingGenreEntities []entities.TitleBasic
db.
	Model(&entities.TitleBasic{}).
	Where("genres LIKE ?", "%"+genre+"%").
	Find(&titlesMatchingGenreEntities)


var titlesMatchingGenre = lo.Map(lo.Filter(titlesMatchingGenreEntities, func(x entities.TitleBasic, index int) bool {
	return lo.Contains(strings.Split(x.Genres, ","), genre)
}), func(x entities.TitleBasic, index int) string {
	return x.Tconst
})


var principals []entities.TitlePrincipal


for _, chunk := range lo.Chunk(titlesMatchingGenre, 10000) {
	var chunkResult []entities.TitlePrincipal
	db.
		Model(&entities.TitlePrincipal{}).
		Where("tconst IN ?", chunk).
		Find(&chunkResult)


	principals = append(principals, chunkResult...)
}


counts := make(map[string]int)
for _, principal := range principals {
	counts[principal.Nconst] = counts[principal.Nconst] + 1
}


var countsWithKeys []lo.Tuple2[string, int]
for _, key := range lo.Keys(counts) {
	countsWithKeys = append(countsWithKeys, lo.T2(key, counts[key]))
}


sort.SliceStable(countsWithKeys, func(i, j int) bool {
	return countsWithKeys[i].B > countsWithKeys[j].B
})


topResults := lo.Subset(countsWithKeys, 0, 10)


var result []entities.NameBasic
db.
	Model(&entities.NameBasic{}).
	Where("nconst IN ?", lo.Map(topResults, func(x lo.Tuple2[string, int], index int) string {
		return x.A
	})).
	Find(&result)


return result

This is nearly the same as before, only this time we query for genre. Metis shows the following:

This is even worse than the previous scenario. We now extract around 400,000 rows. While it takes 3 seconds to execute a single query, we have many of them. Also, sending the data over the wire takes much longer.

To fix that, we can use the raw query:

var result []entities.NameBasic

db.Raw("SELECT NB.nconst, NB.primaryname, NB.birthyear, COUNT(*) AS movies_count\n"+
	"FROM imdb.name_basics AS NB\n"+
	"LEFT JOIN imdb.title_principals AS TP ON TP.nconst = NB.nconst\n"+
	"LEFT JOIN imdb.title_basics AS TB ON TB.tconst = TP.tconst\n"+
	"WHERE TB.genres = @genre OR TB.genres LIKE (@genre || ',%') OR TB.genres LIKE ('%,' || @genre || ',%') OR TB.genres LIKE ('%,' || @genre)\n"+
	"GROUP BY NB.nconst, NB.primaryname, NB.birthyear\n"+
	"ORDER BY movies_count DESC\n"+
	"LIMIT 10", sql.Named("genre", genre)).
	Scan(&result)

return result

This gives the following:

‍

Much better. We have one query that runs faster now. We can also optimize the query by doing this:

var result []entities.NameBasic

db.Raw("WITH best_actors AS (\n"+
	"   SELECT TP.nconst, COUNT(*) AS movies_count\n"+
	"   FROM imdb.title_basics AS TB\n"+
	"   LEFT JOIN imdb.title_principals AS TP ON TP.tconst = TB.tconst\n"+
	"   WHERE TB.genres = @genre OR TB.genres LIKE (@genre || ',%') OR TB.genres LIKE ('%,' || @genre || ',%') OR TB.genres LIKE ('%,' || @genre)\n"+
	"   GROUP BY TP.nconst\n"+
	"   ORDER BY movies_count DESC\n"+
	"   LIMIT 10\n"+
	"  )\n"+
	"  SELECT BA.nconst, NB.primaryname, NB.birthyear, BA.movies_count\n"+
	"  FROM best_actors AS BA\n"+
	"  JOIN imdb.name_basics AS NB ON NB.nconst = BA.nconst\n"+
	"  ORDER BY movies_count DESC", sql.Named("genre", genre)).
	Scan(&result)

return result

Metis gives the following:

‍

Finally, we can configure indexes and make this query much faster.

Finding most common coworkers

Let’s now try to find people that work together the most. We can do that with a raw query:

var result []entities.NameBasic

db.Raw("WITH RECURSIVE numbers AS (\n"+
	"   SELECT 1 AS number\n"+
	"   UNION ALL\n"+
	"   SELECT number + 1 AS number FROM numbers WHERE number < 1500\n"+
	"),\n"+
	"titles_for_person AS (\n"+
	"     SELECT TC.tconst\n"+
	"     FROM imdb.title_crew AS TC\n"+
	"     WHERE directors = @nconst OR directors LIKE @nconst || ',%' OR directors LIKE '%,' || @nconst || ',%' OR directors LIKE '%,' || @nconst\n"+
	"   UNION\n"+
	"     SELECT TC.tconst\n"+
	"     FROM imdb.title_crew AS TC\n"+
	"     WHERE writers = @nconst OR writers LIKE @nconst || ',%' OR writers LIKE '%,' || @nconst || ',%' OR writers LIKE '%,' || @nconst\n"+
	"   UNION\n"+
	"     SELECT tconst\n"+
	"     FROM imdb.title_principals\n"+
	"     WHERE nconst = @nconst\n"+
	"),\n"+
	"titles_corresponding AS (\n"+
	"   SELECT TC.tconst, TC.directors, TC.writers\n"+
	"   FROM imdb.title_crew AS TC\n"+
	"   JOIN titles_for_person AS TFP ON TFP.tconst = TC.tconst\n"+
	"),\n"+
	"split_associations AS (\n"+
	"     SELECT TC.tconst, SPLIT_PART(TC.directors, ',', N.number) AS nconst\n"+
	"     FROM titles_corresponding AS TC\n"+
	"     CROSS JOIN numbers AS N\n"+
	"     WHERE directors IS NOT NULL AND CHAR_LENGTH(directors) - CHAR_LENGTH(REPLACE(directors, ',', '')) + 1 >= N.number\n"+
	"   UNION\n"+
	"     SELECT TC.tconst, SPLIT_PART(TC.writers, ',', N.number) AS nconst\n"+
	"     FROM titles_corresponding AS TC\n"+
	"     CROSS JOIN numbers AS N\n"+
	"     WHERE writers IS NOT NULL AND CHAR_LENGTH(writers) - CHAR_LENGTH(REPLACE(writers, ',', '')) + 1 >= N.number\n"+
	"),\n"+
	"all_associations AS (\n"+
	"     SELECT SA.tconst, SA.nconst\n"+
	"     FROM split_associations AS SA\n"+
	"   UNION\n"+
	"     SELECT TP.tconst, TP.nconst\n"+
	"     FROM imdb.title_principals AS TP\n"+
	"     JOIN titles_for_person AS TFP ON TFP.tconst = TP.tconst\n"+
	"),\n"+
	"other_people AS (\n"+
	"   SELECT nconst\n"+
	"   FROM all_associations\n"+
	"   WHERE nconst != @nconst\n"+
	"),\n"+
	"top_peers AS (\n"+
	"   SELECT OP.nconst, COUNT(*) as common_titles\n"+
	"   FROM other_people AS OP\n"+
	"   GROUP BY nconst\n"+
	"   ORDER BY common_titles DESC\n"+
	"   LIMIT 5\n"+
	")\n"+
	"SELECT TP.nconst, TP.common_titles, NB.*\n"+
	"FROM top_peers AS TP\n"+
	"JOIN imdb.name_basics AS NB ON NB.nconst = TP.nconst\n"+
	"ORDER BY TP.common_titles DESC", sql.Named("nconst", name)).
	Scan(&result)

return result

Metis shows the following:

‍

We can see multiple table scans and millions of rows read. We can optimize that by adding indexes. Let’s see if we can make it faster just by running the code in the application:

var titlesPrincipalMatchingPersonEntities []entities.TitlePrincipal
db.
	Model(&entities.TitlePrincipal{}).
	Where("nconst = ?", name).
	Find(&titlesPrincipalMatchingPersonEntities)

var titlesPrincipalMatchingPerson = lo.Map(titlesPrincipalMatchingPersonEntities, func(x entities.TitlePrincipal, index int) string {
	return x.Tconst
})

var otherTitlePrincipalsEntities []entities.TitlePrincipal
db.
	Model(&entities.TitlePrincipal{}).
	Where("nconst != ?", name).
	Where("tconst IN ?", titlesPrincipalMatchingPerson).
	Find(&otherTitlePrincipalsEntities)

var otherTitlePrincipals = lo.Map(otherTitlePrincipalsEntities, func(x entities.TitlePrincipal, index int) string {
	return x.Nconst
})

var titleCrewMatchingPersonEntities []entities.TitleCrew
db.
	Model(&entities.TitleCrew{}).
	Where("directors LIKE @name OR writers LIKE @name", sql.Named("name", "%"+name+"%")).
	Find(&titleCrewMatchingPersonEntities)

var titleCrewMatchingPerson = lo.FlatMap(lo.Filter(titleCrewMatchingPersonEntities, func(x entities.TitleCrew, index int) bool {
	return lo.Contains(strings.Split(x.Directors, ","), name) ||
		lo.Contains(strings.Split(x.Writers, ","), name)
}), func(x entities.TitleCrew, index int) []string {
	return lo.Uniq(append(strings.Split(x.Directors, ","), strings.Split(x.Writers, ",")...))
})

var allTeammates = lo.Filter(append(otherTitlePrincipals, titleCrewMatchingPerson...), func(x string, index int) bool {
	return x != "" && x != name
})

counts := make(map[string]int)
for _, teammate := range allTeammates {
	counts[teammate] = counts[teammate] + 1
}

var countsWithKeys []lo.Tuple2[string, int]
for _, key := range lo.Keys(counts) {
	countsWithKeys = append(countsWithKeys, lo.T2(key, counts[key]))
}

sort.SliceStable(countsWithKeys, func(i, j int) bool {
	return countsWithKeys[i].B > countsWithKeys[j].B
})

topResults := lo.Subset(countsWithKeys, 0, 5)

var result []entities.NameBasic
db.
	Model(&entities.NameBasic{}).
	Where("nconst IN ?", lo.Map(topResults, func(x lo.Tuple2[string, int], index int) string {
		return x.A
	})).
	Find(&result)

return result

First, we want to take that data from the title_principals table. We start with a known nconst, find all movies for a given person, and then for all the movies we find other people.

Next, we take similar data from title_crew. We then parse the data, aggregate it, and find top teammates. Finally, we get the data from the name_basics table.

We see that Metis identifies indexes that may improve the scenario. Even without these indexes, the query running in the application finishes faster than the raw query. That’s a good starting point. Once again, we need to add improvements and measure them to see how things work in practice.

Summary

We examined various real scenarios with GORM to see how we could improve the performance. In previous parts of this series, we relied on the database only to make it faster. This time we can do some processing with ORM and the application to change the performance characteristics. Metis shows good insights into all that we do, so we can track the optimizations and performance benefits as we proceed with the implementation changes. We can do the same with our actual applications even if we have very few rows in the database.

How Metis Optimized Queries Executed by GORM - Part 2

Given two people, list what movies they appeared in together

List all of the cast and crew in a given movie

Find the most prolific actor in a given period

Find the most prolific actors in a given genre

Finding most common coworkers

Summary

Keep Your Databases Fast

Harness Your Database Complexity

Prevent Your Data Workflows From Breaking

Never worry about your
database again!

Keep Your Databases Fast

Harness Your Database Complexity

Prevent Your Data Workflows From Breaking

Never worry about your database again!

Contact Us

Never worry about your
database again!