How Metis Optimized Queries Executed by GORM - Part 2
We continue our journey into how Metis can prevent, monitor, and troubleshoot our databases. In the previous part we started playing with GORM, a feature-rich ORM for GO. This time we’re going to see more challenging examples. We are going to see actual insights provided by Metis and how they improve the query performance.
GORM is an ORM for GO. It can connect to PostgreSQL and other SQL engines. It supports transactions, relations, eager loading, lazy loading, functions, and other common operations. The previous part covered some of the queries from Part 1 of this series. This time we are going to implement more sophisticated scenarios to see how GORM can deal with them. Let’s go.
Given two people, list what movies they appeared in together
Let’s find common movies for two people. That’s the application code doing that:
var first []entities.TitlePrincipal
db.
Model(&entities.TitlePrincipal{}).
Where("nconst = ?", actor1).
Find(&first)
var second []entities.TitlePrincipal
db.
Model(&entities.TitlePrincipal{}).
Where("nconst = ?", actor2).
Find(&second)
var firstTconsts []string
for _, item := range first {
firstTconsts = append(firstTconsts, item.Tconst)
}
var secondTconsts []string
for _, item := range second {
secondTconsts = append(secondTconsts, item.Tconst)
}
var result []entities.TitleBasic
db.
Model(&entities.TitleBasic{}).
Where("tconst IN ?", firstTconsts).
Where("tconst IN ?", secondTconsts).
Find(&result)
return result
We get titles for the first person, then for the second one, and finally we get titles matching both. The code generates the following queries:
SELECT
*
FROM
imdb.title_principals
WHERE
nconst = $1 /*traceparent='00-5d75853696868445b9550836b39af973-10c475875b500e28-01'*/
And then the query for getting all the titles
SELECT
*
FROM
imdb.title_basics
WHERE
tconst IN (
$1,
$2,
...
$12,
$13
)
AND tconst IN (
$14,
$15,
...
$544,
$545
) /*traceparent='00-5d75853696868445b9550836b39af973-10c475875b500e28-01'*/
Metis indicates a missing index:
We have two issues here. First, we generate a very big query. Not all ORMs can deal with that. Let’s fix that first:
var first []entities.TitlePrincipal
db.
Model(&entities.TitlePrincipal{}).
Where("nconst = ?", actor1).
Find(&first)
var second []entities.TitlePrincipal
db.
Model(&entities.TitlePrincipal{}).
Where("nconst = ?", actor2).
Find(&second)
knownTitles := map[string]struct{}{}
for _, item := range first {
knownTitles[item.Tconst] = struct{}{}
}
var finalTconsts []string
for _, item := range second {
if _, exists := knownTitles[item.Tconst]; exists {
finalTconsts = append(finalTconsts, item.Tconst)
}
}
var result []entities.TitleBasic
db.
Model(&entities.TitleBasic{}).
Where("tconst IN ?", finalTconsts).
Find(&result)
return result
We get this query now:
SELECT
*
FROM
imdb.title_basics
WHERE
tconst IN ($1) /*traceparent='00-5afead8f3e591a8d55a7fded030cc334-1bca0a48a3a10b1f-01'*/
These are the insights:
We can now further improve the query performance by adding an index for title_principals. This is the code to do that:
CREATE INDEX IF NOT EXISTS title_principals_nconst_idx ON imdb.title_principals(nconst) INCLUDE (tconst)
Here are the insights after adding the index:
We can see all is good now. However, we can get the same result with just one SQL query:
var result []entities.TitleBasic
db.
Model(&entities.TitleBasic{}).
Joins("JOIN imdb.title_principals AS TP1 ON TP1.tconst = title_basics.tconst").
Joins("JOIN imdb.title_principals AS TP2 ON TP2.tconst = title_basics.tconst").
Where("TP1.nconst = ?", actor1).
Where("TP2.nconst = ?", actor2).
Find(&result)
return result
This is the generated query:
SELECT
title_basics.tconst,
title_basics.titletype,
title_basics.primarytitle,
title_basics.originaltitle,
title_basics.isadult,
title_basics.startyear,
title_basics.endyear,
title_basics.runtimeminutes,
title_basics.genres
FROM
imdb.title_basics
JOIN imdb.title_principals AS TP1 ON TP1.tconst = title_basics.tconst
JOIN imdb.title_principals AS TP2 ON TP2.tconst = title_basics.tconst
WHERE
TP1.nconst = $1
AND TP2.nconst = $2 /*traceparent='00-4c9aa715d7d7535dfadcd63e6840ae11-d0f3731f8b6fac30-01'*/
And we can now see these insights:
This is really good. The codebase is much easier to follow now.
List all of the cast and crew in a given movie
Let’s now find all the people involved in the movie. Here is the first solution. We can just send the raw query:
var result []entities.NameBasic
db.Raw("SELECT DISTINCT NB.*\n"+
"FROM imdb.title_basics AS TB\n"+
"LEFT JOIN imdb.title_principals AS TP ON TP.tconst = TB.tconst\n"+
"LEFT JOIN imdb.title_crew AS TC ON TC.tconst = TB.tconst\n"+
"LEFT JOIN imdb.name_basics AS NB ON \n"+
" NB.nconst = TP.nconst \n"+
" OR TC.directors = NB.nconst\n"+
" OR TC.directors LIKE NB.nconst || ',%'::text\n"+
" OR TC.directors LIKE '%,'::text || NB.nconst || ',%'::text\n"+
" OR TC.directors LIKE '%,'::text || NB.nconst\n"+
" OR TC.writers = NB.nconst\n"+
" OR TC.writers LIKE NB.nconst || ',%'::text\n"+
" OR TC.writers LIKE '%,'::text || NB.nconst || ',%'::text\n"+
" OR TC.writers LIKE '%,'::text || NB.nconst\n"+
"WHERE TB.tconst = ?", tconst).
Scan(&result)
return result
This is the query:
SELECT DISTINCT
NB.*
FROM
imdb.title_basics AS TB
LEFT JOIN imdb.title_principals AS TP ON TP.tconst = TB.tconst
LEFT JOIN imdb.title_crew AS TC ON TC.tconst = TB.tconst
LEFT JOIN imdb.name_basics AS NB ON NB.nconst = TP.nconst
OR TC.directors = NB.nconst
OR TC.directors LIKE NB.nconst || ',%'::text
OR TC.directors LIKE '%,'::text || NB.nconst || ',%'::text
OR TC.directors LIKE '%,'::text || NB.nconst
OR TC.writers = NB.nconst
OR TC.writers LIKE NB.nconst || ',%'::text
OR TC.writers LIKE '%,'::text || NB.nconst || ',%'::text
OR TC.writers LIKE '%,'::text || NB.nconst
WHERE
TB.tconst = $1 /*traceparent='00-cafd685027549f951151d2748b7bfeec-c6466989569b4703-01'*/
We get the following insights:
This runs in nearly 100 seconds. We can split the query into UNIONs to see if it’s faster:
var result []entities.NameBasic
db.Raw("SELECT DISTINCT NB.*\n"+
" FROM imdb.title_principals AS TP\n"+
" JOIN (\n"+
" SELECT tconst, directors, writers\n"+
" FROM imdb.title_crew\n"+
" WHERE tconst = @tconst\n"+
" ) AS TC ON TC.tconst = TP.tconst\n"+
" LEFT JOIN imdb.name_basics AS NB ON NB.nconst = TP.nconst\n"+
"UNION\n"+
" SELECT DISTINCT NB.*\n"+
" FROM imdb.title_principals AS TP\n"+
" JOIN (\n"+
" SELECT tconst, directors, writers\n"+
" FROM imdb.title_crew\n"+
" WHERE tconst = @tconst\n"+
" ) AS TC ON TC.tconst = TP.tconst\n"+
" LEFT JOIN imdb.name_basics AS NB ON TC.directors LIKE NB.nconst || ',%'::text\n"+
"UNION\n"+
" SELECT DISTINCT NB.*\n"+
" FROM imdb.title_principals AS TP\n"+
" JOIN (\n"+
" SELECT tconst, directors, writers\n"+
" FROM imdb.title_crew\n"+
" WHERE tconst = @tconst\n"+
" ) AS TC ON TC.tconst = TP.tconst\n"+
" LEFT JOIN imdb.name_basics AS NB ON TC.directors LIKE '%,'::text || NB.nconst || ',%'::text\n"+
"UNION\n"+
" SELECT DISTINCT NB.*\n"+
" FROM imdb.title_principals AS TP\n"+
" JOIN (\n"+
" SELECT tconst, directors, writers\n"+
" FROM imdb.title_crew\n"+
" WHERE tconst = @tconst\n"+
" ) AS TC ON TC.tconst = TP.tconst\n"+
" LEFT JOIN imdb.name_basics AS NB ON TC.directors LIKE '%,'::text || NB.nconst\n"+
"UNION\n"+
" SELECT DISTINCT NB.*\n"+
" FROM imdb.title_principals AS TP\n"+
" JOIN (\n"+
" SELECT tconst, directors, writers\n"+
" FROM imdb.title_crew\n"+
" WHERE tconst = @tconst\n"+
" ) AS TC ON TC.tconst = TP.tconst\n"+
" LEFT JOIN imdb.name_basics AS NB ON TC.writers = NB.nconst\n"+
"UNION\n"+
" SELECT DISTINCT NB.*\n"+
" FROM imdb.title_principals AS TP\n"+
" JOIN (\n"+
" SELECT tconst, directors, writers\n"+
" FROM imdb.title_crew\n"+
" WHERE tconst = @tconst\n"+
" ) AS TC ON TC.tconst = TP.tconst\n"+
" LEFT JOIN imdb.name_basics AS NB ON TC.writers LIKE NB.nconst || ',%'::text\n"+
"UNION\n"+
" SELECT DISTINCT NB.*\n"+
" FROM imdb.title_principals AS TP\n"+
" JOIN (\n"+
" SELECT tconst, directors, writers\n"+
" FROM imdb.title_crew\n"+
" WHERE tconst = @tconst\n"+
" ) AS TC ON TC.tconst = TP.tconst\n"+
" LEFT JOIN imdb.name_basics AS NB ON TC.writers LIKE '%,'::text || NB.nconst || ',%'::text\n"+
"UNION\n"+
" SELECT DISTINCT NB.*\n"+
" FROM imdb.title_principals AS TP\n"+
" JOIN (\n"+
" SELECT tconst, directors, writers\n"+
" FROM imdb.title_crew\n"+
" WHERE tconst = @tconst\n"+
" ) AS TC ON TC.tconst = TP.tconst\n"+
" LEFT JOIN imdb.name_basics AS NB ON TC.writers LIKE '%,'::text || NB.nconst", sql.Named("tconst", tconst)).
Scan(&result)
return result
Let’s see the insights:
This is a major improvement, however, queries are still slow. Let’s try implementing the same in the application code:
knownNames := map[string]struct{}{}
var crewViaTitlePrincipalsEntities []entities.TitlePrincipal
db.
Model(&entities.TitlePrincipal{}).
Where("tconst = ?", tconst).
Find(&crewViaTitlePrincipalsEntities)
var allMatchingNames []string
for _, item := range crewViaTitlePrincipalsEntities {
if _, exists := knownNames[item.Nconst]; !exists {
knownNames[item.Nconst] = struct{}{}
allMatchingNames = append(allMatchingNames, item.Nconst)
}
}
var crewViaTitleCrewEntities []entities.TitleCrew
db.
Model(&entities.TitleCrew{}).
Where("tconst = ?", tconst).
Find(&crewViaTitleCrewEntities)
for _, item := range crewViaTitleCrewEntities {
for _, director := range strings.Split(item.Directors, ",") {
if _, exists := knownNames[director]; !exists {
knownNames[director] = struct{}{}
allMatchingNames = append(allMatchingNames, director)
}
}
for _, writer := range strings.Split(item.Writers, ",") {
if _, exists := knownNames[writer]; !exists {
allMatchingNames = append(allMatchingNames, writer)
}
}
}
var result []entities.NameBasic
db.
Model(&entities.NameBasic{}).
Where("nconst IN ?", allMatchingNames).
Find(&result)
return result
We first take the crew from the title_principals table. Next, we extract the crew from the title_crew. Finally, we split directors and writers, and we calculate the intersection of both datasets to get the names.
This generates the following trace in Metis:
It’s all green! We can see that we have three queries sent, but they are blazingly fast because of indexes. This time it was much better to do processing in the application instead of on the SQL level.
However, we can do even better with this SQL:
var result []entities.NameBasic
db.Raw("WITH RECURSIVE numbers AS (\n"+
" SELECT 1 AS number\n"+
" UNION ALL\n"+
" SELECT number + 1 AS number FROM numbers WHERE number < 1500\n"+
"),\n"+
"split_associations AS (\n"+
" SELECT SPLIT_PART(TC.directors, ',', N.number) AS nconst\n"+
" FROM imdb.title_crew AS TC\n"+
" CROSS JOIN numbers AS N\n"+
" WHERE tconst = @tconst AND directors IS NOT NULL AND CHAR_LENGTH(directors) - CHAR_LENGTH(REPLACE(directors, ',', '')) + 1 >= N.number\n"+
" UNION\n"+
" SELECT SPLIT_PART(TC.writers, ',', N.number) AS nconst\n"+
" FROM imdb.title_crew AS TC\n"+
" CROSS JOIN numbers AS N\n"+
" WHERE tconst = @tconst AND writers IS NOT NULL AND CHAR_LENGTH(writers) - CHAR_LENGTH(REPLACE(writers, ',', '')) + 1 >= N.number\n"+
"),\n"+
"all_associations AS (\n"+
" SELECT SA.nconst\n"+
" FROM split_associations AS SA\n"+
" UNION\n"+
" SELECT TP.nconst\n"+
" FROM imdb.title_principals AS TP\n"+
" WHERE TP.tconst = @tconst\n"+
")\n"+
"SELECT NB.*\n"+
"FROM imdb.name_basics AS NB\n"+
"JOIN all_associations AS AA ON AA.nconst = NB.nconst", sql.Named("tconst", tconst)).
Scan(&result)
This is just one query that does the job. Here are the insights:
However, the query is hard to follow and maintain. We should always consider this aspect before pushing the code to production.
Find the most prolific actor in a given period
Let’s now find the person who starred in the most movies in a given period. Let’s do that using the application code:
var titlesMatchingPeriodEntities []entities.TitleBasic
db.
Model(&entities.TitleBasic{}).
Where("startyear >= ?", startYear).
Where("startyear <= ?", endYear).
Find(&titlesMatchingPeriodEntities)
var titlesMatchingPeriod = lo.Map(titlesMatchingPeriodEntities, func(x entities.TitleBasic, index int) string {
return x.Tconst
})
var principals []entities.TitlePrincipal
for _, chunk := range lo.Chunk(titlesMatchingPeriod, 10000) {
var chunkResult []entities.TitlePrincipal
db.
Model(&entities.TitlePrincipal{}).
Where("tconst IN ?", chunk).
Find(&chunkResult)
principals = append(principals, chunkResult...)
}
counts := make(map[string]int)
for _, principal := range principals {
counts[principal.Nconst] = counts[principal.Nconst] + 1
}
var countsWithKeys []lo.Tuple2[string, int]
for _, key := range lo.Keys(counts) {
countsWithKeys = append(countsWithKeys, lo.T2(key, counts[key]))
}
sort.SliceStable(countsWithKeys, func(i, j int) bool {
return countsWithKeys[i].B > countsWithKeys[j].B
})
topResults := lo.Subset(countsWithKeys, 0, 1)
var result []entities.NameBasic
db.
Model(&entities.NameBasic{}).
Where("nconst IN ?", lo.Map(topResults, func(x lo.Tuple2[string, int], index int) string {
return x.A
})).
Find(&result)
return result
We first find titles in a given period. Next, we take the crew for each of them. Finally, we need to group the identifiers, get the top one, and get the details. This is a bit lengthy and is rather slow because we get all the movies in a given period. Also, notice that we need to call the final query many times for different parameters because we can’t submit just one query with tens of thousands of parameters. That’s a lot of data. Let’s see the insights:
Metis shows that we read 60 million rows and return just one. That’s a lot to process. What’s more, we then take all those identifiers and send them in many queries which is slow. Can we do better? Yes, we can send a raw query:
var result []entities.NameBasic
db.Raw("SELECT NB.nconst, MAX(NB.primaryname) AS primaryname, MAX(nb.birthyear) AS birthyear, MAX(NB.deathyear) AS deathyear, MAX(nb.primaryprofession) AS primaryprofession, COUNT(*) AS number_of_titles\n"+
"FROM imdb.title_basics AS TB\n"+
"RIGHT JOIN imdb.title_principals AS TP ON TP.tconst = TB.tconst\n"+
"RIGHT JOIN imdb.name_basics AS NB ON NB.nconst = TP.nconst\n"+
"WHERE TB.startyear >= @startyear AND TB.startyear <= @endyear\n"+
"GROUP BY NB.nconst\n"+
"ORDER BY number_of_titles DESC\n"+
"LIMIT 1", sql.Named("startyear", startYear), sql.Named("endyear", endYear)).
Scan(&result)
return result
Metis shows the following:
We can optimize this query a bit by using CTE:
var result []entities.NameBasic
db.Raw("WITH best_actor AS (\n"+
" SELECT TP.nconst, COUNT(*) AS number_of_titles\n"+
" FROM imdb.title_basics AS TB\n"+
" LEFT JOIN imdb.title_principals AS TP ON TP.tconst = TB.tconst\n"+
" WHERE TB.startyear >= @startyear AND TB.startyear <= @endyear AND TP.nconst IS NOT NULL\n"+
" GROUP BY TP.nconst\n"+
" ORDER BY number_of_titles DESC\n"+
" LIMIT 1\n"+
")\n"+
"SELECT BA.nconst, BA.number_of_titles, NB.primaryname, nb.birthyear, NB.deathyear, nb.primaryprofession\n"+
"FROM best_actor AS BA\n"+
"JOIN imdb.name_basics AS NB ON NB.nconst = BA.nconst", sql.Named("startyear", startYear), sql.Named("endyear", endYear)).
Scan(&result)
return result
Metis shows the following:
There are two tables that we can configure indexes on: title_principals and title_basics. Once we do that, we get the best performance.
Find the most prolific actors in a given genre
Let’s now find actors who did the most movies in a given genre. This is a very similar scenario to the previous one. Let’s start with the application code again:
var mostProlificActorInGenreInApp = func() []entities.NameBasic {
var titlesMatchingGenreEntities []entities.TitleBasic
db.
Model(&entities.TitleBasic{}).
Where("genres LIKE ?", "%"+genre+"%").
Find(&titlesMatchingGenreEntities)
var titlesMatchingGenre = lo.Map(lo.Filter(titlesMatchingGenreEntities, func(x entities.TitleBasic, index int) bool {
return lo.Contains(strings.Split(x.Genres, ","), genre)
}), func(x entities.TitleBasic, index int) string {
return x.Tconst
})
var principals []entities.TitlePrincipal
for _, chunk := range lo.Chunk(titlesMatchingGenre, 10000) {
var chunkResult []entities.TitlePrincipal
db.
Model(&entities.TitlePrincipal{}).
Where("tconst IN ?", chunk).
Find(&chunkResult)
principals = append(principals, chunkResult...)
}
counts := make(map[string]int)
for _, principal := range principals {
counts[principal.Nconst] = counts[principal.Nconst] + 1
}
var countsWithKeys []lo.Tuple2[string, int]
for _, key := range lo.Keys(counts) {
countsWithKeys = append(countsWithKeys, lo.T2(key, counts[key]))
}
sort.SliceStable(countsWithKeys, func(i, j int) bool {
return countsWithKeys[i].B > countsWithKeys[j].B
})
topResults := lo.Subset(countsWithKeys, 0, 10)
var result []entities.NameBasic
db.
Model(&entities.NameBasic{}).
Where("nconst IN ?", lo.Map(topResults, func(x lo.Tuple2[string, int], index int) string {
return x.A
})).
Find(&result)
return result
This is nearly the same as before, only this time we query for genre. Metis shows the following:
This is even worse than the previous scenario. We now extract around 400,000 rows. While it takes 3 seconds to execute a single query, we have many of them. Also, sending the data over the wire takes much longer.
To fix that, we can use the raw query:
var result []entities.NameBasic
db.Raw("SELECT NB.nconst, NB.primaryname, NB.birthyear, COUNT(*) AS movies_count\n"+
"FROM imdb.name_basics AS NB\n"+
"LEFT JOIN imdb.title_principals AS TP ON TP.nconst = NB.nconst\n"+
"LEFT JOIN imdb.title_basics AS TB ON TB.tconst = TP.tconst\n"+
"WHERE TB.genres = @genre OR TB.genres LIKE (@genre || ',%') OR TB.genres LIKE ('%,' || @genre || ',%') OR TB.genres LIKE ('%,' || @genre)\n"+
"GROUP BY NB.nconst, NB.primaryname, NB.birthyear\n"+
"ORDER BY movies_count DESC\n"+
"LIMIT 10", sql.Named("genre", genre)).
Scan(&result)
return result
This gives the following:
Much better. We have one query that runs faster now. We can also optimize the query by doing this:
var result []entities.NameBasic
db.Raw("WITH best_actors AS (\n"+
" SELECT TP.nconst, COUNT(*) AS movies_count\n"+
" FROM imdb.title_basics AS TB\n"+
" LEFT JOIN imdb.title_principals AS TP ON TP.tconst = TB.tconst\n"+
" WHERE TB.genres = @genre OR TB.genres LIKE (@genre || ',%') OR TB.genres LIKE ('%,' || @genre || ',%') OR TB.genres LIKE ('%,' || @genre)\n"+
" GROUP BY TP.nconst\n"+
" ORDER BY movies_count DESC\n"+
" LIMIT 10\n"+
" )\n"+
" SELECT BA.nconst, NB.primaryname, NB.birthyear, BA.movies_count\n"+
" FROM best_actors AS BA\n"+
" JOIN imdb.name_basics AS NB ON NB.nconst = BA.nconst\n"+
" ORDER BY movies_count DESC", sql.Named("genre", genre)).
Scan(&result)
return result
Metis gives the following:
Finally, we can configure indexes and make this query much faster.
Finding most common coworkers
Let’s now try to find people that work together the most. We can do that with a raw query:
var result []entities.NameBasic
db.Raw("WITH RECURSIVE numbers AS (\n"+
" SELECT 1 AS number\n"+
" UNION ALL\n"+
" SELECT number + 1 AS number FROM numbers WHERE number < 1500\n"+
"),\n"+
"titles_for_person AS (\n"+
" SELECT TC.tconst\n"+
" FROM imdb.title_crew AS TC\n"+
" WHERE directors = @nconst OR directors LIKE @nconst || ',%' OR directors LIKE '%,' || @nconst || ',%' OR directors LIKE '%,' || @nconst\n"+
" UNION\n"+
" SELECT TC.tconst\n"+
" FROM imdb.title_crew AS TC\n"+
" WHERE writers = @nconst OR writers LIKE @nconst || ',%' OR writers LIKE '%,' || @nconst || ',%' OR writers LIKE '%,' || @nconst\n"+
" UNION\n"+
" SELECT tconst\n"+
" FROM imdb.title_principals\n"+
" WHERE nconst = @nconst\n"+
"),\n"+
"titles_corresponding AS (\n"+
" SELECT TC.tconst, TC.directors, TC.writers\n"+
" FROM imdb.title_crew AS TC\n"+
" JOIN titles_for_person AS TFP ON TFP.tconst = TC.tconst\n"+
"),\n"+
"split_associations AS (\n"+
" SELECT TC.tconst, SPLIT_PART(TC.directors, ',', N.number) AS nconst\n"+
" FROM titles_corresponding AS TC\n"+
" CROSS JOIN numbers AS N\n"+
" WHERE directors IS NOT NULL AND CHAR_LENGTH(directors) - CHAR_LENGTH(REPLACE(directors, ',', '')) + 1 >= N.number\n"+
" UNION\n"+
" SELECT TC.tconst, SPLIT_PART(TC.writers, ',', N.number) AS nconst\n"+
" FROM titles_corresponding AS TC\n"+
" CROSS JOIN numbers AS N\n"+
" WHERE writers IS NOT NULL AND CHAR_LENGTH(writers) - CHAR_LENGTH(REPLACE(writers, ',', '')) + 1 >= N.number\n"+
"),\n"+
"all_associations AS (\n"+
" SELECT SA.tconst, SA.nconst\n"+
" FROM split_associations AS SA\n"+
" UNION\n"+
" SELECT TP.tconst, TP.nconst\n"+
" FROM imdb.title_principals AS TP\n"+
" JOIN titles_for_person AS TFP ON TFP.tconst = TP.tconst\n"+
"),\n"+
"other_people AS (\n"+
" SELECT nconst\n"+
" FROM all_associations\n"+
" WHERE nconst != @nconst\n"+
"),\n"+
"top_peers AS (\n"+
" SELECT OP.nconst, COUNT(*) as common_titles\n"+
" FROM other_people AS OP\n"+
" GROUP BY nconst\n"+
" ORDER BY common_titles DESC\n"+
" LIMIT 5\n"+
")\n"+
"SELECT TP.nconst, TP.common_titles, NB.*\n"+
"FROM top_peers AS TP\n"+
"JOIN imdb.name_basics AS NB ON NB.nconst = TP.nconst\n"+
"ORDER BY TP.common_titles DESC", sql.Named("nconst", name)).
Scan(&result)
return result
Metis shows the following:
We can see multiple table scans and millions of rows read. We can optimize that by adding indexes. Let’s see if we can make it faster just by running the code in the application:
var titlesPrincipalMatchingPersonEntities []entities.TitlePrincipal
db.
Model(&entities.TitlePrincipal{}).
Where("nconst = ?", name).
Find(&titlesPrincipalMatchingPersonEntities)
var titlesPrincipalMatchingPerson = lo.Map(titlesPrincipalMatchingPersonEntities, func(x entities.TitlePrincipal, index int) string {
return x.Tconst
})
var otherTitlePrincipalsEntities []entities.TitlePrincipal
db.
Model(&entities.TitlePrincipal{}).
Where("nconst != ?", name).
Where("tconst IN ?", titlesPrincipalMatchingPerson).
Find(&otherTitlePrincipalsEntities)
var otherTitlePrincipals = lo.Map(otherTitlePrincipalsEntities, func(x entities.TitlePrincipal, index int) string {
return x.Nconst
})
var titleCrewMatchingPersonEntities []entities.TitleCrew
db.
Model(&entities.TitleCrew{}).
Where("directors LIKE @name OR writers LIKE @name", sql.Named("name", "%"+name+"%")).
Find(&titleCrewMatchingPersonEntities)
var titleCrewMatchingPerson = lo.FlatMap(lo.Filter(titleCrewMatchingPersonEntities, func(x entities.TitleCrew, index int) bool {
return lo.Contains(strings.Split(x.Directors, ","), name) ||
lo.Contains(strings.Split(x.Writers, ","), name)
}), func(x entities.TitleCrew, index int) []string {
return lo.Uniq(append(strings.Split(x.Directors, ","), strings.Split(x.Writers, ",")...))
})
var allTeammates = lo.Filter(append(otherTitlePrincipals, titleCrewMatchingPerson...), func(x string, index int) bool {
return x != "" && x != name
})
counts := make(map[string]int)
for _, teammate := range allTeammates {
counts[teammate] = counts[teammate] + 1
}
var countsWithKeys []lo.Tuple2[string, int]
for _, key := range lo.Keys(counts) {
countsWithKeys = append(countsWithKeys, lo.T2(key, counts[key]))
}
sort.SliceStable(countsWithKeys, func(i, j int) bool {
return countsWithKeys[i].B > countsWithKeys[j].B
})
topResults := lo.Subset(countsWithKeys, 0, 5)
var result []entities.NameBasic
db.
Model(&entities.NameBasic{}).
Where("nconst IN ?", lo.Map(topResults, func(x lo.Tuple2[string, int], index int) string {
return x.A
})).
Find(&result)
return result
First, we want to take that data from the title_principals table. We start with a known nconst, find all movies for a given person, and then for all the movies we find other people.
Next, we take similar data from title_crew. We then parse the data, aggregate it, and find top teammates. Finally, we get the data from the name_basics table.
We see that Metis identifies indexes that may improve the scenario. Even without these indexes, the query running in the application finishes faster than the raw query. That’s a good starting point. Once again, we need to add improvements and measure them to see how things work in practice.
Summary
We examined various real scenarios with GORM to see how we could improve the performance. In previous parts of this series, we relied on the database only to make it faster. This time we can do some processing with ORM and the application to change the performance characteristics. Metis shows good insights into all that we do, so we can track the optimizations and performance benefits as we proceed with the implementation changes. We can do the same with our actual applications even if we have very few rows in the database.
This is some text inside of a div block. This is some text inside of a div block. This is some text inside of a div block. This is some text inside of a div block. This is some text inside of a div block.