Tuesday, June 15, 2021

Materialized Views in RDBMS - Is it a View or Table?

 Materialized View Image

Google uses structured data to understand the content on the page and use that data to display in richer features in the search results.

Below you can see the difference in the search result data provided by Google for the OnePlus 8T product from official site vs amazon site. The amazon one has additional data rendered in the same search results.

Google Search Results
Image Courtesy - Shushma

As we started to build the feature to feed the product details to Google, the data has to be retrieved from various tables from the catalog schema based on multiple  conditions.

Once we had the API ready, we did load testing, having 25 threads with a ramp-up time of 5 seconds and test duration of 30 seconds. Though we know the search engine bots won't put so much load on the site but the performance of the API was a concern to check.

Performance Test Result 01

If you look at the 95 percentile of time, it took 10 seconds to respond and the number of requests reached 430 only. Team members came up with the idea of caching the data on redis, so that next call will be faster. Caching at application layer for this use case will not be of that great help as we have a TTL of 600 seconds on Redis.

I recalled in one of my previous assignments, where we had a similar use case and we had utilised the caching of the data on the database (materialized view) side instead on the application layer which had given better results.

Understand Terminologies

Let's start with TABLE, it's basically an organized storage for your data in the form of rows and columns. You can easily query the table using predicates on the columns. 

To simplify the queries or maybe to apply different security mechanisms on data being accessed you can use VIEWs. Think of a view as glasses through which you can look at your data without knowing the actual tables and realtionship details etc.

If the table is a storage, a view is just a way of looking at it or a projection of the data as view doesn't store the data physically. If you query a table, you fetch its data directly. On the other hand, when you query a view, you are basically querying another query that is stored in the view's definition. But the query planner is aware of that and build a plan to merge the two together and give the results.

Between the table and view, we have the MATERIALIZED VIEW. It's a view that has a query in its definition and uses this query to fetch the data directly from the storage, but it also has it's own storage that basically acts as a cache in between the underlying table(s). 

Now you might have a question, if any data is updated in the underlying table(s) whether they will reflect in materialized view. The simple answer is NO as materialized view acts like a cache, it wont reflect the changes. We need to refresh the materialized view, through a process that would cause it's definition's query to be executed again on actual data and rebuilds the cache.

Materialized View Approach

We created a materialized view to cache the data on the database side, as we were retrieving the data from multiple tables. The required data is available in it, so we can avoid lot of computation at the real-time and also the fetched data won't change very frequently.

With the code changes, we did same load testing having 25 threads with a ramp-up time of 5 seconds and test duration of 30 seconds.

Performance Test Result 02

If you look at the 95 percentile of time it took 75 milli seconds to respond and the number of requests reached to 8376. The performance gain with respect to throughput is 19x and 95 percentile of time is 134x time faster. The results looks wonderful and it's without caching on the application layer side.

Query Plan

  • The original query which used to join multiple tables with various conditions had the query planning time of 0.852 ms and execution time of 5.511 ms.
  • The query which we fire on materialized view from the application layer had the query planning time of 0.087 ms and execution time of 0.052 ms.

Conclusion

  • The Materialized View is supported by all relational databases like Oracle, MySQL, Postgres etc.
  • The  Materialized View is a powerful tool enabling many performance improvements while providing another way of ensuring data consistency.

Tags: , , , , ,
Location: Mysuru, Karnataka, India

2 comments:

  1. This blog is written in an amazing way, Writer has explained the tech concepts like a story, totally understood materialized view concepts.
    One of the best tech blog I have seen... hoping to see next one soon :)

    ReplyDelete
  2. How about the refresh interval between the underlying tables and materialised view. Was it handled via scheduler ? If yes, then any insights on execution plans for varying amount of data for each refresh.

    ReplyDelete

Featured Post

Benefits & Best Practices of Code Review

Photo by Bochelly Code reviews are methodical assessments of code designed to identify bugs, increase code quality, and help developers lear...