Performance Tips

The following tips for performance improvement are not tied to a specific DB solution. They can be applied to most of the DB solutions available today.

Prepared Statements

Much like GraphQL, where each document attached to a request needs to be parsed and validated before execution, SQL statements must be parsed and planned on every request. If a statement is prepared, subsequent requests that use the same statement can skip this step, which improves performance.

However, prepared statements only live within the session that created them. A statement that has not been prepared in the current session must be prepared again before it can be executed.

Preparation also doesn't help much when the statement itself is time consuming - e.g. loading a large amount of data.
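As a minimal sketch of how this looks in practice, the snippet below prepares a statement once per session and reuses it. It assumes a PostgreSQL database accessed through the psycopg2 driver; the connection string, the users table, and the get_user name are hypothetical.

```python
import psycopg2

# Assumes PostgreSQL and a hypothetical "users" table.
conn = psycopg2.connect("dbname=app user=app")
cur = conn.cursor()

# Parse and plan the statement once for this session.
cur.execute("PREPARE get_user (int) AS SELECT id, name FROM users WHERE id = $1")

# Subsequent executions reuse the prepared plan and skip parsing/planning.
for user_id in (1, 2, 3):
    cur.execute("EXECUTE get_user (%s)", (user_id,))
    print(cur.fetchone())

cur.close()
conn.close()
```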

Index

This is another common technique in databases to accelerate data lookup. An index works similarly to a hash table, where the key is derived from the specified column(s) and the value points to the corresponding row of records. A common implementation of an index is the B+ tree. However, how an index maps to a record varies across DB implementations.
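A short sketch, again assuming PostgreSQL with psycopg2 and a hypothetical orders table: an index is created on the column used for lookups, and EXPLAIN can be used to check whether the planner chooses an index scan over a sequential scan.

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=app")
cur = conn.cursor()

# Index the column that queries filter on (hypothetical "orders" table).
cur.execute("CREATE INDEX IF NOT EXISTS idx_orders_customer_id ON orders (customer_id)")
conn.commit()

# EXPLAIN shows whether the lookup uses the index instead of scanning every row.
cur.execute("EXPLAIN SELECT * FROM orders WHERE customer_id = %s", (42,))
for row in cur.fetchall():
    print(row[0])

cur.close()
conn.close()
```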

Partition

This technique segments the data into different partitions according to the specified column(s), so that queries can scan only the partitions they are interested in instead of scanning through all the records one by one.

However, be sure to benchmark the performance before deciding which columns to partition on. A query that still has to scan a large number of partitions gains little from partitioning.

Also, be sure to keep the partitions up to date when data in the database changes. A common practice is to use a cron job to regularly maintain the partitions that have been created, so that queries run against them are not served stale data. A sketch of how partitions are declared and queried follows below.
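The sketch below uses PostgreSQL's declarative range partitioning through psycopg2; the events table, the column names, and the monthly ranges are all assumptions for illustration. A query that filters on the partition key only has to touch the matching partition.

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=app")
cur = conn.cursor()

# Hypothetical "events" table partitioned by month on created_at.
cur.execute("""
    CREATE TABLE IF NOT EXISTS events (
        id         bigint      NOT NULL,
        created_at timestamptz NOT NULL,
        payload    jsonb
    ) PARTITION BY RANGE (created_at)
""")
cur.execute("""
    CREATE TABLE IF NOT EXISTS events_2024_01 PARTITION OF events
        FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')
""")
cur.execute("""
    CREATE TABLE IF NOT EXISTS events_2024_02 PARTITION OF events
        FOR VALUES FROM ('2024-02-01') TO ('2024-03-01')
""")
conn.commit()

# Filtering on the partition key lets the planner skip unrelated partitions.
cur.execute(
    "SELECT count(*) FROM events WHERE created_at >= %s AND created_at < %s",
    ("2024-01-01", "2024-02-01"),
)
print(cur.fetchone()[0])

cur.close()
conn.close()
```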

Copy

To improve INSERT performance for a large amount of data, the COPY command can be used to import the data into the database from either a file (e.g. a CSV file) or standard input.
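A small sketch of a bulk load, assuming PostgreSQL, psycopg2's copy_expert, and a hypothetical orders table plus orders.csv file:

```python
import psycopg2

conn = psycopg2.connect("dbname=app user=app")
cur = conn.cursor()

# Stream the CSV into the table with COPY, which is much faster than
# issuing row-by-row INSERT statements.
with open("orders.csv") as f:
    cur.copy_expert(
        "COPY orders (id, customer_id, amount) FROM STDIN WITH (FORMAT csv, HEADER)",
        f,
    )
conn.commit()

cur.close()
conn.close()
```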

Separation of Concerns

This is basically an idea of horizontal scaling: READ replicas are created to handle the READ traffic specifically, so that the main DB instance can focus on scaling for the WRITE operations. This technique is usually applied when the volume of READ operations is significantly larger than that of WRITE operations.
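As a rough sketch of this split, the snippet below keeps two connections and dispatches SELECT statements to a replica while everything else goes to the primary. The hostnames and the orders table are hypothetical, and a real setup would also need to account for replication lag and transactions.

```python
import psycopg2

# Hypothetical connection strings; the replica hostname is an assumption.
PRIMARY_DSN = "host=db-primary dbname=app user=app"
REPLICA_DSN = "host=db-replica dbname=app user=app"

primary = psycopg2.connect(PRIMARY_DSN)
replica = psycopg2.connect(REPLICA_DSN)

def run(sql, params=None):
    """Route read-only statements to the replica, writes to the primary."""
    conn = replica if sql.lstrip().upper().startswith("SELECT") else primary
    with conn.cursor() as cur:
        cur.execute(sql, params)
        if cur.description:  # the statement returned rows
            return cur.fetchall()
        conn.commit()
        return None

# Writes hit the primary; reads are served by the replica
# (which may lag slightly behind the primary).
run("INSERT INTO orders (id, customer_id, amount) VALUES (%s, %s, %s)", (1, 42, 9.99))
print(run("SELECT count(*) FROM orders WHERE customer_id = %s", (42,)))
```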