
I have been considering the idea of moving to a RAM disk for a while. I know its risks, but I just wanted to do a little benchmark. I just had two questions: (a) when reading the query plan, will it still differentiate between disk and buffer hits? If so, should I assume that both are equally expensive or should I assume that there is a difference between them? (b) a RAM disk is not persistent, but if I want to export some results to persistent storage, are there some precautions I would need to take? Is it the same as usual, e.g. the COPY command?

  • You do not need a RAM disk to have the database in memory. Use pg_prewarm. What do you want to benchmark with a strange setup like that? (Commented Mar 23, 2020 at 7:05)
  • I am creating tables, and I want to minimize disk write cost: I want main-memory tables. Is this not a suitable application? I am not saying that I care about persistence of my data. I suppose PostgreSQL would not be the best for this goal, but I am restricted to using it. – Zeruno (Commented Mar 23, 2020 at 13:08)
  • What is the thing you want to measure with your benchmark? (Commented Mar 23, 2020 at 13:53)
  • How much faster creating a table in memory is. – Zeruno (Commented Mar 23, 2020 at 13:59)
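
Since the first comment points to pg_prewarm, here is a minimal sketch of how it can be used to pull a table into shared_buffers without any RAM disk; the table name is only a placeholder:

    -- pg_prewarm ships with PostgreSQL as a contrib extension.
    CREATE EXTENSION IF NOT EXISTS pg_prewarm;

    -- Default mode 'buffer' loads the relation's pages into shared_buffers;
    -- 'mytable' is a hypothetical table name.
    SELECT pg_prewarm('mytable');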

2 Answers


I do not recommend using RAM disks in PostgreSQL for persistent storage. With careful tuning, you can get PostgreSQL not to use more disk I/O than what is required to make your data persistent.

I recommend doing this:

  • Have more RAM in your machine than the size of the database.

  • Define shared_buffers big enough to contain the database (on Linux, configure huge pages to back them).

  • Increase checkpoint_timeout and max_wal_size to get fewer checkpoints.

  • Set synchronous_commit = off to keep PostgreSQL from syncing WAL to disk on every commit.

  • If you are happy to lose all your data in the case of a crash, define your tables UNLOGGED. The data will survive a normal shutdown.
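
To make those settings concrete, here is a minimal sketch using ALTER SYSTEM and an UNLOGGED table; the sizes and the table name (benchmark_results) are placeholders chosen for illustration, not recommendations:

    ALTER SYSTEM SET shared_buffers = '16GB';       -- needs a restart; size it to hold the whole database
    ALTER SYSTEM SET huge_pages = 'try';            -- Linux: also reserve huge pages at the OS level (vm.nr_hugepages)
    ALTER SYSTEM SET checkpoint_timeout = '30min';  -- fewer checkpoints
    ALTER SYSTEM SET max_wal_size = '16GB';         -- fewer checkpoints
    ALTER SYSTEM SET synchronous_commit = off;      -- no WAL flush at every commit
    SELECT pg_reload_conf();                        -- applies the settings that do not need a restart

    -- Only if losing the table's contents after a crash is acceptable:
    CREATE UNLOGGED TABLE benchmark_results (id bigint, payload text);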

Anyway, to answer your questions:

(a) You should set seq_page_cost and random_page_cost way lower to tell PostgreSQL how fast your storage is.

(b) You could run backups with either pg_dump or pg_basebackup; neither cares what kind of storage you have got.

  • Thanks for your comment. My concern is simply that despite all the parameters you have mentioned, I still can't avoid the actual write-to-disk bottleneck (even if I can minimize it). – Zeruno (Commented Mar 23, 2020 at 14:16)

when reading the query plan, will it still differentiate between disk and buffer hits?

It never distinguished between them in the first place. It distinguishes between "hit" and "read", but a "read" doesn't tell you whether the page truly came from disk or from the OS/filesystem cache.

PostgreSQL has no idea you are running on a RAM disk, so will continue to report those as it always has.
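
If you want to see those counters for yourself, EXPLAIN with the BUFFERS option reports them per plan node; the table name below is just an assumed example:

    -- "shared hit" = page found in shared_buffers;
    -- "read" = page fetched from outside them (OS cache, RAM disk, or real disk).
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM benchmark_results;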

If so, should I assume that both are equally expensive or should I assume that there is a difference between them?

This is a question that should be answered through your benchmarking. On some systems, data can be read ahead from main memory into the faster CPU caches, making sequential reads still faster than random reads. If you care, you will have to benchmark it on your own system.

Reading data from RAM into shared_buffers is still surprisingly expensive due to things like lock management. So as a rough starting point, maybe seq_page_cost=0.1 and random_page_cost=0.15.
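
For example, those starting points could be applied cluster-wide like this (or per session with SET, to compare plans before settling on values):

    -- Rough starting points only; benchmark before adopting them.
    ALTER SYSTEM SET seq_page_cost = 0.1;
    ALTER SYSTEM SET random_page_cost = 0.15;
    SELECT pg_reload_conf();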

a RAM disk is not persistent, but if I want to export some results to persistent storage, are there some precautions I would need to take?

The risk would be that your system crashes before the export has finished. But what precaution can you take against that?
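
For the export itself, COPY behaves the same as on any other storage; in the sketch below, the table name and target path are placeholders, and the path must point at persistent storage writable by the server process:

    -- Server-side export: the file is written by the PostgreSQL server process
    -- (requires superuser or the pg_write_server_files role).
    COPY benchmark_results TO '/mnt/persistent/benchmark_results.csv' (FORMAT csv, HEADER);

A client-side \copy in psql writes the file on the client instead, which sidesteps the server-side permission requirement.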
