
I have been considering the idea of moving to a RAM disk for a while. I know its risks, but I just wanted to do a little benchmark. I just had two questions: (a) when reading the query plan, will it still differentiate between disk and buffer hits? If so, should I assume that both are equally expensive or should I assume that there is a difference between them? (b) a RAM disk is not persistent, but if I want to export some results to persistent storage, are there some precautions I would need to take? Is it the same as usual, e.g. the COPY command?

  • You do not need a RAM disk to have the database in memory. Use pg_prewarm. What do you want to benchmark with a strange setup like that? (Commented Mar 23, 2020 at 7:05)
  • I am creating tables, and I want to minimize disk write cost: I want main-memory tables. Is this not a suitable application? I am not saying that I care about persistence of my data. I suppose PostgreSQL would not be the best for this goal, but I am restricted to using it. – Zeruno (Commented Mar 23, 2020 at 13:08)
  • What is the thing you want to measure with your benchmark? (Commented Mar 23, 2020 at 13:53)
  • How much faster creating a table in memory is. – Zeruno (Commented Mar 23, 2020 at 13:59)
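
Since the first comment points to pg_prewarm, here is a minimal sketch of how it can be used to pull a table into shared_buffers without any RAM disk; the table name is only a placeholder:

    -- pg_prewarm ships with PostgreSQL as a contrib extension.
    CREATE EXTENSION IF NOT EXISTS pg_prewarm;

    -- Default mode 'buffer' loads the relation's pages into shared_buffers;
    -- 'mytable' is a hypothetical table name.
    SELECT pg_prewarm('mytable');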

2 Answers


I do not recommend using RAM disks in PostgreSQL for persistent storage. With careful tuning, you can get PostgreSQL not to use more disk I/O than what is required to make your data persistent.

I recommend doing this:

  • Have more RAM in your machine than the size of the database.

  • Define shared_buffers big enough to contain the database (on Linux, configure huge pages to back them).

  • Increase checkpoint_timeout and max_wal_size to get fewer checkpoints.

  • Set synchronous_commit = off to keep PostgreSQL from syncing WAL to disk on every commit.

  • If you are happy to lose all your data in the case of a crash, define your tables UNLOGGED. The data will survive a normal shutdown.
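
To make those settings concrete, here is a minimal sketch using ALTER SYSTEM and an UNLOGGED table; the sizes and the table name (benchmark_results) are placeholders chosen for illustration, not recommendations:

    ALTER SYSTEM SET shared_buffers = '16GB';       -- needs a restart; size it to hold the whole database
    ALTER SYSTEM SET huge_pages = 'try';            -- Linux: also reserve huge pages at the OS level (vm.nr_hugepages)
    ALTER SYSTEM SET checkpoint_timeout = '30min';  -- fewer checkpoints
    ALTER SYSTEM SET max_wal_size = '16GB';         -- fewer checkpoints
    ALTER SYSTEM SET synchronous_commit = off;      -- no WAL flush at every commit
    SELECT pg_reload_conf();                        -- applies the settings that do not need a restart

    -- Only if losing the table's contents after a crash is acceptable:
    CREATE UNLOGGED TABLE benchmark_results (id bigint, payload text);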

Anyway, to answer your questions:

(a) You should set seq_page_cost and random_page_cost way lower to tell PostgreSQL how fast your storage is.

(b) You could run backups with either pg_dump or pg_basebackup; neither cares what kind of storage you have got.

  • Thanks for your comment. My concern is simply that despite all the parameters you have mentioned, I still can't avoid the actual write-to-disk bottleneck (even if I can minimize it). – Zeruno (Commented Mar 23, 2020 at 14:16)

when reading the query plan, will it still differentiate between disk and buffer hits?

It never distinguished between them in the first place. It distinguishes between "hit" and "read", but a "read" doesn't tell you whether the page truly came from disk or from the OS/filesystem cache.

PostgreSQL has no idea you are running on a RAM disk, so will continue to report those as it always has.
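
If you want to see those counters for yourself, EXPLAIN with the BUFFERS option reports them per plan node; the table name below is just an assumed example:

    -- "shared hit" = page found in shared_buffers;
    -- "read" = page fetched from outside them (OS cache, RAM disk, or real disk).
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM benchmark_results;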

If so, should I assume that both are equally expensive or should I assume that there is a difference between them?

This is a question that should be answered through your benchmarking. On some systems, data can be read ahead from main memory into the faster CPU caches, making sequential reads still faster than random reads. If you care, you will have to benchmark it on your own system.

Reading data from RAM into shared_buffers is still surprisingly expensive due to things like lock management. So as a rough starting point, maybe seq_page_cost=0.1 and random_page_cost=0.15.
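
For example, those starting points could be applied cluster-wide like this (or per session with SET, to compare plans before settling on values):

    -- Rough starting points only; benchmark before adopting them.
    ALTER SYSTEM SET seq_page_cost = 0.1;
    ALTER SYSTEM SET random_page_cost = 0.15;
    SELECT pg_reload_conf();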

a RAM disk is not persistent, but if I want to export some results to persistent storage, are there some precautions I would need to take?

The risk would be that your system crashes before the export has finished. But what precaution can you take against that?
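
For the export itself, COPY behaves the same as on any other storage; in the sketch below, the table name and target path are placeholders, and the path must point at persistent storage writable by the server process:

    -- Server-side export: the file is written by the PostgreSQL server process
    -- (requires superuser or the pg_write_server_files role).
    COPY benchmark_results TO '/mnt/persistent/benchmark_results.csv' (FORMAT csv, HEADER);

A client-side \copy in psql writes the file on the client instead, which sidesteps the server-side permission requirement.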
