How ArangoDB serves a query request if the data is not present in main memory #9531
Comments
Hi,
For the RocksDB storage engine, we explain what's actually happening in this blog article:
Hi @dothebart, is there any specific reason why the disk reads on ArangoDB would suddenly spike? Is it due to data indexing from main memory to secondary storage? Attached are the CPU, disk read and memory utilization graphs of our read-only ArangoDB machine. Disk reads are almost constant throughout the day, but then they suddenly spike drastically and return to normal after some time. Could you please explain this behavior?
Do you also see high(er) write activity in that period? If yes, the reason could be the RocksDB compaction, which has to reorganize your data.
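One way to check whether writes accompany the read spikes is to sample the kernel's disk counters twice and compare them. The following is a minimal sketch, assuming a Linux host where `/proc/diskstats` is available; the helper names `parse_diskstats` and `throughput` are hypothetical and not part of ArangoDB or RocksDB.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written)} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:
            continue  # skip blank or truncated lines
        # Layout: major minor name reads merged sectors_read ms writes merged sectors_written ...
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def throughput(before, after, seconds):
    """Return {device: (read_bytes_per_s, write_bytes_per_s)} between two samples."""
    rates = {}
    for name, (read_after, written_after) in after.items():
        if name not in before:
            continue  # device appeared between the two samples
        read_before, written_before = before[name]
        rates[name] = (
            (read_after - read_before) * SECTOR_BYTES / seconds,
            (written_after - written_before) * SECTOR_BYTES / seconds,
        )
    return rates
```

To use it, read `/proc/diskstats` into a string, wait a known number of seconds, read it again, and pass both parsed samples to `throughput`. If the write rate stays near zero while the read rate spikes, compaction becomes a less likely explanation.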
It's a read-only machine, which we create daily by taking an AMI backup of the master DB machine, so there are no disk writes on this machine. We observe this behavior daily. Due to the high disk reads (on the read-only machine), all queries get queued up and show up as slow queries, and we have to restart the machine. We have increased the IOPS from 1.5k to 3k, but we still observe this behavior. If it is due to compaction, is there any way to schedule it in off-peak hours?
Well, if the master has had writes, the copy inherits that state of its data structures, right?
If it is due to compaction, is there any way to schedule it in off-peak hours?
@dothebart, can you give an update on this?
I'm closing this since ArangoDB 3.4 has meanwhile reached end of life. The best way to observe what the process is doing in such a situation is to run gcore (please note that at least gdb 9 is required with recent arangod versions) and Please note that ArangoDB 3.7 is available.
ArangoDB Version: 3.4.4
Storage Engine: RocksDB
Deployment Mode: Single node
Deployment Strategy: Manual Start
Infrastructure: AWS
Operating System: Ubuntu 16.04
Total RAM in your machine: 32 GB
Hi @jsteemann @graetzer @lservini
As per my understanding from this FAQ:
I have a few questions that need to be addressed:
ArangoDB stores the working set (the set of pages that are frequently accessed) in main memory. It's left to the operating system to determine the working set and to transfer pages between main memory and secondary storage. Data that is currently not needed is kept only on secondary storage.
Our total data size is 50 GB and our RAM size is 32 GB.
1) If the data needed for a particular query result is not present in the working set (main memory), will ArangoDB read from secondary storage to fetch the data and return the result set?
2) If so, will the disk reads on the machine spike drastically, and will CPU usage increase at that time?
3) After serving the query request, will the disk reads come down? Or will it try to swap the working set for a new working set from secondary storage?
4) If it is trying to swap the working set, how much time will that take (in my case the whole data set is 50 GB and RAM is 32 GB)? And will disk reads be high and constant during this time until the swap completes?
5) Is there any way to schedule the swap interval?
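For question 4, a rough lower bound on the time needed to page in a new working set can be estimated from the provisioned IOPS. This is a back-of-the-envelope sketch only: the 16 KiB I/O size is an assumption (not something measured on this machine), the function name `seconds_to_read` is hypothetical, and it assumes the volume sustains its full IOPS for the whole transfer.

```python
def seconds_to_read(gib, iops, io_size_kib):
    """Estimate seconds needed to read `gib` GiB at `iops` operations of `io_size_kib` KiB each."""
    total_bytes = gib * 1024 ** 3
    bytes_per_second = iops * io_size_kib * 1024
    return total_bytes / bytes_per_second


# Assumed scenario: refilling 32 GiB of RAM at the 3k IOPS mentioned in the thread,
# with a hypothetical 16 KiB per I/O operation.
estimate = seconds_to_read(32, 3000, 16)
print(f"{estimate / 60:.1f} minutes")
```

Under these assumptions, refilling 32 GiB takes roughly 700 seconds at 3k IOPS and twice that at 1.5k IOPS. Real access patterns are random and compete with query traffic, so actual times would likely be longer.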