fullCount performance #4801

ldap4life · 2018-03-09T19:48:01Z

my environment running ArangoDB

I'm using the latest ArangoDB of the respective release series:

Mode:

Cluster
Single-Server

Storage-Engine:

mmfiles
rocksdb

On this operating system:

this is an AQL-related issue:

[ ] I'm using graph features

I'm issuing AQL via:

web interface with this browser: running on this OS:
arangosh
this Driver: Java

Doing aql queries with a limit and the fullCount option perform very poorly in my testing. It turns out to be faster to do the query with full count off, and then a second query to get the full count manually.

jsteemann · 2018-03-12T08:50:43Z

Using fullCount will make a query run further even if the request number of documents have been reached. So it will naturally be slower than when omitting fullCount altogether.
Using fullCount may be slower or faster than when using a separate query just for counting. This depends on whether the fullCount query contains complex calculations that may be omitted in an extra counting query.
For example,

FOR doc IN collection
  LIMIT 0, 40
  RETURN doc

with fullCount will likely be slower than the same query with fullCount turned off and an extra couting query. The reason here is that counting the number of documents in a collection is fast and does not need to enumerate them individually.
Another example:

FOR doc IN collection
  LET subs = (FOR sub IN collection2 FILTER sub.attr == doc._key RETURN sub)
  LIMIT 0,40 
  RETURN { doc, subs }

In this example, the fullCount variant of the query will be much more expensive than the original query without fullCount and a separate counting query that can omit the subquery.

Can you provide an example of the query that is particularly slow with fullCount, plus the number of documents in the respective collections.

dothebart · 2018-03-12T10:08:42Z

We should explain this here too: https://docs.arangodb.com/3.3/AQL/ExecutionAndPerformance/QueryStatistics.html

jsteemann · 2018-03-12T10:09:41Z

@dothebart: go ahead!

ldap4life · 2018-03-12T13:30:36Z

my query is similar to

for log in logs sort log.date limit 0, 20 return log

there are about 500k docs. It runs in about 200ms with full count on, 2ms with full count off.

fwiw to get the count manually im adding this to the same query COLLECT WITH COUNT INTO length RETURN length

jsteemann · 2018-03-12T23:38:20Z

@ldap4life : ok, if there is a sorted index present on the date attribute in the logs collection, the non-fullCount query will execute almost instantly. It will only process 20 documents in index order and is done then.
The fullCount variant of the query will return 20 documents in index order, but it will iterate over the remaining 499980 documents in the index for counting them. So it will have to do a lot more work, especially in the RocksDB engine, where index iteration may (or may not) cause disk I/O.

If your original query does not contain any filters, the counting is probably most efficiently done by using just RETURN COUNT(logs). This will use an optimized code path that simply returns the number of documents in the collection without even looking at any document.

matcho · 2018-08-13T11:53:48Z

Could this be detected by optimizer ? For example, if a query has no FILTER statement, return collection size as fullCount instead of iterating over every document ?

dothebart added the 3 Documentation label Mar 12, 2018

Simran-B added the 3 Optimizer label Aug 13, 2018

vinaypyati added the 1 Analyzing label Sep 28, 2018

dothebart removed the 1 Analyzing label Sep 3, 2020

arangodb / arangodb

fullCount performance #4801

fullCount performance #4801

ldap4life commented Mar 9, 2018

jsteemann commented Mar 12, 2018

dothebart commented Mar 12, 2018

jsteemann commented Mar 12, 2018

ldap4life commented Mar 12, 2018

jsteemann commented Mar 12, 2018

matcho commented Aug 13, 2018

arangodb / arangodb

fullCount performance #4801

fullCount performance #4801

Comments

ldap4life commented Mar 9, 2018

my environment running ArangoDB

this is an AQL-related issue:

jsteemann commented Mar 12, 2018

dothebart commented Mar 12, 2018

jsteemann commented Mar 12, 2018

ldap4life commented Mar 12, 2018

jsteemann commented Mar 12, 2018

matcho commented Aug 13, 2018