ArangoSearch SORT with LIMIT (offset/size) #9975

Open
aveiros opened this issue Sep 11, 2019 · 3 comments

Comments

@aveiros commented Sep 11, 2019

My Environment

  • ArangoDB Version: 3.5.0
  • Storage Engine: RocksDB
  • Deployment Mode: Single Server
  • Deployment Strategy: ArangoDB Starter in Docker
  • Infrastructure: own
  • Operating System: macOS 10.13.4
  • Total RAM in your machine: 32 GB
  • Disks in use: SSD
  • Used Package: Docker - official Docker library

Description: SORT and LIMIT on a View returning incorrect results

  1. Create a docs collection

  2. Create a docs_view view

{
  "links": {
    "docs": {
      "analyzers": [
        "identity", "text_en"
      ],
      "fields": {},
      "includeAllFields": true,
      "storeValues": "id",
      "trackListPositions": false
    }
  }
}
  3. Create the following 4 documents:
{ "_key": "doc1", "keywords": "development business" }
{ "_key": "doc2", "keywords": "development business" }
{ "_key": "doc3", "keywords": "development business" }
{ "_key": "doc4", "keywords": "development business" }
  4. Run the following queries:
// page 1
FOR d IN docs_view SEARCH ANALYZER(d.keywords IN TOKENS('development', 'text_en'), 'text_en') SORT BM25(d) DESC LIMIT 0, 2 RETURN d
// page 2
FOR d IN docs_view SEARCH ANALYZER(d.keywords IN TOKENS('development', 'text_en'), 'text_en') SORT BM25(d) DESC LIMIT 2, 2 RETURN d

Both queries return doc3 and doc4, so page 1 and page 2 are identical and doc1 and doc2 never appear.

@Dronplane (Contributor) commented Sep 11, 2019

Hi, Aquilino Viveiros!

Could you please post the output of the following query?
FOR d IN docs_view SEARCH ANALYZER(d.keywords IN TOKENS('development', 'text_en'), 'text_en') SORT BM25(d) DESC RETURN {d, s: BM25(d)}

@aveiros (Author) commented Sep 11, 2019

@Dronplane

[
  {
    "d": {
      "_key": "doc1",
      "_id": "docs/doc1",
      "_rev": "_ZP9wplu--_",
      "keywords": "development business"
    },
    "s": 0.1432415097951889
  },
  {
    "d": {
      "_key": "doc2",
      "_id": "docs/doc2",
      "_rev": "_ZP9wyne--_",
      "keywords": "development business"
    },
    "s": 0.1432415097951889
  },
  {
    "d": {
      "_key": "doc3",
      "_id": "docs/doc3",
      "_rev": "_ZP9w5gy--_",
      "keywords": "development business"
    },
    "s": 0.1432415097951889
  },
  {
    "d": {
      "_key": "doc4",
      "_id": "docs/doc4",
      "_rev": "_ZP9xAqO--_",
      "keywords": "development business"
    },
    "s": 0.1432415097951889
  }
]

@Dronplane (Contributor) commented Sep 11, 2019

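The cause is that all four documents have the same score and the sort has no default tie-breaking, so the relative order of equally scored documents is not guaranteed to be the same from one query to the next.
As a workaround for now, add a secondary sort on the _key value.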
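
A minimal sketch of that workaround, applied to the two paging queries from the report. It assumes only that _key is unique per document, so it can settle ties between equal BM25 scores. The ASC direction on _key is an arbitrary choice made for illustration; any direction should work as long as both pages use the same one.

```aql
// page 1: score first, then _key as a tie-breaker
FOR d IN docs_view
  SEARCH ANALYZER(d.keywords IN TOKENS('development', 'text_en'), 'text_en')
  SORT BM25(d) DESC, d._key ASC
  LIMIT 0, 2
  RETURN d

// page 2: same SORT clause, only the offset changes
FOR d IN docs_view
  SEARCH ANALYZER(d.keywords IN TOKENS('development', 'text_en'), 'text_en')
  SORT BM25(d) DESC, d._key ASC
  LIMIT 2, 2
  RETURN d
```

With a total order on (score, _key), the two pages should no longer overlap, provided the underlying data does not change between the two queries.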
