Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQLite rowcount is corrupted when combining UPDATE RETURNING w/ table that is dropped and recreated #93421

Closed
zzzeek opened this issue Jun 1, 2022 · 22 comments · Fixed by #93526
Assignees
Labels
3.11 3.12 expert-sqlite3 type-bug

Comments

@zzzeek
Copy link

@zzzeek zzzeek commented Jun 1, 2022

version info:

$ python
Python 3.10.0 (default, Nov  5 2021, 17:23:47) [GCC 11.2.1 20210728 (Red Hat 11.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> sqlite3.sqlite_version
'3.36.0'

we have a test suite that creates a table, runs some SQL, then drops it. if multiple tests run that each perform this task, if the same SQLite connection is used, rowcount starts returning "0". Seems to also require RETURNING to be used. Full demonstration:

import os
import sqlite3



def go():
    """function creates a new table, runs INSERT/UPDATE, drops table,
    commits connection.

    """

    # create table
    cursor = conn.cursor()
    cursor.execute(
        """CREATE TABLE some_table (
        id INTEGER NOT NULL,
        value VARCHAR(40) NOT NULL,
        PRIMARY KEY (id)
    )
    """
    )
    cursor.close()
    conn.commit()

    # run operation
    cursor = conn.cursor()
    cursor.execute(
        "INSERT INTO some_table (id, value) VALUES (1, 'v1')"
    )
    ident = 1

    cursor.execute(
        "UPDATE some_table SET value='v2' "
        "WHERE id=? RETURNING id",
        (ident,),
    )
    new_ident = cursor.fetchone()[0]
    assert ident == new_ident
    assert cursor.rowcount == 1, cursor.rowcount
    cursor.close()

    # drop table
    cursor = conn.cursor()
    cursor.execute("DROP TABLE some_table")
    cursor.close()

    conn.commit()

if os.path.exists("file.db"):
    os.unlink("file.db")

# passes
conn = sqlite3.connect("file.db")
go()

# run again w/ new connection (same DB), passes
conn = sqlite3.connect("file.db")
go()

print("FAILURE NOW OCCURS")
# run again w/ same connection, fails
go()

on the third run, where we ran the "test" on the same connection twice, it fails:

$  python test3.py 
FAILURE NOW OCCURS
Traceback (most recent call last):
  File "/home/classic/dev/sqlalchemy/test3.py", line 62, in <module>
    go()
  File "/home/classic/dev/sqlalchemy/test3.py", line 39, in go
    assert cursor.rowcount == 1, cursor.rowcount
AssertionError: 0

it would appear there's some internal caching of table state that needs to be cleared.

@zzzeek zzzeek added the type-bug label Jun 1, 2022
@AA-Turner
Copy link
Member

@AA-Turner AA-Turner commented Jun 1, 2022

cc: @erlend-aasland

A

@erlend-aasland erlend-aasland self-assigned this Jun 2, 2022
@erlend-aasland
Copy link
Contributor

@erlend-aasland erlend-aasland commented Jun 2, 2022

Thanks for the report!

Internally, we simply use the function sqlite3_changes to set rowcount1, implying that we do not calculate rowcount ourself; the value comes straight from the underlying SQLite library. If sqlite3_changes returns zero, rowcount will be zero.

I've verified that this only happens with UPDATE ... RETURNING statements, which also implies that the behaviour we are observing comes from the underlying SQLite library. I suggest to report this on the SQLite forum (they do not have a bug tracker).

Note: I did some tests where I used sqlite3_total_changes() in addition to sqlite3_changes(), and they revealed that the number of total changes on the database connection did not increase whith UPDATE ... RETURNING statements, only with normal UPDATE statements.

Closing this as not-a-bug.

See also:

Footnotes

  1. Assuming that you've supplied a valid statement, and that no database errors were raised by SQLite. In case of errors, rowcount will normally be set to -1.

@erlend-aasland erlend-aasland closed this as not planned Jun 2, 2022
@erlend-aasland erlend-aasland removed type-bug 3.11 3.10 3.12 labels Jun 2, 2022
@erlend-aasland
Copy link
Contributor

@erlend-aasland erlend-aasland commented Jun 2, 2022

it would appear there's some internal caching of table state that needs to be cleared.

For the record; we explicitly clear rowcount at the start of every execute and executemany.

@zzzeek
Copy link
Author

@zzzeek zzzeek commented Jun 2, 2022

post is up at https://sqlite.org/forum/forumpost/33ea0f369f

@zzzeek
Copy link
Author

@zzzeek zzzeek commented Jun 4, 2022

just to keep track here, the equivalent commands in the sqlite3 shell should be (please review to make sure this is accurate):

CREATE TABLE some_table (
        id INTEGER NOT NULL,
        value VARCHAR(40) NOT NULL,
        PRIMARY KEY (id)
);
BEGIN;
INSERT INTO some_table (value) VALUES ('v1');
SELECT last_insert_rowid();

UPDATE some_table SET value='v2' WHERE id=1 RETURNING id;
SELECT changes();
DROP TABLE some_table;
COMMIT;

these commands can be run repeatedly in the sqlite3 shell and the results are correct each time.

@zzzeek
Copy link
Author

@zzzeek zzzeek commented Jun 4, 2022

hi there -

I'd like to reopen this. per discussion at https://sqlite.org/forum/forumpost/397abefa91 this is still a bug in pysqlite. The incorrect cursor.rowcount value returned by sqlite does not match running the query manually on the failed run, indicating SQLite is in fact returning the correct value, pysqlite is somehow getting the wrong value. Can we please look into this? new demonstration below compares cursor.rowcount to "select changes()". the discrepancy happens on the failure run. if this is in fact a C api issue they would likely need more detail.

import os
import sqlite3


def go():
    """function creates a new table, runs INSERT/UPDATE, drops table,
    commits connection.

    """

    # create table
    cursor = conn.cursor()
    cursor.execute(
        """CREATE TABLE some_table (
        id INTEGER NOT NULL,
        value VARCHAR(40) NOT NULL,
        PRIMARY KEY (id)
    )
    """
    )
    cursor.close()
    conn.commit()

    # run operation
    cursor = conn.cursor()
    cursor.execute("INSERT INTO some_table (value) VALUES ('v1')")
    ident = cursor.lastrowid

    cursor.execute(
        "UPDATE some_table SET value='v2' " "WHERE id=? RETURNING id",
        (ident,),
    )
    new_ident = cursor.fetchone()[0]
    assert ident == new_ident

    cursor_rowcount = cursor.rowcount
    manual_rowcount = cursor.execute("select changes()").fetchone()[0]

    assert cursor_rowcount == manual_rowcount, (
        f"Cursor rowcount {cursor_rowcount} does not match "
        f"SELECTed rowcount {manual_rowcount}"
    )

    print(
        f"Cursor rowcount {cursor_rowcount} matches SELECTed rowcount "
        f"{manual_rowcount}"
    )

    cursor.close()

    # drop table
    cursor = conn.cursor()
    cursor.execute("DROP TABLE some_table")
    cursor.close()

    conn.commit()


if os.path.exists("file.db"):
    os.unlink("file.db")

print("------------------- pass one -----------------------")
# passes
conn = sqlite3.connect("file.db")
go()

print("------------------- pass two -----------------------")
# run again w/ new connection (same DB), passes
conn = sqlite3.connect("file.db")
go()


print("------------------- pass three -----------------------")
# run again w/ same connection, fails
go()

output:

------------------- pass one -----------------------
Cursor rowcount 1 matches SELECTed rowcount 1
------------------- pass two -----------------------
Cursor rowcount 1 matches SELECTed rowcount 1
------------------- pass three -----------------------
Traceback (most recent call last):
  File "/home/classic/dev/sqlalchemy/test3.py", line 75, in <module>
    go()
  File "/home/classic/dev/sqlalchemy/test3.py", line 39, in go
    assert cursor_rowcount == manual_rowcount, (
AssertionError: Cursor rowcount 0 does not match SELECTed rowcount 1

@AA-Turner AA-Turner reopened this Jun 4, 2022
@AA-Turner
Copy link
Member

@AA-Turner AA-Turner commented Jun 4, 2022

@erlend-aasland - I reproduced Mike's script locally and reopened for more discussion as it seems there might be something sqlite3 could do.

A

@erlend-aasland
Copy link
Contributor

@erlend-aasland erlend-aasland commented Jun 5, 2022

Thanks for reopening and investigating this. I can reproduce it, and I've got a fix for it. I want to narrow down a C reproducer before I open a PR.

FTR, we are calling the sqlite3_changes method and presenting that result; this has to do with when we call it.

And, per the discussion over at the SQLite forum, yes there are some inconveniences in the sqlite3 API, but we cannot just change them, because people depend on the inconveniences. The SQLite C API also have some inconveniences1, but I would not call it names because of that.

Footnotes

  1. sqlite3_create_collation is a well known example: https://www.sqlite.org/c3ref/create_collation.html:

    The xDestroy callback is not called if the sqlite3_create_collation_v2() function fails. Applications that invoke sqlite3_create_collation_v2() with a non-NULL xDestroy argument should check the return code and dispose of the application data pointer themselves rather than expecting SQLite to deal with it for them. This is different from every other SQLite interface. The inconsistency is unfortunate but cannot be changed without breaking backwards compatibility.

@erlend-aasland erlend-aasland added 3.11 3.12 type-bug labels Jun 5, 2022
@erlend-aasland
Copy link
Contributor

@erlend-aasland erlend-aasland commented Jun 5, 2022

Here's a SQLite C API repro: https://gist.github.com/erlend-aasland/5e48b920d882e04f911fe7871fd5ccd9

As you can see, sqlite3_changes clearly return 0 when reusing the cached statement. Bug, feature? I don't know, but it is possible to work around it to make the DB API happy :)

EDIT: sqlite3_changes can only be trusted when a statement has run to completion, according to the SQLite C API docs.

@erlend-aasland
Copy link
Contributor

@erlend-aasland erlend-aasland commented Jun 5, 2022

If we change the implementation of .rowcount to be a tp_getset slot, we can lazily call sqlite3_count() at attribute read time. This will solve @zzzeek's problem, but it will break Cursor.executemany() followed by a .rowcount read.

An alternative could be to update.rowcount for each step (through the fetch-one-row helper).

Another solution can be to save the total change count (sqlite3_total_changes) when we enter _pysqlite_query_execute, and then at .rowcount read (through a tp_getset slot), call sqlite3_total_changes again and return the difference. That should work with both .execute() and .executemany.

I don't have a strong preference; but I feel the latter solution will result in the most sane behaviour for the .rowcount attribute.

Slightly related: we can fix .rowcount for .executescript if we choose the sqlite3_total_changes approach.

@serhiy-storchaka, do you have an opinion here?

@erlend-aasland
Copy link
Contributor

@erlend-aasland erlend-aasland commented Jun 5, 2022

Fun fact: changing Mike's reproducer to insert and update two or more values results in interesting values returned by sqlite3_changes.

Two rows, same update query
$ ./test
Compiled with SQLite 3.26.0
Executing with SQLite 3.36.0
executing 'CREATE TABLE some_table ( id INTEGER NOT NULL, value VARCHAR(40) NOT NULL, PRIMARY KEY (id))' 
- step, changes=0
executing 'BEGIN' 
- step, changes=0
executing 'INSERT INTO some_table (id, value) VALUES (1, 'v1'), (2, 'v1')' 
- step, changes=2
executing 'UPDATE some_table SET value='v2' WHERE id=1 RETURNING id' 
- step, changes=2
- step, changes=1
executing 'DROP TABLE some_table' 
- step, changes=1
executing 'COMMIT' 
- step, changes=1
executing 'CREATE TABLE some_table ( id INTEGER NOT NULL, value VARCHAR(40) NOT NULL, PRIMARY KEY (id))' (cached)
- step, changes=1
executing 'BEGIN' (cached)
- step, changes=1
executing 'INSERT INTO some_table (id, value) VALUES (1, 'v1'), (2, 'v1')' (cached)
- step, changes=2
executing 'UPDATE some_table SET value='v2' WHERE id=1 RETURNING id' (cached)
- step, changes=0
- step, changes=1
executing 'DROP TABLE some_table' (cached)
- step, changes=1
executing 'COMMIT' (cached)
- step, changes=1
Three rows, loosened update query
$ ./test
Compiled with SQLite 3.26.0
Executing with SQLite 3.36.0
executing 'CREATE TABLE some_table ( id INTEGER NOT NULL, value VARCHAR(40) NOT NULL, PRIMARY KEY (id))' 
- step, changes=0
executing 'BEGIN' 
- step, changes=0
executing 'INSERT INTO some_table (id, value) VALUES (1, 'v1'), (2, 'v1'), (3, 'v1')' 
- step, changes=3
executing 'UPDATE some_table SET value='v2' WHERE id<3 RETURNING id' 
- step, changes=3
- step, changes=3
- step, changes=2
executing 'DROP TABLE some_table' 
- step, changes=2
executing 'COMMIT' 
- step, changes=2
executing 'CREATE TABLE some_table ( id INTEGER NOT NULL, value VARCHAR(40) NOT NULL, PRIMARY KEY (id))' (cached)
- step, changes=2
executing 'BEGIN' (cached)
- step, changes=2
executing 'INSERT INTO some_table (id, value) VALUES (1, 'v1'), (2, 'v1'), (3, 'v1')' (cached)
- step, changes=3
executing 'UPDATE some_table SET value='v2' WHERE id<3 RETURNING id' (cached)
- step, changes=0
- step, changes=0
- step, changes=2
executing 'DROP TABLE some_table' (cached)
- step, changes=2
executing 'COMMIT' (cached)
- step, changes=2

I guess that rules out the update-rowcount-via-the-fetch-one-handler approach, so we are stuck with the total-changes solution, AFAICS.

@erlend-aasland
Copy link
Contributor

@erlend-aasland erlend-aasland commented Jun 6, 2022

An alternative could be to update.rowcount for each step (through the fetch-one-row helper).

I'll create a competing PR for this approach.

UPDATE: I created #93526 which updates .rowcount after SQLITE_DONE. I'm not sure if that is the best approach.

@serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Jun 6, 2022

I have no opinion and I think you are more expedient in this.

Only a question. If executemany() affects the same row multiple times, is the same row counted multiple times? What sqlite3_count() and sqlite3_total_changes() return and what is the expected behavior for .rowcount?

@erlend-aasland
Copy link
Contributor

@erlend-aasland erlend-aasland commented Jun 6, 2022

Only a question. If executemany() affects the same row multiple times, is the same row counted multiple times?

According to the DB API (as I understand it), .rowcount should reflect the total number of rows affected by the executemany call. executemany executes the same statement multiple times, so if a row is changed multiple times, it will be counted multiple times, no matter which SQLite API we use.

So either we use sqlite3_total_changes before/after the executemany and use the delta as the row count, or we use sqlite3_changes and accumulate the changes. I'm fine with either approach.

What sqlite3_count() and sqlite3_total_changes() return and what is the expected behavior for .rowcount?

  • sqlite3_changes(): "return the number of rows modified, inserted or deleted by the most recently completed INSERT, UPDATE or DELETE statement on the database connection"1
  • sqlite3_total_changes(): "return the total number of rows inserted, modified or deleted by all INSERT, UPDATE or DELETE statements completed since the database connection was opened, including those executed as part of trigger programs"1
  • .rowcount: "This read-only attribute specifies the number of rows that the last .execute*() produced (for DQL statements like SELECT) or affected (for DML statements like UPDATE or INSERT).
    The attribute is -1 in case no .execute*() has been performed on the cursor or the rowcount of the last operation is cannot be determined by the interface."
    2

Footnotes

  1. direct quote from the SQLite C API docs 2

  2. direct quote from PEP 249

@serhiy-storchaka
Copy link
Member

@serhiy-storchaka serhiy-storchaka commented Jun 6, 2022

Ah, so sqlite3_total_changes() includes the number of rows affected by trigger programs, but sqlite3_changes() does not? Then the result of two approaches should be different.

@erlend-aasland
Copy link
Contributor

@erlend-aasland erlend-aasland commented Jun 6, 2022

Ah, so sqlite3_total_changes() includes the number of rows affected by trigger programs, but sqlite3_changes() does not? Then the result of two approaches should be different.

Good observation. So until recently, I was leaning towards the total-changes solution (gh-93520). But I just found out that there are other differences.

For example, sqlite3_changes() will return 1 for a CREATE TABLE query. If you drop the table and create it again, sqlite3_changes() will return 1 when you create the table again. Reusing statements does not matter; the change is counted by sqlite3_changes(). But, it is not for sqlite3_total_changes(). Reusing statements does not matter here either; if we use a total changes delta, a change is not recorded the second time we create the table. This is probably because of some caching in SQLite, and it makes me lean towards sqlite3_changes (gh-93526).

Let me explain that test again; it came out all wrong:

  1. CREATE TABLE t(t) => no changes returned by either API
  2. INSERT INTO t ... => changes returned by both APIs
  3. DROP TABLE t => changes returned1 by sqlite3_changes, but not by sqlite3_total_changes
  4. CREATE TABLE t(t) => changes returned1 by sqlite3_changes, but not by sqlite3_total_changes
  5. INSERT INTO t ... => changes returned by both APIs
  6. DROP TABLE t => changes returned1 by sqlite3_changes, but not by sqlite3_total_changes

Footnotes

  1. that is, they "bleed" over from the previous DML query 2 3

@erlend-aasland
Copy link
Contributor

@erlend-aasland erlend-aasland commented Jun 6, 2022

So until recently, I was leaning towards the total-changes solution (#93520). But I just found out that there are other differences.

For the record, here's the test code I've been using; it's a variant of the SQLite C API reproducer for this issue:

https://gist.github.com/erlend-aasland/98b41dab6cc51c1082c70eef3659d8d1

@erlend-aasland
Copy link
Contributor

@erlend-aasland erlend-aasland commented Jun 6, 2022

Regarding gh-93520, here's an excerpt from the SQLite docs for sqlite3_total_changes (and sqlite3_total_changes64:

These functions return the total number of rows inserted, modified or deleted by all INSERT, UPDATE or DELETE statements completed since the database connection was opened, including those executed as part of trigger programs. The two functions are identical except for the type of the return value and that if the number of rows modified by the connection exceeds the maximum value supported by type "int", then the return value of sqlite3_total_changes() is undefined.

The last part is troublesome, and makes me lean towards gh-93526, though 2^31 is a pretty large number.

What speaks for gh-93520, is that it will make it possible to solve #79579 by using sqlite3_stmt_readonly; that will be harder with gh-93526.

cc. @animalize

miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jun 8, 2022
…pythonGH-93526)

(cherry picked from commit 875de61)

Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com>
erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Jun 8, 2022
@erlend-aasland
Copy link
Contributor

@erlend-aasland erlend-aasland commented Jun 8, 2022

Fixed by:

@erlend-aasland
Copy link
Contributor

@erlend-aasland erlend-aasland commented Jun 8, 2022

Thanks for the report, and for pushing this through, Mike. Thanks for reviewing, Ma Lin!

miss-islington added a commit that referenced this issue Jun 8, 2022
…3526)

(cherry picked from commit 875de61)

Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com>
@animalize
Copy link
Contributor

@animalize animalize commented Jun 8, 2022

Just thought, if the next row is cached (like pre 3df0fc8), the .rowcount will be more correct?

For example, in these conditions:

  • use UPDATE ... RETURNING
  • use executemany()
  • the rows are not fetched

.rowcount will be incorrectly set to 0.

edit: There may not be too many code with such condition combination, and the changes are relatively large, so it can be fixed in 3.12.

@erlend-aasland
Copy link
Contributor

@erlend-aasland erlend-aasland commented Jun 8, 2022

Just thought, if the next row is cached (like pre 3df0fc8), the .rowcount will be more correct?

We'd still only update .rowcount after SQLITE_DONE, so when the statement is stepped through, you'll have the correct row count, caching or not.

For example, in these conditions: [...]

That is a separate issue (and it has always been). It is not very large or complex fix; we just need to step through a statement for each loop iteration in _pysqlite_query_execute if multiple is true, kinda like we do in executescript. It is a bug, though, so it would be quite ok to backport it through 3.10.

However, as you point out, such code is probably rare; it does not make sense with UPDATE ... RETURNING in executemany.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment