SQLite rowcount is corrupted when combining UPDATE RETURNING w/ table that is dropped and recreated #93421

zzzeek · 2022-06-01T22:50:28Z

version info:

$ python
Python 3.10.0 (default, Nov  5 2021, 17:23:47) [GCC 11.2.1 20210728 (Red Hat 11.2.1-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> sqlite3.sqlite_version
'3.36.0'

we have a test suite that creates a table, runs some SQL, then drops it. if multiple tests run that each perform this task, if the same SQLite connection is used, rowcount starts returning "0". Seems to also require RETURNING to be used. Full demonstration:

import os
import sqlite3



def go():
    """function creates a new table, runs INSERT/UPDATE, drops table,
    commits connection.

    """

    # create table
    cursor = conn.cursor()
    cursor.execute(
        """CREATE TABLE some_table (
        id INTEGER NOT NULL,
        value VARCHAR(40) NOT NULL,
        PRIMARY KEY (id)
    )
    """
    )
    cursor.close()
    conn.commit()

    # run operation
    cursor = conn.cursor()
    cursor.execute(
        "INSERT INTO some_table (id, value) VALUES (1, 'v1')"
    )
    ident = 1

    cursor.execute(
        "UPDATE some_table SET value='v2' "
        "WHERE id=? RETURNING id",
        (ident,),
    )
    new_ident = cursor.fetchone()[0]
    assert ident == new_ident
    assert cursor.rowcount == 1, cursor.rowcount
    cursor.close()

    # drop table
    cursor = conn.cursor()
    cursor.execute("DROP TABLE some_table")
    cursor.close()

    conn.commit()

if os.path.exists("file.db"):
    os.unlink("file.db")

# passes
conn = sqlite3.connect("file.db")
go()

# run again w/ new connection (same DB), passes
conn = sqlite3.connect("file.db")
go()

print("FAILURE NOW OCCURS")
# run again w/ same connection, fails
go()

on the third run, where we ran the "test" on the same connection twice, it fails:

$  python test3.py 
FAILURE NOW OCCURS
Traceback (most recent call last):
  File "/home/classic/dev/sqlalchemy/test3.py", line 62, in <module>
    go()
  File "/home/classic/dev/sqlalchemy/test3.py", line 39, in go
    assert cursor.rowcount == 1, cursor.rowcount
AssertionError: 0

it would appear there's some internal caching of table state that needs to be cleared.

The text was updated successfully, but these errors were encountered:

AA-Turner · 2022-06-01T22:53:54Z

cc: @erlend-aasland

A

erlend-aasland · 2022-06-02T08:35:56Z

Thanks for the report!

Internally, we simply use the function sqlite3_changes to set rowcount¹, implying that we do not calculate rowcount ourself; the value comes straight from the underlying SQLite library. If sqlite3_changes returns zero, rowcount will be zero.

I've verified that this only happens with UPDATE ... RETURNING statements, which also implies that the behaviour we are observing comes from the underlying SQLite library. I suggest to report this on the SQLite forum (they do not have a bug tracker).

Note: I did some tests where I used sqlite3_total_changes() in addition to sqlite3_changes(), and they revealed that the number of total changes on the database connection did not increase whith UPDATE ... RETURNING statements, only with normal UPDATE statements.

Closing this as not-a-bug.

See also:

Assuming that you've supplied a valid statement, and that no database errors were raised by SQLite. In case of errors, rowcount will normally be set to -1. ↩

erlend-aasland · 2022-06-02T09:16:58Z

it would appear there's some internal caching of table state that needs to be cleared.

For the record; we explicitly clear rowcount at the start of every execute and executemany.

zzzeek · 2022-06-02T12:54:25Z

post is up at https://sqlite.org/forum/forumpost/33ea0f369f

zzzeek · 2022-06-04T15:45:14Z

just to keep track here, the equivalent commands in the sqlite3 shell should be (please review to make sure this is accurate):

CREATE TABLE some_table (
        id INTEGER NOT NULL,
        value VARCHAR(40) NOT NULL,
        PRIMARY KEY (id)
);
BEGIN;
INSERT INTO some_table (value) VALUES ('v1');
SELECT last_insert_rowid();

UPDATE some_table SET value='v2' WHERE id=1 RETURNING id;
SELECT changes();
DROP TABLE some_table;
COMMIT;

these commands can be run repeatedly in the sqlite3 shell and the results are correct each time.

zzzeek · 2022-06-04T21:21:59Z

hi there -

I'd like to reopen this. per discussion at https://sqlite.org/forum/forumpost/397abefa91 this is still a bug in pysqlite. The incorrect cursor.rowcount value returned by sqlite does not match running the query manually on the failed run, indicating SQLite is in fact returning the correct value, pysqlite is somehow getting the wrong value. Can we please look into this? new demonstration below compares cursor.rowcount to "select changes()". the discrepancy happens on the failure run. if this is in fact a C api issue they would likely need more detail.

import os
import sqlite3


def go():
    """function creates a new table, runs INSERT/UPDATE, drops table,
    commits connection.

    """

    # create table
    cursor = conn.cursor()
    cursor.execute(
        """CREATE TABLE some_table (
        id INTEGER NOT NULL,
        value VARCHAR(40) NOT NULL,
        PRIMARY KEY (id)
    )
    """
    )
    cursor.close()
    conn.commit()

    # run operation
    cursor = conn.cursor()
    cursor.execute("INSERT INTO some_table (value) VALUES ('v1')")
    ident = cursor.lastrowid

    cursor.execute(
        "UPDATE some_table SET value='v2' " "WHERE id=? RETURNING id",
        (ident,),
    )
    new_ident = cursor.fetchone()[0]
    assert ident == new_ident

    cursor_rowcount = cursor.rowcount
    manual_rowcount = cursor.execute("select changes()").fetchone()[0]

    assert cursor_rowcount == manual_rowcount, (
        f"Cursor rowcount {cursor_rowcount} does not match "
        f"SELECTed rowcount {manual_rowcount}"
    )

    print(
        f"Cursor rowcount {cursor_rowcount} matches SELECTed rowcount "
        f"{manual_rowcount}"
    )

    cursor.close()

    # drop table
    cursor = conn.cursor()
    cursor.execute("DROP TABLE some_table")
    cursor.close()

    conn.commit()


if os.path.exists("file.db"):
    os.unlink("file.db")

print("------------------- pass one -----------------------")
# passes
conn = sqlite3.connect("file.db")
go()

print("------------------- pass two -----------------------")
# run again w/ new connection (same DB), passes
conn = sqlite3.connect("file.db")
go()


print("------------------- pass three -----------------------")
# run again w/ same connection, fails
go()

output:

------------------- pass one -----------------------
Cursor rowcount 1 matches SELECTed rowcount 1
------------------- pass two -----------------------
Cursor rowcount 1 matches SELECTed rowcount 1
------------------- pass three -----------------------
Traceback (most recent call last):
  File "/home/classic/dev/sqlalchemy/test3.py", line 75, in <module>
    go()
  File "/home/classic/dev/sqlalchemy/test3.py", line 39, in go
    assert cursor_rowcount == manual_rowcount, (
AssertionError: Cursor rowcount 0 does not match SELECTed rowcount 1

AA-Turner · 2022-06-04T21:25:41Z

@erlend-aasland - I reproduced Mike's script locally and reopened for more discussion as it seems there might be something sqlite3 could do.

A

erlend-aasland · 2022-06-05T08:46:27Z

Thanks for reopening and investigating this. I can reproduce it, and I've got a fix for it. I want to narrow down a C reproducer before I open a PR.

FTR, we are calling the sqlite3_changes method and presenting that result; this has to do with when we call it.

And, per the discussion over at the SQLite forum, yes there are some inconveniences in the sqlite3 API, but we cannot just change them, because people depend on the inconveniences. The SQLite C API also have some inconveniences¹, but I would not call it names because of that.

sqlite3_create_collation is a well known example: https://www.sqlite.org/c3ref/create_collation.html:

The xDestroy callback is not called if the sqlite3_create_collation_v2() function fails. Applications that invoke sqlite3_create_collation_v2() with a non-NULL xDestroy argument should check the return code and dispose of the application data pointer themselves rather than expecting SQLite to deal with it for them. This is different from every other SQLite interface. The inconsistency is unfortunate but cannot be changed without breaking backwards compatibility.

↩

erlend-aasland · 2022-06-05T13:29:22Z

Here's a SQLite C API repro: https://gist.github.com/erlend-aasland/5e48b920d882e04f911fe7871fd5ccd9

As you can see, sqlite3_changes clearly return 0 when reusing the cached statement. Bug, feature? I don't know, but it is possible to work around it to make the DB API happy :)

EDIT: sqlite3_changes can only be trusted when a statement has run to completion, according to the SQLite C API docs.

erlend-aasland · 2022-06-05T19:57:26Z

If we change the implementation of .rowcount to be a tp_getset slot, we can lazily call sqlite3_count() at attribute read time. This will solve @zzzeek's problem, but it will break Cursor.executemany() followed by a .rowcount read.

An alternative could be to update.rowcount for each step (through the fetch-one-row helper).

Another solution can be to save the total change count (sqlite3_total_changes) when we enter _pysqlite_query_execute, and then at .rowcount read (through a tp_getset slot), call sqlite3_total_changes again and return the difference. That should work with both .execute() and .executemany.

I don't have a strong preference; but I feel the latter solution will result in the most sane behaviour for the .rowcount attribute.

Slightly related: we can fix .rowcount for .executescript if we choose the sqlite3_total_changes approach.

@serhiy-storchaka, do you have an opinion here?

erlend-aasland · 2022-06-05T20:03:25Z

Fun fact: changing Mike's reproducer to insert and update two or more values results in interesting values returned by sqlite3_changes.

Two rows, same update query

$ ./test
Compiled with SQLite 3.26.0
Executing with SQLite 3.36.0
executing 'CREATE TABLE some_table ( id INTEGER NOT NULL, value VARCHAR(40) NOT NULL, PRIMARY KEY (id))' 
- step, changes=0
executing 'BEGIN' 
- step, changes=0
executing 'INSERT INTO some_table (id, value) VALUES (1, 'v1'), (2, 'v1')' 
- step, changes=2
executing 'UPDATE some_table SET value='v2' WHERE id=1 RETURNING id' 
- step, changes=2
- step, changes=1
executing 'DROP TABLE some_table' 
- step, changes=1
executing 'COMMIT' 
- step, changes=1
executing 'CREATE TABLE some_table ( id INTEGER NOT NULL, value VARCHAR(40) NOT NULL, PRIMARY KEY (id))' (cached)
- step, changes=1
executing 'BEGIN' (cached)
- step, changes=1
executing 'INSERT INTO some_table (id, value) VALUES (1, 'v1'), (2, 'v1')' (cached)
- step, changes=2
executing 'UPDATE some_table SET value='v2' WHERE id=1 RETURNING id' (cached)
- step, changes=0
- step, changes=1
executing 'DROP TABLE some_table' (cached)
- step, changes=1
executing 'COMMIT' (cached)
- step, changes=1

Three rows, loosened update query

$ ./test
Compiled with SQLite 3.26.0
Executing with SQLite 3.36.0
executing 'CREATE TABLE some_table ( id INTEGER NOT NULL, value VARCHAR(40) NOT NULL, PRIMARY KEY (id))' 
- step, changes=0
executing 'BEGIN' 
- step, changes=0
executing 'INSERT INTO some_table (id, value) VALUES (1, 'v1'), (2, 'v1'), (3, 'v1')' 
- step, changes=3
executing 'UPDATE some_table SET value='v2' WHERE id<3 RETURNING id' 
- step, changes=3
- step, changes=3
- step, changes=2
executing 'DROP TABLE some_table' 
- step, changes=2
executing 'COMMIT' 
- step, changes=2
executing 'CREATE TABLE some_table ( id INTEGER NOT NULL, value VARCHAR(40) NOT NULL, PRIMARY KEY (id))' (cached)
- step, changes=2
executing 'BEGIN' (cached)
- step, changes=2
executing 'INSERT INTO some_table (id, value) VALUES (1, 'v1'), (2, 'v1'), (3, 'v1')' (cached)
- step, changes=3
executing 'UPDATE some_table SET value='v2' WHERE id<3 RETURNING id' (cached)
- step, changes=0
- step, changes=0
- step, changes=2
executing 'DROP TABLE some_table' (cached)
- step, changes=2
executing 'COMMIT' (cached)
- step, changes=2

I guess that rules out the update-rowcount-via-the-fetch-one-handler approach, so we are stuck with the total-changes solution, AFAICS.

erlend-aasland · 2022-06-06T05:49:24Z

An alternative could be to update.rowcount for each step (through the fetch-one-row helper).

I'll create a competing PR for this approach.

UPDATE: I created #93526 which updates .rowcount after SQLITE_DONE. I'm not sure if that is the best approach.

serhiy-storchaka · 2022-06-06T08:20:30Z

I have no opinion and I think you are more expedient in this.

Only a question. If executemany() affects the same row multiple times, is the same row counted multiple times? What sqlite3_count() and sqlite3_total_changes() return and what is the expected behavior for .rowcount?

erlend-aasland · 2022-06-06T10:22:02Z

Only a question. If executemany() affects the same row multiple times, is the same row counted multiple times?

According to the DB API (as I understand it), .rowcount should reflect the total number of rows affected by the executemany call. executemany executes the same statement multiple times, so if a row is changed multiple times, it will be counted multiple times, no matter which SQLite API we use.

So either we use sqlite3_total_changes before/after the executemany and use the delta as the row count, or we use sqlite3_changes and accumulate the changes. I'm fine with either approach.

What sqlite3_count() and sqlite3_total_changes() return and what is the expected behavior for .rowcount?

sqlite3_changes(): "return the number of rows modified, inserted or deleted by the most recently completed INSERT, UPDATE or DELETE statement on the database connection"¹
sqlite3_total_changes(): "return the total number of rows inserted, modified or deleted by all INSERT, UPDATE or DELETE statements completed since the database connection was opened, including those executed as part of trigger programs"¹
.rowcount: "This read-only attribute specifies the number of rows that the last .execute*() produced (for DQL statements like SELECT) or affected (for DML statements like UPDATE or INSERT).
The attribute is -1 in case no .execute*() has been performed on the cursor or the rowcount of the last operation is cannot be determined by the interface."²

direct quote from the SQLite C API docs ↩ ↩²
direct quote from PEP 249 ↩

serhiy-storchaka · 2022-06-06T10:39:21Z

Ah, so sqlite3_total_changes() includes the number of rows affected by trigger programs, but sqlite3_changes() does not? Then the result of two approaches should be different.

erlend-aasland · 2022-06-06T11:08:31Z

Ah, so sqlite3_total_changes() includes the number of rows affected by trigger programs, but sqlite3_changes() does not? Then the result of two approaches should be different.

Good observation. So until recently, I was leaning towards the total-changes solution (gh-93520). But I just found out that there are other differences.

For example, sqlite3_changes() will return 1 for a CREATE TABLE query. If you drop the table and create it again, sqlite3_changes() will return 1 when you create the table again. Reusing statements does not matter; the change is counted by sqlite3_changes(). But, it is not for sqlite3_total_changes(). Reusing statements does not matter here either; if we use a total changes delta, a change is not recorded the second time we create the table. This is probably because of some caching in SQLite, and it makes me lean towards sqlite3_changes (gh-93526).

Let me explain that test again; it came out all wrong:

CREATE TABLE t(t) => no changes returned by either API
INSERT INTO t ... => changes returned by both APIs
DROP TABLE t => changes returned¹ by sqlite3_changes, but not by sqlite3_total_changes
CREATE TABLE t(t) => changes returned¹ by sqlite3_changes, but not by sqlite3_total_changes
INSERT INTO t ... => changes returned by both APIs
DROP TABLE t => changes returned¹ by sqlite3_changes, but not by sqlite3_total_changes

that is, they "bleed" over from the previous DML query ↩ ↩² ↩³

erlend-aasland · 2022-06-06T11:34:28Z

So until recently, I was leaning towards the total-changes solution (#93520). But I just found out that there are other differences.

For the record, here's the test code I've been using; it's a variant of the SQLite C API reproducer for this issue:

https://gist.github.com/erlend-aasland/98b41dab6cc51c1082c70eef3659d8d1

erlend-aasland · 2022-06-06T21:30:20Z

Regarding gh-93520, here's an excerpt from the SQLite docs for sqlite3_total_changes (and sqlite3_total_changes64:

These functions return the total number of rows inserted, modified or deleted by all INSERT, UPDATE or DELETE statements completed since the database connection was opened, including those executed as part of trigger programs. The two functions are identical except for the type of the return value and that if the number of rows modified by the connection exceeds the maximum value supported by type "int", then the return value of sqlite3_total_changes() is undefined.

The last part is troublesome, and makes me lean towards gh-93526, though 2^31 is a pretty large number.

What speaks for gh-93520, is that it will make it possible to solve #79579 by using sqlite3_stmt_readonly; that will be harder with gh-93526.

cc. @animalize

…pythonGH-93526) (cherry picked from commit 875de61) Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com>

…TE_DONE (pythonGH-93526) (cherry picked from commit 875de61)

erlend-aasland · 2022-06-08T11:01:28Z

Fixed by:

main: #93526
3.11: #93598
3.10: #93599

erlend-aasland · 2022-06-08T11:05:10Z

Thanks for the report, and for pushing this through, Mike. Thanks for reviewing, Ma Lin!

…3526) (cherry picked from commit 875de61) Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com>

GH-93526) (GH-93599) (cherry picked from commit 875de61)

animalize · 2022-06-08T13:59:04Z

Just thought, if the next row is cached (like pre 3df0fc8), the .rowcount will be more correct?

For example, in these conditions:

use UPDATE ... RETURNING
use executemany()
the rows are not fetched

.rowcount will be incorrectly set to 0.

edit: There may not be too many code with such condition combination, and the changes are relatively large, so it can be fixed in 3.12.

erlend-aasland · 2022-06-08T21:09:43Z

Just thought, if the next row is cached (like pre 3df0fc8), the .rowcount will be more correct?

We'd still only update .rowcount after SQLITE_DONE, so when the statement is stepped through, you'll have the correct row count, caching or not.

For example, in these conditions: [...]

That is a separate issue (and it has always been). It is not very large or complex fix; we just need to step through a statement for each loop iteration in _pysqlite_query_execute if multiple is true, kinda like we do in executescript. It is a bug, though, so it would be quite ok to backport it through 3.10.

However, as you point out, such code is probably rare; it does not make sense with UPDATE ... RETURNING in executemany.

zzzeek added the type-bug label Jun 1, 2022

AA-Turner added 3.11 3.10 3.12 expert-sqlite3 labels Jun 1, 2022

erlend-aasland self-assigned this Jun 2, 2022

erlend-aasland closed this as not planned Jun 2, 2022

erlend-aasland removed type-bug 3.11 3.10 3.12 labels Jun 2, 2022

AA-Turner reopened this Jun 4, 2022

erlend-aasland added 3.11 3.12 type-bug labels Jun 5, 2022

erlend-aasland linked a pull request Jun 5, 2022 that will close this issue

gh-93421: Fix sqlite3 cursor .rowcount for UPDATE ... RETURNING queries #93520

Closed

erlend-aasland mentioned this issue Jun 5, 2022

gh-93421: Fix sqlite3 cursor .rowcount for UPDATE ... RETURNING queries #93520

Closed

erlend-aasland mentioned this issue Jun 6, 2022

gh-93421: Update sqlite3 cursor.rowcount after SQLITE_DONE #93526

Merged

erlend-aasland mentioned this issue Jun 6, 2022

gh-79579: Update sqlite3.Cursor.rowcount for all data-modifying queries #93532

Closed

erlend-aasland closed this as completed in #93526 Jun 8, 2022

erlend-aasland added a commit that referenced this issue Jun 8, 2022

gh-93421: Update sqlite3 cursor.rowcount only after SQLITE_DONE (#93526)

875de61

erlend-aasland added a commit to erlend-aasland/cpython that referenced this issue Jun 8, 2022

[3.10] pythongh-93421: Update sqlite3 cursor.rowcount only after SQLI…

5df5447

…TE_DONE (pythonGH-93526) (cherry picked from commit 875de61)

miss-islington added a commit that referenced this issue Jun 8, 2022

gh-93421: Update sqlite3 cursor.rowcount only after SQLITE_DONE (GH-9…

68aae80

…3526) (cherry picked from commit 875de61) Co-authored-by: Erlend Egeberg Aasland <erlend.aasland@protonmail.com>

erlend-aasland added a commit that referenced this issue Jun 8, 2022

[3.10] gh-93421: Update sqlite3 cursor.rowcount only after SQLITE_DONE (

9cc0afc

GH-93526) (GH-93599) (cherry picked from commit 875de61)

SQLite rowcount is corrupted when combining UPDATE RETURNING w/ table that is dropped and recreated #93421

SQLite rowcount is corrupted when combining UPDATE RETURNING w/ table that is dropped and recreated #93421

zzzeek commented Jun 1, 2022 •

edited

AA-Turner commented Jun 1, 2022

erlend-aasland commented Jun 2, 2022 •

edited

erlend-aasland commented Jun 2, 2022 •

edited

zzzeek commented Jun 2, 2022

zzzeek commented Jun 4, 2022

zzzeek commented Jun 4, 2022

AA-Turner commented Jun 4, 2022

erlend-aasland commented Jun 5, 2022

erlend-aasland commented Jun 5, 2022 •

edited

erlend-aasland commented Jun 5, 2022

erlend-aasland commented Jun 5, 2022 •

edited

erlend-aasland commented Jun 6, 2022 •

edited

serhiy-storchaka commented Jun 6, 2022

erlend-aasland commented Jun 6, 2022 •

edited

serhiy-storchaka commented Jun 6, 2022

erlend-aasland commented Jun 6, 2022 •

edited

erlend-aasland commented Jun 6, 2022

erlend-aasland commented Jun 6, 2022 •

edited

erlend-aasland commented Jun 8, 2022

erlend-aasland commented Jun 8, 2022

animalize commented Jun 8, 2022 •

edited

erlend-aasland commented Jun 8, 2022 •

edited

SQLite rowcount is corrupted when combining UPDATE RETURNING w/ table that is dropped and recreated #93421

SQLite rowcount is corrupted when combining UPDATE RETURNING w/ table that is dropped and recreated #93421

Comments

zzzeek commented Jun 1, 2022 • edited

AA-Turner commented Jun 1, 2022

erlend-aasland commented Jun 2, 2022 • edited

Footnotes

erlend-aasland commented Jun 2, 2022 • edited

zzzeek commented Jun 2, 2022

zzzeek commented Jun 4, 2022

zzzeek commented Jun 4, 2022

AA-Turner commented Jun 4, 2022

erlend-aasland commented Jun 5, 2022

Footnotes

erlend-aasland commented Jun 5, 2022 • edited

erlend-aasland commented Jun 5, 2022

erlend-aasland commented Jun 5, 2022 • edited

erlend-aasland commented Jun 6, 2022 • edited

serhiy-storchaka commented Jun 6, 2022

erlend-aasland commented Jun 6, 2022 • edited

Footnotes

serhiy-storchaka commented Jun 6, 2022

erlend-aasland commented Jun 6, 2022 • edited

Footnotes

erlend-aasland commented Jun 6, 2022

erlend-aasland commented Jun 6, 2022 • edited

erlend-aasland commented Jun 8, 2022

erlend-aasland commented Jun 8, 2022

animalize commented Jun 8, 2022 • edited

erlend-aasland commented Jun 8, 2022 • edited

zzzeek commented Jun 1, 2022 •

edited

erlend-aasland commented Jun 2, 2022 •

edited

erlend-aasland commented Jun 2, 2022 •

edited

erlend-aasland commented Jun 5, 2022 •

edited

erlend-aasland commented Jun 5, 2022 •

edited

erlend-aasland commented Jun 6, 2022 •

edited

erlend-aasland commented Jun 6, 2022 •

edited

erlend-aasland commented Jun 6, 2022 •

edited

erlend-aasland commented Jun 6, 2022 •

edited

animalize commented Jun 8, 2022 •

edited

erlend-aasland commented Jun 8, 2022 •

edited