Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-39170: Sqlite3 row_factory for attribute access: sqlite3.NamedRow #17768

Open
wants to merge 10 commits into
base: master
from

Conversation

@jidn
Copy link

jidn commented Dec 31, 2019

Currently, sqlite3 returns rows by tuple or sqlite3.Row for dict-style, index access. I constantly find myself wanting attribute access like namedtuple for rows. I find attribute access cleaner
without the brackets and quoting field names. However, unlike previous discussions (https://bugs.python.org/issue13299), I don't want to use the namedtuple object. I appreciate the simple API and minimal memory consumption of sqlite3.Row and used it as my guide in creating sqlite3.NamedRow to allow access by index and attribute.

Why a new object instead of adding attribute access to the existing sqlite3.Row?
There is an existing member method keys and any table with the field "keys" would cause a hard to debug, easily avoidable, collision.

Features

  • Optimized in C, so it will be faster than any python implementation.
  • Access columns by attribute for all valid names and by index for all names.
  • Iterate over fields by name/value pairs.
  • Works with standard function len and operator contains.
  • Identical memory consumption to sqlite3.Row with two references: the data tuple and the cursor description.
  • Identical speed to sqlite3.Row if not faster. Timing usually has it slightly faster for index by name or attribute, but it is almost identical.

Examples

    >>> import sqlite3
    >>> c = sqlite3.Connection(":memory:").cursor()
    >>> c.row_factory = sqlite3.NamedRow
    >>> named_row = c.execute("SELECT 'A' AS letter, '.-' AS morse, 65 AS ord").fetchone()

    >>> len(named_row)
    3
    >>> 'letter' in named_row
    true
    >>> named_row == named_row
    true
    >>> hash(named_row)
    5512444875192833987

    >>> named_row[0]
    'A'
    >>> named_row[1:]
    ('.-', 65)

    >>> named_row["ord"]
    65

    >>> named_row.morse
    '.-'

    >>> dict(named_row)
    {'letter': 'A', 'morse': '.-', 'ord': 65}
    >>> tuple(named_row)
    (('letter', 'A'), ('morse', '.-'), ('ord', 65))

How sqlite3.NamedRow differs from sqlite3.Row

The class only has class dunder methods to allow any valid attribute name. When the column name would be an invalid attribute name, you have two options: either use the SQL AS in the select statement or index by name.

    >>> row = cursor.execute("SELECT count(*) FROM mytable").fetchone()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      AttributeError: 'sqlite3.NamedRow' object has no attribute 'count'
    >>> tuple(row)
    (('count(*)', 104),)
    >>> row["count(*)"], row[0]
    (104, 104)
    >>> row = cursor.execute("SELECT  count(*) AS cnt FROM mytable").fetchone()
    (('count', 2))
    row.count
    104

To get the field names, use the iterator [x[0] for x in row] or do the same from the
cursor.description.

    titles = [x[0] for x in row]
    titles = [x[0] for x in cursor.description]
    titles = dict(row).keys()

Name and dict access are no longer case-insensitive. There are four reasons for this.

  1. Case-insensitive comparison only works well for ASCII characters. In a Unicode world, case-insensitive edge cases create unnecessary errors. Looking at several existing codebases, this feature of Row is almost never used and I believe is not needed in NamedRow.
  2. Case-insensitivity is not allowed for attribute access. This "feature" would treat attribute access differently from the rest of Python and "special cases aren't special enough to break the rules". Where row.name, row.Name, and row.NAME are all the same it gives off the faint code smell of something wrong. When case-insensitively is needed and the query SELECT can not be modified, sqlite3.Row is still there.
  3. Code is simpler and easier to maintain.
  4. It is faster.

Timing Results

NamedRow is faster than sqlite3.Row for index-by-name access. Speed is not the focus. I just want to show that this is not any slower than sqlite3.Row when dealing with thousands for results.
I have published a graph and the methodology of my testing. In the worst case scenario, it is just as fast as sqlite3.Row without any extra memory.
For more information, see the post at https://jidn.com/posts/2019/10/namedrow-better-python-sqlite3-row-factory/

https://bugs.python.org/issue39170

Clinton James added 9 commits Sep 8, 2019
+ Creates object in sqlite3
+ row_factory works
+ Get by index
+ Get by dict
    + Case insensitive
    + Underscore acceptable replacement for space or dash
+ AttributeError works
+ IndexError works
Test both the cursor descriptor and data.
Does the cursor.execute().fetchone() effect the general shape of the
data?  Add testing which only covers object creation and access.
+ Faster
+ Remove case insensitive
Clinton James
@jidn jidn requested review from berkerpeksag and python/windows-team as code owners Dec 31, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.