
False negative from csv.Sniffer.has_header with only strings #102140

Open
@Midnighter

Bug report

For the following CSV content, the csv.Sniffer.has_header method returns False even though the first row is clearly a header.

sample,fastq_1,fastq_2
A1-35-8,/mnt/scratch/sarek/data/A1-35-8/A1-35-8_R1.fastq.gz,/mnt/scratch/sarek/data/A1-35-8/A1-35-8_R2.fastq.gz
A2-102-5,/mnt/scratch/sarek/data/A2-102-5/A2-102-5_R1.fastq.gz,/mnt/scratch/sarek/data/A2-102-5/A2-102-5_R2.fastq.gz
A5-35-17,/mnt/scratch/sarek/data/A5-35-17/A5-35-17_R1.fastq.gz,/mnt/scratch/sarek/data/A5-35-17/A5-35-17_R2.fastq.gz
AD1-7a,/mnt/scratch/sarek/data/AD1-7a/AD1-7a_R1.fastq.gz,/mnt/scratch/sarek/data/AD1-7a/AD1-7a_R2.fastq.gz
AD1-83a,/mnt/scratch/sarek/data/AD1-83a/AD1-83a_R1.fastq.gz,/mnt/scratch/sarek/data/AD1-83a/AD1-83a_R2.fastq.gz
AD2-60a,/mnt/scratch/sarek/data/AD2-60a/AD2-60a_R1.fastq.gz,/mnt/scratch/sarek/data/AD2-60a/AD2-60a_R2.fastq.gz
Arg1366,/mnt/scratch/sarek/data/Arg1366/Arg1366_R1.fastq.gz,/mnt/scratch/sarek/data/Arg1366/Arg1366_R2.fastq.gz
Br795,/mnt/scratch/sarek/data/Br795/Br795_R1.fastq.gz,/mnt/scratch/sarek/data/Br795/Br795_R2.fastq.gz
Bt100,/mnt/scratch/sarek/data/Bt100/Bt100_R1.fastq.gz,/mnt/scratch/sarek/data/Bt100/Bt100_R2.fastq.gz
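A minimal reproduction sketch, using only the header and the first three data rows from the sample above (copied verbatim). The report states that the result is False on CPython 3.10.10; other versions may behave differently, so the script prints the result rather than asserting it.

```python
import csv

# Header plus the first three data rows from the report, copied verbatim.
prefix = "/mnt/scratch/sarek/data"
sample = (
    "sample,fastq_1,fastq_2\n"
    f"A1-35-8,{prefix}/A1-35-8/A1-35-8_R1.fastq.gz,{prefix}/A1-35-8/A1-35-8_R2.fastq.gz\n"
    f"A2-102-5,{prefix}/A2-102-5/A2-102-5_R1.fastq.gz,{prefix}/A2-102-5/A2-102-5_R2.fastq.gz\n"
    f"A5-35-17,{prefix}/A5-35-17/A5-35-17_R1.fastq.gz,{prefix}/A5-35-17/A5-35-17_R2.fastq.gz\n"
)

# has_header() first sniffs the dialect, then compares the first row
# against the rows that follow it.
result = csv.Sniffer().has_header(sample)
print("has_header:", result)  # reported as False on CPython 3.10.10
```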

I believe this is due to the following lines in the has_header method.

                if thisType != columnTypes[col]:
                    if columnTypes[col] is None: # add new column type
                        columnTypes[col] = thisType
                    else:
                        # type is inconsistent, remove column from
                        # consideration
                        del columnTypes[col]

When every column holds non-numeric strings, both thisType and columnTypes[col] are integers holding the string length rather than a type. Because the lengths differ from row to row, every column is removed and columnTypes ends up as an empty dictionary, which leads to the false negative further down.
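The mechanism can be illustrated with a simplified paraphrase of the two passes in has_header. This is a sketch based on my reading of the 3.10 source, not the actual stdlib code: the function names surviving_column_types and header_votes and the tiny sample data are hypothetical, and dialect sniffing and the row limit are left out.

```python
import csv
import io


def surviving_column_types(sample):
    """Paraphrase of the first pass: assign each column a 'type'."""
    rows = csv.reader(io.StringIO(sample))
    header = next(rows)
    column_types = {col: None for col in range(len(header))}
    for row in rows:
        if len(row) != len(header):
            continue  # skip rows with an unexpected number of fields
        for col in list(column_types):
            try:
                complex(row[col])
                this_type = complex
            except (ValueError, OverflowError):
                # Non-numeric strings are "typed" by their length.
                this_type = len(row[col])
            if this_type != column_types[col]:
                if column_types[col] is None:
                    column_types[col] = this_type
                else:
                    # Differing lengths count as inconsistent types.
                    del column_types[col]
    return header, column_types


def header_votes(header, column_types):
    """Paraphrase of the second pass: each surviving column votes."""
    votes = 0
    for col, col_type in column_types.items():
        if isinstance(col_type, int):
            votes += 1 if len(header[col]) != col_type else -1
        else:
            try:
                col_type(header[col])
            except (ValueError, TypeError):
                votes += 1
            else:
                votes -= 1
    return votes


# Hypothetical data: string columns whose lengths vary between rows.
varying = "name,path\nab,/x/ab\nabc,/x/abc\n"
header, types_varying = surviving_column_types(varying)
votes_varying = header_votes(header, types_varying)
print(types_varying, votes_varying)  # {} 0

# Hypothetical data: string columns with a fixed length in every row.
fixed = "name,code\nab,xy\ncd,zw\n"
header, types_fixed = surviving_column_types(fixed)
votes_fixed = header_votes(header, types_fixed)
print(types_fixed, votes_fixed)  # {0: 2, 1: 2} 2
```

With varying lengths no column survives, so there are no votes and a check of the form votes > 0 yields False. With fixed lengths both columns survive and vote for a header, because the header cells have a different length than the data cells.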

I believe a special case is needed to avoid this when the values being compared are integers (string lengths) rather than types.

Your environment

  • CPython versions tested on: Python 3.10.10
  • Operating system and architecture: Linux helios 5.17.5-76051705-generic #202204271406~1653440576~20.04~6277a18-Ubuntu SMP PREEMPT Thu Ma x86_64 x86_64 x86_64 GNU/Linux

Labels

  • 3.11: only security fixes
  • 3.12: only security fixes
  • 3.13: bugs and security fixes
  • stdlib: Python modules in the Lib dir
  • type-bug: An unexpected behavior, bug, or error
