Bug report
On the following CSV content, the csv.Sniffer.has_header method returns False, although the content clearly has a header.
sample,fastq_1,fastq_2
A1-35-8,/mnt/scratch/sarek/data/A1-35-8/A1-35-8_R1.fastq.gz,/mnt/scratch/sarek/data/A1-35-8/A1-35-8_R2.fastq.gz
A2-102-5,/mnt/scratch/sarek/data/A2-102-5/A2-102-5_R1.fastq.gz,/mnt/scratch/sarek/data/A2-102-5/A2-102-5_R2.fastq.gz
A5-35-17,/mnt/scratch/sarek/data/A5-35-17/A5-35-17_R1.fastq.gz,/mnt/scratch/sarek/data/A5-35-17/A5-35-17_R2.fastq.gz
AD1-7a,/mnt/scratch/sarek/data/AD1-7a/AD1-7a_R1.fastq.gz,/mnt/scratch/sarek/data/AD1-7a/AD1-7a_R2.fastq.gz
AD1-83a,/mnt/scratch/sarek/data/AD1-83a/AD1-83a_R1.fastq.gz,/mnt/scratch/sarek/data/AD1-83a/AD1-83a_R2.fastq.gz
AD2-60a,/mnt/scratch/sarek/data/AD2-60a/AD2-60a_R1.fastq.gz,/mnt/scratch/sarek/data/AD2-60a/AD2-60a_R2.fastq.gz
Arg1366,/mnt/scratch/sarek/data/Arg1366/Arg1366_R1.fastq.gz,/mnt/scratch/sarek/data/Arg1366/Arg1366_R2.fastq.gz
Br795,/mnt/scratch/sarek/data/Br795/Br795_R1.fastq.gz,/mnt/scratch/sarek/data/Br795/Br795_R2.fastq.gz
Bt100,/mnt/scratch/sarek/data/Bt100/Bt100_R1.fastq.gz,/mnt/scratch/sarek/data/Bt100/Bt100_R2.fastq.gz
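A minimal reproduction sketch, using only the header and the first three data rows from the content above. The names SAMPLE and has_header_result are illustrative and not part of the original report. This report observes False on Python 3.10.10; other versions may behave differently.

```python
import csv

# Header plus the first three data rows of the CSV content in this report.
SAMPLE = (
    "sample,fastq_1,fastq_2\n"
    "A1-35-8,/mnt/scratch/sarek/data/A1-35-8/A1-35-8_R1.fastq.gz,"
    "/mnt/scratch/sarek/data/A1-35-8/A1-35-8_R2.fastq.gz\n"
    "A2-102-5,/mnt/scratch/sarek/data/A2-102-5/A2-102-5_R1.fastq.gz,"
    "/mnt/scratch/sarek/data/A2-102-5/A2-102-5_R2.fastq.gz\n"
    "A5-35-17,/mnt/scratch/sarek/data/A5-35-17/A5-35-17_R1.fastq.gz,"
    "/mnt/scratch/sarek/data/A5-35-17/A5-35-17_R2.fastq.gz\n"
)

# Ask the sniffer whether the first row looks like a header.
has_header_result = csv.Sniffer().has_header(SAMPLE)
print("has_header:", has_header_result)
```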
I believe this is due to the following lines in the has_header method.
if thisType != columnTypes[col]:
    if columnTypes[col] is None:  # add new column type
        columnTypes[col] = thisType
    else:
        # type is inconsistent, remove column from
        # consideration
        del columnTypes[col]
When all columns are strings, both thisType and columnTypes[col] are integers denoting the string lengths. Since the values are of different lengths, all the columns are removed and columnTypes ends up being an empty dictionary, which leads to the false negative down the line.
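The mechanism can be illustrated with a simplified re-creation of the type-tracking loop. This is a sketch written for this report, not the CPython source: the function name infer_column_types, the ROWS sample, and the int/float/complex probing order are assumptions, with string length as the fallback "type" as described above.

```python
import csv
import io

# Header plus three data rows taken from the CSV content in this report.
ROWS = (
    "sample,fastq_1,fastq_2\n"
    "A1-35-8,/mnt/scratch/sarek/data/A1-35-8/A1-35-8_R1.fastq.gz,"
    "/mnt/scratch/sarek/data/A1-35-8/A1-35-8_R2.fastq.gz\n"
    "A2-102-5,/mnt/scratch/sarek/data/A2-102-5/A2-102-5_R1.fastq.gz,"
    "/mnt/scratch/sarek/data/A2-102-5/A2-102-5_R2.fastq.gz\n"
    "A5-35-17,/mnt/scratch/sarek/data/A5-35-17/A5-35-17_R1.fastq.gz,"
    "/mnt/scratch/sarek/data/A5-35-17/A5-35-17_R2.fastq.gz\n"
)


def infer_column_types(sample):
    """Hypothetical helper: track one 'type' per column across data rows."""
    reader = csv.reader(io.StringIO(sample))
    header = next(reader)  # first row is assumed to be the header
    column_types = {col: None for col in range(len(header))}
    for row in reader:
        if len(row) != len(header):
            continue  # skip rows with an irregular column count
        for col in list(column_types):
            for this_type in (int, float, complex):
                try:
                    this_type(row[col])
                    break
                except (ValueError, OverflowError):
                    pass
            else:
                # Not numeric: fall back to the string length as the "type".
                this_type = len(row[col])
            if this_type != column_types[col]:
                if column_types[col] is None:
                    column_types[col] = this_type  # add new column type
                else:
                    # Lengths differ between rows, so the column is dropped.
                    del column_types[col]
    return column_types


remaining = infer_column_types(ROWS)
print(remaining)
```

With these rows every column holds strings of varying length, so each column is dropped once the second data row is processed and no columns remain to compare against the header.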
I believe there needs to be a special case introduced to avoid this when comparing integers rather than types.
Your environment
- CPython versions tested on: Python 3.10.10
- Operating system and architecture:
Linux helios 5.17.5-76051705-generic #202204271406~1653440576~20.04~6277a18-Ubuntu SMP PREEMPT Thu Ma x86_64 x86_64 x86_64 GNU/Linux