
csv.py _guess_quote_and_delimiter should be able to handle Windows \r #103925

Open
@lilaboc

Description


Bug report

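In the _guess_quote_and_delimiter function in csv.py, the regular expressions that end with "(?:$|\n)" cannot handle the \r of a Windows line ending (\r\n). With re.MULTILINE, "$" matches only at the end of the string or just before a "\n", so neither alternative matches when the closing quote is followed by "\r". In the session below, the original pattern finds no match (re.search returns None, so nothing is printed), while adding \r to the alternation makes it match: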

>>> import re
>>> re.search(r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?:$|\n)', '2020-10-01 17:17:37+08:00,https://www.mozilla.org/en-US/firefox/welcome/2/,"Pocket - Save news, videos, stories and more"\r\n', re.M|re.S)
>>> re.search(r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?:$|\r|\n)', '2020-10-01 17:17:37+08:00,https://www.mozilla.org/en-US/firefox/welcome/2/,"Pocket - Save news, videos, stories and more"\r\n', re.M|re.S)
<re.Match object; span=(74, 122), match=',"Pocket - Save news, videos, stories and more"\r>
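To isolate the line-ending behaviour, here is a small sketch that compares the pattern quoted above with the variant that also accepts \r. The helper first_match and the short sample strings are illustrative only; they are not part of csv.py.

```python
import re

# Pattern quoted in this issue, and the same pattern with "\r" added
# to the line-end alternation.
OLD = r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?:$|\n)'
NEW = r'(?P<delim>[^\w\n"\'])(?P<space> ?)(?P<quote>["\']).*?(?P=quote)(?:$|\r|\n)'


def first_match(pattern, text):
    """Hypothetical helper: return the matched text, or None if nothing matches."""
    m = re.search(pattern, text, re.MULTILINE | re.DOTALL)
    return m.group() if m else None


# With MULTILINE, "$" matches just before "\n" but not before "\r".
print(repr(first_match(r'x$', 'x\n')))     # 'x'
print(repr(first_match(r'x$', 'x\r\n')))   # None

# A quoted last field followed by a Unix vs. a Windows line ending.
print(repr(first_match(OLD, 'a,"b"\n')))    # ',"b"'
print(repr(first_match(OLD, 'a,"b"\r\n')))  # None
print(repr(first_match(NEW, 'a,"b"\r\n')))  # ',"b"\r'
```

Note that the extended pattern includes the "\r" in the overall match; the delimiter and quote are still captured by the named groups.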
Minimal reproducer using csv.Sniffer:

import csv

# Header plus one data row, each terminated by a Windows "\r\n" line ending
a = 'Timestamp,URL,Title\r\n2020-10-01 17:17:37+08:00,https://www.mozilla.org/en-US/firefox/welcome/2/,"Pocket - Save news, videos, stories and more"\r\n'
sniffer = csv.Sniffer()
result = sniffer.sniff(a)
print(result.delimiter)  # expected ","
print(result.quotechar)

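On CPython 3.11.2, the printed delimiter is "p", which is wrong.

After changing the line-end alternation from "(?:$|\n)" to "(?:$|\r|\n)", the delimiter is correctly detected as ",".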
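Until the patterns accept \r, one possible workaround (an assumption, not part of the fix proposed here) is to convert \r\n to \n in the sample passed to Sniffer.sniff, so that the line-end pattern quoted above can match the quoted last field. A minimal sketch, using a hypothetical helper sniff_normalized:

```python
import csv


def sniff_normalized(sample):
    """Hypothetical helper: sniff a sample after converting CRLF line endings to LF.

    Only the sample text used for sniffing is modified.
    """
    return csv.Sniffer().sniff(sample.replace('\r\n', '\n'))


# Same data as the reproducer above, with Windows line endings.
a = ('Timestamp,URL,Title\r\n'
     '2020-10-01 17:17:37+08:00,https://www.mozilla.org/en-US/firefox/welcome/2/,'
     '"Pocket - Save news, videos, stories and more"\r\n')
dialect = sniff_normalized(a)
print(dialect.delimiter)  # ,
print(dialect.quotechar)  # "
```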

Your environment

Windows

  • CPython versions tested on: 3.11.2
  • Operating system and architecture: Windows, 64-bit

Linked PRs

Metadata

Labels: stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)
Projects: No status
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
