gh-102140: False neg csv header bug fix #102787

Closed

Drakariboo

gh-102140: We've improved the heuristic of the has_header() method in Lib/csv.py. We wanted to respect the methodology the function was originally built on, even though its deciding factor, string length, is not a meaningful signal. We added more checks before a column is discarded as inconsistent:

similarityWords is a dictionary in which we store, per column, the number of repeated words.

compareWords is a list in which we store every word of an element, extracted with a regex. We compare two such lists, representing for example element_line1 and element_line2, and increment similarityWords whenever the same values appear in both.
This way we keep the existing methodology of comparing each element of every row in a column.

We compute the average of the string lengths in a column and compare it to the header length, so the length check stays consistent.

We check whether the header is a single word.

All of these checks are combined in the vote at the end of the function.
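
For readers who want to see the idea in code, here is a minimal sketch of the three checks described above. It is not the code from this PR and not the stdlib implementation: the names split_words, word_overlap_per_column, header_length_differs and is_single_word, as well as the tolerance parameter, are hypothetical and only illustrate the approach.

```python
import re
from collections import defaultdict
from statistics import mean

WORD_RE = re.compile(r"\w+")


def split_words(cell):
    """Return the lowercase words of one cell (hypothetical stand-in for compareWords)."""
    return WORD_RE.findall(cell.lower())


def word_overlap_per_column(rows):
    """Count, per column, how many consecutive row pairs share at least one word."""
    similarity = defaultdict(int)  # hypothetical stand-in for similarityWords
    for previous, current in zip(rows, rows[1:]):
        for col, (a, b) in enumerate(zip(previous, current)):
            if set(split_words(a)) & set(split_words(b)):
                similarity[col] += 1
    return dict(similarity)


def header_length_differs(header_cell, column_cells, tolerance=0.5):
    """True if the header length is far from the average data length.

    `tolerance` is an assumed fraction of the average, chosen for illustration.
    `column_cells` must not be empty.
    """
    average = mean(len(cell) for cell in column_cells)
    return abs(len(header_cell) - average) > tolerance * average


def is_single_word(cell):
    """True if the cell contains exactly one word."""
    return len(split_words(cell)) == 1
```

Each helper yields one signal for a column; in a heuristic like the one described here, those signals would then feed the final vote instead of deciding the result on their own.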

Where: gh-102140

Contributors : @Drakariboo & @Vanille-22

@ghost

ghost commented Mar 17, 2023

All commit authors signed the Contributor License Agreement.

@bedevere-bot

Most changes to Python require a NEWS entry.

Please add it using the blurb_it web app or the blurb command-line tool.

@Drakariboo Drakariboo closed this Mar 17, 2023