
A modern, fully typed Python library for converting HTML to Markdown. This library is a completely rewritten fork of markdownify with a modernized codebase, strict type safety and support for Python 3.9+.
pip install html-to-markdown
Convert HTML to Markdown with a single function call:
from html_to_markdown import convert_to_markdown
html = """
<article>
<h1>Welcome</h1>
<p>This is a <strong>sample</strong> with a <a href="https://example.com">link</a>.</p>
<ul>
<li>Item 1</li>
<li>Item 2</li>
</ul>
</article>
"""
markdown = convert_to_markdown(html)
print(markdown)
Output:
# Welcome
This is a **sample** with a [link](https://example.com).
* Item 1
* Item 2
If you need more control over HTML parsing, you can pass a pre-configured BeautifulSoup instance:
from bs4 import BeautifulSoup
from html_to_markdown import convert_to_markdown
# Configure BeautifulSoup with your preferred parser
soup = BeautifulSoup(html, "lxml") # Note: lxml requires additional installation
markdown = convert_to_markdown(soup)
The library offers extensive customization through various options:
from html_to_markdown import convert_to_markdown
html = "<div>Your content here...</div>"
markdown = convert_to_markdown(
    html,
    heading_style="atx",  # Use # style headers
    strong_em_symbol="*",  # Use * for bold/italic
    bullets="*+-",  # Define bullet point characters
    wrap=True,  # Enable text wrapping
    wrap_width=100,  # Set wrap width
    escape_asterisks=True,  # Escape * characters
    code_language="python",  # Default code block language
)
You can provide your own conversion functions for specific HTML tags:
from bs4.element import Tag
from html_to_markdown import convert_to_markdown
# Define a custom converter for the <b> tag
def custom_bold_converter(*, tag: Tag, text: str, **kwargs) -> str:
    return f"IMPORTANT: {text}"
html = "<p>This is a <b>bold statement</b>.</p>"
markdown = convert_to_markdown(html, custom_converters={"b": custom_bold_converter})
print(markdown)
# Output: This is a IMPORTANT: bold statement.
Custom converters take precedence over the built-in converters and can be used alongside other configuration options.
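Because a converter receives the tag itself, it can also read the tag's attributes. The following is a minimal sketch, assuming the keyword-only signature shown above. The name `abbr_converter` is hypothetical, and the tag is only assumed to offer a `get` method, as BeautifulSoup's `Tag` does.

```python
from typing import Any


def abbr_converter(*, tag: Any, text: str, **kwargs: Any) -> str:
    """Render <abbr title="..."> as 'text (title)'. Hypothetical helper."""
    # Tag.get returns None when the attribute is missing.
    title = tag.get("title")
    return f"{text} ({title})" if title else text
```

It would be registered the same way as the example above, for instance `custom_converters={"abbr": abbr_converter}`.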
Option | Type | Default | Description |
---|---|---|---|
autolinks | bool | True | Auto-convert URLs to Markdown links |
bullets | str | '*+-' | Characters to use for bullet points |
code_language | str | '' | Default language for code blocks |
heading_style | str | 'underlined' | Header style ('underlined', 'atx', 'atx_closed') |
escape_asterisks | bool | True | Escape * characters |
escape_underscores | bool | True | Escape _ characters |
wrap | bool | False | Enable text wrapping |
wrap_width | int | 80 | Text wrap width |
For a complete list of options, see the Configuration section below.
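When the same options are used throughout a project, one possible approach is to keep them in a single dictionary and unpack it at each call. This is only a sketch: `HOUSE_STYLE`, `merged_options`, and `convert_with_house_style` are hypothetical names, not part of the library. The option keys are taken from the table above.

```python
from typing import Any

# Project-wide defaults; every key is an option listed in the table above.
HOUSE_STYLE: dict[str, Any] = {
    "heading_style": "atx",
    "wrap": True,
    "wrap_width": 100,
}


def merged_options(**overrides: Any) -> dict[str, Any]:
    """Return HOUSE_STYLE with per-call overrides applied (hypothetical helper)."""
    return {**HOUSE_STYLE, **overrides}


def convert_with_house_style(html: str, **overrides: Any) -> str:
    """Convert HTML using the shared defaults (hypothetical wrapper)."""
    # Imported lazily so this module still loads where the package is absent.
    from html_to_markdown import convert_to_markdown

    return convert_to_markdown(html, **merged_options(**overrides))
```

A call such as `convert_with_house_style(html, wrap_width=72)` would then override only the wrap width and keep the other defaults.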
Convert HTML files directly from the command line:
# Convert a file
html_to_markdown input.html > output.md
# Process stdin
cat input.html | html_to_markdown > output.md
# Use custom options
html_to_markdown --heading-style atx --wrap --wrap-width 100 input.html > output.md
View all available options:
html_to_markdown --help
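For scripts that need to drive the command-line tool, the invocation can be assembled in Python. This is a rough sketch that uses only the flags shown above (`--heading-style`, `--wrap`, `--wrap-width`). `build_command` and `convert_file` are hypothetical helpers, and the sketch assumes the `html_to_markdown` executable is on `PATH`.

```python
import subprocess
from typing import Optional


def build_command(
    path: str,
    heading_style: Optional[str] = None,
    wrap: bool = False,
    wrap_width: Optional[int] = None,
) -> list[str]:
    """Build the argument list for the CLI (hypothetical helper)."""
    cmd = ["html_to_markdown"]
    if heading_style is not None:
        cmd += ["--heading-style", heading_style]
    if wrap:
        cmd.append("--wrap")
    if wrap_width is not None:
        cmd += ["--wrap-width", str(wrap_width)]
    cmd.append(path)
    return cmd


def convert_file(path: str, **options) -> str:
    """Run the CLI on a file and return the Markdown written to stdout."""
    result = subprocess.run(
        build_command(path, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.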
For existing projects using Markdownify, a compatibility layer is provided:
# Old code
from markdownify import markdownify as md
# New code - works the same way
from html_to_markdown import markdownify as md
The markdownify function is an alias for convert_to_markdown and provides identical functionality.
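During a gradual migration, one option is to prefer the new package and fall back to the old one when it is not installed. This is a sketch, not a recommendation from the library. It assumes both packages expose a callable named `markdownify`, as the imports above show, and output may still differ between the two in places.

```python
try:
    # Preferred: the compatibility alias from html-to-markdown.
    from html_to_markdown import markdownify as md
except ImportError:
    try:
        # Fallback: the original markdownify package.
        from markdownify import markdownify as md
    except ImportError:
        md = None  # Neither package is installed.

if md is not None:
    print(md("<p>Hello, <strong>world</strong>.</p>"))
```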
Full list of configuration options:
autolinks: Convert valid URLs to Markdown links automatically
bullets: Characters to use for bullet points in lists
code_language: Default language for fenced code blocks
code_language_callback: Function to determine code block language
convert: List of HTML tags to convert (None = all supported tags)
default_title: Use default titles for elements like links
escape_asterisks: Escape * characters
escape_misc: Escape miscellaneous Markdown characters
escape_underscores: Escape _ characters
heading_style: Header style (underlined/atx/atx_closed)
keep_inline_images_in: Tags where inline images should be kept
newline_style: Style for handling newlines (spaces/backslash)
strip: Tags to remove from output
strong_em_symbol: Symbol for strong/emphasized text (* or _)
sub_symbol: Symbol for subscript text
sup_symbol: Symbol for superscript text
wrap: Enable text wrapping
wrap_width: Width for text wrapping
convert_as_inline: Treat content as inline elements
custom_converters: A mapping of HTML tag names to custom converter functions
This library is open to contributions. Feel free to open issues or submit PRs. It's better to discuss an issue before submitting a PR to avoid disappointment.
Clone the repo
Install the system dependencies
Install the full dependencies with uv sync
Install the pre-commit hooks with:
pre-commit install && pre-commit install --hook-type commit-msg
Make your changes and submit a PR
This library uses the MIT license.
Special thanks to the original markdownify project creators and contributors.