# sanitize_html

Sanitizes HTML content by removing unwanted elements, comments, overly long attributes, empty tags, and extra whitespace.
Each cleaning operation can be enabled, disabled, or tuned independently through keyword-only options.

```python theme={null}
def sanitize_html(
    html: str,
    *,
    remove_scripts: bool = True,
    remove_styles: bool = True,
    remove_svgs: bool = True,
    remove_comments: bool = True,
    remove_long_attributes: bool = True,
    max_attribute_length: int = 500,
    preserve_attributes: list[str] | None = None,
    remove_empty_tags: bool = True,
    preserve_empty_tags: list[str] | None = None,
    minify_whitespace: bool = True,
) -> str
```
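To make the element, comment, and attribute options concrete, here is a minimal sketch built only on Python's standard-library `html.parser`. It is not Intuned's implementation: `SketchSanitizer` and `sketch_sanitize` are hypothetical names, and the sketch assumes that `max_attribute_length` is compared against the length of an attribute's value and that names in `preserve_attributes` are kept regardless of length. Empty-tag removal and whitespace minification are left out.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical sketch; not the intuned_browser implementation.
# Void elements have no closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Raw-text elements whose content is copied back without HTML escaping.
RAW_TEXT_TAGS = {"script", "style"}


class SketchSanitizer(HTMLParser):
    def __init__(self, drop_tags, remove_comments, max_attribute_length, preserve_attributes):
        super().__init__(convert_charrefs=True)
        self.drop_tags = set(drop_tags)
        self.remove_comments = remove_comments
        self.max_attribute_length = max_attribute_length
        self.preserve_attributes = set(preserve_attributes)
        self.skip_depth = 0  # > 0 while inside an element being dropped
        self.in_raw_text = False
        self.out = []

    def _render_tag(self, tag, attrs, self_closing=False):
        kept = []
        for name, value in attrs:
            value = value or ""  # boolean attributes arrive with value None
            # Assumption: preserved names survive regardless of length.
            if name in self.preserve_attributes or len(value) <= self.max_attribute_length:
                kept.append(f' {name}="{escape(value)}"')
        return f"<{tag}{''.join(kept)}{' /' if self_closing else ''}>"

    def handle_starttag(self, tag, attrs):
        if self.skip_depth or tag in self.drop_tags:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
            return
        self.in_raw_text = tag in RAW_TEXT_TAGS
        self.out.append(self._render_tag(tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <path/> never change the nesting depth.
        if not (self.skip_depth or tag in self.drop_tags):
            self.out.append(self._render_tag(tag, attrs, self_closing=True))

    def handle_endtag(self, tag):
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self.in_raw_text = False
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data if self.in_raw_text else escape(data, quote=False))

    def handle_comment(self, data):
        if not (self.skip_depth or self.remove_comments):
            self.out.append(f"<!--{data}-->")


def sketch_sanitize(html, *, remove_scripts=True, remove_styles=True, remove_svgs=True,
                    remove_comments=True, remove_long_attributes=True,
                    max_attribute_length=500, preserve_attributes=None):
    drop = {tag for tag, enabled in (("script", remove_scripts), ("style", remove_styles),
                                     ("svg", remove_svgs)) if enabled}
    # With remove_long_attributes off, no attribute is ever too long.
    limit = max_attribute_length if remove_long_attributes else float("inf")
    preserve = ["class", "src"] if preserve_attributes is None else preserve_attributes
    parser = SketchSanitizer(drop, remove_comments, limit, preserve)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `sketch_sanitize('<div class="card"><script>x()</script><p>Hi</p></div>')` returns `'<div class="card"><p>Hi</p></div>'`. A streaming parser keeps the sketch short; a tree-based parser such as BeautifulSoup or lxml copes better with malformed markup.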

## Examples

<CodeGroup>
  ```python Basic Sanitization theme={null}
  from typing import TypedDict
  from playwright.async_api import Page
  from intuned_browser import sanitize_html
  class Params(TypedDict):
      pass
  async def automation(page: Page, params: Params, **_kwargs):
      await page.goto("https://books.toscrape.com")
      first_book = page.locator("ol.row").locator("li").first
      # Get the inner HTML of the first book listed on the page.
      html = await first_book.inner_html()
      # Sanitize with the default options.
      sanitized_html = sanitize_html(html)
      # Log and return the sanitized HTML.
      print(sanitized_html)
      return sanitized_html
  ```
</CodeGroup>

## Arguments

<ResponseField name="html" type="str" required>
  The HTML content to sanitize.
</ResponseField>

<ResponseField name="remove_scripts" type="bool">
  Remove all `<script>` elements. Defaults to True.
</ResponseField>

<ResponseField name="remove_styles" type="bool">
  Remove all `<style>` elements. Defaults to True.
</ResponseField>

<ResponseField name="remove_svgs" type="bool">
  Remove all `<svg>` elements. Defaults to True.
</ResponseField>

<ResponseField name="remove_comments" type="bool">
  Remove HTML comments. Defaults to True.
</ResponseField>

<ResponseField name="remove_long_attributes" type="bool">
  Remove attributes longer than `max_attribute_length` characters, except those listed in `preserve_attributes`. Defaults to True.
</ResponseField>

<ResponseField name="max_attribute_length" type="int">
  Maximum attribute length, in characters, before the attribute is removed. Only applies when `remove_long_attributes` is True. Defaults to 500.
</ResponseField>

<ResponseField name="preserve_attributes" type="list[str]">
  List of attribute names to always preserve. Defaults to `["class", "src"]`.
</ResponseField>

<ResponseField name="remove_empty_tags" type="bool">
  Remove empty tags, except those listed in `preserve_empty_tags`. Defaults to True.
</ResponseField>

<ResponseField name="preserve_empty_tags" type="list[str]">
  List of tag names to preserve even when empty. Defaults to `["img"]`.
</ResponseField>

<ResponseField name="minify_whitespace" type="bool">
  Remove extra whitespace between tags and empty lines. Defaults to True.
</ResponseField>
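`remove_empty_tags`, `preserve_empty_tags`, and `minify_whitespace` act on markup whose structure is already known. The sketch below approximates them with regular expressions under stated assumptions: a tag counts as empty when it contains only whitespace, void elements such as `<br>` and `<img>` always count as empty (which would explain why `img` is preserved by default), parents emptied by removing their children are removed on later passes, and whitespace between tags is dropped entirely. `sketch_postprocess` is a hypothetical helper, not part of `intuned_browser`, and regular expressions are not a robust way to process arbitrary HTML.

```python
import re

# Hypothetical sketch; not the intuned_browser implementation.
# Assumption: void elements never contain anything, so they always count as empty.
VOID_TAGS = ("area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr")
VOID_PATTERN = re.compile(r"<(" + "|".join(VOID_TAGS) + r")\b[^>]*>", re.IGNORECASE)
# An opening tag followed only by whitespace and its matching closing tag.
EMPTY_PAIR_PATTERN = re.compile(r"<([a-zA-Z][\w-]*)\b[^>]*>\s*</\1\s*>", re.IGNORECASE)


def sketch_postprocess(html, *, remove_empty_tags=True, preserve_empty_tags=None,
                       minify_whitespace=True):
    names = ["img"] if preserve_empty_tags is None else preserve_empty_tags
    preserve = {name.lower() for name in names}

    def drop_unless_preserved(match):
        # Leave the match untouched when its tag name is preserved.
        return match.group(0) if match.group(1).lower() in preserve else ""

    if remove_empty_tags:
        html = VOID_PATTERN.sub(drop_unless_preserved, html)
        # Repeat until stable so parents left empty by removed children also go.
        while True:
            reduced = EMPTY_PAIR_PATTERN.sub(drop_unless_preserved, html)
            if reduced == html:
                break
            html = reduced

    if minify_whitespace:
        html = re.sub(r">\s+<", "><", html)    # whitespace between tags
        html = re.sub(r"\n\s*\n", "\n", html)  # blank lines inside text
        html = html.strip()
    return html
```

Dropping all whitespace between tags can join words that were separated only by a space between inline elements (for example, `<b>a</b> <i>b</i>` becomes `<b>a</b><i>b</i>`), so a production implementation would likely be more selective.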

## Returns: `str`

The sanitized HTML as a string.
