# Crawl websites

## Recipe

This recipe shows how to crawl websites and extract content as markdown using [Crawl4AI](https://crawl4ai.com/) with Intuned's browser infrastructure.

## Project structure

```plaintext theme={null}
api/
  └── simple.py          # Simple crawling example
hooks/
  └── setup_context.py   # Browser context setup
utils/
  └── config.py          # Browser configuration
pyproject.toml           # Dependencies
```

## Setup

### pyproject.toml

```toml theme={null}
[build-system]
requires = ["hatchling>=1.18.0"]
build-backend = "hatchling.build"

[project]
name = "default"
version = "0.0.1"
description = "Empty Intuned project"
readme = "README.md"
requires-python = ">=3.12,<3.13"
authors = [
  { name = "Intuned", email = "service@intunedhq.com" }
]
keywords = ["Python", "intuned-browser-sdk"]

dependencies = [
  "playwright==1.55.0",
  "intuned-runtime==1.3.10",
  "intuned-browser==0.1.9",
  "crawl4ai==0.8.6",
]


[tool.uv]
package = false
```

### hooks/setup\_context.py

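Intuned calls the `setup_context` hook with the Chrome DevTools Protocol (CDP) URL of the browser it provisioned. Save that URL in `attempt_store` so Crawl4AI can attach to the same browser instead of launching its own: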

```python theme={null}
from intuned_runtime import attempt_store


async def setup_context(*, api_name: str, api_parameters: str, cdp_url: str):
    attempt_store.set("cdp_url", cdp_url)
```

### utils/config.py

Build a Crawl4AI `BrowserConfig` that reads the stored CDP URL and connects to Intuned's browser:

```python theme={null}
from crawl4ai import BrowserConfig
from intuned_runtime import attempt_store


def get_browser_config() -> BrowserConfig:
    # Written by hooks/setup_context.py before the API runs.
    cdp_url = attempt_store.get("cdp_url")

    return BrowserConfig(
        verbose=True,  # log crawler activity
        cdp_url=cdp_url,  # attach to Intuned's browser over CDP
        headless=False,
        accept_downloads=True,
    )
```
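`get_browser_config` passes whatever the store returns straight into `BrowserConfig`. Assuming `attempt_store.get` returns `None` for a missing key (for example, when the function runs outside an Intuned attempt or the hook did not run), a small guard gives a clearer error. `require_cdp_url` is a hypothetical helper; it takes the lookup function as an argument so it can be exercised with a plain `dict`.

```python
from collections.abc import Callable
from typing import Any


def require_cdp_url(lookup: Callable[[str], Any]) -> str:
    """Fetch the CDP URL via lookup (e.g. attempt_store.get) or fail with a clear message."""
    cdp_url = lookup("cdp_url")
    if not isinstance(cdp_url, str) or not cdp_url:
        # Fail early with an actionable error rather than passing None into BrowserConfig.
        raise RuntimeError(
            "No 'cdp_url' in the attempt store; check that hooks/setup_context.py ran."
        )
    return cdp_url
```

Inside `get_browser_config`, the first line would become `cdp_url = require_cdp_url(attempt_store.get)`.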

## Crawl a single page

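Crawl a single page and return its content as markdown. `PruningContentFilter` removes low-value blocks (such as navigation and footers) using content heuristics; its output is exposed as `result.markdown.fit_markdown`, while `raw_markdown` is the unfiltered conversion: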

```python theme={null}
from playwright.async_api import Page
from typing import TypedDict
from crawl4ai import (
    AsyncWebCrawler,
    CrawlerRunConfig,
    DefaultMarkdownGenerator,
    PruningContentFilter,
    CrawlResult,
)
from utils.config import get_browser_config


class Params(TypedDict):
    pass


async def automation(page: Page, params: Params | None = None, **_kwargs):
    # Crawl4AI connects to Intuned's browser through the CDP URL, so `page` is not used here.
    browser_config = get_browser_config()
    async with AsyncWebCrawler(config=browser_config) as crawler:
        crawler_config = CrawlerRunConfig(
            # Convert the page to markdown and prune low-value blocks into `fit_markdown`.
            markdown_generator=DefaultMarkdownGenerator(
                content_filter=PruningContentFilter(),
            ),
        )
        result: CrawlResult = await crawler.arun(
            url="https://www.helloworld.org", config=crawler_config
        )
        if not result.success:
            raise RuntimeError(f"Crawl failed: {result.error_message}")
        # `raw_markdown` ignores the content filter; `fit_markdown` is the pruned version.
        return result.markdown.fit_markdown
```

## Related links

<CardGroup cols={2}>
  <Card title="Cookbook" icon="github" href="https://github.com/Intuned/cookbook/tree/main/python-examples/crawl4ai">
    Complete Crawl4AI example in the Intuned Cookbook
  </Card>
</CardGroup>
