Skip to content

docs: Add CRW document loader integration#3353

Open
Recep S (us) wants to merge 2 commits intolangchain-ai:mainfrom
us:feat/add-crw-integration
Open

docs: Add CRW document loader integration#3353
Recep S (us) wants to merge 2 commits intolangchain-ai:mainfrom
us:feat/add-crw-integration

Conversation

@us
Copy link
Copy Markdown

Summary

Adds documentation for CRW, an open-source web scraper built for AI agents, as a LangChain document loader integration.

Files added:

  • src/oss/python/integrations/providers/crw.mdx — Provider page with installation and setup
  • src/oss/python/integrations/document_loaders/crw.mdx — Document loader page with usage examples

PyPI package: langchain-crw

What CRW provides

  • CrwLoader — Document loader with three modes: scrape, crawl, and map
  • Self-hosted (free, no API key) or managed cloud (fastcrw.com)
  • High-performance Rust-based scraper with JS rendering support
  • Firecrawl-compatible API — includes migration guide from FireCrawlLoader

Documentation follows existing patterns

Modeled after the existing FireCrawl integration docs, including:

  • Overview tables (integration details + loader features)
  • Setup instructions (self-hosted + cloud)
  • Usage examples for all three modes (scrape, crawl, map)
  • RAG pipeline example
  • Migration guide from FireCrawlLoader

Copilot AI review requested due to automatic review settings March 27, 2026 20:50
@github-actions github-actions Bot added external User is not a member of langchain-ai langchain For docs changes to LangChain oss python For content related to the Python version of LangChain projects labels Mar 27, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds LangChain Python documentation for the CRW web-scraper integration, including a provider page and a document loader page with setup, usage patterns, and migration guidance.

Changes:

  • Added CRW provider documentation (installation + self-hosted/cloud setup).
  • Added CRW document loader documentation (modes, examples, RAG pipeline, FireCrawl migration).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.

File Description
src/oss/python/integrations/providers/crw.mdx New provider page documenting CRW installation and configuration plus a link to loader docs.
src/oss/python/integrations/document_loaders/crw.mdx New document loader page documenting CrwLoader usage, modes, options, RAG example, and migration from FireCrawl.


## Document loader

See a [usage example](/oss/integrations/document_loaders/crw).
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The link path looks inconsistent with this doc’s location under /oss/python/... and may 404 (it omits /python). Update the URL to match the site’s routing for Python integrations (e.g., include the /python segment if that’s the established pattern in this repo).

Suggested change
See a [usage example](/oss/integrations/document_loaders/crw).
See a [usage example](/oss/python/integrations/document_loaders/crw).

Copilot uses AI. Check for mistakes.
description: "Integrate with the CRW document loader using LangChain Python."
---

[CRW](https://github.com/us/crw) is an open-source, high-performance web scraper built for AI agents. Written in Rust, it crawls and converts any website into clean markdown with JS rendering support. Self-hosted (free) or managed cloud at [fastcrw.com](https://fastcrw.com).
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The page text states CRW has JS rendering support and the examples include render_js, but the “JS support” column is set to ❌. Either update the table to ✅ or adjust the description/examples to avoid contradicting the feature matrix.

Copilot uses AI. Check for mistakes.

| Class | Package | Local | Serializable | JS support |
| :--- | :--- | :---: | :---: | :---: |
| CrwLoader | [langchain-crw](https://pypi.org/project/langchain-crw/) | ✅ | ❌ | ❌ |
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The page text states CRW has JS rendering support and the examples include render_js, but the “JS support” column is set to ❌. Either update the table to ✅ or adjust the description/examples to avoid contradicting the feature matrix.

Suggested change
| CrwLoader | [langchain-crw](https://pypi.org/project/langchain-crw/) ||| |
| CrwLoader | [langchain-crw](https://pypi.org/project/langchain-crw/) ||| |

Copilot uses AI. Check for mistakes.
Comment on lines +62 to +68
pages = []
for doc in loader.lazy_load():
pages.append(doc)
if len(pages) >= 10:
# do some paged operation, e.g.
# index.upsert(pages)
pages = []
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This snippet references loader but doesn’t define it in the same code block, so it fails if copied as-is. Include the CrwLoader(...) initialization in this block or explicitly note that it continues from the prior example.

Copilot uses AI. Check for mistakes.
Run CRW locally — no API key, no account, no limits:

```bash
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | bash
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Piping a remote script directly into bash is unsafe (users can’t easily verify what they’re executing). Prefer documenting a safer flow (download to a file, inspect, then execute) or link to CRW’s official installation instructions that provide verification steps.

Suggested change
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh | bash
# Download the installer script
curl -fsSL https://raw.githubusercontent.com/us/crw/main/install.sh -o crw_install.sh
# (Optional) Inspect the script before running it
cat crw_install.sh
# Run the installer
bash crw_install.sh
# Start CRW

Copilot uses AI. Check for mistakes.
---

>[CRW](https://github.com/us/crw) is an open-source, high-performance web scraper built for AI agents.
> Written in Rust, it crawls and converts any website into clean markdown.
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trailing whitespace at end of line. Remove it to avoid noisy diffs and markdown lint issues.

Suggested change
> Written in Rust, it crawls and converts any website into clean markdown.
> Written in Rust, it crawls and converts any website into clean markdown.

Copilot uses AI. Check for mistakes.
@mdrxy Mason Daugherty (mdrxy) added the integration For docs updates for LangChain integrations label Apr 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external User is not a member of langchain-ai integration For docs updates for LangChain integrations langchain For docs changes to LangChain oss python For content related to the Python version of LangChain projects

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants