feat: add Feishu/Lark image support using doc2markdown#6511

Open
bittoby wants to merge 17 commits into labring:main from bittoby:feat/feishu-image-support

@bittoby

@bittoby bittoby commented Mar 5, 2026

Closes: #5998

Summary

When syncing Feishu/Lark documents into a knowledge base, images were lost because the old code used the /raw_content API which only returns plain text. This PR adds doc2markdown to fetch documents via the Block API, download embedded images, and upload them to S3.

What changed

  • Image support for Feishu/Lark docs: Documents are now converted to markdown with images preserved. Images are downloaded from Feishu and stored in MinIO/S3.
  • No duplicate images: S3 filenames are based on the image's resource token, so re-importing the same document won't create duplicates.
  • Proper cleanup on deletion: Deleting a Feishu collection from a dataset now removes its images from S3.
test.webm
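
For illustration, the deduplication described above could be sketched as follows. This is not the PR's actual code: `buildImageKey`, the prefix layout, and the MIME-to-extension table are assumptions made for this example. The point is only that a key derived from the image's resource token is deterministic, so uploading the same image again targets the same object instead of creating a new one.

```typescript
// Hypothetical helper, not taken from the PR.
// Maps a few common image MIME types to file extensions (assumed list).
const EXTENSION_BY_MIME: Record<string, string> = {
  'image/png': 'png',
  'image/jpeg': 'jpg',
  'image/gif': 'gif',
  'image/webp': 'webp'
};

// Builds a deterministic object key from an image resource token.
// The same (prefix, token, mime) input always yields the same key.
function buildImageKey(params: {
  prefix: string; // assumed per-dataset prefix supplied by the caller
  imageToken: string; // resource token of the embedded image
  mime: string; // MIME type reported for the downloaded image
}): string {
  const { prefix, imageToken, mime } = params;
  // Keep only characters that are safe inside an object key.
  const safeToken = imageToken.replace(/[^A-Za-z0-9_-]/g, '_');
  // Fall back to a generic extension for unknown MIME types.
  const ext = EXTENSION_BY_MIME[mime] || 'bin';
  // Drop trailing slashes so the key never contains "//".
  const normalizedPrefix = prefix.replace(/\/+$/, '');
  return `${normalizedPrefix}/${safeToken}.${ext}`;
}
```

Because the key does not depend on upload time or a random ID, a re-import overwrites the existing object rather than adding a duplicate.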

Files changed

  • packages/service/core/dataset/apiDataset/feishuDataset/api.ts - main implementation
  • packages/service/core/dataset/collection/controller.ts - S3 cleanup on collection delete
  • packages/service/package.json - added doc2markdown dependency
  • .prettierignore - ignore deploy/ (Docker volume permission issues)

How to test

  • Import a Feishu/Lark doc with images and check that images show up in chunks
  • Import the same doc again - no new duplicate images in MinIO
  • Delete the collection - images should be removed from MinIO
  • Delete the whole dataset - everything cleaned up

@cla-assistant

cla-assistant bot commented Mar 5, 2026

CLA assistant check
All committers have signed the CLA.

@github-actions

github-actions bot commented Mar 5, 2026

Preview sandbox Image:

registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-pr:fatsgpt_sandbox_68702d4261a6dd3b0d98444ad73f77db5c3ff3fa

@github-actions

github-actions bot commented Mar 5, 2026

Preview mcp_server Image:

registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-pr:fatsgpt_mcp_server_68702d4261a6dd3b0d98444ad73f77db5c3ff3fa

@bittoby bittoby force-pushed the feat/feishu-image-support branch from 37c53de to f1f8dfd March 5, 2026 21:06
@github-actions

github-actions bot commented Mar 5, 2026

Preview fastgpt Image:

registry.cn-hangzhou.aliyuncs.com/fastgpt/fastgpt-pr:fatsgpt_68702d4261a6dd3b0d98444ad73f77db5c3ff3fa

…eishu-image-support

# Conflicts:
#	packages/service/package.json
#	pnpm-lock.yaml
@github-actions

github-actions bot commented Mar 6, 2026

Docs Preview:


🚀 FastGPT Document Preview Ready!

🔗 👀 Click here to visit preview

@bittoby
Author

bittoby commented Mar 6, 2026

@c121914yu @FinleyGe could you please review this PR? I'd appreciate your feedback.

const feishuBaseUrl = process.env.FEISHU_BASE_URL || 'https://open.feishu.cn';
const logger = getLogger(LogCategories.MODULE.DATASET.API_DATASET);

const uploadLocalFileToS3 = async ({
Collaborator

In the S3 code, uploads and prefix lookups must go through the S3 instance of the corresponding module. Defining a prefix ad hoc is not allowed.
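
One way to read this requirement, as a rough sketch only: each module owns an object that is the single place where its prefix is defined and where keys are built. The class and method names below (`ModuleS3Source`, `createKey`, `ownsKey`) are hypothetical and are not FastGPT's actual API.

```typescript
// Hypothetical sketch: names are illustrative, not FastGPT's real S3 classes.
class ModuleS3Source {
  // The prefix is private, so callers cannot assemble keys with their own prefix.
  private readonly prefix: string;

  constructor(moduleName: string) {
    this.prefix = `${moduleName}/`;
  }

  // The only way to obtain an object key for this module.
  createKey(ownerId: string, filename: string): string {
    return `${this.prefix}${ownerId}/${filename}`;
  }

  // Lets cleanup code check that a key belongs to this module before deleting it.
  ownsKey(key: string): boolean {
    return key.indexOf(this.prefix) === 0;
  }
}
```

With this shape, feature code such as the Feishu importer asks the module's instance for a key instead of concatenating a prefix string itself.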

@c121914yu
Collaborator

This design scheme is not appropriate for image processing. The image processing methods of the knowledge base should be fully reused.

@bittoby
Author

bittoby commented Mar 6, 2026

Thanks for your feedback. I will reconsider the approach.

@bittoby
Author

bittoby commented Mar 6, 2026

@c121914yu I updated the PR to use the existing image processing pipeline. I tested it again and confirmed that it works.
I'd appreciate another review.

@bittoby bittoby requested a review from c121914yu March 6, 2026 16:39
@bittoby
Author

bittoby commented Mar 13, 2026

@c121914yu @FinleyGe Please review this. All tests pass and the Feishu/Lark image support works. I attached a test video to the PR description. I'd appreciate another review.

bittoby added 2 commits March 16, 2026 10:03
…ve baseUrl

- Upgrade doc2markdown to ^1.3.2 which supports baseUrl natively, removing
  all monkey-patching of internal methods (199 → 82 lines)
- Extract shared uploadMdImagesToS3 helper to eliminate duplicate image
  upload logic between read.ts and file/read/utils.ts
- Add deploy/ to .prettierignore to fix permission denied errors
@bittoby bittoby requested a review from AntiMoron March 16, 2026 14:01
@bittoby
Author

bittoby commented Mar 16, 2026

@AntiMoron Thanks for your feedback. I have addressed all of your comments. Please review again.

@bittoby
Author

bittoby commented Mar 16, 2026

@c121914yu Hope you had a great weekend!
Please review this PR. I would appreciate it if it could be merged.
Thank you.

@AntiMoron AntiMoron left a comment

LGTM

@bittoby
Author

bittoby commented Mar 18, 2026

@c121914yu I'd appreciate your feedback.

@bittoby
Author

bittoby commented Mar 24, 2026

@c121914yu I submitted this PR a while ago. What else do I need to update?

@c121914yu
Collaborator

> @c121914yu I submitted this PR a while ago. What else do I need to update?

We have received your message. We will merge this PR at an appropriate time. You don't need to keep resolving PR conflicts.

Development

Successfully merging this pull request may close these issues.

Suggestion: integrate feishu2markdown so that images in Feishu documents are handled properly
