feat: add Feishu/Lark image support using doc2markdown #6511
bittoby wants to merge 17 commits into labring:main from
Force-pushed from 37c53de to f1f8dfd
…eishu-image-support # Conflicts: # packages/service/package.json # pnpm-lock.yaml
@c121914yu @FinleyGe Could you please review this PR? I'd appreciate your feedback.
const feishuBaseUrl = process.env.FEISHU_BASE_URL || 'https://open.feishu.cn';
const logger = getLogger(LogCategories.MODULE.DATASET.API_DATASET);

const uploadLocalFileToS3 = async ({
The S3 code requires that uploads and prefix lookups go through the S3 instance of the corresponding module. Defining a prefix arbitrarily is not allowed.
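As a rough illustration of the rule in this comment, the sketch below keeps the key prefix inside a module-level wrapper so callers never build prefixes themselves. The names ModuleS3Bucket, createKey, ownsKey, and datasetImageBucket are hypothetical and are not FastGPT's actual S3 API.

```typescript
// Hypothetical sketch, not FastGPT's real S3 module: the wrapper owns its
// prefix, so every key for this module is created and checked in one place.
class ModuleS3Bucket {
  private readonly prefix: string;

  constructor(prefix: string) {
    this.prefix = prefix;
  }

  // Build an object key under this module's prefix.
  // Characters outside [A-Za-z0-9_.-] in the filename become underscores.
  createKey(datasetId: string, filename: string): string {
    const safeName = filename.replace(/[^\w.-]/g, '_');
    return `${this.prefix}/${datasetId}/${safeName}`;
  }

  // Check that a key belongs to this module before reading or deleting it.
  ownsKey(key: string): boolean {
    return key.startsWith(`${this.prefix}/`);
  }
}

// One shared instance per module (the 'dataset' prefix is an assumption).
const datasetImageBucket = new ModuleS3Bucket('dataset');
```

With this shape, a helper such as uploadLocalFileToS3 would ask the module instance for a key instead of assembling its own prefix string.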
This design is not appropriate for image processing. The knowledge base's existing image processing methods should be fully reused.
Thanks for your feedback. I will reconsider the approach.
@c121914yu I updated the PR to use the existing image processing pipeline. I tested it again and confirmed it works.
@c121914yu @FinleyGe Please review this. All tests passed and the
packages/service/core/dataset/apiDataset/feishuDataset/feishuDocToMarkdown.ts
…ve baseUrl
- Upgrade doc2markdown to ^1.3.2, which supports baseUrl natively, removing all monkey-patching of internal methods (199 → 82 lines)
- Extract shared uploadMdImagesToS3 helper to eliminate duplicate image upload logic between read.ts and file/read/utils.ts
- Add deploy/ to .prettierignore to fix permission denied errors
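The commit above mentions a shared uploadMdImagesToS3 helper, but its code is not shown in this conversation. The following is only a sketch of the general technique such a helper might use: find the image links in the Markdown, upload each distinct image once, and rewrite the links. The names UploadImage, collectImageUrls, replaceImageUrls, and rewriteMdImages are hypothetical, and the uploader is injected rather than tied to any real S3 client.

```typescript
// Hypothetical uploader: receives an image URL, returns where the copy now lives.
type UploadImage = (sourceUrl: string) => Promise<string>;

// Matches Markdown images such as ![alt](url) or ![alt](url "title").
const MD_IMAGE = /!\[([^\]]*)\]\(([^)\s]+)([^)]*)\)/g;

// Collect each distinct image URL in document order.
function collectImageUrls(markdown: string): string[] {
  const urls: string[] = [];
  const finder = new RegExp(MD_IMAGE.source, 'g'); // fresh regex, no shared lastIndex
  let match: RegExpExecArray | null;
  while ((match = finder.exec(markdown)) !== null) {
    if (urls.indexOf(match[2]) === -1) urls.push(match[2]);
  }
  return urls;
}

// Swap every image URL that has a stored replacement; leave the rest untouched.
function replaceImageUrls(markdown: string, stored: Map<string, string>): string {
  return markdown.replace(
    MD_IMAGE,
    (full: string, alt: string, url: string, rest: string) => {
      const next = stored.get(url);
      return next ? `![${alt}](${next}${rest})` : full;
    }
  );
}

// Upload each image once, then rewrite the Markdown.
async function rewriteMdImages(markdown: string, upload: UploadImage): Promise<string> {
  const stored = new Map<string, string>();
  await Promise.all(
    collectImageUrls(markdown).map(async (url) => {
      try {
        stored.set(url, await upload(url));
      } catch {
        // Assumption: a failed upload keeps the original link instead of
        // failing the whole document sync.
      }
    })
  );
  return replaceImageUrls(markdown, stored);
}
```

Keeping the collect and replace steps as pure functions makes it possible for both callers (read.ts and file/read/utils.ts in the commit message) to share one code path and for the rewrite logic to be tested without network access.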
@AntiMoron Thanks for your feedback. I've addressed all of it. Please review again.
@c121914yu Hope you had a great weekend!
@c121914yu I'd appreciate your feedback.
@c121914yu I submitted this PR a while ago. Is there anything else I need to update?
We have received your message. We will merge this PR at an appropriate time. You don't need to keep resolving PR conflicts.
Closes: #5998
Summary
When syncing Feishu/Lark documents into a knowledge base, images were lost because the old code used the /raw_content API, which only returns plain text. This PR adds doc2markdown to fetch documents via the Block API, download embedded images, and upload them to S3.
What changed
test.webm
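To make the Summary's contrast between /raw_content and the Block API more concrete, here is a small sketch. It is not doc2markdown's API, which this PR relies on for the real work. The endpoint paths and the DocBlock shape (an optional image.token field on image blocks) are assumptions based on the Feishu docx v1 API and should be checked against the official documentation; buildDocUrls and collectImageTokens are hypothetical names.

```typescript
// Assumed shape of a docx block; only the fields this sketch reads.
interface DocBlock {
  block_id: string;
  block_type: number;
  image?: { token: string };
}

// Assumed endpoint paths. The base URL is configurable, mirroring the
// FEISHU_BASE_URL fallback quoted in the review excerpt.
function buildDocUrls(baseUrl: string, documentId: string) {
  const root = baseUrl.replace(/\/+$/, ''); // drop trailing slashes
  const id = encodeURIComponent(documentId);
  return {
    // Plain text only: image references are not present in this response.
    rawContent: `${root}/open-apis/docx/v1/documents/${id}/raw_content`,
    // Structured blocks: image blocks can be found and downloaded.
    blocks: `${root}/open-apis/docx/v1/documents/${id}/blocks`
  };
}

// Collect the distinct image tokens found in a list of blocks.
function collectImageTokens(blocks: DocBlock[]): string[] {
  const tokens: string[] = [];
  for (const block of blocks) {
    const token = block.image?.token;
    if (token && tokens.indexOf(token) === -1) tokens.push(token);
  }
  return tokens;
}
```

Each collected token would then be downloaded and passed through the knowledge base's existing image pipeline, as the reviewers requested.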
Files changed
- packages/service/core/dataset/apiDataset/feishuDataset/api.ts - main implementation
- packages/service/core/dataset/collection/controller.ts - S3 cleanup on collection delete
- packages/service/package.json - added doc2markdown dependency
- .prettierignore - ignore deploy/ (Docker volume permission issues)
How to test