Version: 2.0 - Unified Documentation System
Last Updated: 2025-09-06
Purpose: Optimize Playwright usage for web research and documentation extraction
The Playwright MCP server provides web automation capabilities for research, documentation extraction, and web content analysis. It is particularly valuable for extracting comprehensive documentation, research papers, and technical content from web sources.
- Current Usage: 21 tool uses (moderately utilized)
- Usage Pattern: Documentation extraction and web research
- Primary Value: Comprehensive content extraction and analysis
- Complete Documentation: Extract full documentation from web pages
- Research Papers: Download and analyze academic papers
- Technical Guides: Extract comprehensive implementation guides
- API Documentation: Get complete API references and examples
- Web Scraping: Extract structured data from websites
- Research Synthesis: Gather information from multiple sources
- Competitive Analysis: Analyze competitor implementations
- Trend Research: Extract current technology trends and patterns
- Pre-Implementation: Gather comprehensive requirements and examples
- Technology Evaluation: Extract detailed comparisons and benchmarks
- Best Practices: Collect real-world implementation patterns
- Problem Research: Extract solutions from forums and communities
- Run install once per environment boot (ensures browser availability)
- Validate configuration (credentials/paths if needed) before first navigation
- Install (once): Ensure the browser is installed/bootstrapped
- Navigate: Go to target URL
- Wait: Wait for known on-page text (e.g., a heading string) to appear
- Evaluate or Extract:
- Prefer evaluating after the wait, targeting specific selectors or headings
- If `evaluate` returns nothing, fall back to extracting page text content (not raw HTML) using the extraction step
- If DOM evaluation yields empty content, run a text-extract to capture the rendered page's main content (headings + body) rather than HTML nodes.
- Treat Playwright and Puppeteer as independent engines. Do not assume shared state.
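The navigate, wait, evaluate, and fall-back sequence above can be sketched as a small helper. This is a minimal sketch, not the Playwright MCP API: the `tools` object and its `navigate`, `waitForText`, `evaluate`, and `extractText` functions are hypothetical names, passed in so the logic can be tried against stubs before wiring it to real tool calls.

```javascript
// Hypothetical wrapper: `tools` is any object exposing async navigate, waitForText,
// evaluate, and extractText functions. These names are assumptions for this sketch.
async function extractWithFallback(tools, { url, waitText, selector }) {
  await tools.navigate(url);
  // Wait for known on-page text so evaluation runs against rendered content.
  await tools.waitForText(waitText);

  // Prefer a targeted DOM evaluation once the page has rendered.
  const evaluated = await tools.evaluate(selector);
  if (typeof evaluated === "string" && evaluated.trim() !== "") {
    return { method: "evaluate", text: evaluated.trim() };
  }

  // Fall back to rendered page text (headings + body), not raw HTML.
  const text = await tools.extractText();
  return { method: "text-extract", text: (text || "").trim() };
}
```

The returned `method` field records which path produced the text, which makes it easier to spot pages where evaluation routinely comes back empty.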
// Navigate to documentation page
browser_navigate({
url: "https://developer.android.com/guide/topics/media/mediarecorder"
})

// Extract comprehensive documentation
browser_extract({
instruction: "Extract the complete MediaRecorder API documentation including all methods, parameters, examples, and best practices"
})

// Save extracted content for analysis
browser_snapshot({
name: "android_mediarecorder_docs"
})

// Navigate to research paper
browser_navigate({
url: "https://arxiv.org/abs/2401.12345"
})
// Extract complete paper content
browser_extract({
instruction: "Extract the complete research paper including abstract, methodology, results, and conclusions"
})
// Save PDF for detailed analysis
browser_pdf_save({
name: "audio_processing_research_paper"
})

// Research complex topics from multiple sources
const researchSources = [
"https://developer.android.com/topic/architecture",
"https://medium.com/androiddevelopers/modern-android-development",
"https://proandroiddev.com/android-architecture-patterns"
];
for (const url of researchSources) {
await browser_navigate({ url });
const content = await browser_extract({
instruction: "Extract key architecture patterns, benefits, and implementation considerations"
});
await browser_snapshot({ name: `architecture_research_${url.split('/').pop()}` });
}

- Primary Usage: Documentation extraction and web content analysis
- Content Quality: High for comprehensive documentation extraction
- Research Efficiency: Excellent for gathering information from multiple sources
- Integration Rate: Well-integrated into research and documentation workflows
- Content Completeness: Achieve 95%+ complete extraction of target content
- Research Efficiency: 70%+ time savings vs. manual research
- Source Coverage: Extract from 10+ sources per complex research topic
- Content Quality: Maintain high accuracy and relevance of extracted content
- High Completeness (8-10): 90%+ of target content extracted
- Medium Completeness (5-7): 70-89% of target content extracted
- Low Completeness (1-4): Less than 70% of target content extracted
- Incomplete (0): Major content sections missing
- High Accuracy: All extracted information correct and current
- Medium Accuracy: Mostly correct with minor errors
- Low Accuracy: Significant errors or outdated information
- Unreliable: Major inaccuracies requiring manual verification
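The completeness bands listed earlier (High, Medium, Low, Incomplete) can be applied mechanically once an extraction has been measured. The following is a minimal sketch; the function name `completenessBand` and the `majorSectionsMissing` flag are assumptions introduced for illustration, and how the percentage itself is measured is left to the reviewer.

```javascript
// Map a measured extraction percentage to the completeness bands above.
// `majorSectionsMissing` is a hypothetical flag set by the reviewer.
function completenessBand(percentExtracted, majorSectionsMissing = false) {
  if (majorSectionsMissing) return { band: "Incomplete", scoreRange: [0, 0] };
  if (percentExtracted >= 90) return { band: "High", scoreRange: [8, 10] };
  if (percentExtracted >= 70) return { band: "Medium", scoreRange: [5, 7] };
  return { band: "Low", scoreRange: [1, 4] };
}
```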
// Define research requirements
const researchRequirements = {
topic: "Android real-time audio processing",
sources: ["official docs", "research papers", "implementation guides"],
depth: "comprehensive",
format: "technical implementation"
};

// Identify high-quality sources
const researchSources = [
{
url: "https://developer.android.com/guide/topics/media/mediarecorder",
type: "official documentation",
priority: "high"
},
{
url: "https://arxiv.org/search/?query=audio+processing+mobile",
type: "academic research",
priority: "high"
},
{
url: "https://proandroiddev.com/tag/audio",
type: "community implementation",
priority: "medium"
}
];

// Extract from each source, keeping every result for the synthesis step
const extractedContent = [];
for (const source of researchSources) {
await browser_navigate({ url: source.url });
const content = await browser_extract({
instruction: `Extract comprehensive ${source.type} content about ${researchRequirements.topic}, including code examples, best practices, and implementation details`
});
extractedContent.push({ source, content });
await browser_snapshot({
name: `${researchRequirements.topic}_${source.type}_${Date.now()}`
});
}

// Synthesize findings across sources
// (`content` is scoped to the loop above, so the helpers receive the collected array)
const synthesis = {
official_patterns: extractOfficialPatterns(extractedContent),
research_findings: extractResearchFindings(extractedContent),
implementation_examples: extractCodeExamples(extractedContent),
best_practices: extractBestPractices(extractedContent)
};
// Generate research summary
generateResearchSummary(synthesis);

// Research Kotlin implementation patterns
await browser_navigate({
url: "https://developer.android.com/kotlin"
});
const kotlinPatterns = await browser_extract({
instruction: "Extract Kotlin implementation patterns, migration strategies, and best practices for Android development"
});
await browser_snapshot({ name: "kotlin_migration_patterns" });

// Research ML implementation approaches
await browser_navigate({
url: "https://developers.google.com/ml-kit"
});
const mlPatterns = await browser_extract({
instruction: "Extract ML implementation patterns, model optimization techniques, and performance considerations for mobile"
});
await browser_snapshot({ name: "ml_implementation_patterns" });

// Research performance optimization techniques
await browser_navigate({
url: "https://developer.android.com/topic/performance"
});
const performanceTechniques = await browser_extract({
instruction: "Extract performance optimization techniques, benchmarking methods, and best practices for Android apps"
});
await browser_snapshot({ name: "android_performance_optimization" });

// Research complex topics across multiple pages
const comprehensiveResearch = async (topic) => {
// Find related pages
await browser_navigate({
url: `https://developer.android.com/search?q=${encodeURIComponent(topic)}`
});
const relatedPages = await browser_extract({
instruction: "Extract links to all relevant documentation pages for this topic"
});
// Extract from each related page
const allContent = [];
for (const page of relatedPages.slice(0, 5)) { // Limit to top 5
await browser_navigate({ url: page.url });
const content = await browser_extract({
instruction: `Extract comprehensive information about ${topic} from this page`
});
allContent.push(content);
}
// Synthesize comprehensive research
return synthesizeResearch(allContent);
};

// Extract and analyze research papers
const analyzeResearchPaper = async (paperUrl) => {
await browser_navigate({ url: paperUrl });
// Extract paper structure
const paperStructure = await browser_extract({
instruction: "Extract the paper's title, authors, abstract, methodology, results, and conclusions"
});
// Save for detailed analysis
await browser_pdf_save({ name: "research_paper_analysis" });
// Extract key findings
const keyFindings = await browser_extract({
instruction: "Extract the most important findings, performance metrics, and implementation insights from this research"
});
return { paperStructure, keyFindings };
};

// Validate information across multiple sources
const validateInformation = async (topic, sources) => {
const validations = [];
for (const source of sources) {
await browser_navigate({ url: source });
const information = await browser_extract({
instruction: `Extract information about ${topic} and note the source credibility and publication date`
});
validations.push({
source: source,
information: information,
credibility: assessCredibility(source),
recency: assessRecency(information)
});
}
return crossValidateInformation(validations);
};

// Extract practical implementation examples
const extractImplementationExamples = async (documentationUrl) => {
await browser_navigate({ url: documentationUrl });
const examples = await browser_extract({
instruction: "Extract all code examples, implementation patterns, and practical usage examples from this documentation"
});
// Validate examples work with current APIs
const validatedExamples = await validateExamples(examples);
return validatedExamples;
};

- Refine Instructions: Make extraction instructions more specific
- Use Multiple Queries: Break complex extractions into smaller parts
- Check Page Loading: Ensure page is fully loaded before extraction
- Handle Dynamic Content: Wait for JavaScript-generated content
- Check URLs: Verify URLs are correct and accessible
- Handle Redirects: Account for URL redirects and changes
- Network Issues: Implement retry logic for network failures
- Access Restrictions: Handle paywalls or access restrictions
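The retry logic mentioned above for network failures could look like the following. This is a minimal sketch under stated assumptions: `withRetry`, its option names, and the exponential backoff schedule are illustrative choices, not something the guide or Playwright prescribes. The `operation` argument stands in for any navigation or extraction call.

```javascript
// Default sleep helper; can be swapped out in tests.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry an async operation with exponential backoff (500 ms, 1000 ms, ...).
// All names and defaults here are assumptions for illustration.
async function withRetry(operation, { attempts = 3, baseDelayMs = 500, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      // Back off before the next attempt, but not after the final one.
      if (attempt < attempts) {
        await sleep(baseDelayMs * 2 ** (attempt - 1));
      }
    }
  }
  throw lastError;
}
```

Retries only help with transient failures. A paywall or access restriction fails the same way every time, so it is better detected and reported than retried.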
- Source Validation: Verify source credibility and recency
- Cross-Reference: Validate information against multiple sources
- Update Checks: Check for more recent versions of documentation
- Context Preservation: Ensure extracted content maintains proper context
- Batch Processing: Group related extractions to reduce overhead
- Selective Extraction: Extract only needed content to improve speed
- Caching Strategy: Cache frequently accessed content
- Parallel Processing: Extract from multiple sources simultaneously
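Caching and parallel extraction can be combined in a few lines. The sketch below is illustrative only: `createCachedExtractor` and `extractAll` are hypothetical helpers, and `extract` is any async function that returns content for a URL. It assumes each call can run independently; a single shared browser page cannot navigate to several URLs at once, so real parallelism needs separate sessions or contexts.

```javascript
// Wrap an async extract(url) function with an in-memory cache.
// Concurrent requests for the same URL share one in-flight promise.
function createCachedExtractor(extract) {
  const cache = new Map();
  return function cachedExtract(url) {
    if (!cache.has(url)) {
      const pending = extract(url).catch((error) => {
        cache.delete(url); // do not cache failures
        throw error;
      });
      cache.set(url, pending);
    }
    return cache.get(url);
  };
}

// Extract from several sources at once, skipping duplicate URLs.
async function extractAll(urls, cachedExtract) {
  const unique = [...new Set(urls)];
  const results = await Promise.all(unique.map((url) => cachedExtract(url)));
  return new Map(unique.map((url, i) => [url, results[i]]));
}
```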
- Extraction Completeness: Percentage of target content successfully extracted
- Research Efficiency: Time saved vs. manual research methods
- Source Coverage: Number of sources analyzed per research topic
- Content Quality: Accuracy and usefulness of extracted content
- Research Depth: Ability to gather comprehensive information
- Implementation Value: Practicality of extracted examples
- Source Quality: Credibility and recency of information sources
- Synthesis Quality: Effectiveness of multi-source information synthesis
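The quantitative metrics above (completeness, time saved, source coverage) can be computed from a simple log of extractions. This is a rough sketch: the record fields `completenessScore`, `sourceUrl`, `manualMinutes`, and `actualMinutes` are assumed names, and `manualMinutes` is an estimate of how long the same research would have taken by hand.

```javascript
// Summarize a list of extraction records into review metrics.
// Field names are assumptions for this sketch, not a fixed schema.
function summarizeWeek(extractions) {
  const count = extractions.length;
  const totalScore = extractions.reduce((sum, e) => sum + e.completenessScore, 0);
  const sources = new Set(extractions.map((e) => e.sourceUrl));
  const minutesSaved = extractions.reduce(
    (sum, e) => sum + (e.manualMinutes - e.actualMinutes),
    0
  );
  return {
    extractionsConducted: count,
    // Average completeness on the 0-10 scale, rounded to one decimal place.
    averageCompleteness: count === 0 ? null : Math.round((totalScore / count) * 10) / 10,
    sourcesAnalyzed: sources.size,
    hoursSaved: Math.round((minutesSaved / 60) * 10) / 10,
  };
}
```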
- Official Documentation: Highest credibility for framework-specific information
- Peer-Reviewed Papers: Highest credibility for research and technical analysis
- Industry Blogs: Medium credibility with expert authors and companies
- Community Forums: Useful for practical problems and real-world experience
- Social Media: Lowest credibility, use for trend identification only
- Current Documentation: Official docs updated within last 6 months
- Recent Research: Papers published within last 12 months
- Industry Content: Articles from last 3-6 months
- Community Content: Recent discussions and solutions
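The credibility ordering and recency windows above can be encoded as lookup tables. The sketch below is one possible reading, not a fixed rule: the numeric credibility scores are assumptions that only preserve the stated ordering, the industry window uses the upper bound of "3-6 months", and community content has no stated window, so its recency is reported as unknown (`null`).

```javascript
// Assumed numeric ranks that preserve the ordering described above.
const CREDIBILITY = {
  "official documentation": 3,
  "peer-reviewed paper": 3,
  "industry blog": 2,
  "community forum": 1,
  "social media": 0,
};

// Maximum age in months for content to count as current.
const MAX_AGE_MONTHS = {
  "official documentation": 6,
  "peer-reviewed paper": 12,
  "industry blog": 6,
};

// Whole calendar months elapsed between two dates (UTC).
function monthsBetween(earlier, later) {
  const months =
    (later.getUTCFullYear() - earlier.getUTCFullYear()) * 12 +
    (later.getUTCMonth() - earlier.getUTCMonth());
  return later.getUTCDate() < earlier.getUTCDate() ? months - 1 : months;
}

function assessSource(type, publishedAt, now = new Date()) {
  const credibility = CREDIBILITY[type] ?? null;
  const limit = MAX_AGE_MONTHS[type];
  const ageMonths = monthsBetween(publishedAt, now);
  const isRecent = limit === undefined ? null : ageMonths <= limit;
  return { credibility, ageMonths, isRecent };
}
```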
- Be Specific: Use detailed, specific extraction instructions
- Context Matters: Include context about what and why you're extracting
- Quality Over Quantity: Focus on high-quality, relevant content
- Source Diversity: Extract from multiple types of sources
- Validation First: Verify source credibility before deep extraction
- Plan Extraction Scope: Define what content is needed upfront
- Source Prioritization: Start with highest credibility sources
- Iterative Refinement: Refine extraction based on initial results
- Content Synthesis: Combine information from multiple sources
- Validation and Verification: Cross-check information accuracy
- Strategic Extraction: Extract only what's needed for the research goal
- Batch Processing: Group related extractions for efficiency
- Content Caching: Reuse extracted content for similar research
- Quality Monitoring: Track extraction quality and success rates
- Increase usage for comprehensive documentation extraction
- Develop extraction templates for common research scenarios
- Implement content validation workflows
- Create research synthesis automation
- Achieve 95%+ extraction completeness for target content
- Integrate Playwright into all major research workflows
- Build library of successful extraction patterns
- Automate multi-source research synthesis
- Playwright as primary tool for comprehensive web research
- Automated research report generation from web sources
- Advanced content analysis and pattern recognition
- Community sharing of successful research extraction patterns
## Weekly Playwright Research Review
**Period**: [Date Range]
**Extractions Conducted**: [Number]
**Average Completeness Score**: [1-10]
**Sources Analyzed**: [Number]
**Time Saved**: [Hours]
**Research Quality Score**: [1-10]
### Top Research Findings
1. **[Topic]**: [Key insights from extracted content]
2. **[Topic]**: [Key insights from extracted content]
### Extraction Efficiency Improvements
- **Query Optimization**: [Successful extraction patterns]
- **Source Quality**: [High-value sources identified]
- **Synthesis Process**: [Improved multi-source integration]
### Next Week Focus
- **Priority Research Areas**: [Based on project needs]
- **Extraction Optimization**: [Patterns to test and refine]
- **Integration Improvements**: [Workflow enhancements]

This guide optimizes Playwright usage for web research and documentation extraction. For GitHub MCP repository management, see docs/mcp-integration/github-mcp-guide.md. For general MCP integration, see docs/mcp-integration/mcp-optimization.md.