Samir Caizapasto · Sam-24-dev

Junior Data Engineer & Data Analyst | Open to Remote/Hybrid (LATAM/US)

I build production-style data systems and decision-focused BI products.

Impact highlights: 133+ tests in CI, up to 40% faster SQL workloads, a $16.66K performance gap identified, 1.2M+ records in ML pricing pipelines.

📍 Guayaquil, Ecuador · 🌎 Open to Remote / Hybrid opportunities

🧭 Choose Your Track

🛠️ Hire me as a Data Engineer

ETL/ELT pipelines, data contracts, CI/CD
Pandera, DuckDB, dbt, SQL optimization
Reproducible CLI workflows and tested delivery

🔗 Best-fit projects: RideFare, Technology Trend Analysis Platform

📊 Hire me as a Data Analyst / BI Analyst

KPI modeling, dashboard storytelling, executive reporting
Power BI, SQL, Python preprocessing
Business insights translated into action plans

🔗 Best-fit projects: Customer Profile Analytics Dashboard, Grocery Sales BI Dashboard

📌 Impact Metrics

Metric	Proof of Impact
✅ 133+ automated tests	Production-style data platforms with CI quality gates
⚡ Up to 40% faster queries	SQL tuning and indexing for analytical workloads
💰 $16.66K performance gap identified	BI analysis for business decision prioritization
📦 1.2M+ records processed	Dynamic pricing pipeline with explainable ML artifacts
🧪 126 tests in eSports project	Reliable ETL + analytics delivery workflow

📌 BI / Analytics Business Cases

Case 1 — Seller Performance Gap (Revenue Uplift)

Business problem: Uneven seller performance was reducing total revenue potential.
Analysis performed: Segmentation by seller contribution, variance analysis, and gap quantification using Power BI + DAX.
Impact: Identified a $16.66K performance opportunity and prioritized intervention areas.

Case 2 — Customer Value & Premium Spend Behavior

Business problem: Marketing needed clearer targeting of high-value customer segments.
Analysis performed: Behavioral profiling by demographics, household structure, tenure, and channel mix.
Impact: Highlighted premium-spend segments and enabled more focused campaign planning.

Case 3 — Executive Dashboard Decision Layer

Business problem: Stakeholders lacked a concise decision view across KPIs.
Analysis performed: Built desktop/mobile KPI dashboards with narrative flow (context → insight → action).
Impact: Reduced reporting friction and improved decision speed for non-technical audiences.

👋 About Me

Junior Data Engineer & Data Analyst | Computer Engineering Student (ESPOL, 7th semester)

I build production-style, reproducible data products that combine engineering reliability with business decision impact.
I work across the full cycle: ingestion, transformation, validation, analytics modeling, and stakeholder-facing BI delivery.

What I bring

Built ETL/ELT workflows with Pandera validation, automated testing, and CI/CD quality gates.
Implemented SQL transformation layers and indexing strategies, improving analytical query performance by up to 40%.
Delivered KPI-driven BI outputs that identified a $16.66K performance opportunity for business action.
Built applied ML pipelines (e.g., pricing) over 1.2M+ records, with explainability artifacts for transparent decisions.
Automated reproducible delivery flows from raw data to dashboard/web-ready outputs.

🎯 Target Roles (What I can do from day one)

🛠️ Junior Data Engineer

Build and maintain reproducible ETL/ELT pipelines
Implement data contracts and validation gates (Pandera)
Develop analytics engineering layers with SQL, DuckDB, dbt
Set up testing strategy and CI/CD reliability checks
Deliver versioned, maintainable data artifacts + runbooks

📊 Junior Data Analyst / BI Analyst

Define and monitor KPI frameworks tied to business goals
Build executive-ready dashboards (Power BI) for decisions
Translate analysis into clear, action-oriented narratives
Perform EDA and trend analysis for opportunity detection
Support stakeholders with insight-to-action recommendations

🔭 Current Focus

📘 Certification Track	PL-300: Microsoft Power BI Data Analyst — Strengthening advanced modeling, DAX, and business storytelling for decision-focused dashboards.
☁️ Learning Path	Cloud + dbt — Building stronger foundations in modern data stack practices, transformation workflows, and analytics engineering standards.
🧩 Career Optimization	Portfolio optimization for job applications — refining project narratives, measurable impact, and recruiter-facing positioning for Junior Data Engineer / Data Analyst opportunities.

🚧 Current Focus Project

Portfolio Refactor — Dual Track Positioning (Data Engineer + Data Analyst)

Status: In progress

I am currently refactoring this portfolio repository to present my two professional tracks in a clearer and more strategic way:

Data Engineer Track: featured projects focused on ETL/ELT, data quality, analytics engineering, reproducible pipelines, testing, and CI/CD.
Data Analyst / BI Track: featured projects focused on KPI modeling, dashboard storytelling, business insights, and executive reporting.

Current Workstreams

Restructuring project sections to make each role path explicit for recruiters and ATS.
Improving content hierarchy (impact first, stack second, implementation third).
Standardizing project maturity labels (Production-ready / Active maintenance / Completed).
Polishing responsive design for mobile and desktop readability.
Strengthening call-to-action messaging for Junior Data Engineer / Data Analyst opportunities.

Goal

Deliver a recruiter-friendly profile that communicates both technical depth and business impact in under 60 seconds.

🌎 Spoken Languages

Actively preparing for C1 certification

🏆 Certifications & Awards

🎖️ Certification / Award	🏢 Issuer	📅 Status / Date	🔗 Link
📗 Microsoft Office Specialist: Excel Associate (Microsoft 365 Apps)	Microsoft	Issued: Mar 2026	📄 Credential
📊 Data Analyst Associate	DataCamp	Issued: Mar 2026	📄 Credential
🛠️ ETL y ELT en Python	DataCamp	Issued: Mar 2026	📄 Credential
🌍 Galactic Problem Solver — Global Nominee	NASA Space Apps Challenge	Oct 2025	📄 View
🤖 Desarrollo con IA: de 0 a Producción	BIG school	Issued: Mar 2026	📜 Credential
📊 Data-Driven Decision Specialist (Bootcamp)	ESPOL & MINTEL	Completed (Graduation: Apr 2026)	⭐ Top Project

🚀 Featured Projects (Priority Order)

🛠️ Data Engineer Track — Top Matches

🟫 RideFare — Production Data & Pricing Intelligence Platform

Production-Style Data Engineering + ML + Analytics Product

From notebook-based analysis to a reproducible, production-style data product with public delivery.

Pipeline Modernization: Rebuilt legacy notebook flow into reproducible commands (ridefare ingest, transform, train, export-web) with clear operational interfaces.
Data Quality by Design: Implemented schema and validation controls with Pandera, stable transformations with DuckDB + dbt, and versioned public artifacts.
Explainable ML Delivery: Trained and exported XGBoost + SHAP artifacts for transparent model behavior and scenario exploration.
Public Product Interface: Delivered a Spanish-language Next.js web experience (/dashboard, /como-funciona, /escenarios) powered by deterministic exported JSON.
Automation & Deployment: Integrated CI validation, artifact refresh workflows, preview/prod deploy pipelines, and release automation.
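The command-style interface described above can be sketched with the standard-library argparse module. This is a minimal illustration, not the project's actual code: only the stage names (ingest, transform, train, export-web) come from this README, while the handler, the stage descriptions, and the --data-dir flag are hypothetical.

```python
import argparse

# Stage names come from the README; the descriptions, handler, and flag
# below are hypothetical placeholders, not the project's real implementation.
STAGES = {
    "ingest": "load raw ride data",
    "transform": "build cleaned, modeled tables",
    "train": "fit the pricing model",
    "export-web": "write JSON artifacts for the web app",
}


def run_stage(args: argparse.Namespace) -> str:
    """Pretend to run a stage and return a one-line status message."""
    return f"{args.command}: {STAGES[args.command]} (data dir: {args.data_dir})"


def build_parser() -> argparse.ArgumentParser:
    """Create a parser with one subcommand per pipeline stage."""
    parser = argparse.ArgumentParser(prog="ridefare")
    subparsers = parser.add_subparsers(dest="command", required=True)
    for name, description in STAGES.items():
        stage = subparsers.add_parser(name, help=description)
        # --data-dir is an assumed flag, shared by every stage for simplicity.
        stage.add_argument("--data-dir", default="data")
        stage.set_defaults(handler=run_stage)
    return parser


def main(argv: list[str]) -> str:
    """Parse an explicit argument list and dispatch to the stage handler."""
    args = build_parser().parse_args(argv)
    return args.handler(args)


print(main(["ingest"]))
print(main(["export-web", "--data-dir", "artifacts"]))
```

Giving each stage its own subcommand keeps every step independently runnable, which is what makes the flow easy to call from CI jobs or scheduled refreshes.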

🔷 Technology Trend Analysis Platform

Production-Style Multi-Source Data Engineering Platform

Tracks real-time developer technology trends by orchestrating data from GitHub, Stack Overflow, and Reddit into a unified analytics engine.

🌐 Multi-Source ETL: Consolidates developer signals from GitHub, Stack Overflow, and Reddit into a canonical pipeline.
🛡️ Data Quality Gates: Enforces schema and validation rules with Pandera data contracts.
⚡ Modern Analytics Engine: Uses DuckDB for trend computation, ranking, and lightweight analytical workloads.
✅ Production Discipline: 133+ passing tests with automated CI/CD workflows and scheduled refreshes.
📱 Delivery Layer: Serves insights to a Flutter Web dashboard with stable bridge outputs for frontend consumption.
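The "validate first, then rank" flow above can be sketched as follows. To keep the example dependency-free, plain Python checks stand in for a Pandera data contract and the standard-library sqlite3 module stands in for DuckDB; the field names, allowed sources, and sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical contract: field name -> expected Python type.
CONTRACT = {"source": str, "technology": str, "mentions": int}
ALLOWED_SOURCES = {"github", "stackoverflow", "reddit"}


def validate(rows: list[dict]) -> list[dict]:
    """Return rows unchanged if every row meets the contract, else raise."""
    for i, row in enumerate(rows):
        if set(row) != set(CONTRACT):
            raise ValueError(f"row {i}: unexpected fields {sorted(row)}")
        for field, expected in CONTRACT.items():
            if not isinstance(row[field], expected):
                raise ValueError(f"row {i}: {field} is not {expected.__name__}")
        if row["source"] not in ALLOWED_SOURCES:
            raise ValueError(f"row {i}: unknown source {row['source']!r}")
        if row["mentions"] < 0:
            raise ValueError(f"row {i}: mentions must be non-negative")
    return rows


def rank_technologies(rows: list[dict]) -> list[tuple]:
    """Sum mentions per technology across sources, highest total first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE signals (source TEXT, technology TEXT, mentions INTEGER)"
    )
    conn.executemany(
        "INSERT INTO signals VALUES (:source, :technology, :mentions)", rows
    )
    ranked = conn.execute(
        "SELECT technology, SUM(mentions) AS total FROM signals "
        "GROUP BY technology ORDER BY total DESC, technology"
    ).fetchall()
    conn.close()
    return ranked


# Made-up sample signals, for illustration only.
sample = [
    {"source": "github", "technology": "rust", "mentions": 120},
    {"source": "reddit", "technology": "rust", "mentions": 30},
    {"source": "stackoverflow", "technology": "go", "mentions": 90},
]
print(rank_technologies(validate(sample)))
```

Running validation before the query means a malformed row stops the pipeline early instead of silently skewing the ranking.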

📊 Data Analyst / BI Track — Top Matches

📊 Customer Profile Analytics Dashboard (Power BI)

Status: Completed | Track: Data Analyst / BI Analyst

Built a reproducible workflow: raw dataset → Python preprocessing notebook → validated clean CSV → Power BI dashboard.
Modeled customer behavior across demographics, household composition, tenure, and channels.
Delivered desktop + mobile report layouts for stakeholder-ready consumption.
Produced a clear commercial narrative around high-value segments and premium spend behavior.

🛒 Grocery Sales BI Dashboard (Analytical Case)

Status: Completed | Track: Data Analyst / BI Analyst

Identified a $16.66K performance gap across sellers.
Surfaced the top revenue category ($80.05K) for commercial prioritization.
Analyzed 23 active sellers, with Tulsa highlighted as the strongest market.
Built with Power BI + DAX + Excel to deliver a concise decision-making dashboard.
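This README does not state how the gap was calculated, so the sketch below shows only one plausible definition: the total revenue shortfall of sellers who fall below the median seller. Both the definition and the figures are assumptions for illustration, not the dashboard's actual DAX logic or data.

```python
from statistics import median


def performance_gap(revenue_by_seller: dict[str, float]) -> float:
    """Sum how far each below-median seller falls short of the median.

    Assumed definition for illustration; the real dashboard may use a
    different benchmark (for example, the top seller or a target).
    """
    benchmark = median(revenue_by_seller.values())
    return sum(
        benchmark - revenue
        for revenue in revenue_by_seller.values()
        if revenue < benchmark
    )


# Made-up revenue figures (in thousands of dollars), not project data.
example = {"seller_a": 100.0, "seller_b": 80.0, "seller_c": 60.0, "seller_d": 40.0}
print(performance_gap(example))  # 40.0
```

Whatever benchmark is chosen, stating it explicitly next to the headline number makes the gap easier for stakeholders to audit.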

📁 Additional Projects

🎮 eSports Analytics Dashboard LATAM

Status: Completed | Track: Data Engineer + Data Analyst

Built a full pipeline: MySQL → Python ETL → validated JSON contracts → web dashboard.
Integrated Random Forest projections (2026) to combine descriptive and predictive analytics.
Delivered reliable outputs with 126 automated tests and CI-driven deployment.
Consolidated ecosystem visibility across teams, players, competitions, and prize performance.

🌾 Rice Crop Analytics Platform

Status: Completed | Track: Data Engineer + Data Analyst

Engineered a pipeline: MySQL → Python ETL → JSON outputs → 5-view web dashboard.
Modeled a recovery path from -5.58% ROI to a +15% target (+20.6 percentage points).
Projected +75% productivity uplift with KPI-driven operational analysis.
Delivered reproducible implementation backed by automated ETL tests.
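The ROI figures above are compared in percentage points, that is, the target rate minus the starting rate, not a relative percent change. A small check of that arithmetic, using a hypothetical helper name:

```python
def percentage_point_change(start_pct: float, target_pct: float) -> float:
    """Difference between two rates, expressed in percentage points."""
    return target_pct - start_pct


# Figures from the project summary: -5.58% current ROI, +15% target ROI.
change = percentage_point_change(-5.58, 15.0)
print(round(change, 1))  # 20.6
```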

🏓 Statistical Analysis — Ping Pong Precision Model

Status: Completed | Track: Data Analyst (Statistical Modeling)

Fitted a Negative Binomial model that the goodness-of-fit test did not reject (p = 0.6603).
Processed 309 observations and confirmed a mean serve time under 2 seconds (1.945 s).
Automated JSON/PNG exports from R pipeline for dashboard-ready delivery.
Improved interpretability by packaging statistical outputs into a lightweight web report.

🛰️ NASA Space Apps Challenge 2025 — Weather for All

Status: Completed | Track: Full-Stack + Applied Analytics

Built an MVP in 48 hours during the NASA Space Apps Challenge.
Processed 10 years of climate-related data for 195+ countries.
Delivered interactive map workflows with <2s response time for user exploration.
Recognized as Galactic Problem Solver (Global Nominee).

🛠️ Technical Stack

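Category	Technologies
💻 Languages	Python, SQL, R, DAX
⚙️ Data Engineering & DBs	DuckDB, dbt, MySQL, ETL/ELT pipelines
🤖 Machine Learning	XGBoost, SHAP, Random Forest
🧪 Testing & Quality	Pandera data contracts, automated tests, CI quality gates
📊 Visualization & BI	Power BI, Excel
🌐 Web & Mobile	Next.js, Flutter Web
🚀 DevOps & Cloud	CI/CD workflows, preview/prod deploy pipelines
📚 Learning	Cloud, dbt, PL-300 (Power BI Data Analyst)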