Samir Caizapasto - Data Analysis Specialist Value Proposition

Junior Data Engineer & Data Analyst | Open to Remote/Hybrid (LATAM/US)

I build production-style data systems and decision-focused BI products.

Impact highlights: 133+ tests in CI, up to 40% faster SQL workloads, a $16.66K seller performance gap identified, and 1.2M+ records processed in ML pricing pipelines.

📍 Guayaquil, Ecuador · 🌎 Open to Remote / Hybrid opportunities


🧭 Choose Your Track

🛠️ Hire me as a Data Engineer

  • ETL/ELT pipelines, data contracts, CI/CD
  • Pandera, DuckDB, dbt, SQL optimization
  • Reproducible CLI workflows and tested delivery

🔗 Best-fit projects:

📊 Hire me as a Data Analyst / BI Analyst

  • KPI modeling, dashboard storytelling, executive reporting
  • Power BI, SQL, Python preprocessing
  • Business insights translated into action plans

🔗 Best-fit projects:


📌 Impact Metrics

| Metric | Proof of Impact |
| --- | --- |
| 133+ automated tests | Production-style data platforms with CI quality gates |
| Up to 40% faster queries | SQL tuning and indexing for analytical workloads |
| 💰 $16.66K performance gap identified | BI analysis for business decision prioritization |
| 📦 1.2M+ records processed | Dynamic pricing pipeline with explainable ML artifacts |
| 🧪 126 tests in eSports project | Reliable ETL + analytics delivery workflow |

📌 BI / Analytics Business Cases

Case 1 — Seller Performance Gap (Revenue Uplift)

Business problem: Uneven seller performance was reducing total revenue potential.
Analysis performed: Segmentation by seller contribution, variance analysis, and gap quantification using Power BI + DAX.
Impact: Identified a $16.66K performance opportunity and prioritized intervention areas.
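Case 1 does not state the exact gap formula. One common approach is to sum each underperforming seller's shortfall against a benchmark. The sketch below assumes the benchmark is the median seller's revenue and uses made-up figures; the function name and data are hypothetical, not the Power BI + DAX measure behind the $16.66K result.

```python
from statistics import median


def performance_gap(revenue_by_seller: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return the total shortfall vs. a benchmark and the per-seller shortfalls.

    Assumption (hypothetical): the benchmark is the median seller's revenue,
    and sellers at or above it contribute no gap.
    """
    benchmark = median(revenue_by_seller.values())
    shortfalls = {
        seller: round(benchmark - revenue, 2)
        for seller, revenue in revenue_by_seller.items()
        if revenue < benchmark
    }
    return round(sum(shortfalls.values()), 2), shortfalls


# Toy data for illustration only (not the project's dataset).
sales = {"A": 12_000.0, "B": 9_500.0, "C": 7_000.0, "D": 4_000.0, "E": 11_000.0}
total_gap, per_seller = performance_gap(sales)
print(total_gap, per_seller)
```

Sorting `per_seller` by shortfall gives a simple intervention priority order; a different benchmark (top quartile, target quota) would change the total.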

Case 2 — Customer Value & Premium Spend Behavior

Business problem: Marketing needed clearer targeting of high-value customer segments.
Analysis performed: Behavioral profiling by demographics, household structure, tenure, and channel mix.
Impact: Highlighted premium-spend segments and enabled more focused campaign planning.

Case 3 — Executive Dashboard Decision Layer

Business problem: Stakeholders lacked a concise decision view across KPIs.
Analysis performed: Built desktop/mobile KPI dashboards with narrative flow (context → insight → action).
Impact: Reduced reporting friction and improved decision speed for non-technical audiences.


👋 About Me

Junior Data Engineer & Data Analyst | Computer Engineering Student (ESPOL, 7th semester)

I build production-style, reproducible data products that combine engineering reliability with business decision impact.
I work across the full cycle: ingestion, transformation, validation, analytics modeling, and stakeholder-facing BI delivery.

What I bring

  • Built ETL/ELT workflows with Pandera validation, automated testing, and CI/CD quality gates.
  • Implemented SQL transformation layers and indexing strategies, improving analytical query performance by up to 40%.
  • Delivered KPI-driven BI outputs that identified a $16.66K performance opportunity for business action.
  • Built applied ML pipelines (e.g., dynamic pricing) over 1.2M+ records, with explainability artifacts for transparent decisions.
  • Automated reproducible delivery flows from raw data to dashboard/web-ready outputs.
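The indexing bullet can be illustrated with SQLite from Python's standard library. SQLite is a stand-in here, since the bullet does not name the tuned engine. The sketch compares the query plan for a filtered aggregate before and after indexing the filter column. The table, column, and index names are hypothetical, and the plan only shows the access path changing from a full scan to an index search. It does not reproduce the 40% figure.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[str]:
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
# Hypothetical fact table for illustration.
conn.execute("CREATE TABLE trips (id INTEGER PRIMARY KEY, city TEXT, fare REAL)")
conn.executemany(
    "INSERT INTO trips (city, fare) VALUES (?, ?)",
    [("Guayaquil" if i % 3 else "Quito", float(i % 50)) for i in range(10_000)],
)

sql = "SELECT SUM(fare) FROM trips WHERE city = ?"
before = query_plan(conn, sql, ("Quito",))  # full table scan, e.g. 'SCAN trips'

# Index the filter column so the planner can seek instead of scanning every row.
conn.execute("CREATE INDEX idx_trips_city ON trips (city)")
after = query_plan(conn, sql, ("Quito",))  # e.g. 'SEARCH trips USING INDEX idx_trips_city (city=?)'

print(before, after)
```

Actual speedups depend on selectivity, table size, and engine. Measuring wall-clock time on representative workloads is what would support a percentage claim.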

🎯 Target Roles (What I can do from day one)

🛠️ Junior Data Engineer

  • Build and maintain reproducible ETL/ELT pipelines
  • Implement data contracts and validation gates (Pandera)
  • Develop analytics engineering layers with SQL, DuckDB, dbt
  • Set up testing strategy and CI/CD reliability checks
  • Deliver versioned, maintainable data artifacts + runbooks
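Pandera expresses data contracts declaratively, as a `DataFrameSchema` of typed `Column`s with `Check`s, and raises when a batch violates them. To keep the example dependency-free, the sketch below reproduces the same validation-gate idea in plain Python. It uses a hypothetical contract for ride records (field names and rules are assumptions), collects every violation, and blocks the batch if any exist.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class FieldRule:
    """One column of a hypothetical data contract."""
    name: str
    dtype: type
    check: Callable[[Any], bool] = lambda _value: True
    nullable: bool = False


# Hypothetical contract for ride records (illustrative only).
CONTRACT = [
    FieldRule("ride_id", str, lambda v: len(v) > 0),
    FieldRule("distance_km", float, lambda v: v >= 0),
    FieldRule("fare_usd", float, lambda v: 0 < v < 500),
    FieldRule("surge_multiplier", float, lambda v: v >= 1.0, nullable=True),
]


class ContractViolation(Exception):
    """Raised when a batch fails the contract, blocking downstream steps."""


def validate(rows: list[dict[str, Any]], contract: list[FieldRule] = CONTRACT) -> list[dict[str, Any]]:
    errors = []
    for i, row in enumerate(rows):
        for rule in contract:
            value = row.get(rule.name)
            if value is None:
                if not rule.nullable:
                    errors.append(f"row {i}: {rule.name} is missing")
            elif not isinstance(value, rule.dtype):
                errors.append(f"row {i}: {rule.name} expected {rule.dtype.__name__}")
            elif not rule.check(value):
                errors.append(f"row {i}: {rule.name}={value!r} failed check")
    if errors:
        # All failures are collected before raising (similar in spirit to
        # Pandera's lazy=True), so one run reports every problem.
        raise ContractViolation("; ".join(errors))
    return rows
```

In a pipeline, calling `validate` between the transform and load steps makes a bad batch fail CI instead of reaching the dashboard.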

📊 Junior Data Analyst / BI Analyst

  • Define and monitor KPI frameworks tied to business goals
  • Build executive-ready dashboards (Power BI) for decisions
  • Translate analysis into clear, action-oriented narratives
  • Perform EDA and trend analysis for opportunity detection
  • Support stakeholders with insight-to-action recommendations


🔭 Current Focus

  • 📘 Certification Track: PL-300 Microsoft Power BI Data Analyst. Strengthening advanced modeling, DAX, and business storytelling for decision-focused dashboards.
  • ☁️ Learning Path: Cloud + dbt. Building stronger foundations in modern data stack practices, transformation workflows, and analytics engineering standards.
  • 🧩 Career Optimization: Portfolio optimization for job applications, refining project narratives, measurable impact, and recruiter-facing positioning for Junior Data Engineer / Data Analyst opportunities.

🚧 Current Focus Project

Portfolio Refactor — Dual Track Positioning (Data Engineer + Data Analyst)

Status: In progress

I am currently refactoring this portfolio repository to present my two professional tracks in a clearer and more strategic way:

  • Data Engineer Track: featured projects focused on ETL/ELT, data quality, analytics engineering, reproducible pipelines, testing, and CI/CD.
  • Data Analyst / BI Track: featured projects focused on KPI modeling, dashboard storytelling, business insights, and executive reporting.

Current Workstreams

  • Restructuring project sections to make each role path explicit for recruiters and applicant tracking systems (ATS).
  • Improving content hierarchy (impact first, stack second, implementation third).
  • Standardizing project maturity labels (Production-ready / Active maintenance / Completed).
  • Polishing responsive design for mobile and desktop readability.
  • Strengthening call-to-action messaging for Junior Data Engineer / Data Analyst opportunities.

Goal

Deliver a recruiter-friendly profile that communicates both technical depth and business impact in under 60 seconds.


🌎 Spoken Languages

      
Actively preparing for C1 certification

🏆 Certifications & Awards

| 🎖️ Certification / Award | 🏢 Issuer | 📅 Status / Date | 🔗 Link |
| --- | --- | --- | --- |
| 📗 Microsoft Office Specialist: Excel Associate (Microsoft 365 Apps) | Microsoft | Issued: Mar 2026 | 📄 Credential |
| 📊 Data Analyst Associate | DataCamp | Issued: Mar 2026 | 📄 Credential |
| 🛠️ ETL y ELT en Python (ETL and ELT in Python) | DataCamp | Issued: Mar 2026 | 📄 Credential |
| 🌍 Galactic Problem Solver (Global Nominee) | NASA Space Apps Challenge | Oct 2025 | 📄 View |
| 🤖 Desarrollo con IA: de 0 a Producción (AI Development: From 0 to Production) | BIG school | Issued: Mar 2026 | 📜 Credential |
| 📊 Data-Driven Decision Specialist (Bootcamp) | ESPOL & MINTEL | Completed (Graduation: Apr 2026) | ⭐ Top Project |

🚀 Featured Projects (Priority Order)

🛠️ Data Engineer Track — Top Matches

Production-Style Data Engineering + ML + Analytics Product

From notebook-based analysis to a reproducible, production-style data product with public delivery.

  • Pipeline Modernization: Rebuilt legacy notebook flow into reproducible commands (ridefare ingest, transform, train, export-web) with clear operational interfaces.
  • Data Quality by Design: Implemented schema and validation controls with Pandera, stable transformations with DuckDB + dbt, and versioned public artifacts.
  • Explainable ML Delivery: Trained and exported XGBoost + SHAP artifacts for transparent model behavior and scenario exploration.
  • Public Product Interface: Delivered a Spanish-language Next.js web experience (/dashboard, /como-funciona, /escenarios) powered by deterministic exported JSON.
  • Automation & Deployment: Integrated CI validation, artifact refresh workflows, preview/prod deploy pipelines, and release automation.
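The command-style interface and the deterministic JSON export can be sketched with the standard library. Only the subcommand names come from the bullets above. The `--out` argument, its default path, the placeholder handlers, and the `serialize_for_web` helper are hypothetical and are not the project's code. Determinism here comes from sorting records and keys and fixing the separators, so identical inputs always produce byte-identical files that diff cleanly in CI.

```python
import argparse
import json
from pathlib import Path
from typing import Any, Optional


def serialize_for_web(records: list[dict[str, Any]], sort_field: str = "id") -> str:
    """Serialize records so that equal inputs always yield identical text."""
    ordered = sorted(records, key=lambda r: r[sort_field])  # stable row order
    # sort_keys fixes key order; compact separators avoid whitespace drift.
    return json.dumps(ordered, sort_keys=True, separators=(",", ":"), ensure_ascii=False) + "\n"


def export_web(args: argparse.Namespace) -> None:
    # Placeholder data: a real export would read the trained model's outputs.
    records = [{"id": 2, "zone": "north", "fare": 4.1}, {"id": 1, "zone": "center", "fare": 3.5}]
    out = Path(args.out)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(serialize_for_web(records), encoding="utf-8")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="ridefare")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("ingest", "transform", "train"):
        # Hypothetical no-op handlers standing in for the real pipeline stages.
        sub.add_parser(name).set_defaults(
            func=lambda args, stage=name: print(f"{stage}: not implemented in this sketch")
        )
    export = sub.add_parser("export-web")
    export.add_argument("--out", default="web/data/pricing.json")
    export.set_defaults(func=export_web)
    return parser


def main(argv: Optional[list[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    args.func(args)
```

Calling `main(["export-web", "--out", "out/pricing.json"])` writes the file. Registering `main` as a console-script entry point is one way to get the `ridefare export-web` invocation style.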
   

Production-Style Multi-Source Data Engineering Platform

Tracks developer technology trends by consolidating data from GitHub, StackOverflow, and Reddit into a unified analytics engine on a scheduled refresh cycle.

  • 🌐 Multi-Source ETL: Consolidates developer signals from GitHub, StackOverflow, and Reddit into a canonical pipeline.
  • 🛡️ Data Quality Gates: Enforces schema and validation rules with Pandera data contracts.
  • ⚡ Modern Analytics Engine: Uses DuckDB for trend computation, ranking, and lightweight analytical workloads.
  • ✅ Production Discipline: 133+ passing tests with automated CI/CD workflows and scheduled refreshes.
  • 📱 Delivery Layer: Serves insights to a Flutter Web dashboard with stable bridge outputs for frontend consumption.
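Trend ranking of this kind maps naturally onto SQL window functions. DuckDB supports `RANK()` and `LAG()`, but to stay dependency-free the sketch runs the same SQL on SQLite from the standard library, which has supported window functions since 3.25. The table layout, the source weighting, and the scores are hypothetical and do not represent the platform's actual scoring.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical canonical table: one row per (week, technology, source) signal count.
conn.execute("CREATE TABLE signals (week TEXT, tech TEXT, source TEXT, mentions INTEGER)")
conn.executemany(
    "INSERT INTO signals VALUES (?, ?, ?, ?)",
    [
        ("2025-W01", "python", "github", 120), ("2025-W01", "python", "reddit", 40),
        ("2025-W01", "rust", "github", 90), ("2025-W01", "rust", "stackoverflow", 30),
        ("2025-W01", "dart", "reddit", 25),
        ("2025-W02", "python", "github", 110), ("2025-W02", "rust", "github", 140),
        ("2025-W02", "dart", "stackoverflow", 35),
    ],
)

TREND_SQL = """
WITH weekly AS (
    -- Hypothetical weighting: GitHub activity counts double.
    SELECT week, tech,
           SUM(CASE WHEN source = 'github' THEN 2 * mentions ELSE mentions END) AS score
    FROM signals
    GROUP BY week, tech
)
SELECT week, tech, score,
       RANK() OVER (PARTITION BY week ORDER BY score DESC) AS rank_in_week,
       score - LAG(score) OVER (PARTITION BY tech ORDER BY week) AS week_over_week
FROM weekly
ORDER BY week, rank_in_week
"""

ranking = conn.execute(TREND_SQL).fetchall()
for row in ranking:
    print(row)
```

`RANK()` orders technologies within each week, and `LAG()` gives the change against the same technology's previous week. The first week has no prior value, so its change is `NULL` (`None` in Python).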
   

📊 Data Analyst / BI Track — Top Matches

Status: Completed | Track: Data Analyst / BI Analyst

  • Built a reproducible workflow: raw dataset → Python preprocessing notebook → validated clean CSV → Power BI dashboard.
  • Modeled customer behavior across demographics, household composition, tenure, and channels.
  • Delivered desktop + mobile report layouts for stakeholder-ready consumption.
  • Produced a clear commercial narrative around high-value segments and premium spend behavior.
 

🛒 Grocery Sales BI Dashboard (Analytical Case)

Status: Completed | Track: Data Analyst / BI Analyst

  • Identified a $16.66K performance gap across seller performance.
  • Surfaced the top revenue category ($80.05K) to guide commercial prioritization.
  • Analyzed 23 active sellers, with Tulsa highlighted as the strongest market.
  • Built with Power BI + DAX + Excel to deliver a concise decision-making dashboard.


📁 Additional Projects

Status: Completed | Track: Data Engineer + Data Analyst

  • Built a full pipeline: MySQL → Python ETL → validated JSON contracts → web dashboard.
  • Integrated Random Forest projections for 2026 to combine descriptive and predictive analytics.
  • Delivered reliable outputs with 126 automated tests and CI-driven deployment.
  • Consolidated ecosystem visibility across teams, players, competitions, and prize performance.
 

Status: Completed | Track: Data Engineer + Data Analyst

  • Engineered a pipeline: MySQL → Python ETL → JSON outputs → 5-view web dashboard.
  • Modeled a strategic recovery from -5.58% ROI to a +15% target (+20.6 percentage points).
  • Projected +75% productivity uplift with KPI-driven operational analysis.
  • Delivered reproducible implementation backed by automated ETL tests.
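The +20.6 figure is the distance between the target ROI and the starting ROI. It is measured in percentage points rather than as a percent change:

```latex
\Delta_{\text{ROI}} = 15\% - (-5.58\%) = 20.58 \ \text{percentage points} \approx 20.6 \ \text{pts}
```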
 

Status: Completed | Track: Data Analyst (Statistical Modeling)

  • Validated a Negative Binomial model: the goodness-of-fit test did not reject it (p = 0.6603).
  • Processed 309 observations and confirmed a mean serve time under 2 seconds (1.945 s).
  • Automated JSON/PNG exports from the R pipeline for dashboard-ready delivery.
  • Improved interpretability by packaging statistical outputs into a lightweight web report.
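A Negative Binomial model is typically chosen for count data whose variance exceeds its mean (overdispersion). The project used R. The Python sketch below only illustrates the idea with a method-of-moments fit on made-up counts. It is not the project's estimator, and it does not implement the goodness-of-fit test, whose exact form is not stated here. The function names are hypothetical.

```python
import math
from statistics import fmean, variance


def fit_negative_binomial_mom(counts: list[int]) -> tuple[float, float]:
    """Method-of-moments fit of a Negative Binomial (r, p).

    Parametrization: X counts failures before the r-th success, so
    mean = r(1 - p) / p and variance = r(1 - p) / p**2.
    """
    m, v = fmean(counts), variance(counts)  # sample mean and sample variance
    if v <= m:
        raise ValueError("no overdispersion (variance <= mean); a Poisson model may suffice")
    p = m / v
    r = m * m / (v - m)
    return r, p


def nb_pmf(k: int, r: float, p: float) -> float:
    """P(X = k), computed in log space via lgamma for numerical stability."""
    log_coef = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coef + r * math.log(p) + k * math.log1p(-p))


# Toy counts for illustration only (not the project's 309 observations).
counts = [0, 1, 1, 2, 2, 2, 3, 4, 5, 8, 0, 1, 6, 2, 3]
r, p = fit_negative_binomial_mom(counts)

# Expected vs. observed frequencies per count value; a goodness-of-fit test
# (e.g. chi-square) would compare these two tables.
expected = {k: len(counts) * nb_pmf(k, r, p) for k in range(max(counts) + 1)}
observed = {k: counts.count(k) for k in range(max(counts) + 1)}
```

A large goodness-of-fit p-value means the data give no evidence against the fitted model. It does not prove the model is correct.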
 

Status: Completed | Track: Full-Stack + Applied Analytics

  • Built an MVP in 48 hours during the NASA Space Apps Challenge.
  • Processed 10 years of climate-related data for 195+ countries.
  • Delivered interactive map workflows with <2s response time for user exploration.
  • Recognized as Galactic Problem Solver (Global Nominee).
 

🛠️ Technical Stack

| Category | Technologies |
| --- | --- |
| 💻 Languages | Python, R, SQL, TypeScript, Dart |
| ⚙️ Data Engineering & DBs | DuckDB, MySQL, SQLite, Pandas, Jupyter |
| 🤖 Machine Learning | Scikit-Learn |
| 🧪 Testing & Quality | Pytest, Pandera |
| 📊 Visualization & BI | Power BI, Tableau, Plotly, Excel |
| 🌐 Web & Mobile | React, Flutter, Flask, Tailwind CSS, Vite, Bootstrap, Leaflet |
| 🚀 DevOps & Cloud | GitHub Actions, Vercel, Git |
| 📚 Learning | AWS, dbt |


🤝 Let’s Connect

Open to Junior Data Engineer / Data Analyst roles (remote/hybrid, LATAM/US).

I’m ready to contribute from day one in data pipeline automation, analytics engineering, and decision-focused BI.


Pinned Repositories

  1. Technology-trend-analysis-platform (Dart)

    Data intelligence platform for technology trends across GitHub, StackOverflow, and Reddit using Python ETL, Pandera quality gates, a DuckDB trend engine, and Flutter Web.

  2. Analisis-Ping-Pong (HTML)

    Automated statistical analysis pipeline using R to model ping pong serve precision with a Negative Binomial distribution (309 observations). Includes an interactive web dashboard.

  3. Analisis-Cultivo-Arroz (HTML)

    End-to-end data engineering platform for agricultural analytics: an ETL pipeline (Python) plus an interactive dashboard (Chart.js) with KPIs, financial analysis, and strategic insights.

  4. easyparker-pwa (TypeScript)

    EasyParker is a PWA for booking parking in Guayaquil | Modes: Driver and Host | Real-time chat | Surge pricing for events | Ratings, etc. | React + TypeScript + Tailwind

  5. eSports-Analytics-Dashboard (Python)

    End-to-end analytics dashboard for LATAM eSports with Python ETL, data validation, web visualization, and a 2026 ML projection.

  6. RideFare-ETL-Pipeline (Jupyter Notebook)

    Portfolio-grade pricing intelligence product for urban mobility, built with DuckDB, dbt, XGBoost, Next.js, and Vercel.