🛠️ Professional Experience

πŸ₯ Population Data BC

ML Engineer Sep 2025 – Apr 2026

The Mission: Optimizing the bridge between massive provincial health datasets and academic research.

Key Contributions:

  • Performance Engineering: Led the transition from legacy CSV ingestion to a high-performance Apache Parquet and Apache Arrow stack.
  • Infrastructure & Automation: Re-engineered core pipelines into a container-ready architecture using Airflow orchestration and DuckDB for fast, reproducible local processing.
  • Intelligent Document Processing: Integrated OCR and Retrieval-Augmented Generation (RAG) via LlamaIndex to extract critical insights, generate key summaries, and flag deficiencies in research request applications.
  • Statistical Modeling: Developed a high-fidelity synthetic data generator utilizing complex statistical distributions (including Log-Normal, Poisson, and Normal) to accurately model sensitive variable relationships for research requests.

Stack: Python LlamaIndex Airflow DuckDB Arrow Parquet OCR SciPy

📱 Samsung R&D

Data Engineer Sep 2024 – Aug 2025

The Mission: Managing global-scale data infrastructure and privacy compliance.

Key Contributions:

  • High-Volume Orchestration: Managed the real-time processing of 500M+ records per day and deployed automated CI/CD pipelines via GitHub Actions that handled dynamic PII (Personally Identifiable Information) tagging across 10+ data streams.
  • Data Reliability: Owned the orchestration of cross-region transfers for 200+ datasets, maintaining a strict 99% availability SLA for downstream analytics teams.
  • Analytics Engineering: Leveraged dbt (data build tool) to transform raw FastAPI-backed application data into actionable insights, automating the workflows for over 50 executive Tableau dashboards.

Stack: AWS Redshift dbt Airflow GitHub Actions Tableau

💉 Vancouver Coastal Health

Software Developer Intern May 2024 – Aug 2024

The Mission: Enhancing data integrity for clinical informatics systems.

Key Contributions:

  • Validation Frameworks: Developed a custom R package using the testthat framework to enforce data quality at the ingestion layer. This prevented “dirty data” from entering downstream clinical pipelines.
  • Modern Storage: Spearheaded the migration of historical clinical records from unstructured flat files to a centralized SQLite-based storage system, enabling faster retrieval and more complex analytical queries.
  • Pipeline Maintenance: Maintained and optimized critical analytical pipelines that supported real-time clinical informatics, ensuring healthcare providers had access to validated data.

Stack: R testthat SQLite Clinical Informatics Git