🛠️ Skills & Expertise
💻 Programming
- Python — ETL pipelines, data analysis, testing
- SQL — analytics, data warehousing, querying large datasets
- R — statistical workflows, data validation, reproducible analysis
- Bash — scripting & automation
⚙️ Data Engineering
- Apache Airflow — workflow orchestration
- Apache Spark — distributed data processing
- dbt — analytics engineering & transformations
- Parquet & Arrow — columnar data storage & in-memory analytics
- DuckDB — fast local analytics
🤖 Machine Learning & AI
- LlamaIndex — RAG & multi-agent system orchestration
- Intelligent Document Processing — OCR & LLM-driven PDF analysis systems
- Statistical Modeling — synthetic data generation (Poisson, log-normal, normal distributions)
- ML & Scientific Computing — PyTorch, scikit-learn, NumPy, SciPy
🗄️ Databases & Analytics
- PostgreSQL, MongoDB, DuckDB, SQLite
- Tableau — dashboards & visualization
- Grafana — monitoring & metrics
🧰 Engineering Practices
- CI/CD — GitHub Actions
- Docker — containerized workflows
- Unit testing & data validation
- Data quality checks & reproducible pipelines