This is a greenfield project creating new data pipelines, data science analytics, and machine learning pipelines to deliver a customer-focussed understanding of guests visiting Merlin Entertainments' sites and attractions.
Developed new processing pipelines for medical data in Azure. Supported and guided the implementation of a medallion architecture for the data platform. Worked on transformation logic, analytics, and delivery of data into GCP for use with BigQuery.
Skills used: Azure, Python, Data Factory, Spark, SQL, Databricks, Delta, Batch, Azure DevOps, GCP, BigQuery
Developed a new pipeline for geospatial data to extract, process, and query GeoJSON and GeoPackage data for display in QGIS/ArcGIS and for export to customers, leading to new sales for the company.
Skills used: Scala, Python, Spark, Databricks, Airflow, Terraform, Bitbucket Pipelines, Fargate
Increased the performance and stability of the data pipeline, and migrated datasets, saving over 500K in annual costs and reducing engineering toil.
Optimised data usage through system analysis and improved design, saving over 400K per year.
Currently working on the real-time processing stack.
Skills used: Kafka, Airflow, Glue, PySpark, Databricks, Batch, Samza, EMR, AWS infrastructure, Lambda, CloudFormation, Python