Lohit Inguva - Data Engineer & Cloud Architect
Data Catalog Automation with AWS Glue

Automated metadata discovery and cataloging of datasets in S3 using AWS Glue Crawlers and Lambda to keep data catalogs updated for self-service analytics.

AWS GlueAWS LambdaPython
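
The event-driven part of this setup can be sketched as a small Lambda handler that starts a Glue crawler whenever new objects land in S3. This is a minimal illustration, not the project's actual code: the crawler name and its environment-variable default are placeholders, and the Glue client is injected lazily so the handler can be exercised without AWS credentials.

```python
import os

def trigger_crawler(event, context, glue=None):
    """Start a Glue crawler in response to an S3 object-created event.

    CRAWLER_NAME and its default are illustrative placeholders; the
    boto3 client is created lazily so a stub can be injected in tests.
    """
    if glue is None:
        import boto3  # imported here so tests never need AWS access
        glue = boto3.client("glue")
    crawler = os.environ.get("CRAWLER_NAME", "s3-data-catalog-crawler")
    try:
        glue.start_crawler(Name=crawler)
        return {"status": "started", "crawler": crawler}
    except glue.exceptions.CrawlerRunningException:
        # A crawl is already in progress; new objects are picked up
        # on the next run, so this is not an error.
        return {"status": "already-running", "crawler": crawler}
```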

Automated Data Quality Checks Using AWS Glue & Lambda

Developed a data quality framework leveraging AWS Glue ETL jobs and Lambda functions to automatically run validations such as null checks, duplicate detection, and referential-integrity checks after data ingestion. Increased data reliability and trustworthiness while reducing manual data QA efforts by 75%.

AWS GlueAWS LambdaAWS S3AWS SNS
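
The three validation types named above can be expressed as plain-Python checks over keyed records; this is a simplified mirror of logic that, in the framework described, would run inside a Glue ETL job with failures reported via SNS. Field names in the test data are hypothetical.

```python
def null_check(rows, required):
    """Return indexes of rows where any required field is missing/None."""
    return [i for i, r in enumerate(rows)
            if any(r.get(f) is None for f in required)]

def duplicate_check(rows, key_fields):
    """Return key tuples that appear more than once."""
    seen, dupes = set(), set()
    for r in rows:
        key = tuple(r.get(f) for f in key_fields)
        if key in seen:
            dupes.add(key)
        seen.add(key)
    return dupes

def referential_check(rows, fk_field, parent_keys):
    """Return foreign-key values with no matching parent record."""
    return {r.get(fk_field) for r in rows if r.get(fk_field) not in parent_keys}
```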

AWS Glue Data Lake ETL Pipeline

Built an automated ETL pipeline using AWS Glue to ingest, clean, and transform large datasets from an S3 data lake into curated tables, optimizing data workflows for analytics and BI.

AWS GlueAWS S3AWS LambdaPySpark
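
The clean-and-transform step above might look like the following sketch, written in plain Python rather than the Glue/PySpark job it stands in for; the `order_id` and `amount` fields are assumed examples, not the pipeline's real schema.

```python
def clean_record(raw):
    """Normalize one raw record: trim strings, cast amount, drop invalid.

    order_id/amount are illustrative field names, not the real schema.
    """
    rec = {k: v.strip() if isinstance(v, str) else v for k, v in raw.items()}
    if not rec.get("order_id"):
        return None  # incomplete rows are excluded from curated tables
    try:
        rec["amount"] = float(rec.get("amount", 0))
    except (TypeError, ValueError):
        return None
    return rec

def transform(records):
    """Apply clean_record across a batch, keeping only valid rows."""
    return [r for r in (clean_record(x) for x in records) if r is not None]
```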

Real-time Data Processing with AWS Kinesis

Implemented a real-time streaming data pipeline using AWS Kinesis and Lambda functions to process and store streaming data into S3 for downstream analytics and alerting.

AWS KinesisAWS LambdaAWS S3Python
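
A Kinesis-to-S3 Lambda of the kind described receives base64-encoded payloads in the event's `Records` array and can archive each batch as JSON lines. A minimal sketch, assuming JSON payloads; the bucket name and key layout are placeholders, and the S3 client is injectable for testing.

```python
import base64
import json

def handler(event, context, s3=None, bucket="stream-archive"):
    """Decode a Kinesis batch and archive it to S3 as JSON lines.

    Bucket name and key layout are illustrative placeholders.
    """
    if s3 is None:
        import boto3  # imported lazily so a stub client can be injected
        s3 = boto3.client("s3")
    lines = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        lines.append(json.loads(payload))
    if not lines:
        return {"written": 0}
    # Key off the first record's sequence number so batches never collide.
    key = f"events/{event['Records'][0]['kinesis']['sequenceNumber']}.json"
    body = "\n".join(json.dumps(l) for l in lines).encode()
    s3.put_object(Bucket=bucket, Key=key, Body=body)
    return {"written": len(lines), "key": key}
```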

Data Warehouse Modernization with Redshift & Glue

Migrated legacy data warehouse ETL workflows to AWS Glue and Amazon Redshift, building Glue ETL jobs that transform and load data from S3 into Redshift tables. Improved BI report availability and speed by 3x while significantly reducing operational costs through the cloud migration.

AWS GlueAmazon RedshiftAWS S3AWS IAM
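
Loading Glue-curated S3 data into Redshift typically goes through the `COPY` command with an attached IAM role, which is where the AWS IAM tag comes in. A small helper to build such a statement; the table, prefix, and role ARN below are placeholders, not values from the project.

```python
def build_copy_statement(table, s3_prefix, iam_role, fmt="PARQUET"):
    """Build a Redshift COPY statement for Glue-curated S3 data.

    All arguments in the example usage are illustrative placeholders.
    """
    return (
        f"COPY {table} "
        f"FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role}' "
        f"FORMAT AS {fmt};"
    )
```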

S/4HANA to BW/4HANA Real-Time Streaming

Built real-time data replication from S/4HANA to BW/4HANA and Kafka using SLT and SAP DI. Enabled fault-tolerant streaming and optimized resource usage while reducing data loading time by 40%. Architected for both finance and operational use cases.

SAP S/4HANA · BW/4HANA · SLT · Kafka
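
On the consuming side of such a replication, each Kafka message carries a change record that must be applied in order. The sketch below assumes a common shape for SLT-fed topics — an operation flag (`I`/`U`/`D`) plus a primary key and row image — which is an assumption about the message format, not the project's actual schema.

```python
def apply_change(table, change):
    """Apply one replicated change record to an in-memory keyed table.

    Assumes messages carry an op flag ('I', 'U', 'D'), a primary key,
    and the latest row image -- an assumed, not confirmed, format.
    """
    op, key = change["op"], change["key"]
    if op in ("I", "U"):
        table[key] = change["row"]   # insert, or overwrite with latest image
    elif op == "D":
        table.pop(key, None)         # tolerate deletes for unseen keys
    return table
```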

B4HANA to Databricks Data Pipeline

Developed a robust data pipeline to migrate transactional data from SAP B4HANA to Databricks using PySpark. Enabled real-time analytics and improved data accessibility for business teams.

SAP B4HANA · Databricks · PySpark
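
One standard PySpark pattern for this kind of migration is to push a filtered subquery down through the JDBC reader, so each run pulls only rows changed since the last load. A sketch of the subquery builder; the table and watermark column names are hypothetical, and the result would be passed as the `dbtable` option of a `spark.read.format("jdbc")` read.

```python
def delta_subquery(table, watermark_col, last_loaded):
    """Build a pushdown subquery for Spark's JDBC `dbtable` option so
    only rows changed since the last load are read from the source.

    Table/column names in the example are hypothetical.
    """
    return (f"(SELECT * FROM {table} "
            f"WHERE {watermark_col} > '{last_loaded}') AS delta_src")
```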

SAP Data Intelligence Tenant Automation

Created Python-based automation to hibernate unused SAP DI tenants, saving $100K–$200K annually. Included alerting, scheduled monitoring, and full REST API integration for governance and environment consistency across multiple landscapes.

SAP Data Intelligence · Python · APIs
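
The hibernation loop can be sketched as a REST-driven sweep over tenants: list them, then hibernate any running tenant idle past a threshold. The `/tenants` and `/hibernate` paths here are hypothetical stand-ins for the management API, and `session` is any object with requests-style `get`/`post` methods.

```python
def hibernate_idle_tenants(session, base_url, idle_days=14):
    """Hibernate tenants idle longer than idle_days.

    The /tenants and /hibernate endpoints are hypothetical placeholders
    for the management API; session is a requests-style client.
    """
    resp = session.get(f"{base_url}/tenants")
    hibernated = []
    for tenant in resp.json():
        if tenant.get("idle_days", 0) >= idle_days and tenant["status"] == "running":
            session.post(f"{base_url}/tenants/{tenant['id']}/hibernate")
            hibernated.append(tenant["id"])
    return hibernated
```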

SAP Integration with Kafka and Azure

Developed scalable, automated data pipelines from SAP BW/4HANA to AWS S3 and Azure Data Lake using Airflow, Databricks, and SAP Data Intelligence. Triggered pipelines via Autosys and ensured delta handling across 1B+ rows. Reduced latency by 80% and improved end-to-end data availability.

SAP Data Intelligence · Kafka · Azure Data Factory · Python
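
At this scale, delta handling often reduces to comparing a manifest of already-loaded files against the current source listing so each run processes only new arrivals. A minimal, assumed sketch of that step:

```python
def new_delta_files(previous_manifest, current_listing):
    """Return files present in the source listing but not yet loaded,
    sorted so batches replay deterministically across runs."""
    return sorted(set(current_listing) - set(previous_manifest))
```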
