Optimized data processing: Converted SAS scripts to PySpark and automated the data processing pipeline, incorporating
metadata scripts and orchestration JSON, achieving a 50% reduction in data processing time.
Data management: Organized and managed over 10 million records using Spark libraries (PySpark SQL and PySpark
MLlib), improving data accessibility.
Testing and orchestration: Implemented testing environments with Azure Databricks and Azure Data Factory, increasing
code testing efficiency by 50% and reducing debugging time by 35%.
Pipeline construction: Built a data pipeline to process 100 million raw records from 14 data sources, accelerating
reporting and analytics capabilities.
Education:
2018 - 2022
Bachelor of Engineering - Computer Science, Honors in Data Science
- Smt. Kashibai Navale College of Engineering, Pune
Relevant coursework: Operating Systems, Data Structures, Analysis of Algorithms, Artificial Intelligence, Machine Learning,
Networking, Databases, SQL, Java.