🏠 Working from home

Data Science | ML | AI | Data Analyst | Data Engineer

Projects 🚀

  • Tools: Data preparation, ML (Random Forest, XGBoost), scikit-learn
  • Description: Built a machine learning model to predict which chemical compounds can effectively fight HIV, helping researchers focus on the most promising candidates and skip compounds that are unlikely to work. For example, given 40,000 chemical compounds to test, rather than testing all of them in the lab (expensive and slow), the model predicts which ones are likely to work.
  • Tools: Modeling (Decision Tree, Random Forest, XGBoost), Correlation Analysis
  • Description: Objective 1: Determine whether songs can be classified into genres using only audio features. Objective 2: Identify what these audio features reveal about the distinctions between genres (Naive Bayes, Decision Tree, KNN).
  • Tools: Python, NLP, scikit-learn, NLTK
  • Description: Built sentiment classification models (Naive Bayes, Decision Tree, KNN) on customer feedback with accuracies of 78.66%–79.83%. Preprocessed text data using CountVectorizer, label encoding, and NLTK techniques. Created word clouds, sentiment distribution plots, and confusion matrices to visualize insights.
  • Tools: Python, Machine Learning, Streamlit
  • Description: Developed a web app using the Texas housing dataset (txhousing) to forecast sales trends. Enabled user-driven predictions for city-specific and global trends. Integrated EDA, model selection, and real-time output using a Streamlit-based UI.
  • Tools: Shell, Airflow and Kafka
  • Description: Extracted and integrated data from SQL queries, APIs, and web scraping using both ETL and ELT approaches. Designed scalable pipelines to feed a centralized data warehouse, improving data accuracy by 25%. Completed hands-on labs using Kafka and Airflow.
  • Tools: Python, scikit-learn, Deep Learning
  • Description: Predicted weekly sales for 45 Walmart stores using store-level sales data, holiday indicators, and economic factors. Performed data cleaning, feature engineering, and model tuning using Gradient Boosting, Decision Trees, and Deep Neural Networks, improving forecasting for inventory and promotion planning.
  • Tools: TensorFlow, Deep Learning, Image Segmentation
  • Description: Designed a U-Net-based segmentation model to detect crop rows in aerial field images. Optimized performance, measured by IoU (Intersection over Union), and improved the image preprocessing and annotation strategies.
  • Tools: R, Data Visualization, R Shiny, Statistics
  • Description: Explored county-level Midwest U.S. census data to identify patterns in income, education, and population distribution. Presented findings through statistical summaries and visualizations to uncover regional disparities.
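The crop-row project reports segmentation quality as IoU (Intersection over Union): the number of pixels where the predicted and ground-truth masks overlap, divided by the number of pixels covered by either mask. A minimal sketch of that metric, assuming NumPy and binary (0/1) masks of equal shape; the function name `iou` is illustrative and not taken from the project's code.

```python
import numpy as np


def iou(pred_mask: np.ndarray, true_mask: np.ndarray) -> float:
    """IoU = |P ∩ T| / |P ∪ T| for two binary masks of the same shape (illustrative helper)."""
    pred = pred_mask.astype(bool)
    true = true_mask.astype(bool)
    intersection = np.logical_and(pred, true).sum()  # pixels marked in both masks
    union = np.logical_or(pred, true).sum()          # pixels marked in either mask
    # Assumed convention: two empty masks count as a perfect match.
    if union == 0:
        return 1.0
    return float(intersection / union)


if __name__ == "__main__":
    # Toy 3x3 masks (hypothetical): 3 overlapping pixels, 5 pixels in the union.
    pred = np.array([[0, 1, 1],
                     [0, 1, 1],
                     [0, 0, 0]])
    true = np.array([[0, 0, 1],
                     [0, 1, 1],
                     [0, 1, 0]])
    print(iou(pred, true))  # → 0.6
```

Scoring two empty masks as 1.0 is only one convention; some evaluation code returns 0 or skips such images instead.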

Experience 💼

Data Assistant, Research Assistant | Indiana University | Indianapolis | Apr 2023 – Jun 2025

  • Tools: ML, AI, Chatbot, RAG, SQL, DBMS, DLSG software, Freeflow, Metadata, Batch Processing, ETL, Data warehouses, Delta lakehouse, Schema design.

Data Engineer/Data Analyst | Netcube Technologies | Bangalore, India | Jan 2019 – Feb 2022

  • Tools: SQL, NoSQL, Azure, Apache Airflow, GitHub, RESTful APIs, Flask, ETL/ELT, CI/CD pipelines, Data warehouses, Delta lakehouse, SAP ERP, Tableau, Power BI.

Associate Software Engineer | Tech Mahindra | Bangalore, India | Aug 2016 – Oct 2018

  • Tools: Oracle DB, HP ALM, Python, Automation testing scripts, Data warehouses, Litmus Magix, Selenium.

Education 🎓

  • Master of Science in Applied Data Science | Indiana University Indianapolis | Jan 2023 – May 2024 | Dean’s Scholarship Recipient

    • Coursework: Data Analytics using Python and R, Data Visualization, Deep Learning, Cloud Computing, DBMS, Statistics
  • Bachelor of Engineering | Mechanical, Aeronautics | Mangalore Institute of Technology and Engineering, VTU, India


Let's Connect! 🌐

I'm open to collaborating on interesting projects or discussing new opportunities. Feel free to reach out!


Pinned

  1. 1_Crop-row-detection

    Developed a deep learning model in Python to detect crop rows from input images, utilizing U-Net architecture with TensorFlow for image segmentation. Evaluated model performance using the Intersect…

    Jupyter Notebook

  2. 2_Drug-Efficacy-Prediction-Model

    Built a machine learning model to predict which chemical compounds can effectively fight HIV, helping researchers focus on the most promising candidates and skip compounds that are unlikely to work.

    Jupyter Notebook

  3. 3_Midwest-dataset-project-using-R

    In this project, I aim to conduct a comprehensive analysis of demographic and socioeconomic data for counties in the Midwest region of the United States. The dataset provides information on variou…

    R

  4. 4_Sales-Prediction-using-ML

    The project develops a sales prediction web app using the Texas housing dataset ('txhousing'). The goal is to provide insights into real estate sales trends using this dataset. I have used …

    Jupyter Notebook

  5. 5_Spotify-classification-R

    Automatic genre classification has long captivated researchers in Music Information Retrieval (MIR), seeking techniques to unravel the complex tapestry of musical diversity. We aim to delve into th…


  6. 6_Customer-sentiment-Analysis

    This project focuses on analyzing customer sentiment based on textual data, such as product reviews, feedback, or social media posts. The goal is to classify customer feedback into different sentim…

    Jupyter Notebook
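The customer sentiment work described in the Projects section turns feedback text into word counts with CountVectorizer and trains classifiers such as Naive Bayes on them. A minimal scikit-learn sketch of that kind of pipeline follows; the four reviews and their labels are hypothetical toy data, not the project's dataset, and the pipeline layout is an assumption rather than the project's actual code.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy feedback, for illustration only.
texts = [
    "great product, love it",
    "excellent service and great quality",
    "terrible experience, very bad",
    "bad quality, would not buy again",
]
labels = ["positive", "positive", "negative", "negative"]

# CountVectorizer builds a bag-of-words count matrix (lowercased tokens);
# MultinomialNB then learns per-class word probabilities (Laplace smoothing, alpha=1).
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)

# Words not seen during fit (e.g. "the") are ignored at prediction time.
print(model.predict(["love the great quality", "very bad service"]))
# → ['positive' 'negative']
```

Swapping `MultinomialNB()` for a decision tree or KNN classifier keeps the same vectorizer step, which is one way to compare several models on identical features.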