Projects


Human Multimedia Preference Correlation Analysis

  • Conducted a comprehensive analysis of human preferences for multimedia content (audio and images) to uncover correlations and trends across demographic groups.
  • Performed statistical correlation analysis and exploratory data analysis (EDA), and created visualizations to present trends and patterns.
  • Tools: Python (Pandas, NumPy, Matplotlib, Seaborn), Statistical Analysis, Data Visualization.
  • Impact: Analyzed 52 users' preferred audio and image samples (100 from each category) across various demographic groups. Identified significant correlations offering actionable insights for personalized content recommendations and user engagement.

WiDS Datathon 2024 - Equity in Healthcare

  • Analyzed real-world patient data (19,000 records) to predict metastatic cancer diagnosis periods using demographic, clinical, and environmental features.
  • Resolved missing and inconsistent values through imputation, normalization, and cleaning, then performed exploratory data analysis (EDA) to uncover trends and relationships.
  • Built predictive models to evaluate the impact of geo-demographic and climate factors on cancer diagnoses.
  • Tools: Python (Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn), Feature Engineering, Predictive Modeling.
  • Impact: Achieved 84% accuracy in predicting metastatic diagnosis periods, providing insights into environmental and demographic factors influencing cancer outcomes for targeted prevention strategies.

Cervical Cancer Analysis

  • Conducted a comprehensive analysis of demographic, activity, and health data from 858 patients to predict cervical cancer diagnosis using machine learning techniques.
  • Detected and removed outliers using the isolation forest method and applied feature engineering to enhance model accuracy.
  • Tools: Python (Scikit-learn, Pandas, NumPy), Statistical Analysis.
  • Impact: Achieved a maximum accuracy of 95%, demonstrating a cost-effective and scalable approach to supporting early-stage diagnosis and timely medical intervention.

Stock Market Price Prediction

  • Analyzed historical stock market data to predict future stock prices using machine learning techniques.
  • Performed time-series analysis and feature engineering on historical price and volume data, applying Long Short-Term Memory (LSTM) networks for prediction.
  • Tools: Python (Pandas, TensorFlow, Keras, Scikit-learn), Time-Series Forecasting, LSTM, Data Visualization.
  • Impact: Trained a model on 30,000+ records of stock market data, achieving 90% accuracy in predicting stock price movements and supporting more informed portfolio management decisions.

Customer Churn Prediction

  • Analyzed customer behavior data to predict churn using classification techniques such as logistic regression, random forests, and gradient boosting.
  • Performed feature selection, data preprocessing, and hyperparameter tuning to enhance model performance.
  • Tools: Python (Scikit-learn, XGBoost, Matplotlib, Seaborn), Predictive Modeling.
  • Impact: Achieved 88% accuracy, helping businesses identify at-risk customers and improve retention strategies.

Fake News Detection System

  • Developed a machine learning system to detect fake news using natural language processing (NLP) techniques.
  • Extracted features using TF-IDF and word embeddings; implemented classification models such as SVM and neural networks.
  • Tools: Python (NLTK, Scikit-learn, TensorFlow), NLP, Machine Learning.
  • Impact: Achieved an F1-score of 92%, enhancing the reliability of news dissemination platforms.

Retail Sales Forecasting

  • Built a forecasting system for retail sales using machine learning and time-series models such as ARIMA and Prophet.
  • Analyzed historical sales data to predict future demand, reducing overstock and understock scenarios.
  • Tools: Python (Pandas, Statsmodels, Prophet, Matplotlib), Time-Series Analysis.
  • Impact: Improved sales prediction accuracy by 15%, optimizing inventory management and resource allocation.