👋

Debosmita Chatterjee

Data Scientist & Machine Learning Engineer

💡 About Me

I am building my career in Data Science and Artificial Intelligence. This portfolio tracks my projects, contributions, and publications.

My background is in mechanical engineering. I am passionate about data analytics, numbers, and programming, and I also bring domain knowledge in banking, finance, and healthcare, along with development experience.

I am actively looking for roles where I can help companies optimize their business and product development by providing meaningful data insights through Python, SQL, and modern analytics tools.

Feel free to contact me if you find my website useful and think I might be a good fit, or if you would simply like to connect ☕

👩‍🔬 What I Possess as a Data Scientist

In my view, domain knowledge and the ability to learn on the fly are far more valuable than a textbook understanding of machine learning algorithms. My strengths as a Data Scientist are:

Technical Excellence

  • Build simple, robust models that generalize well
  • Understand that ML models don't always need to be complicated
  • Strong analytical thinking with problem-solving skills

Business Acumen

  • Express complex issues in a simple, transparent manner
  • Detail-oriented, with an understanding of the competitive business landscape
  • Strong presentation and communication skills

✍️ Technical Skills

Programming & Tools

Python SQL Git Command Line

Data Science Libraries

Pandas NumPy Matplotlib Scikit-learn

Visualization & BI

Tableau Power BI Plotly Seaborn

Machine Learning

Classification Regression Clustering Deep Learning

Specialized Areas

NLP Computer Vision Cloud Computing AWS

Statistics & Analysis

Statistics Probability A/B Testing Data Mining

Ready to apply my Data Science skills to solve business problems. Full-time, contract, hybrid, remote, or freelance opportunities are welcome.

🚀 Featured Projects

Policy Lapse Prediction – Life Insurance

Policy lapses are a critical challenge in the life insurance industry, leading to revenue loss and disrupted customer relationships. Built ML models to predict policy lapses using customer and policy-related data. Experimented with Random Forest and XGBoost variants, achieving 85%+ accuracy, and used RFE for feature selection to reduce model complexity by 40%.

Python Scikit-learn XGBoost RandomForest RFE Streamlit
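
A minimal sketch of recursive feature elimination (RFE) wrapped around a Random Forest, to illustrate the kind of feature selection described above. The data is synthetic, and the feature counts (keeping 12 of 20) are assumptions chosen only for illustration; this is not the project's actual pipeline or dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split

# Synthetic stand-in for customer and policy features (assumption, not project data).
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# RFE repeatedly fits the estimator and drops the least important features
# (two per round here) until n_features_to_select remain.
selector = RFE(
    estimator=RandomForestClassifier(n_estimators=50, random_state=0),
    n_features_to_select=12,  # illustrative choice
    step=2,
)
selector.fit(X_train, y_train)

selected_mask = selector.support_  # boolean mask of the kept features
# score() reduces X_test to the kept features and scores the refit estimator.
test_accuracy = selector.score(X_test, y_test)

print(f"kept {int(selected_mask.sum())} of {X.shape[1]} features")
print(f"held-out accuracy: {test_accuracy:.2f}")
```

In practice the number of features to keep is usually tuned, for example with cross-validation, rather than fixed up front.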

Auto Insurance Claims Fraud Detection

Identified potentially fraudulent auto insurance claims using machine learning techniques. Processed 10,000+ insurance records with comprehensive EDA through Dataiku DSS. Achieved 92% precision in fraud detection while maintaining low false positive rates, potentially saving significant claim costs.

Python SQL PostgreSQL Tableau Dataiku DSS
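
For readers less familiar with the metrics mentioned above, here is a small sketch of how precision and false positive rate are computed from binary predictions. The labels are toy values made up for the example (1 = fraud, 0 = legitimate) and are unrelated to the project's results.

```python
def precision_and_fpr(y_true, y_pred):
    """Return (precision, false positive rate) for binary labels where 1 = fraud."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fraud flagged as fraud
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate flagged as fraud
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed through
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return precision, fpr


# Toy labels for illustration only.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 0, 0, 0, 0]

precision, fpr = precision_and_fpr(y_true, y_pred)
print(f"precision={precision:.3f}, false positive rate={fpr:.3f}")
```

Precision answers "of the claims flagged, how many were really fraud?", while the false positive rate answers "of the legitimate claims, how many were wrongly flagged?".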

Stock Analysis & Forecasting Dashboard

Interactive Streamlit app for analyzing, forecasting, and comparing stock market data using ML and time series models. Implements technical indicators, clustering algorithms (KMeans, DBSCAN), and statistical forecasting. Processes real-time data from multiple stock exchanges with an automated feature engineering pipeline.

Streamlit yfinance Scikit-learn Statsmodels KMeans DBSCAN
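
As an illustration of the technical indicators mentioned above, the sketch below computes a simple moving average and daily returns in plain Python. The function names and the closing prices are hypothetical; the app itself may compute its indicators differently.

```python
def simple_moving_average(prices, window):
    """Return the SMA series; the first window - 1 entries are None (not enough data)."""
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    running = 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            running -= prices[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


def daily_returns(prices):
    """Return the fractional change between consecutive prices."""
    return [(curr - prev) / prev for prev, curr in zip(prices, prices[1:])]


# Made-up closing prices for illustration.
closes = [100.0, 102.0, 101.0, 105.0, 107.0]
sma3 = simple_moving_average(closes, 3)
returns = daily_returns(closes)

print(sma3)
print(returns)
```

Features like these are typical inputs to clustering or forecasting steps, since they summarize trend and volatility more compactly than raw prices.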

Image-to-Speech GenAI Tool Using LLM

End-to-end Multimodal Generative AI application that converts uploaded images into narrated audio stories. Combines computer vision, large language models, and text-to-speech synthesis. Deployed using Streamlit and Hugging Face Spaces, generating creative narratives in under 5 seconds with seamless audio synthesis.

Transformers Langchain OpenAI Streamlit Hugging Face

NovaShield – AI Policy Assistant

AI-powered virtual insurance assistant that helps policyholders find answers and navigate policy actions. Built with RAG architecture using FAISS vector search and intent extraction. Handles 50+ policy document types with 95%+ accuracy in intent classification, automatically escalating delays beyond SLA thresholds.

RAG FAISS SentenceTransformers pdfplumber Streamlit
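
The retrieval step of a RAG pipeline can be sketched as follows. To keep the example dependency-free, a toy word-count embedding and a brute-force cosine search stand in for SentenceTransformers and FAISS; the policy text chunks and function names are invented for illustration and are not taken from the project.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a sentence-embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=1):
    """Return the k chunks most similar to the query (brute force, no index)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical policy document chunks.
chunks = [
    "premium payments are due on the first day of each month",
    "to file a claim submit the form online within thirty days",
    "the policy can be cancelled at any time with written notice",
]

top = retrieve("how do i file a claim", chunks, k=1)
print(top[0])
```

In a full pipeline, the retrieved chunks would be passed to a language model as context for answering the policyholder's question; a vector index replaces the brute-force sort once the document set grows.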

NLP Text Extractor & Classifier API

Hybrid NLP-powered Flask API that classifies customer service requests and extracts relevant entities using ML, spaCy NER, and regex rules. Deployed on AWS ECS Fargate with Swagger documentation for seamless integration. Processes 1000+ requests/hour with sub-100ms latency using containerized serverless architecture.

Flask spaCy Scikit-learn Docker AWS ECS Swagger
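
A minimal sketch of the hybrid idea described above: classify a request, then pull out entities with regex rules. Here keyword matching stands in for the ML classifier and spaCy NER, and the intent names, keywords, and policy-number format (`POL-` plus six digits) are all hypothetical.

```python
import re

# Hypothetical patterns and keywords; a real service would tune these to its data.
POLICY_RE = re.compile(r"\bPOL-\d{6}\b")
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
INTENT_KEYWORDS = {
    "address_change": ("address", "moved", "relocat"),
    "claim_status": ("claim", "status"),
    "cancellation": ("cancel", "terminate"),
}


def classify(text):
    """Pick the intent with the most keyword hits, or 'unknown' if none match."""
    lowered = text.lower()
    scores = {
        intent: sum(kw in lowered for kw in keywords)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def extract_entities(text):
    """Extract policy numbers and email addresses with regex rules."""
    return {
        "policy_numbers": POLICY_RE.findall(text),
        "emails": EMAIL_RE.findall(text),
    }


request = "Please update the address on policy POL-123456 and confirm to jane.doe@example.com"
result = {"intent": classify(request), "entities": extract_entities(request)}
print(result)
```

Regex rules are a good fit for entities with a rigid format, while a trained model handles free-form entities such as names and locations; combining the two is what makes the approach hybrid.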

📞 Get In Touch

Interested in collaborating or have a project in mind? I'd love to hear from you!