Name: Anish Man Gurung

Profile: Software Developer

Location: Kiel, Germany

Email: anishm.sde@gmail.com

Skills

Python
PostgreSQL, Redshift, MySQL, ClickHouse
Data Manipulation (Pandas, Polars, NumPy, PySpark)
Machine Learning (Supervised, Unsupervised algorithms)
Deep Learning (Neural networks, CNN, TensorFlow)
LLMs (OpenAI, Hugging Face, LangChain)
AWS (EC2, S3, Lambda, Athena, SageMaker, Glue, MWAA)
Apache Airflow
Apache Kafka
Statistics (Hypothesis tests, Probability, Bayes' theorem, A/B Testing)
Web Application Frameworks (FastAPI, Flask, Django, Next.js)
Git
Docker
Tableau
Looker Studio
Scrum
About me

Moin Moin! My name is Anish, and I’m excited to share a bit about my professional journey.

I am currently working as a Data Scientist at a startup, where I help optimize ecommerce performance by leveraging data science, data engineering, and cloud technologies. My work focuses not only on building machine learning models but also on designing robust data pipelines, ensuring reliable data infrastructure, and enabling data-driven decision-making across the organization.

With over five years of professional experience, I have worked across both fast-paced startups and multinational environments. This experience has allowed me to develop strong expertise in Python, backend development, and modern data engineering practices. I specialize in building scalable data pipelines, transforming raw data into meaningful insights, and developing analytics solutions that help businesses improve operational efficiency, increase revenue, and make informed strategic decisions.

I am passionate about helping organizations transform their business processes through data engineering and analytics. From designing data workflows and integrating cloud-based systems to developing machine learning solutions and analytics dashboards, I enjoy working across the full data lifecycle to deliver measurable business impact.

I hold a Master’s degree in Data Science from Kiel University of Applied Sciences, Germany, and a Bachelor’s degree in Electrical and Electronics Engineering from Kathmandu University, Nepal. I also completed a Post Graduate Diploma in Data Science from Great Learning, India.

Beyond my core focus on data science and engineering, I have strong experience in backend software development and working with full-stack systems. This enables me to build end-to-end data products, from data ingestion and processing to deployment and integration with production systems.

Outside of work, I enjoy football and weightlifting, which help me maintain balance and discipline. I am also an avid dog lover and value spending quality time with my family and pets.

I am deeply enthusiastic about artificial intelligence, data engineering, and software development, and I am driven by the opportunity to help businesses unlock the full potential of their data.

Thank you for taking the time to learn about me. I welcome opportunities to connect via email or LinkedIn.

Work Experience

Aug 2024 - Present

Data Scientist

Uptain, Hamburg, Germany

Developing data science solutions for an ecommerce optimization tech product: running A/B tests, performing data analysis, building ETL pipelines and machine learning models, and developing Looker Studio dashboards.

July 2023 - July 2024

Working Student

Smiths Detection, Norderstedt, Germany

Prepared data and trained neural networks for segmentation of 3D X-ray images. Optimized hyperparameters and training setups using TensorFlow.

Oct 2022 - Feb 2023

Software Engineer

Cotiviti Nepal, Kathmandu, Nepal

Built data ingestion pipelines for US medical data using an in-house platform based on Apache Hadoop and Spark. Transformed raw source data into a standardized format using SQL scripts.

Oct 2020 - June 2022

Software Engineer

GBT Technologies, Kathmandu, Nepal

Software development using the Flask, Django, and Odoo frameworks. Added frontend and backend features to larger projects. Developed REST APIs, integrated third-party APIs, and handled application version control and deployment using Git and Docker. Designed and led several software projects with extensive use of Python, web frameworks, SQL, and frontend technologies.

June 2020 - August 2020

Data Science Developer Intern

Numeric Mind Technologies, Kathmandu, Nepal

Developed a Python-based data analytics GUI application using PyQt.

Projects

Favourite Mortgages UK

I designed and developed a fully functional Next.js website for a mortgage company, handling the entire process from client communication to implementation. The site features responsive desktop and mobile designs, interactive forms, and integrated Google Analytics for tracking user interactions. I managed both the frontend and backend development, delivering a professional, user-friendly platform that is actively used by the company to engage clients and streamline their processes.
Next.js, TypeScript, CSS, Vercel

LLM Based English Newsletter

I built an AI-powered LLM newsletter application that delivers personalized English stories to subscribers on a weekly basis. Each story is customized according to the user’s preferred genres and English proficiency level, while also highlighting key vocabulary and grammar rules to support language learning. The application leverages large language models to generate engaging content tailored to each user, providing an interactive and educational reading experience that combines storytelling with practical language improvement.
Python, LangChain, MySQL, LLM, AWS, FastAPI

Churn Prediction

The Telecom Churn Prediction web application predicts customer churn in the telecom industry using machine learning. The project begins with exploratory data analysis in Matplotlib to identify patterns and trends in customer behavior. Feature selection is then performed using statistical hypothesis tests, so that only the most influential features enter the predictive model. Several models are trained and tuned, including Random Forest, Logistic Regression, and Support Vector Machines, with hyperparameter tuning guided by recall, precision, accuracy, and AUC score. The tuned Random Forest classifier performs best and is deployed as a Django web application, where users can input customer data and receive churn predictions in real time. Docker is used for deployment, with AWS S3 hosting static files and EC2 hosting the application, ensuring scalability, reliability, and accessibility for users across platforms and locations.
Python, SQL, Machine Learning, Pandas, Matplotlib, AWS, Django, Docker
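A minimal sketch of the tuning-and-evaluation step from this project, using a synthetic dataset as a stand-in for the telecom data (the grid values, sample sizes, and feature counts are illustrative, not the project's actual settings):

```python
# Hedged sketch: synthetic data stands in for the real telecom dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import precision_score, recall_score, roc_auc_score

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Hyperparameter tuning over a small illustrative grid, scored on recall.
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [5, None]},
    scoring="recall",
    cv=3,
)
grid.fit(X_train, y_train)

# Evaluate the tuned model on held-out data with the metrics named above.
pred = grid.predict(X_test)
proba = grid.predict_proba(X_test)[:, 1]
print(recall_score(y_test, pred), precision_score(y_test, pred),
      roc_auc_score(y_test, proba))
```

The tuned estimator (`grid.best_estimator_`) would then be serialized and served behind the Django view.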

Term Deposit Prediction

The project's primary objective is a term-deposit classifier that helps a bank identify high-value clients. Data visualization with Seaborn and Matplotlib provides initial insight into the dataset's characteristics and distributions, laying the groundwork for data preprocessing and feature engineering. Several machine learning models, including Support Vector Machines, K-Nearest Neighbours, and XGBoost, are trained on the refined dataset and tuned with nested cross-validation, using recall and precision to evaluate each model. XGBoost emerges as the best model and is saved as a pickle file, so it can be loaded to make predictions on entirely new datasets and support the bank's client retention and revenue strategies.
Python, Machine Learning, Pandas, Matplotlib, Nested Cross-Validation, Classification
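The nested cross-validation pattern used in this project can be sketched with scikit-learn. An SVC stands in for XGBoost here purely to keep the example self-contained, and the data and grid are illustrative:

```python
# Hedged sketch of nested cross-validation: the inner loop tunes
# hyperparameters, the outer loop estimates generalization performance.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, random_state=0)

# Inner loop: hyperparameter search over an illustrative grid.
inner = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=3)

# Outer loop: unbiased performance estimate of the whole tuning procedure.
scores = cross_val_score(inner, X, y, cv=5)
print(scores.mean())
```

The key point is that the outer folds never influence the hyperparameter choice, which avoids the optimistic bias of tuning and evaluating on the same split.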

Readmission Prediction

The Diabetic Patient Readmission Prediction and Patient Profiling project was a team effort carried out under the supervision of a professor. Working with US medical data and guided by literature reviews, the team formulated a methodology covering the full data science workflow: exploratory analysis, feature engineering, model training, and evaluation, with each member contributing expertise in data analysis, machine learning, or domain knowledge. Regular progress updates and feedback sessions with the professor helped refine the approach and resolve obstacles along the way. The result is a readmission prediction model together with a clustering-based patient profiling system, both intended to inform diabetic patient care strategies.
Python, Machine Learning, Pandas, Feature Engineering, Statistics, Hyperparameters Tuning, EDA, Clustering

NYC Violence Visualization

The goal of this project is to provide a comprehensive picture of violence in New York City. By visualizing murders per 100 shootings in each borough, the age and race distribution of perpetrators, shootings by hour of the day, and the months with the most shootings, the project identifies high-crime areas and highlights demographic and temporal patterns.
Python, Pandas, GeoPandas, Altair, Matplotlib, NumPy

News Classifier in Flask

The News Text Classification project predicts the category of a news article among five predefined classes. Text data is preprocessed by removing stop words and applying regex cleaning, and sequences are padded to a uniform length for consistent model input. A Keras embedding layer converts token sequences into dense vectors, helping the model capture semantic relationships in the text. Classification is handled by a Long Short-Term Memory (LSTM) network followed by a dense softmax layer that assigns a probability to each class. The trained model is deployed in a Flask web application, where users can enter text and receive a predicted category in real time.
Python, Preprocessing, Tokenizer, NLP, Embedding, LSTM
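The architecture described above (embedding layer, LSTM, softmax output) can be sketched in Keras. Vocabulary size, sequence length, and layer widths below are illustrative, not the project's actual values:

```python
# Hedged architecture sketch: embedding -> LSTM -> 5-way softmax.
import numpy as np
from tensorflow.keras import layers, models

VOCAB, SEQ_LEN, CLASSES = 5000, 100, 5  # illustrative sizes

model = models.Sequential([
    layers.Embedding(VOCAB, 64),             # token ids -> dense vectors
    layers.LSTM(32),                         # sequential feature extraction
    layers.Dense(CLASSES, activation="softmax"),  # class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# A dummy batch of padded token-id sequences, just to show the shapes.
probs = model.predict(np.random.randint(0, VOCAB, size=(2, SEQ_LEN)),
                      verbose=0)
print(probs.shape)  # (2, 5)
```

In the deployed app, the Flask view would tokenize and pad the submitted text with the same preprocessing pipeline before calling `model.predict`.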

Marketing-Campaign-Clustering

This project includes a fictional business narrative explaining why the analysis matters and how it can create value for a retail store. Technically, it covers exploratory data analysis, feature engineering, and a comparison of different clustering algorithms, along with an analysis of the clusters from a business point of view. For the clustering, I used KMeans and OPTICS/DBSCAN, with and without PCA applied to the dataset; a silhouette coefficient score plot and a reachability plot are used to tune the KMeans and OPTICS models. The dataset has no distinctive boundaries between clusters, which makes the task challenging, but with feature engineering on the continuous features, reasonable clusters are found and analyzed.
Python, Pandas, Feature Engineering, EDA, PCA, Clustering
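The silhouette-based tuning of KMeans described above can be sketched as follows, with synthetic blobs standing in for the campaign dataset and an illustrative range of cluster counts:

```python
# Hedged sketch: pick the cluster count k with the best silhouette score.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)  # in [-1, 1], higher is better

best_k = max(scores, key=scores.get)
print(best_k)
```

On real campaign data the silhouette curve is typically flatter than on clean blobs, which is why the project also inspects the OPTICS reachability plot before settling on a segmentation.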

Ebay Kleinanzeigen Apartment Scraper and Notifier

This project uses Python, Selenium, and a cron job scheduler, deployed on Heroku, to continuously monitor apartment listings on eBay Kleinanzeigen. Selenium scrapes new listings from the website and filters them against predefined criteria such as location, price range, and size. A cron job runs the scraping script every 5 minutes for near-real-time monitoring, and when matching apartments are found, the program emails the user the details of the available listings. This automation streamlines the apartment search, keeping users updated on new listings without manual intervention.
Python, Selenium, Cron job, Heroku