@galkowskim
Mikołaj Gałkowski
MSc. Data Science @ MiNI WUT
Junior Machine Learning Engineer

About me

Passionate and innovative, I am a driven Machine Learning Engineer and Data Scientist with 1.5 years of industry experience. I excel in leveraging cutting-edge technologies within Machine Learning and Deep Learning to solve complex real-world problems. My focus areas include Large Language Models (LLMs), Natural Language Processing (NLP), MLOps, and multimodal data analysis. I thrive on collaborating with diverse teams to develop scalable solutions that drive business growth and innovation. Outside of work, I enjoy participating in hackathons to continually challenge myself and learn from others. I am deeply excited about the endless possibilities of data science and its potential to transform industries.

Languages: Python, Java, R, SQL, HTML, CSS, JavaScript (basics), Bash
Frameworks: Django, Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn, NLTK, Keras, TensorFlow, PyTorch, transformers, MLflow, Airflow
Tools: Git, GitHub Actions, Jupyter Notebook, PyCharm, Visual Studio Code, Anaconda, Docker, Linux

Projects

Unconditional Image Generation using Diffusion Models

This project, part of the Deep Learning course at Warsaw University of Technology, investigated various diffusion models for image generation. We implemented and tested three models—DDPM, DDIM, and PNDM—using the LSUN Bedroom dataset to generate 128x128 pixel images. We trained these models with PyTorch and the Hugging Face Diffusers library, comparing their performance over different epochs. Evaluations included FID scores and visual interpolation assessments. Comprehensive and reproducible results are detailed in the repository's README.
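As a rough illustration of the idea behind DDPM, the sketch below shows the forward (noising) process in plain Python: a clean sample is mixed with Gaussian noise according to a cumulative schedule. It is not the code from the repository, which relies on PyTorch and Diffusers; the function names, the linear schedule, and its default endpoints are assumptions chosen for the example.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed schedule, for illustration)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [signal * v + sigma * rng.gauss(0.0, 1.0) for v in x0]


# Hypothetical usage: noise a tiny "image" given as a flat list of pixel values.
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
noisy = add_noise([0.5, -0.25, 1.0], t=999, alpha_bars=alpha_bars, rng=random.Random(0))
```

At the last step almost no signal remains, which is why sampling can start from pure noise. DDIM and PNDM differ mainly in how the reverse process is stepped, not in this forward process.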

Python
Torch
diffusers
DDPM
DDIM
PNDM
project
Speech Commands classification

The project, undertaken as part of the Deep Learning course at Warsaw University of Technology, aimed to develop custom RNN architectures and compare their performance with Whisper and the Audio Spectrogram Transformer. All models were trained with PyTorch on the Speech Commands dataset. Comprehensive results, documented in the repository's README file, are reproducible for further analysis.

Python
Torch
RNN
transformers
Whisper
Audio Spectrogram Transformer
matplotlib
project
Image classification using CNN

The project, undertaken as part of the Deep Learning course at Warsaw University of Technology, aimed to develop custom CNN architectures and evaluate their performance against pretrained models. Utilizing the CINIC-10 dataset, all models were trained using PyTorch. Comprehensive results, documented in the repository's README file, are reproducible for further analysis.

Python
Torch
CNN
matplotlib
project
Logistic Regression From Scratch

The project, undertaken as part of the Advanced Machine Learning course at Warsaw University of Technology, aimed to implement Logistic Regression from scratch with three different optimizers: SGD, IRLS, and Adam. Additionally, their performance on a binary classification task was evaluated alongside LDA, QDA, Decision Tree, and Random Forest. Comprehensive results, documented in the repository's README file, are reproducible for further analysis.
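To give a flavour of what "from scratch" means here, the sketch below fits logistic regression with plain per-sample SGD using only the standard library. It is a minimal illustration rather than the repository's implementation: the function names, hyperparameters, and toy data are all assumptions, and IRLS and Adam are not shown.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic_sgd(X, y, lr=0.5, epochs=300, seed=0):
    """Fit weights and bias by per-sample SGD on the log-loss."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a fresh random order each epoch
        for i in order:
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            err = p - y[i]  # gradient of the log-loss with respect to the logit
            w = [wj - lr * err * xj for wj, xj in zip(w, X[i])]
            b -= lr * err
    return w, b


def predict(X, w, b):
    """Threshold the predicted probability at 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0
            for x in X]


# Toy, linearly separable data (hypothetical, for illustration only).
X_toy = [[0.0, 0.2], [0.3, 0.1], [0.2, 0.4], [1.0, 0.9], [0.8, 1.1], [1.2, 0.7]]
y_toy = [0, 0, 0, 1, 1, 1]
w_toy, b_toy = fit_logistic_sgd(X_toy, y_toy)
```

Calling `predict(X_toy, w_toy, b_toy)` afterwards gives the fitted labels for the toy points. Swapping the update rule inside the inner loop is where an Adam variant would differ.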

Python
Numpy
matplotlib
project
Students helper - Django Web App

A Django web app that helps students manage their studies by providing a platform to create and manage tasks and projects. Each task can have a priority level. The app also includes a translator tool for notes and assignments in different languages, which can be useful for international students or those studying in a foreign language.

Python
Django
HTML
CSS
project
Messenger Analysis Web App

A dashboard built in Streamlit that presents an analysis of Messenger data. The project required a pipeline for data processing and analysis.

Python
Streamlit
Pandas
Numpy
Matplotlib
Docker
GitHub Actions
Bash
project
Phone activity classification & Document Clustering

Two projects from my studies at Warsaw University of Technology. The first one is a classification task, where I had to predict the activity of a person based on the data from their phone. The second one is a clustering task, where I had to cluster documents based on their content.

Python
Sklearn
Pandas
Numpy
Matplotlib
NLTK
project
A poster on environmental protection within the subject of Data Visualization Techniques

The poster was created in Canva and features charts and graphs made with R.

R
ggplot
dplyr
tidyr
project

Experience

December 2023 - March 2024: Junior Machine Learning Engineer - Grid Dynamics

  • Utilizing acquired expertise in 2 client projects
    1. NLP project
      • Enhancing product recognition algorithms by refining regex pattern matching and incorporating algorithms based on Levenshtein distance and fuzzy search.
      • Working in a three-person team and contributing to collaborative problem-solving aimed at optimizing product recognition algorithms.
      • Data cleaning and preprocessing using pandas
    2. Optimization project
      • Assisting in refactoring the code base and addressing performance issues on Databricks
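The Levenshtein-based matching mentioned under the NLP project can be sketched as follows. This is a generic illustration rather than client code: the function names, the `max_distance` threshold, and the sample catalog are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance computed with a two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string as the row to save memory
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def best_match(query, catalog, max_distance=2):
    """Return the closest catalog entry within max_distance, else None."""
    if not catalog:
        return None
    normalized = query.strip().lower()
    scored = [(levenshtein(normalized, name.lower()), name) for name in catalog]
    distance, name = min(scored)
    return name if distance <= max_distance else None


# Hypothetical catalog: a misspelled query still resolves to the right product.
catalog = ["iPhone 13", "Galaxy S23"]
match = best_match("iphnoe 13", catalog)
```

A fixed distance threshold is the simplest choice; scaling it with the length of the query is a common refinement for short product names.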

June 2023 - December 2023: Machine Learning Engineer Intern - Grid Dynamics

July 2022 - December 2022: NLP Intern - Samsung R&D

  • Development of Bixby, working on automation tools for linguists
  • Improving deep learning model performance in the NLU area (mainly TensorFlow/transformers; a significant 5% accuracy improvement on production data)
  • Implementing features in Android project written in Java
  • Technical Skills: Python with TensorFlow and Hugging Face, Java (Android project), Linux tools, Scripting (Bash), Git, GitHub, GitHub Actions.
  • Soft Skills: Teamwork, Time Management, Communication.