I am Sumit Kumar

Passionate about building scalable data pipelines, real-time analytics systems, and exploring machine learning solutions for cybersecurity and financial wellness domains.

Open to Work

About Sumit

I'm Sumit Kumar, a data-driven problem solver with experience in building large-scale ETL systems, developing real-time data pipelines, and working with high-impact data products across FinTech and cybersecurity domains.

I believe in the power of data to transform businesses and lives. My focus is on creating robust, scalable solutions that not only solve immediate problems but enable future growth.

Experience

Dec 2024 to Present

Data Engineer

Gen Digital (formerly NortonLifeLock)

Engineered a custom Java-based Kafka Connect SMT library to process Debezium CDC events from AWS DocumentDB, eliminating...

Jul 2021 to Nov 2024

Data Engineer

Tata Consultancy Services (Client: PayPal)

Data Migration Framework (Mar 2023 to Nov 2024): Developed a scalable ETL framework for PayPal’s data migration using Pyth...

May 2020 to Jul 2020

Data Science Research Intern

NIT Patna (Internship)

Developed a real-time forest fire detection system using Python-based ML algorithms and fuzzy logic, achieving 90% accur...

Technical Skills

Technologies and tools I work with professionally

Python: Expert
C++: Advanced
Java: Advanced
Shell/Bash: Intermediate
BigQuery: Intermediate
MySQL: Advanced
Oracle: Intermediate
DocumentDB: Intermediate
Athena: Intermediate
DynamoDB: Intermediate
AWS: Advanced
GCP: Advanced
React: Intermediate
Django: Intermediate
FastAPI: Intermediate
Flask: Intermediate
Spring Boot: Intermediate
Git: Advanced
TeamCity: Intermediate
Jenkins: Intermediate
VS Code: Intermediate
IntelliJ IDEA: Intermediate
PyCharm: Intermediate
Unix Commands: Intermediate
PySpark: Intermediate
ETL: Advanced
Airflow: Intermediate
Kafka (AWS MSK): Intermediate
Machine Learning: Advanced
Generative AI: Intermediate

My Work / Projects

Showcase of my professional work, internships, and side projects.

Side Projects
Smaxiso Writes
Nov 2025 to Present

A collection of poetry exploring themes of love, life, and introspection. Hosted poetry portfolio.

Web App, Poetry, Hosted

Side Projects
AI Hub
2025-11 to Present

Centralized hub for AI models and tools, streamlining access and management of various artificial intelligence resources.

AI, Machine Learning, Web App

Side Projects
Local RAG
2025-11 to Present

Local Retrieval-Augmented Generation system for private document chat, enabling secure and offline AI interactions with personal data.

Python, LLM, RAG

Work Experience
Real-time Transaction Normalization & Scalable Data Lake Design
2024-12 to Present

Built a real-time transaction normalization pipeline using Kafka (AWS MSK), ECS, and Java (Spring Boot) for fraud detection. Optimized the normalization service for vendor compatibility. Designed a scalable Data Lake using Apache Hudi on S3 and optimized ETL pipelines for Athena querying. Built a Java-based test automation service.

Python, Java, Kafka (AWS MSK), ECS, +9 more

Side Projects
School Chale Ham
2023-11 to Present

An academic blogging platform for K-12 education. Features include efficient blog creation and management. Backend powered by Express.js with MongoDB; frontend built with Next.js.

Express.js, MongoDB, Next.js

Side Projects
CLI Chat Bot
2025-11 to 2025-11

Command-line interface chatbot focused on developer productivity, offering quick access to tools and information via terminal.

Python, CLI, Chatbot

Side Projects
VS Code Productivity Extension
2025-10 to 2025-10

Custom Visual Studio Code extension built to enhance developer workflows and automate repetitive coding tasks. Features include enhanced markdown previewing and snippet automation.

TypeScript, VS Code API, Marked.js

Side Projects
Contextual News System
2025-07 to 2025-08

AI-driven news aggregation system using NLP for content personalization and relevance.

AI, NLP, News Aggregation

Work Experience
Data Migration Framework
2023-03 to 2024-11

Developed a scalable Data Migration Framework for a global payments company using Python, AWS, GCS, and BigQuery. Reduced data migration time by 20% and improved scalability by 30%.

Python, AWS, GCS, BigQuery

Work Experience
Lynx Framework Optimization
2024-01 to 2024-05

Optimized the Lynx entity linkage framework, improving accuracy and efficiency. Enhanced the Locality-Sensitive Hashing (LSH) algorithm, reducing nearest neighbor search time by 40%. Streamlined feature aggregation pipelines, reducing processing latency by 25%.

PySpark, Scala, BigQuery, GCP (Dataproc, GCS), +1 more

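For readers unfamiliar with Locality-Sensitive Hashing, here is a minimal sketch of one common variant, random-hyperplane LSH for cosine similarity. It is not the Lynx implementation (which the description says runs on PySpark and Scala); the function names, the seed, and the number of bits are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, num_bits, seed=0):
    """Draw num_bits random hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vector)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group record keys into buckets by signature."""
    index = defaultdict(list)
    for key, vec in vectors.items():
        index[signature(vec, hyperplanes)].append(key)
    return index


def candidates(query, index, hyperplanes):
    """Return only the keys sharing the query's bucket, not the whole dataset."""
    return index.get(signature(query, hyperplanes), [])


planes = make_hyperplanes(dim=3, num_bits=8)
index = build_index({"a": [1.0, 2.0, 3.0], "b": [-1.0, -2.0, -3.0]}, planes)
print(candidates([2.0, 4.0, 6.0], index, planes))
```

Vectors pointing in similar directions tend to land in the same bucket, so a nearest neighbor search only has to compare the query against that bucket's members. Production systems usually combine several such hash tables to trade recall against speed.
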
Side Projects
Google Search Pro
2024-08 to 2024-02

Advanced search scraper and utility tool designed for enhanced information retrieval and data extraction efficiency.

Python, Scraping, Automation

Side Projects
Android Bloatware Removal Guide
2023-10 to 2023-10

A basic blog webpage created for the Android bloatware removal guide. Simple and informative design for easy navigation.

HTML, CSS

Work Experience
Reporting Framework
2021-08 to 2023-01

Developed on-demand merchant reporting solutions, improving data accuracy by 15% and reducing report generation time by 25%. Contributed to the Argo Framework for report generation at scale.

Python, SQL, Apache Spark, GCP

Internship
Fuzzy Control System for Forest Fire Detection
2021-01 to 2021-06

Developed a real-time forest fire detection system utilizing Python-based machine learning algorithms and fuzzy logic. Achieved 90% accuracy in predicting the likelihood and severity of forest fires.

Python, Machine Learning, Fuzzy Logic

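To illustrate how a fuzzy control rule base can turn sensor readings into a risk score, here is a minimal sketch. The inputs (temperature and humidity), the thresholds, the three rules, and the output levels are all hypothetical; the project description does not state which variables or rules the actual system used.

```python
def ramp_up(x, start, end):
    """Membership rising linearly from 0 at start to 1 at end."""
    if x <= start:
        return 0.0
    if x >= end:
        return 1.0
    return (x - start) / (end - start)


def fire_risk(temperature_c, humidity_pct):
    """Return a risk score between 0.1 (low) and 0.9 (high)."""
    # Fuzzification with assumed thresholds.
    hot = ramp_up(temperature_c, 25.0, 45.0)
    dry = 1.0 - ramp_up(humidity_pct, 20.0, 60.0)

    # Rule evaluation: AND = min, OR = max, NOT = 1 - membership.
    high = min(hot, dry)                                        # hot AND dry
    medium = max(min(hot, 1.0 - dry), min(1.0 - hot, dry))      # exactly one holds
    low = min(1.0 - hot, 1.0 - dry)                             # neither holds

    # Defuzzification: weighted average of singleton outputs (assumed levels).
    weights = high + medium + low
    return (0.9 * high + 0.5 * medium + 0.1 * low) / weights


print(fire_risk(50.0, 10.0))  # hot and dry
print(fire_risk(10.0, 90.0))  # cool and humid
```

The score changes smoothly as readings move between thresholds, which is the main reason to prefer fuzzy rules over hard cut-offs for noisy environmental data.
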
Side Projects
Bihar COVID Help
2021-05 to 2021-05

Built a resource-sharing platform for COVID-19 relief, connecting volunteer doctors with patients and aggregating information on critical supplies like oxygen and hospital beds.

HTML, CSS, JavaScript, GitHub

College Projects
Joint Image Compression & Encryption
2020-07 to 2020-12

Created an algorithm for joint image compression and encryption using lossless JPEG2000 and RC4 encryption. Achieved a compression ratio of 5.2, 99.69% NPCR, and 47.63% UACI in processed images.

Python, Digital Image Processing, JPEG2000, RC4, +1 more

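NPCR (Number of Pixels Change Rate) and UACI (Unified Average Changing Intensity) compare two encrypted images, typically produced from plaintexts that differ in a single pixel. The sketch below computes both from their standard definitions for 8-bit images given as flat lists; the function name and the sample values are illustrative and not taken from the project.

```python
def npcr_uaci(cipher_a, cipher_b, max_value=255):
    """Return (NPCR %, UACI %) for two equally sized images as flat sequences."""
    if len(cipher_a) != len(cipher_b) or not cipher_a:
        raise ValueError("images must be non-empty and the same size")
    n = len(cipher_a)
    # NPCR: share of pixel positions whose values differ.
    changed = sum(1 for a, b in zip(cipher_a, cipher_b) if a != b)
    # UACI: mean absolute difference, normalized by the maximum pixel value.
    intensity = sum(abs(a - b) for a, b in zip(cipher_a, cipher_b))
    return 100.0 * changed / n, 100.0 * intensity / (max_value * n)


npcr, uaci = npcr_uaci([0, 255, 10, 10], [255, 255, 10, 20])
print(f"NPCR = {npcr:.2f}%, UACI = {uaci:.2f}%")
```

A high NPCR means almost every pixel of the ciphertext changes when the plaintext changes slightly, while UACI measures how large those changes are on average.
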
College Projects
Digital Image Compression
2020-01 to 2020-05

Designed an algorithm for digital image compression using K-means clustering and PCA. Achieved a compression ratio of 2.8 with 55-70% compression and a PSNR of 30 and above. Collaborated with a team of 4 members.

Python, Machine Learning, Digital Image Processing, K-means, +2 more

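As a rough illustration of the K-means part of this idea, the sketch below quantizes grayscale pixel values to k representative levels and measures the reconstruction quality with PSNR. It is a simplified, pure-Python stand-in: it works on scalar pixels, uses an assumed evenly spaced initialization, and leaves out the PCA step mentioned in the description.

```python
import math


def kmeans_1d(values, k, iterations=20):
    """Cluster scalar pixel values into k centroids with plain Lloyd iterations."""
    lo, hi = min(values), max(values)
    # Deterministic start: spread the initial centroids evenly over the range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iterations):
        buckets = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            buckets[nearest].append(v)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [sum(b) / len(b) if b else c for b, c in zip(buckets, centroids)]
    return centroids


def quantize(values, centroids):
    """Replace each pixel with its nearest centroid, rounded to an integer level."""
    return [round(min(centroids, key=lambda c: abs(v - c))) for v in values]


def psnr(original, reconstructed, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite for a perfect reconstruction."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)


pixels = [10, 12, 11, 200, 202, 201]
restored = quantize(pixels, kmeans_1d(pixels, k=2))
print(restored, round(psnr(pixels, restored), 2))
```

The compression comes from storing a small index per pixel plus the k centroid values instead of a full 8-bit value per pixel; a larger k raises PSNR at the cost of a lower compression ratio.
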
College Projects
Image Classification using CNNs
2019-07 to 2019-12

Led a team of 4 in developing a CNN model for image classification on the CIFAR-10 dataset. Achieved 87.44% accuracy on the training set and 82.5% on the test set.

Python, Machine Learning, Digital Image Processing, CNN

College Projects
YouTube Spam Comment Filter
2018-12 to 2018-12

Developed and implemented a machine learning algorithm for binary classification of spam and non-spam comments on YouTube videos. Achieved an accuracy rate of 96.21% through training on multiple datasets.

Python, Machine Learning, Binary Classification

Get In Touch

Have a project in mind or just want to say hi? Feel free to reach out!

Connect with me

I'm always open to discussing new projects, creative ideas, or opportunities to be part of your vision.

Professional

LinkedIn, GitHub, Email

Social

WhatsApp, Facebook, Instagram, Twitter, Medium