Prashant Gupta

Hi, I'm Prashant Gupta!

I'm a passionate AI engineer determined to leave the world a little better than I found it. Welcome to my personal space on the web!


About Me

Hello! I'm Prashant Gupta, an AI engineer based in San Jose, California, focused on Generative AI inference. I specialize in deploying and optimizing large-scale LLMs and multimodal models in production environments. I am also passionate about bringing cutting-edge AI research into real-world applications with performance, safety, and usability in mind.

My journey into Computer Science began while pursuing my Master's in CS at North Carolina State University. Since then, I've been dedicated to honing my skills in Cloud, DevOps, and now AI, and applying them to create impactful projects.

When I'm not coding or designing, I enjoy dancing and playing pickleball. I'm always eager to learn new things and take on challenging projects.



My Projects

Project 1

vLLM Spyre

IBM Spyre is the first production-grade Artificial Intelligence Unit (AIU) accelerator born out of IBM Research. The vLLM Spyre plugin (vllm-spyre) is a dedicated backend extension that integrates the IBM Spyre Accelerator with vLLM. It follows the architecture described in vLLM’s Plugin System, making it easy to bring IBM’s AI acceleration into existing vLLM workflows.

My work focused on understanding the intricacies of inference optimizations and attention mechanisms within large language models to ensure efficient hardware utilization on the Spyre Accelerator.

inference LLM Python


Project 2

IBM watsonx

IBM watsonx is a portfolio of AI products that accelerates the impact of generative AI in core workflows to drive productivity. Users are free to choose an open source foundation model, bring their own, or use existing models, and run them across any cloud using open, transparent technology with governance and security controls built in.

My work focused on the inference stack, ensuring models performed optimally at scale.

inference LLM Python


Project 3

Automatic mouse mover

A minimalistic Go library/app to keep your Mac active and alive! The simplest app, with the sole purpose of moving your mouse pointer at regular intervals so that your machine is kept awake! And best of all, it works ONLY when you are not working, so rest assured that the mouse won't start moving on its own unless the machine is actually idle.

This was a fun weekend project to explore multithreading with goroutines in Go.

Golang Goroutine Multithreading


My GitHub Snapshot


For a full overview of my activity, including commit history and contributions, please visit my GitHub Repositories and Profile Page.

Get In Touch

I'm always open to discussing new projects, creative ideas, or opportunities to be part of something amazing. Feel free to reach out!

Email Me