hi there 👋, I'm

Pragnyan Ramtha

17, he/him

AI Engineer - Specializing in LLM fine-tuning and AI system design

about me.

Results-driven AI Engineer specializing in Large Language Model (LLM) fine-tuning and AI system design. I build maintainable, production-grade AI systems and am comfortable working deep in cloud infrastructure. I learn new tools fast and use AI as a force multiplier in my coding, design, and research loops, which lets me move faster while keeping systems reliable.

experience.

  • AI Engineering Intern · Remote

    at Reputation-DAO

    Aug 2025 – Jan 2026

    • Architected a high-availability serverless MLOps backend on GCP (Cloud Functions, Cloud Run) for production-grade AI orchestration, achieving 99.9% uptime while cutting cloud costs by 70%.
    • Engineered a Gemini API response system with optimized prompt caching, cutting API costs and reducing inference latency by 50% across critical workflows.
    • Boosted the accuracy of a customer-support bot by developing a RAG pipeline that uses semantic search for real-time documentation retrieval and source attribution.
    • GCP
    • Gemini API
    • RAG
    • Serverless
    • Python
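The retrieval-and-attribution step of a RAG pipeline like the one above can be sketched in a few lines. This is a toy illustration only: it scores documents with bag-of-words cosine similarity, whereas a real pipeline would use a sentence-embedding model; the document names and contents here are invented for the example.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'. A production pipeline would call
    a sentence-embedding model here instead."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: dict[str, str], k: int = 1) -> list[tuple[str, float]]:
    """Rank docs by similarity to the query and return the top-k
    (source, score) pairs -- the source-attribution step."""
    q = embed(query)
    scored = [(src, cosine(q, embed(body))) for src, body in docs.items()]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:k]

# Hypothetical documentation snippets, standing in for a real corpus.
docs = {
    "auth.md": "how to reset a password and recover your account",
    "billing.md": "invoice refund and payment methods questions",
}
print(retrieve("I forgot my password", docs))  # top hit: auth.md
```

The returned source name is what lets the bot cite which document its answer came from.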
  • GitHub Contributor · Remote

    at Open Source Contributions

    Jan 2025 – Present

    • Contributed to 30+ open-source projects, focusing on bug fixes and core feature development.
    • Identified a severe latency bottleneck in Scrapy and authored a fix that eliminated it, achieving a 2x speedup for affected workflows.
    • Won the IEEE Summer of Code (IEEESoC) Hackathon 2025 for open-source contributions to multiple projects.
    • Python
    • TypeScript
    • Git
    • Docker
    • CI/CD

projects.

  • AIMO-3: Efficient Reasoning via LLM Fine-Tuning

    • Fine-tuned Phi-4 (14B) on CoT and TiR datasets to optimize multi-step problem solving and tool-use efficiency.
    • Achieved 90% accuracy on reasoning benchmarks, rivaling 70B-parameter models while using significantly less compute.
    • Phi-4
    • Fine-tuning
    • CoT/TiR
    • PEFT
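The parameter savings behind PEFT fine-tuning like the above can be shown with a minimal sketch of the LoRA idea: keep the pretrained weight W frozen and train only a low-rank update (alpha / r) * B @ A. The dimensions and values below are illustrative stand-ins, not the actual Phi-4 configuration.

```python
def matmul(X, Y):
    """Plain-Python matrix multiply over lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_delta(B, A, alpha: float, r: int):
    """Scaled low-rank update (alpha / r) * B @ A."""
    s = alpha / r
    return [[s * v for v in row] for row in matmul(B, A)]

d, r = 4, 1                          # toy hidden size 4, rank-1 adapter
W = [[0.0] * d for _ in range(d)]    # frozen pretrained weight (stand-in)
B = [[1.0] for _ in range(d)]        # d x r, trainable
A = [[0.5] * d]                      # r x d, trainable

delta = lora_delta(B, A, alpha=2.0, r=r)
W_adapted = [[w + dw for w, dw in zip(rw, rd)] for rw, rd in zip(W, delta)]

full = d * d              # parameters a full-rank update would train
low_rank = d * r + r * d  # parameters LoRA actually trains
print(full, low_rank)     # prints: 16 8
```

At real model scale (d in the thousands, r around 8-64) the same ratio is what makes fine-tuning a 14B model tractable on modest hardware.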
  • Personality Clone

    • Fine-tuned a large language model with PEFT (QLoRA) and contrastive learning on private conversational data to emulate my personal response style.
    • Implemented a Siamese network architecture with a cosine-similarity loss, improving semantic embeddings and achieving 92% accuracy in replicating my response style, a 28% improvement over baseline models.
    • TensorFlow
    • Python
    • CUDA
    • Transformers
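The cosine-similarity contrastive loss from the project above can be sketched as follows. In a Siamese setup both inputs pass through the same encoder; here raw vectors stand in for encoder outputs, and the margin value is an illustrative assumption, not the project's actual hyperparameter.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def cosine_contrastive_loss(u, v, similar: bool, margin: float = 0.5) -> float:
    """Pull same-style pairs together (loss = 1 - cos) and push
    different-style pairs apart until cos falls below the margin."""
    c = cosine(u, v)
    return (1.0 - c) if similar else max(0.0, c - margin)

anchor   = [1.0, 0.0]   # stand-in embedding of one of my replies
positive = [0.9, 0.1]   # same response style -> small loss
negative = [0.0, 1.0]   # different style -> zero loss (already far apart)

print(cosine_contrastive_loss(anchor, positive, similar=True))
print(cosine_contrastive_loss(anchor, negative, similar=False))
```

Minimizing this over labeled pairs shapes the embedding space so that style-matched responses cluster, which is what the 92% replication accuracy is measured against.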
  • Autopilot

    • Engineered an AI-driven OS automation system that uses function calling and tool-use paradigms to translate complex natural-language tasks into low-level automation.
    • Built a Reasoning + Acting (ReAct) agent framework with command sandboxing, reducing execution errors and completing tasks 45% faster than manual workflows.
    • Python
    • LLM Agents
    • APIs
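One common form of the command sandboxing mentioned above is an allowlist check run before the agent executes any command it proposed. This is a hypothetical minimal version: the allowed binaries and forbidden characters are illustrative choices, not Autopilot's actual policy.

```python
import shlex

# Binaries the agent may invoke, and shell metacharacters that could
# chain or redirect into arbitrary commands.
ALLOWED = {"ls", "cat", "echo", "grep"}
FORBIDDEN_CHARS = set(";|&`$><")

def is_safe(command: str) -> bool:
    """Gate an agent-proposed shell command before execution."""
    if any(ch in FORBIDDEN_CHARS for ch in command):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:        # e.g. unbalanced quotes
        return False
    return bool(tokens) and tokens[0] in ALLOWED

print(is_safe("ls -la"))             # True
print(is_safe("rm -rf /"))           # False: binary not allowlisted
print(is_safe("echo hi; rm -rf /"))  # False: command chaining
```

In the ReAct loop, a rejected command is fed back to the model as an observation so it can propose a safer alternative instead of failing silently.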

research papers.

  • Scaling Context Windows to Infinity: A Comprehensive Study of Position Encoding, Attention Mechanisms, Memory-Efficient Inference, and Context Reduction Techniques in Large Language Models (2026)

    Read paper: Academia.edu

    A comprehensive analysis of techniques for extending context windows in large language models, examining position encoding strategies, efficient attention mechanisms, and memory-optimized inference approaches to enable processing of arbitrarily long sequences.

  • Unlocking Societal Trends in Aadhaar Enrolment and Updates: Anomaly Detection and Fraud Risk Prediction (2026)

    Read paper: Academia.edu

    A data-driven approach to identify suspicious patterns in India's Aadhaar biometric identification system, utilizing machine learning for anomaly detection and fraud risk prediction in enrollment and update processes.

  • Speeding Up LLM Inference Using Quantum Computing Techniques (2026)

    Status: under research

    Exploring quantum-inspired algorithms and quantum computational primitives to accelerate inference in large language models, investigating quantum annealing for attention mechanisms and variational quantum circuits for efficient token generation.

technical skills.

  • Languages:

    Python, C++, Bash, SQL, TypeScript

  • AI/ML Tools:

    PyTorch, Transformers, Unsloth, NumPy, Pandas, Scikit-learn, PEFT/QLoRA, CUDA

  • Infrastructure:

    GCP, Azure, Docker, Linux (Arch), Git

  • Certifications:

    Machine Learning (Stanford), CS50: Introduction to Computer Science (Harvard University)

  • Achievements:

    • Artificial Intelligence Mathematical Olympiad (AIMO) Silver Medalist
    • Authored 2 Research Papers on Modern AI Optimization Techniques
    • Winner, IEEE Summer of Code (IEEESOC) Hackathon 2025
    • Winner, Empathy Encryption Hackathon 2025
    • Winner, Daydream Hyderabad @ Hackclub 2025
    • Top 0.5% Finalist, Shell AI Hackathon 2025
  • Developer Tools:

    uv, Neovim, Arch Linux

Let's work together.

I'm always interested in new opportunities and exciting projects. Whether you have a project in mind or just want to chat about tech, I'd love to hear from you.

Currently available for freelance work and internship opportunities

Response time: Usually within 24 hours

Pragnyan Ramtha · 2026