publications | Souvik Kundu, Ph.D.

2026

MLSys 2026

SkipKV: Selective Skipping of KV Generation and Storage for Efficient Inference with Large Reasoning Models

Jiayi Tian, Seyedarmin Azizi, Yequan Zhao, and 7 more authors

In Ninth Annual Conference on Machine Learning and Systems (MLSys), 2026

PDF
ICLR 2026

PRISM: Enhancing Protein Inverse Folding through Fine-Grained Retrieval on Structure-Sequence Multimodal Representations

Sazan Mahbub, Souvik Kundu, and Eric P Xing

In International Conference on Learning Representations (ICLR), 2026

PDF
ASPLOS 2026

MoDM: Efficient Serving for Image Generation via Mixture-of-Diffusion Models

Yuchen Xia, Divyam Sharma, Yichao Yuan, and 2 more authors

In Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, 2026

PDF

2025

NeurIPS 2025

Top-H decoding: Adapting the creativity and coherence with bounded entropy in text generation

Erfan Baghaei Potraghloo, Seyedarmin Azizi, Souvik Kundu, and 1 more author

In Thirty-Ninth Annual Conference on Neural Information Processing Systems, 2025

PDF
ISCA 2025

MicroScopiQ: Accelerating Foundational Models through Outlier-Aware Microscaling Quantization

Akshat Ramachandran, Souvik Kundu, and Tushar Krishna

In International Symposium on Computer Architecture (ISCA), 2025

PDF
ICLR 2025

MambaExtend: A Training-Free Approach to Improve Long Context Extension of Mamba

Souvik Kundu^*, Seyedarmin Azizi^*, Mohammad Erfan Sadeghi, and 1 more author

In International Conference on Learning Representations (ICLR), 2025

PDF
ICLR 2025

Lantern: Accelerating visual autoregressive models with relaxed speculative decoding

Doohyuk Jang, Sihwan Park, June Yong Yang, and 5 more authors

In International Conference on Learning Representations (ICLR), 2025

PDF Code
ICLR 2025

Scaling Long Context Training Data by Long-Distance Referrals

Yonghao Zhuang, Lanxiang Hu, Longfei Yun, and 4 more authors

In International Conference on Learning Representations (ICLR), 2025

PDF
NAACL 2025

LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression

Souvik Kundu, Anahita Bhiwandiwalla, Sungduk Yu, and 6 more authors

In NAACL, 2025

PDF

2024

NeurIPS 2024

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

Haoran You, Yipin Guo, Yichao Fu, and 6 more authors

In Thirty-Eighth Annual Conference on Neural Information Processing Systems, 2024

PDF Code
NeurIPS 2024

GEAR: An efficient KV cache compression recipe for near-lossless generative inference of LLM

Hao Kang, Qingru Zhang, Souvik Kundu, and 4 more authors

In Thirty-Eighth Annual Conference on Neural Information Processing Systems Workshop (Spotlight), 2024

PDF Code
NeurIPS 2024

CITER: Collaborative Inference for Efficient Large Language Model Decoding with Token-Level Routing

Wenhao Zheng, Yixiao Chen, Weitong Zhang, and 6 more authors

In Thirty-Eighth Annual Conference on Neural Information Processing Systems Workshop, 2024
EMNLP 2024

LaMDA: Large Model Fine-Tuning via Spectrally Decomposed Low-Dimensional Adaptation

Seyedarmin Azizi, Souvik Kundu, and Massoud Pedram

In Conference on Empirical Methods in Natural Language Processing (Findings), 2024

PDF Code
ECCV 2024

CLAMP-ViT: contrastive data-free learning for adaptive post-training quantization of ViTs

Akshat Ramachandran, Souvik Kundu, and Tushar Krishna

In European Conference on Computer Vision, 2024

PDF Code
ECCV 2024

GenQ: Quantization in Low Data Regimes with Generative Synthetic Data

Yuhang Li, Youngeun Kim, Donghyun Lee, and 2 more authors

In European Conference on Computer Vision, 2024

PDF
ACL 2024

AFLoRA: Adaptive Freezing of Low Rank Adaptation in Parameter Efficient Fine-Tuning of Large Models

Souvik Kundu^*, Zeyu Liu^*, Anni Li, and 3 more authors

In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Best paper recommendation), 2024

PDF Code
ICML 2024

Junk DNA hypothesis: A task-centric angle of LLM pre-trained weights through sparsity

Lu Yin, Shiwei Liu, Ajay Jaiswal, and 2 more authors

In International Conference on Machine Learning, 2024

PDF Code
TMLR 2024

Bit-by-Bit: Investigating the Vulnerabilities of Binary Neural Networks to Adversarial Bit Flipping

Shamik Kundu, Sanjay Das, Sayar Karmakar, and 4 more authors

In Transactions on Machine Learning Research, 2024

PDF
ICPR 2024

What Makes Vision Transformers Robust Towards Bit-Flip Attacks?

Xuan Zhou, Souvik Kundu, Dake Chen, and 2 more authors

In International Conference on Pattern Recognition (Oral), 2024
CVPR 2024

Block Selective Reprogramming for On-device Training of Vision Transformers

Sreetama Sarkar, Souvik Kundu, Kai Zheng, and 1 more author

In Workshop Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (Oral), 2024
CVPR 2024

RLNet: Robust Linearized Networks for Efficient Private Inference

Souvik Kundu^*, Sreetama Sarkar^*, and Peter A Beerel

In Workshop Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (Oral), 2024
CVPR 2024

DIA: Diffusion based Inverse Network Attack on Collaborative Inference

Dake Chen, Shiduo Li, Yuke Zhang, and 3 more authors

In Workshop Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024
ICLR 2024

Fusing models with complementary expertise

Hongyi Wang, Felipe Maia Polo, Yuekai Sun, and 3 more authors

In International Conference on Learning Representations, 2024
ICASSP 2024

Sensi-BERT: Towards sensitivity driven fine-tuning for parameter-efficient BERT

Souvik Kundu, Sharath Nittur Sridhar, Maciej Szankin, and 1 more author

In IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
ICASSP 2024

Recent Advances in Scalable Energy-Efficient and Trustworthy Spiking Neural Networks: from Algorithms to Technology

Souvik Kundu, Rui-Jie Zhu, Akhilesh Jaiswal, and 1 more author

In IEEE International Conference on Acoustics, Speech and Signal Processing, 2024
TinyML 2024

CiMNet: Towards Joint Optimization for DNN Architecture and Configuration for Compute-In-Memory Hardware

Souvik Kundu, Anthony Sarah, Vinay Joshi, and 2 more authors

In TinyML Conference long talk, 2024

PDF

2023

NeurIPS 2023

Don’t just prune by magnitude! Your mask topology is a secret weapon

Duc Hoang, Souvik Kundu, Shiwei Liu, and 2 more authors

In Advances in Neural Information Processing Systems, 2023
ICCAD 2023

RNA-ViT: Reduced-Dimension Approximate Normalized Attention Vision Transformers for Latency Efficient Private Inference

Souvik Kundu^*, Dake Chen^*, Yuke Zhang^*, and 2 more authors

In International Conference on Computer Aided Design, 2023
ICCV 2023

Vision HGNN: An image is more than a graph of nodes

Yan Han, Peihao Wang, Souvik Kundu, and 2 more authors

In Proceedings of the IEEE/CVF International Conference on Computer Vision (Oral), 2023
ICCV 2023

SAL-ViT: Towards latency efficient private inference on ViT using selective attention search with a learnable softmax approximation

Souvik Kundu^*, Yuke Zhang^*, Dake Chen^*, and 2 more authors

In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
ICCV 2023

InstaTune: Instantaneous neural architecture search during fine-tuning

Sharath Nittur Sridhar, Souvik Kundu, Sairam Sundaresan, and 2 more authors

In Workshop Proceedings of the IEEE/CVF International Conference on Computer Vision, 2023
TMLR 2023

Revisiting Sparsity Hunting in Federated Learning: Why does Sparsity Consensus Matter?

Souvik Kundu^*, Sara Babakniya^*, Saurav Prakash, and 2 more authors

In Transactions on Machine Learning Research, 2023

PDF Code
TMLR 2023

Overcoming resource constraints in federated learning: Large models can be trained with only weak clients

Yue Niu, Saurav Prakash, Souvik Kundu, and 2 more authors

In Transactions on Machine Learning Research, 2023

PDF Code
CVPR 2023

Making models shallow again: Jointly learning to reduce non-linearity and depth for latency-efficient private inference

Souvik Kundu, Yuke Zhang, Dake Chen, and 1 more author

In Workshop Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (Oral), 2023

PDF
DAC 2023

C2PI: An Efficient Crypto-Clear Two-Party Neural Network Private Inference

Yuke Zhang, Dake Chen, Souvik Kundu, and 3 more authors

In 60th ACM/IEEE Design Automation Conference (DAC), 2023

PDF
ICLR 2023

Learning to linearize deep neural networks for secure and efficient private inference

Souvik Kundu, Shunlin Lu, Yuke Zhang, and 2 more authors

In International Conference on Learning Representations, 2023

PDF

2022

ACM TECS 2022

Toward Adversary-aware Non-iterative Model Pruning through Dynamic Network Rewiring of DNNs

Souvik Kundu, Yao Fu, Bill Ye, and 2 more authors

In ACM Transactions on Embedded Computing Systems, 2022

PDF
Euromicro 2022

PipeEdge: Pipeline parallelism for large-scale model inference on heterogeneous edge devices

Yang Hu, Connor Imes, Xuanang Zhao, and 4 more authors

In 2022 25th Euromicro Conference on Digital System Design (DSD), 2022

PDF
DATE 2022

BMPQ: bit-gradient sensitivity-driven mixed-precision quantization of DNNs from scratch

Souvik Kundu, Shikai Wang, Qirui Sun, and 2 more authors

In 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2022

PDF
VLSI-SoC 2022

P2M-DeTrack: Processing-in-pixel-in-memory for energy-efficient and real-time multi-object detection and tracking

Souvik Kundu^*, Gourav Datta^*, Zihan Yin, and 8 more authors

In 2022 IFIP/IEEE 30th International Conference on Very Large Scale Integration (VLSI-SoC), 2022

PDF
Nature 2022

A processing-in-pixel-in-memory paradigm for resource-constrained TinyML applications

Souvik Kundu^*, Gourav Datta^*, Zihan Yin^*, and 5 more authors

In Nature Scientific Reports, 2022

PDF

2021

NeurIPS 2021

Analyzing the confidentiality of undistillable teachers in knowledge distillation

Souvik Kundu, Qirui Sun, Yao Fu, and 2 more authors

In Advances in Neural Information Processing Systems, 2021

PDF Code
ICCV 2021

Hire-SNN: Harnessing the inherent robustness of energy-efficient deep spiking neural networks by training with crafted input noise

Souvik Kundu, Massoud Pedram, and Peter A Beerel

In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021

PDF
WACV 2021

Spike-thrift: Towards energy-efficient deep spiking neural networks by limiting spiking activity via attention-guided compression

Souvik Kundu, Gourav Datta, Massoud Pedram, and 1 more author

In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2021

PDF
ASP-DAC 2021

DNR: A tunable robust pruning framework through dynamic network rewiring of DNNs

Souvik Kundu, Mahdi Nazemi, Peter A Beerel, and 1 more author

In Proceedings of the 26th Asia and South Pacific Design Automation Conference, 2021

PDF
ICASSP 2021

AttentionLite: Towards efficient self-attention models for vision

Souvik Kundu and Sairam Sundaresan

In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2021

PDF

2020

IEEE TC 2020

Pre-defined sparsity for low-complexity convolutional neural networks

Souvik Kundu, Mahdi Nazemi, Massoud Pedram, and 2 more authors

In IEEE Transactions on Computers, 2020

PDF