me shubho

[ contact ]

[ presently at ]

I am a research scientist at the Silicon Valley Artificial Intelligence Lab at Baidu Research, situated in the beautiful Silicon Valley. I primarily work on building the best possible system for doing Deep Learning at scale using heterogeneous processors.

I was a principal engineer in the Audio Recognition group at Shazam, where I worked on increasing the recognition rate of the core audio algorithm. I investigated various orthogonal features in audio and used them to increase the accuracy of the recognition algorithm. I also developed probability models that I used to increase the recognition rate and lower the false positive rate.

Before coming to Shazam, I was a Research Scientist at the Parallel Computing Lab at Intel, where I investigated the performance characteristics of various parallel algorithms. These include recommendation algorithms like Alternating Least Squares, chemical simulation algorithms like BigDFT, and parallel median finding. I worked on all levels of performance evaluation, from developing novel algorithms to examining performance hotspots in cycle-accurate simulators. I was the resident GPU expert of the group and developed a series of micro-benchmarks to characterize the performance differences between CPU and GPU and influence future CPU architectures.

[ academic career ]

My academic career has followed a couple of phases. In the latest phase, I did a Masters in Financial Math at the Department of Statistics, Stanford University. As a part of the program, I was exposed to machine learning, statistics, asset pricing, convex optimization and stochastic differential equations. I also did various projects in portfolio allocation and risk estimation.

Before coming to Stanford, I did my PhD at the Department of Computer Science, University of California, Davis. I worked with Prof. John Owens to develop fundamental parallel algorithms on graphics processors.

My most significant contribution was developing novel algorithms for parallel-prefix scan and its variants, along with their efficient implementations on graphics hardware. Many researchers, including myself, have used these algorithmic building blocks to develop fast parallel sort and sparse solvers. I also developed algorithms to build data structures that can store sparse data. The two avenues that I explored are using a fast parallel sort to build Bounding Volume Hierarchies and developing a very fast algorithm that builds a perfect hash table on data-parallel hardware. I have released all my work as a part of the data-parallel primitives library CUDPP.

I also worked with Aaron Lefohn on rendering techniques and parallel algorithms on graphics processors (GPUs) that enable interactive film preview and high-quality graphics for games. During 2005 and 2006, I worked on generating high-quality shadows for dynamic scenes at interactive rates on GPUs. This work has also highlighted the importance of basic parallel algorithms in doing high-quality rendering. We developed the fastest known implementation of the parallel scan algorithm on the GPU and identified the need for fast implementations of other key algorithms. This led to the birth of the CUDPP project. I also worked with Aaron on Glift, which provides the multi-dimensional hierarchical data structure needed to generate high-quality shadows.

[ the early years ]

In my past life, I worked in the SunONE Application Server group at Sun Microsystems. I was there for four years and through two generations of the product, before the call of graphics proved irresistible. My non-graphics work also included working on StorageTek's (now part of Sun Microsystems) REELs tape library management software. I also did a short stint in Tokyo, designing a large-scale online system.

Prior to starting my professional life, I did my Masters and Bachelors in Mathematics at the Indian Institute of Technology (IIT), Kharagpur, India.

[ publications ]

  • Persistent RNNs: Stashing Recurrent Weights On-Chip G. Diamos, S. Sengupta, B. Catanzaro, M. Chrzanowski, A. Coates, E. Elsen, J. Engel, A. Hannun, S. Satheesh Proceedings of The 33rd International Conference on Machine Learning
  • Deep Speech 2: End-to-End Speech Recognition in English and Mandarin D. Amodei et al. Proceedings of The 33rd International Conference on Machine Learning
  • Deep Speech: Scaling up end-to-end speech recognition A. Hannun, C. Case, J. Casper, B. Catanzaro, G. Diamos, E. Elsen, R. Prenger, S. Satheesh, S. Sengupta, A. Coates, A. Y. Ng arXiv:1412.5567
  • Navigating the Maze of Graph Analytics Frameworks using Massive Graph Datasets N. Satish, N. Sundaram, M. Patwary, J. Seo, J. Park, M. Hassaan, S. Sengupta, Z. Yin, P. Dubey Proceedings of the 2014 ACM SIGMOD international conference on Management of data. ACM, 2014.
  • Real-time Parallel Hashing on the GPU D. Alcantara, A. Sharf, F. Abbasinejad, S. Sengupta, M. Mitzenmacher, J. D. Owens, N. Amenta ACM Transactions on Graphics (Proceedings of ACM SIGGRAPH Asia 2009)
  • Fast BVH Construction for GPUs C. Lauterbach, M. Garland, S. Sengupta, D. Manocha Proc. Eurographics 2009, March 2009
  • Efficient parallel scan algorithms for GPUs S. Sengupta, M. Harris, M. Garland NVIDIA Technical Report NVR-2008-003, December 2008
  • Scan Primitives for GPU computing S. Sengupta, Mark Harris, Yao Zhang, J. D. Owens Graphics Hardware 2007, pages 97-106, August 2007, Best paper award
  • Parallel Prefix Sum (Scan) with CUDA Mark Harris, S. Sengupta, J. D. Owens GPU Gems 3, Hubert Nguyen, editor, chapter 39, Addison Wesley, August 2007
  • Resolution Matched Shadow Maps A. Lefohn, S. Sengupta, J. D. Owens ACM Transactions on Graphics 2007
  • A Work-Efficient Step-Efficient Prefix Sum Algorithm. S. Sengupta, A. Lefohn, J. Owens. Proceedings of the 2006 Workshop on Edge Computing using New Commodity Architectures
  • Dynamic Adaptive Shadow Maps on Graphics Hardware. A. Lefohn, S. Sengupta, J. Kniss, R. Strzodka, J. Owens. SIGGRAPH 2005 Technical Sketch
  • Octree Textures on Graphics Hardware. J. Kniss, A. Lefohn, R. Strzodka, S. Sengupta, J. Owens. SIGGRAPH 2005 Technical Sketch
  • Glift: Generic, Efficient, Random-Access GPU data structure. A. Lefohn, J. Kniss, R. Strzodka, S. Sengupta, J. Owens. ACM Transactions on Graphics, 25(1), Jan 2006 (accepted to SIGGRAPH 2005 with major revisions)
  • Assessment of Graphics Processing Units (GPUs) for Department of Defense (DOD) Digital Signal Processing (DSP) Applications. J. Owens, S. Sengupta, D. Horn. Technical Report ECE-CE-2005-3, Computer Engineering Research Laboratory, University of California, Davis

[ fellowships and awards ]

  • Best paper award at Graphics Hardware 2007
  • NVIDIA Fellowship 2007-2008
  • NVIDIA Fellowship 2008-2009
  • Best Graduate Researcher Award: Dept of Computer Science

[ resume ]

My resume in PDF format