I am a research scientist at the Silicon Valley Artificial Intelligence Lab at Baidu Research, situated in the beautiful Silicon Valley. I primarily work on building the best system possible for doing Deep Learning at scale using heterogeneous processors.
I was a principal engineer in the Audio Recognition group at Shazam, where I worked on increasing the recognition rate of the core audio algorithm. I investigated various orthogonal features in audio and used them to increase the accuracy of the recognition algorithm. I also developed probability models that I used to increase the recognition rate and lower the false positive rate.
Before coming to Shazam, I was a Research Scientist at the Parallel Computing Lab at Intel, where I investigated the performance characteristics of various parallel algorithms. These included recommendation algorithms like Alternating Least Squares, chemical simulation algorithms like BigDFT, and parallel median finding. I worked on all levels of performance evaluation, from developing novel algorithms to examining performance hotspots in cycle-accurate simulators. I was the resident GPU expert of the group and developed a series of micro-benchmarks to characterize the performance differences between CPU and GPU and to influence future CPU architectures.
My academic career has followed a couple of phases. In the latest phase, I did a Master's in Financial Math in the Department of Statistics at Stanford University. As a part of the program, I was exposed to machine learning, statistics, asset pricing, convex optimization and stochastic differential equations. I also did various projects in portfolio allocation and risk estimation.
Before coming to Stanford, I did my PhD at the Department of Computer Science, University of California, Davis. I worked with Prof. John Owens to develop fundamental parallel algorithms on graphics processors.
My most significant contribution was developing novel algorithms for parallel-prefix scan and its variants, along with their efficient implementations on graphics hardware. Many researchers, including myself, have used these algorithmic building blocks to develop fast parallel sort and sparse solvers. I also developed algorithms to build data structures that can store sparse data. The two avenues that I explored were using a fast parallel sort to build Bounding Volume Hierarchies and developing a very fast algorithm that builds a perfect hash table on data-parallel hardware. I have released all my work as a part of the data-parallel primitives library CUDPP.
I also worked with Aaron Lefohn on rendering techniques and parallel algorithms on graphics processors (GPUs) that enable interactive film preview and high-quality graphics for games. During 2005 and 2006, I worked on generating high-quality shadows for dynamic scenes at interactive rates on GPUs. This work also highlighted the importance of basic parallel algorithms in high-quality rendering. We developed the fastest known implementation of the parallel scan algorithm on the GPU and identified the need for fast implementations of other key algorithms. This led to the birth of the CUDPP project. I also worked with Aaron on Glift, which provides the multi-dimensional hierarchical data structure needed to generate high-quality shadows.
In my past life, I worked in the SunONE Application Server group at Sun Microsystems. I was there for four years and through two generations of the product, before the call of graphics proved irresistible. My non-graphics work also included working on StorageTek's (now part of Sun Microsystems) REELs tape library management software. I also did a short stint in Tokyo, designing a large-scale online system.
Prior to starting my professional life, I earned my Master's and Bachelor's degrees in Mathematics from the Indian Institute of Technology (IIT), Kharagpur, India.
My resume in PDF format