Speech Lab
Department of Electrical Engineering
Indian Institute of Technology Madras, Chennai
Publications

(A): Journal and Conference publications

  1. Sukhadia, Vrunda N; Umesh, S; Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models; To Appear at IEEE SLT 2022
  2. Lodagala, Vasista Sai; Ghosh, Sreyan; Umesh, S; PADA: PRUNING ASSISTED DOMAIN ADAPTATION FOR SELF-SUPERVISED SPEECH REPRESENTATIONS; To Appear at IEEE SLT 2022
  3. Lodagala, Vasista Sai; Ghosh, Sreyan; Umesh, S; CCC-WAV2VEC 2.0: CLUSTERING AIDED CROSS CONTRASTIVE SELF-SUPERVISED LEARNING OF SPEECH REPRESENTATIONS; To Appear at IEEE SLT 2022
  4. Arunkumar, A., Umesh, S. (2022) Joint Encoder-Decoder Self-Supervised Pre-training for ASR. Proc. Interspeech 2022, 3418-3422, doi: 10.21437/Interspeech.2022-11338
  5. Bhanushali, A., Bridgman, G., G, D., Ghosh, P., Kumar, P., Kumar, S., Raj Kolladath, A., Ravi, N., Seth, A., Seth, A., Singh, A., Sukhadia, V., S, U., Udupa, S., Prasad, L.V.S.V.D. (2022) Gram Vaani ASR Challenge on spontaneous telephone speech recordings in regional variations of Hindi. Proc. Interspeech 2022, 3548-3552, doi: 10.21437/Interspeech.2022-11371
  6. Ghosh, S., Kumar, S., Kumar, Y., Ratn Shah, R., Umesh, S. (2022) Span Classification with Structured Information for Disfluency Detection in Spoken Utterances. Proc. Interspeech 2022, 3998-4002, doi: 10.21437/Interspeech.2022-11242
  7. Arunkumar, A., Nileshkumar Sukhadia, V., Umesh, S. (2022) Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition. Proc. Interspeech 2022, 5145-5149, doi: 10.21437/Interspeech.2022-11376
  8. Ghosh, S., Lepcha, S., Sakshi, S., Shah, R.R., Umesh, S. (2022) DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances. Proc. Interspeech 2022, 5185-5189, doi: 10.21437/Interspeech.2022-10752
  9. P. Kumar, V. N. Sukhadia and S. Umesh, "Investigation of Robustness of Hubert Features from Different Layers to Domain, Accent and Language Variations," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, doi: 10.1109/ICASSP43922.2022.9746250.
  10. Vishwas M. Shetty, Metilda Sagaya Mary N.J., S. Umesh, "Improving the Performance of Transformer Based Low Resource Speech Recognition for Indian Languages", in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, Volume 2020-May, Year 2020, Pages 8279-8283
  11. Metilda Sagaya Mary N.J., Vishwas M. Shetty, S. Umesh, "Investigation of Methods to Improve the Recognition Performance of Tamil-English Code-Switched Data in Transformer Framework", in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, Volume 2020-May, Year 2020, Pages 7889-7893
  12. Prakash, A., Leela Thomas, A., Umesh, S., A Murthy, H. (2019) Building Multilingual End-to-End Speech Synthesisers for Indian Languages. Proc. 10th ISCA Speech Synthesis Workshop, 194-199, DOI: 10.21437/SSW.2019-35.
  13. Shetty, Vishwas M.;Sharon, Rini A.;Abraham, Basil;Seeram, Tejaswi;Prakash, Anusha;Ravi, Nithya;Umesh, S., "Articulatory and stacked bottleneck features for low resource speech recognition", in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Volume 2018-September, Year 2018, Pages 3202-3206
  14. Neethu Mariam Joy, S. Umesh, "Improving Acoustic Models in TORGO Dysarthric Speech Database", in IEEE Transactions on Neural Systems and Rehabilitation Engineering, Volume 26, Year 2018, Pages 637-645
  15. Neethu Mariam Joy, Sandeep Reddy Kothinti, Srinivasan Umesh, "FMLLR Speaker Normalization With i-vector: In Pseudo-FMLLR and Distillation Framework", in IEEE/ACM Transactions on Audio Speech and Language Processing, Volume 26, Year 2018, Pages 797-805
  16. Neethu Mariam Joy, Murali Karthik Baskar, S. Umesh “DNNs for Unsupervised Extraction of Pseudo Normalized Features Without Explicit Adaptation Data”, Journal of Speech Communication, Vol. 92, pp. 64-76, September 2017.
  17. Basil Abraham, S. Umesh “An automated technique to generate phone-to-articulatory label mapping”. Journal of Speech Communication, (Elsevier), Vol. 86, pp. 107-120, 2017.
  18. Basil Abraham, Tejaswi Seeram, S. Umesh “Transfer Learning and Distillation Techniques to Improve the Acoustic Modeling of Low Resource Languages” Proc. of International Conference on Spoken Language Processing, (Interspeech 2017), (Stockholm, Sweden), August 2017.
  19. Neethu Mariam Joy, Sandeep Reddy Kothinti, S. Umesh, Basil Abraham “Generalized Distillation Framework for Speaker Normalization” Proc. of International Conference on Spoken Language Processing, (Interspeech 2017), (Stockholm, Sweden), August 2017.
  20. Neethu Mariam Joy, S. Umesh, Basil Abraham “On Improving Acoustic Models for TORGO Dysarthric Speech Database” Proc. of International Conference on Spoken Language Processing, (Interspeech 2017), (Stockholm, Sweden), August 2017.
  21. Basil Abraham, S. Umesh, Neethu Mariam Joy “Joint Estimation of Articulatory Features and Acoustic Model for Low Resource Languages” Proc. of International Conference on Spoken Language Processing, (Interspeech 2017), (Stockholm, Sweden), August 2017.
  22. Seeram Tejaswi, S. Umesh “DNN Acoustic Models for Dysarthric Speech” Proc. of Twenty Third National Conference on Communication (NCC – 2017), Madras, 2017.
  23. Seeram Tejaswi, S. Umesh “Addressing Data Sparsity in DNN Acoustic Modeling” Proc. of Twenty Third National Conference on Communication (NCC – 2017), Madras, 2017.
  24. Basil Abraham, S. Umesh and Neethu M Joy “Articulatory Feature Extraction Using CTC to Build Articulatory Classifiers Without Forced Frame Alignments for Speech Recognition” Proc. of International Conference on Spoken Language Processing, (Interspeech 2016), (San Francisco, USA), pp. 798-802
  25. Neethu M. Joy, M. K. Baskar, S. Umesh and Basil Abraham “DNNs for Unsupervised Extraction of Pseudo FMLLR Features Without Explicit Adaptation Data” Proc. of International Conference on Spoken Language Processing, (Interspeech-2016), (San Francisco, USA), pp. 3479-3483
  26. Basil Abraham, S. Umesh and Neethu M. Joy “Overcoming Data Sparsity in Acoustic Modeling of Low-Resource Language by Borrowing Data and Model Parameters from High-Resource Languages” Proc. of International Conference on Spoken Language Processing, (Interspeech-2016), (San Francisco, USA), pp. 3037-3041
  27. Neethu M. Joy, B. Abraham, K. Navneeth and S. Umesh “Improved acoustic modeling of low-resource languages using shared SGMM parameters of high-resource languages” Proc. of Twenty Second National Conference on Communication (NCC – 2016), Guwahati.
  28. B. Murali Karthick, Prateek Kolhar and S. Umesh “Speaker Adaptation of Convolutional Neural Network Using Speaker Specific Subspace Vectors of SGMM”. Proc. of International Conference on Spoken Language Processing, (Interspeech-2015), (Dresden, Germany), Sep. 2015.
  29. Vikas Joshi, Raghavendra Bilgi, S. Umesh, Luz Garcia, and Carmen Benitez “Sub-band based histogram equalization in cepstral domain for speech recognition”. Journal of Speech Communication, (Elsevier), Vol. 69, pp. 46-65, May 2015.
  30. R. Sriranjani, S. Umesh and M.R. Reddy “Automatic Severity Assessment of Dysarthria using State-Specific Vectors”. Journal Biomedical Sciences Instrumentation, (Plenum), Vol. 51, pp. 99-106, April 2015.
  31. Joshi, Vikas, Prasad, N Vishnu, S. Umesh “Modified Mean and Variance Normalization: Transforming to Utterance-Specific Estimates”. Journal of Circuits, Systems, and Signal Processing, (Springer), pp. 1-17, 2015.
  32. Sriranjani R, S. Umesh, others “Investigation of different acoustic modeling techniques for low resource Indian language data (Inproceeding)”. Proc. of Twenty First National Conference on Communications, (NCC-2015), Mumbai, 2015.
  33. R. Sriranjani, M. Ramasubba Reddy, S. Umesh “Improved acoustic modeling for automatic dysarthric speech recognition (Inproceeding)”. Proc. of Twenty First National Conference on Communications, (NCC-2015), pp. 1-6, Mumbai, 2015.
  34. Mohan Aanchan, Richard Rose, Sina Hamidi Ghalehjegh and S. Umesh. “Acoustic modelling for speech recognition in Indian languages in an agricultural commodities task domain”, Journal of Speech Communication, (Elsevier), Vol. 56, pp. 167–180, 2014.
  35. B. Murali Karthick and S. Umesh. “Improving deep neural networks using state projection vectors of subspace Gaussian mixture model as features” Proc. of IEEE Spoken Language Technology Workshop, pp. 129-134, South Lake Tahoe, NV, USA, December 7-10, 2014.
  36. S. Umesh, Basil Abraham, Joy Neethu Mariam, K. Navneeth “A data-driven phoneme mapping technique using interpolation vectors of phone-cluster adaptive training” IEEE Spoken Language Technology Workshop, South Lake Tahoe, NV, USA, December 7-10, 2014, IEEE 2014.
  37. Neethu Mariam Joy, Basil Abraham, K. Navneeth and S. Umesh. “Cross-lingual acoustic modeling for Indian languages based on Subspace Gaussian Mixture Models” Proc. of Twentieth National Conference on Communications, (NCC-2014), pp. 1-5. IEEE, 2014.
  38. Sriranjani R, B. Murali Karthick, and S. Umesh. “Experiments on front-end techniques and segmentation model for robust Indian Language speech recognizer” Proc. of Twentieth National Conference on Communications, (NCC-2014), pp. 1-6. IEEE, 2014.
  39. Vimal Manohar, Bhargav Srinivas Ch and S. Umesh, “Acoustic Modeling Using Transform-based Phone-Cluster Adaptive Training”, Proc. of IEEE Workshop on Automatic Speech Recognition Understanding, Olomouc, Czech Republic, December 2013.
  40. D S Pavan Kumar, N. Vishnu Prasad, Vikas Joshi and S. Umesh, “Modified SPLICE and its Extension to Non-Stereo Data for Noise Robust Speech Recognition”, Proc. of IEEE Workshop on Automatic Speech Recognition Understanding, Olomouc, Czech Republic, December 2013.
  41. N. Vishnu Prasad and S. Umesh, “Improved Cepstral Mean and Variance Normalization using Bayesian Framework”, Proc. of IEEE Workshop on Automatic Speech Recognition Understanding, Olomouc, Czech Republic, December 2013.
  42. Vikas Joshi, N. Vishnu Prasad, S. Umesh, “Modified Cepstral Mean Normalization - Transforming to utterance specific non-zero mean” - Proc. of International Conference on Spoken Language Processing, Interspeech-2013, pp. 881-885, Lyon, France, September 2013.
  43. D S Pavan Kumar, Raghavendra Bilgi and S. Umesh, “Non-Negative Subspace Projection During Conventional MFCC Feature Extraction for Noise Robust Speech Recognition”, in Proc. of Nineteenth National Conference on Communications, (NCC-2013), Delhi, India, Feb 2013.
  44. Bhargav Srinivas Ch, Neethu Joy, Raghavendra Bilgi and S. Umesh, “Subspace Modeling Techniques Using Monophones for Speech Recognition”, in Proc. of Nineteenth National Conference on Communications, (NCC-2013), Delhi, India, Feb 2013.
  45. D.R. Sanand, S. Umesh, “VTLN Using Analytically Determined Linear Transformation on Conventional MFCC”, IEEE Transactions on Audio, Speech, and Language Processing, pp. 1573-1584, July 2012.
  46. A. K. Sarkar, S. Umesh, “Multiple background models for speaker verification using the concept of vocal tract length and MLLR super-vector”, International Journal of Speech Technology, (Springer), Vol. 15, No. 3, pp. 351-364, 2012.
  47. Aanchan Mohan, S. Umesh and Richard C. Rose, “Subspace based acoustic modelling in Indian languages”, IEEE Conference on Information Science, Signal Processing and their Applications, Montreal, Canada, 2012.
  48. Vikas Joshi, R. Bilgi, S. Umesh, G. Luz and B. Carmen, “Noise and Speaker Compensation in Log Filter Bank Domain”, Proc. of IEEE International Conf. on Acoustic, Speech and Signal Processing, ICASSP-2012, pp. 4709-4712, Kyoto, Japan, 2012.
  49. R. Bilgi, Vikas Joshi, S. Umesh, G. Luz and B. Carmen, “Robust Speech Recognition through the selection of Speaker and Noise transforms”, Proc. of IEEE International Conf. on Acoustic, Speech and Signal Processing, ICASSP-2012, pp. 4333-4336, Kyoto, Japan, 2012.
  50. A. K. Sarkar, S. Umesh and J. F. Bonastre “Computationally Efficient Speaker Identification Using Fast-MLLR Based Anchor Modeling”, Proc. of IEEE International Conf. on Acoustic, Speech and Signal Processing, ICASSP-2012, pp. 4357-4360, Kyoto, Japan, 2012.
  51. S. Umesh, “Studies on Inter-Speaker Variability in Speech and its Application in Automatic Speech Recognition”, Sadhana, (Springer), (Invited Paper), Vol. 36, Part 5, pp. 853–883, October 2011.
  52. A. K. Sarkar and S. Umesh, “Eigen-Voice Based Anchor Modeling System for Speaker Identification using MLLR Super-Vector”, in Proc. of International Conference on Spoken Language Processing, Interspeech-2011, pp. 2357-2360, Florence, Italy, 2011
  53. Joshi V., Bilgi R., S. Umesh, Benitez C., & Garcia L. “Efficient Speaker and Noise Normalization for Robust Speech Recognition”, in Proc. of International Conference on Spoken Language Processing, (Interspeech-2011), pp. 2601-2604, Florence, Italy, 2011.
  54. Joshi V., Bilgi R., S. Umesh, Garcia L. & Benitez C., “Sub-Band Level Histogram Equalization for Robust Speech Recognition”, in Proc. of International Conference on Spoken Language Processing, (Interspeech-2011), pp. 661-664, Florence, Italy, 2011.
  55. Achintya Sarkar, Shakti P Rath, S. Umesh, "Vocal Tract Length Normalization factor based speaker-cluster UBM for speaker verification", in Proceedings of 16th National Conference on Communications, NCC 2010, Year 2010
  56. Achintya Sarkar, S. Umesh, Shakti P Rath, "Computationally efficient speaker identification for large population tasks using MLLR and sufficient statistics", in Odyssey 2010: Speaker and Language Recognition Workshop, Year 2010, Pages 7-11
  57. Achintya Sarkar, S. Umesh, "Investigation of Speaker-Clustered UBMs based on Vocal Tract Lengths and MLLR matrices for Speaker Verification", in Odyssey 2010: Speaker and Language Recognition Workshop, Year 2010, Pages 286-293
  58. Shakti P Rath, Achintya Sarkar, S Umesh, "Effect of Jacobian Compensation in Linear Transformation based VTLN under Matched and Mis-matched Speaker Conditions", in Proceedings of 16th National Conference on Communications, NCC 2010, Year 2010
  59. Achintya Sarkar, Srinivasan Umesh, "Fast computation of speaker characterization vector using MLLR and sufficient statistics in anchor model framework" , in Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, Year 2010, Pages 2738-2741
  60. Rama Sanand Doddipatla, Shakti P Rath, Srinivasan Umesh, "Improving the performance of VTLN under mismatched speaker conditions and making it approach that of matched speaker conditions", in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, Year 2009, Pages 4397-4400
  61. Achintya Sarkar, S. Umesh, Shakti P Rath, "Text-Independent Speaker Identification Using Vocal Tract Length Normalization for Building Universal Background Model", in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Year 2009, Pages 2331-2334
  62. Achintya Sarkar, Shakti P Rath, S Umesh, "Fast Approach to Speaker Identification for Large Population using MLLR and Sufficient Statistics", in Proceedings of 16th National Conference on Communications, NCC 2010, Year 2010
  63. Shakti P Rath, Srinivasan Umesh, Achintya Sarkar, "Using VTLN matrices for rapid and computationally-efficient speaker adaptation with robustness to first-pass transcription errors.", in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Year 2009, Pages 572-575
  64. Rama Sanand Doddipatla, Shakti P Rath, Srinivasan Umesh, "A study on the influence of covariance adaptation on Jacobian compensation in vocal tract length normalization", in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Year 2009, Pages 584-587
  65. A. N. Harish, Rama Sanand Doddipatla, Srinivasan Umesh, "Characterizing speaker variability using spectral envelopes of vowel sounds", in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Year 2009, Pages 1107-1110
  66. Shakti P Rath, Srinivasan Umesh, "Acoustic class specific VTLN-warping using regression class trees", in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Year 2009, Pages 556-559
  67. D. R. Sanand and S. Umesh [2008]: ``Study of Jacobian Compensation Using Linear Transformation of Conventional MFCC for VTLN'', To Appear in Interspeech-2008, Brisbane, Sep. 2008
  68. D. R. Sanand, V. Balaji, R. Sandhya Rani and S. Umesh [2008]: ``Use of Spectral Center of Gravity for Generating Speaker Invariant Features for Automatic Speech Recognition'', To Appear in Interspeech-2008, Brisbane, Sep. 2008
  69. P. T. Akhil, S. P. Rath, S. Umesh and D. R. Sanand [2008]: ``A Computationally Efficient Approach to Warp Factor Estimation in VTLN Using EM Algorithm and Sufficient Statistics'', To Appear in Interspeech-2008, Brisbane, Sep. 2008
  70. S. V. Bharath Kumar and S. Umesh [2008]: ``Non-Uniform Speaker Normalization Using Affine Transformation,'' To Appear in Journal of the Acoustical Society of America, Vol. 124, No. 3, Sep. 2008
  71. R. Sinha and S. Umesh [2008]: ``A Shift based Approach to Speaker Normalization using Non-Linear Frequency-Scaling Model,'' ISCA Transactions on Speech Communication, Vol. 50, No. 3, pp. 191-202, Mar. 2008
  72. D. Dinesh Kumar, D. R. Sanand and S. Umesh [2008]: ``Linear Transformation Approach to Speaker Normalization on Conventional MFCC,'' Proc. of National Conference on Communications, IIT-Bombay, Feb-2008
  73. R. Sandhya Rani, D. R. Sanand and S. Umesh [2008]: ``Speaker Normalisation Using Center of Gravity'' Proc. of National Conference on Communications, IIT-Bombay, Feb-2008
  74. S. P. Rath, D. R. Sanand and S. Umesh [2008]: ``MAP based Warping factor Estimation in Vocal Tract Length Normalization'' Proc. of National Conference on Communications, IIT-Bombay, Feb-2008
  75. S. Umesh and R. Sinha [2007]: ``A Study of Filter-Bank Smoothing in MFCC Features for Recognition of Children Speech,'' IEEE Transactions on Audio, Speech and Language Processing, Volume 15, Issue 8, Nov. 2007 Page(s): 2418 – 2430
  76. S. Umesh, L. Cohen and D. Nelson [2007]: ``Fluctuations in Speech'', Fluctuations and Noise Letters, Vol. 7, No. 3, Sep. 2007, pp. 215-224
  77. D. R. Sanand, D. Dinesh Kumar and S. Umesh [2007]: ``Linear Transformation Approach to VTLN Using Dynamic Frequency Warping,'' Proc. of International Conference on Spoken Language Processing (Interspeech 2007), Antwerp, Belgium, August 27-31, 2007. [Acceptance ratio: 59% = 748/1268]
  78. S. Umesh, L. Cohen and D. Nelson [2007]: ``Fluctuations in speech,'' Proc. of Conference on Noise and Fluctuations in Biological, Biophysical, and Biomedical Systems, Florence, Italy, May 2007
  79. S. Umesh, D. Rama Sanand, G. Praveen [2007]: ``Speaker-Invariant Features for Automatic Speech Recognition,'' Proc. of International Joint Conferences on Artificial Intelligence, (IJCAI-07), pp. 1738-1743, Jan. 2007 [Acceptance ratio: 15.5% = 212/1365]
  80. Mohd Amir Khan, D. Rama Sanand, S. Umesh [2007]: ``Jacobian Compensation Using Variance Normalization in Automatic Speech Recognition,'' Proc. of National Conference on Communications, IIT-Kanpur, Jan. 2007
  81. S. Umesh, R. Sinha, D Rama Sanand [2007]: ``Using Vocal-Tract Length Normalization in Recognition of Children Speech,'' Proc. of National Conference on Communications, IIT-Kanpur, Jan. 2007
  82. S. V. Bharath, S. Umesh and R. Sinha [2006]: ``Study of Non-Linear Frequency Warping Functions for Speaker Normalization,'' To Appear in Proc. of IEEE International Conf. on Acoustic, Speech and Signal Processing, (ICASSP Toulouse), April 2006 [Acceptance ratio: 48.1% = 1465/3045]
  83. J. Lööf and H. Ney and S. Umesh [2006]: ``VTLN Warping Factor Estimation Using Accumulation of Sufficient Statistics,'' To Appear in Proc. of IEEE International Conf. on Acoustic, Speech and Signal Processing, (ICASSP Toulouse), April 2006 [Acceptance ratio: 48.1% = 1465/3045]
  84. R. Sinha and S. Umesh [2006]: ``Linear-Transformation Approach to Shift-Based Speaker-Normalisation'' Proc. of National Conference on Communications, (IIT, Delhi), January 2006
  85. S. Umesh and S. V. Bharath [2006]: ``Study of Non-linear Frequency Warping functions for Speaker Normalisation'' Proc. of National Conference on Communications, (IIT, Delhi), January 2006
  86. S. Umesh, A. Zolnay and H. Ney [2005]: ``Implementing Frequency-Warping and VTLN Through Linear Transformation of Conventional MFCC,'' Proc. of InterSpeech 2005, (Lisbon, Portugal), Sep.'2005 [Acceptance ratio: 62% = 855/1379]
  87. S. Umesh, L. Cohen and D. Nelson [2005]: ``The Speech Scale and Spectral Transformation,'' Proc. of SPIE Conference on Wavelet Applications in Signal & Image Proc., July'2005
  88. S. V. Bharath and S. Umesh [2004]: ``Non-uniform speaker normalization using frequency-dependent scaling function,'' Proc. IEEE International Conference on Signal Processing and Communications, (Bangalore), December 2004
  89. S. Tranter, M. J. Gales, R. Sinha, S. Umesh and P. Woodland [2004]: ``The Development of the Cambridge University RT-04 Diarisation System,'' Proc. of 2004 Rich Transcription Workshop (RT-04), (Palisades, NY, USA), November 2004
  90. D. Kim, S. Umesh, M. J. Gales, T. Hain and P. Woodland [2004]: ``Using VTLN for Broadcast News Transcription,'' Proc. of International Conference on Spoken Language Processing , (ICSLP, Jeju Island, S.Korea), October 2004
  91. S. V. Bharath, S. Umesh and R. Sinha [2004]: ``Non-Uniform Speaker Normalization using Affine Transformation,'' Proc. of IEEE International Conf. on Acoustic, Speech and Signal Processing, (ICASSP Montreal), Vol. I, pp.121-124, April 2004 Voted the top paper in its review category [Acceptance ratio: 51.8% = 1262/2434]
  92. S. Umesh, R. Sinha and S. V. Bharath [2004]: ``An Investigation into Front-End Signal Processing for Speaker Normalization,'' Proc. of IEEE International Conference on Acoustic, Speech and Signal Processing, (ICASSP Montreal), Vol. I, pp.345-348, April 2004 [Acceptance ratio: 51.8% = 1262/2434]
  93. D. Nelson, D. Smith, S. Umesh, L. Cohen [2003]: ``Estimating speaker scale factors from vowels,'' Proc. of SPIE Conference on Wavelets: Applications in Signal and Image Processing, vol. 5207, pp. 794-800, July 2003.
  94. R.Sinha and S. Umesh [2003]: ``A Method for Compensation of Jacobian in Speaker Normalization,'' Proc. of IEEE International Conference on Acoustic, Speech and Signal Processing, (ICASSP Hong Kong), April 2003
  95. R. Sinha and S. Umesh [2003]: ``A Study into Front-End Signal Processing for Automatic Speech Recognition,'' Proc. of Workshop on Spoken Language Processing , (TIFR,Mumbai), pp. 87 - 92, January 2003
  96. R. Sinha and S. Umesh [2003]: ``Spectral Smoothing for Vocal-Tract Length Normalization,'' Proc. of National Conference on Communications , (IIT,Chennai), pp. 87 - 92, January 2003
  97. S. Umesh, L. Cohen and D. Nelson [2002]: ``The speech scale, the Mel scale and the Tube Model for Speech,'' Proc. of SPIE Conference on Advanced Signal Processing Algorithms, Architectures and Implementations, vol. 4791, pp. 7 - 23, July 2002.
  98. S. Umesh, L. Cohen and D. Nelson [2002]: ``The Speech Scale,'' Acoustics Research Letters Online of the Journal of Acoustical Society of America, Vol. 3, Issue 3, pp.83-88, July 2002.
  99. R.Sinha and S. Umesh [2002]: ``Non-Uniform Scaling Based Speaker-Normalization,'' Proc. of IEEE International Conference on Acoustic, Speech and Signal Processing, (ICASSP Orlando, USA), Vol. I, pp. 589-592, May 2002 [Acceptance ratio: 56.9% = 1007/1770]
  100. S. Umesh, S. V. Bharath, M. K. Vinay, R. Sharma and R. Sinha [2002]: ``A Simple Approach to Non-Uniform Vowel Normalization,'' Proc. of IEEE International Conference on Acoustic, Speech and Signal Processing, (ICASSP, Orlando, USA),Vol. I, pp. 517-520, May 2002 [Acceptance ratio: 56.9% = 1007/1770]
  101. S. Umesh, L. Cohen and D. Nelson [2002]: ``Frequency Warping and the Mel-scale,'' IEEE Signal Processing Letters, vol. 9, no. 3, pp.104-107, March 2002.
  102. S. Umesh, D. Nelson and L. Cohen [2001]: ``Further Experimental Results on the Speech-Hearing Connection,'' Proc. of SPIE Conference on Wavelet Applications in Signal & Image Proc., Vol. 4478, pp. 361-366, July'2001
  103. S. Umesh, Richard C. Rose, and S. Parthasarathy [2000]: ``Exploiting Frequency-Scaling Invariance Properties of the Scale Transform for Automatic Speech Recognition,'' in Proc. of International Conference on Spoken Language Processing, (ICSLP Beijing, China), pp. 651-654, Oct.'2000
  104. D. Nelson, S. Umesh, and L. Cohen [2000]: ``High Frequency Formant Estimation & Its Application in Frequency-Scaling of Speech,'' Proc. of SPIE Conference on Wavelet Applications in Signal & Image Proc., vol. 4119, pp. 294-301, July'2000
  105. S. Umesh, L. Cohen and D. Nelson [1999]: ``Scale-Transform Based Features for Application in Speech Recognition,'' Proc. of SPIE Conference on Wavelet Applications in Signal & Image Proc., Vol. 3813, pp.727-731, July'1999
  106. S. Umesh, L. Cohen, and D. Nelson [1999]: ``Fitting the Mel-Scale,'' Proc. IEEE International Conference on Acoust. Speech, Signal Processing, (ICASSP Phoenix, Arizona, USA), Vol. 1, pp. 217-220, March 1999. [Acceptance ratio: 58.2% = 869/1490]
  107. S. Umesh, L. Cohen, N. Marinovic, and D. J. Nelson [1999]: ``Scale-Transform in Speech Analysis,'' IEEE Transactions on Speech and Audio Processing, vol. 7, no. 1, pp.40-45, Jan. 1999.
  108. S.Umesh, M. Belkhode and Rohit Sinha [1999]: ``Comparison of Front-End Features for Speech Recognition'' Proc. of National Conf. on Communications, (Kharagpur), pp.163-170, Jan. 1999
  109. S.Umesh, L.Cohen and D.Nelson [1998]: ``Warping Functions in Speech'' Proc. of SPIE Conference on Wavelet Applications in Signal & Image Proc., Vol. 3458, pp.194-209, July'1998
  110. S. Umesh, L. Cohen, and D. J. Nelson [1998]: ``Improved Scale-Cepstral Analysis in Speech,'' IEEE International Conference on Acoust. Speech, Signal Processing, (ICASSP Seattle, USA), pp. 637-640, May 1998.
  111. S. Umesh, L. Cohen, and D. Nelson [1997]: ``Improvements in Scale-Cepstral Features for Speech Analysis,'' in Proc. SPIE Conference on Wavelet Applications in Signal & Image Proc., (San Diego, USA), vol. 3169, pp. 481-494, July 1997.
  112. S.Umesh, A. Rao, G.Cristobal, L.Cohen and J.H. van Deemter [1997]: ``Global and local translation and magnification'' Proc. of SPIE Conference on Statistical & Stochastic Methods in Image Processing, Vol. 3167, pp.106-117, July'1997.
  113. S. Umesh, L. Cohen, and D. J. Nelson [1997]: ``Frequency-Warping and Speaker-Normalization,''IEEE International Conference on Acoust. Speech, Signal Processing, (ICASSP Munich, Germany), pp. 983-986, May 1997.
  114. S. Umesh, L. Cohen, N. Marinovic, and D. J. Nelson [1996]: ``Frequency-Warping in Speech,'' in Proc. International Conference on Spoken Language Processing, (ICSLP Philadelphia,USA), pp. 414-417, October 1996.
  115. S. Umesh and D. W. Tufts [1996]: ``Estimation of Parameters of Multiple Exponentially Damped Sinusoids using Fast Maximum Likelihood Estimation with Application to NMR Spectroscopy Data,'' IEEE Trans. Signal Processing, vol. 44, no. 9, pp.2245-2259, Sept. 1996.
  116. S. Umesh, L. Cohen, N. Marinovic, and D. J. Nelson [1996]: ``Psychoacoustic-Frequency Scales versus Frequency-Warping in Scale cepstrum ,'' in Proc. SPIE Conference on Wavelet Applications in Signal & Image Proc. , Vol. 2825, pp. 530-539, July 1996.
  117. S. Umesh and D. J. Nelson [1996]: ``Computationally Efficient Estimation of Sinusoidal Frequency at low SNR,'' in Proc. IEEE International Conference on Acoust. Speech, Signal Processing, (ICASSP Atlanta, USA), pp. 2797-2800, May 1996.
  118. L. Cohen, N. Marinovic, S. Umesh, and D. Nelson [1995]: ``Scale-Invariant Speech Analysis via joint time-frequency-scale processing,'' in Proc. SPIE Conference on Wavelet Applications in Signal & Image Proc., Vol. 2569, pp. 522-537, July 1995.
  119. N. Marinovic, L. Cohen, S. Umesh, and D. Nelson [1995]: ``Classification of Digital Modulation Types,'' in Proc. SPIE Conference on Advanced Signal Processing Algorithms, vol. SPIE-2563, (San Diego, USA), pp. 125-143, July 1995.
  120. N. Marinovic, L. Cohen, and S. Umesh [1994]: ``Joint Representations in Time and Frequency Scale for Harmonic Type Signals,'' in Proc. IEEE-SP International Symposium on T-F and T-S Representations, (Philadelphia, PA), pp. 84-87, October 1994.
  121. N. Marinovic, L. Cohen, and S. Umesh [1994]: ``Scale and Harmonic Signal Analysis,'' in Proc. International Society of Optical Engineering Conference on Wavelet Applications in Signal & Image Proc., Vol. 2303, pp. 411-418, August 1994.
  122. D. W. Tufts, H. Ge, and S. Umesh [1993]: ``Fast Maximum Likelihood Estimation of Signal Parameters using the Shape of the Compressed Likelihood Function,'' IEEE Journal of Oceanic Engg., Vol. 18, no. 4, pp. 388-400, Oct. 1993. (Invited Paper).
  123. E. Wilson, S. Umesh, and D. W. Tufts [1993]: ``Multistage Neural Network Structure for Transient Detection and Feature Extraction,'' in Proc. IEEE International Conference on Acoust. Speech, Signal Processing, (ICASSP Minneapolis, USA), pp. 489-492, April 1993.
  124. E. Wilson, S. Umesh, and D. W. Tufts [1992]: ``Designing a Neural Network Structure for Transient Detection Using the Subspace Inhibition Filter Algorithm,'' in Proc. IEEE Oceans '92, pp. 120-125 (Newport, USA), Oct. 1992.
  125. E. Wilson, S. Umesh, and D. W. Tufts [1992]: ``Resolving the Components of Transient Signals Using the Neural Network and Subspace Inhibition Filter Algorithms,'' in Proc. International Joint Conference on Neural Networks, (Baltimore, USA), pp. 283-288, June 1992.
  126. S. Umesh and D. W. Tufts [1992]: ``Resolving the Components of Transient Signals by a Multistage Procedure,'' in Proc. IEEE International Conference on Acoust. Speech, Signal Processing, (ICASSP San Francisco, USA), pp. 553-556, March 1992.
  127. G. F. Boudreaux-Bartels, D. W. Tufts, and S. Umesh [1991]: ``On Improving the Detection of Gabor Components,'' in Proc. of Mini ASSP Conference, (Boston, USA), April 1991.

(B): Technical Workshop Presentations

  1. S. Tranter and S. Umesh [2004]: ``Diarisation Research at CUED,'' Meta-Data Evaluation (MDE) Technical Meeting of U.S. ARPA's Effective Affordable Reusable Speech (EARS) Project, (Boston, USA), May 2004
  2. D.Y. Kim, M.J.F. Gales, H.Y.Chan, P.C. Woodland, S. Umesh and T. Hain [2004]: ``Progress in Broadcast News English Transcription,'' Speech-to-Text (STT) Workshop of ARPA's EARS Project, (Montreal, Canada), May 2004

(C): Invited Talks

  1. S. Umesh [2007]: ``Introduction to Large Vocabulary Continuous Speech Recognition'' National Conference on Communications, (IIT-Kanpur), Jan. 2007
  2. S. Umesh [2006]: ``Statistical Fundamentals for Speech Recognition'' Winter School on Speech & Audio Processing (WISSAP-06), (IISc., Bangalore), Jan. 2006
  3. S. Umesh [2005]: ``Large Vocabulary Continuous Speech Recognition,'' International Conference on Natural Language Processing, (IIT, Kanpur), Dec. 2005

(D): Books

  1. Ajit K. Chaturvedi, Srinivasan Umesh, Adrish Banerjee, Kameswari Chebrolu, Joseph John, Ayyangar R. Harish (Editors): Proceedings of the Thirteenth National Conference on Communications, I.I.T. Kanpur, 26-28 January 2007. ISBN Number: 978-81-904444-0-8