Sukhadia, Vrunda N; Umesh, S; Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models; To Appear at IEEE SLT 2022
Lodagala, Vasista Sai; Ghosh, Sreyan; Umesh, S; PADA: PRUNING ASSISTED DOMAIN ADAPTATION FOR SELF-SUPERVISED SPEECH REPRESENTATIONS; To Appear at IEEE SLT 2022
Lodagala, Vasista Sai; Ghosh, Sreyan; Umesh, S; CCC-WAV2VEC 2.0: CLUSTERING AIDED CROSS CONTRASTIVE SELF-SUPERVISED LEARNING OF SPEECH REPRESENTATIONS; To Appear at IEEE SLT 2022
Arunkumar, A., Umesh, S. (2022) Joint Encoder-Decoder Self-Supervised Pre-training for ASR. Proc. Interspeech 2022, 3418-3422, doi: 10.21437/Interspeech.2022-11338
Bhanushali, A., Bridgman, G., G, D., Ghosh, P., Kumar, P., Kumar, S., Raj Kolladath, A., Ravi, N., Seth, A., Seth, A., Singh, A., Sukhadia, V., S, U., Udupa, S., Prasad, L.V.S.V.D. (2022) Gram Vaani ASR Challenge on spontaneous telephone speech recordings in regional variations of Hindi. Proc. Interspeech 2022, 3548-3552, doi: 10.21437/Interspeech.2022-11371
Ghosh, S., Kumar, S., Kumar, Y., Ratn Shah, R., Umesh, S. (2022) Span Classification with Structured Information for Disfluency Detection in Spoken Utterances. Proc. Interspeech 2022, 3998-4002, doi: 10.21437/Interspeech.2022-11242
Arunkumar, A., Nileshkumar Sukhadia, V., Umesh, S. (2022) Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition. Proc. Interspeech 2022, 5145-5149, doi: 10.21437/Interspeech.2022-11376
Ghosh, S., Lepcha, S., Sakshi, S., Shah, R.R., Umesh, S. (2022) DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances. Proc. Interspeech 2022, 5185-5189, doi: 10.21437/Interspeech.2022-10752
P. Kumar, V. N. Sukhadia and S. Umesh, "Investigation of Robustness of Hubert Features from Different Layers to Domain, Accent and Language Variations," IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2022, doi: 10.1109/ICASSP43922.2022.9746250.
Vishwas M. Shetty, Metilda Sagaya Mary N.J., S. Umesh, "Improving the Performance of Transformer Based Low Resource Speech Recognition for Indian Languages", in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, Volume 2020-May, Year 2020, Pages 8279-8283
Metilda Sagaya Mary N.J., Vishwas M. Shetty, S. Umesh, "Investigation of Methods to Improve the Recognition Performance of Tamil-English Code-Switched Data in Transformer Framework", in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, Volume 2020-May, Year 2020, Pages 7889-7893
Prakash, A., Leela Thomas, A., Umesh, S., A Murthy, H. (2019) Building Multilingual End-to-End Speech Synthesisers for Indian Languages. Proc. 10th ISCA Speech Synthesis Workshop, 194-199, DOI: 10.21437/SSW.2019-35.
Shetty, Vishwas M.; Sharon, Rini A.; Abraham, Basil; Seeram, Tejaswi; Prakash, Anusha; Ravi, Nithya; Umesh, S., "Articulatory and stacked bottleneck features for low resource speech recognition", in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Volume 2018-September, Year 2018, Pages 3202-3206
Neethu Mariam Joy, S. Umesh, "Improving Acoustic Models in TORGO Dysarthric Speech Database", in IEEE Transactions on Neural Systems and Rehabilitation Engineering, Volume 26, Year 2018, Pages 637-645
Neethu Mariam Joy, Sandeep Reddy Kothinti, Srinivasan Umesh, "FMLLR Speaker Normalization With i-vector: In Pseudo-FMLLR and Distillation Framework", in IEEE/ACM Transactions on Audio Speech and Language Processing, Volume 26, Year 2018, Pages 797-805
Neethu Mariam Joy, Murali Karthik Baskar, S. Umesh, "DNNs for Unsupervised Extraction of Pseudo Normalized Features Without Explicit Adaptation Data", Journal of Speech Communication, Vol. 92, pp. 64-76, September 2017.
Basil Abraham, S. Umesh “An automated technique to generate phone-to-articulatory label mapping”. Journal of Speech Communication, (Elsevier), Vol. 86, pp. 107-120, 2017.
Basil Abraham, Tejaswi Seeram, S. Umesh “Transfer Learning and Distillation Techniques to Improve the Acoustic Modeling of Low Resource Languages” Proc. of International Conference on Spoken Language Processing, (Interspeech 2017), (Stockholm, Sweden), August 2017.
Neethu Mariam Joy, Sandeep Reddy Kothinti, S. Umesh, Basil Abraham “Generalized Distillation Framework for Speaker Normalization” Proc. of International Conference on Spoken Language Processing, (Interspeech 2017), (Stockholm, Sweden), August 2017.
Neethu Mariam Joy, S. Umesh, Basil Abraham “On Improving Acoustic Models for TORGO Dysarthric Speech Database” Proc. of International Conference on Spoken Language Processing, (Interspeech 2017), (Stockholm, Sweden), August 2017.
Basil Abraham, S. Umesh, Neethu Mariam Joy “Joint Estimation of Articulatory Features and Acoustic Model for Low Resource Languages” Proc. of International Conference on Spoken Language Processing, (Interspeech 2017), (Stockholm, Sweden), August 2017.
Seeram Tejaswi, S. Umesh “DNN Acoustic Models for Dysarthric Speech” Proc. of Twenty Third National Conference on Communication (NCC – 2017), Madras, 2017.
Seeram Tejaswi, S. Umesh “Addressing Data Sparsity in DNN Acoustic Modeling” Proc. of Twenty Third National Conference on Communication (NCC – 2017), Madras, 2017.
Basil Abraham, S. Umesh and Neethu M Joy “Articulatory Feature Extraction Using CTC to Build Articulatory Classifiers Without Forced Frame Alignments for Speech Recognition” Proc. of International Conference on Spoken Language Processing, (Interspeech 2016), (San Francisco, USA), pp. 798-802
Neethu M. Joy, M. K. Baskar, S. Umesh and Basil Abraham “DNNs for Unsupervised Extraction of Pseudo FMLLR Features Without Explicit Adaptation Data” Proc. of International Conference on Spoken Language Processing, (Interspeech-2016), (San Francisco, USA), pp. 3479-3483
Basil Abraham, S. Umesh and Neethu M. Joy “Overcoming Data Sparsity in Acoustic Modeling of Low-Resource Language by Borrowing Data and Model Parameters from High-Resource Languages” Proc. of International Conference on Spoken Language Processing, (Interspeech-2016), (San Francisco, USA), pp. 3037-3041
Neethu M. Joy, B. Abraham, K. Navneeth and S. Umesh “Improved acoustic modeling of low-resource languages using shared SGMM parameters of high-resource languages” Proc. of Twenty Second National Conference on Communication (NCC – 2016), Guwahati.
B. Murali Karthick, Prateek Kolhar and S. Umesh “Speaker Adaptation of Convolutional Neural Network Using Speaker Specific Subspace Vectors of SGMM”. Proc. of International Conference on Spoken Language Processing, (Interspeech-2015), (Dresden, Germany), Sep. 2015.
Vikas Joshi, Raghavendra Bilgi, S. Umesh, Luz Garcia, and Carmen Benitez “Sub-band based histogram equalization in cepstral domain for speech recognition”. Journal of Speech Communication, (Elsevier), Vol. 69, pp. 46-65, May 2015.
R. Sriranjani, S. Umesh and M.R. Reddy “Automatic Severity Assessment of Dysarthria using State-Specific Vectors”. Journal Biomedical Sciences Instrumentation, (Plenum), Vol. 51, pp. 99-106, April 2015.
Joshi, Vikas, Prasad, N Vishnu, S. Umesh “Modified Mean and Variance Normalization: Transforming to Utterance-Specific Estimates”. Journal of Circuits, Systems, and Signal Processing, (Springer), pp. 1-17, 2015.
Sriranjani R, S. Umesh, et al. “Investigation of different acoustic modeling techniques for low resource Indian language data”. Proc. of Twenty First National Conference on Communications, (NCC-2015), Mumbai, 2015.
R. Sriranjani, M. Ramasubba Reddy, S. Umesh “Improved acoustic modeling for automatic dysarthric speech recognition”. Proc. of Twenty First National Conference on Communications, (NCC-2015), pp. 1-6, Mumbai, 2015.
Mohan Aanchan, Richard Rose, Sina Hamidi Ghalehjegh and S. Umesh. “Acoustic modelling for speech recognition in Indian languages in an agricultural commodities task domain”, Journal of Speech Communication, (Elsevier), Vol. 56, pp. 167–180, 2014.
B. Murali Karthick and S. Umesh. “Improving deep neural networks using state projection vectors of subspace Gaussian mixture model as features” Proc. of IEEE Spoken Language Technology Workshop, pp. 129-134, South Lake Tahoe, NV, USA, December 7-10, 2014.
S. Umesh, Basil Abraham, Joy Neethu Mariam, K. Navneeth “A data-driven phoneme mapping technique using interpolation vectors of phone-cluster adaptive training” IEEE Spoken Language Technology Workshop, South Lake Tahoe, NV, USA, December 7-10, 2014, IEEE 2014.
Neethu Mariam Joy, Basil Abraham, K. Navneeth and S. Umesh. “Cross-lingual acoustic modeling for Indian languages based on Subspace Gaussian Mixture Models” Proc. of Twentieth National Conference on Communications, (NCC-2014), pp. 1-5. IEEE, 2014.
Sriranjani R, B. Murali Karthick, and S. Umesh. “Experiments on front-end techniques and segmentation model for robust Indian Language speech recognizer” Proc. of Twentieth National Conference on Communications, (NCC-2014), pp. 1-6. IEEE, 2014.
Vimal Manohar, Bhargav Srinivas Ch and S. Umesh, “Acoustic Modeling Using Transform-based Phone-Cluster Adaptive Training”, Proc. of IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, Czech Republic, December 2013.
D S Pavan Kumar, N. Vishnu Prasad, Vikas Joshi and S. Umesh, “Modified SPLICE and its Extension to Non-Stereo Data for Noise Robust Speech Recognition”, Proc. of IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, Czech Republic, December 2013.
N. Vishnu Prasad and S. Umesh, “Improved Cepstral Mean and Variance Normalization using Bayesian Framework”, Proc. of IEEE Workshop on Automatic Speech Recognition and Understanding, Olomouc, Czech Republic, December 2013.
Vikas Joshi, N. Vishnu Prasad, S. Umesh, “Modified Cepstral Mean Normalization - Transforming to utterance specific non-zero mean” - Proc. of International Conference on Spoken Language Processing, Interspeech-2013, pp. 881-885, Lyon, France, September 2013.
D S Pavan Kumar, Raghavendra Bilgi and S. Umesh, “Non-Negative Subspace Projection During Conventional MFCC Feature Extraction for Noise Robust Speech Recognition”, in Proc. of Nineteenth National Conference on Communications, (NCC-2013), Delhi, India, Feb 2013.
Bhargav Srinivas Ch, Neethu Joy, Raghavendra Bilgi and S. Umesh, “Subspace Modeling Techniques Using Monophones for Speech Recognition”, in Proc. of Nineteenth National Conference on Communications, (NCC-2013), Delhi, India, Feb 2013.
D.R. Sanand, S. Umesh, “VTLN Using Analytically Determined Linear Transformation on Conventional MFCC”, IEEE Transactions on Audio, Speech, and Language Processing, pp. 1573-1584, July 2012.
A. K. Sarkar, S. Umesh, “Multiple background models for speaker verification using the concept of vocal tract length and MLLR super-vector”, International Journal of Speech Technology, (Springer), Vol. 15, No. 3, pp. 351-364, 2012.
Aanchan Mohan, S. Umesh and Richard C. Rose, “Subspace based acoustic modelling in Indian languages”, IEEE Conference on Information Science, Signal Processing and their Applications, Montreal, Canada, 2012.
Vikas Joshi, R. Bilgi, S. Umesh, G. Luz and B. Carmen, “Noise and Speaker Compensation in Log Filter Bank Domain” - Proc. of IEEE International Conf. on Acoustic, Speech and Signal Processing, ICASSP-2012, pp. 4709-4712, Kyoto, Japan, 2012.
R. Bilgi, Vikas Joshi, S. Umesh, G. Luz and B. Carmen, “Robust Speech Recognition through the selection of Speaker and Noise transforms” - Proc. of IEEE International Conf. on Acoustic, Speech and Signal Processing, ICASSP-2012, pp. 4333-4336, Kyoto, Japan, 2012.
A. K. Sarkar, S. Umesh and J. F. Bonastre “Computationally Efficient Speaker Identification Using Fast-MLLR Based Anchor Modeling”, Proc. of IEEE International Conf. on Acoustic, Speech and Signal Processing, ICASSP-2012, pp. 4357-4360, Kyoto, Japan, 2012.
S. Umesh, “Studies on Inter-Speaker Variability in Speech and its Application in Automatic Speech Recognition”, Sadhana, (Springer), (Invited Paper), Vol. 36, Part 5, pp. 853–883, October 2011.
A. K. Sarkar and S. Umesh, “Eigen-Voice Based Anchor Modeling System for Speaker Identification using MLLR Super-Vector”, in Proc. of International Conference on Spoken Language Processing, Interspeech-2011, pp. 2357-2360, Florence, Italy, 2011.
Joshi V., Bilgi R., S. Umesh, Benitez C., & Garcia L. “Efficient Speaker and Noise Normalization for Robust Speech Recognition”, in Proc. of International Conference on Spoken Language Processing, (Interspeech-2011), pp. 2601-2604, Florence, Italy, 2011.
Joshi V., Bilgi R., S. Umesh, Garcia L. & Benitez C., “Sub-Band Level Histogram Equalization for Robust Speech Recognition”, in Proc. of International Conference on Spoken Language Processing, (Interspeech-2011), pp. 661-664, Florence, Italy, 2011.
Achintya Sarkar, Shakti P Rath, S. Umesh, "Vocal Tract Length Normalization factor based speaker-cluster UBM for speaker verification", in Proceedings of 16th National Conference on Communications, NCC 2010, Year 2010
Achintya Sarkar, S. Umesh, Shakti P Rath, "Computationally efficient speaker identification for large population tasks using MLLR and sufficient statistics", in Odyssey 2010: Speaker and Language Recognition Workshop, Year 2010, Pages 7-11
Achintya Sarkar, S. Umesh, "Investigation of Speaker-Clustered UBMs based on Vocal Tract Lengths and MLLR matrices for Speaker Verification", in Odyssey 2010: Speaker and Language Recognition Workshop, Year 2010, Pages 286-293
Shakti P Rath, Achintya Sarkar, S Umesh, "Effect of Jacobian Compensation in Linear Transformation based VTLN under Matched and Mis-matched Speaker Conditions", in Proceedings of 16th National Conference on Communications, NCC 2010, Year 2010
Achintya Sarkar, Srinivasan Umesh, "Fast computation of speaker characterization vector using MLLR and sufficient statistics in anchor model framework" , in Proceedings of the 11th Annual Conference of the International Speech Communication Association, INTERSPEECH 2010, Year 2010, Pages 2738-2741
Rama Sanand Doddipatla, Shakti P Rath, Srinivasan Umesh, "Improving the performance of VTLN under mismatched speaker conditions and making it approach that of matched speaker conditions", in ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, Year 2009, Pages 4397-4400
Achintya Sarkar, S. Umesh, Shakti P Rath, "Text-Independent Speaker Identification Using Vocal Tract Length Normalization for Building Universal Background Model", in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Year 2009, Pages 2331-2334
Achintya Sarkar, Shakti P Rath, S Umesh, "Fast Approach to Speaker Identification for Large Population using MLLR and Sufficient Statistics", in Proceedings of 16th National Conference on Communications, NCC 2010, Year 2010
Shakti P Rath, Srinivasan Umesh, Achintya Sarkar, "Using VTLN matrices for rapid and computationally-efficient speaker adaptation with robustness to first-pass transcription errors.", in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Year 2009, Pages 572-575
Rama Sanand Doddipatla, Shakti P Rath, Srinivasan Umesh, "A study on the influence of covariance adaptation on Jacobian compensation in vocal tract length normalization", in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Year 2009, Pages 584-587
A. N. Harish, Rama Sanand Doddipatla, Srinivasan Umesh, "Characterizing speaker variability using spectral envelopes of vowel sounds", in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Year 2009, Pages 1107-1110
Shakti P Rath, Srinivasan Umesh, "Acoustic class specific VTLN-warping using regression class trees", in Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Year 2009, Pages 556-559
D. R. Sanand and S. Umesh [2008]: ``Study of Jacobian Compensation Using Linear Transformation of Conventional MFCC for VTLN'', To Appear in Interspeech-2008, Brisbane, Sep. 2008
D. R. Sanand, V. Balaji, R. Sandhya Rani and S. Umesh [2008]: ``Use of Spectral Center of Gravity for Generating Speaker Invariant Features for Automatic Speech Recognition'', To Appear in Interspeech-2008, Brisbane, Sep. 2008
P. T. Akhil, S. P. Rath, S. Umesh and D. R. Sanand [2008]: ``A Computationally Efficient Approach to Warp Factor Estimation in VTLN Using EM Algorithm and Sufficient Statistics'', To Appear in Interspeech-2008, Brisbane, Sep. 2008
S. V. Bharath Kumar and S. Umesh [2008]: ``Non-Uniform Speaker Normalization Using Affine Transformation,'' To Appear in Journal of the Acoustical Society of America, Vol. 124, No. 3, Sep. 2008
R. Sinha and S. Umesh [2008]: ``A Shift based Approach to Speaker Normalization using Non-Linear Frequency-Scaling Model,'' ISCA Transactions on Speech Communication, Vol. 50, No. 3, pp. 191-202, Mar. 2008
D. Dinesh Kumar, D. R. Sanand and S. Umesh [2008]: ``Linear Transformation Approach to Speaker Normalization on Conventional MFCC,'' Proc. of National Conference on Communications, IIT-Bombay, Feb-2008
R. Sandhya Rani, D. R. Sanand and S. Umesh [2008]: ``Speaker Normalisation Using Center of Gravity'' Proc. of National Conference on Communications, IIT-Bombay, Feb-2008
S. P. Rath, D. R. Sanand and S. Umesh [2008]: ``MAP based Warping factor Estimation in Vocal Tract Length Normalization'' Proc. of National Conference on Communications, IIT-Bombay, Feb-2008
S. Umesh and R. Sinha [2007]: ``A Study of Filter-Bank Smoothing in MFCC Features for Recognition of Children Speech,'' IEEE Transactions on Audio, Speech and Language Processing, Volume 15, Issue 8, Nov. 2007 Page(s): 2418 – 2430
S. Umesh, L. Cohen and D. Nelson [2007]: ``Fluctuations in Speech'', Fluctuations and Noise Letters, Vol. 7, No. 3, Sep. 2007, pp. 215-224
D. R. Sanand, D. Dinesh Kumar and S. Umesh [2007]: ``Linear Transformation Approach to VTLN Using Dynamic Frequency Warping,'' Proc. of International Conference on Spoken Language Processing (Interspeech 2007), Antwerp, Belgium, August 27-31, 2007. [Acceptance ratio: 59% = 748/1268]
S. Umesh, L. Cohen and D. Nelson [2007]: ``Fluctuations in speech,'' Proc. of Conference on Noise and Fluctuations in Biological, Biophysical, and Biomedical Systems, Florence, Italy, May 2007
S. Umesh, D. Rama Sanand, G. Praveen [2007]: ``Speaker-Invariant Features for Automatic Speech Recognition,'' Proc. of International Joint Conferences on Artificial Intelligence, (IJCAI-07), pp. 1738-1743, Jan. 2007 [Acceptance ratio: 15.5% = 212/1365]
Mohd Amir Khan, D. Rama Sanand, S. Umesh [2007]: ``Jacobian Compensation Using Variance Normalization in Automatic Speech Recognition,'' Proc. of National Conference on Communications, IIT-Kanpur, Jan. 2007
S. Umesh, R. Sinha, D Rama Sanand [2007]: ``Using Vocal-Tract Length Normalization in Recognition of Children Speech,'' Proc. of National Conference on Communications, IIT-Kanpur, Jan. 2007
S. V. Bharath, S. Umesh and R. Sinha [2006]: ``Study of Non-Linear Frequency Warping Functions for Speaker Normalization,'' To Appear in Proc. of IEEE International Conf. on Acoustic, Speech and Signal Processing, (ICASSP Toulouse), April 2006 [Acceptance ratio: 48.1% = 1465/3045]
J. Lööf and H. Ney and S. Umesh [2006]: ``VTLN Warping Factor Estimation Using Accumulation of Sufficient Statistics,'' To Appear in Proc. of IEEE International Conf. on Acoustic, Speech and Signal Processing, (ICASSP Toulouse), April 2006 [Acceptance ratio: 48.1% = 1465/3045]
R. Sinha and S. Umesh [2006]: ``Linear-Transformation Approach to Shift-Based Speaker-Normalisation'' Proc. of National Conference on Communications, (IIT, Delhi), January 2006
S. Umesh and S. V. Bharath [2006]: ``Study of Non-linear Frequency Warping functions for Speaker Normalisation'' Proc. of National Conference on Communications , (IIT,Delhi), January 2006
S. Umesh, A. Zolnay and H. Ney [2005]: ``Implementing Frequency-Warping and VTLN Through Linear Transformation of Conventional MFCC,'' Proc. of InterSpeech 2005, (Lisbon, Portugal), Sep.'2005 [Acceptance ratio: 62% = 855/1379]
S. Umesh, L. Cohen and D. Nelson [2005]: ``The Speech Scale and Spectral Transformation,'' Proc. of SPIE Conference on Wavelet Applications in Signal & Image Proc., July'2005
S. V. Bharath and S. Umesh [2004]: ``Non-uniform speaker normalization using frequency-dependent scaling function,'' Proc. IEEE International Conference on Signal Processing and Communications, (Bangalore), December 2004
S. Tranter, M. J. Gales, R. Sinha, S. Umesh and P. Woodland [2004]: ``The Development of the Cambridge University RT-04 Diarisation System,'' Proc. of 2004 Rich Transcription Workshop (RT-04), (Palisades, NY, USA), November 2004
D. Kim, S. Umesh, M. J. Gales, T. Hain and P. Woodland [2004]: ``Using VTLN for Broadcast News Transcription,'' Proc. of International Conference on Spoken Language Processing , (ICSLP, Jeju Island, S.Korea), October 2004
S. V. Bharath, S. Umesh and R. Sinha [2004]: ``Non-Uniform Speaker Normalization using Affine Transformation,'' Proc. of IEEE International Conf. on Acoustic, Speech and Signal Processing, (ICASSP Montreal), Vol. I, pp.121-124, April 2004 Voted the top paper in its review category [Acceptance ratio: 51.8% = 1262/2434]
S. Umesh, R. Sinha and S. V. Bharath [2004]: ``An Investigation into Front-End Signal Processing for Speaker Normalization,'' Proc. of IEEE International Conference on Acoustic, Speech and Signal Processing, (ICASSP Montreal), Vol. I, pp.345-348, April 2004 [Acceptance ratio: 51.8% = 1262/2434]
D. Nelson, D. Smith, S. Umesh, L. Cohen [2003]: ``Estimating speaker scale factors from vowels,'' Proc. of SPIE Conference on Wavelets: Applications in Signal and Image Processing, vol. 5207, pp. 794-800, July 2003.
R.Sinha and S. Umesh [2003]: ``A Method for Compensation of Jacobian in Speaker Normalization,'' Proc. of IEEE International Conference on Acoustic, Speech and Signal Processing, (ICASSP Hong Kong), April 2003
R. Sinha and S. Umesh [2003]: ``A Study into Front-End Signal Processing for Automatic Speech Recognition,'' Proc. of Workshop on Spoken Language Processing , (TIFR,Mumbai), pp. 87 - 92, January 2003
R. Sinha and S. Umesh [2003]: ``Spectral Smoothing for Vocal-Tract Length Normalization,'' Proc. of National Conference on Communications , (IIT,Chennai), pp. 87 - 92, January 2003
S. Umesh, L. Cohen and D. Nelson [2002]: ``The speech scale, the Mel scale and the Tube Model for Speech,'' Proc. of SPIE Conference on Advanced Signal Processing Algorithms, Architectures and Implementations, vol. 4791, pp. 7 - 23, July 2002.
S. Umesh, L. Cohen and D. Nelson [2002]: ``The Speech Scale,'' Acoustics Research Letters Online of the Journal of Acoustical Society of America, Vol. 3, Issue 3, pp.83-88, July 2002.
R.Sinha and S. Umesh [2002]: ``Non-Uniform Scaling Based Speaker-Normalization,'' Proc. of IEEE International Conference on Acoustic, Speech and Signal Processing, (ICASSP Orlando, USA), Vol. I, pp. 589-592, May 2002 [Acceptance ratio: 56.9% = 1007/1770]
S. Umesh, S. V. Bharath, M. K. Vinay, R. Sharma and R. Sinha [2002]: ``A Simple Approach to Non-Uniform Vowel Normalization,'' Proc. of IEEE International Conference on Acoustic, Speech and Signal Processing, (ICASSP, Orlando, USA),Vol. I, pp. 517-520, May 2002 [Acceptance ratio: 56.9% = 1007/1770]
S. Umesh, L. Cohen and D. Nelson [2002]: ``Frequency Warping and the Mel-scale,'' IEEE Signal Processing Letters, vol. 9, no. 3, pp.104-107, March 2002.
S. Umesh, D. Nelson and L. Cohen [2001]: ``Further Experimental Results on the Speech-Hearing Connection,'' Proc. of SPIE Conference on Wavelet Applications in Signal & Image Proc., Vol. 4478, pp. 361-366, July'2001
S. Umesh, Richard C. Rose, and S. Parthasarathy [2000]: ``Exploiting Frequency-Scaling Invariance Properties of the Scale Transform for Automatic Speech Recognition,'' in Proc. of International Conference on Spoken Language Processing, (ICSLP Beijing, China), pp. 651-654, Oct.'2000
D. Nelson, S. Umesh, and L. Cohen [2000]: ``High Frequency Formant Estimation & Its Application in Frequency-Scaling of Speech,'' Proc. of SPIE Conference on Wavelet Applications in Signal & Image Proc., vol. 4119, pp. 294-301, July'2000
S. Umesh, L. Cohen and D. Nelson [1999]: ``Scale-Transform Based Features for Application in Speech Recognition,'' Proc. of SPIE Conference on Wavelet Applications in Signal & Image Proc., Vol. 3813, pp.727-731, July'1999
S. Umesh, L. Cohen, and D. Nelson [1999]: ``Fitting the Mel-Scale,'' Proc. IEEE International Conference on Acoust. Speech, Signal Processing, (ICASSP Phoenix, Arizona, USA), Vol. 1, pp. 217-220, March 1999. [Acceptance ratio: 58.2% = 869/1490]
S. Umesh, L. Cohen, N. Marinovic, and D. J. Nelson [1999]: ``Scale-Transform in Speech Analysis,'' IEEE Transactions on Speech and Audio Processing, vol. 7, no. 1, pp.40-45, Jan. 1999.
S.Umesh, M. Belkhode and Rohit Sinha [1999]: ``Comparison of Front-End Features for Speech Recognition'' Proc. of National Conf. on Communications, (Kharagpur), pp.163-170, Jan. 1999
S.Umesh, L.Cohen and D.Nelson [1998]: ``Warping Functions in Speech'' Proc. of SPIE Conference on Wavelet Applications in Signal & Image Proc., Vol. 3458, pp.194-209, July'1998
S. Umesh, L. Cohen, and D. J. Nelson [1998]: ``Improved Scale-Cepstral Analysis in Speech,'' IEEE International Conference on Acoust. Speech, Signal Processing, (ICASSP Seattle, USA), pp. 637-640, May 1998.
S. Umesh, L. Cohen, and D. Nelson [1997]: ``Improvements in Scale-Cepstral Features for Speech Analysis,'' in Proc. SPIE Conference on Wavelet Applications in Signal & Image Proc., (San Diego, USA), vol. 3169, pp. 481-494, July 1997.
S.Umesh, A. Rao, G.Cristobal, L.Cohen and J.H. van Deemter [1997]: ``Global and local translation and magnification'' Proc. of SPIE Conference on Statistical & Stochastic Methods in Image Processing, Vol. 3167, pp.106-117, July'1997.
S. Umesh, L. Cohen, and D. J. Nelson [1997]: ``Frequency-Warping and Speaker-Normalization,'' IEEE International Conference on Acoust. Speech, Signal Processing, (ICASSP Munich, Germany), pp. 983-986, May 1997.
S. Umesh, L. Cohen, N. Marinovic, and D. J. Nelson [1996]: ``Frequency-Warping in Speech,'' in Proc. International Conference on Spoken Language Processing, (ICSLP Philadelphia,USA), pp. 414-417, October 1996.
S. Umesh and D. W. Tufts [1996]: ``Estimation of Parameters of Multiple Exponentially Damped Sinusoids using Fast Maximum Likelihood Estimation with Application to NMR Spectroscopy Data,'' IEEE Trans. Signal Processing, vol. 44, no. 9, pp.2245-2259, Sept. 1996.
S. Umesh, L. Cohen, N. Marinovic, and D. J. Nelson [1996]: ``Psychoacoustic-Frequency Scales versus Frequency-Warping in Scale cepstrum,'' in Proc. SPIE Conference on Wavelet Applications in Signal & Image Proc., Vol. 2825, pp. 530-539, July 1996.
S. Umesh and D. J. Nelson [1996]: ``Computationally Efficient Estimation of Sinusoidal Frequency at low SNR,'' in Proc. IEEE International Conference on Acoust. Speech, Signal Processing, (ICASSP Atlanta, USA), pp. 2797-2800, May 1996.
L. Cohen, N. Marinovic, S. Umesh, and D. Nelson [1995]: ``Scale-Invariant Speech Analysis via joint time-frequency-scale processing,'' in Proc. SPIE Conference on Wavelet Applications in Signal & Image Proc., Vol. 2569, pp. 522-537, July 1995.
N. Marinovic, L. Cohen, S. Umesh, and D. Nelson [1995]: ``Classification of Digital Modulation Types,'' in Proc. SPIE Conference on Advanced Signal Processing Algorithms, vol. SPIE-2563, (San Diego, USA), pp. 125-143, July 1995.
N. Marinovic, L. Cohen, and S. Umesh [1994]: ``Joint Representations in Time and Frequency Scale for Harmonic Type Signals,'' in Proc. IEEE-SP International Symposium on T-F and T-S Representations, (Philadelphia, PA), pp. 84-87, October 1994.
N. Marinovic, L. Cohen, and S. Umesh [1994]: ``Scale and Harmonic Signal Analysis,'' in Proc. International Society of Optical Engineering Conference on Wavelet Applications in Signal & Image Proc., Vol. 2303, pp. 411-418, August 1994.
D. W. Tufts, H. Ge, and S. Umesh [1993]: ``Fast Maximum Likelihood Estimation of Signal Parameters using the Shape of the Compressed Likelihood Function,'' IEEE Journal of Oceanic Engg., Vol. 18, no. 4, pp. 388-400, Oct. 1993.
(Invited Paper).
E. Wilson, S. Umesh, and D. W. Tufts [1993]: ``Multistage Neural Network Structure for Transient Detection and Feature Extraction,'' in Proc. IEEE International Conference on Acoust. Speech, Signal Processing, (ICASSP Minneapolis, USA), pp. 489-492, April 1993.
E. Wilson, S. Umesh, and D. W. Tufts [1992]: ``Designing a Neural Network Structure for Transient Detection Using the Subspace Inhibition Filter Algorithm,'' in Proc. IEEE Oceans '92, pp. 120-125 (Newport, USA), Oct. 1992.
E. Wilson, S. Umesh, and D. W. Tufts [1992]: ``Resolving the Components of Transient Signals Using the Neural Network and Subspace Inhibition Filter Algorithms,'' in Proc. International Joint Conference on Neural Networks, (Baltimore, USA), pp. 283-288, June 1992.
S. Umesh and D. W. Tufts [1992]: ``Resolving the Components of Transient Signals by a Multistage Procedure,'' in Proc. IEEE International Conference on Acoust. Speech, Signal Processing, (ICASSP San Francisco, USA), pp. 553-556, March 1992.
G. F. Boudreaux-Bartels, D. W. Tufts, and S. Umesh [1991]: ``On Improving the Detection of Gabor Components,'' in Proc. of Mini ASSP Conference, (Boston, USA), April 1991.
(B): Technical Workshop Presentations
S. Tranter and S. Umesh [2004]: ``Diarisation Research at CUED,'' Meta-Data Evaluation (MDE) Technical Meeting of U.S. ARPA's Effective Affordable Reusable Speech (EARS) Project, (Boston, USA), May 2004
D.Y. Kim, M.J.F. Gales, H.Y.Chan, P.C. Woodland, S. Umesh and T. Hain [2004]: ``Progress in Broadcast News English Transcription,'' Speech-to-Text (STT) Workshop of ARPA's EARS Project, (Montreal, Canada), May 2004
(C): Invited Talks
S. Umesh [2007]: ``Introduction to Large Vocabulary Continuous Speech Recognition'' National Conference on Communications, (IIT-Kanpur), Jan. 2007
S. Umesh [2006]: ``Statistical Fundamentals for Speech Recognition'' Winter School on Speech & Audio Processing (WISSAP-06), (IISc., Bangalore), Jan. 2006
S. Umesh [2005]: ``Large Vocabulary Continuous Speech Recognition,'' International Conference on Natural Language Processing, (IIT, Kanpur), Dec. 2005
(D): Books
Ajit K. Chaturvedi, Srinivasan Umesh, Adrish Banerjee, Kameswari Chebrolu, Joseph John, Ayyangar R. Harish (Editors): Proceedings of the Thirteenth National Conference on Communications, I.I.T. Kanpur, 26-28 January 2007. ISBN Number: 978-81-904444-0-8