Mikolaj Kegler, Ph.D.

Audio Machine Learning Scientist

Bose Research

About me

I am an Audio Machine Learning Scientist at Bose Corporation. My current research is centered around the development of novel ML-based methods for lightweight, on-device speech and audio signal processing, with a particular focus on speech enhancement and hearing augmentation.

Previously, I was a PhD student in the Department of Bioengineering & Centre for Neurotechnology at Imperial College London (ICL). As a member of the Sensory Neuroengineering lab led by Prof. Tobias Reichenbach, I studied the neural mechanisms underlying the perception and comprehension of natural speech, especially in challenging listening conditions. In my work, I combined computational modelling with neuroimaging and non-invasive brain stimulation to understand how natural speech is processed across human auditory pathways.

In addition to my PhD research, I worked as an Applied Scientist Intern at Amazon Lab126, a Scientific Advisor at Logitech, and a Consultant for clinical data analysis at INBRAIN Neuroelectronics.

For more details, see my CV, explore this website or get in touch!

Interests

  • Machine Learning
  • Speech Signal Processing
  • (Bio)Signal Processing
  • Computational Neuroscience
  • Auditory Cognitive Neuroscience

Education

  • PhD in Neurotechnology, 2022

    Imperial College London, UK

  • MRes in Neurotechnology, 2018

    Imperial College London, UK

  • MSc in Biomedical Engineering, 2017

    Imperial College London, UK

  • BEng in Biomedical Engineering, 2016

    Warsaw University of Technology, Poland

Publications

*-equal contribution

Also available on my Google Scholar profile.

(2023). Two-Step Knowledge Distillation for Tiny Speech Enhancement. Under review at ICASSP 2024.

PDF DOI

(2023). Self-Supervised Learning for Speech Enhancement through Synthesis. 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

PDF DOI Demo

(2022). BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping. HEAR: Holistic Evaluation of Audio Representations (NeurIPS 2021 Competition) - PMLR.

PDF Code DOI

(2022). Hybrid Handcrafted and Learnable Audio Representation for Analysis of Speech Under Cognitive and Physical Load. Proc. Interspeech 2022.

PDF Code DOI

(2022). Mental Flow Estimation Through Wearable EEG. 44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC).

PDF DOI

(2022). The neural response at the fundamental frequency of speech is modulated by word-level acoustic and linguistic information. Frontiers in Neuroscience.

PDF DOI

(2022). SERAB: A multi-lingual benchmark for speech emotion recognition. 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

PDF Code DOI

(2021). Effect of visual input on syllable parsing in a computational model of a neural microcircuit for speech processing. Journal of Neural Engineering.

PDF DOI

(2021). Word-Level Embeddings for Cross-Task Transfer Learning in Speech Processing. 2021 29th European Signal Processing Conference (EUSIPCO).

PDF Code DOI

(2020). Deep Speech Inpainting of Time-Frequency Masks. Proc. Interspeech 2020.

PDF DOI Demo

(2020). Hearing aids do not alter cortical entrainment to speech at audible levels in mild-to-moderately hearing-impaired subjects. Frontiers in Human Neuroscience.

PDF Dataset DOI

(2019). Decoding of selective attention to continuous speech from the human auditory brainstem response. NeuroImage.

PDF Code DOI

Projects

Selected projects

More code on my GitHub.

Hybrid BYOL-S

(Code) Hybrid BYOL speech representation learning

SERAB

(Code) A multi-lingual benchmark for speech emotion recognition

Modelling the effects of tACS on speech processing

(Code) Computational model of the effect of non-invasive brain stimulation on speech-in-noise processing.

pyNSL

(Code) Python port of the NSL toolbox used for auditory modelling.

sPyEEG

(Code) Custom set of tools for EEG processing and analysis.

Speech-VGG

(Code) Transferable pre-trained feature extractor for speech processing.

Deep Speech Inpainting

(Demo) Algorithm for recovering missing or severely degraded parts of time-frequency representations of speech.

Complex TRF

(Code) Complex TRFs for modelling auditory brainstem responses to continuous speech from full-cap EEG.

Fundamental waveform extraction

(Code) EMD-based algorithm for extracting the fundamental frequency (F0) waveform from continuous speech. This is the maintained version of the code; the original implementation is by A.E. Forte.