Improved training of end-to-end attention models for speech recognition A Zeyer, K Irie, R Schlüter, H Ney Interspeech 2018, 2018 | 291 | 2018 |

RWTH ASR Systems for LibriSpeech: Hybrid vs Attention--w/o Data Augmentation C Lüscher, E Beck, K Irie, M Kitza, W Michel, A Zeyer, R Schlüter, H Ney Interspeech 2019, 2019 | 275 | 2019 |

A Comparison of Transformer and LSTM Encoder Decoder Models for ASR A Zeyer, P Bahar, K Irie, R Schlüter, H Ney ASRU 2019, 2019 | 186 | 2019 |

Lingvo: a modular and scalable framework for sequence-to-sequence modeling J Shen, P Nguyen, Y Wu, Z Chen, MX Chen, Y Jia, A Kannan, T Sainath, ... Preprint arXiv:1902.08295, 2019 | 184 | 2019 |

Language modeling with deep transformers K Irie, A Zeyer, R Schlüter, H Ney Interspeech 2019, 2019 | 178 | 2019 |

Linear transformers are secretly fast weight programmers I Schlag*, K Irie*, J Schmidhuber ICML 2021, 2021 | 128* | 2021 |

LSTM, GRU, highway and a bit of attention: an empirical overview for language modeling in speech recognition K Irie, Z Tüske, T Alkhouli, R Schlüter, H Ney Interspeech 2016, 2016 | 98 | 2016 |

The devil is in the detail: Simple tricks improve systematic generalization of transformers R Csordás, K Irie, J Schmidhuber EMNLP 2021, 2021 | 90 | 2021 |

On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition K Irie, R Prabhavalkar, A Kannan, A Bruguier, D Rybach, P Nguyen Interspeech 2019, 2019 | 69* | 2019 |

The RWTH/UPB/FORTH system combination for the 4th CHiME challenge evaluation T Menne, J Heymann, A Alexandridis, K Irie, A Zeyer, M Kitza, P Golik, ... CHiME 2016, 2016 | 50 | 2016 |

The RWTH ASR System for TED-LIUM Release 2: Improving Hybrid HMM with SpecAugment W Zhou, W Michel, K Irie, M Kitza, R Schlüter, H Ney ICASSP 2020, 2020 | 46 | 2020 |

Going beyond linear transformers with recurrent fast weight programmers K Irie*, I Schlag*, R Csordás, J Schmidhuber NeurIPS 2021, 2021 | 45 | 2021 |

Training language models for long-span cross-sentence evaluation K Irie, A Zeyer, R Schlüter, H Ney ASRU 2019, 2019 | 43 | 2019 |

RADMM: Recurrent Adaptive Mixture Model with Applications to Domain Robust Language Modeling K Irie, S Kumar, M Nirschl, H Liao ICASSP 2018, 2018 | 40 | 2018 |

The Neural Data Router: Adaptive control flow in Transformers improves systematic generalization R Csordás, K Irie, J Schmidhuber ICLR 2022, 2021 | 31 | 2021 |

A Modern Self-Referential Weight Matrix That Learns to Modify Itself K Irie, I Schlag, R Csordás, J Schmidhuber ICML 2022, 2022 | 21 | 2022 |

On efficient training of word classes and their application to recurrent neural network language models R Botros, K Irie, M Sundermeyer, H Ney Interspeech 2015, 2015 | 21 | 2015 |

Investigation on log-linear interpolation of multi-domain neural network language model Z Tüske, K Irie, R Schlüter, H Ney ICASSP 2016, 2016 | 20 | 2016 |

Prediction of LSTM-RNN Full Context States as a Subtask for N-gram Feedforward Language Models K Irie, Z Lei, R Schlüter, H Ney ICASSP 2018, 2018 | 18 | 2018 |

The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention K Irie*, R Csordás*, J Schmidhuber ICML 2022, 2022 | 17 | 2022 |