Meditron-70B: Scaling Medical Pretraining for Large Language Models Z Chen, AH Cano, A Romanou, A Bonnet, K Matoba, F Salvi, ... arXiv preprint arXiv:2311.16079, 2023 | 92 | 2023 |
Landmark Attention: Random-Access Infinite Context Length for Transformers A Mohtashami, M Jaggi Advances in Neural Information Processing Systems (NeurIPS) 2023, 2023 | 73* | 2023 |
Masked Training of Neural Networks with Partial Gradients A Mohtashami, M Jaggi, SU Stich The 25th International Conference on Artificial Intelligence and Statistics (AISTATS 2022), 2022 | 28* | 2021 |
Critical Parameters for Scalable Distributed Learning with Large Batches and Asynchronous Updates S Stich, A Mohtashami, M Jaggi International Conference on Artificial Intelligence and Statistics, 4042-4050, 2021 | 20 | 2021 |
Characterizing & Finding Good Data Orderings for Fast Convergence of Sequential Gradient Methods A Mohtashami, S Stich, M Jaggi arXiv preprint arXiv:2202.01838, 2022 | 13 | 2022 |
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs S Ashkboos, A Mohtashami, ML Croci, B Li, M Jaggi, D Alistarh, T Hoefler, ... arXiv preprint arXiv:2404.00456, 2024 | 12 | 2024 |
The Splay-List: A Distribution-Adaptive Concurrent Skip-List V Aksenov, D Alistarh, A Drozdova, A Mohtashami 34th International Symposium on Distributed Computing (DISC 2020), LIPIcs 179, 2020 | 12 | 2020 |
Special Properties of Gradient Descent with Large Learning Rates A Mohtashami, M Jaggi, S Stich International Conference on Machine Learning (ICML), 2023 | 9* | 2022 |
epfLLM Megatron-LLM AH Cano, M Pagliardini, A Köpf, K Matoba, A Mohtashami, X Wang, ... URL: https://github.com/epfLLM/Megatron-LLM, 2023 | 6 | 2023 |
Learning Translation Quality Evaluation on Low Resource Languages from Large Language Models A Mohtashami, M Verzetti, PK Rubenstein Practical ML for Developing Countries Workshop @ ICLR 2023, 2023 | 5 | 2023 |
Social Learning: Towards Collaborative Learning with Large Language Models A Mohtashami, F Hartmann, S Gooding, L Zilka, M Sharifi, ... arXiv preprint arXiv:2312.11441, 2023 | 2 | 2023 |
DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging M Pagliardini, A Mohtashami, F Fleuret, M Jaggi arXiv preprint arXiv:2402.02622, 2024 | 1 | 2024 |
CoTFormer: More Tokens With Attention Make Up For Less Depth A Mohtashami, M Pagliardini, M Jaggi Workshop on Advancing Neural Network Training @ NeurIPS 2023, 2023 | 1 | 2023 |
TPS (Task Preparation System): A Tool for Developing Tasks in Programming Contests K Mirjalali, AK Mohtashami, M Roghani, H Zarrabi-Zadeh | 1 | 2019 |
Reproducibility Report for "On Warm-Starting Neural Network Training" A Mohtashami, E Pajouheshgar, K Kireev ML Reproducibility Challenge 2020, 2021 | | 2021 |
A Gradient-Based Approach to Neural Networks Structure Learning AA Moinfar, A Mohtashami, M Soleymani, A Sharifi-Zarchi | | |