Noam Ordan
Linguistics & NLP consultant — Hebrew & Arabic.
I help teams evaluate, annotate, and ship language models for Hebrew and Arabic. Twenty years in computational linguistics and translation studies, with a focus on bringing linguistic insight to bear on engineering decisions.
01 What I do
- LLM evaluation
Evaluation protocols, dataset curation, and model-quality assessment — recently for state-sponsored Hebrew and Arabic projects with Google and PwC.
- Annotation & corpora
Annotated resources for NER, summarization, entity linking, and human–LLM hybrid pipelines.
- Linguistic advising
Translating fuzzy linguistic phenomena into things engineers can act on.
02 Selected engagements
- 2025–
- Open-LLM evaluation, Hebrew & Arabic, freelance: Google, PwC
- 2021–24
- NLP technology lead, IAHLT: Hebrew & Arabic NLP tools and corpora
- 2018–20
- R&D advisor, EDT Software: active learning for legal discovery, PII detection
03 Background
Most of my published research concerns translationese — the systematic features that distinguish translated text from text written natively. PhD in Translation Studies (Bar-Ilan, 2011); MA in Philosophy of Science (Tel Aviv). Lectured and held research fellowships at Haifa, Saarland, and the Arab Academic College of Education.
04 Selected publications
- 2017
- Found in Translation: Reconstructing Phylogenetic Language Trees from Translations
ACL
The fingerprint a source language leaves on translated English is strong enough to recover language-family trees from translations alone — phylogeny without ever looking at the originals.
- 2016
- On the Similarities Between Native, Non-native and Translated Texts
ACL
Translated and non-native English share more than either shares with native English: both are lexically thinner, lean on explicit cohesion, and use fewer pronouns. A common second-pass fingerprint.
- 2015
- On the Features of Translationese
Digital Scholarship in the Humanities
A computational audit of the translation universals. Across ten source languages in Europarl, some classic features of translationese hold up; others collapse to chance.
- 2012
- Language Models for Machine Translation: Original vs. Translated Texts
Computational Linguistics
A counter-intuitive result: language models trained on translations beat models trained on originals for MT — because MT output already lives in the same dialect as human translation.
- 2011
- Translationese and Its Dialects
ACL
Translations differ from originals in two distinct ways — source-language interference and general translation effects. Function-word frequencies alone recover the source language with 92.7% accuracy.
Full list on Google Scholar. Code on GitHub.