Noam Ordan — NLP & Linguistics Consultant

I help teams evaluate, annotate, and ship language models for Hebrew and Arabic. Twenty years in computational linguistics and translation studies, with a focus on bringing linguistic insight to bear on engineering decisions.

01 What I do

LLM evaluation
Evaluation protocols, dataset curation, and model-quality assessment — recently for state-sponsored Hebrew and Arabic projects with Google and PwC.
Annotation & corpora
Annotated resources for NER, summarization, entity linking, and human–LLM hybrid pipelines.
Linguistic advising
Translating fuzzy linguistic phenomena into things engineers can act on.

02 Selected engagements

2025–: Open-LLM evaluation, Hebrew & Arabic; Freelance (Google, PwC)
2021–24: NLP technology lead, IAHLT (Hebrew & Arabic NLP tools and corpora)
2018–20: R&D advisor, EDT Software (active learning for legal discovery, PII detection)

03 Background

Most of my published research concerns translationese — the systematic features that distinguish translated text from text written natively. PhD in Translation Studies (Bar-Ilan, 2011); MA in Philosophy of Science (Tel Aviv). Lectured and held research fellowships at Haifa, Saarland, and the Arab Academic College of Education.

04 Selected publications

2017: Found in Translation: Reconstructing Phylogenetic Language Trees from Translations (ACL)
The fingerprint a source language leaves on translated English is strong enough to recover language-family trees from translations alone — phylogeny without ever looking at the originals.
2016: On the Similarities Between Native, Non-native and Translated Texts (ACL)
Translated and non-native English share more than either shares with native English: both are lexically thinner, lean on explicit cohesion, and use fewer pronouns. A common second-pass fingerprint.
2015: On the Features of Translationese (Digital Scholarship in the Humanities)
A computational audit of the translation universals. Across ten source languages in Europarl, some classic features of translationese hold up; others collapse to chance.
2012: Language Models for Machine Translation: Original vs. Translated Texts (Computational Linguistics)
A counter-intuitive result: language models trained on translations beat models trained on originals for MT — because MT output already lives in the same dialect as human translation.
2011: Translationese and Its Dialects (ACL)
Translations differ from originals in two distinct ways — source-language interference and general translation effects. Function-word frequencies alone recover the source language with 92.7% accuracy.

Full list on Google Scholar. Code on GitHub.

05 Contact

[email protected]