Talk:Bitext word alignment

External links modified
Hello fellow Wikipedians,

I have just modified one external link on Bitext word alignment. Please take a moment to review my edit. If you have any questions, or need the bot to ignore the links, or the page altogether, please visit this simple FaQ for additional information. I made the following changes:
 * Added archive https://web.archive.org/web/20090424054417/http://acl.ldc.upenn.edu:80/J/J93/J93-2003.pdf to http://acl.ldc.upenn.edu/J/J93/J93-2003.pdf
 * Added archive https://web.archive.org/web/20090509231708/http://www.cse.unt.edu:80/~rada/wpt05/ to http://www.cse.unt.edu/~rada/wpt05/

When you have finished reviewing my changes, please set the checked parameter below to true or failed to let others know (documentation at ).

Cheers.— InternetArchiveBot  (Report bug) 09:25, 3 November 2016 (UTC)

Terribly dated
The article describes the state of affairs from around 10 years ago and is thus quite misleading. Relevant developments include:
 * FastAlign (fast_align, yet another IBM-2 implementation, but easier to use and hence more popular than GIZA++ nowadays)
 * Chris Dyer, Victor Chahuneau, and Noah A Smith. 2013. A simple, fast, and effective reparameterization of IBM model 2. In Proc. of NAACL-HLT, pages 644–648.
 * neural alignment
 * Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. 2015. Neural machine translation by jointly learning to align and translate. In 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings
 * Ho, Anh Khoa Ngo, and François Yvon. "Neural Baselines for Word Alignments." In International Workshop on Spoken Language Translation. 2019
 * Ferrando, Javier and Marta R. Costa-jussà. 2021. Attention weights in transformer NMT fail aligning words between sequences but largely explain model predictions. In Findings of the Association for Computational Linguistics: EMNLP 2021, pages 434–443, Association for Computational Linguistics, Punta Cana, Dominican Republic.
 * Jalili Sabet, Masoud, Philipp Dufter, François Yvon, and Hinrich Schütze. 2020. SimAlign: High quality word alignments without parallel training data using static and contextualized embeddings. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 1627–1643, Association for Computational Linguistics, Online
 * Dou, Zi-Yi and Graham Neubig. 2021. Word alignment by fine-tuning embeddings on parallel corpora. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 2112–2128, Association for Computational Linguistics, Online.

If I have the time, I could imagine working that into the article, ... but this might take a while. At least I wanted to leave some pointers for others to start from ;)

The other issue with the current article is that the implementations listed under "Software" actually perform very different tasks and need to be classified as such. HunAlign is for sentence alignment, the IBM models are for word alignment, and Anymalign is for dictionary induction. Chiarcos (talk) 09:41, 1 November 2023 (UTC)
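To make the distinction concrete for editors unfamiliar with these tasks: word aligners such as GIZA++ and fast_align emit links between *token positions* in a sentence pair, conventionally in the Pharaoh "i-j" format (one `i-j` pair per link, where `i` indexes a source token and `j` a target token). A minimal illustrative sketch, assuming this standard format (the `parse_pharaoh` function name is hypothetical, for illustration only):

```python
# Word alignment tools (GIZA++, fast_align) commonly write output in the
# Pharaoh "i-j" format: whitespace-separated pairs of 0-based token indices,
# i for the source sentence and j for the target sentence.

def parse_pharaoh(alignment_line):
    """Parse 'i-j' pairs into a list of (source_index, target_index) tuples."""
    return [tuple(map(int, pair.split("-"))) for pair in alignment_line.split()]

# Example: aligning "das Haus ist klein" with "the house is small"
# produces one link per token position here (a monotone 1:1 alignment).
links = parse_pharaoh("0-0 1-1 2-2 3-3")
print(links)  # [(0, 0), (1, 1), (2, 2), (3, 3)]
```

Sentence aligners like HunAlign, by contrast, link whole sentences across a document pair, and dictionary induction (Anymalign) produces type-level translation pairs rather than token-position links — so the three tool classes are not interchangeable.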