OmegaT
OmegaT is a computer-assisted translation tool written in the Java programming language. It is free software originally developed by Keith Godfrey in 2000, and is currently developed by a team led by Maxym Mykhalchuk.
OmegaT is intended for professional translators. Some of its features include user-customisable segmentation using regular expressions, translation memory, fuzzy matching, match propagation, glossary matching, context search in translation memories and keyword search in reference materials.
It requires Java 1.4, which is available for Linux, Mac OS X and Microsoft Windows 98 or higher.
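As an illustration of the regular-expression segmentation listed among the features above, the following Python sketch splits a paragraph into sentences. The break rule, the exception list and the function name `segment` are hypothetical choices made for this example; they are not OmegaT's actual rules or code.

```python
import re

# Hypothetical break rule: whitespace that follows '.', '!' or '?'.
BREAK = re.compile(r"(?<=[.!?])\s+")

# Hypothetical exception rule: never break right after these abbreviations.
EXCEPTIONS = ("Mr.", "Mrs.", "Dr.", "e.g.", "i.e.")


def segment(paragraph):
    """Split a paragraph into sentence segments using the rules above."""
    segments = []
    start = 0
    for match in BREAK.finditer(paragraph):
        candidate = paragraph[start:match.start()]
        if candidate.endswith(EXCEPTIONS):
            continue  # exception rule applies: keep scanning
        segments.append(candidate)
        start = match.end()
    tail = paragraph[start:]
    if tail:
        segments.append(tail)
    return segments


if __name__ == "__main__":
    for s in segment("Dr. Smith opened the file. It was empty! Why?"):
        print(s)
```

Keeping break rules and exception rules as separate data is what makes such a scheme user-customisable: either list can be changed without touching the splitting loop.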
History
OmegaT was first developed by Keith Godfrey in 2000. The original engine was written in C++, but the first public release in February 2001 was written in Java.
The first Java version used a proprietary translation memory format, and required Java 1.3 to run. It offered support for StarOffice documents, so-called plain text and Unicode text, and HTML, and could do only block-level segmentation (which for most practical purposes meant paragraph segmentation).
The current stable version is 1.4.5, which supports more source document formats and has a cleaner user interface, but can still do only block-level segmentation. The current release-candidate version, 1.6 RC 10, has many additional features and far fewer bugs. The additional features include, among others, DocBook support and flexible segmentation rules, which make sentence segmentation possible.
Workflow in OmegaT
The user places source documents, existing translation memories and any glossaries in specified subfolders of a translation project. When a project is "opened", OmegaT extracts the translatable text from all recognised documents. As the translator translates each segment, OmegaT adds the translation units to a translation memory. Finally, OmegaT creates the target documents by merging the translation memory with the source documents.
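The extract, translate and merge cycle described above can be sketched in a few lines of Python. This is a simplified model, not OmegaT's implementation: it assumes plain-text documents with one segment per non-blank line, represents the translation memory as a dictionary, and assumes that untranslated segments are copied to the target unchanged. The function names are hypothetical.

```python
def extract_segments(document):
    """Return the translatable segments: here, every non-blank line."""
    return [line for line in document.split("\n") if line.strip()]


def create_target(document, memory):
    """Merge the translation memory with the source document."""
    target_lines = []
    for line in document.split("\n"):
        # Assumption: a segment with no translation is kept as-is.
        target_lines.append(memory.get(line, line))
    return "\n".join(target_lines)


if __name__ == "__main__":
    source = "Hello.\n\nGoodbye."
    memory = {}
    # The translator works through the segments one by one;
    # each finished translation unit is added to the memory.
    for seg in extract_segments(source):
        if seg == "Hello.":
            memory[seg] = "Bonjour."
    print(create_target(source, memory))
```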
During translation, fuzzy matches from the translation memory and matches from the glossaries for the current segment are displayed in the adjacent Match/Glossary window. Fuzzy matches are inserted by the translator using keyboard shortcuts. Fuzzy matches above a user-determined threshold can optionally be inserted automatically.
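The idea of ranking fuzzy matches and inserting one automatically above a threshold can be sketched as follows. The article does not describe OmegaT's matching algorithm, so this example substitutes the similarity ratio from Python's standard `difflib` module; the function names and the 0 to 100 scale are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0-100 similarity score between two segments."""
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio())


def fuzzy_matches(segment, memory, limit=5):
    """Rank (source, target) translation units by similarity to segment."""
    scored = [(similarity(segment, src), src, tgt) for src, tgt in memory]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]


def auto_insert(segment, memory, threshold):
    """Return the best match's target text if it reaches the threshold."""
    matches = fuzzy_matches(segment, memory, limit=1)
    if matches and matches[0][0] >= threshold:
        return matches[0][2]
    return None  # below threshold: leave the choice to the translator


if __name__ == "__main__":
    tm = [
        ("Open the file.", "Ouvrez le fichier."),
        ("Close the window.", "Fermez la fenêtre."),
    ]
    for score, src, tgt in fuzzy_matches("Open the files.", tm):
        print(score, src, "->", tgt)
```

A real tool would typically compare words or tokens rather than characters, but the threshold logic stays the same.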
The translator can switch to a different document in the same project at any time using the Project Files viewer, or to a different segment in the same file using keyboard shortcuts or by double-clicking the appropriate segment.
Whenever additional source documents, translation memories or glossaries are added to the project, or when manual changes are made to those files, the translator must "reload" the project, so that OmegaT recognises the newly added segments. The project must also be reloaded when changes to the segmentation rules are made in mid-translation.
Collaboration between translators
Translators using different computer-assisted translation tools can share their translation memories only if (a) at least one of their programs can import or export the other program's proprietary format, or (b) both programs can import and export an intermediary format. OmegaT does the latter: it can import and export the industry-standard intermediary format TMX (Translation Memory eXchange).
OmegaT's glossary files are tab-delimited plain text files with the source term in the first column and the target term in the second column. Additional columns are ignored by OmegaT, and can be used for anything (e.g. user comments). OmegaT does not support TBX, the industry-standard glossary format proposed by LISA, the Localization Industry Standards Association [LISA].
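A reader for the tab-delimited glossary layout described above might look like the following Python sketch. The function names are hypothetical, and the case-insensitive substring lookup is a deliberately naive stand-in for whatever matching OmegaT performs.

```python
def load_glossary(text):
    """Parse tab-delimited glossary text into a {source: target} dict."""
    glossary = {}
    for line in text.splitlines():
        columns = line.split("\t")
        if len(columns) < 2 or not columns[0].strip():
            continue  # skip blank or incomplete lines
        # Columns beyond the second (e.g. comments) are ignored.
        glossary[columns[0].strip()] = columns[1].strip()
    return glossary


def glossary_hits(segment, glossary):
    """Return glossary entries whose source term occurs in the segment."""
    lowered = segment.lower()
    return {src: tgt for src, tgt in glossary.items() if src.lower() in lowered}


if __name__ == "__main__":
    sample = "translation memory\tmémoire de traduction\ta comment\nsegment\tsegment\n"
    print(glossary_hits("Each segment is stored.", load_glossary(sample)))
```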
Supported source document formats
OmegaT can translate the following formats: text files (any text format which Java can handle) in a variety of encodings including Unicode, HTML/XHTML, Java properties files, StarOffice, OpenOffice.org and OpenDocument, as well as DocBook files, PO files and files with a "Key=Value" structure. It handles formatted documents using tagged text in a way which is similar to that of commercial translation memory tools.
It does not offer direct support for the Microsoft Office formats Word, Excel and PowerPoint. However, Microsoft Office documents can be converted near-perfectly to and from the OpenDocument formats, which OmegaT supports natively, using OpenOffice.org 2.0, a free office suite that offers conversion filters to and from most of the commonly used Microsoft Office file formats [OpenOffice.org].
File formats such as LaTeX, TeX and POD can be converted to and from PO using po4a, a conversion utility to and from the Portable Object format [po4a].
OmegaT does not officially support file formats such as WordML, ExcelML and LaTeX, or localisation formats such as Trados uncleaned files or XLIFF files, but these can be translated losslessly in OmegaT by tweaking the segmentation rules or by making trivial modifications to the code.
Supported memory and glossary formats
OmegaT can import and export TMX version 1.4b level 1 (in other words, it preserves textual information but not formatting mark-up). For glossaries, it uses tab-delimited plain text files.
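To show what a level 1 TMX file (text only, no formatting mark-up) looks like in practice, the following Python sketch writes and reads one with the standard `xml.etree.ElementTree` module. The header values such as `creationtool` and `o-tmf` are placeholders invented for this example, the `version="1.4"` attribute value is an assumption about how TMX 1.4b files identify themselves, and the output is not claimed to match what OmegaT itself produces.

```python
import xml.etree.ElementTree as ET

# ElementTree spells the xml:lang attribute with its full namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(units, src_lang, tgt_lang):
    """Serialise (source, target) pairs as a minimal TMX document."""
    tmx = ET.Element("tmx", version="1.4")  # assumed version string
    ET.SubElement(tmx, "header", {
        "creationtool": "example-sketch",   # placeholder values
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "example",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in units:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text  # plain text only
    return ET.tostring(tmx, encoding="unicode")


def read_tmx(xml_text, src_lang, tgt_lang):
    """Return the (source, target) pairs found in a TMX document."""
    units = []
    for tu in ET.fromstring(xml_text).iter("tu"):
        segs = {tuv.get(XML_LANG): tuv.findtext("seg") for tuv in tu.iter("tuv")}
        if src_lang in segs and tgt_lang in segs:
            units.append((segs[src_lang], segs[tgt_lang]))
    return units


if __name__ == "__main__":
    print(build_tmx([("Hello & welcome", "Bonjour et bienvenue")], "en", "fr"))
```

Because each translation unit is just a pair of language-tagged text segments, any tool that reads the same format can reuse the memory, which is the point of an intermediary format.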
OmegaT's internal translation memory format is not visible to the user, but every time it autosaves the translation project (which is once every five segments), all new or updated translation units are automatically exported and added to two external TMX memories.
One of these is not standards-compliant and is intended for use by OmegaT users, because it contains OmegaT's own formatting markup syntax for tagged documents. The other is standards-compliant and does not contain any formatting markup, because it is intended for exchange with users of other translation memory tools.
Documentation
When OmegaT starts, a quick guide called "Instant Start" is displayed. A comprehensive User Manual, originally by Marc Prior, is bundled with the OmegaT installation. Both of these have been translated into several languages by volunteers. Marc Prior also wrote the ASAD Manual, which contains additional information for advanced users, although much of its material is outdated and applies to previous versions of OmegaT. Finally, the archived messages of OmegaT's user groups are searchable by anyone without registration.
Development and localisation
Code development is currently handled by a team led by Maxym Mykhalchuk. Other current code contributors include Sacha Chua, Kim Bruning, Henry Pijffers and Benjamin Siband. The developers respond to bug reports and requests for enhancements.
The OmegaT user interface and bundled documentation are translated by volunteers. Current localisations of version 1.4.5 include:
- Belorussian
- Esperanto
- French
- German
- Italian
- Japanese
- Russian
- Spanish
Related software
Several tools have been created by third parties which can be used in conjunction with OmegaT, some of which are only useful with previous versions of OmegaT.
Third-party tools listed on the OmegaT web site [OmegaT Resources] or available in the user mailing list web space [OmegaT Files] (registration required) include:
- Benjamin Siband's OpenOffice.org segmenter macros
- Didier Briel's aligner utility
- Dmitri Gabinski's aligner utility
- Dmitri's Wordfast TMX converter
- Dmitri's language selector
- Henry Pijffers's TMX merger tool
- Henry's TMX cleaner tool
- Marc Prior's external spell-checker
- Marc's sentence segmenter tool
- Samuel Murray's two spell-checker scripts
- Samuel's complicated utility for creating Trados uncleaned files
- Sonja Tomaskovic's macro for removing internal tags from TMX files
Mikel Forcada and Susana Santos's aligner, bitext2tmx, which is written in Java, can also be used in conjunction with OmegaT [bitext2tmx].
There is a fork of OmegaT 1.4.5 called omegat, which is part of the omega t+ collection of translation tools [omega t+]. The application currently supports a relatively large subset of OmegaT 1.4.5's functions, although its version number is 1.4.6 at the time of writing. Its developers appear to plan to implement functions that are not on OmegaT's roadmap. The collection is mostly a repackaging of a number of the helper applications listed above.
External links
- [OmegaT Home] - Official OmegaT web site
- [Project: OmegaT] - OmegaT's SourceForge project page
User groups
- [omegat@yahoogroups.com] - User mailing list (archives searchable without subscription)
- [WeSolveIt] - Sabine Cretella's discussion board.
References
From Wikipedia, the Free Encyclopedia.
All text is available under the terms of the GNU Free Documentation License. See Wikipedia Copyrights for details.
