OpenOffice.org is the leading cross-platform office suite. It's a large project and a large localisation undertaking, but it is an important component of a localised desktop.
Microsoft defines LCIDs for various locales. You need to know the LCID for your language so that OpenOffice.org works well on Windows and so that documents you create can move seamlessly between MS Word and OpenOffice.org Writer because the language identifier is correct.
There are a number of places that you can use to identify the LCID. For most languages they will all agree, but in some cases (see 1072/Sutu/Sesotho) it helps to look at all the lists to clarify exactly what Microsoft meant.
This is a very large application. If you can translate a smaller section of the whole and still have a useful product, that will help. We created this rough targeting guide using OpenOffice.org 1.1.3 and podebug.
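When comparing lists that give the LCID in decimal in one place and hexadecimal in another, it can help to break the number into its parts. The following is a minimal sketch, not an official tool: it relies on the documented Windows layout (a sort ID above a 16-bit language identifier, which itself holds a 6-bit sublanguage and a 10-bit primary language). The function name `split_lcid` is made up for this example.

```python
def split_lcid(lcid: int) -> dict:
    """Split a Windows LCID into sort ID, sublanguage and primary language."""
    lang_id = lcid & 0xFFFF  # low 16 bits: the language identifier
    return {
        "hex": f"0x{lcid:04X}",               # the form used in many Microsoft lists
        "sort_id": (lcid >> 16) & 0xF,        # bits 16-19: sort order
        "sublanguage": lang_id >> 10,         # top 6 bits of the language identifier
        "primary_language": lang_id & 0x3FF,  # low 10 bits of the language identifier
    }


# 1072 is the LCID mentioned above for Sutu/Sesotho.
print(split_lcid(1072))
```

Two lists that show `1072` and `0x0430` are therefore describing the same locale.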
Read the localisation documentation on the OpenOffice.org website: http://wiki.services.openoffice.org/wiki/Category:Localization
Things are now much easier since the OpenOffice.org project uses Pootle. You can translate online on Pootle, or download the files to work offline with something like Virtaal.
The OpenOffice.org developers have a tool for checking the SDF file called gsicheck. Of course, you don't want to build the whole of OpenOffice.org simply to get one tool. pofilter will pick up most of the errors that gsicheck does, but it's nice to know that your SDF is good before submitting it. Read more and download from the OpenOffice.org website:
http://wiki.services.openoffice.org/wiki/Gsicheck
Then install it and use it:
tar xvzf gsicheck-1.7.8_2.0m122.tar.gz
cd gsicheck-1.7.8_2.0m122
./gsicheck -c <GSI/SDF file>
Now go and fix the errors that it detected. You should correct these in your PO files.
The OpenOffice.org AutoCorrect file is a zip file called, for example, acor_en-US.dat. Søren Thing Pedersen has created csv2acor.py, which generates an AutoCorrect file from CSV sources.
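Because the AutoCorrect file is a zip archive, you can look inside an existing one before building your own. This is a minimal sketch using Python's standard `zipfile` module. The function name `list_autocorrect_members` is hypothetical, and the sketch assumes the `.dat` file is a plain, unencrypted zip.

```python
import zipfile


def list_autocorrect_members(path):
    """Return the sorted names of the files stored inside an AutoCorrect archive."""
    with zipfile.ZipFile(path) as archive:
        return sorted(archive.namelist())
```

For example, `list_autocorrect_members("acor_en-US.dat")` would return the names of the files packed into that archive.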
The AutoCorrect file contains 3 XML files: DocumentList.xml, SentenceExceptList.xml and WordExceptList.xml.
When using csv2acor.py you need to have 3 files with the same names as above but with a .csv file extension. WordExceptList.csv and SentenceExceptList.csv contain just a list of entries, one per line, surrounded by double quotes ("). DocumentList.csv is a comma-separated list with the mistyped word in the first column and the correct word in the second column, both also surrounded by double quotes.
The translation program Virtaal also makes use of these files, so consider contributing them to that project as well.
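The CSV layout described above can be produced with Python's `csv` module by quoting every field. This is only a sketch: the helper names `write_except_list` and `write_document_list` are invented for illustration, and UTF-8 encoding is an assumption, so check what your version of csv2acor.py expects.

```python
import csv


def write_except_list(path, entries):
    """Write one double-quoted entry per line (WordExceptList/SentenceExceptList style)."""
    # UTF-8 is an assumption; adjust if csv2acor.py expects another encoding.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, quoting=csv.QUOTE_ALL, lineterminator="\n")
        for entry in entries:
            writer.writerow([entry])


def write_document_list(path, corrections):
    """Write "mistyped","correct" pairs, one pair per line (DocumentList style)."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, quoting=csv.QUOTE_ALL, lineterminator="\n")
        for wrong, right in corrections:
            writer.writerow([wrong, right])
```

Calling `write_document_list("DocumentList.csv", [("teh", "the")])` writes the single line `"teh","the"`.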
If you have an existing spell-checking wordlist, use the following to extract potential entries for WordExceptList:
egrep "^[A-Z][A-Z][a-z]" spell-wordlist > WordExceptList.new
This extracts all words that start with two capitals followed by a lower-case letter. The ranges [A-Z] and [a-z] cover only unaccented Latin letters, so add all the characters that are valid in your language.
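If extending the character ranges by hand is awkward for your language, the same test can be written with Python's Unicode-aware `str.isupper()` and `str.islower()`. This is a sketch of an alternative to the egrep command, not a replacement endorsed by the project. The function name `word_exceptions` is hypothetical.

```python
def word_exceptions(words):
    """Yield words that start with two capitals followed by a lower-case letter."""
    for word in words:
        word = word.strip()  # drop the newline and any stray whitespace
        if (
            len(word) >= 3
            and word[0].isupper()
            and word[1].isupper()
            and word[2].islower()
        ):
            yield word
```

You could feed it an open wordlist file, for example `list(word_exceptions(open("spell-wordlist", encoding="utf-8")))`, where the UTF-8 encoding is again an assumption about your wordlist.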
If you have an existing spell-checking wordlist, use the following to extract potential entries for SentenceExceptList:
egrep "\.$" spell-wordlist > SentenceExceptList.new
This extracts all entries that end in a full stop.
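The same filter can be sketched in Python for systems without egrep. Unlike the command above, this version strips trailing whitespace first, so entries from a wordlist with Windows line endings or trailing spaces are still matched. The function name `extract_sentence_exceptions` and the UTF-8 encoding are assumptions made for this example.

```python
def extract_sentence_exceptions(src_path, dst_path):
    """Copy every entry that ends in a full stop from src_path to dst_path.

    Returns the number of entries written.
    """
    count = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8", newline="\n") as dst:
        for line in src:
            entry = line.strip()  # remove the newline and trailing spaces
            if entry.endswith("."):
                dst.write(entry + "\n")
                count += 1
    return count
```

For example, `extract_sentence_exceptions("spell-wordlist", "SentenceExceptList.new")` mirrors the file names used in the command above.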
If you have an existing DocumentList.xml you can convert it to CSV using the following:
sed "s/<block-list:block block-list:abbreviated-name=\"/\"\\n\"/g;s/\" block-list:name=\"/\",\"/g;s/\"\/>//g" < DocumentList.xml > DocumentList.csv
You'll need to edit DocumentList.csv to remove some of the remaining XML data.
A cleaner method is to use the following XSLT – this way you don’t have to clean any XML data (so this is suitable for batch mode):
<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0"
xmlns:block-list="http://openoffice.org/2001/block-list">
<xsl:output method="text" encoding="utf-8"/>
<xsl:template match="//block-list:block">
<xsl:text>"</xsl:text>
<xsl:value-of select="@block-list:abbreviated-name"/>
<xsl:text>"</xsl:text>
<xsl:text>,</xsl:text>
<xsl:text>"</xsl:text>
<xsl:value-of select="@block-list:name"/>
<xsl:text>"</xsl:text>
<xsl:text>
</xsl:text>
</xsl:template>
</xsl:stylesheet>
Run this stylesheet through any XSLT processor, e.g., for Saxon, type:
java -jar saxon8.jar DocumentList.xml <name-of-xslt> >DocumentList.new
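If no XSLT processor is at hand, the same conversion can be sketched with Python's standard library. This is an illustrative alternative, not part of csv2acor.py: the function name `convert_document_list` is hypothetical, the namespace URI is taken from the stylesheet above, and UTF-8 output is an assumption.

```python
import csv
import xml.etree.ElementTree as ET

# Namespace URI as declared in the stylesheet above.
BLOCK_LIST_NS = "http://openoffice.org/2001/block-list"
BLOCK = f"{{{BLOCK_LIST_NS}}}block"
ABBREVIATED_NAME = f"{{{BLOCK_LIST_NS}}}abbreviated-name"
NAME = f"{{{BLOCK_LIST_NS}}}name"


def convert_document_list(xml_path, csv_path):
    """Write one "mistyped","correct" line per block element; return the row count."""
    tree = ET.parse(xml_path)
    count = 0
    with open(csv_path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, quoting=csv.QUOTE_ALL, lineterminator="\n")
        for block in tree.getroot().iter(BLOCK):
            # Missing attributes become empty strings rather than raising an error.
            writer.writerow([block.get(ABBREVIATED_NAME, ""), block.get(NAME, "")])
            count += 1
    return count
```

For example, `convert_document_list("DocumentList.xml", "DocumentList.new")` produces the same quoted, comma-separated layout as the stylesheet.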
Then run csv2acor.py acor_xx-YY.dat where xx-YY is your language and country code.
In order to add your spell checker and hyphenation file to OpenOffice.org CVS you need to do the following:
Looks like a StarBasic program that allows you to specify holidays, etc. FIXME need to check this more carefully
OpenOffice.org developers use what they call child workspaces (CWS) to make fixes and commit changes. These are usually linked to related bugs in IssueZilla.
Here are some instructions to help you track your changes and see if they have been integrated/fixed:
Now you can see which l10n CWS have been integrated and which have not. By clicking on the CWS name you see the list of the bugs registered to that CWS. Once it is approved by QA, you'll know exactly in which milestone the CWS has been integrated.