OBS localization - organization of work

Corwin · Jan 15, 2013

Hello! I am translating the program into Russian, and I have run into a small problem that makes the work awkward. Every time, I have to find the new interface strings and then add their Russian localization to ru.txt. Comparing the two text files (en.txt and ru.txt) by hand takes too long.

I propose setting up an online platform for translators. For example, here is how I am helping a German developer:

This is a simple Google Docs spreadsheet. The author of the document adds a new row, and then the translators fill in their cells.
What are your suggestions?
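
Until such a platform exists, the manual comparison of en.txt and ru.txt described above could be scripted. This is only a rough sketch: it assumes each locale entry is an identifier followed by whitespace and the translation, and the function names and sample identifiers are made up for illustration.

```python
def read_identifiers(lines):
    """Return the identifiers in order, skipping blank lines.

    Assumes each entry is an identifier, then whitespace, then the translation.
    """
    identifiers = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The identifier is everything before the first run of whitespace.
        identifiers.append(line.split(None, 1)[0])
    return identifiers


def missing_identifiers(source_lines, target_lines):
    """Identifiers present in the source locale but absent from the target."""
    translated = set(read_identifiers(target_lines))
    return [name for name in read_identifiers(source_lines) if name not in translated]


# Hypothetical inline sample; real use would read the lines of en.txt and ru.txt.
en = ['Apply "Apply"', 'Cancel "Cancel"', '', 'Settings.Audio "Audio"']
ru = ['Apply "Применить"']
print(missing_identifiers(en, ru))
```

With real files, the two lists would come from something like `open('en.txt', encoding='utf-8')`, assuming the locale files are UTF-8.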

micechal · Jan 15, 2013

I like your idea. Right now, it's very hard to track changes made to the localization files. There are some free content management systems that would make this easier. Maybe Warchamp7 could create a subpage on the obsproject.com site so we can change it more easily :)

Muf · Jan 17, 2013

A spreadsheet isn't a bad idea. You can export to a CSV file (maybe with something other than a comma as a separating value - not sure if Google Drive allows this), and then have a shell script of some kind turn them into locale files.

micechal · Jan 30, 2013

Maybe you could use something like Pootle? This one is advanced, but maybe there is something lighter, for projects with fewer files to translate :)

Corwin · Apr 3, 2013

Something like that. If Jim likes it, I'll give him the file and he can grant editing rights to all the translators:
http://goo.gl/7hkFu

Lain · Apr 5, 2013

Okay, listen. I need someone's help with this. I will always add translation files people send me, but hypothetical situation here: let's say we used a spreadsheet, there are 20 new translation items, and there are countless languages to go through. Let's say 10 languages. That's 200 items across 10 files. People add it onto the spreadsheet.. that's great, but how would they get it off the spreadsheet and into the translation files? We can't just copy and paste each new item for each new language one by one. I am -not- going to do that. I don't have time. There would have to be some sort of tool to convert it to translation files, but I just do not have time to do that right now.

I seriously have so many things I have to do and it's admittedly frustrating. Translators themselves currently can access github and do pull requests literally any time and they will be merged into the project with a simple press of a button, and they can send the new translation files to me and I will upload them directly as well. But other than that, I don't know what to do. I know you guys need help, but please help me too. I need to be in the code, not digging inside the translation files. What am I supposed to do?

R1CH · Apr 5, 2013

I feel like a web based system would work well. Pulls strings to be translated automatically from the latest git and such and outputs in the exact format needed.

Corwin · Apr 5, 2013

How can these steps be automated? https://www.youtube.com/watch?v=jgAtiNVhvhQ

pystub · Apr 5, 2013

You can use Python, for example (only one of the options). I think even some JavaScript can be used to extract the translation file straight from the GDocs. I'll try to come up with something real quick.

Edit: So Google Docs helpfully removes the invisible rows from the HTML. The JS approach kind of goes out the window, because it will take some time to make it really automated.

Code:

var someExporter = {}; //don't litter the global scope!
someExporter.table = document.getElementById ('0-grid-table-quadrantscrollable');
someExporter.tbody = null;
for (var i = 0; i < someExporter.table.childNodes.length; i++) {
    if (someExporter.table.childNodes[i].nodeName == 'TBODY') {
        someExporter.tbody = someExporter.table.childNodes[i];
        break;
    }
}
if (!someExporter.tbody)
    throw new Error ('there is no body!');
for (var i = 7; i < someExporter.tbody.childNodes.length; i++) {
    console.log (
        someExporter.tbody.childNodes[i].childNodes[1].innerHTML +
        "\t" +
        someExporter.tbody.childNodes[i].childNodes[3].innerHTML
    );
}

pystub · Apr 13, 2013

Here is some Python, tested with version 3.3.

Code:

import csv
import re

# newline='' is what the csv module expects; UTF-8 is assumed for the exported CSV
with open ('Sheet1.csv', encoding='utf-8', newline='') as f:
	reader = csv.reader (f)

	langRe = re.compile (r'^([A-Za-z]{2})\b.*') # here we control what in the first row triggers generation of a translation and what it will be named
	translations = []
	for columnIndex, column in enumerate (next (reader)): # pick the first row and run through every cell
		result = langRe.search (column)
		if result:
			lang = result.group (1)
			# output encoding is assumed to be UTF-8 as well
			translations.append ({'file': open ('{0}.txt'.format(lang), 'w', encoding='utf-8'), 'index': columnIndex})

	for row in reader: # section to discard comment rows: stops after consuming the first row that has a value for every language
		if all (row[t['index']] != '' for t in translations):
			break

	for row in reader:
		for t in translations:
			if row[0] == '': # no identifier: keep an empty line as a separator
				t['file'].write ('\n')
			else:
				t['file'].write ("{0}\t{1}\n".format (row[0], row[t['index']]))

	for t in translations: # close the output files so everything is flushed to disk
		t['file'].close ()

Save it as, say, "convert.py" next to the CSV, which you should name "Sheet1.csv".

Of course, this does imply that you have to download the CSV from Google Drive every time, but it should help some. I'm still trying to wrap my head around Google's API.

Lain · Apr 18, 2013

Corwin pointed me here the other day though I haven't really had time to give it a good look over.. I might just have to make an application to make this easier for users and make it use git/github so users can make pull requests and update the repository directly

Corwin · Jun 7, 2013

pystub said:
I'm still trying to wrap my head around the Google's API

Any news?

dodgepong · Jul 10, 2013

Have we considered using CrowdIn for crowdsourced translation? It's free for Open Source projects: http://crowdin.net/

ColterTV · Jul 11, 2013

I'll be following this thread to see what gets decided. I've been the Spanish translator so far (ColterTV).

dodgepong · Aug 1, 2013

I created a CrowdIn project here: http://crowdin.net/project/obsproject

I haven't uploaded all the strings from all the existing localizations yet (I believe German, Swedish, and Spanish are the only ones I've uploaded all the existing strings for), but after playing with CrowdIn last night, I think it's definitely the way to go. It shows how complete each translation is, and gives an easy interface for translators to translate each string without having to know how the resulting locale file is formatted. It lets translators discuss each translation if they have questions, and lets you set up a glossary of terms to define words more clearly to help in translating them. It has a nice, simple proofreading process with varying levels of permissions (translator, proofreader, and manager) and it is trivial to add more languages to translate into. Also, I got it approved as an Open Source project, so it is free for unlimited use!

The only downside is that it requires us to change the format of the locale file from

Identifier "translation"

to

Identifier="translation"

and change the extension from .txt to .ini. (Basically convert the locale file into an INI and use INI format)

Personally, I think it's worth it, but it's up to the translators and Jim. So let me know what you think!
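
The format change described above is mechanical enough that existing locale files could probably be converted with a small script. A rough sketch, assuming every old-style entry is an identifier, whitespace, and a double-quoted translation on one line; the function name is made up for illustration, and lines that do not fit that shape are passed through untouched.

```python
import re

# Assumed shape of an old-style entry: Identifier, whitespace, "quoted translation".
ENTRY = re.compile(r'^(\S+)\s+(".*")\s*$')


def to_ini_line(line):
    """Convert one 'Identifier "translation"' line to 'Identifier="translation"'.

    Lines that do not match the assumed shape (blank lines, for example)
    are returned unchanged.
    """
    match = ENTRY.match(line)
    if not match:
        return line
    return "{0}={1}".format(match.group(1), match.group(2))


sample = ['Identifier "translation"', '', 'Order "Order"']
for line in sample:
    print(to_ini_line(line))
```

Renaming the file from .txt to .ini would be a separate step.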

dodgepong · Aug 1, 2013

ColterTV brought up a good point in IRC that sometimes the word is not enough to know how to translate it; sometimes you need context. For example, there is a string called "Order" in the main application file, with the identifier "Order". It's hard to know what that means just by looking at it.

CrowdIn has a feature that lets you tag each string with a "context", which lets you give a brief explanation of how that word is used. Then we could say the context for "Order" is that it's the menu item that lets you select how to order sources, not an Order as in a command, and not as in a request to purchase something.

There are 2 ways I can see to do this: we can either include the context in the locale file itself and make the file a CSV (Identifier,"translation",Context), or edit Contexts on CrowdIn itself as needed (and keep using the INI format) and keep Contexts out of the locale files.

My inclination would be to keep the INI files and just use CrowdIn to specify contexts as needed, but I'd like to hear anyone else's opinions, too.
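
For comparison, the first option (a CSV locale file laid out as Identifier,"translation",Context) could be read along these lines. This is just a sketch of that layout: the function name and the sample context text are made up for illustration, and the context column is treated as optional.

```python
import csv
import io


def read_entries(text):
    """Parse Identifier,"translation",Context rows into a list of dicts.

    Rows with fewer than two columns are skipped; a missing context
    column is treated as an empty string.
    """
    entries = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue
        entries.append({
            'identifier': row[0],
            'translation': row[1],
            'context': row[2] if len(row) > 2 else '',
        })
    return entries


# Hypothetical sample rows; the context wording is illustrative only.
sample = 'Order,"Order","Menu item for ordering sources"\nApply,"Apply"\n'
for entry in read_entries(sample):
    print(entry['identifier'], entry['translation'], entry['context'])
```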

dodgepong · Aug 2, 2013

I have uploaded the Japanese, Greek, and Traditional Chinese localizations to CrowdIn. I'm not sure about the Chinese, though, since from what I can tell, "tw" is supposed to be used for the Twi language, according to ISO 639-1, but we seem to be using it for "Taiwan". I think this is a problem with using 2-letter language codes instead of 2-letter region codes, since Simplified Chinese is zh_CN and Traditional is zh_TW.

More discussion here: http://stackoverflow.com/questions/4892 ... al-chinese

ColterTV · Aug 2, 2013

Awesome. I've been translating more strings for the Spanish one, and I think I'll reach 100% soon. I'm still missing some strings because I need to see their context.

dodgepong · Aug 3, 2013

Oftentimes the Identifier can give a clue as to where you can find the string used in the program, so you might be able to use that to find where a string is used to understand the context.

Having said that, I would like to start compiling a list of tough strings to translate which would benefit from a more thorough explanation of context, and I can try to go through them and find where they are used and update CrowdIn with their context.

In other news, I have finished uploading the rest of the translations. As far as I can tell, everything that is in Github and OBS itself is now on CrowdIn, save for the HTML help files. I haven't yet decided if we want to bother putting those up...I suppose it couldn't hurt.

OrionRBR · Aug 3, 2013

Hey, it's nice to have the users translate the program. I just finished the Brazilian Portuguese translation today; it just needs approval.
