Translation, Machine Translation, and a Crowded Seminar

I have given, in several places, a talk entitled “Analytic and Probabilistic Properties of Boolean Functions.” This is a fairly large area, so the talks can differ quite a bit. The lecture at the NYU CS theory seminar was described on a Chinese blog entitled (according to the automatic translation) “Notes from Math Library”. The posting is in Chinese, and I used Google's translation, which gave me this:

Gil Kalai’s Seminar

Hebrew University of Jerusalem last Thursday, and Yale University professor Gil Kalai come Courant Institute speaking seminar. You may have some recollection of the name, because I have a Notes from “Test Your Intuition” in reference to his blog “Combinatorics and More”.

Usually during the same period of the “theory seminar”, where more than 15 people are already rare, but this time Prof. Kalai speaking seminar attracted 38 people to come. Usually there is space left in the room suddenly a full-house, there are several people sitting listening to underground.

Prof. Kalai gave a three guess one of the more accessible and interesting, you can talk about here…

And here is a small excerpt from the Hebrew translation:

בדרך כלל בתקופה המקבילה של סמינר “תיאוריה”, שבו יותר מ -15 אנשים כבר נדיר, אבל הפעם פרופ ‘קלעי מדבר הסמינר נמשך 38 אנשים לבוא. בדרך כלל יש מקום השמאלי בחדר פתאום הבית מלא, יש כמה אנשים יושבים האזנה תת קרקעית.

פרופ ‘נתן קלעי שלוש משער אחד נגיש יותר ומעניין, אתה יכול לדבר על זה כאן.

Perhaps you can look at the Chinese version and test how good the translations are. One thing that I liked about the English translation is that it refers to my conjectures as “guesses”. Certainly, this shows that Google’s translation involves deep understanding. (Here is a blog description of a similar lecture at LA.) One of the “guesses” refers to old conjectures by Itai Benjamini, Oded Schramm and myself about noise stability of monotone Boolean functions in TC0. I will discuss these conjectures in a later post.

So what is this about people listening to the talk “underground”? I will explain just after the dividing line.

Giving lectures to small audiences is certainly an important part of the business.  Once I gave a lecture to an initial audience of twelve. Five people were sleeping and some others were slowly slipping away. I saw an attractive woman sleeping and I thought to myself “at least she will not leave”. The next time I turned  from the blackboard, she was gone. One can take comfort from a famous story (unfortunately I forgot most of the details) about a physicist from Chicago who gave a course in a branch of the university located two hours away from town to an audience of two people. The happy ending of the story is that after some years  both students won Nobel prizes, even some years before the teacher himself did.

However, the NYU seminar was very crowded, and at some point there was no room for additional chairs. There was a big table in the middle of the seminar room, and two people listened to the lecture half sitting, half lying under the table. Those were the “underground” listeners. You can see the head of one devoted “underground” listener in the pictures below. It was certainly pleasant to see a large crowd, and having “underground” listeners was both flattering and hilarious.

Pictures: Muli Safra

Is good machine translation possible?

The Google translation was certainly useful, but is good-quality machine translation possible? The answer is probably yes, but it is a very difficult, if exciting, task. Ambiguity (look here), and especially the ambiguity caused by the huge number of ways a sentence in a natural language can be parsed, is a major difficulty. Yehoshua Bar-Hillel (whom we mentioned in this post about Chomskian linguistics) was a philosopher at the Hebrew University who wrote some important papers on the difficulty of machine translation more than fifty years ago. (See this paper.) I find this subject very exciting for various reasons. My father, Hanoch Kalai, was a translator, and he translated from German, Yiddish and English into Hebrew. (He also translated his original Russian last name into the Hebrew name Kalai.) Another connection with translation is that when I came back to Jerusalem in the mid-’80s I worked for several years as a consultant for a machine-translation company called “Tovna”, which in the end did not make it. At least some of the difficulties “Tovna” faced could be overcome today, with the Internet serving as a huge source of data. But the problems of resolving the ambiguity of a sentence in the source language and of making the right choices in the target language remain formidable.
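To get a feeling for how quickly the number of possible parses can grow, here is a small sketch in Python. It is my own toy illustration, not something taken from Bar-Hillel's papers: it counts every way of grouping a sequence of words into a binary tree, ignoring grammar altogether. The function name count_binary_parses is made up for this example. A real grammar would rule out most of these groupings, so this is only a crude upper bound, but the growth remains daunting.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_binary_parses(n_words: int) -> int:
    """Count the binary bracketings of a sequence of n_words words.

    Toy model (an assumption of this sketch): any split point is allowed,
    so the result is the Catalan number C(n_words - 1).
    """
    if n_words <= 1:
        return 1
    # Choose where the top-level split falls, then count each side.
    return sum(
        count_binary_parses(k) * count_binary_parses(n_words - k)
        for k in range(1, n_words)
    )


if __name__ == "__main__":
    for n in (2, 5, 10, 20):
        print(n, count_binary_parses(n))
```

Under this toy model a ten-word sentence already has 4862 groupings, and a twenty-word sentence has well over a billion.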

Flush by Virginia Woolf (translated into Hebrew by my father, Hanoch Kalai). On top is my father’s Hebrew translation of a poem by Rainer Maria Rilke.

1. Douglas Hofstadter wrote a long book on the subject of translation (including machine translation) called “Le ton beau de Marot”. The main thesis seems to be that one cannot expect machines to translate texts well without a “deep” understanding of the context of the text to be translated. The main examples of translations discussed in detail in the book are translations (by (wo)men/machines/etc.) of poems, especially one particular poem by a French Renaissance poet called Clement Marot. Hofstadter’s views on even human translators are highly idiosyncratic, but the book is interesting and thought-provoking (if occasionally irritating) nonetheless.

Conversely, it is interesting that when one has a “deep” understanding of context, one can sometimes overcome a deficient vocabulary or grammar. This is especially true in “microdomains” of knowledge. Hence, I can (with a lot of effort) understand a paper written in German (a language that I understand almost not at all) on geometry or topology (if the subject is close to my expertise), but not on number theory (actually, I have trouble with papers in English on number theory).

Some computer scientists claim that access to a sufficiently big and rich archive of examples will, by the use of, e.g., Bayesian filters, obviate the need for a translation program to “understand” the text to be translated; I confess that claims of this kind strike me as a little premature.

Thanks for the entertaining post and pics, Gil.

The professor at Chicago was Chandrasekhar, and the students Yang and Lee.

The Wikipedia entry for translation, http://en.wikipedia.org/wiki/Translation, is an interesting source. Indeed, there are claims that poems cannot be “really” translated. There is an interesting debate over whether a lot of statistics without much linguistics can lead to good machine translation. (As for “understanding”, it is not clear how it can be formally defined. In fact, you can argue that the statistics and Bayesian filters lead to some sort of “understanding”.)

I find the (Hebrew) eincyclopedia entry for machine translation hilarious.

It is based on an old version of the wikipedia article,
