High-tech federal agency trying to build a better translator

Sunday, November 5, 2006


The Associated Press

CAMBRIDGE, Mass. -- The past few years have shown that U.S. government intelligence goes only so far. One of the biggest challenges is recognizing vital information in foreign languages -- and acting quickly on it.

That's why the military would love software that can listen to TV broadcasts or phone conversations and read Web sites in Arabic and Chinese, translate them into English and summarize the key elements for humans.

But each of those steps has long bedeviled computer scientists. Perfecting them and combining them -- well, that is "DARPA hard." That means it's difficult even by the extreme standards of the Pentagon's next-generation technology arm, the Defense Advanced Research Projects Agency.

Last year DARPA launched a project that aims to create that real-time translation software. It's called GALE, for Global Autonomous Language Exploitation. It hired three teams of researchers to chase the problem for up to five years. Each year, their progress would be evaluated, and the worst-performing team could be eliminated. Or the program could be shut down entirely.

DARPA's three choices were IBM Corp., SRI International and BBN Technologies Inc.

GALE's goal is to deliver, by 2010, software that can almost instantly translate Arabic and Mandarin Chinese with 90 to 95 percent accuracy.

That might be impossible. Humans might not even be that precise. Consider all the ways we mishear each other, or fail to grasp idioms, or apply one subjective interpretation instead of another. Why else do new translations of "Don Quixote" keep emerging, 400 years after it was written?

Fortunately for the GALE teams, they didn't have to be near 95 percent right away. In the first year, they were expected to translate Arabic and Mandarin speech with 65 percent accuracy; with text the goal was 75 percent.

How hard was that? Before GALE, DARPA estimated that the best systems could translate foreign news stories at 55 percent accuracy. But DARPA wants translations from more than just such controlled, well-articulated sources. GALE also incorporates man-on-the-street interviews and raucous colloquial chats on the Web.

That's where things get tricky. Background noise, dialects, accents, slang, short words like "on" or "of" that most speakers don't bother to clearly enunciate -- these are the stuff of nightmares for speech-recognition and machine-translation engineers.

Not to mention that Chinese and Arabic are structured very differently from English, which makes them a pain to translate.

The name of the game is to fine-tune the computer process, known as an algorithm, that does the language analysis. Programming missteps can cause a computer to gain minimal insight from the new language data it is fed. It could even get worse at its translation task.

"It's sort of trial and error guided by intuitions and some knowledge," BBN's Rich Schwartz said.

Though that's not how it gets described in computer scientists' meetings. "Rewrote the forward pass of the decoder algorithm to be a recursive traversal over the hypergraph, rather than a loop over spans," one BBN programmer assured his team in a May presentation.

Speech recognition, machine translation and language distillation don't harbor many secret recipes. Everyone knows what everyone else is trying to do -- tweak algorithms over and over.

The defining element of GALE -- the government's evaluation -- was on the honor system, in keeping with the field's open nature. The teams got the test in June -- hours of audio and dozens of documents in Arabic and Mandarin -- and were expected to turn in their results later.

DARPA judges scored the computer translations by counting the number of human edits that the sentences needed in order for them to have the correct meaning. By this measure, the results largely met DARPA's demands of 75 percent accuracy for text translation and 65 percent for speech.
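The edit-counting idea can be sketched in a few lines of code. This is only an illustration, not DARPA's actual scoring procedure: it assumes that an "edit" is a single word inserted, deleted or substituted, and that accuracy is one minus the number of edits divided by the length of the corrected sentence. The function names and the sample sentences are hypothetical.

```python
def word_edit_distance(hypothesis, reference):
    """Minimum number of word insertions, deletions and substitutions
    needed to turn `hypothesis` into `reference` (word-level Levenshtein)."""
    hyp = hypothesis.split()
    ref = reference.split()
    # prev[j] holds the cost of turning the hypothesis words seen so far
    # into the first j reference words.
    prev = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        curr = [i]
        for j, ref_word in enumerate(ref, start=1):
            substitution = prev[j - 1] + (0 if hyp_word == ref_word else 1)
            deletion = prev[j] + 1       # drop a hypothesis word
            insertion = curr[j - 1] + 1  # add a missing reference word
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def edit_based_accuracy(machine_output, corrected):
    """Assumed formula: 1 - edits / length of the human-corrected sentence."""
    ref_len = len(corrected.split())
    if ref_len == 0:
        raise ValueError("corrected sentence must not be empty")
    edits = word_edit_distance(machine_output, corrected)
    return max(0.0, 1.0 - edits / ref_len)


# Hypothetical example: a machine translation and its human-corrected form.
machine_output = "the police has clashed with militants"
corrected = "police have clashed with militants"
edits = word_edit_distance(machine_output, corrected)
accuracy = edit_based_accuracy(machine_output, corrected)
```

Under these assumptions the sample sentence needs two edits (dropping "the" and changing "has" to "have") against a five-word correction, for a score of 60 percent.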

The BBN-led team produced 75.3 percent accuracy with Arabic text, 75.2 percent in Chinese. It scored 69.4 percent in Arabic speech; 67.1 percent in Mandarin. IBM scored higher with Arabic text and SRI scored higher in Mandarin.

Then came the distillation section: open-ended questions posed to each team's computers -- based on 600,000 documents in Arabic, Chinese and English.

"How did Israel react to the Hamas election victory?" was one such question. "Describe attacks in Kuwait," was another.

DARPA wanted to see how well the computers replicated human performance on such questions, including how precisely they could recall certain facts.

Here, too, the computers managed some articulate responses. "Since Jan. 10 (2005), police have clashed with Muslim fundamentalists and pursued them around the country, killing eight militants and arresting scores of others," went one BBN response to the Kuwait question.

But it was not until three months later -- after all three teams began working on year two of GALE in case they were picked to continue -- that the researchers got DARPA's ruling about who passed.

So who got rejected? No one.

At least not yet.

DARPA Director Anthony Tether and GALE program manager Joseph Olive decided each team had shown significant progress worth continuing to track.

But they did tighten the screws. In addition to expecting better translation accuracy in each of GALE's four remaining years, DARPA will measure that performance more stringently. Now a high level of accuracy must be sustained over a very high percentage of documents. A bad patch of computer translation cannot be averaged away.
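A rough sketch shows why the stricter rule matters. The scores, the 80 percent target and the 90 percent required share below are all made-up numbers for illustration; the actual thresholds are not given here, and the function names are hypothetical.

```python
def mean_accuracy(scores):
    """Average accuracy over all documents."""
    return sum(scores) / len(scores)


def share_meeting_target(scores, target):
    """Fraction of documents whose accuracy is at or above the target."""
    return sum(1 for score in scores if score >= target) / len(scores)


def passes_stricter_rule(scores, target, required_share):
    """True only if enough individual documents reach the target."""
    return share_meeting_target(scores, target) >= required_share


# Hypothetical per-document accuracies: four good translations, one bad patch.
scores = [0.95, 0.92, 0.94, 0.93, 0.40]
target = 0.80          # assumed per-document accuracy target
required_share = 0.90  # assumed share of documents that must reach it

average = mean_accuracy(scores)
share = share_meeting_target(scores, target)
ok = passes_stricter_rule(scores, target, required_share)
```

With these invented numbers the average still clears the target, because the four strong documents offset the weak one. Only four of five documents reach the target individually, so the stricter rule fails the set.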

Just days after being informed of the new framework, BBN chief scientist John Makhoul already had his eye on the next GALE evaluation, in June, and on how his team would deliver the performance DARPA -- and BBN -- needed.

"It's the same feeling again," he said. "The pressure -- it's not off. It's higher, in fact. Now the goals are harder for the second year than they were before."
