Before September, translation didn’t matter — at least, from an infosec standpoint. Taking content written in one language and changing it to another wasn’t at the top of most CSOs’ lists of data risks. Then Norwegian news network NRK uncovered a breach at Statoil, one of the world’s biggest oil and gas companies.
NRK reports that the $46 billion business used Translate.com, a free online tool, to translate “notices of dismissal, plans of workforce reductions and outsourcing, passwords, code information, and contracts.” Then, according to the story, Lise Lyngsnes Randeberg, a college professor, Googled Statoil and found the company’s translations in her search results.
“Wow! What is this?” Randeberg recalled thinking. “This was information from organizations, private companies, government agencies,” she told NRK. In other words, it was material Statoil may not have wanted Randeberg, or any Google user, to read.
The translation industry saw the breach coming. “It was something that we had been warning companies about [for] 10 years or so,” says Don DePalma, Chief Strategist at Cambridge-based think tank Common Sense Advisory. “It’s been a question that’s been coming up, given the way [free online translation] works: Is that something that would expose information?”
How online translation services work
So how did it happen? Only Translate.com knows for certain. Neither the company nor its Chicago-based parent, Emerge Media, responded to CSO’s requests for comment.
In general, here’s how free online translation works: Every word you enter is stored by a translation engine, where machine learning uses your entry, along with its translation, to improve future results. That means anyone who uses the tool after you may benefit from your data, gain access to it, or both. Whether your information winds up in Google search results from there depends on where and how the tool provider stores it.
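To make that mechanism concrete, here is a toy Python model of a service that keeps every submission in one shared store. It is a sketch under stated assumptions, not a description of how Translate.com or any other provider actually works: the class name `SharedTranslationMemory`, its `suggest` lookup, and the fuzzy-match threshold (built on the standard library’s `difflib.SequenceMatcher`) are all hypothetical. It shows how a stranger typing a similar sentence could be handed text someone else submitted.

```python
from difflib import SequenceMatcher


class SharedTranslationMemory:
    """Toy model of a free service that keeps every submission (hypothetical)."""

    def __init__(self) -> None:
        # A single store shared by every user of the service.
        self._pairs: list[tuple[str, str, str]] = []  # (user, source, translation)

    def submit(self, user: str, source: str, translation: str) -> None:
        # The service retains the pair to "improve future results".
        self._pairs.append((user, source, translation))

    def suggest(self, user: str, query: str, threshold: float = 0.6) -> list[str]:
        # Return other users' stored source text that closely resembles the query.
        hits = []
        for owner, source, _translation in self._pairs:
            if owner == user:
                continue
            score = SequenceMatcher(None, query.lower(), source.lower()).ratio()
            if score >= threshold:
                hits.append(source)
        return hits


tm = SharedTranslationMemory()
tm.submit("employee_a", "Notice of dismissal for the Bergen office", "<translated text>")
print(tm.suggest("stranger", "Notice of dismissal for the Oslo office"))
# → ['Notice of dismissal for the Bergen office']
```

The point of the toy is the shared store: once text is kept to train or match future requests, the boundary between your session and the next user’s is whatever the provider decides it is.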
Create a translation policy with security in mind
When it comes to preventing your own translation-related data breach, the first step is to determine when employees can, and can’t, use free tools. At BASF, the answer is never. According to independent technology consultant Kirti Vashee, after the company learned employees were translating “important emails about new products, business plans, [and] PowerPoint presentations” online, it blocked all free translation sites.
For a less severe option, you can limit the use of free translation tools by topic. Maybe it’s okay to enter product shipment details into the software, but not receiver contracts. Vashee says this approach is problematic, though: Employees often use free translation to find out what a document is about. “People will use Google [Translate] and Bing [Translator] because they get a memo in Chinese and just want to know, ‘What is he talking about?’” Employees who don’t speak a language might not realize the content is sensitive until they’ve already translated it.
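A site block like BASF’s is usually enforced at a web proxy or DNS filter. The Python sketch below shows only the matching logic such a filter might apply; the blocklist entries and the `is_blocked` function are hypothetical examples, and a real deployment would maintain its own list and enforce it at the network layer rather than in application code.

```python
from urllib.parse import urlsplit

# Example entries only; a real blocklist would be maintained by your security team.
BLOCKED_TRANSLATION_DOMAINS = {"translate.com", "translate.google.com"}


def is_blocked(url: str, blocked: set[str] = BLOCKED_TRANSLATION_DOMAINS) -> bool:
    # urlsplit(...).hostname is already lowercased and stripped of any port.
    host = urlsplit(url).hostname or ""
    # Match the listed domain itself or any subdomain of it.
    return any(host == domain or host.endswith("." + domain) for domain in blocked)


print(is_blocked("https://www.translate.com/"))          # True
print(is_blocked("https://translate.google.com/?sl=no"))  # True
print(is_blocked("https://www.google.com/search?q=x"))    # False
```

Matching on the “.” boundary keeps a lookalike host such as `nottranslate.com` from being caught by mistake.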
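Vashee’s objection is easy to see in code. The hypothetical keyword filter below (the term list and the `allowed_for_free_tool` name are illustrative assumptions, not any product’s feature) blocks English text that mentions a sensitive topic, but a Chinese memo about a layoff plan passes straight through, because the filter can’t read it any better than the employee can.

```python
# Hypothetical topic list for a "sensitive subjects stay off free tools" policy.
SENSITIVE_TERMS = {"contract", "dismissal", "password", "workforce reduction"}


def allowed_for_free_tool(text: str) -> bool:
    """Return False if the text mentions any listed sensitive term."""
    lowered = text.lower()
    return not any(term in lowered for term in SENSITIVE_TERMS)


print(allowed_for_free_tool("Shipment of 40 pallets leaves Monday"))  # True
print(allowed_for_free_tool("Draft contract for the new supplier"))   # False
# A Chinese memo about a layoff plan matches none of the English terms.
print(allowed_for_free_tool("关于裁员计划的备忘录"))                      # True
```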
A more secure option is to build your own machine learning engines and move translation in-house. That’s what Volkswagen did, Vashee explains: “They specifically don’t want to use outside engines because of the risk of exposure.” Of course, Volkswagen’s 2016 revenue was $251.6 billion, more than the GDP of many sovereign nations, including Chile and Finland. A company that large can afford to internalize translation. For most other businesses, it’s simply not realistic.
Professional translation services are an option, but have their own risks
So what can those companies do? Instead of plugging data into random online tools, tell employees to route all translation through a professional provider. Translation vendor selection is usually based on quality, turnaround and cost. To ensure data security, also ask prospective vendors how they receive and deliver files for translation. If they say email, watch out. “[Email is] 10 times riskier than any [online] solution because it’s very easy to break into people’s email,” Vashee says.
Email is also easy to forward, something many translation companies depend on. A human translator gets a job by specializing in a particular content type and the language direction needed (English into Polish, for example). If either of those factors changes, so does the translator. As a result, even the largest translation companies don’t have in-house resources for everything you need. “There’s a lot of reselling in the industry,” DePalma says, with translation companies outsourcing work to other providers.
“Let’s say somebody comes along and wants Albanian to Polish,” he explains. “There’s a very small demand for that and they’re probably not going to provision for that on a 24/7/365 basis.” So after you email your file to your chosen translation company, it forwards the file to another one, likely a business you’ve never heard of that only offers Polish. But your data won’t stay there. That company forwards your files to an independent translator somewhere else.
“[It’s] an infinite chain of inheritance,” says DePalma. Twenty-six percent of the average translation company’s income comes from other translation companies, and that resold work accounts for about one-fourth of all words translated worldwide.
“As soon as [your file] goes outside the company,” he adds, “it’s in the wild.” If no human translator is available, your project could end up on Translate.com after all, except this time, you’re paying a translation provider to put it there. According to Common Sense Advisory, 64 percent of translation professionals say their colleagues frequently use free translation services on the web.
“When [your data is] in the wild,” DePalma continues, “you then have to rely on the provisions, the security mechanisms, just the entire range of anybody who touches that information to keep it secret and secure.”
That’s a lot of trust to place in a single vendor. As the Russian proverb says, trust but verify. To track your data while it’s in translation, Vashee recommends translation management software (TMS), an industry-specific tool that tracks every word from the moment it leaves your office to the moment it comes back.
With a TMS, no one accesses data without your direct approval, and files cannot be forwarded without your knowledge. “You go in and you provide access,” Vashee says. “If you say, ‘Here are 100 valid IDs and the only people that will be able to touch this data use these 100 valid IDs,’ [you’ll] be able to know exactly what they did every time they touched the data. That’s a high level of security. A TMS system properly set up will give you some protection.”
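The kind of control Vashee describes, an approved list of IDs plus a record of every touch, can be sketched in a few lines of Python. This is a hypothetical illustration of the idea, not the API of any TMS product; the class, method, and action names are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AccessControlledProject:
    """Hypothetical TMS-style project: only approved IDs may touch the data."""

    allowed_ids: set[str]
    # Each entry: (UTC timestamp, user ID, action, segment ID)
    audit_log: list[tuple[str, str, str, str]] = field(default_factory=list)

    def _record(self, user_id: str, action: str, segment_id: str) -> None:
        timestamp = datetime.now(timezone.utc).isoformat()
        self.audit_log.append((timestamp, user_id, action, segment_id))

    def touch(self, user_id: str, action: str, segment_id: str) -> None:
        if user_id not in self.allowed_ids:
            # Denied attempts are logged as well, so they can be reviewed later.
            self._record(user_id, "DENIED:" + action, segment_id)
            raise PermissionError(f"{user_id} is not on the approved list")
        self._record(user_id, action, segment_id)


# 100 approved translator IDs, echoing Vashee's example.
project = AccessControlledProject(allowed_ids={f"translator-{n:03d}" for n in range(100)})
project.touch("translator-007", "edit", "seg-42")
try:
    project.touch("outsider-1", "view", "seg-42")
except PermissionError as err:
    print(err)  # outsider-1 is not on the approved list

for timestamp, user_id, action, segment_id in project.audit_log:
    print(user_id, action, segment_id)
```

Logging refusals alongside approved actions is a design choice worth asking vendors about: an attempt by an unknown ID is often the first sign that a file has been forwarded.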
This protection isn’t perfect. TMS systems are sold to both translation companies and clients; advanced systems extract content directly from GitHub, Adobe CQ, and other platforms where it’s created. Ask how that connection is secured. Then ask where and how the TMS stores your files.
Even more importantly, does your TMS let translators take data out? DePalma mentions that translators are prone to exporting materials from a TMS to move them into a tool they might like better. They log in, hit export, and suddenly your data is back in the wild. Tell your TMS provider that you want this option turned off.
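In practice, turning off export is typically a project or role setting. The sketch below shows the idea with a hypothetical `allow_translator_export` flag that defaults to off; the names are assumptions, so ask your provider what its equivalent setting is called.

```python
from dataclasses import dataclass


@dataclass
class ProjectSettings:
    # Hypothetical setting; off by default, so translators cannot export.
    allow_translator_export: bool = False


def export_segments(settings: ProjectSettings, role: str, segments: list[str]) -> list[str]:
    """Return a copy of the segments, unless the role is barred from exporting."""
    if role == "translator" and not settings.allow_translator_export:
        raise PermissionError("export is disabled for translators on this project")
    return list(segments)


settings = ProjectSettings()
try:
    export_segments(settings, "translator", ["Segment 1", "Segment 2"])
except PermissionError as err:
    print(err)  # export is disabled for translators on this project
```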
In the end, though, DePalma says no matter how well you lock down the tech, the riskiest part of any translation project is the translator: “Even if they couldn’t pull [your data] out exactly, what they could do is a screen capture, then do an OCR, and then from that, put it into another tool.” To DePalma’s knowledge, this type of breach is simply “theoretical.” But before September, so was Statoil’s.