Are cyber companies selling snake oil?

Daniel Woolfolk checks out the tonic at the Authentic8 booth at RSA in San Francisco.

As cybersecurity professionals milled about the convention floor at the RSA security conference, vendors peddled wares metaphorically doused in machine learning and artificial intelligence. At one booth, a company took that formula for success to its logical end, hiring an actor complete with fake mustache and top hat to hawk “extract of AI in a bottle,” an overt snake-oil salesman amid a sea of plausibly deniable snake-oil salesmen.

From the exposition floor to the panel chambers, there is a pervasive sense that machine learning and artificial intelligence have moved from useful definitions to marketing buzzwords, a way for the process to obscure the result. How to accurately understand the terms, away from metaphorical and literal snake-oil pitches, was the focus of a panel this morning. Inasmuch as there is an antidote, the panelists agreed, it is transparency.

Speaking under the banner of “Age of the Machines in Cyber ― Machine Learning and AI, the Next Frontier” was panel moderator Ira Winkler, president of Secure Mentem. On the panel were Oliver Friedrichs, founder and CEO of Phantom, a security operations company. Next to Friedrichs sat the chief technology officer of cybersecurity company Versive, Dustin Hillard, and rounding out the panel was Ramesh Sepehrrad, VP of technology at Freddie Mac, with a focus on risk and resiliency.

Winkler opened the discussion by acknowledging the show floor hype around poorly defined concepts (though he didn’t specifically mention the literal snake-oil pitchman). To cut through the confusion, Winkler instead asked the panel to define “machine learning.” Friedrichs offered a straightforward statement on the process: train an algorithm on items categorized as good or bad, and then let the algorithm sort unlabeled items into those categories based on what it learned. Hillard expanded on this, noting that machine learning can either classify items into binary categories or run regressions if what’s measured is not an on/off trait but a quantity. For Hillard, AI is an umbrella over machine learning, the process that integrates the data into the system and provides humans with a useful tool to analyze it.
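To make those two definitions concrete, here is a minimal sketch in Python. It is not anything a panelist presented: the data, the feature meanings, and the nearest-centroid and least-squares methods are all assumptions chosen for brevity. It trains on items labeled good or bad, sorts unlabeled items into those categories, and then fits a line to predict a quantity rather than a category.

```python
from statistics import mean

# Hypothetical training data: (feature vector, label). The features are
# invented for illustration, e.g. (failed logins per hour, MB uploaded).
LABELED = [
    ((1.0, 2.0), "good"),
    ((2.0, 1.0), "good"),
    ((9.0, 8.0), "bad"),
    ((8.0, 9.0), "bad"),
]


def train_classifier(examples):
    """Learn one centroid (average point) per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(axis) for axis in zip(*points))
        for label, points in by_label.items()
    }


def classify(centroids, features):
    """Assign an unlabeled item to the label with the nearest centroid."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], features))


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (a simple regression)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar


# Classification: the answer is a category.
centroids = train_classifier(LABELED)
print(classify(centroids, (1.5, 1.5)))
print(classify(centroids, (8.5, 8.5)))

# Regression: the answer is a quantity.
slope, intercept = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
print(slope * 5 + intercept)
```

The classifier only ever answers “good” or “bad,” while the fitted line returns a number, which is the distinction Hillard drew between classification and regression.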

“AI is a foundation, and machine learning sits on top of it,” said Sepehrrad, inverting the definition. Machine learning is processing data, analyzing that data, then processing new data, reinforcing learning all the while. Still, when it comes to communicating these concepts clearly, she noted, “We have a language problem.”

Even if a machine learning process is understood within an agency or company, or within the team in a company that developed it, there’s no guarantee that anyone outside that organization knows how it works. For most of us, most of the time, that’s fine; algorithms can function as an opaque black box that eats data and spits out results. But that makes it very hard to hold that data and that process accountable in any way.

Winkler asked the panel how to prevent companies from misusing machine learning. The answer, echoed by the panel, was transparency.

“There was an accord announced earlier this week, setting out rights we should have thought of earlier,” said Sepehrrad, noting that in China we can see the development of AI without the checks of a democratic society. “Do we want a user-defined tech experience, or a tech-defined user experience? This isn’t just about getting VC money. This is going to have implications for future generations. We [need] to get a more multidisciplinary village; technology is part of modern life today.”

“Understanding and controlling AI and machine learning is about data transparency,” said Hillard. “Most research is driven by the data set first; transparency into what is available, what is collected and how it is used is vital, and we saw a lack of it in regards to Facebook. Algorithms increase the opacity.”

Friedrichs tied the call for transparency back into one of the big vulnerabilities of machine learning systems: people can feed incorrect data into the process, getting the algorithm to accept false data and then produce false results. As one specific example, he cited self-driving cars that could no longer recognize stop signs when the stop signs had stickers on them. Knowing the data enables the people managing the data and the algorithm to spot flaws and correct them.
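A toy illustration of that vulnerability, again as a hedged Python sketch rather than anything shown on the panel: the one-number “risk score” data, the averaging classifier, and the audit helper are all hypothetical. Mislabeled items slipped into the training set drag the learned notion of “good” toward the bad examples, and an item that was previously flagged passes. Being able to inspect the training data is what lets someone catch it.

```python
from statistics import mean


def train(examples):
    """Learn the average score seen for each label."""
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def classify(centroids, value):
    """Pick the label whose average score is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


def audit(examples, trusted):
    """List training items whose label disagrees with a trusted model."""
    return [(v, lab) for v, lab in examples if classify(trusted, v) != lab]


# Hypothetical one-feature data: a risk score and a human-assigned label.
clean = [(1.0, "good"), (2.0, "good"), (8.0, "bad"), (9.0, "bad")]
# Assumed attack: high-risk items injected with the wrong label.
poisoned = clean + [(9.0, "good")] * 6

clean_model = train(clean)
poisoned_model = train(poisoned)

before = classify(clean_model, 7.5)     # judged by the clean model
after = classify(poisoned_model, 7.5)   # judged by the poisoned model
flagged = audit(poisoned, clean_model)  # visible only if the data is visible
print(before, after, len(flagged))
```

In this sketch the score of 7.5 is classified as bad by the clean model and good by the poisoned one, and the audit flags exactly the six injected items.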

“Can you envision real-life Skynet? Will machines get autonomous on their own?” Winkler asked the panel, making a hard pivot from commercial accident to the apocalyptic.

“Attackers are using machine learning to figure out what to attack,” said Friedrichs. Hillard noted that in the financial space, we’ve seen mistakes where trading algorithms rapidly lost large sums of money, based on quirks in the trading algorithm.

“There are two ways AI can mess up,” said Hillard. “One is the algorithm is left alone and does as programmed and goofs. The other is that the AI figures something new out.”

This last point is what we know as “emergent behavior,” which we see in scenarios where simulated robots find novel solutions to the problems they are given and then adapt in sometimes disastrous ways.

Winkler next asked the panel about the possibility of an adversary messing with drones and missiles to change how they work, which Hillard acknowledged as a real possibility, though no one present took the time to dwell on that danger.

“To understand attacks,” said Sepehrrad, “think back to what motivates attack, defense, what drives that goal and who is behind the algorithm design and how they designed it.”

“If Skynet is a thing, it’s cybersecurity folk who will invent or create it,” said Winkler.

How can humans build complex machines and then trust that those complex machines will be used responsibly? This isn’t a new problem, though with references to both fictional misuse and real examples from modern-day technology companies, it appears to be a question the technology security community is at least wrestling with publicly. Inasmuch as there’s an answer, it’s transparency, though that process of transparency will likely have to be politically motivated itself, lest it become just another jar of snake oil.