Computer science prof wins Sloan Fellowship

Data is everywhere these days. What can we do with it all, and how can we analyze it quickly and accurately? Ask Dr. Mark Schmidt, one of the two UBC researchers who were awarded a Sloan Fellowship this year.

The Sloan Fellowships are $65,000, two-year grants awarded to researchers in science, technology, engineering, mathematics and economics. Granted by the Alfred P. Sloan Foundation, they are given annually to 126 researchers in the US and in Canada. Since 1984, 22 UBC Science researchers have been recognized as Sloan Fellows.

“We sort of have data coming out of the ears as we build technology to collect data,” said Schmidt. “But it’s analyzing the data that people are realizing is an issue because we have all this data and we don’t really know what to do with it.”

So what can we do with the massive data hoard we have today? We can teach machines how to analyze it so it becomes useful to us humans, a process known as machine learning.

Schmidt was awarded the fellowship largely due to his innovations in machine learning. Machine learning is about analyzing a large supply of data to find the patterns in it. Speech recognition software, like Siri, is a good example. First, a lot of data on how sounds map onto words was collected. Machine learning was then used to build a system that accurately translates the sounds you’re making into the words you’re speaking.
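At its simplest, the pattern-finding Schmidt describes can be sketched with a toy example. The Python below is a hypothetical one-nearest-neighbour classifier, not anything Siri or Schmidt actually uses: real systems rely on far more sophisticated models, but the shape is the same — store labelled examples, then label a new input by comparing it to what was seen before.

```python
# Toy sketch of supervised machine learning (illustrative only):
# learn a mapping from labelled examples, then predict labels for new inputs.

def train(examples):
    """'Training' for nearest-neighbour is just storing the labelled data."""
    return list(examples)

def predict(model, x):
    """Predict the label of x from the closest training example."""
    closest = min(model, key=lambda pair: abs(pair[0] - x))
    return closest[1]

# Made-up data: a sound summarized as one number (say, a pitch) -> a word.
training_data = [(1.0, "yes"), (1.2, "yes"), (5.0, "no"), (5.3, "no")]
model = train(training_data)

print(predict(model, 1.1))  # a new input near the "yes" examples -> "yes"
print(predict(model, 5.1))  # a new input near the "no" examples -> "no"
```

The point of the toy is the last two lines: the inputs 1.1 and 5.1 were never seen during training, yet the model labels them sensibly.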

This graph shows how algorithms have become faster and more efficient over time. The horizontal axis represents time and the vertical axis represents error. Older algorithms (yellow) were very slow but had very little error. Faster algorithms were created by only analyzing some of the data (orange). This method was faster but had an accuracy limit. Schmidt's algorithm is faster and has no accuracy limit. Aiken Lao / The Ubyssey

Schmidt said the word learning is used because the machine learns to make predictions about new data based on the old data it was given, rather than just memorizing old patterns.

“It’s only really learning if you can now take a new input and produce the correct output. We really study methods that can work well in new situations, not just work with the data that you’ve seen,” he said.

Schmidt and his students have used machine learning to investigate a number of important issues. These include teaching machines to recognize brain tumours and evaluate their severity, building systems that detect abnormal heart motions before a heart attack occurs and modelling how ideas spread around social networks, among many other applications.

But as the amount of data keeps growing, it takes more and more time and effort to process it all. Have no fear: Schmidt is already on it.

“I’m trying to develop better algorithms and usually that means faster algorithms. Faster could mean many things, but often it means that you can have a bigger data set or you can make a decision quicker or something like that,” he said. Schmidt added that problems often arise when the time it takes to process a data set increases exponentially.

Algorithms are the step-by-step procedures a computer follows to process data, and the choice of algorithm largely determines how long that processing takes. Some algorithms can process data quickly, while others are much slower.
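To see why the choice of algorithm matters, consider a toy task that has nothing to do with Schmidt's specific research: checking whether a value appears in a sorted list. The sketch below contrasts a slow linear scan, which may examine every item, with a fast binary search, which repeatedly halves the search range.

```python
# Two algorithms for the same task, with very different costs (toy example).
import bisect

def linear_search(sorted_data, target):
    """Slow: examines items one by one (up to len(sorted_data) steps)."""
    for item in sorted_data:
        if item == target:
            return True
        if item > target:  # past where target would be; stop early
            return False
    return False

def binary_search(sorted_data, target):
    """Fast: halves the search range each step (~log2(len(sorted_data)) steps)."""
    i = bisect.bisect_left(sorted_data, target)
    return i < len(sorted_data) and sorted_data[i] == target

data = list(range(0, 1_000_000, 2))   # half a million even numbers
print(linear_search(data, 999_998))   # True, after ~500,000 comparisons
print(binary_search(data, 999_998))   # True, after ~20 comparisons
```

Both functions give the same answers; only the amount of work differs — and on large data sets, that difference is what separates a usable tool from an unusable one.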

In 2012, Schmidt developed a new tool for processing data known as the stochastic average gradient (SAG). Since the algorithm looks at random pieces of data at each step rather than the entire data set, it can handle a large volume of data. Usually, this speed comes at a price — methods that sample data randomly improve their answer very slowly, so reaching a precise answer takes an enormous number of extra steps. But with Schmidt’s method, you get a more accurate analysis in a shorter amount of time. Win-win.
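The core trick behind SAG can be sketched in a few lines. The code below is a hedged, toy illustration of the stochastic-average-gradient idea applied to fitting a one-parameter line (find w so that y ≈ w·x); the data, learning rate and stopping point are all illustrative choices, and this is not Schmidt's actual implementation. The method touches only one random data point per step, yet keeps a running memory of the last gradient seen for every point so that each update uses an estimate of the full average gradient.

```python
# Toy sketch of the stochastic average gradient (SAG) idea (illustrative only).
import random

def sag_fit(xs, ys, steps=5000, lr=0.01, seed=0):
    rng = random.Random(seed)
    n = len(xs)
    w = 0.0
    memory = [0.0] * n          # last gradient computed for each data point
    grad_sum = 0.0              # running sum of the stored gradients
    for _ in range(steps):
        i = rng.randrange(n)                 # look at ONE random data point
        g = (w * xs[i] - ys[i]) * xs[i]      # its gradient at the current w
        grad_sum += g - memory[i]            # refresh the running sum
        memory[i] = g
        w -= lr * grad_sum / n               # step using the average gradient
    return w

xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 6.0, 9.0, 12.0]      # generated from y = 3x, so w should approach 3
print(sag_fit(xs, ys))
```

Each iteration costs only one gradient evaluation, like classic stochastic methods, but because the update averages over the remembered gradients of all data points, the answer keeps sharpening instead of stalling at an accuracy limit — the behaviour the graph above attributes to Schmidt's algorithm.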

In terms of applications, Schmidt and his students are currently working on quantum computing and molecular computing, building a system to help freshwater fisheries detect overfishing in BC lakes and building robots that scan farming fields for pests and crop diseases. He’s also keen on developing systems that discover the interesting parts of a data set by steering an algorithm’s random sampling toward more helpful or important data.

Schmidt said that the $65,000 from the fellowship would go toward student salaries, sending students to conferences and possibly buying equipment.