Microsoft has released an updated version of Microsoft Cognitive Toolkit, a system for deep learning that is used to speed advances in areas such as speech and image recognition and search relevance on CPUs and NVIDIA GPUs.
The toolkit, previously known as CNTK, was initially developed by computer scientists at Microsoft who wanted a tool to do their own research more quickly and effectively. It quickly moved beyond speech and morphed into an offering that customers, including a leading international appliance maker and Microsoft’s flagship product groups, depend on for a wide variety of deep learning tasks.
“We’ve taken it from a research tool to something that works in a production setting,” said Frank Seide, a principal researcher at Microsoft Artificial Intelligence and Research and a key architect of the Microsoft Cognitive Toolkit.
With the latest version of the toolkit, which is available on GitHub, developers can work in either Python or C++. The new version also lets researchers do a type of artificial intelligence work called reinforcement learning.
Decades of computer vision research, one ‘Swiss Army knife’
When Anne Taylor walks into a room, she wants to know the same things that any person would.
Where is there an empty seat? Who is walking up to me, and is that person smiling or frowning? What does that sign say?
For Taylor, who is blind, there aren’t always easy ways to get this information. Perhaps another person can direct her to her seat, describe her surroundings or make an introduction.
There are apps and tools available to help visually impaired people, she said, but they often only serve one limited function and they aren’t always easy to use. It’s also possible to ask other people for help, but most people prefer to navigate the world as independently as possible.
That’s why, when Taylor arrived at Microsoft about a year ago, she immediately got interested in working with a group of researchers and engineers on a project that she affectionately calls a potential “Swiss Army knife” of tools for visually impaired people.
“I said, ‘Let’s do something that really matters to the blind community,’” said Taylor, a senior project manager who works on ways to make Microsoft products more accessible. “Let’s find a solution for a scenario that really matters.”
That project is Seeing AI, a research project that uses computer vision and natural language processing to describe a person’s surroundings, read text, answer questions and even identify emotions on people’s faces. Seeing AI, which can be used as a cell phone app or via smart glasses from Pivothead, made its public debut at the company’s Build conference this week. It does not currently have a release date.
Taylor said Seeing AI provides another layer of information for people who also are using mobility aids such as white canes and guide dogs.
“This app will help level the playing field,” Taylor said.
At the same conference, Microsoft also unveiled CaptionBot, a demonstration site that can take any image and provide a detailed description of it.
Very deep neural networks, natural language processing and more
Seeing AI and CaptionBot represent the latest advances in this type of technology, but they are built on decades of cutting-edge research in fields including computer vision, image recognition, natural language processing and machine learning.
In recent years, a spate of breakthroughs has allowed computer vision researchers to do things they might not have thought possible even a few years before.
“Some people would describe it as a miracle,” said Xiaodong He, a senior Microsoft researcher who is leading the image captioning effort that is part of Microsoft Cognitive Services. “The intelligence we can say we have developed today is so much better than six years ago.”
The field is moving so fast that the technology is substantially better than it was even six months ago, he said. For example, Kenneth Tran, a senior research engineer on his team who is leading the development effort, recently figured out a way to make the image captioning system more than 20 times faster, allowing people who use tools like Seeing AI to get the information they need much more quickly.
A major aha moment came a few years ago, when researchers hit on the idea of using deep neural networks, which roughly mimic the biological processes of the human brain, for machine learning.
Machine learning is the general term for a process in which systems get better at doing something as they are given more training data about that task. For example, if a computer scientist wants to build an app that helps bicyclists recognize when cars are coming up behind them, the scientist would feed the computer tons of pictures of cars, so the app learns to recognize the difference between a car and, say, a sign or a tree.
Computer scientists had used neural networks before, but not in this way, and the new approach resulted in big leaps in computer vision accuracy.
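To make the idea of learning from training data concrete, here is a minimal sketch in Python. It is not the approach used by any Microsoft system described here: it assumes each picture has already been reduced to two made-up numeric features, and it uses a simple nearest-centroid classifier rather than a neural network. The labels "car" and "tree", the feature values and all function names are hypothetical.

```python
import random

def make_examples(n, rng):
    """Generate n labeled toy feature vectors (hypothetical stand-ins for pictures)."""
    examples = []
    for _ in range(n):
        label = rng.choice(["car", "tree"])
        # Assumption: "car" features cluster near (0, 0), "tree" features near (4, 4).
        cx, cy = (0.0, 0.0) if label == "car" else (4.0, 4.0)
        examples.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    return examples

def train(examples):
    """Learn one average feature vector (centroid) per label from the training data."""
    sums = {}
    for (x, y), label in examples:
        sx, sy, count = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, count + 1)
    return {label: (sx / count, sy / count) for label, (sx, sy, count) in sums.items()}

def predict(centroids, point):
    """Return the label whose centroid is closest to the given point."""
    x, y = point
    return min(
        centroids,
        key=lambda lbl: (x - centroids[lbl][0]) ** 2 + (y - centroids[lbl][1]) ** 2,
    )

def accuracy(centroids, examples):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(predict(centroids, point) == label for point, label in examples)
    return correct / len(examples)

if __name__ == "__main__":
    rng = random.Random(0)
    held_out = make_examples(500, rng)
    # Train on progressively larger sets and measure accuracy on unseen examples.
    for n in (4, 40, 400):
        model = train(make_examples(n, rng))
        print(n, "training examples -> accuracy", round(accuracy(model, held_out), 3))
```

The model is never told what distinguishes a car from a tree. It only sees labeled examples, and its estimate of each class is computed from them, which is the sense in which it learns from data.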
Several months ago, Microsoft researchers Jian Sun and Kaiming He made another big leap when they unveiled a new system that uses very deep neural networks – called residual neural networks – to correctly identify photos. The new approach to recognizing images resulted in huge improvements in accuracy. The researchers shocked the academic community and won two major contests, the ImageNet and Microsoft Common Objects in Context challenges.
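The core idea behind a residual network can be sketched in a few lines of Python. In a residual block, the input is added back to the output of the block's layers, so the layers only need to learn a correction (the "residual") to the input. The sketch below is an illustration of that general idea, not the researchers' actual system: it assumes small fully connected layers written with plain lists, where the published networks use convolutional layers, and the function and weight names are hypothetical.

```python
def relu(values):
    """Rectified linear unit: replace negative entries with zero."""
    return [max(0.0, v) for v in values]

def matvec(weights, values):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]

def residual_block(x, w1, w2):
    """Compute relu(x + F(x)), where F is two small layers with weights w1 and w2."""
    hidden = relu(matvec(w1, x))   # first layer plus nonlinearity
    fx = matvec(w2, hidden)        # second layer: the learned "residual"
    # Skip connection: add the original input back before the final nonlinearity.
    return relu([a + b for a, b in zip(x, fx)])

if __name__ == "__main__":
    zeros = [[0.0, 0.0], [0.0, 0.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # With all-zero weights the block passes a non-negative input through unchanged.
    print(residual_block([1.0, 2.0], zeros, zeros))
    print(residual_block([1.0, 2.0], identity, identity))
```

Because a block with zero weights simply passes its input along, stacking many such blocks does not have to make the signal worse, which is one common explanation for why very deep networks of this kind can be trained.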
Tools to recognize and accurately describe images
That approach is now being used by Microsoft researchers who are working on ways to not just recognize images but also write captions about them. This research, which combines image recognition with natural language processing, can help people who are visually impaired get an accurate description of an image. It also has applications for people who need information about an image but can’t look at it, such as when they are driving.
The image captioning work also has received accolades for its accuracy as compared to other research projects, and it is the basis for the capabilities in Seeing AI and CaptionBot. Now, the researchers are working on expanding the training set so it can give users a deeper sense of the world around them.
Margaret Mitchell, a Microsoft researcher who specializes in natural language processing and has been one of the industry’s leading researchers on image captioning, said she and her colleagues also are looking at ways a computer can describe an image in a more human way.
For example, while a computer might accurately describe a scene as “a group of people that are sitting next to each other,” a person may say that it’s “a group of people having a good time.” The challenge is to help the technology understand what a person would think was most important, and worth saying, about the picture.
“There’s a separation between what’s in an image and what we say about the image,” said Mitchell, who also is one of the leads on the Seeing AI project.
Other Microsoft researchers are developing ways that the latest image recognition tools can provide more thorough explanations of pictures. For example, instead of just describing an image as “a man and a woman sitting next to each other,” it would be more helpful for the technology to say, “Barack Obama and Hillary Clinton are posing for a picture.”
That’s where Lei Zhang comes in.
When you search the Internet for an image today, chances are high that the search engine is relying on text associated with that image to return a picture of Kim Kardashian or Taylor Swift.
Zhang, a senior researcher at Microsoft, is working with researchers including Yandong Guo on a system that uses machine learning to identify celebrities, politicians and public figures based on the elements of the image rather than the text associated with it.
Zhang’s research will be included in the latest vision tools that are part of Microsoft Cognitive Services, a set of tools based on Microsoft’s cutting-edge machine learning research. Developers can use the tools to build apps and services that do things like recognize faces, identify emotions and distinguish various voices. Those tools also have provided the technical basis for Microsoft showcase apps and demonstration websites such as how-old.net, which guesses a person’s age, and Fetch, which can identify a dog’s breed.
Microsoft Cognitive Services is an example of what is becoming a more common phenomenon – the lightning-fast transfer of the latest research advances into products that people can actually use. The engineers who work on Microsoft Cognitive Services say their job is a bit like solving a puzzle, and the pieces are the latest research.
“All these pieces come together and we need to figure out, how do we present those to an end user?” said Chris Buehler, a software engineering manager who works on Microsoft Cognitive Services.
From research project to helpful product
Seeing AI, the research project that could eventually help visually impaired people, is another example of how fast research can become a really helpful tool. It was conceived at last year’s //oneweek Hackathon, an event in which Microsoft employees from across the company work together to try to make a crazy idea become a reality.
The group that built Seeing AI included researchers and engineers from all over the world who were attracted to the project because of the technological challenges and, in many cases, also because they had a personal reason for wanting to help visually impaired people operate more independently.
“We basically had this super team of different people from different backgrounds, working to come up with what was needed,” said Anirudh Koul, who has been a lead on the Seeing AI project since its inception and became interested in it because his grandfather is losing his ability to see.
For Taylor, who joined Microsoft to represent the needs of blind people, it was a great experience that also resulted in a potential product that could make a real difference in people’s lives.
“We were able to come up with this one Swiss Army knife that is so valuable,” she said.
The next phase of Microsoft Academic: intelligent bots at your service!
Progress in AI research and applications is exploding, and that explosion extends to our own team working on academic services. Continuing our work supercharging Bing and Cortana, we are also applying new technologies to Microsoft Academic, which serves the research community. If you’re not familiar with Microsoft Academic, this online destination helps researchers connect with the papers, conferences, people, and ideas that are most relevant, using bots that read, understand, and deliver the scientific news and papers researchers need to further their work.
Designed by and for researchers like myself, the site puts the broadest and deepest set of scientific information at your fingertips, with the ability to go beyond keywords to the contextual meaning of the content. Recently, we further enhanced the analytic content so users can see the latest research, news, and people, ranked by importance and credibility. Users can even drill down on the people, events, and institutions they care most about.
Behind the scenes, we are taking advantage of the fact that machines do not need time to sleep or eat, and have better memory than humans. We have trained our AI robots to read, classify, and tag every document published to the web in real time. The result is a massive collection of academic knowledge we call the Microsoft Academic Graph (MAG), which is growing at roughly 1 million articles per week. While one set of robots is busy gathering knowledge from the web, another set of robots is dedicated to analyzing citation behaviors and computing the relative importance of each node in the MAG so that users are always presented with information they need and want.
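One well-known way to compute the relative importance of nodes in a citation graph is a PageRank-style iteration, in which a paper is important if important papers cite it. The Python sketch below illustrates that general technique on a tiny hypothetical graph. It is an assumption made for illustration only: the algorithm actually used for the MAG is not described here, and the function name, parameters and paper IDs are invented.

```python
def rank_papers(citations, damping=0.85, iterations=50):
    """PageRank-style scores for a citation graph given as {paper: [papers it cites]}."""
    papers = sorted(set(citations) | {c for cited in citations.values() for c in cited})
    n = len(papers)
    score = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Every paper starts each round with a small baseline share.
        new_score = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            if cited:
                # A paper passes its score evenly to the papers it cites.
                share = damping * score[p] / len(cited)
                for c in cited:
                    new_score[c] += share
            else:
                # A paper that cites nothing spreads its score over all papers.
                share = damping * score[p] / n
                for q in papers:
                    new_score[q] += share
        score = new_score
    return score

if __name__ == "__main__":
    # Hypothetical graph: papers A and B both cite paper C.
    scores = rank_papers({"A": ["C"], "B": ["C"], "C": []})
    for paper, value in sorted(scores.items(), key=lambda item: -item[1]):
        print(paper, round(value, 3))
```

In this toy graph the paper cited by the other two ends up with the highest score. A ranking that relies on citations alone gives little signal for newly published papers, which is the gap the WSDM Cup challenge mentioned below was set up to address.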
Microsoft Academic is based on the work our team developed for Microsoft Cognitive Services, including open APIs that give developers AI-based semantic search tools and entity-linking capabilities. We’re also applying AI semantic search—which is contextual and conversational—to Cortana, Bing, and more.
As a research organization, we understand the pivotal role that open communication plays in advancing science. As such, we’re making the back-end dataset and algorithms available to all through Cognitive Services. There, everyone can access and conduct research on the massive and growing dataset through the cloud-based APIs. This means you don’t have to worry about the logistics of transmitting the massive dataset over the Internet, or manage a cluster of computers just to host and analyze the data.
We are particularly excited that the research community has taken advantage of these cloud resources and already is collaborating on a common data and benchmarks platform to advance the state of the art. Earlier this year, we saw 81 teams participate in the WSDM Cup 2016 to develop new methods to rank papers, including newly published ones that have yet to receive any citations.
An ongoing challenge is the KDD Cup 2016, which is focused on finding a better way to rank the importance of research institutions. The results of the first two stages of the contest have already been published, and I cannot wait to see the final outcomes and learn what new insights and technologies the 500 participating teams have developed when results are announced in August at KDD 2016 in San Francisco!
I encourage you to start experiencing the breadth and depth of what Microsoft Academic currently has to offer and to continue this journey with us in our mission to empower every academic and every academic institution on the planet to achieve more.