Using data science to beat cancer

Dividing cancer cell, SEM

By Nancy Brinker as written on
The complexity of seeking a cure for cancer has vexed researchers for decades. While they’ve made remarkable progress, they are still fighting an uphill battle, as cancer remains one of the leading causes of death worldwide.
Yet scientists may soon have a critical new ally at their side: intelligent machines that can attack that complexity in a different way.
Consider an example from the world of gaming: last year, AlphaGo, an artificial intelligence program developed by Google’s DeepMind, used deep learning techniques to beat South Korean grandmaster Lee Sedol at Go, an immensely complex game with more possible board positions than there are atoms in the observable universe.
Those same machine learning and AI techniques can be brought to bear on the massive scientific puzzle of cancer.
One thing is certain: we won’t have a shot at conquering cancer with these new methods unless we have more data to work with. Many data sets, such as medical records, genetic tests and mammograms, are locked up and out of reach of our best scientific minds and our best learning algorithms.
The good news is that big data’s role in cancer research is now at center stage, and a number of large-scale, government-led sequencing initiatives are moving forward. These include the U.S. Department of Veterans Affairs’ Million Veteran Program; the 100,000 Genomes Project in the U.K.; and the NIH’s Cancer Genome Atlas, which holds data from more than 11,000 patients and is open to researchers everywhere to analyze via the cloud. According to a recent study, as many as 2 billion human genomes could be sequenced by 2025.
There are other trends driving demand for fresh data, including genetic testing. In 2007, sequencing one person’s genome cost $10 million. Today you can get it done for less than $1,000. In other words, for every person sequenced 10 years ago, we can now sequence at least 10,000. The implications are big: discovering that you carry a mutation linked to a higher risk of certain cancers can be life-saving information. And as costs approach mass affordability, research efforts can approach massive scale.
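The 10,000-fold figure is simple division, but spelling it out shows that it is a lower bound (today’s price is *under* $1,000) and what yearly rate of decline it implies. The sketch below uses only the two prices quoted above; the smooth, exponential decline behind the yearly rate is an illustrative assumption, since the real drop in sequencing costs was uneven.

```python
import math

# Prices quoted in the article: $10 million per genome in 2007 and
# under $1,000 about ten years later. Using $1,000 gives a lower bound.
COST_2007_USD = 10_000_000
COST_NOW_USD = 1_000
YEARS = 10


def genomes_per_budget(budget_usd, cost_per_genome_usd):
    """Number of whole genomes a fixed budget can pay for."""
    return math.floor(budget_usd / cost_per_genome_usd)


# The budget that paid for one genome in 2007, spent at today's price.
genomes_now = genomes_per_budget(COST_2007_USD, COST_NOW_USD)

# Average yearly price drop implied by the two endpoints, assuming
# (hypothetically) a smooth exponential decline between them.
yearly_factor = (COST_2007_USD / COST_NOW_USD) ** (1 / YEARS)

print(genomes_now)                                      # 10000
print(f"about {yearly_factor:.2f}x cheaper each year")  # about 2.51x cheaper each year
```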
A central challenge for researchers (and society) is that current data sets lack both volume and ethnic diversity. In addition, researchers often face restrictive legal terms and partners reluctant to share. Even when organizations share genomic data sets, the agreements are typically between individual institutions for individual data sets. While larger clearinghouses and databases operating today have done great work, we need standardized terms and platforms to accelerate access.
The potential benefits of these new technologies go beyond identifying risk and screening. Advances in machine learning can help accelerate cancer drug development and therapy selection, enable doctors to match patients with clinical trials, and improve their ability to design custom treatment plans for cancer patients (Herceptin, a targeted therapy for HER2-positive breast cancer, was one of the earliest examples of such tailored treatment and remains one of the best).
We believe three things need to happen to make more data available for cancer research and AI programs. First, patients should be able to contribute data easily, including medical records, radiology images and genetic test results. Laboratory companies and medical centers should adopt a common consent form that makes data sharing easy and legal. Second, more funding is needed for researchers working at the intersection of AI, data science and cancer. Just as the Chan Zuckerberg Initiative is funding new tool development for medicine, new AI techniques for medical applications need funding too. Third, new data sets should be generated that represent people of all ethnicities. We need to make sure that advances in cancer research are accessible to all.

How web search data might help diagnose serious illness earlier



By Mike Brunker as written on
Early diagnosis is key to gaining the upper hand against a wide range of diseases. Now Microsoft researchers are suggesting that records of the topics that people search for on the Internet could one day prove as useful as an X-ray or MRI in detecting some illnesses before it’s too late.
The potential of using people’s search activity to predict an eventual diagnosis, and possibly to buy critical time for a medical response, is demonstrated in a new study by Microsoft researchers Eric Horvitz and Ryen White, along with former Microsoft intern and Columbia University doctoral candidate John Paparrizos.
In a paper published Tuesday in the Journal of Oncology Practice, the trio detailed how they used anonymized Bing search logs to identify people whose queries provided strong evidence that they had recently been diagnosed with pancreatic cancer, a particularly deadly and fast-spreading cancer that is frequently caught too late to cure. They then retrospectively analyzed those users’ searches for symptoms of the disease over the preceding months to identify the query patterns most likely to signal an eventual diagnosis.
“We find that signals about patterns of queries in search logs can predict the future appearance of queries that are highly suggestive of a diagnosis of pancreatic adenocarcinoma,” the authors wrote, using the medical term for the most common form of pancreatic cancer. “We show specifically that we can identify 5 to 15 percent of cases while preserving extremely low false positive rates,” as low as 1 in 100,000.
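The article does not say how that operating point was chosen. One common approach, sketched below under the assumption that a model gives each user a single risk score, is to pick the lowest score threshold whose false positive rate stays at or below a target, then report the share of true cases caught. The function names and toy scores are hypothetical. The `ppv` helper shows why such a low false positive rate matters: at a made-up prevalence of 1 in 10,000 (not a figure from the study), even 1 false alarm per 100,000 users can produce as many false flags as true ones.

```python
def tpr_at_max_fpr(scores_pos, scores_neg, max_fpr):
    """Pick the lowest threshold whose false positive rate is <= max_fpr.

    A user is flagged when their score is strictly greater than the threshold.
    Returns (threshold, true_positive_rate, false_positive_rate).
    """
    neg_sorted = sorted(scores_neg, reverse=True)
    allowed_fp = int(max_fpr * len(neg_sorted))  # how many negatives we may flag
    # Placing the threshold at the (allowed_fp + 1)-th highest negative score means
    # at most allowed_fp negatives score strictly above it; any lower threshold
    # would flag more negatives than allowed.
    if allowed_fp < len(neg_sorted):
        threshold = neg_sorted[allowed_fp]
    else:
        threshold = float("-inf")  # budget allows flagging every negative
    tp = sum(s > threshold for s in scores_pos)
    fp = sum(s > threshold for s in scores_neg)
    return threshold, tp / len(scores_pos), fp / len(scores_neg)


def ppv(tpr, fpr, prevalence):
    """Share of flagged users who truly have the disease (positive predictive value)."""
    true_flags = tpr * prevalence
    false_flags = fpr * (1 - prevalence)
    return true_flags / (true_flags + false_flags)


# Toy scores (hypothetical): higher means "more likely to be a future case".
negatives = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
positives = [0.3, 0.85, 0.92, 0.97, 0.99]

print(tpr_at_max_fpr(positives, negatives, max_fpr=0.1))  # (0.9, 0.6, 0.1)

# Hypothetical prevalence of 1 in 10,000: catching 10% of cases at an FPR of
# 1 in 100,000 still means roughly half of all flags are false alarms.
print(f"{ppv(tpr=0.10, fpr=1e-5, prevalence=1e-4):.2f}")  # 0.50
```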
The researchers used large-scale anonymized data and complied with best practices in ethics and privacy for the study.



Eric Horvitz, a technical fellow and managing director of Microsoft’s Redmond, Washington, research lab (Photography by Scott Eklund/Red Box Pictures)

Horvitz, a technical fellow and managing director of Microsoft’s research lab in Redmond, Washington, said the method shows the feasibility of a new form of screening that could ultimately allow patients and their physicians to diagnose pancreatic cancer and begin treatment weeks or months earlier than they otherwise would have. That’s an important advantage against a disease with a very low survival rate when it isn’t caught early.
Pancreatic cancer, the fourth leading cause of cancer death in the United States, was in many ways the ideal subject for the study because it typically produces a series of subtle symptoms that often don’t prompt a patient to seek medical attention, such as itchy skin, weight loss, light-colored stools, patterns of back pain and a slight yellowing of the eyes and skin.
Horvitz, an artificial intelligence expert who holds both a Ph.D. and an MD from Stanford University, said the researchers found that queries entered to seek answers about that set of symptoms can serve as an early warning for the onset of illness.
But Horvitz said that he and White, chief technology officer for Microsoft Health and an information retrieval expert, believe that analysis of search queries could have broad applications.
“We are excited about applying this analytical pipeline to other devastating and hard-to-detect diseases,” Horvitz said.
Horvitz and White emphasize that the research was done as a proof of concept that such a “different kind of sensor network or monitoring system” is possible. The researchers said Microsoft has no plans to develop any products linked to the discovery.
Instead, the authors said, they hope the positive results from the feasibility study will excite the broader medical community and generate discussion about how such a screening methodology might be used. They suggest that it would likely involve analyzing anonymized data and giving people who opt in a way to receive some sort of notification about health risks, either directly or through their doctors, if algorithms detected a pattern of search queries that could signal a health concern.
But White said the search analysis would not be a medical opinion.
“The goal is not to perform the diagnosis,” he said. “The goal is to help those at highest risk to engage with medical professionals who can actually make the true diagnosis.”
White and Horvitz said they wanted to take the results of the pancreatic cancer study directly to those in a position to do something with the results, which is why they chose to first publish in a medical journal.
“I guess I’m at a point now in my career where I’m not interested in the potential for impact,” White said of the decision. “I actually want to have impact. I would like to see the medical community pick this up and take it as a technology, and work with us to enable this type of screening.”
And Horvitz, who said he lost his best childhood friend and, soon after, a close colleague in computer science to pancreatic cancer, said the stakes are too high to delay getting the word out.
“People are being diagnosed too late,” he said. “We believe that these results frame a new approach to pre-screening or screening, but there’s work to do to go from the feasibility study to real-world fielding.”
Horvitz and White have previously teamed up on other search-related medical studies, notably a 2008 analysis of “cyberchondria” (or “medical anxiety that is stimulated by symptom searches on the web,” as Horvitz puts it) and analyses of search logs that identify adverse effects of medications.