On October 31, 2013, the city of Austin, Texas, faced a destructive flood. At the time, I was visiting David Maidment, a chaired professor of civil engineering at the Center for Research in Water Resources at the University of Texas at Austin. The day before the flood, we had been discussing research and analytics around the long-standing drought conditions across western Texas. Overnight, a flash flood wreaked havoc on the Austin area, largely because a stream gauge on Onion Creek failed, which left local emergency response officials without accurate information about the situation.
On the morning of October 30, the stream gauge monitoring Onion Creek was operational and reporting that the stream was rising to dangerous levels. First responders were monitoring the gauge so that they would be prepared to send out support crews. However, around 5:00 a.m., the stream level reported by the gauge dropped to zero. Such a drop is not uncommon in the southern United States, where washes and streams can quickly fall back to normal levels once the initial precipitation pattern passes. With the disaster appearing to have been averted, emergency responders turned their attention elsewhere. In actuality, the gauge had failed, the stream overran its banks, more than 500 homes flooded, and five people died.
Since the Onion Creek event, Texas and nearby Oklahoma have experienced floods every year, often several times a year, and some of them have been deadlier than the 2013 event. In May 2015 a flood in this region claimed 48 lives, including two first responders, Deputy Jessica Hollis of the Austin Police Department and Captain Jason Farley of the Claremore, Oklahoma, Fire Department.
Researchers from the University of Texas at Austin (UT Austin) are collaborating with other researchers, federal agencies, commercial partners, and first responders to create the National Flood Interoperability Experiment (NFIE). The goals of the NFIE include standardizing data, demonstrating a scalable solution, and helping to close the gap between national flood forecasting and local emergency response. The objective is to create a system that interoperates across different publicly available data sources to model floods based on predictions.
Systems for each of the 13 water regions in the United States were developed, two of them at Microsoft Research by my visiting researcher, Marcello Somos (New England region), and intern Fernando Salas (Gulf region), both from UT Austin. After Marcello and Fernando returned to Austin, they collaborated with other institutions to create a national flood map. This interoperated data product was used by NOAA to run a summer institute at the National Water Center in Tuscaloosa, Alabama, with 38 top hydrology and meteorology graduate students from around the world.
My colleague Prashant Dhingra and I presented Microsoft Azure and the recently announced big data advanced analytics and intelligence platform, Cortana Intelligence Suite, to the students at the annual National Water Center Summer Institute. Several enterprising attendees created interesting analytics projects. Tim Petty, a PhD candidate at the University of Alaska, Fairbanks, wanted to address “the Onion Creek Problem”: what can we do to estimate flood levels when stream gauges fail? And so project SHEM began.
Streamflow hydrology estimate using machine learning (SHEM) is a Cortana Intelligence Suite experiment that creates a predictive model that can act as a proxy for streamflow data when a stream gauge fails. Thanks to its machine learning capabilities, it can even estimate stream levels where no stream gauge is present.
SHEM differs from most existing models in that it does not rely on distances between stream gauges and their location attributes. Instead, it is based solely on machine learning, which it uses to learn from historical patterns of discharge and to interpret large volumes of complex hydrology data. This “training” prepares SHEM to predict streamflow information for a given location and time as it is impacted by multivariate attributes (for example, type of stream, type of reservoir, amount of precipitation, and surface and subsurface flow conditions).
Using Cortana Intelligence Suite (CIS), our joint research team was able to ingest, clean, refine, and format the historical US Geological Survey stream gauge data. We leveraged the Boosted Decision Tree Regression module, which is one of many built-in machine learning algorithms. We also used built-in modules for data cleaning and transformation, as well as modules for model scoring and evaluation. Wherever custom functionality is needed, you can add R or Python modules directly to the workflow. This is the advantage of Azure Machine Learning: you can test multiple built-in or hand-coded algorithms and workflows, rerunning and testing them with reproducible results, in order to build an optimal solution.
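To make the idea of boosted decision tree regression more concrete, here is a minimal sketch in plain Python. It is not the SHEM experiment or the Azure Machine Learning module itself: it is a hand-written gradient boosting loop over one-split trees (stumps), trained on synthetic data. The two features (a neighboring gauge's discharge and recent precipitation), the formula that generates the target, and every function name are assumptions made purely for illustration.

```python
import random


def fit_stump(X, residuals):
    """Find the one-feature split that minimizes squared error on the residuals."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= t]
            right = [r for row, r in zip(X, residuals) if row[j] > t]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lm, rm)
    return best[1:]  # (feature index, threshold, left value, right value)


def predict_stump(stump, row):
    j, t, left_value, right_value = stump
    return left_value if row[j] <= t else right_value


def fit_boosted(X, y, n_rounds=40, learning_rate=0.3):
    """Gradient boosting for squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row)
                 for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


def make_synthetic(n, seed):
    """Synthetic stand-in for gauge records; NOT real USGS data."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        upstream = rng.uniform(0, 100)  # hypothetical neighboring-gauge discharge
        precip = rng.uniform(0, 10)     # hypothetical recent precipitation
        X.append([upstream, precip])
        y.append(0.8 * upstream + 5.0 * precip + rng.gauss(0, 2))
    return X, y


def rmse(predictions, targets):
    return (sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)) ** 0.5


if __name__ == "__main__":
    X_train, y_train = make_synthetic(120, seed=1)
    X_test, y_test = make_synthetic(60, seed=2)
    model = fit_boosted(X_train, y_train)
    boosted_error = rmse([predict(model, row) for row in X_test], y_test)
    mean_only_error = rmse([sum(y_train) / len(y_train)] * len(y_test), y_test)
    print("boosted RMSE:", round(boosted_error, 2))
    print("mean-only RMSE:", round(mean_only_error, 2))
```

Comparing the boosted model against a mean-only baseline on held-out rows mirrors, in miniature, the train, score, and evaluate steps described above. A real workflow would replace the synthetic generator with cleaned historical gauge records and a production-grade boosted tree implementation.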
As with NFIE a year ago, SHEM is in the early stages of development, and work to expand it to more states is ongoing. But the results bode well. All indications are that Cortana Intelligence Suite can use NFIE data and analysis products to provide a reasonable estimate when a gauge is not present. Another byproduct of this experiment is that we can evaluate where there is the greatest variance in accuracy, which can, in turn, give us a good idea where it might be best to install new stream gauges.
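One way to picture the "variance in accuracy" idea is to group estimation errors by location and rank locations by how much those errors vary. The sketch below does that with the standard library. The site identifiers, the numbers, and the choice of population variance as the ranking metric are all assumptions for illustration, not the method used in the SHEM experiment.

```python
from statistics import pvariance


def rank_sites_by_error_variance(records):
    """Rank sites from most to least variable estimation error.

    records: iterable of (site_id, observed_flow, estimated_flow) tuples.
    Returns a list of (site_id, error_variance), highest variance first.
    """
    errors = {}
    for site, observed, estimated in records:
        errors.setdefault(site, []).append(estimated - observed)
    ranked = sorted(((pvariance(errs), site) for site, errs in errors.items()),
                    reverse=True)
    return [(site, variance) for variance, site in ranked]


# Hypothetical observed vs. estimated flows at three made-up sites.
example_records = [
    ("site_A", 10.0, 11.0), ("site_A", 12.0, 11.0),
    ("site_B", 20.0, 25.0), ("site_B", 30.0, 25.0),
    ("site_C", 5.0, 5.0), ("site_C", 6.0, 6.0),
]

if __name__ == "__main__":
    for site, variance in rank_sites_by_error_variance(example_records):
        print(site, variance)
```

In this toy data, the site whose estimates swing the most around the observed values lands at the top of the list, which is the kind of signal that could flag a candidate location for a new gauge.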
And that should help all of us sleep a lot better—even in Austin.