Developing Expertise in Data Science
Imagine the vast quantity of data an online retailer such as Amazon collects from shoppers in a day, or the amount of data the New York City Police Department collects on “stop and frisks” in a year.
Analyzing and interpreting such huge, quickly changing data sets is the province of the interdisciplinary field of data science.
Data science is “an intersection of mathematics, computer science, and statistics that has developed this new field, where we’re working with very different data, with very different techniques than were common 20 years ago,” says Shonda Kuiper, professor of mathematics and statistics and an expert in statistics pedagogy.
“Students need those skills to compete in today’s world,” says Jeff Jonkman, associate professor of mathematics and statistics. “Students are very interested in it. A lot of high school students who come to visit want to talk about data science.”
Kuiper, Jonkman, Samuel Rebelsky, professor of computer science, and other faculty have been working on incorporating data science into the curriculum in various ways. Two new data science courses have been developed: an introductory course and a capstone course. In his CS 151 course, Rebelsky is relating the study of functional problem solving to the practice of data science.
To support these efforts, the Roy J. Carver Charitable Trust awarded Grinnell a $200,000 grant in 2016. Since it began its grant-making activities in 1987, the Carver Trust has distributed more than $258 million in the form of nearly 2,000 individual grants. The Carver Trust focuses its charitable giving on biomedical and scientific research; primary, secondary and higher education; and youth-related needs.
“The grant has provided us some time and space to really think about how to move forward in this new discipline in an efficient way,” Kuiper says. She took 10 online courses and received a certification in data science during spring 2017.
Of those 10 courses, three were review, “but there were a lot of things that were very new to me as well,” she says, such as natural language processing.
“Analyzing text is something that wasn’t commonly done by statisticians before, but now it’s something that we should know how to do,” Kuiper says. “How do you take blogs, how do you take movie scripts, and find patterns?”
During the summer of 2017, Kuiper and Jonkman developed the curriculum for a new 200-level course, Introduction to Data Science. They sketched out what topics they needed to cover and in what depth. They worked “with students collaboratively to develop and design tutorials to train students in these new areas,” Kuiper says. They had 10 tutorials set up before classes started.
They also looked at what people were doing at other schools “where data science is trying to grow into whatever it will be when it grows up,” Jonkman says.
“I think one thing we agreed on,” Kuiper adds, “was that we really wanted this to be an applied class, where students have a final project that they could present to possible employers when they leave the class.”
In addition to developing the curriculum together, Kuiper and Jonkman team-taught the course in the fall. “The team-teaching is incredibly beneficial,” Jonkman says, “because Shonda knows more about it than I do, and I can learn more as I go and hopefully be a lot more ready in the spring.” They’ll each teach the course again, separately, in spring 2018.
Kuiper adds, “There is no way I think either one of us could have done this without mutual support.”
One of the biggest challenges with this course is that everything is very new, she says. “The tutorials we built this summer are now outdated, because if we’re pulling live data, it’s changing constantly. And we’re using free software, which I think is also very beneficial for the students to use, but that creates a lot of messiness in the classroom and with the data itself.”
In the course, students examined many different types of data, including college data sets; millions of New York City Police Department reports; housing prices in Ames, Iowa; movie ratings pulled from the web; and a database of global terrorism incidents.
“Our goal is to be as interdisciplinary as possible,” Kuiper says. “I think every discipline is now using data in new ways.”