Jeanne Kramer-Smyth, @spellboundblog, Describer
Jeanne grew up in New York state near a large reservoir that feeds New York City. “Many of the choices of my life have led me to today.” Her father was a birdwatcher and naturalist, so she says she grew up with a very strong connection to nature. Jeanne did her undergraduate degree in environmental science, and went to work in database development. She has a master’s degree in library science with a focus in archives, and then became an archivist with a focus on digital preservation. “As far as what I care about, I would say that I worry about what we’re doing to our planet. I believe in data. I believe in science and facts.”
Jeanne says she’s excited that Data Refuge is paying attention to all the different parts of the archiving process and that the web archiving tools are easy to use and that she can continue to use them after this event. “Sometimes at hackathons, it feels like there’s a lot of energy in the room and then everyone goes home. Data Refuge is building the workflow and the tools to support it, and that is very exciting to me.”
Johann, web developer, Researcher and Harvester
Johann is a web developer. Like many at #datarescueDC, his connection to the data was personal: his wife is an environmental historian who works extensively with archives in the US and Canada, so Johann knows the importance of public data well. He personally has been concerned to see information disappear from public-facing websites like WhiteHouse.gov in the past few weeks.
His experience at #datarescueDC was marked by friendly collaboration. Johann would like to encourage others to keep contributing to #datarefuge whenever they have time.
Daisy Glotzhober, Bagger and checker
Daisy Glotzhober is here today because she understands that the results of the recent election have significant implications for the government’s open data policy, particularly when it comes to climate change. The new administration confirmed Scott Pruitt to the Environmental Protection Agency (EPA), demonstrating its hostility to climate change, science, and facts overall. These events have inspired her to take on a more activist role in her day-to-day life.
As a data scientist in consulting, Daisy supports government agencies that rely on climate and other federal data for decision management. The data rescue movement is intended to be an insurance policy. “There are concerns that the current administration could compromise or purge data sets, with environmental and climate data being particularly vulnerable. We want to ensure that these data sets are still publicly accessible in that event.” This movement is so important because retaining open access to data is critical to advancing scientific research, which is the backbone of government decision-making and public policy.
Robin, cloud architect, Harvester
Robin is a cloud architect. Though Robin is shocked to realize that some data appears to already be disappearing, he hopes that most of the work saving data to #datarefuge will seem wasted in retrospect.
He’s cautiously hopeful that federal workers will defend the data produced by their agencies. Even so, he believes events like #datarescueDC are a great opportunity to bring like-minded people together, re-establish community, and prepare for the challenges ahead.
He looks forward to contributing his skills to strengthening the technical toolkit used by #datarefuge after #datarescueDC is over.
Victoria Levchenko, seeder-sorter and a guide at the event
Victoria is in her final year at Georgetown. She studies government and has a heavy focus on computer and data science. She transferred to Georgetown because of an intense interest in federal data. At the time, she was struck by the idea that “big data is going to change government.” She says, “There’s this mythology that data in the government is centralized and accessible -- that people are able to use it in innovative and cutting-edge ways. But it was disappointing to discover that wasn’t the case.” In other words, there’s a lot of data out there, but it doesn’t come pre-packed or centralized in an easy-to-use way. Each agency has its own separate data sets. “As someone who likes taking small datasets and working on them, there are so many stories to be told but [we need] people to facilitate that happening.”
When Trump was elected she noticed others talking about how afraid they were about the EPA shutting down and other impacts. “I realized there is so much data that could be lost - which means that our history would be lost.” So when she heard about DataRefuge and DataRescueDC she was eager to join in and contribute as a volunteer and as a guide. “Data Refuge is wonderful because it’s doing something that the government can’t - centralizing this data.”
To get involved, Victoria says, “The first step is to think about your history and your community’s history that can be told by data. If there’s something you’re interested in, think of whether info about that was collected. You can send that data to Data Refuge. Or if you’re concerned about preserving this information, talk to people about it. Make them understand this is an issue. Everything that was created can be taken away, so tell others about the vulnerability of data.”
Mood/atmosphere: Cheerful, engaged, a little bit confused
Moshe works as a developer in New York. When a friend sent him an article about a similar data rescue event in Berkeley, he knew he wanted to get involved. “I’ve been looking to use my skills as a developer to do something useful given the hostility of the current administration. I know some people down here and got in touch via Slack to come down here.”
Why this is important: The data is the raw material for making decisions. Now is a particularly rough time for expertise and thoughtful decision making in the policy world, and it’s important to preserve the underpinnings that allow for better decision making. If there’s any chance that it wouldn’t be maintained, it would be a good thing to keep it.
The biggest challenge is not 100% knowing how this data is used on the other side of this process. I’m not sure how a scientist is going to use it. How do scientists in the future use it?
Data sets: downloaded a summary of toxic chemicals being monitored by the EPA. Each one has a report of kind of like why it’s bad or why it needs to be tracked. It was cool to see all of those chemicals get downloaded and tracked. Health implications. It felt extra relevant.
Basically, data is really annoying to collect, but once you have it, it lets people think about how the world works without having to watch every part. It gives you a record of what’s going on so that you can figure out how it works. We don’t always know how this stuff is used or is going to be used. So there’s a potential for the future - if it describes reality well it can shed light on many more things.
Susan Yount, CPA, civil servant, seeder and sorter
Susan Yount became a crusader for open data in 2009 when she came to DC to work on financial transparency as a civil servant. She fondly remembers the excitement around the Obama administration’s push for transparency and openness through the Open Government Initiative, which required federal agencies to release their data online. “What felt very vibrant to me was the White House’s involvement in data - they seemed very engaged. It was very exciting to think of the government doing something cutting-edge.” In addition to setting data standards for financial information, Susan also supported federal agencies creating repositories for open data.
The availability of this open data concerns her most, she says, because it forms a vital aspect of a functioning democracy and effective governance. “You cannot make rational decisions about governing without data. Otherwise it’s just guesses and opinions.” She’s happy to lend an afternoon a month to Data Refuge because, “someday it could be my data set.”
Dina M. Bagger and checker
“90% of the time spent working with data is cleaning the data, not analyzing it. As a user, now I have to understand what the data are and clean it. There’s so much talk of citizen scientists, but unless it’s your job to crawl through and clean these data sets, I don’t see people using these datasets efficiently. It’s a larger struggle within the data community. Times are different - we have to think differently about how the government should present data to its citizens.”
J. Montgomery, Bagger and checker
“The government may have jumped on the bandwagon of digitization but they have failed the American people. They’ve put out data but they haven’t provided information, which is usable data. As a user, you have to have a high level of sophistication to turn the data into information that is useful to you.”
Justin, Librarian and Software Developer at GWU Library, Harvester
Justin is a librarian and software developer who has been developing the Social Feed Manager and using it to contribute social media data to the End of Term archiving effort, so he was primed for #datarefuge as it began to take shape. During #datarescueDC, he was surprised to find unexpected personal and social connections to the data being archived.
He began his day working on a USGS dataset because several members of his family work in the geological sciences, including his recently passed father-in-law, and because it was clear to him that the dataset he was working with was on an older server that might be prone to failure. Like many librarians, Justin emphasizes that archiving this data is necessary not just because of political threats, but because federal agencies often do not have the resources or funding to protect vulnerable data sets.
“From an ethical perspective, this data does not belong to the agency or the administration. It belongs to the citizenry. It’s taxpayer-paid and funded. Regardless of the decision an administration or agency makes, people have a right to access the data. This DataRescueDC event is going to preserve that right.” In addition to losing access to the data, there is a concern that federal funding to support data collection will end. Preserving this data now ensures that those collecting the data can pick up where they left off in the future.
This person says this effort is important because “without the data, you can wreck a lot of stuff with good intentions. Even the greenest, most environmentally-minded initiative can result in problems that you didn’t intend to cause. For example, switching to electric cars results in more harmful batteries. Data ensures that when we try to do something, we do what we were intending to. Data is the only way that we are effective.” (Editorial note: For example, the data can be used in a lifecycle analysis to determine whether the benefits outweigh the cost.)
The way this data is used goes beyond scientists and research. It answers questions like “Can I move here and drink the water? Do I know my child’s school wasn’t built upon a site where there used to be a chemical factory? If the data goes away we’ll never find it.”
If you’re looking to help in this effort, this person suggested making Freedom of Information Act (FOIA) requests around data that you’re seeking to preserve or that you have trouble accessing. “The government has an obligation to give you the information you seek. Tell the government agencies that you can’t access this information that you have a right to get.”
The data we worked on today is used for critical infrastructure for wastewater treatment, power plants, resilience, emergency planning, water quality, compliance data, and more. Here are some examples of the data volunteers preserved today, and its significance:
- Oil storage tank locations: Incredibly important in an emergency like Katrina -- in that case a tank collapsed and leaked into the surrounding area
- EPA’s Toxic Release Inventory: logs 27 years of data showing what types of releases are happening in communities and what’s being emitted. Because it tracks over time, you can see progress in pollution prevention (P2), or changes towards the negative, to ensure companies are doing the right thing. You can compare within an industry sector: among mining companies, for example, you might see one company that is performing better and can serve as an example for the rest of the industry. It helps identify best practices in reducing pollution.
- NEPAssist: Assembles multiple data sets when screening a big project to assess its environmental impact. Has been replicated in a few other countries.
- Facility Registry Service: connects to energy admin, so you can find out a power plant’s fuel consumption, then look at EPA data to see what its air emissions are. This helps you compare coal versus gas plants or see the toxic release inventory, across multiple agencies.
- EPA EnviroAtlas: spatial data with 100 layers dealing with land use, land cover, permeable or impermeable surfaces, climate change data, and ecosystem and eco region data on habitats in those areas. Current and historical data. People use the EnviroAtlas data to compare the impacts of climate change over time and get a better picture nationally of different eco regions. If we didn’t have this resource, we wouldn’t have that insight into the impacts of climate change and habitats on ecosystems.
Data helps you answer questions, such as “How are companies trying to be more sustainable? How much of their waste is recovered or recycled?” Then we can see how manufacturing has advanced to be more environmentally friendly and efficient. Other companies or industries can then learn from that success and implement the same strategies.
Data also helps you answer questions like, “What is the life cycle of an industrial facility?” Building a plant, operating and dealing with emissions and chemicals on-site, risks for the community. Take the chemical plant explosion in West, Texas. This data helps explain what might otherwise be under people’s radars. It’s incredibly important information for communities to know. Factories have tended to locate themselves in poorer, minority communities, whose residents aren’t as likely to be heard. We need this environmental data, along with demographic data, health data, and more, to understand the implications and effects these decisions have on the communities.