Below we've outlined the critical technical considerations for planning a DataRescue event.
Key Steps
- Read the DataRescue Paths available as part of DataRefuge's Overview or EDGI's DataRescue Event Toolkit.
- Join the DataRefuge Slack team and start a channel for your event.
- Review the workflow documentation and decide which paths your event will have.
- Schedule a call with DataRefuge to:
  - review the workflow and confirm event logistics like volunteer support
  - receive access to the Archivers app to archive complex datasets
- Schedule a call with EDGI to:
  - receive training on using Agency Primers and EDGI's Chrome Extension to identify and preserve web pages on federal government web sites
  - receive an orientation on event harvesting tools
Event Preservation Tools
Archivers App
A DataRefuge organizer will set up your event in the app and coordinate initial account creation. The Archivers app lets us track all preservation work done at DataRescue events and coordinate that work across different roles.
The app includes URLs coming from two main sources:
- URLs nominated by Seeders at previous DataRescue events
- URLs identified by a Union of Concerned Scientists survey, which asked the scientific community to list the most vulnerable and important data currently accessible through federal websites
Agency Primers and Chrome Extension for Seeding
An EDGI coordinator will set up access to Agency Primer and Sub-primer documents as well as a seed progress spreadsheet. These documents guide the Seeders at your event by telling them which websites or website sections to focus on for URL discovery.
The workflow triages each URL by whether the Internet Archive web crawler can capture it automatically. Crawlable URLs are stored by the Internet Archive; URLs that must be harvested manually are stored in the DataRefuge repository.
- Nominating crawlable URLs makes use of Internet Archive's existing infrastructure. See Seeding for more information on this process.
- Manually harvested datasets are uploaded through the Archivers app to Amazon S3 storage managed by DataRefuge.
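
The triage described above can be sketched as a small routing function. This is an illustration only, not part of the Archivers app: the `Destination` enum, the `Candidate` class, the `triage` function, and the example URLs are all hypothetical, and the `crawlable` flag is assumed to be a judgment made by a volunteer rather than something computed.

```python
from dataclasses import dataclass
from enum import Enum


class Destination(Enum):
    # Hypothetical labels for the two storage targets named in the text.
    INTERNET_ARCHIVE = "nominate for the Internet Archive web crawler"
    DATAREFUGE = "harvest manually and upload through the Archivers app"


@dataclass
class Candidate:
    url: str
    crawlable: bool  # Assumption: set by a volunteer after inspecting the page.


def triage(candidate: Candidate) -> Destination:
    """Route a URL according to whether a web crawler can capture it."""
    if candidate.crawlable:
        return Destination.INTERNET_ARCHIVE
    return Destination.DATAREFUGE


if __name__ == "__main__":
    examples = [
        Candidate("https://agency.example/reports/index.html", crawlable=True),
        Candidate("https://agency.example/data/query-tool", crawlable=False),
    ]
    for c in examples:
        print(c.url, "->", triage(c).value)
```

The point of the sketch is that every URL ends up in exactly one of the two destinations, and the only input to the decision is whether it can be crawled.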
Permissions and Credentials
- All Path II Attendees need to have an account on the Archivers app.
- You will need to generate an invite for each attendee within the app, then send them the generated URL in a Slack direct message or by email.
- Each participant invited will automatically "belong" to your event in the app.
- In addition, Checkers and Baggers need to be given additional privileges in the app to access the Checking (i.e. "Finalize") and Bagging sections.
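
Before the event, it can help to list which attendees will need the additional privileges. The snippet below is a minimal organizer-side planning sketch, not the Archivers app's API; the `EXTRA_PRIVILEGES` table, the `privileges_to_grant` helper, and the attendee names are assumptions made for illustration.

```python
# Hypothetical planning aid: the Archivers app itself manages real privileges.
EXTRA_PRIVILEGES = {
    "checker": "Checking (Finalize)",
    "bagger": "Bagging",
}


def privileges_to_grant(attendees):
    """Map each attendee to the extra app sections they need unlocked.

    `attendees` maps a name to a list of role names. Attendees whose roles
    need no extra privileges are left out of the result.
    """
    todo = {}
    for name, roles in attendees.items():
        extras = sorted(
            {EXTRA_PRIVILEGES[r.lower()] for r in roles if r.lower() in EXTRA_PRIVILEGES}
        )
        if extras:
            todo[name] = extras
    return todo


if __name__ == "__main__":
    roster = {"Ada": ["Seeder"], "Grace": ["Checker", "Bagger"]}
    print(privileges_to_grant(roster))
```

Every attendee on the roster still needs an invite; the helper only reports who needs privileges beyond a basic account.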
Technical Resources
- Access to Wi-Fi
- Extra Power Strips and Extension Cords
- Backup storage (e.g., thumb drives larger than 16 GB)
- Backup cloud compute resources