Seeding

What Do Seeders Do?

Seeders canvass the resources of a given government agency, identifying important URLs and determining whether those URLs can be crawled by the Internet Archive's web crawler. Using the EDGI Nomination Chrome extension, Seeders nominate crawlable URLs to the Internet Archive, or add them to the Archivers app if they require manual archiving.

Recommended Skills

Consider this path if you’re comfortable browsing the web and have great attention to detail. An understanding of how web pages are structured will help you with this task.

Choosing the Website

Seeders use the EDGI Archiving Primers, or a similar set of resources, to identify important and at-risk data. Talk to the DataRescue organizers to learn more.

Canvassing the Website and Evaluating Content

Start exploring the assigned website and identify important URLs.
Decide whether the data on a page or website subsection can be automatically captured by the Internet Archive web crawler.
EDGI's Guides have information critical to the seeding and sorting process:
- Understanding the Internet Archive Web Crawler
- Seeding the Internet Archive’s Web Crawler

Crawlable URLs

URLs judged to be crawlable are nominated ("seeded") to the Internet Archive, using the EDGI Nomination Chrome extension.

To learn more about nominating URLs, refer to this Google Doc, watch this training video on Agency Primers and EOT, or talk to the DataRescue organizers.

Wherever possible, add the Agency Office Code from the sub-primer database.

Uncrawlable URLs

If a URL is judged not crawlable, check the checkbox next to whichever of the four types of uncrawlables applies in the Chrome extension. This will add the URL to the Researching queue in the Archivers app.
The URL will be automatically associated with a universally unique identifier (UUID).
You can check whether the page or some of its files are already archived using the Internet Archive's Wayback Machine Chrome extension.
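
For Seeders who prefer a script to the browser extension, the same "is it already archived?" check can be sketched against the Internet Archive's public Wayback Availability endpoint. This is a minimal illustration, not part of the EDGI or Archivers tooling: the function names below are hypothetical, and the response handling assumes the JSON shape the endpoint is documented to return (an `archived_snapshots` object with an optional `closest` entry).

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Public Wayback Availability endpoint (assumed reachable from your network).
WAYBACK_API = "https://archive.org/wayback/available"


def availability_query_url(page_url: str) -> str:
    """Build the availability query for one page URL (hypothetical helper)."""
    return f"{WAYBACK_API}?{urlencode({'url': page_url})}"


def closest_snapshot(payload: dict) -> Optional[str]:
    """Return the closest snapshot URL from a response, or None if there is none."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def check_wayback(page_url: str, timeout: float = 10.0) -> Optional[str]:
    """Query the endpoint over the network and return a snapshot URL, if any."""
    with urlopen(availability_query_url(page_url), timeout=timeout) as resp:
        return closest_snapshot(json.load(resp))
```

Calling `check_wayback("https://example.gov/data")` returns a snapshot URL when one exists and `None` otherwise. A `None` result only means no snapshot of that exact page was found; it says nothing about whether linked files or dynamically loaded data have been captured, so the manual judgment described above still applies.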

Not Sure?

This sorting is only provisional; when in doubt, Seeders nominate the URL and mark it as possibly not crawlable.