What Do Describers Do?
Describers create a descriptive record in the DataRefuge CKAN repository for each bag. Then they link the record to the bag and make the record public.
Recommended Skills
Consider this path if you have experience working with scientific data (particularly climate or environmental data) or with metadata practices.
Getting Set Up as a Describer
- Apply to become a Describer by asking your DataRescue guide or by filling out this form.
- Note that an email address is required to apply.
- Note also that you should be willing to have your real name be associated with the datasets, to follow archival best practices (see guidelines on archival best practices for DataRefuge for more information).
- The organizers of the event (in-person or remote) will send you an invite to the Archivers app, which helps us coordinate all the data archiving work we do.
- Click the invite link, and choose a user name and a password.
- Create an account on the DataRefuge Slack using this slack-in (or use the Slack team recommended by your event organizers). This is where people share expertise and answer each other's questions.
- Ask your event organizer to send you an invite.
- The organizers will also create an account for you in the datarefuge.org CKAN instance.
- Test that you can log in successfully.
- Get set up with Python and the bagit-python script to make a bag at the command line.
- If you need any assistance:
- Talk to your DataRescue guide if you are at an in-person event.
- Or post questions in the DataRefuge Slack #describers channel (or other channel recommended by your event organizers).
Claiming a Bag
- You will work on datasets that were bagged by Baggers.
- Go to the Archivers app, click "URLS" and then "DESCRIBE": all the URLs listed are ready to be added to the CKAN instance.
- Available URLs are ones that have not been checked out by someone else, i.e., that do not have someone's name in the User column.
- Select an available URL and click its UUID to get to the detailed view, then click "Checkout this URL". It is now ready for you to work on, and no one else can do anything to it while you have it checked out.
Note: URL vs UUID
The URL is the link to examine and harvest, and the UUID is a canonical ID we use to connect the URL with the data in question. The UUID will have been generated earlier in the process. UUID stands for Universally Unique Identifier.
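To make the distinction concrete, here is a minimal Python sketch that tells a UUID apart from a URL using only the standard library. The helper name `is_uuid` is made up for this illustration, and the sample values are the ones that appear elsewhere in this guide. Note that `uuid.UUID` is lenient about formatting (it also accepts the 32 hex digits without hyphens).

```python
import uuid


def is_uuid(text):
    """Return True if text parses as a UUID (letter case does not matter)."""
    try:
        uuid.UUID(text)
    except ValueError:
        return False
    return True


# The UUID portion of the example bag filename used later in this guide.
print(is_uuid("77DD634E-EBCE-412E-88B5-A02B0EF12AF6"))
# A source URL is not a UUID.
print(is_uuid("http://www.eia.gov/electricity/data/eia411/"))
```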
QA Step
- In the Archivers app, scroll down to the "Describe" section.
- The URL of the zipped bag is in the "Bag Url / Location" field.
- Copy and paste that URL into your browser and download it.
- After downloading, unzip it.
- Spot-check some of the files (make sure they open and look normal, i.e., not garbled).
- If the file fails QA:
- Uncheck the Bagging checkbox.
- Make a note in the "Notes From Bagging" field, explaining how the bag failed QA and asking a Bagger to fix the issue.
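The manual spot-check can be complemented by an automated checksum comparison. The sketch below is one possible approach, not part of the official workflow: it unzips the bag, locates the bag's top level by looking for `bagit.txt`, and recomputes the checksum of every payload file listed in any `manifest-<algorithm>.txt` file. It assumes the bag follows the usual BagIt layout and that the manifest algorithm is one Python's `hashlib` supports; the function names are invented for this example.

```python
import hashlib
import zipfile
from pathlib import Path


def unzip_bag(zip_path, dest_dir):
    """Extract the zipped bag and return the directory that holds bagit.txt."""
    dest_dir = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
    declarations = list(dest_dir.rglob("bagit.txt"))
    if not declarations:
        raise FileNotFoundError("no bagit.txt found; this does not look like a bag")
    # Prefer the shallowest bagit.txt in case the payload contains another one.
    return min(declarations, key=lambda p: len(p.parts)).parent


def file_digest(path, algorithm):
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def failed_payload_files(bag_root):
    """Return manifest entries that are missing or whose checksum differs."""
    bag_root = Path(bag_root)
    failures = []
    for manifest in sorted(bag_root.glob("manifest-*.txt")):
        # manifest-sha256.txt -> "sha256", manifest-md5.txt -> "md5"
        algorithm = manifest.stem.split("-", 1)[1]
        for line in manifest.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue
            expected, rel_path = line.split(maxsplit=1)
            target = bag_root / rel_path
            if not target.is_file():
                failures.append(rel_path)
            elif file_digest(target, algorithm) != expected.lower():
                failures.append(rel_path)
    return failures
```

A typical use would be `failed_payload_files(unzip_bag("downloaded_bag.zip", "unzipped"))`, where both paths are placeholders. An empty list means every listed file matched its checksum. It does not replace opening a few files to confirm they look normal.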
Create New Record in CKAN
- Go to CKAN and click Organizations in the top menu.
- Choose the organization (i.e., federal agency) that your dataset belongs to, e.g., NOAA, and click it.
- If the Organization you need does not exist yet, create it by clicking "Add Organization".
- Click "Add Dataset".
- Start entering metadata in the new record, following the metadata template below:
- Title: Title of dataset, e.g., "Form EIA-411 Data".
- Custom Text: DO NOT FILL OUT (this field does not function properly at this time).
- Description: Usually the description copied and pasted from the dataset's webpage.
- Tags: Basic descriptive keywords, e.g., "electric reliability", "electricity", "power systems".
- License: Choose value in dropdown. If there is no indicated license, select "Other - Public Domain".
- Organization: Choose value in dropdown, e.g., "United States Department of Energy".
- Visibility: Select "Public".
- Source: URL where the site is live (also found in the JSON file), e.g., "http://www.eia.gov/electricity/data/eia411/".
- To decide what value to enter in each field:
- Open the JSON file that is in the bag you have downloaded; it contains some of the metadata you need.
- Go to the original location of the item on the federal agency website (found in the JSON file), to find more facts about the item such as description, title of the dataset, etc.
- Alternatively, you can also open the HTML file that should be included in the bag and is a copy of that original main page.
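As a rough illustration of how the template fields line up with the bag's JSON file, the sketch below drafts a record as a Python dictionary. Several things here are assumptions: the JSON key names (`title`, `description`, `url`) are hypothetical and should be replaced with whatever keys your bag's JSON file actually uses; the output keys follow the field names CKAN's API generally uses for datasets (`notes` for Description, `url` for Source, `private` for Visibility); and `other-pd` is assumed to be the identifier behind "Other - Public Domain" on this instance. The draft is only a checklist for filling in the web form, not a replacement for it.

```python
import json

# HYPOTHETICAL key names; check the JSON file in your bag for the real ones.
TITLE_KEY = "title"
DESCRIPTION_KEY = "description"
SOURCE_URL_KEY = "url"


def build_ckan_record(bag_metadata, organization_id, tags, license_id="other-pd"):
    """Draft the metadata template as a dict; blanks must be filled in by hand."""
    return {
        "title": bag_metadata.get(TITLE_KEY, ""),
        "notes": bag_metadata.get(DESCRIPTION_KEY, ""),  # Description
        "tags": [{"name": tag} for tag in tags],         # Tags
        "license_id": license_id,                        # License (assumed id)
        "owner_org": organization_id,                    # Organization
        "private": False,                                # Visibility: Public
        "url": bag_metadata.get(SOURCE_URL_KEY, ""),     # Source
    }


def load_bag_metadata(json_path):
    """Read the JSON file that ships inside the downloaded bag."""
    with open(json_path, encoding="utf-8") as handle:
        return json.load(handle)
```

Any field that comes back empty is a cue to visit the original agency page (or the HTML copy in the bag) and fill in the value manually.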
Enhancing Existing Metadata
These sites have federally-sourced metadata that can be added to the CKAN record for more accurate metadata:
- EPA:
These sites are sources of scientific metadata standards to review when choosing keywords:
- GCMD Keywords, downloadable CSV files of the GCMD taxonomies:
- ATRAC, a free tool for accessing geographic metadata standards including auto-populating thesauri (GCMD and others commonly used with climate data):
Linking the CKAN Record to the Bag
- Click "Next: Add Data" at the bottom of the CKAN form.
- Enter the following information:
- Link: Bag URL, e.g., https://drp-upload-bagger.s3.amazonaws.com/remote/77DD634E-EBCE-412E-88B5-A02B0EF12AF6_2.zip
- Name: filename, e.g., 77DD634E-EBCE-412E-88B5-A02B0EF12AF6_2.zip
- Format: select "Zip".
- Click "Finish".
- Test that the link you just created works by clicking it, and verifying that the file begins to download.
- Note that you don't need to finish downloading it again.
- Alternatively, use WGET to test without downloading:
wget --spider [BAG URL]
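If wget is not available, the same two steps (deriving the Name value from the bag URL and checking the link without downloading the file) can be approximated in Python with the standard library. This is a sketch under assumptions: the function names are invented here, and the check sends an HTTP HEAD request, which is similar in spirit to `wget --spider` but not identical in behavior, and which some servers may refuse even when a normal download would work.

```python
import posixpath
import urllib.request
from urllib.parse import urlsplit


def bag_filename(bag_url):
    """Return the last path segment of the bag URL, for the Name field."""
    return posixpath.basename(urlsplit(bag_url).path)


def link_is_reachable(url, timeout=30):
    """Send a HEAD request and report whether the server answers with 2xx."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # Covers HTTP errors (404, 403, ...), refused connections, and timeouts.
        return False
```

For the example above, `bag_filename` applied to the Link value yields the Name value shown in the list. A `False` result from `link_is_reachable` is worth double-checking in a browser before reporting a problem.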
Adding the CKAN Record to the "Data Rescue Events" Group
- Once the record is created, click the "Groups" tab.
- Select "Data Rescue Events" in the dropdown and click "Add to Group".
- In the future, it will be useful to be able to differentiate among groups of records based on how they were generated.
Finishing Up
- In the Archivers app, enter the URL of the CKAN record in the "CKAN URL" field.
- The syntax will be: https://www.datarefuge.org/dataset/[datasetNameGeneratedByCkan]
- Add any useful notes to document your work.
- Check the Describe checkbox (far right on the same line as the "Describe" section heading) to mark that step as completed.
- Click "Save".
- Click "Checkin this URL" to release it.