Module 3.5: File Formats for Open Civic Data
This module introduces the concepts of open and machine-readable file formats. We build an understanding of why open and machine-readable formats matter when sharing civic data and how they benefit data users.
What are open file formats?
What are machine-readable file formats?
How do these formats support use?
What are examples of open file formats?
How can we make data available?
The librarian is working with their local open data portal to make the Wi-Fi data available to the public. When the IT department provided the data to the librarian, it was in an Excel spreadsheet format. The manager of the open data portal explains that open file formats are preferred and requests that the librarian make the data available in the portal as a .csv file.
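As a rough sketch of what the portal manager is asking for, the example below writes a few rows to .csv text and reads them back using only Python's standard library. The rows and column names (branch, date, wifi_sessions) are hypothetical and stand in for the librarian's Wi-Fi data; the helper function names are also assumptions made for illustration.

```python
import csv
import io

# Hypothetical Wi-Fi usage rows; the column names and values are
# assumptions for illustration, not data from the scenario.
rows = [
    {"branch": "Main", "date": "2023-01-01", "wifi_sessions": 120},
    {"branch": "East", "date": "2023-01-01", "wifi_sessions": 45},
]


def rows_to_csv_text(rows):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def csv_text_to_rows(text):
    """Parse CSV text back into dicts (every value comes back as a string)."""
    return list(csv.DictReader(io.StringIO(text)))


csv_text = rows_to_csv_text(rows)
print(csv_text)
print(csv_text_to_rows(csv_text))
```

Because a .csv file is plain text with a simple, documented structure, no particular spreadsheet program is needed to produce or read it.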
Overview: Open file formats are file formats with published, openly available documentation. This documentation allows any developer to create software that can read the format, so the format does not depend on a single program, or a limited set of programs, to be read.
In this exercise, we will examine the digital files on one of our own devices – a laptop, phone, tablet, desktop. We will consider the file formats that we create and interact with and how they may support or impede access and reuse.
This activity can be done individually, with time to debrief and share after completion.
Supplies:
Device with files
Pen or pencil
Paper for note taking
Time: 30 minutes
Activity:
1. Access your files
For this exercise, you will need access to a device with files that you have created and downloaded. To narrow your focus, you can choose a folder or subset of the files on your device. You will want to be able to identify the format of each file from its file extension (e.g. .csv, .docx, .mp4).
2. Inventory your files
Inventory the file formats on your device or in the selected folder on your device, noting:
The file extension (set of characters at the end of the filename that support us in recognizing the file format)
Which file formats you recognize and which are unfamiliar to you
The program that you would use to interact with the file
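For those comfortable with a little scripting, the inventory step can also be sketched in Python. This is a minimal sketch, not a required part of the activity: the function names and the sample filenames are hypothetical, and list_files assumes you pass it the folder you selected.

```python
from collections import Counter
from pathlib import Path


def count_extensions(filenames):
    """Count filenames by lowercase file extension."""
    counts = Counter()
    for name in filenames:
        # Path.suffix is the final extension (e.g. ".csv"); it is empty
        # when a filename has no extension.
        ext = Path(name).suffix.lower() or "(no extension)"
        counts[ext] += 1
    return counts


def list_files(folder):
    """Return the names of all files under `folder`, including subfolders."""
    return [p.name for p in Path(folder).rglob("*") if p.is_file()]


# Hypothetical filenames; replace with list_files("path/to/your/folder").
sample = ["wifi_usage.csv", "wifi_usage.xlsx", "Report.DOCX", "notes", "clip.mp4"]
for ext, count in sorted(count_extensions(sample).items()):
    print(ext, count)
```

The script only tallies extensions. Deciding which formats you recognize, and which program you would open them with, remains a pen-and-paper step.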
3. Classify your files
Compare your file format inventory to the list of open file formats on Wikipedia.
This comparison will help you identify which of the file formats you interact with are “open” and which we would characterize as proprietary or restricted in nature.
Mark each format as open, proprietary, or uncertain with a symbol (e.g. a star for an open format).
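The classification step can be sketched the same way. The two lookup sets below are small, illustrative assumptions and are deliberately incomplete; they are not a substitute for checking the Wikipedia list yourself. Any extension found in neither set is marked uncertain so that you know to look it up.

```python
# Illustrative, deliberately incomplete lookup sets. These are assumptions
# for the sketch, not an authoritative list of open or proprietary formats.
OPEN_FORMATS = {".csv", ".json", ".xml", ".txt", ".png", ".odt", ".ods"}
PROPRIETARY_FORMATS = {".xls", ".doc", ".psd"}

# Symbols for the inventory sheet (the star follows the activity's suggestion;
# the other two symbols are arbitrary choices).
SYMBOLS = {"open": "*", "proprietary": "x", "uncertain": "?"}


def classify_extension(ext):
    """Return 'open', 'proprietary', or 'uncertain' for a file extension."""
    ext = ext.lower()
    if not ext.startswith("."):
        ext = "." + ext
    if ext in OPEN_FORMATS:
        return "open"
    if ext in PROPRIETARY_FORMATS:
        return "proprietary"
    # Not in either set: look this one up before deciding.
    return "uncertain"


for ext in [".csv", ".xls", ".mp4"]:
    label = classify_extension(ext)
    print(SYMBOLS[label], ext, label)
```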
4. Reflect on your inventory
Consider: If you were to provide a hard drive with your files to a friend or colleague, would they have any difficulty accessing the file formats because of software dependencies?
5. Reconvene (if working in a group setting)
If completing this activity in a group setting, reconvene together. Discuss what you observed about the nature of the file formats you interact with.
Overview: In this exercise, we will examine the file formats found through data portals. This activity will build our understanding of the standards used for sharing open civic data and the expectations for file formats that data portal managers communicate to data publishers. This activity can be done individually or in pairs.
Supplies:
Computer or Internet-connected device
Pen or pencil
Paper for note taking
Time: 20-30 minutes
Activity:
1. Select a data portal
Individually or in pairs, locate an open data portal from your local city, region, state, or another location of interest to you. DataPortals.org, a searchable catalog of open data portals, can be a helpful starting point for locating a data portal for this exercise. You can return to a data portal that you’ve explored for another module activity.
2. Examine three open civic datasets
Browse the data portal and locate three datasets and the accompanying data files. Inventory the data formats associated with the data files, using the file extensions as an indicator.
Are the formats open or proprietary file formats? If you’re uncertain, use the Wikipedia list of open file formats and the format tables available on the FOSS Open Standards/Comparison of File Formats resource.
Are the same data files available in more than one format (e.g. as both a .csv file and an Excel (.xls or .xlsx) file)?
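To check whether a dataset is offered in more than one format, the downloaded filenames can be grouped by name. The sketch below assumes that files belonging to the same dataset share a filename and differ only in extension, which will not hold on every portal; the filenames shown are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def formats_by_dataset(filenames):
    """Map each filename stem to the sorted list of extensions it appears with."""
    groups = defaultdict(set)
    for name in filenames:
        path = Path(name)
        groups[path.stem].add(path.suffix.lower())
    return {stem: sorted(exts) for stem, exts in groups.items()}


# Hypothetical filenames standing in for files downloaded from a portal.
downloads = ["wifi_sessions.csv", "wifi_sessions.xlsx", "branch_locations.json"]
for dataset, exts in formats_by_dataset(downloads).items():
    note = "multiple formats" if len(exts) > 1 else "single format"
    print(f"{dataset}: {', '.join(exts)} ({note})")
```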
3. Explore the data portal guidance
Now, look to see if the data portal has guidance for data publishers on preparing data for sharing. On some portals, this guidance may be under an “About” tab. Consider:
Is there guidance about preferred file formats for data sharing? If so, what are the preferred formats?
Are these recommendations reflected in the data files you reviewed in the portal?
4. Debrief (if working in a group setting)
If completing this activity in a group environment, reconvene to share observations about file formats used to share open civic data in portals. What file formats did your group observe? Did the data portals have guidance for data publishers regarding file formats? If so, what were the recommended file formats?
European Data Portal Module: Choosing the Right File Format for Open Data
The European Data Portal provides access to data shared in EU-based open data portals. To support open data initiatives, the European Data Portal created a series of modules on open data, including a module on choosing file formats. The module creators emphasize the value of the .csv file format for tabular data.