iTEM Open Transport Data

iTEM is working to create a common, public, “best available” database of historical transport statistics for baseline calibration of models, through a transparent, scientific process.

Project overview

Problem & consequences

Historical data for key transport quantities differ across global transport energy models. These differences arise from inconsistency in:

  • the precise definitions of the concepts that are measured,
  • sources of ‘raw’, derived, and pre-processed data, and
  • processes (methods and assumptions) used to clean and harmonize data.

Differences in base-year and historical data, in turn, complicate the interpretation of models' future projections: a gap between two projections could reflect genuine uncertainty about the state or fundamental trend of the global transport system, differences in modeling methods, or simply different sources of historical data. The costly process of disentangling these factors stands in the way of generating useful, model-based knowledge about future challenges and policy options for transport system transitions.
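As a rough illustration of the last of these factors, the sketch below projects the same quantity forward from two different base-year values using an identical method and growth rate. Every number, name, and unit here is invented for the example and is not taken from the iTEM database or any participating model.

```python
# Hypothetical illustration: all values below are invented, not iTEM data.

def project(base_value: float, annual_growth: float, years: int) -> float:
    """Compound a base-year value forward at a constant annual growth rate."""
    return base_value * (1.0 + annual_growth) ** years


# Two models share the same method and the same growth assumption,
# but are calibrated to different historical sources for the base year.
GROWTH = 0.02   # assumed 2 % per year
HORIZON = 30    # assumed number of years projected

base_a = 1000.0  # base-year activity from invented source A
base_b = 1100.0  # same quantity from invented source B

proj_a = project(base_a, GROWTH, HORIZON)
proj_b = project(base_b, GROWTH, HORIZON)

# The absolute gap widens over the horizon even though the models are
# otherwise identical; the relative gap stays equal to the base-year gap.
gap_base = base_b - base_a
gap_end = proj_b - proj_a
print(f"base-year gap: {gap_base:.1f}; gap after {HORIZON} years: {gap_end:.1f}")
print(f"relative gap at end of horizon: {proj_b / proj_a - 1:.1%}")
```

In this simplified case the whole difference between the two projections traces back to the historical data alone, which is the kind of effect a shared baseline is meant to remove.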

Goals

The iTEM Open Data project aims to establish a process by which the transport research community will maintain a high-quality database of historical transport statistics.

The database itself will:

  • Collate publicly-available historical transport data.
  • Be available online, free of cost.
  • Be packaged with metadata and information about concepts, upstream sources, and processing steps applied.
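One way to picture data that travels with its own metadata is a record that carries its concept, unit, upstream source, and a log of the processing steps applied. The sketch below is only that: the class, field names, and conversion helper are hypothetical and do not describe the actual schema or code of the iTEM database.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Observation:
    """One data point plus the provenance needed to audit it (hypothetical schema)."""

    concept: str   # what is measured
    region: str
    year: int
    value: float
    unit: str
    source: str    # upstream data set the value came from
    steps: Tuple[str, ...] = ()  # processing steps applied, in order


def convert_unit(obs: Observation, factor: float, new_unit: str) -> Observation:
    """Return a converted copy of `obs`, recording the conversion as a step."""
    note = f"multiplied by {factor} to convert {obs.unit} to {new_unit}"
    return replace(
        obs,
        value=obs.value * factor,
        unit=new_unit,
        steps=obs.steps + (note,),
    )


# Invented example values; "Region X" and "source A" are placeholders.
raw = Observation(
    concept="passenger activity",
    region="Region X",
    year=2015,
    value=2.5e9,
    unit="passenger-km",
    source="source A (placeholder)",
)
clean = convert_unit(raw, 1e-6, "million passenger-km")

print(clean.value, clean.unit)
for step in clean.steps:
    print("step:", step)
```

Because each record is immutable and every transformation returns a new copy with an extended step log, the original value and the path to the harmonized value both remain inspectable.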

The process for producing the database will allow any member of the community to:

  • Scrutinize the methods and code used to clean and harmonize data.
  • Reproduce and modify the data production process.
  • Contribute or discuss changes and additional upstream sources.

Through both, iTEM aims to employ and advance best practices in the collaborative development of open source scientific software, to enable frequent, regular updates that improve the quality of the data.

Current data and process

The transportenergy/database GitHub repository contains the code used to produce the database and is the focal point for development.

Documentation of the database is automatically generated from the GitHub repository. It is updated in parallel with the code, and includes a complete description of the upstream data sources, data sets, and cleaning procedure.

All information below is repeated from the documentation.

Sources

As of November 2020, eleven upstream data sets are handled by the processing code, including:

  • Four sets of data on passenger transport,
  • Five sets of data on freight transport,
  • Two sets of data on socio-demographic, population, and economic quantities.

The upstream sources providing these data sets include:

Use and cite

The 2020-04-15 version of the database is attached to the Zenodo record for DOI 10.5281/zenodo.4287423.

More recent versions are generated automatically with every code change and published with the corresponding GitHub release.

The data is licensed under the Creative Commons Attribution 4.0 International license. The code used to produce the database is licensed under the GNU General Public License, version 3.

If you use or reference the data, or use the code, in preparation of any scientific publication, please cite the appropriate reference. See the documentation for detailed instructions. Use the citations in BibTeX and other formats provided by Zenodo for every record:

@dataset{item-open-data,
  author = {Linero, Humberto and Yeh, Sonia and Kishimoto, Paul Natsuo and Cazzola, Pierpaolo and Fulton, Lewis and McCollum, David and Miller, Joshua and Kyle, G.Page},
  doi = {10.5281/zenodo.4287423},
  month = {10},
  publisher = {Zenodo},
  title = {{The International Transport Energy Modeling (iTEM) Open Data \& Harmonized Transport Database}},
  version = {2020-04-15},
  year = {2020},
}

Contribute

Development of the data cleaning code, and thus the data it produces, uses the standard GitHub workflow and best practices in testing and continuous integration. View and open issues in the GitHub repository to:

  • Report problems with the code or data,
  • Suggest improvements to the cleaning methods and code, or
  • Nominate new data sources or sets to incorporate.

To contribute code directly, open a pull request.

For questions about the data, documentation, or iTEM in general, e-mail: mail@transportenergy.org