Open energy system databases

Open energy system database projects employ open data methods to collect, clean, and republish energy-related datasets for open use. The resulting information is then available, given a suitable open license, for statistical analysis and for building numerical energy system models, including open energy system models. Permissive licenses like Creative Commons CC0 and CC BY are preferred, but some projects will house data made public under market transparency regulations and carrying unqualified copyright.

The databases themselves may furnish information on national power plant fleets, renewable generation assets, transmission networks, time series for electricity loads, dispatch, spot prices, and cross-border trades, weather information, and similar. They may also offer other energy statistics including fossil fuel imports and exports, gas, oil, and coal prices, emissions certificate prices, and information on energy efficiency costs and benefits.

Much of the data is sourced from official or semi-official agencies, including national statistics offices, transmission system operators, and electricity market operators. Data is also crowdsourced using public wikis and public upload facilities. Projects usually also maintain a strict record of the provenance and version histories of the datasets they hold. Some projects, as part of their mandate, also try to persuade primary data providers to release their data under more liberal licensing conditions.

Two drivers favor the establishment of such databases. The first is a wish to reduce the duplication of effort that accompanies each new analytical project as it assembles and processes the data that it needs from primary sources. The second is an increasing desire to make public policy energy models more transparent in order to improve their acceptance by policymakers and the public. Better transparency dictates the use of open information, able to be accessed and scrutinized by third parties, in addition to releasing the source code for the models in question.

Background
In the mid-1990s, energy models used structured text files for data interchange, but efforts were being made to migrate to relational database management systems for data processing. These early efforts, however, remained local to a project and did not involve online publishing or open data principles.

The first energy information portal to go live was OpenEI in late 2009, followed by reegle in 2011.

A 2012 paper marks the first scientific publication to advocate the crowdsourcing of energy data. The 2012 PhD thesis by Chris Davis also discusses the crowdsourcing of energy data in some depth. A 2016 thesis surveys the spatial (GIS) information requirements for energy planning and finds that most types of data, with the exception of energy expenditure data, are available but nonetheless remain scattered and poorly coordinated.

In terms of open data, a 2017 paper concludes that energy research has lagged behind other fields, most notably physics, biotechnology, and medicine. The paper also lists the benefits of open data and open models and discusses the reasons that many projects nonetheless remain closed. A one-page opinion piece from 2017 advances the case for using open energy data and modeling to build public trust in policy analysis. The article also argues that scientific journals have a responsibility to require that data and code be submitted alongside text for peer review.

Database design
Data models are central to the design and organization of databases. Open energy database projects generally try to develop and adhere to well-resolved data models, using de facto and published standards where applicable. Some projects attempt to coordinate their data models in order to harmonize their data and improve its utility. Defining and maintaining suitable metadata is also a key issue. The life-cycle management of data includes, but is not limited to, the use of version control to track the provenance of incoming and cleansed data. Some sites allow users to comment on and rate individual datasets.

Dataset copyright and database rights
Issues surrounding copyright remain at the forefront with regard to open energy data. As noted, most energy datasets are collated and published by official or semi-official sources. But many of the publicly available energy datasets carry no license, limiting their reuse in numerical and statistical models, open or otherwise. Copyright-protected material cannot lawfully be circulated, nor can it be modified and republished.

Measures to enforce market transparency have not helped much because the associated information is again not licensed to enable modification and republication. Transparency measures include the 2013 European energy market transparency regulation 543/2013. Indeed, 543/2013 "is only an obligation to publish, not an obligation to license". Notwithstanding, 543/2013 does enable downloaded data to be computer processed with legal certainty.

Energy databases with hardware located within the European Union are protected under a general database law, irrespective of the legal status of the information they hold. Database rights not waived by public sector providers significantly restrict the amount of data a user can lawfully access.

A December 2017 submission by energy researchers in Germany and elsewhere highlighted a number of concerns over the re-use of public sector information within the European Union. The submission drew heavily on a recent legal opinion covering electricity data.

Energy statistics
National and international energy statistics are published regularly by governments and international agencies, such as the IEA. In 2016 the United Nations issued guidelines for energy statistics. While the definitions and sectoral breakdowns are useful when defining models, the information provided is rarely in sufficient detail to enable its use in high-resolution energy system models.

Published standards
There are few published standards covering the collection and structuring of high-resolution energy system data. The IEC Common Information Model (CIM) defines data exchange protocols for low and high voltage electricity networks.

Non-open data
Although this page is about genuinely open data, some important databases remain closed.

Data collected by the International Energy Agency (IEA) is widely quoted in policy studies but remains nonetheless paywalled. Researchers at Oxford University have called for this situation to change.

Open energy system database projects
Energy system models are data intensive and normally require detailed information from a number of sources. Dedicated projects to collect, collate, document, and republish energy system datasets have arisen to service this need. Most database projects prefer open data, issued under free licenses, but some will accept datasets with proprietary licenses in the absence of other options.

The OpenStreetMap project, which uses the Open Database License (ODbL), contains geographic information about energy system components, including transmission lines. Wikimedia projects such as Wikidata and Wikipedia have a growing set of information related to national energy systems, such as descriptions of individual power stations.

The following table summarizes projects that specifically publish open energy system data. Some are general repositories while others (for instance, oedb) are designed to interact with open energy system models in real-time.

Three of the projects listed work with linked open data (LOD), a method of publishing structured data on the web so that it can be networked and subject to semantic queries. The overarching concept is termed the semantic web. Technically, such projects support RESTful APIs, RDF, and the SPARQL query language. A 2012 paper reviews the use of LOD in the renewable energy domain.

Climate Compatible Growth starter datasets
The Climate Compatible Growth (CCG) programme provides starter kits for the following 69 countries: Algeria, Angola, Argentina, Benin, Botswana, Bolivia, Brazil, Burkina Faso, Burundi, Cambodia, Cameroon, Central African Republic, Chad, Chile, Colombia, Côte d'Ivoire, Democratic Republic of Congo, Djibouti, Ecuador, Egypt, Equatorial Guinea, Eritrea, Eswatini, Ethiopia, Gabon, Gambia, Ghana, Guinea, Guinea-Bissau, Indonesia, Kenya, Laos, Lesotho, Liberia, Libya, Malawi, Malaysia, Mali, Mauritania, Morocco, Mozambique, Myanmar, Namibia, Niger, Nigeria, Papua New Guinea, Paraguay, Peru, Philippines, Republic of Congo, Republic of Korea, Rwanda, Senegal, Sierra Leone, Somalia, South Africa, South Sudan, Sudan, Taiwan, Tanzania, Thailand, Togo, Tunisia, Uganda, Uruguay, Venezuela, Vietnam, Zambia, and Zimbabwe.

The datasets are hosted on the Zenodo science archive site and can be found there by searching for "ccg starter kit".

Energy Research Data Portal for South Africa
The Energy Research Data Portal for South Africa is being developed by the Energy Research Centre, University of Cape Town, Cape Town, South Africa. Coverage includes South Africa and certain other African countries where the Centre undertakes projects. The website uses the CKAN open source data portal software. A number of data formats are supported, including CSV and XLSX. The site also offers an API for automated downloads. At last count, the portal contained 65 datasets.
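
Automated downloads from a CKAN-based portal typically go through the CKAN Action API. The following is a minimal sketch, assuming a placeholder site address (`ckan.example.org`) and a hand-written sample reply rather than real portal data. It builds a `package_search` URL and picks the CSV download links out of a reply, without making any network call.

```python
import json
from urllib.parse import urlencode

# Hypothetical portal address; substitute the real CKAN site URL.
BASE_URL = "https://ckan.example.org"


def package_search_url(base_url, query, rows=10):
    """Build a CKAN Action API URL that searches datasets by keyword."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base_url}/api/3/action/package_search?{params}"


def csv_resource_urls(response_text):
    """Return the download URLs of all CSV resources in a package_search reply."""
    payload = json.loads(response_text)
    if not payload.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    urls = []
    for dataset in payload["result"]["results"]:
        for resource in dataset.get("resources", []):
            if (resource.get("format") or "").upper() == "CSV":
                urls.append(resource["url"])
    return urls


# Hand-written sample in the general shape of a CKAN reply (illustrative only).
SAMPLE_REPLY = json.dumps({
    "success": True,
    "result": {
        "count": 1,
        "results": [{
            "name": "example-demand",
            "resources": [
                {"format": "CSV", "url": "https://ckan.example.org/files/demand.csv"},
                {"format": "XLSX", "url": "https://ckan.example.org/files/demand.xlsx"},
            ],
        }],
    },
})

search_url = package_search_url(BASE_URL, "electricity demand", rows=5)
csv_links = csv_resource_urls(SAMPLE_REPLY)
```

In practice the reply would come from fetching `search_url`, and the dataset names and resource fields would depend on the portal in question.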

energydata.info
The energydata.info project from the World Bank Group, Washington, DC, USA is an energy database portal designed to support national development by improving public access to energy information. As well as sharing data, the platform also offers tools to visualize and analyze energy data. Although the World Bank Group has made available a number of datasets and apps, external users and organizations are encouraged to contribute. The concepts of open data and open source development are central to the project. energydata.info uses its own fork of the CKAN open source data portal as its web-based platform. The Creative Commons CC BY 4.0 license is preferred for data but other open licenses can be deployed. Users are also bound by the terms of use for the site.

At last count, the database held 131 datasets, the great majority related to developing countries. The datasets are tagged and can be easily filtered. A number of download formats, including GIS files, are supported: CSV, XLS, XLSX, ArcGIS, Esri, GeoJSON, KML, and SHP. Some datasets are also offered as HTML. Four apps are also available, some of which are web-based and run from a browser.

Enipedia
The semantic wiki-site and database Enipedia lists energy systems data worldwide. Enipedia is maintained by the Energy and Industry Group, Faculty of Technology, Policy and Management, Delft University of Technology, Delft, the Netherlands. A key tenet of Enipedia is that data displayed on the wiki is not trapped within the wiki, but can be extracted via SPARQL queries and used to populate new tools. Any programming environment that can download content from a URL can be used to obtain data. Enipedia went live in March 2011, judging by traffic figures quoted by Davis.
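
Because a SPARQL query can be sent as an ordinary HTTP GET request, extracting data of this kind needs little more than URL handling. The sketch below is illustrative only: the endpoint address and the property URIs are hypothetical placeholders, not Enipedia's actual schema. It relies on the SPARQL protocol convention of passing the query text in a `query` parameter and asking for JSON results through the `Accept` header.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Hypothetical endpoint; a real project would publish its own address.
ENDPOINT = "https://sparql.example.org/query"

# Hypothetical property URIs, used purely for illustration.
QUERY = """
SELECT ?plant ?capacity WHERE {
  ?plant <http://example.org/prop/country> "Netherlands" .
  ?plant <http://example.org/prop/capacityMW> ?capacity .
}
LIMIT 10
""".strip()


def sparql_get_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = sparql_get_request(ENDPOINT, QUERY)

# Decode the URL again to confirm the query survives the round trip.
decoded_query = parse_qs(urlsplit(request.full_url).query)["query"][0]
```

Passing `request` to `urllib.request.urlopen` would execute the query against a live endpoint and return the result bindings as JSON.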

A 2010 study describes how community driven data collection, processing, curation, and sharing is revolutionizing the data needs of industrial ecology and energy system analysis. A 2012 chapter introduces a system of systems engineering (SoSE) perspective and outlines how agent-based models and crowdsourced data can contribute to the solving of global issues.

The site has since gone offline pending a move to the enipedia.org domain.

Open Energy Platform
The Open Energy Platform (OEP) is a collaborative versioned dataset repository for storing open energy system model datasets. A dataset is presumed to be in the form of a database table, together with metadata. Registered users can upload and download datasets manually using a web interface or programmatically via an API using HTTP POST calls. Uploaded datasets are screened for integrity using deterministic rules and then subject to confirmation by a moderator. The use of versioning means that any prior state of the database can be accessed (as recommended in a 2012 paper). Hence, the repository is specifically designed to interoperate with energy system models. The backend is a PostgreSQL object-relational database under subversion version control. Open-data licenses are specific to each dataset. Unlike other database projects, users can download the current version (the public tables) of the entire PostgreSQL database or any previous version. The development is being led by a cross-project community.

Open Data Energy Networks
The Open Data Energy Networks (Open Data Réseaux Énergies or ODRÉ) portal is run by eight partners, led by the French national transmission system operator (TSO) Réseau de Transport d'Électricité (RTE). The portal was previously known as Open Data RTE. The site offers electricity system datasets under a Creative Commons CC BY 2.0 compatible license, with metadata, an RSS feed for notifying updates, and an interface for submitting questions. Re-users of information obtained from the site can also register third-party URLs (be they publications or webpages) against specific datasets.

The portal uses the French Government Licence Ouverte license and this is explicitly compatible with the United Kingdom Open Government Licence (OGL), the Creative Commons CC BY 2.0 license (and thereby later versions), and the Open Data Commons ODC-BY license.

The site hosts electricity, gas, and weather information related to France.

UK Power Networks Open Data Portal
The Open Data Portal is run by UK Power Networks, a GB Distribution Network Operator (DNO), hosted on the OpenDataSoft platform. The Portal offers electricity network datasets under a Creative Commons CC BY 4.0 compatible license, with metadata, a newsfeed, and a data request form. Re-users of information obtained from the site can also register third-party URLs (be they publications or webpages) against specific datasets. A number of download formats, including GIS files, are supported: CSV, XLS, GeoJSON, KML, and SHP. The site also offers an API for automated downloads.

The portal uses the Creative Commons License and also hosts datasets from other sources which are licensed under the Open Government Licence (OGL).

The site hosts electricity datasets related to UK Power Networks' three license areas in London, the East and South East of England.

Open Power System Data


The Open Power System Data (OPSD) project seeks to characterize the German and western European power plant fleets, their associated transmission network, and related information and to make that data available to energy modelers and analysts. The platform was originally implemented by the University of Flensburg, DIW Berlin, the Technical University of Berlin, and the energy economics consultancy Neon Neue Energieökonomik, all from Germany. The first phase of the project, from August 2015 to July 2017, was funded by the Federal Ministry for Economic Affairs and Energy (BMWi) for €490,000. The project later received funding for a second phase, from January 2018 to December 2020, with ETH Zurich replacing Flensburg University as a partner.

Developers collate and harmonize data from a range of government, regulatory, and industry sources throughout Europe. The website and the metadata utilize English, whereas the original material can be in any one of 24 languages. Datasets follow the emerging frictionless data package standard being developed by Open Knowledge Foundation (OKF). The website was launched on 28 October 2016. The project offers the following primary packages, for Germany and other European countries:


 * details, including geolocation, of conventional power plants and renewable energy power plants
 * aggregated generation capacity by technology and country
 * hourly time series covering electrical load, day-ahead electricity spot prices, and wind and solar resources
 * a script to filter and download NASA MERRA-2 satellite weather data

In addition, the project hosts selected contributed packages:


 * electricity demand and self-generation time series for representative south German households
 * simulated PV and wind generation capacity factor time series for Europe, generated by the Renewables.ninja project

To facilitate analysis, the data is aggregated into large structured files (in CSV format) and loaded into data packages with standardized machine-readable metadata (in JSON format). The same data is usually also provided as XLSX (Excel) and SQLite files. The datasets can be accessed in real-time using stable URLs. The Python scripts deployed for data processing are available on GitHub and carry an MIT license. The licensing conditions for the data itself depend on the source and vary in terms of openness. Previous versions of the datasets and scripts can be recovered in order to track changes or replicate earlier studies. The project also engages with energy data providers, such as transmission system operators (TSO) and ENTSO-E, to encourage them to make their data available under open licenses (for instance, Creative Commons and ODbL licenses).
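
The pairing of a CSV file with a JSON descriptor can be illustrated with a short sketch. The descriptor below follows the general layout of a frictionless data package (a `resources` list, each entry with a `schema` of typed `fields`), but the package name, column names, and values are invented for the example and are not taken from any OPSD release.

```python
import csv
import io
import json

# Invented descriptor in the general shape of a datapackage.json file.
DESCRIPTOR = json.loads("""
{
  "name": "example-time-series",
  "resources": [
    {
      "name": "load_hourly",
      "path": "load_hourly.csv",
      "schema": {
        "fields": [
          {"name": "utc_timestamp", "type": "datetime"},
          {"name": "load_mw", "type": "number"}
        ]
      }
    }
  ]
}
""")

# Invented sample data standing in for the CSV file named in the descriptor.
CSV_TEXT = (
    "utc_timestamp,load_mw\n"
    "2016-01-01T00:00:00Z,41230.5\n"
    "2016-01-01T01:00:00Z,40110.0\n"
)

# Map schema types to Python casts; unlisted types are kept as strings.
CASTS = {"number": float, "integer": int}


def read_resource(descriptor, resource_name, csv_text):
    """Parse CSV rows and cast each column using the descriptor's schema."""
    resource = next(r for r in descriptor["resources"] if r["name"] == resource_name)
    types = {field["name"]: field["type"] for field in resource["schema"]["fields"]}
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        rows.append({key: CASTS.get(types[key], str)(value) for key, value in record.items()})
    return rows


rows = read_resource(DESCRIPTOR, "load_hourly", CSV_TEXT)
```

Keeping the type information in the metadata rather than in the reading code is what lets the same loader handle any package that follows the convention.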

In a 2019 publication, OPSD developers describe their design choices, implementation, and provisioning. Information integrity remains key, with each data package having traceable provenance, curation, and packing. From October 2018, each new or revised data package is assigned a unique DOI to ensure that external references to current and prior versions remain stable.

A number of published electricity market modeling analyses are based on OPSD data.

In 2017, the Open Power System Data project won the Schleswig-Holstein Open Science Award and the Germany Land of Ideas award.

OpenEI
Open Energy Information (OpenEI) is a collaborative website, run by the US government, providing open energy data to software developers, analysts, users, consumers, and policymakers. The platform is sponsored by the United States Department of Energy (DOE) and is being developed by the National Renewable Energy Laboratory (NREL). OpenEI launched on 9 December 2009. While much of its data is from US government sources, the platform is intended to be open and global in scope.

OpenEI provides two mechanisms for contributing structured information: a semantic wiki (using MediaWiki and the Semantic MediaWiki extension) for collaboratively-managed resources and a dataset upload facility for contributor-controlled resources. US government data is distributed under a CC0 public domain dedication, whereas other contributors are free to select an open data license of their choice. Users can rate data using a five-star system, based on accessibility, adaptability, usefulness, and general quality. Individual datasets can be manually downloaded in an appropriate format, often as CSV files. Scripts for processing data can also be shared through the site. In order to build a community around the platform, a number of forums are offered covering energy system data and related topics.

Most of the data on OpenEI is exposed as linked open data (LOD) (described elsewhere on this page). OpenEI also uses LOD methods to populate its definitions throughout the wiki with real-time connections to DBpedia, reegle, and Wikipedia.

OpenEI has been used to classify geothermal resources in the United States and to publicize municipal utility rates, again within the US.

OpenGridMap
OpenGridMap employs crowdsourcing techniques to gather detailed data on electricity network components and then infer a realistic network structure using methods from statistics and graph theory. The scope of the project is worldwide and both distribution and transmission networks can be reverse engineered. The project is managed by the Chair of Business Information Systems, TUM Department of Informatics, Technical University of Munich, Munich, Germany. The project maintains a website and a Facebook page and provides an Android mobile app to help the public document electrical devices, such as transformers and substations. The bulk of the data is being made available under a Creative Commons CC BY 3.0 IGO license. The processing software is written primarily in Python and MATLAB and is hosted on GitHub.

OpenGridMap provides a tailored GIS web application, layered on OpenStreetMap, which contributors can use to upload and edit information directly. The same database automatically stores field recordings submitted by the mobile app. Subsequent classification by experts allows normal citizens to document and photograph electrical components and have them correctly identified. The project is experimenting with the use of hobby drones to obtain better information on associated facilities, such as photovoltaic installations. Transmission line data is also sourced from and shared with OpenStreetMap. Each component record is verified by a moderator.

Once sufficient data is available, the transnet software is run to produce a likely network, using statistical correlation, Voronoi partitioning, and minimum spanning tree (MST) algorithms. The resulting network can be exported in CSV (separate files for nodes and lines), XML, and CIM formats. CIM models are well suited for translation into software-specific data formats for further analysis, including power grid simulation. Transnet also displays descriptive statistics about the resulting network for visual confirmation.
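
The minimum spanning tree step can be illustrated in isolation. The sketch below is not the transnet code: it is a generic Kruskal's algorithm applied to invented substation coordinates, using straight-line distance as the edge weight, and it shows only how an MST yields a plausible set of connections between known points.

```python
import math
from itertools import combinations


def minimum_spanning_tree(points):
    """Return MST edges (a, b, distance) for a dict of name -> (x, y) points."""
    parent = {name: name for name in points}

    def find(node):
        # Follow parent links to the set representative, compressing the path.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # Every pair of points is a candidate edge, shortest first.
    candidates = sorted(
        (math.dist(points[a], points[b]), a, b) for a, b in combinations(points, 2)
    )

    tree = []
    for distance, a, b in candidates:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate components
            parent[root_a] = root_b
            tree.append((a, b, distance))
    return tree


# Invented coordinates (arbitrary units), for illustration only.
SUBSTATIONS = {"A": (0, 0), "B": (3, 0), "C": (3, 4), "D": (0, 1)}

tree = minimum_spanning_tree(SUBSTATIONS)
total_length = sum(distance for _, _, distance in tree)
```

A real inference would weight candidate edges with more than distance, for instance voltage level or observed line segments, but the tree-building logic stays the same.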

The project is motivated by the need to provide datasets for high-resolution energy system models, so that energy system transitions (like the German Energiewende) can be better managed, both technically and policy-wise. The rapid expansion of renewable generation and the anticipated uptake of electric vehicles means that electricity system models must increasingly represent distribution and transmission networks in some detail.

OpenGridMap techniques have been used to estimate the low voltage network in the German city of Garching and to estimate the high voltage grids in several other countries.

Power Explorer
The Power Explorer portal is a part of the larger Resource Watch platform, hosted by the World Resources Institute. The initial Global Power Plant Database, an open source database of power plants worldwide, was released in April 2018. At last report, the portal itself was still under development.

Power Explorer is also supported by Google with various research partners, including KTH, Global Energy Observatory, Enipedia, and OPSD.

PowerGenome
The PowerGenome project aims to provide a coherent dataset covering the United States electricity system. PowerGenome was initially designed to service the GenX model, but support for other modeling frameworks is planned. The PowerGenome utility also pulls from upstream datasets hosted by the Public Utility Data Liberation project (PUDL) and the EIA, so those dependencies need to be met by users. Datasets are occasionally archived on Zenodo. A video describing the project is available.

reegle
reegle is a clean energy information portal covering renewable energy, energy efficiency, and climate compatible development topics. reegle was launched in 2006 by REEEP and REN21 with funding from the Dutch (VROM), German (BMU), and UK (Defra) environment ministries. Originally released as a specialized internet search engine, reegle was relaunched in 2011 as an information portal.

reegle offers and utilizes linked open data (LOD) (described elsewhere on this page). Sources of data include UN and World Bank databases, as well as dedicated partners around the world. reegle maintains a comprehensive structured glossary (driven by an LOD-compliant thesaurus) of energy and climate compatible development terms to assist with the tagging of datasets. The glossary also facilitates intelligent web searches.

reegle offers country profiles which collate and display energy data on a per-country basis for most of the world. These profiles are kept current automatically using LOD techniques. As of 2021, the portal is no longer active.

Renewables.ninja
Renewables.ninja is a website that can calculate the hourly power output from solar photovoltaic installations and wind farms located anywhere in the world. The website is a joint project between the Department of Environmental Systems Science, ETH Zurich, Zürich, Switzerland and the Centre for Environmental Policy, Imperial College London, London, United Kingdom. The website went live during September 2016. The resulting time series are provided under a Creative Commons CC BY-NC 4.0 license (which is not open data conformant) and the underlying power plant models are published using a BSD-new license. At last report, only the solar model, written in Python, had been released.



The project relies on weather data derived from meteorological reanalysis models and weather satellite images. More specifically, it uses the 2016 MERRA-2 reanalysis dataset from NASA and satellite images from CM-SAF SARAH. For locations in Europe, this weather data is further "corrected" by country so that it better fits with the output from known PV installations and wind farms. Two 2016 papers describe the methods used in detail in relation to Europe. The first covers the calculation of PV power and the second covers the calculation of wind power.

The website displays an interactive world map to aid the selection of a site. Users can then choose a plant type and enter some technical characteristics. At last report, only data for the year 2014 could be served, due to technical restrictions. The results are automatically plotted and are available for download in hourly CSV format with or without the associated weather information. The site offers an API for programmatic dataset recovery using token-based authorization. Examples deploying cURL and Python are provided.
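
Token-based authorization of this kind usually means sending the token in an HTTP `Authorization` header with each request. The following sketch builds such a request without sending it. The endpoint path and the parameter names (`date_from`, `system_loss`, `azim`, and so on) are assumptions that should be checked against the project's current API documentation, and the token is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed API base address; verify against the project's documentation.
API_BASE = "https://www.renewables.ninja/api/"


def build_pv_request(token, lat, lon, year=2014):
    """Build (but do not send) a request for one year of hourly PV output."""
    params = {
        "lat": lat,
        "lon": lon,
        "date_from": f"{year}-01-01",
        "date_to": f"{year}-12-31",
        "dataset": "merra2",   # assumed name for the MERRA-2 weather source
        "capacity": 1.0,       # installed capacity in kW (assumed unit)
        "system_loss": 0.1,    # fraction of output lost in the system
        "tracking": 0,         # 0 assumed to mean a fixed, non-tracking panel
        "tilt": 35,            # panel tilt in degrees
        "azim": 180,           # panel azimuth in degrees
        "format": "csv",
    }
    url = API_BASE + "data/pv?" + urlencode(params)
    # The token travels in the Authorization header, not in the URL.
    return Request(url, headers={"Authorization": f"Token {token}"})


request = build_pv_request("PLACEHOLDER_TOKEN", lat=52.5, lon=13.4)
```

Keeping the token out of the query string means it does not end up in server logs or shared URLs.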

A number of studies have been undertaken using the power production datasets underpinning the website (these studies predate the launch of the website), with the bulk focusing on energy options for Great Britain.

SMARD


The SMARD site (pronounced "smart") serves electricity market data from Germany, Austria, and Luxembourg and also provides visual information. The electricity market plots and their underlying time series are released under a permissive CC BY 4.0 license. The site itself was launched on 3 July 2017 in German and an English translation followed shortly. The data portal is mandated under the German Energy Industry Act (Energiewirtschaftsgesetz or EnWG) §111d, introduced as an amendment on 13 October 2016. Four table formats are offered: CSV, XLS, XML, and PDF. The maximum sampling resolution is 15 minutes. Market data visuals or plots can be downloaded in PDF, SVG, PNG, and JPG formats. Representative output is shown in the thumbnail, in this case mid-winter dispatch over two days for the whole of Germany. The horizontal ordering by generation type is first split into renewable and conventional generation and then based on merit. A user guide is updated as required.

Further information

 * Open energy data wiki maintained by the Open Energy Modelling Initiative
 * The list is under a Creative Commons CC‑BY‑4.0 license and many of the datasets cited are similarly licensed.