The Photovoltaic field array (PVDAQ) data consists of raw time-series performance data collected by a variety of sensors connected to a PV array. The data is typically recorded at a 15-minute averaged resolution, though this can vary between systems. NREL source data is typically aggregated into the main database every 24 hours and is then processed into the NREL PVDAQ data lake on a monthly basis.
Some datasets have been acquired through previous research agreements with site owners, and with their permission, have now been made public. Those datasets are static and do not show any additional data increments.
Our researchers use the data to monitor the durability of PV systems under a wide variety of conditions. Similar data within NREL archives also provides insights into experimental emerging technology systems. Additionally, the data has proven useful in the development of data quality assurance software, data analysis tools, and machine learning tools.
All data for PVDAQ and the DOE Solar Data Prize is covered under DOI:10.25984/1846021.
The American-Made Solar Data Bounty Prize was open to U.S.-based PV system owners and entities authorized to share data from PV systems. These owners were invited to submit at least five years of historical time series data at a minimum of 15-minute time resolution for one or two of their systems. Datasets collected through this prize are meant to assist commercial and academic research and development efforts seeking to improve the accuracy of PV system modeling, and thus lower the risk associated with developing and operating those assets.
The Data Prize entries were submitted in one of two categories: systems < 5 MW DC capacity, and those > 5 MW DC capacity. The data from the submissions are available to the public for download as part of the PVDAQ data repository. The following are the system IDs of the winners, in numerical order, not placement by award.
- 2105 - A 237 kW multi-building rooftop deployment with highly variable mount orientations in Hawaii
- 2107 - An 893 kW fixed ground-mount facility in a highly active agricultural area in California
- 9068 - A 4.7 MW single-axis tracked facility in Colorado
- 7333 - A 257 MW single-axis tracker facility in California. This dataset is at a very high time resolution of 10 s for all channels.
- 9069 - A 38.7 MW fixed ground-mount facility in Georgia
These datasets differ from the regular PVDAQ repository storage architecture (see below), where data is broken down by year, month, and day. Each prize repository contains the available metadata, any support files, and the entire dataset as it was submitted and curated. Some of these datasets are broken down by sensor channel set type; in others the data is labeled by sensor channel tag names or bundles.
Note: Some of the prize datasets are extremely large and can contain tens of GB of data. These could take a long time to download, so please plan accordingly.
The PVDAQ data is partitioned by system_id, year, month, and day. Raw data is reported at 15-minute increments with ISO 8601 date and time. The timestamp is stripped and data is averaged daily. An example file output is included here.
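As a rough illustration of the daily averaging mentioned above, the sketch below groups 15-minute samples by calendar day and averages them. The `daily_average` helper and the sample readings are hypothetical, and this is only one plausible reading of the averaging step, not the PVDAQ processing pipeline itself.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_average(samples):
    """Average (ISO 8601 timestamp, value) pairs per calendar day.

    Hypothetical helper for illustration only.
    """
    by_day = defaultdict(list)
    for stamp, value in samples:
        # Drop the time-of-day part and keep only the calendar date.
        day = datetime.fromisoformat(stamp).date()
        by_day[day].append(value)
    return {day.isoformat(): mean(values) for day, values in sorted(by_day.items())}


# Made-up readings at 15-minute spacing.
samples = [
    ("2020-06-01T12:00:00", 10.0),
    ("2020-06-01T12:15:00", 12.0),
    ("2020-06-01T12:30:00", 14.0),
    ("2020-06-02T12:00:00", 20.0),
]
daily = daily_average(samples)
print(daily)
```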
- pvdaq_inverters - metadata about the inverter hardware on the system
- pvdaq_meters - metadata about the meter hardware on the system
- pvdaq_metrics - metadata about the sensor values captured as part of the PV time-series
- pvdaq_mount - mounting configuration of the array or subsets of the array
- pvdaq_other_instruments - metadata about other ancillary equipment fielded on the system
- pvdaq_site - geo location details of a PV array
- pvdaq_system - basic details about a PV array
- pvdaq_pvdata - PV time series data.
- inverter_id (string) - database primary key
- name (string) - alias given to the inverter by the array owner or autogenerated
- manufacturer (string)
- model (string)
- serial_num (string)
- num_strings (string)- how many strings are tied to the inverter
- modules_per_string (string)- how many modules are tied to each string
- type (string)- indicates type of inverter such as micro, string, central, etc.
- quantity (string)- number of inverters fielded at the array site
- time_interval (string)- Is the data left(L), center(C), or right(R) aligned during the acquisition interval
- site_id (string) - associated site
- system_id (bigint) - associated system
- comments (string)- any additional details
- meter_id (string)- primary key of the meter
- name (string) - alias given to the meter by the array owner or autogenerated
- manufacturer (string)
- model (string)
- serial_num (string)
- time_interval (string)- Is the data left(L), center(C), or right(R) aligned during the acquisition interval
- type (string) - type of meter: production, site, or revenue
- site_id (string) - associated site
- system_id (bigint) - associated system
- comments (string)- any additional details
- system_id (int) - associated system for the metric
- metric_id (int) - primary key of the metric
- sensor_name (string) - referenced name produced by the instrumentation or tagged by array owner
- common_name (string) - a general grouping of sensor types (e.g. DC voltage, AC energy, POA irradiance)
- raw_units (string) - raw unscaled or uncalibrated units of the values produced by the sensor
- units (string) - units of the values produced by the sensor. Could be modified raw_units by calc_scale and calc_offset.
- calc_scale (double) - scaling for adjusting the sensor values (default 1)
- calc_offset (double) - offset for adjusting the sensor values (default 0)
- calc_details (string) - mathematical equation used to calculate the sensor value, if needed.
- aggregation_type (string) - avg, min, max, sample, union, median, or calculated
- source_type (string) - what generates the sensor value (inverters, meters, or other instruments). Can be NULL
- source_id (int) - the associated primary key of the sensor type generating the value. Can be NULL
- comments (string) - any additional details
- standard_name (string)- a unique autogenerated name based on either the primary key and sensor_name or a combination of common_name, sensor_type, and sensor_id
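The `calc_scale` and `calc_offset` fields suggest a linear adjustment from `raw_units` to `units`. The sketch below assumes the form `raw * calc_scale + calc_offset`. That formula is an assumption, since the data dictionary does not spell it out, and it is also not stated whether stored values already have the adjustment applied. The `apply_calibration` helper and the metric record are hypothetical.

```python
def apply_calibration(raw_value, calc_scale=1.0, calc_offset=0.0):
    """Apply an assumed linear adjustment: raw * scale + offset.

    The defaults (scale 1, offset 0) leave the value unchanged,
    matching the defaults listed in the data dictionary.
    """
    return raw_value * calc_scale + calc_offset


# Hypothetical metric record; the field names follow the pvdaq_metrics list above.
metric = {"raw_units": "counts", "units": "W", "calc_scale": 0.5, "calc_offset": -1.0}

adjusted = apply_calibration(10.0, metric["calc_scale"], metric["calc_offset"])
unchanged = apply_calibration(10.0)
print(adjusted, unchanged)
```

Check `calc_details` for a given metric before applying anything like this, since that field may describe a different calculation.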
- module_id (string)- the module primary key
- name (string) - alias given to the module by the array owner or autogenerated
- inverter_id (string) - the associated inverter primary key tied to this module, if known.
- manufacturer (string)
- model (string)
- serial_num (string)
- type (string)- what is the technology of the module: CdTe, Crystalline Si, multicrystalline Si, etc.
- quantity (string) - number of modules installed on system
- reference_module (string)- is this a reference module
- start_on (string) - date module was installed
- end_on (string) - date module was removed
- site_id (string) - associated site
- system_id (bigint) - associated system
- comments (string)- any additional details
- mount_id (bigint) - the primary key for the mount
- name (string) - alias given to the mount by the array owner or autogenerated
- manufacturer (string)
- model (string)
- azimuth (string)- pointing of the mount in compass direction decimal degrees. 0 degrees = north, 90 degrees = east
- tilt (string) - angle of mount pointing in degrees
- tracking (string)- is the mount tracking or fixed
- type (string)- configuration of the mount: ground, roof, canopy, etc.
- site_id (string) - associated site
- system_id (bigint) - associated system
- comments (string)- any additional details
- instrument_id (string) - the primary key of the instrument
- name (string) - alias given to the other instrument by the array owner or autogenerated
- manufacturer (string)
- model (string)
- serial_num (string)
- time_interval (string)- Is the data left(L), center(C), or right(R) aligned during the acquisition interval
- type (string) - identifies what the instrument is: ref cell, weather station, thermocouple, etc.
- site_id (string) - associated site
- system_id (bigint) - associated system
- comments (string)- any additional details
- site_id (string) - primary key of the site
- system_id (bigint) - associated system
- public_name (string) - unique given name to the site
- location (string) - text descriptive name of site location. Could include street address type details
- latitude (string) - decimal latitude geo location
- longitude (string) - decimal longitude geo location
- elevation (string) - distance in meters above sea level, if known
- av_pressure (string) - average annual atmospheric pressure at site in psi
- av_temp (string)- average ambient temperature in degrees Celsius at site
- climate_type (string) - The Koppen-Geiger classifier for the site location
- system_id (bigint) - primary key of the system
- site_id (bigint)- associated site representing geolocation details for system
- public_name (string)- unique name given to the array
- area (string)- covered area of the array in square meters
- power (string)- maximum calculated or nameplate DC power of the array in kW
- started_on (string)- date system became active
- ended_on (string) - day system was deactivated
- comments (string) - any additional details
- system_id (string) (Partitioned) - associated system for the data
- measured_on (timestamp) - local timestamp as generated by the instrumentation. Could include DST.
- utc_measured_on (timestamp) - calculated UTC timestamp from the measured_on value. Could include DST.
- metric_id (int) - associated metric_id for the data
- value (double) - value of the data. Join to metric_id table record for units or other details.
Note: not every site or system_id will contain data for each attribute included in the data dictionary.
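The `value` field is only meaningful once it is joined to its metric record. A minimal sketch of that lookup on `metric_id` follows, using plain Python dictionaries in place of the real tables. The rows and the `label_readings` helper are made up for illustration; only the field names come from the data dictionary above.

```python
def label_readings(pvdata_rows, metric_rows):
    """Attach common_name and units to each reading via metric_id."""
    metrics_by_id = {m["metric_id"]: m for m in metric_rows}
    labeled = []
    for row in pvdata_rows:
        # Readings without a matching metric keep None for the labels.
        metric = metrics_by_id.get(row["metric_id"], {})
        labeled.append(
            {**row, "common_name": metric.get("common_name"), "units": metric.get("units")}
        )
    return labeled


# Hypothetical rows shaped like pvdaq_metrics and pvdaq_pvdata.
metrics = [
    {"metric_id": 1, "common_name": "AC power", "units": "kW"},
    {"metric_id": 2, "common_name": "POA irradiance", "units": "W/m^2"},
]
pvdata = [
    {"measured_on": "2020-06-01 12:00:00", "metric_id": 1, "value": 3.2},
    {"measured_on": "2020-06-01 12:00:00", "metric_id": 2, "value": 850.0},
    {"measured_on": "2020-06-01 12:00:00", "metric_id": 3, "value": 1.0},
]

labeled = label_readings(pvdata, metrics)
for row in labeled:
    print(row["metric_id"], row["value"], row["units"])
```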
The PVDAQ dataset is made available in Parquet format on AWS and is partitioned by year, month, and day in AWS Glue and Athena. The schema may change across dataset years on S3.
Partition keys of the pvdaq_pvdata table:
- year (string) (Partitioned)
- month (string) (Partitioned)
- day (string) (Partitioned)
- s3://oedi-data-lake/pvdaq/inverters/*.parquet
- s3://oedi-data-lake/pvdaq/meters/*.parquet
- s3://oedi-data-lake/pvdaq/metrics/*.parquet
- s3://oedi-data-lake/pvdaq/mount/*.parquet
- s3://oedi-data-lake/pvdaq/other_instruments/*.parquet
- s3://oedi-data-lake/pvdaq/site/*.parquet
- s3://oedi-data-lake/pvdaq/system/*.parquet
- s3://oedi-data-lake/pvdata/system_id=/year=/month=/day=/*.parquet
s3://oedi-data-lake/pvdaq/
The PVDAQ Access repository contains a small Python program that can bundle all the daily data from a site and download it to your local system. If you are accessing the data for a Solar Data Prize site, some adjustment to the code is necessary, since all the data sits within a single directory for each site.
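The repository's program is not reproduced here. As a rough sketch of the idea, the snippet below lists the daily partition prefixes that would need to be visited for one system over a date range. The base prefix is copied from the pvdata path listed above. The `daily_prefixes` helper, the system ID 1234, and the choice to write month and day without zero padding are all assumptions, so check them against the actual bucket layout.

```python
from datetime import date, timedelta

# Base prefix copied from the pvdata path listed above.
PVDATA_PREFIX = "s3://oedi-data-lake/pvdata"


def daily_prefixes(system_id, start, end, base=PVDATA_PREFIX):
    """Yield one partition prefix per day from start to end, inclusive.

    Assumption: month and day appear without zero padding.
    """
    day = start
    while day <= end:
        yield (
            f"{base}/system_id={system_id}"
            f"/year={day.year}/month={day.month}/day={day.day}/"
        )
        day += timedelta(days=1)


# Hypothetical system ID and date range spanning a year boundary.
prefixes = list(daily_prefixes(1234, date(2020, 12, 30), date(2021, 1, 2)))
for prefix in prefixes:
    print(prefix)
```

Each prefix could then be handed to whichever S3 client you prefer for listing and downloading the Parquet files beneath it.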
https://www.nrel.gov/pv/real-time-photovoltaic-solar-resource-testing.html
RdTools is an open-source library to support reproducible technical analysis of time series data from photovoltaic energy systems, particularly degradation effects.
Rd Tools
pvlib is a toolbox that provides a set of well-documented functions for simulating the performance of photovoltaic energy systems.
pv_lib-toolbox
PVAnalytics is a Python library that supports analytics for PV systems. It provides functions for quality control, filtering, and feature labeling, as well as other tools supporting the analysis of PV system-level data. [PV_Analytics](https://github.com/pvlib/pvanalytics)
A multi-institution consortium focused on discovery, development, de-risking, and enabling the commercialization of new materials and designs for PV modules.
Main Site
- Validation models for PV performance
- Machine Learning training set for validation of satellite imagery of PV Array sites
- Machine Learning training set for Detection of Inverter Clipping - Real Data
- Machine Learning training set for Detection of Inverter Clipping - Simulated Data
- Machine learning training set for Detection of soiling cleaning events
- Example data of Soiling signal in time-series data
- Spectral Irradiance Data Sets Albuquerque
https://www.nrel.gov/pv/real-time-photovoltaic-solar-resource-testing.html
https://www.nrel.gov/docs/fy17osti/69131.pdf
Athena data connection using PyAthena:
import pandas as pd
from pyathena import connect
conn = connect(
    s3_staging_dir='s3://<user-defined>/<>',  # user-defined staging directory
    region_name='us-west-2',
    work_group='<USER SPECIFIC WORKGROUP>'  # specify the workgroup, if one exists
)
Example #1: Querying with a limit:
df = pd.read_sql("SELECT * FROM oedi.<> limit 8;", conn)
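Because the pvdata table is partitioned, filtering on the partition keys keeps Athena from scanning the whole dataset. The sketch below only builds such a query string. The `build_pvdata_query` helper is hypothetical, and the table name passed to it is a placeholder taken from the table list above, so substitute whatever name is registered in your catalog. Partition values are quoted because the partition keys are listed as strings.

```python
def build_pvdata_query(table, system_id, year, month, day, limit=8):
    """Build an Athena query restricted to a single daily partition.

    Hypothetical helper; the numeric inputs are coerced with int() so
    that only digits end up inside the quoted partition values.
    """
    return (
        f"SELECT measured_on, metric_id, value FROM oedi.{table} "
        f"WHERE system_id = '{int(system_id)}' AND year = '{int(year)}' "
        f"AND month = '{int(month)}' AND day = '{int(day)}' "
        f"LIMIT {int(limit)};"
    )


# Placeholder table name and hypothetical partition values.
query = build_pvdata_query("pvdaq_pvdata", 1234, 2020, 6, 1)
print(query)
```

The resulting string can be passed to `pd.read_sql(query, conn)` in the same way as Example #1.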
For a Jupyter notebook example, see our notebook, which includes partitions and the data dictionary: examples repository
Copyright (c) 2024, Alliance for Sustainable Energy LLC, All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
- Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
- Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.