Commit 1cec2aa

Merge branch 'datacommonsorg:master' into master
2 parents 39a9546 + 1b63c12 commit 1cec2aa

4 files changed

Lines changed: 38 additions & 19 deletions

Gemfile.lock

Lines changed: 1 addition & 1 deletion
@@ -220,7 +220,7 @@ GEM
 gemoji (>= 3, < 5)
 html-pipeline (~> 2.2)
 jekyll (>= 3.0, < 5.0)
-json (2.18.1)
+json (2.19.2)
 kramdown (2.4.0)
 rexml
 kramdown-parser-gfm (1.1.0)

assets/images/dc/concept12.png

2.99 KB

data_model.md

Lines changed: 21 additions & 16 deletions
@@ -34,7 +34,7 @@ As a simple example, here are a set of nodes and edges that represent the follow
 - Santa Clara county and Berkeley are contained in the state of California
 - The latitude of Berkeley, CA is 37.8703

-![knowledge graph]({{site.url}}/assets/images/dc/concept1.png){: width="600"}
+![knowledge graph](/assets/images/dc/concept1.png){: width="600"}

 Each node consists of some kind of entity or value, and each edge describes some kind of property. More specifically, each node consists of the following objects:

@@ -47,7 +47,7 @@ As in other knowledge graphs, each pair of connected nodes is a _triple_ consist

 You can get information about a node and its edges by looking at the [Knowledge Graph browser](https://datacommons.org/browser){: target="_blank"}. If you know the [DCID](#unique-identifier-dcid) for a node, you can access it directly by typing <code>https://datacommons.org/browser/<var>DCID</var></code>. For example, here is the entry for the `City` node, available at [https://datacommons.org/browser/City](https://datacommons.org/browser/City){: target="_blank"}:

-![KG browser]({{site.url}}/assets/images/dc/concept2.png){: width="900"}
+![KG browser](/assets/images/dc/concept2.png){: width="900"}

 Every node entry shows a list of outgoing edges, or _properties,_ and incoming edges. [Properties](#property) are discussed in more detail below.

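The triple model described in these `data_model.md` hunks (a subject node, a property edge, and an object node or value) can be sketched in a few lines of Python. This is a minimal illustration, not Data Commons code. The node labels `Berkeley`, `California`, and `SantaClaraCounty` are hypothetical stand-ins rather than real DCIDs, and `containedInPlace` and `latitude` serve here only as edge labels. The hypothetical `outgoing` and `incoming` helpers mirror the two edge lists that the Knowledge Graph browser shows for every node.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One edge of the graph: subject node, property (edge label), object node or value."""
    subject: str
    predicate: str
    obj: str


# Illustrative triples for the example statements; node labels are
# hypothetical stand-ins, not real Data Commons DCIDs.
TRIPLES = [
    Triple("SantaClaraCounty", "containedInPlace", "California"),
    Triple("Berkeley", "containedInPlace", "California"),
    Triple("Berkeley", "latitude", "37.8703"),  # terminal value, not another node
]


def outgoing(subject: str, triples: list[Triple]) -> list[tuple[str, str]]:
    """Return (property, value) pairs for edges leaving `subject`."""
    return [(t.predicate, t.obj) for t in triples if t.subject == subject]


def incoming(obj: str, triples: list[Triple]) -> list[tuple[str, str]]:
    """Return (subject, property) pairs for edges pointing at `obj`."""
    return [(t.subject, t.predicate) for t in triples if t.obj == obj]


if __name__ == "__main__":
    print(outgoing("Berkeley", TRIPLES))
    print(incoming("California", TRIPLES))
```

Storing each edge once and filtering by either end keeps the sketch small; a real graph store would index both directions instead of scanning.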
@@ -83,21 +83,21 @@ Note that not all statistical variables have observations for all places or othe

 For example, inspecting [Health > Health Insurance (Household) > No Health Insurance > Households Without Health Insurance](https://datacommons.org/tools/statvar#sv=Count_Household_NoHealthInsurance){: target="_blank"} shows us that the statistical variable `Count_Household_NoHealthInsurance` is available in the United States at state, county, and city levels:

-![Stat Var Explorer]({{site.url}}/assets/images/dc/concept4.png){: width="900"}
+![Stat Var Explorer](/assets/images/dc/concept4.png){: width="900"}

 On the other hand, the [Average Retail Price of Electricity](https://datacommons.org/tools/statvar#Quarterly_Average_RetailPrice_Electricity=&sv=Quarterly_Average_RetailPrice_Electricity){: target="_blank"}, or `Quarterly_Average_RetailPrice_Electricity`, is only available at the state level in the US, not at the city or county level.

-![Stat Var Explorer]({{site.url}}/assets/images/dc/concept5.png){: width="900"}
+![Stat Var Explorer](/assets/images/dc/concept5.png){: width="900"}

 ## Unique identifier: DCID

 Every node has a unique identifier, called a Data Commons ID, or DCID. In the [Knowledge Graph browser](https://datacommons.org/browser/){: target="_blank"}, you can view the DCID for any node or edge. For example, the DCID for the city of Berkeley is `geoId/0606000`:

-![KG browser]({{site.url}}/assets/images/dc/concept6.png){: width="600"}
+![KG browser](/assets/images/dc/concept6.png){: width="600"}

 DCIDs are not restricted to entities; statistical variables also have DCIDs. For example, the DCID for the Gini Index of Economic Activity is `GiniIndex_EconomicActivity`:

-![Stat Var Explorer]({{site.url}}/assets/images/dc/concept7.png){: width="900"}
+![Stat Var Explorer](/assets/images/dc/concept7.png){: width="900"}

 ### Task: Find a DCID for an entity or variable {#find-dcid}

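The example links in this diff follow a few simple URL shapes keyed by DCID. The sketch below rebuilds them with Python's standard library. These patterns are inferred from the links on this page (the Knowledge Graph browser, Statistical Variable Explorer, and Timeline Explorer examples), not from a documented API, so treat them as assumptions. The helper names are hypothetical. Note that the Timeline Explorer link percent-encodes the `/` in `geoId/0606000` as `%2F`, while browser links keep slashes (for example, `dc/base/AustraliaStatistics`).

```python
from urllib.parse import quote

DC_BASE = "https://datacommons.org"


def browser_url(dcid: str) -> str:
    """Knowledge Graph browser page for a node; slashes in the DCID are kept as-is."""
    return f"{DC_BASE}/browser/{dcid}"


def statvar_url(stat_var_dcid: str) -> str:
    """Statistical Variable Explorer page (assumed `#sv=` fragment, as in the example link)."""
    return f"{DC_BASE}/tools/statvar#sv={quote(stat_var_dcid, safe='')}"


def timeline_url(place_dcid: str, stat_var_dcid: str) -> str:
    """Timeline Explorer link; '/' in the place DCID becomes %2F inside the fragment."""
    place = quote(place_dcid, safe="")
    stat_var = quote(stat_var_dcid, safe="")
    return f"{DC_BASE}/tools/timeline#place={place}&statsVar={stat_var}"


if __name__ == "__main__":
    print(browser_url("City"))
    print(statvar_url("Count_Household_NoHealthInsurance"))
    print(timeline_url("geoId/0606000", "Median_Income_Person"))
```

Passing `safe=""` to `quote` is what turns `/` into `%2F`; underscores and alphanumerics are never escaped, so the variable DCIDs pass through unchanged.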
@@ -110,7 +110,7 @@ To find the DCID for a place using the datacommons.org website:
 1. Scroll to the **In Arcs** section to look up the places of interest.
 1. If necessary, continue to drill down on links until you find the place of interest.

-![KG browser]({{site.url}}/assets/images/dc/concept8.png){: width="900"}
+![KG browser](/assets/images/dc/concept8.png){: width="900"}

 To find the DCID for a place using other methods:

@@ -124,7 +124,7 @@ To find the DCID for a statistical variable using the datacommons.org website:
 1. Search for the variable of interest, and optionally filter by data source and dataset.
 1. Look under the heading for the DCID.

-![Stat Var Explorer]({{site.url}}/assets/images/dc/concept9.png){: width="900"}
+![Stat Var Explorer](/assets/images/dc/concept9.png){: width="900"}

 To find the DCID for a statistical variable using other methods:

@@ -139,7 +139,7 @@ Other properties are links to other entities/events/ etc. In the Knowledge Graph

 For example, in this node for the city of Addis Ababa, Ethiopia, the `typeOf` and `containedInPlace` edges link to other entities, namely `City` and `Ethiopia`, whereas all the other values are terminal.

-![KG browser]({{site.url}}/assets/images/dc/concept10.png){: width="600"}
+![KG browser](/assets/images/dc/concept10.png){: width="600"}

 Note that the DCID for a property is the same as its name.

@@ -151,25 +151,30 @@ For example, the value of the statistical variable [`Median Age of Female Popula

 Time series made up of many observations underlie the data available in the [Timeline Explorer](https://datacommons.org/tools/timeline){: target="_blank"} and timeline graphs. For example, here is the [median income in Berkeley, CA over a period of ten years](https://datacommons.org/tools/timeline#place=geoId%2F0606000&statsVar=Median_Income_Person){: target="_blank"}, according to the US Census Bureau:

-![Timeline Explorer]({{site.url}}/assets/images/dc/concept11.png){: width="900"}
+![Timeline Explorer](/assets/images/dc/concept11.png){: width="900"}

 ## Provenance, Source, Dataset
+{: #sources}

 Every node and triple also has some important properties that indicate the origin of the data.

-- [`Provenance`](https://datacommons.org/browser/Provenance){: target="_blank"}: All triples have a provenance, typically the URL of the data provider's website; for example, [www.abs.gov.au](https://datacommons.org/browser/dc/base/AustraliaStatistics){: target="_blank"}. In addition, all entity types also have a provenance, defined with a DCID, such as [`AustraliaStatistics`](https://datacommons.org/browser/dc/base/AustraliaStatistics){: target="_blank"}. (For many property types, which are defined by the Data Commons schema, their provenance is always datacommons.org.)
-- [`Source`](https://datacommons.org/browser/Source){: target="_blank"}: This is a property of a provenance, and a dataset, usually the name of an organization that provides the data or the schema. For example, for provenance [www.abs.gov.au](www.abs.gov.au), the source is the [Australian Bureau of Statistics](https://datacommons.org/browser/dc/s/AustralianBureauOfStatistics){: target="_blank"}.
-- [`Dataset`](https://datacommons.org/browser/Dataset){: target="_blank"}: This is the name of a specific dataset provided by a provider. Many sources provide multiple datasets. For example, the source Australian Bureau of Statistics provides two datasets, [Australia Statistics](https://datacommons.org/browser/dc/d/AustralianBureauOfStatistics_AustraliaStatistics){: target="_blank"} (not to be confused with the provenance above), and [Australia Subnational Administrative Boundaries](https://datacommons.org/browser/dc/d/AustralianBureauOfStatistics_AustraliaSubnationalAdministrativeBoundaries){: target="_blank"}.
+- [`Source`](https://datacommons.org/browser/Source){: target="_blank"}: This is the organization that provides the data, and is usually specified as the name of the organization; for example, [Australian Bureau of Statistics](https://datacommons.org/browser/dc/s/AustralianBureauOfStatistics){: target="_blank"}.
+- [`Dataset`](https://datacommons.org/browser/Dataset){: target="_blank"}: This is the name of a specific dataset provided by a source. In relational database terminology, a `dataset` roughly corresponds to a "database". Many sources provide multiple datasets. For example, the source Australian Bureau of Statistics provides two datasets, [Australia Statistics](https://datacommons.org/browser/dc/d/AustralianBureauOfStatistics_AustraliaStatistics){: target="_blank"}, and [Australia Subnational Administrative Boundaries](https://datacommons.org/browser/dc/d/AustralianBureauOfStatistics_AustraliaSubnationalAdministrativeBoundaries){: target="_blank"}.
+- [`Provenance`](https://datacommons.org/browser/Provenance){: target="_blank"}: A provenance is a subset of a dataset. For small datasets, it may represent the entire dataset. For example, Sweden Census is both a [dataset](https://datacommons.org/browser/dc/d/StatisticsSweden_SwedenCensus){: target="_blank"} and a [provenance](https://datacommons.org/browser/dc/base/Sweden_Census){: target="_blank"}.
+
+For larger datasets, a provenance usually represents a subset of the dataset, roughly corresponding to a "table" in relational database terminology. Thus, there may be several provenances for a given dataset. For example, [Brazil VIS DATA 3](https://datacommons.org/browser/dc/d/BrazilMinistryOfDevelopmentAndSocialAssistanceFamilyAndFightAgainstHunger_BrazilVisData3){: target="_blank"} is a dataset that comprises 2 provenances: [Brazil Food Distribution](https://datacommons.org/browser/dc/base/Brazil_FoodDsitribution){: target="_blank"} and [Brazil Rural Development Program](https://datacommons.org/browser/dc/base/Brazil_RuralDevelopmentProgram){: target="_blank"}.
+
+In Data Commons, a provenance is the physical unit of an import, and thus contains detailed import information. A dataset is more of an abstract concept; each provenance has a property that points to the dataset it belongs to.

-![Knowledge graph]({{site.url}}/assets/images/dc/concept12.png){: width="600"}
+![Knowledge graph](/assets/images/dc/concept12.png){: width="600"}

 Note that a given statistical variable may have multiple provenances, since many data sets define the same variables. You can see the list of all the data sources for a given statistical variable in the Statistical Variable Explorer. For example, the explorer shows multiple sources (Censuses from India, Mexico, Vietnam, OECD, World Bank, etc.) for the variable [Life Expectancy](https://datacommons.org/tools/statvar#LifeExpectancy_Person=&sv=LifeExpectancy_Person){: target="_blank"}:

-![Stat Var Explorer]({{site.url}}/assets/images/dc/concept13.png){: width="900"}
+![Stat Var Explorer](/assets/images/dc/concept13.png){: width="900"}

 You can see a list of all sources and data sets in several places:

 - The [Data Sources](https://datacommons.org/data/){: target="_blank"} pages
 - The **Data source** and **Dataset** drop-down menus in the Statistical Variable Explorer

-![Stat Var Explorer]({{site.url}}/assets/images/dc/concept14.png){: width="600"}
+![Stat Var Explorer](/assets/images/dc/concept14.png){: width="600"}
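The Source, Dataset, and Provenance relationship in the new `data_model.md` text can be modeled as three small records. Each provenance points to its dataset and each dataset to its source, which mirrors the "database" and "table" analogy the text draws. This is a minimal Python sketch, not the Data Commons schema. The class and function names are hypothetical, and the two source names are inferred from the dataset DCIDs for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str  # organization that provides the data


@dataclass(frozen=True)
class Dataset:
    name: str
    source: Source  # the source that provides this dataset


@dataclass(frozen=True)
class Provenance:
    name: str
    dataset: Dataset  # each provenance points to the dataset it belongs to


def provenances_by_dataset(provenances: list[Provenance]) -> dict[str, list[str]]:
    """Group provenance names under the dataset they belong to (input order preserved)."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for prov in provenances:
        grouped[prov.dataset.name].append(prov.name)
    return dict(grouped)


# Illustrative records based on the examples in the text. The source
# names are assumptions inferred from the dataset DCIDs.
statistics_sweden = Source("Statistics Sweden")
brazil_ministry = Source(
    "Brazil Ministry of Development and Social Assistance, Family and Fight Against Hunger"
)

sweden_census = Dataset("Sweden Census", statistics_sweden)
vis_data_3 = Dataset("Brazil VIS DATA 3", brazil_ministry)

PROVENANCES = [
    # Small dataset: a single provenance covers the whole dataset.
    Provenance("Sweden Census", sweden_census),
    # Larger dataset: several provenances, each roughly a "table".
    Provenance("Brazil Food Distribution", vis_data_3),
    Provenance("Brazil Rural Development Program", vis_data_3),
]

if __name__ == "__main__":
    for dataset, provs in provenances_by_dataset(PROVENANCES).items():
        print(dataset, "->", provs)
```

The link runs from provenance to dataset, not the reverse, which follows the text's statement that each provenance has a property pointing to the dataset it belongs to. Grouping by dataset then recovers the "one dataset, many provenances" view.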

glossary.md

Lines changed: 16 additions & 2 deletions
@@ -7,17 +7,22 @@ parent: How to use Data Commons
 ---

 {: .no_toc}
-# Glossary of Common Terms
+# Glossary of common terms

 {: .no_toc}
 This page contains a selection of key terms important to understanding the structure of data within Data Commons.

-## Term List
+## Term list
 {: .no_toc}

 * TOC
 {:toc}

+### [Dataset](https://datacommons.org/browser/Dataset){: target="_blank"}
+{: #dataset}
+
+A collection of data, provided by a [source](#source). For example, [Brazil Census](https://datacommons.org/browser/dc/d/BrazilianInstituteOfGeographyAndStatisticsIbge_BrazilCensus){: target="_blank"} is a dataset provided by the source Brazilian Institute of Geography and Statistics. See [Key concepts](data_model.md#sources) for more details.
+
 ### [Date](https://datacommons.org/browser/date){: target="_blank"}
 {: #date}

@@ -75,13 +80,22 @@ When a variable has values from multiple [facets](#facet), one facet is designat

 Attributes of the entities in the Data Commons knowledge graph. Instead of statistical values, properties describe unchanging characteristics of entities, like [scientific name](https://datacommons.org/browser/scientificName){: target="_blank"}.

+### [Provenance](https://datacommons.org/browser/Provenance){: target="_blank"}
+
+A subset of data in a [dataset](#dataset). For small datasets, the provenance may represent the entire dataset. Larger datasets may comprise multiple provenances. See [Key concepts](data_model.md#sources) for more details.
+
 ### [Scaling Factor](https://datacommons.org/browser/scalingFactor){: target="_blank"}
 {: #scaling-factor}

 Property of [variables](#variable) that measure proportions, used in conjunction with the measurementDenominator property to indicate the multiplication factor applied to the proportion's denominator (with the measurement value as the final result of the multiplication) when the numerator and denominator are not equal.

 As an example, in 1999, [approximately 36% of Canadians were Internet users](https://datacommons.org/browser/dc/o/0d9e3dd3y6yt3){: target="_blank"}. Here the measured value of `Count_Person_IsInternetUser_PerCapita` is 36, and the scaling factor or denominator for this per capita measurement is 100. Without the scaling factor, we would interpret the value to be 36/1, or 3600%.

+### [Source](https://datacommons.org/browser/Source){: target="_blank"}
+{: #source}
+
+The provider of a dataset, usually an organization or agency. For example, [Brazilian Institute of Geography and Statistics](https://datacommons.org/browser/dc/s/BrazilianInstituteOfGeographyAndStatisticsIbge) is a source that provides census and statistical datasets. See [Key concepts](data_model.md#sources) for more details.
+
 ### [Statistical Variable](https://datacommons.org/browser/StatisticalVariable){: target="_blank"}
 {: #variable}

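You can verify the scaling-factor arithmetic from the glossary's Scaling Factor entry directly. The sketch below is minimal and assumes the convention that entry describes: the measured value equals the proportion multiplied by the scaling factor, so dividing by the factor recovers the proportion, and a missing factor behaves like 1. The function name `raw_proportion` is hypothetical.

```python
def raw_proportion(value: float, scaling_factor: float = 1.0) -> float:
    """Undo the scaling: value = proportion * scaling_factor (factor defaults to 1)."""
    if scaling_factor == 0:
        raise ValueError("scaling_factor must be non-zero")
    return value / scaling_factor


if __name__ == "__main__":
    # Count_Person_IsInternetUser_PerCapita, Canada, 1999: value 36, scaling factor 100.
    with_factor = raw_proportion(36, 100)  # 0.36, read as 36%
    without_factor = raw_proportion(36)    # 36.0, the 3600% misreading the glossary warns about
    print(f"{with_factor:.0%} vs {without_factor:.0%}")
```

This sketch treats the scaling factor as a plain divisor. It does not model the separate `measurementDenominator` property that the entry mentions.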