Each node consists of some kind of entity or value, and each edge describes some kind of property. More specifically, each node consists of the following objects:
@@ -47,7 +47,7 @@ As in other knowledge graphs, each pair of connected nodes is a _triple_ consist
You can get information about a node and its edges by looking at the [Knowledge Graph browser](https://datacommons.org/browser){: target="_blank"}. If you know the [DCID](#unique-identifier-dcid) for a node, you can access it directly by typing <code>https://datacommons.org/browser/<var>DCID</var></code>. For example, here is the entry for the `City` node, available at [https://datacommons.org/browser/City](https://datacommons.org/browser/City){: target="_blank"}:
Every node entry shows a list of outgoing edges, or _properties,_ and incoming edges. [Properties](#property) are discussed in more detail below.
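As a minimal sketch of the URL pattern described above, the following helper builds a Knowledge Graph browser link from a DCID. The function name `browser_url` is hypothetical; only the `https://datacommons.org/browser/<DCID>` pattern comes from this page.

```python
from urllib.parse import quote

BROWSER_BASE = "https://datacommons.org/browser/"


def browser_url(dcid: str) -> str:
    """Return the Knowledge Graph browser URL for a DCID (hypothetical helper)."""
    # Keep "/" unescaped: DCIDs such as "geoId/0606000" carry it as part of the path.
    return BROWSER_BASE + quote(dcid, safe="/")


print(browser_url("City"))
print(browser_url("geoId/0606000"))
```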
@@ -83,21 +83,21 @@ Note that not all statistical variables have observations for all places or othe
For example, inspecting [Health > Health Insurance (Household) > No Health Insurance > Households Without Health Insurance](https://datacommons.org/tools/statvar#sv=Count_Household_NoHealthInsurance){: target="_blank"} shows us that the statistical variable `Count_Household_NoHealthInsurance` is available in the United States at state, county, and city levels:
-
{: width="900"}
+
{: width="900"}
On the other hand, the [Average Retail Price of Electricity](https://datacommons.org/tools/statvar#Quarterly_Average_RetailPrice_Electricity=&sv=Quarterly_Average_RetailPrice_Electricity){: target="_blank"}, or `Quarterly_Average_RetailPrice_Electricity`, is available only at the state level in the US, not at the city or county level.
-
{: width="900"}
+
{: width="900"}
## Unique identifier: DCID
Every node has a unique identifier, called a Data Commons ID, or DCID. In the [Knowledge Graph browser](https://datacommons.org/browser/){: target="_blank"}, you can view the DCID for any node or edge. For example, the DCID for the city of Berkeley is `geoId/0606000`:
DCIDs are not restricted to entities; statistical variables also have DCIDs. For example, the DCID for the Gini Index of Economic Activity is `GiniIndex_EconomicActivity`:
-
{: width="900"}
+
{: width="900"}
### Task: Find a DCID for an entity or variable {#find-dcid}
@@ -110,7 +110,7 @@ To find the DCID for a place using the datacommons.org website:
1. Scroll to the **In Arcs** section to look up the places of interest.
1. If necessary, continue to drill down on links until you find the place of interest.
@@ -124,7 +124,7 @@ To find the DCID for a statistical variable using the datacommons.org website:
1. Search for the variable of interest, and optionally filter by data source and dataset.
1. Look under the heading for the DCID.
-
{: width="900"}
+
{: width="900"}
To find the DCID for a statistical variable using other methods:
@@ -139,7 +139,7 @@ Other properties are links to other entities/events/ etc. In the Knowledge Graph
For example, in this node for the city of Addis Ababa, Ethiopia, the `typeOf` and `containedInPlace` edges link to other entities, namely `City` and `Ethiopia`, whereas all the other values are terminal.
Note that the DCID for a property is the same as its name.
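One way to picture the distinction between edges that link to other nodes and edges that hold terminal values is a small triple structure. This is only an illustrative sketch: the `Triple` type, the `example/...` identifiers, and the `name` value are hypothetical placeholders, not actual entries from the knowledge graph.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str       # DCID of the node the edge starts from
    predicate: str     # property name, which is also the property's DCID
    obj: str           # DCID of another node, or a terminal value
    obj_is_node: bool  # True when obj links to another node


# Hypothetical identifiers and values, for illustration only.
triples = [
    Triple("example/AddisAbaba", "typeOf", "City", True),
    Triple("example/AddisAbaba", "containedInPlace", "example/Ethiopia", True),
    Triple("example/AddisAbaba", "name", "Addis Ababa", False),
]


def linked_nodes(items: list[Triple]) -> list[str]:
    """Return the objects that are themselves nodes, skipping terminal values."""
    return [t.obj for t in items if t.obj_is_node]


print(linked_nodes(triples))
```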
@@ -151,25 +151,30 @@ For example, the value of the statistical variable [`Median Age of Female Popula
Time series made up of many observations underlie the data available in the [Timeline Explorer](https://datacommons.org/tools/timeline){: target="_blank"} and timeline graphs. For example, here is the [median income in Berkeley, CA over a period of ten years](https://datacommons.org/tools/timeline#place=geoId%2F0606000&statsVar=Median_Income_Person){: target="_blank"}, according to the US Census Bureau:
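The Timeline Explorer link above encodes the place DCID and the statistical variable DCID in the URL fragment. The following sketch reproduces that link; the helper name `timeline_url` is hypothetical, and the `place` and `statsVar` parameter names are taken from the link itself. Whether the tool accepts other parameter combinations is not confirmed here.

```python
from urllib.parse import quote


def timeline_url(place_dcid: str, stat_var_dcid: str) -> str:
    """Build a Timeline Explorer link shaped like the one in this page."""
    # Percent-encode everything, including "/", so "geoId/0606000" becomes "geoId%2F0606000".
    place = quote(place_dcid, safe="")
    stat_var = quote(stat_var_dcid, safe="")
    return f"https://datacommons.org/tools/timeline#place={place}&statsVar={stat_var}"


print(timeline_url("geoId/0606000", "Median_Income_Person"))
```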
Every node and triple also has some important properties that indicate the origin of the data.
-
-[`Provenance`](https://datacommons.org/browser/Provenance){: target="_blank"}: All triples have a provenance, typically the URL of the data provider's website; for example, [www.abs.gov.au](https://datacommons.org/browser/dc/base/AustraliaStatistics){: target="_blank"}. In addition, all entity types also have a provenance, defined with a DCID, such as [`AustraliaStatistics`](https://datacommons.org/browser/dc/base/AustraliaStatistics){: target="_blank"}. (For many property types, which are defined by the Data Commons schema, their provenance is always datacommons.org.)
-
-[`Source`](https://datacommons.org/browser/Source){: target="_blank"}: This is a property of a provenance, and a dataset, usually the name of an organization that provides the data or the schema. For example, for provenance [www.abs.gov.au](www.abs.gov.au), the source is the [Australian Bureau of Statistics](https://datacommons.org/browser/dc/s/AustralianBureauOfStatistics){: target="_blank"}.
-
-[`Dataset`](https://datacommons.org/browser/Dataset){: target="_blank"}: This is the name of a specific dataset provided by a provider. Many sources provide multiple datasets. For example, the source Australian Bureau of Statistics provides two datasets, [Australia Statistics](https://datacommons.org/browser/dc/d/AustralianBureauOfStatistics_AustraliaStatistics){: target="_blank"} (not to be confused with the provenance above), and [Australia Subnational Administrative Boundaries](https://datacommons.org/browser/dc/d/AustralianBureauOfStatistics_AustraliaSubnationalAdministrativeBoundaries){: target="_blank"}.
+
- [`Source`](https://datacommons.org/browser/Source){: target="_blank"}: This is the organization that provides the data, and is usually specified as the name of the organization; for example, [Australian Bureau of Statistics](https://datacommons.org/browser/dc/s/AustralianBureauOfStatistics){: target="_blank"}.
+
- [`Dataset`](https://datacommons.org/browser/Dataset){: target="_blank"}: This is the name of a specific dataset provided by a source. In relational database terminology, a `dataset` roughly corresponds to a "database". Many sources provide multiple datasets. For example, the source Australian Bureau of Statistics provides two datasets, [Australia Statistics](https://datacommons.org/browser/dc/d/AustralianBureauOfStatistics_AustraliaStatistics){: target="_blank"} and [Australia Subnational Administrative Boundaries](https://datacommons.org/browser/dc/d/AustralianBureauOfStatistics_AustraliaSubnationalAdministrativeBoundaries){: target="_blank"}.
+
- [`Provenance`](https://datacommons.org/browser/Provenance){: target="_blank"}: A provenance is a subset of a dataset. For small datasets, it may represent the entire dataset. For example, Sweden Census is both a [dataset](https://datacommons.org/browser/dc/d/StatisticsSweden_SwedenCensus){: target="_blank"} and a [provenance](https://datacommons.org/browser/dc/base/Sweden_Census){: target="_blank"}.
+
+
For larger datasets, a provenance usually represents a subset of the dataset, roughly corresponding to a "table" in relational database terminology. Thus, there may be several provenances for a given dataset. For example, [Brazil VIS DATA 3](https://datacommons.org/browser/dc/d/BrazilMinistryOfDevelopmentAndSocialAssistanceFamilyAndFightAgainstHunger_BrazilVisData3){: target="_blank"} is a dataset that comprises two provenances: [Brazil Food Distribution](https://datacommons.org/browser/dc/base/Brazil_FoodDsitribution){: target="_blank"} and [Brazil Rural Development Program](https://datacommons.org/browser/dc/base/Brazil_RuralDevelopmentProgram){: target="_blank"}.
+
+
In Data Commons, a provenance is the physical unit of an import, and thus contains detailed import information. A dataset is more of an abstract concept; each provenance has a property that points to the dataset it belongs to.
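The source, dataset, and provenance relationship can be sketched as three small record types, with each provenance pointing to the dataset it belongs to. This is an illustrative model only, not the Data Commons schema: the class names and fields are assumptions, and the source identifier and name below are hypothetical placeholders. The dataset and provenance DCIDs are copied from the links in this section.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    dcid: str
    name: str


@dataclass(frozen=True)
class Dataset:
    dcid: str
    name: str
    source: Source


@dataclass(frozen=True)
class Provenance:
    dcid: str
    name: str
    dataset: Dataset  # each provenance points to the dataset it belongs to


# Hypothetical source identifier and name; the real DCID is not given in this section.
ministry = Source("example/BrazilMinistry", "Example Brazilian ministry")
vis_data = Dataset(
    "dc/d/BrazilMinistryOfDevelopmentAndSocialAssistanceFamilyAndFightAgainstHunger_BrazilVisData3",
    "Brazil VIS DATA 3",
    ministry,
)
provenances = [
    Provenance("dc/base/Brazil_FoodDsitribution", "Brazil Food Distribution", vis_data),
    Provenance("dc/base/Brazil_RuralDevelopmentProgram", "Brazil Rural Development Program", vis_data),
]


def provenances_by_dataset(items: list[Provenance]) -> dict[str, list[str]]:
    """Group provenance names under the DCID of their dataset."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for p in items:
        grouped[p.dataset.dcid].append(p.name)
    return dict(grouped)


print(provenances_by_dataset(provenances))
```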
Note that a given statistical variable may have multiple provenances, since many data sets define the same variables. You can see the list of all the data sources for a given statistical variable in the Statistical Variable Explorer. For example, the explorer shows multiple sources (Censuses from India, Mexico, Vietnam, OECD, World Bank, etc.) for the variable [Life Expectancy](https://datacommons.org/tools/statvar#LifeExpectancy_Person=&sv=LifeExpectancy_Person){: target="_blank"}:
-
{: width="900"}
+
{: width="900"}
You can see a list of all sources and data sets in several places:
- The [Data Sources](https://datacommons.org/data/){: target="_blank"} pages
- The **Data source** and **Dataset** drop-down menus in the Statistical Variable Explorer
-
{: width="600"}
+
{: width="600"}
A collection of data, provided by a [source](#source). For example, [Brazil Census](https://datacommons.org/browser/dc/d/BrazilianInstituteOfGeographyAndStatisticsIbge_BrazilCensus){: target="_blank"} is a dataset provided by the source Brazilian Institute of Geography and Statistics. See [Key concepts](data_model.md#sources) for more details.
@@ -75,13 +80,22 @@ When a variable has values from multiple [facets](#facet), one facet is designat
Attributes of the entities in the Data Commons knowledge graph. Instead of statistical values, properties describe unchanging characteristics of entities, like [scientific name](https://datacommons.org/browser/scientificName){: target="_blank"}.
A subset of data in a [dataset](#dataset). For small datasets, the provenance may represent the entire dataset. Larger datasets may comprise multiple provenances. See [Key concepts](data_model.md#sources) for more details.
Property of [variables](#variable) that measure proportions, used in conjunction with the measurementDenominator property to indicate the multiplication factor applied to the proportion's denominator (with the measurement value as the final result of the multiplication) when the numerator and denominator are not equal.
As an example, in 1999, [approximately 36% of Canadians were Internet users](https://datacommons.org/browser/dc/o/0d9e3dd3y6yt3){: target="_blank"}. Here the measured value of `Count_Person_IsInternetUser_PerCapita` is 36, and the scaling factor or denominator for this per capita measurement is 100. Without the scaling factor, we would interpret the value to be 36/1, or 3600%.
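The arithmetic in this example can be checked with a short sketch. The function name `as_fraction` is hypothetical; it only divides the measured value by the scaling factor, as described above.

```python
def as_fraction(value: float, scaling_factor: float = 1.0) -> float:
    """Convert a measured value to a fraction using its scaling factor."""
    return value / scaling_factor


# With the scaling factor of 100, the value 36 reads as 36%.
print(f"{as_fraction(36, 100):.0%}")

# Without it, the same value would be read as 36/1, or 3600%.
print(f"{as_fraction(36):.0%}")
```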
The provider of a dataset, usually an organization or agency. For example, [Brazilian Institute of Geography and Statistics](https://datacommons.org/browser/dc/s/BrazilianInstituteOfGeographyAndStatisticsIbge) is a source that provides census and statistical datasets. See [Key concepts](data_model.md#sources) for more details.