Merged
Changes from 64 commits
77 commits
d6241c9
remove references to dataclass objects
kmoscoe Jun 12, 2025
101c5fd
Fix a copy-paste error.
kmoscoe Jun 12, 2025
504d852
remove extra file
kmoscoe Jun 12, 2025
1697b22
Merge branch 'datacommonsorg:master' into master
kmoscoe Jun 12, 2025
426a81c
Merge branch 'master' of https://github.com/datacommonsorg/docsite
kmoscoe Jun 24, 2025
a40c7c8
Merge branch 'master' of https://github.com/kmoscoe/docsite
kmoscoe Jun 24, 2025
4c69c15
Merge branch 'datacommonsorg:master' into master
kmoscoe Sep 17, 2025
dd5b50f
Merge branch 'datacommonsorg:master' into master
kmoscoe Sep 23, 2025
23cb4c4
Merge branch 'datacommonsorg:master' into master
kmoscoe Sep 23, 2025
1157311
Merge branch 'datacommonsorg:master' into master
kmoscoe Sep 24, 2025
23d3429
Merge branch 'datacommonsorg:master' into master
kmoscoe Sep 25, 2025
2a3409f
Merge branch 'datacommonsorg:master' into master
kmoscoe Sep 30, 2025
516ed75
Merge branch 'master' of https://github.com/datacommonsorg/docsite
kmoscoe Oct 7, 2025
564457c
Merge branch 'datacommonsorg:master' into master
kmoscoe Oct 7, 2025
7052453
Merge branch 'datacommonsorg:master' into master
kmoscoe Oct 7, 2025
5da50d1
Merge branch 'datacommonsorg:master' into master
kmoscoe Oct 8, 2025
a011388
Merge branch 'master' of https://github.com/kmoscoe/docsite
kmoscoe Oct 8, 2025
12b9749
Merge branch 'datacommonsorg:master' into master
kmoscoe Oct 14, 2025
f4861e4
Merge branch 'datacommonsorg:master' into master
kmoscoe Oct 15, 2025
33234d2
Fix a copy-paste error.
kmoscoe Jun 12, 2025
169781f
Merge branch 'datacommonsorg:master' into master
kmoscoe Oct 24, 2025
f3a9005
Merge branch 'master' of https://github.com/datacommonsorg/docsite
kmoscoe Oct 27, 2025
205cb04
Merge branch 'datacommonsorg:master' into master
kmoscoe Nov 3, 2025
3daf24f
Merge branch 'master' of https://github.com/datacommonsorg/docsite
kmoscoe Nov 5, 2025
7599eff
Merge branch 'datacommonsorg:master' into master
kmoscoe Nov 5, 2025
4dd0251
Merge branch 'master' of https://github.com/kmoscoe/docsite
kmoscoe Nov 5, 2025
54ca4cf
Remove unused file
kmoscoe Nov 5, 2025
5ab6c5c
Merge branch 'master' of https://github.com/datacommonsorg/docsite
kmoscoe Nov 11, 2025
1c2a36c
Merge branch 'datacommonsorg:master' into master
kmoscoe Nov 24, 2025
5d2800b
Merge branch 'datacommonsorg:master' into master
kmoscoe Nov 25, 2025
b2cbfd4
Merge branch 'datacommonsorg:master' into master
kmoscoe Dec 3, 2025
494375d
Merge branch 'datacommonsorg:master' into master
kmoscoe Dec 9, 2025
f4da5c3
Merge branch 'datacommonsorg:master' into master
kmoscoe Dec 9, 2025
73c0d41
Merge branch 'datacommonsorg:master' into master
kmoscoe Dec 9, 2025
735db87
Merge branch 'datacommonsorg:master' into master
kmoscoe Dec 16, 2025
d165d9e
Merge branch 'datacommonsorg:master' into master
kmoscoe Dec 17, 2025
4559295
Merge branch 'master' of https://github.com/datacommonsorg/docsite
kmoscoe Dec 17, 2025
85d15a4
Merge branch 'master' of https://github.com/kmoscoe/docsite
kmoscoe Dec 17, 2025
d80ed79
Merge branch 'master' of https://github.com/datacommonsorg/docsite
kmoscoe Jan 13, 2026
ef9fe1f
Update Quickstart to use explicit schema
kmoscoe Jan 14, 2026
5a29614
Update custom_dc/quickstart.md
kmoscoe Jan 14, 2026
e291b91
Changes from Keyur
kmoscoe Jan 14, 2026
10f1329
Merge branch 'explicit' of https://github.com/kmoscoe/docsite into ex…
kmoscoe Jan 14, 2026
6184f92
Merge branch 'master' of https://github.com/datacommonsorg/docsite
kmoscoe Jan 14, 2026
de7d36b
Remove references to implicit schema in the custom data page
kmoscoe Jan 14, 2026
9859714
Remove implicit schema from config reference
kmoscoe Jan 14, 2026
bd73db0
More changes
kmoscoe Jan 14, 2026
90b24f7
Merge branch 'datacommonsorg:master' into explicit
kmoscoe Jan 14, 2026
cbf7ccd
Update custom_dc/config.md
kmoscoe Jan 14, 2026
63c8332
Update custom_dc/custom_data.md
kmoscoe Jan 14, 2026
5a5563f
add explanation of format option
kmoscoe Jan 27, 2026
bf1871d
Merge branch 'master' of https://github.com/datacommonsorg/docsite
kmoscoe Jan 28, 2026
329fe2c
Merge branch 'datacommonsorg:master' into master
kmoscoe Feb 4, 2026
c08d79a
Merge branch 'datacommonsorg:master' into master
kmoscoe Feb 4, 2026
663fa30
Merge branch 'master' of https://github.com/datacommonsorg/docsite
kmoscoe Feb 9, 2026
b1a3705
Merge branch 'master' of https://github.com/kmoscoe/docsite
kmoscoe Feb 9, 2026
31b45ba
Merge branch 'master' of https://github.com/kmoscoe/docsite into expl…
kmoscoe Feb 9, 2026
b8823ea
Start work on overhaul of custom entities page
kmoscoe Feb 9, 2026
a8788a4
Merge branch 'master' of https://github.com/datacommonsorg/docsite in…
kmoscoe Feb 17, 2026
340811c
Finish first draft of overhaul of custom entities page
kmoscoe Feb 17, 2026
81e0aeb
Merge branch 'datacommonsorg:master' into explicit
kmoscoe Feb 17, 2026
5fc3c19
Merge branch 'explicit' of https://github.com/kmoscoe/docsite into ex…
kmoscoe Feb 17, 2026
3523fe0
More changes
kmoscoe Feb 18, 2026
7f5e6a3
minor tweaks
kmoscoe Feb 18, 2026
efe2999
Update custom_dc/custom_entities.md
kmoscoe Feb 18, 2026
a1dab5c
Update custom_dc/custom_entities.md
kmoscoe Feb 18, 2026
7bae820
Update custom_dc/custom_entities.md
kmoscoe Feb 18, 2026
5089f7a
Fixes from Gemini
kmoscoe Feb 18, 2026
e96acd1
Merge branch 'explicit' of https://github.com/kmoscoe/docsite into ex…
kmoscoe Feb 18, 2026
5e7d5c4
Merge branch 'master' of https://github.com/datacommonsorg/docsite in…
kmoscoe Mar 2, 2026
16b4d2b
Add character limit to entity definitions
kmoscoe Mar 2, 2026
73e6876
Merge branch 'datacommonsorg:master' into explicit
kmoscoe Mar 2, 2026
64e75ac
Merge branch 'explicit' of https://github.com/kmoscoe/docsite into ex…
kmoscoe Mar 2, 2026
7c33b35
Merge branch 'master' of https://github.com/datacommonsorg/docsite in…
kmoscoe Mar 3, 2026
6a06ba1
Fix CSV formatting
kmoscoe Mar 3, 2026
de0060e
Incorporate changes from Keyur
kmoscoe Mar 9, 2026
35560d7
Merge branch 'master' into explicit
kmoscoe Mar 9, 2026
12 changes: 8 additions & 4 deletions custom_dc/custom_data.md
@@ -25,7 +25,7 @@ At a high level, you need to provide the following:
- All observations data must be in CSV format, using the schema described later.
- You must also provide a JSON configuration file, named `config.json`, that specifies how to map and resolve the CSV contents to the Data Commons schema knowledge graph. The contents of the JSON file are described below.

If you need to define new custom entities, please see [Define custom entities](custom_entities.md) for details.
If you need to define new non-place entities, please see [Define non-place entities](custom_entities.md) for details.

{: #dir}
### Files and directory structure
@@ -58,7 +58,10 @@ In addition, even if you aggregate by geographical area, you may want to measure

#### Entities and entity types

Schema.org and the base Data Commons knowledge graph define entity types for just about everything in the world. An _entity type_ is a high-level concept, and is derived directly from a [`Class`](https://datacommons.org/browser/Class){: target="_blank"} type. The most common entity types in Data Commons are place types, such as `City`, `Country`, `AdministrativeArea1`, etc. Examples of other entity types are `Hospital`, `PublicSchool`, `Company`, `BusStation`, `Campground`, `Library` etc. It is rare that you would need to create a new entity type, unless you are working in a highly specialized domain.
Schema.org and the base Data Commons knowledge graph define entity types for just about everything in the world. An _entity type_ is a high-level concept, and is derived directly from a [`Class`](https://datacommons.org/browser/Class){: target="_blank"} type. Entity types play two roles in your data:
- The thing you are measuring, known as the `populationType` in Data Commons. Often this is a `Person`, which is a commonly used population in Data Commons. But it could be something else entirely, like the beds in a hospital, the price of a commodity, Olympic medals won by a country, or the surface area of an ocean.
- The level at which you want to aggregate the data. Most commonly in Data Commons this is a place type such as `City`, `Country`, `AdministrativeArea1`, etc. But it could also be a non-place entity type, such as an organization or a company. Examples of other entity types are `Hospital`, `PublicSchool`, `Company`, `BusStation`, `Campground`, and `Library`.
It is rare that you would need to create a new entity type, unless you are working in a highly specialized domain.

An _entity_ is an instance of an entity type. For example, for `PublicSchool`, base Data Commons has many U.S. schools in its knowledge graph, such as [`nces/010162001665`](https://datacommons.org/browser/nces/010162001665){: target="_blank"} (Adams Elementary School) or [`nces/010039000201`](https://datacommons.org/browser/nces/010039000201){: target="_blank"} (Wylam Elementary School). Base Data Commons contains thousands of places and other entities, but it's possible that it does not have specific entities that you need. For example, it has about 100 instances of `Company`, but you may want data for other companies besides those. As another example, let's say your organization wants to collect (possibly private) data about different divisions or departments of your org; in this case you would need to define entities for them.

@@ -100,7 +103,7 @@ To search using the Python APIs:

Your data undoubtedly contains metrics and observed values. In Data Commons, the metrics themselves are known as statistical variables, and the time series data, or values over time, are known as observations. While observations are always numeric, statistical variables must be defined as _nodes_ in the Data Commons knowledge graph.

Data Commons already has thousands of statistical variables in its knowledge graph; you may be able to simply reuse existing ones. To browse and search for existing variables, see the [Statistical Variable Explorer](https://datacommons.org/tools/statvar){: target="_blank"}.
Data Commons already has thousands of statistical variables in its knowledge graph; you may be able to simply reuse or extend existing ones. To browse and search for existing variables, see the [Statistical Variable Explorer](https://datacommons.org/tools/statvar){: target="_blank"}.

If you do need to define a statistical variable, it must follow a certain model. The variable consists of a measure (e.g. "median age") on a set of things of a certain type (e.g. "persons") that satisfy some set of constraints (e.g. "gender is female"). To explain what this means, consider the following example. Let's say your dataset contains the number of schools in U.S. cities, broken down by level (elementary, middle, secondary) and type (private, public), reported for each year (numbers are not real, but are just made up for the sake of example):

@@ -156,7 +159,7 @@ The names and order of the columns aren't important, as you can map them to the

## Prepare your data

Nodes in the Data Commons knowledge graph are defined in Metadata Content Format (MCF). For custom Data Commons, if you need to define new statistical variables, you must define them as new _nodes_ using MCF. When you define any variable in MCF, you explicitly assign it a DCID.
Nodes in the Data Commons knowledge graph are defined in Metadata Content Format (MCF). For custom Data Commons, if you need to define new statistical variables, you must define them as new _nodes_ using MCF. When you define any variable in MCF, you explicitly assign it a DCID. You can also _extend_ existing statistical variables, that is, add arbitrary key-value fields to them, by redefining them in an MCF file.

You can define your statistical variables in a single MCF file, or split them into as many separate MCF files as you like. MCF files must have a `.mcf` suffix.
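
As a rough illustration of what defining a variable in MCF involves, the Python sketch below renders one node as MCF text and guards the `.mcf` suffix before writing it. The helpers `render_mcf_node` and `write_mcf`, the DCID `Count_School_Public`, and the `schoolType` constraint are hypothetical names chosen to echo the schools example; they are not taken from this page, so check the properties your own variables need.

```python
from pathlib import Path


def render_mcf_node(dcid: str, properties: dict[str, str]) -> str:
    """Render one node as MCF text: a `Node:` line, then `property: value` lines."""
    lines = [f"Node: dcid:{dcid}"]
    lines += [f"{prop}: {value}" for prop, value in properties.items()]
    return "\n".join(lines) + "\n"


def write_mcf(path: Path, text: str) -> Path:
    """Write MCF text to disk, refusing any file name without a .mcf suffix."""
    if path.suffix != ".mcf":
        raise ValueError(f"MCF files must have a .mcf suffix, got: {path.name}")
    path.write_text(text, encoding="utf-8")
    return path


# Hypothetical variable: a count of schools constrained to public schools.
node_text = render_mcf_node(
    "Count_School_Public",
    {
        "typeOf": "dcid:StatisticalVariable",
        "name": '"Number of public schools"',
        "populationType": "dcid:School",
        "measuredProperty": "dcid:count",
        "statType": "dcid:measuredValue",
        "schoolType": "dcid:Public",  # assumed constraint property, for illustration only
    },
)
```

Calling `write_mcf(Path("schools.mcf"), node_text)` would then save the node. Under the same assumptions, extending an existing variable amounts to rendering a node with that variable's existing DCID and the extra key-value fields.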

@@ -327,6 +330,7 @@ Here are the rules for observation values:
- For null or not-a-number values, we recommend that you use blanks. (The strings `NaN`, `NA`, and `N/A` are also accepted.) These values will be ignored and not displayed in any charts or tables.
- Do not use negative numbers or inordinately large numbers to represent NaNs or nulls.
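
The null-handling rules above can be applied in a small preprocessing pass before import. The following Python sketch blanks out null-like strings and any sentinel numbers your source data may have used. The column names (`entity`, `date`, `value`), the `-999` sentinel, and the helper names are assumptions made for the example, not part of Data Commons.

```python
import csv
import io

# Null markers named in the rules above; all are normalized to the recommended blank.
NULL_MARKERS = frozenset({"", "NaN", "NA", "N/A"})


def normalize_value(raw: str, sentinels: frozenset[str] = frozenset()) -> str:
    """Return a blank for null-like or sentinel values, else the stripped value."""
    value = raw.strip()
    if value in NULL_MARKERS or value in sentinels:
        return ""
    return value


def clean_csv(text: str, value_column: str, sentinels: frozenset[str] = frozenset()) -> str:
    """Rewrite CSV text so that one column uses blanks for missing observations."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row[value_column] = normalize_value(row[value_column], sentinels)
        writer.writerow(row)
    return out.getvalue()


# Hypothetical input in which -999 was used as a "missing" sentinel.
sample = (
    "entity,date,value\n"
    "geoId/06,2020,-999\n"
    "geoId/06,2021,42\n"
    "geoId/06,2022,N/A\n"
)
cleaned = clean_csv(sample, "value", sentinels=frozenset({"-999"}))
```

Sentinels are passed in explicitly rather than guessed, since a negative or very large number can also be a legitimate observation.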

{: #json}
### Step 3: Write the JSON config file

You must define a `config.json` in the top-level directory where your CSV files are located. You need to provide these specifications: