Overview for coders
See also Coding style
A Token is a small piece of script from the game, together with a Loc: its location in the game code.
A token is the smallest piece of script produced by the parser and is usually the smallest meaningful unit, but there are methods to break a token up into subtokens. For example, a scope chain like root.primary_title.tier is a single token that the validator can break up into three smaller ones.
A Loc specifies the pathname of the file in which a Token appears (relative to the game directory or the mod directory), as well as the line number and column number within that file. It is mostly used for error reporting, both in the reports themselves and to control what gets reported.
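The subtoken idea can be sketched as follows. The Token and Loc fields and the split method here are hypothetical stand-ins, not tiger's actual definitions. The sketch assumes the token sits on a single line and gives each subtoken its own column, so that a report can point at the exact part that is wrong.

```rust
use std::path::PathBuf;

/// Hypothetical location: path relative to the game or mod directory,
/// plus line and column within that file.
#[derive(Clone, Debug)]
pub struct Loc {
    pub pathname: PathBuf,
    pub line: u32,
    pub column: u32,
}

/// Hypothetical token: the script text plus where it came from.
#[derive(Clone, Debug)]
pub struct Token {
    pub s: String,
    pub loc: Loc,
}

impl Token {
    pub fn new(s: &str, loc: Loc) -> Self {
        Token { s: s.to_string(), loc }
    }

    /// Split a token such as `root.primary_title.tier` on `sep`.
    /// Each subtoken gets a copy of the Loc with the column moved forward
    /// by the number of characters before it (assumes a single-line token).
    pub fn split(&self, sep: char) -> Vec<Token> {
        let mut result = Vec::new();
        let mut offset = 0u32;
        for part in self.s.split(sep) {
            let mut loc = self.loc.clone();
            loc.column += offset;
            result.push(Token::new(part, loc));
            // Skip past this part and the one-character separator.
            offset += part.chars().count() as u32 + 1;
        }
        result
    }
}
```

For a token starting at column 5, the three parts land at columns 5, 10 and 24.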
Most tokens are directly from the script files, but there are also synthetic tokens that represent the logical value, not the literal value, of a piece of script. This occurs mostly in macro processing.
Two tokens compare equal if their strings are equal, regardless of their locs.
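In Rust, equality that ignores the Loc comes down to a manual PartialEq. This is a sketch with a simplified, hypothetical Loc. Note that Hash must ignore the Loc as well, because Rust requires equal values to hash equally. Otherwise tokens would behave inconsistently as map keys.

```rust
/// Hypothetical minimal Loc, reduced to line and column for brevity.
#[derive(Clone, Copy, Debug)]
pub struct Loc {
    pub line: u32,
    pub column: u32,
}

#[derive(Clone, Debug)]
pub struct Token {
    pub s: String,
    pub loc: Loc,
}

// Compare only the string: the same text from two places is "equal".
impl PartialEq for Token {
    fn eq(&self, other: &Self) -> bool {
        self.s == other.s
    }
}
impl Eq for Token {}

// Hash must agree with Eq, so it also looks only at the string.
impl std::hash::Hash for Token {
    fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
        self.s.hash(state);
    }
}
```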
"Number" generally means a Pdx fixed-point value, represented in the validator as f64. The fixed-point quantity is really a 64-bit value scaled down by a factor of 100,000. This gives it 5 decimals and about 47 bits of precision in the integral part. These properties mean that the f64 representation is lossy, but that hasn't caused problems so far.
Numbers with more precision than that are called precise_number and are rarely used. They mostly show up in the gfx parts of the script. They are also represented as f64 in the validator; the difference is in the checks done on the numeric literal in the script.
"Number" is often contrasted with "integer" in the code. Integer is usually also a Pdx fixed-point value, just one that is constrained to be a whole number. "Numeric" and "integer" are contrasted in a similar way.
Pdx script is what's in nearly all the .txt files in the game.
The language doesn't have an official name.
In fact, most of the terms in this section are made up for the validator and don't come from Pdx.
The language is mostly declarative, with some bits of imperative logic (the effects) and conditionals (the triggers).
Pdx script is parsed into Rust datatypes by the parse::pdxfile module.
The core of the representation of Pdx script is the Block.
Blocks are delimited by { and }.
An entire file is also a Block.
Blocks can contain a mix of these kinds of items:
- Assignments: key = value
- Definitions: key = { ... }
- Loose sub-blocks: { ... } { ... } ...
- Loose values: value value ...
- Comparisons: key < value, for a variety of comparators, including = for equality
- key < { ... } is accepted by the parser but is not used anywhere
The same key can occur multiple times in a block. Sometimes they are added together into a list of some sort, sometimes they override each other (latest key wins). Overriding is often an error and is reported by the validator.
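The two behaviours for repeated keys can be sketched over a flat list of (key, value) pairs in file order. The function names are hypothetical, and the real validator works on Block rather than on string pairs.

```rust
/// Keys that accumulate into a list: every value for the key, in order.
fn get_all<'a>(pairs: &[(&'a str, &'a str)], key: &str) -> Vec<&'a str> {
    pairs.iter().filter(|(k, _)| *k == key).map(|(_, v)| *v).collect()
}

/// Keys that override each other: the latest occurrence wins.
fn get_last<'a>(pairs: &[(&'a str, &'a str)], key: &str) -> Option<&'a str> {
    pairs.iter().rev().find(|(k, _)| *k == key).map(|(_, v)| *v)
}

/// Warn when a key that should appear once is repeated.
fn duplicate_warnings(pairs: &[(&str, &str)], key: &str) -> Vec<String> {
    let count = pairs.iter().filter(|(k, _)| *k == key).count();
    if count > 1 {
        vec![format!("`{key}` appears {count} times; only the last one is used")]
    } else {
        Vec::new()
    }
}
```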
Note that there is overlap between comparisons and assignments.
The parser cannot distinguish them and doesn't try.
They are handled the same way, and it's up to the validator whether to accept only = or other comparators as well.
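A sketch of "the parser stores every comparator the same way, the validator decides". The Comparator variants are an assumed subset, not necessarily the full set the parser recognizes, and the names are hypothetical.

```rust
/// Assumed subset of comparators.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Comparator {
    Eq,
    Lt,
    Gt,
    Le,
    Ge,
    Ne,
}

fn parse_comparator(s: &str) -> Option<Comparator> {
    match s {
        "=" => Some(Comparator::Eq),
        "<" => Some(Comparator::Lt),
        ">" => Some(Comparator::Gt),
        "<=" => Some(Comparator::Le),
        ">=" => Some(Comparator::Ge),
        "!=" => Some(Comparator::Ne),
        _ => None,
    }
}

/// The parser produces this for both assignments and comparisons.
struct Field {
    key: String,
    cmp: Comparator,
    value: String,
}

/// Validator-side check for places where only `=` is acceptable.
fn expect_assignment(field: &Field) -> Result<(), String> {
    if field.cmp == Comparator::Eq {
        Ok(())
    } else {
        Err(format!("expected `{} = {}`, found comparator {:?}", field.key, field.value, field.cmp))
    }
}
```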
A block may be tagged by a special token in front of it, for example color = hsv { 1.0 0.5 0.5 }.
This is handled specially by the parser for a limited number of tags, so that it is treated as a definition of color rather than a color = hsv assignment followed by a loose block.
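A toy version of the tag handling, working over a pre-split token list. The tag list (rgb, hsv, hsv360) and all names here are assumptions for illustration. Only flat blocks of loose values are handled. The point is that a known tag directly followed by { becomes part of the block instead of being read as the value.

```rust
/// Assumed set of tags the parser treats specially.
const BLOCK_TAGS: &[&str] = &["rgb", "hsv", "hsv360"];

#[derive(Debug, PartialEq)]
enum Rhs {
    Value(String),
    /// A block, with the optional tag that preceded it.
    Block { tag: Option<String>, items: Vec<String> },
}

/// Parse `key = rhs` starting at token `i`.
/// Returns the key, the right-hand side, and the index after the consumed tokens.
fn parse_assignment(tokens: &[&str], i: usize) -> Option<(String, Rhs, usize)> {
    let key = tokens.get(i)?.to_string();
    if *tokens.get(i + 1)? != "=" {
        return None;
    }
    let mut j = i + 2;
    let mut tag = None;
    let first = *tokens.get(j)?;
    // A known tag directly followed by `{` is attached to the block.
    if BLOCK_TAGS.contains(&first) && tokens.get(j + 1) == Some(&"{") {
        tag = Some(first.to_string());
        j += 1;
    }
    if tokens[j] == "{" {
        let mut items = Vec::new();
        j += 1;
        while *tokens.get(j)? != "}" {
            items.push(tokens[j].to_string());
            j += 1;
        }
        Some((key, Rhs::Block { tag, items }, j + 1))
    } else {
        Some((key, Rhs::Value(first.to_string()), j + 1))
    }
}
```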
A Block contains a vector of BlockItem to represent its contents.
A BlockItem represents the variations listed above. It is an enum that is either a keyed Field, a loose Block, or a loose Value.
A Field contains a key, a comparator, and either a block or a value.
It's handled this way, rather than distinguishing between assignments and definitions at this level, because the validator often needs to look up a key regardless of whether it is in front of a block or a value.
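The shape of these types can be sketched in Rust. This is a simplified, hypothetical version (Token is just a String, and the comparator is a String). It shows why looking up a key naturally returns a BV: the same Field type holds either kind of right-hand side.

```rust
/// Stand-in for Token, with the Loc omitted.
type Token = String;

#[derive(Debug)]
pub enum BV {
    Block(Block),
    Value(Token),
}

#[derive(Debug)]
pub struct Field {
    pub key: Token,
    pub cmp: String,
    pub bv: BV,
}

#[derive(Debug)]
pub enum BlockItem {
    Field(Field),
    Block(Block),
    Value(Token),
}

#[derive(Debug, Default)]
pub struct Block {
    pub items: Vec<BlockItem>,
}

impl Block {
    /// Last field with this key, whether it holds a block or a value.
    /// Loose blocks and loose values have no key and are skipped.
    pub fn get_field(&self, key: &str) -> Option<&BV> {
        self.items.iter().rev().find_map(|item| match item {
            BlockItem::Field(f) if f.key == key => Some(&f.bv),
            _ => None,
        })
    }
}
```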
A BV is an enum that is either a block or a value.
It is returned when you look up a key without specifying whether you want only values or only blocks.
The code is careful to always call a variable of this type a "bv" and not something confusing like "value".
BV has convenience methods expect_block and expect_value, which return the expected type, or emit a warning if the BV doesn't contain that type.
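A sketch of what expect_block and expect_value might do. Warnings are collected in a plain Vec<String> here as a stand-in for the validator's real error reporting, and Block is reduced to a list of tokens; both simplifications are assumptions for illustration.

```rust
type Token = String;

/// Simplified, hypothetical Block.
#[derive(Debug, Default)]
pub struct Block {
    pub items: Vec<Token>,
}

#[derive(Debug)]
pub enum BV {
    Block(Block),
    Value(Token),
}

impl BV {
    /// Return the block, or record a warning and return None for a value.
    pub fn expect_block(&self, warnings: &mut Vec<String>) -> Option<&Block> {
        match self {
            BV::Block(b) => Some(b),
            BV::Value(t) => {
                warnings.push(format!("expected block, found value `{t}`"));
                None
            }
        }
    }

    /// Return the value, or record a warning and return None for a block.
    pub fn expect_value(&self, warnings: &mut Vec<String>) -> Option<&Token> {
        match self {
            BV::Value(t) => Some(t),
            BV::Block(_) => {
                warnings.push("expected value, found block".to_string());
                None
            }
        }
    }
}
```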
Both keys and values are represented by Token.
You will often see a Value being called "token" in the code rather than "value", because "value" is a bit vague.
Values (and sometimes keys) can be converted into numbers or dates if they are valid tokens of that type.
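As an example of converting a value token, here is a hypothetical date conversion for tokens like 1066.9.15 (year.month.day). The exact rules tiger applies, such as whether partial dates are allowed, are not reproduced here; this sketch simply requires all three parts and does only basic range checks.

```rust
/// Hypothetical date type; field order makes the derived ordering chronological.
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord, Clone, Copy)]
pub struct Date {
    pub year: i16,
    pub month: u8,
    pub day: u8,
}

/// Convert a token like `1066.9.15` to a Date, or None if it isn't one.
pub fn parse_date(s: &str) -> Option<Date> {
    let mut parts = s.split('.');
    let year: i16 = parts.next()?.parse().ok()?;
    let month: u8 = parts.next()?.parse().ok()?;
    let day: u8 = parts.next()?.parse().ok()?;
    // Reject extra parts and obviously out-of-range months and days.
    if parts.next().is_some() || !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some(Date { year, month, day })
}
```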
TODO
The datatype language is the code between [ ] in localization and gui files. It consists of sections (called codes by tiger) separated by ., and each section can also have arguments between ( ). The arguments can in turn contain literals between ' ' or other datatype codes.
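The structure of a code chain can be illustrated with a small recursive-descent parser for the text between the brackets. Everything here is hypothetical: the type names, grammar details such as skipping whitespace around arguments, and the code names in the usage example are for illustration only and are not checked against the game's datatype tables.

```rust
/// One argument: a quoted literal or a nested chain of codes.
#[derive(Debug, PartialEq)]
pub enum Arg {
    Literal(String),
    Chain(Vec<Code>),
}

/// One code (section) of a chain: a name plus optional arguments.
#[derive(Debug, PartialEq)]
pub struct Code {
    pub name: String,
    pub args: Vec<Arg>,
}

/// Parse the text between `[` and `]` into a chain of codes.
pub fn parse_chain(s: &str) -> Option<Vec<Code>> {
    let chars: Vec<char> = s.chars().collect();
    let mut pos = 0;
    let result = chain(&chars, &mut pos)?;
    skip_ws(&chars, &mut pos);
    if pos == chars.len() { Some(result) } else { None }
}

fn skip_ws(c: &[char], pos: &mut usize) {
    while *pos < c.len() && c[*pos].is_whitespace() {
        *pos += 1;
    }
}

/// chain := code ('.' code)*
fn chain(c: &[char], pos: &mut usize) -> Option<Vec<Code>> {
    let mut codes = vec![code(c, pos)?];
    while c.get(*pos).copied() == Some('.') {
        *pos += 1;
        codes.push(code(c, pos)?);
    }
    Some(codes)
}

/// code := name [ '(' [ arg (',' arg)* ] ')' ]
fn code(c: &[char], pos: &mut usize) -> Option<Code> {
    skip_ws(c, pos);
    let start = *pos;
    while *pos < c.len() && (c[*pos].is_alphanumeric() || c[*pos] == '_') {
        *pos += 1;
    }
    if *pos == start {
        return None; // expected a name
    }
    let name: String = c[start..*pos].iter().collect();
    let mut args = Vec::new();
    if c.get(*pos).copied() == Some('(') {
        *pos += 1;
        skip_ws(c, pos);
        if c.get(*pos).copied() == Some(')') {
            *pos += 1; // empty argument list
        } else {
            loop {
                args.push(arg(c, pos)?);
                skip_ws(c, pos);
                match c.get(*pos).copied() {
                    Some(',') => *pos += 1,
                    Some(')') => { *pos += 1; break; }
                    _ => return None, // malformed or unterminated arguments
                }
            }
        }
    }
    Some(Code { name, args })
}

/// arg := "'" literal "'" | chain
fn arg(c: &[char], pos: &mut usize) -> Option<Arg> {
    skip_ws(c, pos);
    if c.get(*pos).copied() == Some('\'') {
        *pos += 1;
        let start = *pos;
        while *pos < c.len() && c[*pos] != '\'' {
            *pos += 1;
        }
        if *pos == c.len() {
            return None; // unterminated literal
        }
        let lit: String = c[start..*pos].iter().collect();
        *pos += 1; // skip the closing quote
        Some(Arg::Literal(lit))
    } else {
        chain(c, pos).map(Arg::Chain)
    }
}
```

For example, Pick(ROOT.Char.IsFemale, 'her', 'his') parses as one code with three arguments: a nested chain of three codes, then two literals.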
The game engine makes a strict distinction between promotes and functions. Functions can be used to end code chains and they return a value. Promotes are used in the codes leading up to the function. Often there are promotes and functions with the same names.
The datatype language has hundreds of types, some of which correspond to scope types or to basic values such as CString or int32. Most of the types are rarely used and refer to specific gui windows and similar stuff. The numbers from script are usually the CFixedPoint type.
The datatype code in localization values is often evaluated in a scope context, which allows it to refer to named scopes provided by that context. It's often not clear what the context is for a particular key.
A code chain must start with a "global" promote or function, which is one that does not expect a datatype as input. Every following promote or function is defined to operate on a particular datatype provided by the previous code in the chain. A promote returns a datatype for the next code to use, while a function returns a datatype that is either inlined in the localization or used as an argument to a calling code.
Because the logfiles from which we got the datatype information often leave out type information, we have a Datatype::Unknown to represent that lack of knowledge. Datatype::Unknown is considered compatible with every other datatype. There's an ongoing process of gradually testing and filling in the types of the arguments and return values.
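The chain rules can be sketched as a check over a table of code descriptions. The Datatype and Kind enums, the CodeInfo table, and the code names in the usage example are hypothetical. The rules encoded follow the description above: the first code must be global, each code's input must match the previous code's output, and Datatype::Unknown matches anything. The sketch also assumes that promotes lead up to a function that ends the chain.

```rust
/// Tiny hypothetical subset of datatypes.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Datatype {
    Unknown,
    Character,
    CString,
    Faith,
}

#[derive(Clone, Copy, PartialEq, Eq)]
enum Kind {
    Promote,
    Function,
}

/// What a code takes (None = global) and what it returns.
struct CodeInfo {
    name: &'static str,
    kind: Kind,
    input: Option<Datatype>,
    output: Datatype,
}

/// Unknown is compatible with every other datatype.
fn compatible(a: Datatype, b: Datatype) -> bool {
    a == Datatype::Unknown || b == Datatype::Unknown || a == b
}

/// Check a chain of code names; returns the datatype of the final code.
fn check_chain(table: &[CodeInfo], chain: &[&str]) -> Result<Datatype, String> {
    let mut current: Option<Datatype> = None;
    for (i, name) in chain.iter().enumerate() {
        let info = table
            .iter()
            .find(|c| c.name == *name)
            .ok_or_else(|| format!("unknown code `{name}`"))?;
        let last = i + 1 == chain.len();
        match (last, info.kind) {
            (false, Kind::Function) => return Err(format!("`{name}` is a function and must end the chain")),
            (true, Kind::Promote) => return Err(format!("`{name}` is a promote and cannot end the chain")),
            _ => {}
        }
        match (current, info.input) {
            (None, None) => {}
            (None, Some(_)) => return Err(format!("`{name}` is not global and cannot start the chain")),
            (Some(_), None) => return Err(format!("`{name}` is global and can only start the chain")),
            (Some(have), Some(want)) => {
                if !compatible(have, want) {
                    return Err(format!("`{name}` expects {want:?}, got {have:?}"));
                }
            }
        }
        current = Some(info.output);
    }
    current.ok_or_else(|| "empty chain".to_string())
}
```

In the usage example, GetName takes Datatype::Unknown as input, so it can follow any code, mirroring how missing type information is treated.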
Pdx script defines a huge variety of different kinds of database items: faiths, landed titles, characters, buildings, coats of arms, and a hundred more.
These are all categorized by the giant Item enum, which is used as a lookup key (together with a string or Token to identify the item) in many places.
Each validator in the data directory loads and validates one kind of item or a small group of related items.
Because of the name of the enum, the code does not make much distinction between "item" and "item type", though you will see "itype" used in some older parts of the code. This distinction may be worth improving.
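A sketch of using an item kind plus a name as a lookup key. The three Item variants and the Database type are hypothetical. The real enum is much larger, and the stored values would be parsed blocks rather than strings.

```rust
use std::collections::HashMap;

/// Tiny hypothetical subset of item kinds.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Item {
    Faith,
    Title,
    Building,
}

/// Hypothetical database keyed by (item kind, name).
#[derive(Default)]
struct Database {
    items: HashMap<(Item, String), String>,
}

impl Database {
    /// A later definition with the same kind and name replaces the earlier one.
    fn add(&mut self, itype: Item, key: &str, definition: &str) {
        self.items.insert((itype, key.to_string()), definition.to_string());
    }

    fn exists(&self, itype: Item, key: &str) -> bool {
        self.items.contains_key(&(itype, key.to_string()))
    }
}
```

Because the kind is part of the key, the same name under a different item kind refers to a different item.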
TODO
The overall execution of the validator is controlled by the Everything type, which also stores all the loaded data. Execution is in two main phases: loading and validation.
In the loading phase, Everything is mutable and every validator in the data modules sees only part of Everything (usually the database field, which loads all the Item types that don't have specialized handlers).
The loading phase parses (nearly) all the script files of the base game and the mod, and saves them as key and block pairs. (Types Token and Block respectively; see above for what those are). The loading phase does minimal validation; generally, only the validation that's necessary to warn about things that can't be loaded, before they are thrown away.
The loading phase relies heavily on the parsers in the parse module. It runs parsers in parallel in different threads as much as possible, which apparently is not very much: parallelization efficiency rarely goes above 3 cores. This deficiency still needs to be investigated. The actual adding of items to the loaded data is done sequentially, because the logic that determines which items override which other items needs to be respected.
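The parallel-parse, sequential-insert pattern can be sketched with std::thread::scope. The one-assignment-per-line parser and the simple last-file-wins rule are hypothetical stand-ins for the real parsers and override logic. Joining the threads in spawn order keeps the results in load order, so the sequential pass can apply overrides deterministically.

```rust
use std::collections::HashMap;
use std::thread;

/// Hypothetical parsed file: (key, value) pairs.
type Parsed = Vec<(String, String)>;

/// Hypothetical parser: one `key = value` per line.
fn parse_file(contents: &str) -> Parsed {
    contents
        .lines()
        .filter_map(|line| {
            let (k, v) = line.split_once('=')?;
            Some((k.trim().to_string(), v.trim().to_string()))
        })
        .collect()
}

/// Parse all files in parallel, then insert sequentially in load order.
fn load(files: &[&str]) -> HashMap<String, String> {
    let parsed: Vec<Parsed> = thread::scope(|s| {
        let handles: Vec<_> = files.iter().map(|f| s.spawn(move || parse_file(f))).collect();
        // Join in spawn order so results stay in load order.
        handles.into_iter().map(|h| h.join().expect("parser thread panicked")).collect()
    });
    let mut db = HashMap::new();
    for file in parsed {
        for (k, v) in file {
            db.insert(k, v); // later files override earlier ones
        }
    }
    db
}
```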
In the validation phase, Everything is immutable and a reference to it is passed to every validation function. This phase is when crosschecking is done: every item's references to other items are checked to see if the other items exist.
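The two phases map naturally onto Rust borrowing: loading needs &mut Everything, and validation only &Everything. Below is a heavily simplified, hypothetical Everything that maps each item name to the names it references, just enough to show the crosscheck.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified Everything.
#[derive(Default)]
struct Everything {
    /// item name -> names of the items it refers to
    items: HashMap<String, Vec<String>>,
}

impl Everything {
    /// Loading phase: `&mut self` to store data, almost no checking.
    fn load(&mut self, name: &str, references: &[&str]) {
        self.items.insert(name.to_string(), references.iter().map(|r| r.to_string()).collect());
    }

    /// Validation phase: only `&self`, so one immutable reference can be shared.
    fn validate(&self) -> Vec<String> {
        let mut warnings = Vec::new();
        for (name, refs) in &self.items {
            for r in refs {
                if !self.items.contains_key(r) {
                    warnings.push(format!("{name} refers to missing item {r}"));
                }
            }
        }
        warnings.sort(); // HashMap order is arbitrary; sort for stable output
        warnings
    }
}
```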
Validation relies heavily on the Validator type, which wraps a Block and performs a variety of checks on it (driven by the caller), and at the end warns about any parts of the Block that were not validated in the process.
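The core idea of Validator, marking what the caller checked and warning about the rest, can be sketched like this. The method names and the flat (key, value) representation are hypothetical and may not match tiger's actual API.

```rust
/// Hypothetical sketch of the "check, then warn about leftovers" pattern.
struct Validator<'a> {
    fields: &'a [(&'a str, &'a str)],
    used: Vec<bool>,
    warnings: Vec<String>,
}

impl<'a> Validator<'a> {
    fn new(fields: &'a [(&'a str, &'a str)]) -> Self {
        Validator { fields, used: vec![false; fields.len()], warnings: Vec::new() }
    }

    /// Mark every field with this key as validated and return its values.
    fn field_values(&mut self, key: &str) -> Vec<&'a str> {
        let mut values = Vec::new();
        for (i, (k, v)) in self.fields.iter().enumerate() {
            if *k == key {
                self.used[i] = true;
                values.push(*v);
            }
        }
        values
    }

    /// Require that a key is present, warning if it is not.
    fn req_field(&mut self, key: &str) {
        if self.field_values(key).is_empty() {
            self.warnings.push(format!("required field `{key}` missing"));
        }
    }

    /// Consume the validator and warn about every field nobody looked at.
    fn finish(mut self) -> Vec<String> {
        for (i, (k, _)) in self.fields.iter().enumerate() {
            if !self.used[i] {
                self.warnings.push(format!("unknown field `{k}`"));
            }
        }
        self.warnings
    }
}
```

Taking self by value in finish means a validator cannot accidentally be used after its leftover check has run.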
Validation of localization is done in a specialized handler (in data::localization). Localization is validated both independently by this validator for all localization keys, and also in response to calls from other validators which want to evaluate a specific localization string in their own scope context. Localization can use information from the scope that is displaying the localized string, so validating a localization string in this scope context can generate more accurate warnings. Note that this means many localization strings get validated multiple times in different contexts, which is ok because the error module does deduplication of messages.
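Deduplication can be as simple as remembering every (location, message) pair already reported. This Errors type is a hypothetical sketch, not the error module's actual implementation.

```rust
use std::collections::HashSet;

/// Hypothetical report sink that drops exact duplicates.
#[derive(Default)]
struct Errors {
    seen: HashSet<(String, u32, u32, String)>,
    reports: Vec<String>,
}

impl Errors {
    fn warn(&mut self, path: &str, line: u32, column: u32, msg: &str) {
        let key = (path.to_string(), line, column, msg.to_string());
        // `insert` returns false if this exact report was already seen.
        if self.seen.insert(key) {
            self.reports.push(format!("{path}:{line}:{column}: {msg}"));
        }
    }
}
```

Validating the same localization string in two scope contexts then yields each identical warning only once.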