Commit decfe3e

Terminal semicolon warning
1 parent 0a06e10 commit decfe3e

4 files changed

Lines changed: 52 additions & 8 deletions

NEWS.md

Lines changed: 6 additions & 8 deletions
@@ -8,23 +8,21 @@
   at exact thresholds (e.g. a split in 2 of 3 trees with `p = 2/3`). The
   majority threshold `p = 0.5` is unchanged (a split must occur in more than
   half the trees).
+- `ReadCharacters()` no longer warns on a `STATELABELS` block with a terminal
+  semicolon.
 
 ## Performance
 
 - Guarantee preorder return from `root_on_node()` to simplify `Consensus()`
-  internal pre-processing
+  internal pre-processing.
 - `Consensus()` and `SplitFrequency()` defer materialising a split's bit pattern
   until it is needed, so splits that never reach the consensus threshold are no
-  longer built. Identical results; up to ~13× faster for large trees (greatest
-  gains for tall trees / many tips), with no change at small sizes.
+  longer built.
 - `RenumberTips()` relabels an unlabelled `multiPhylo` or `list` of trees in a
   single C++ pass instead of a per-tree R loop, with a no-op fast path for trees
-  already in the target order. Speeds up `Consensus()` and other callers when
-  combining many trees; results are unchanged.
+  already in the target order.
 - `Consensus()` no longer copies every input tree to strip branch lengths and
-  node labels (the consensus core ignores both); it now coerces in place,
-  trimming wrapper overhead (~25% faster on small forests of many short trees).
-  Results are unchanged.
+  node labels; it now coerces in place.
 
 ## Dependencies

R/parse_files.R

Lines changed: 7 additions & 0 deletions
@@ -283,6 +283,13 @@ ReadCharacters <- function(filepath, character_num = NULL, encoding = "UTF8") {
     }
     stateStarts <- grep("^\\d+", stateLines)
     stateEnds <- grep("[,;]$", stateLines)
+    # When the closing ';' of the block also serves as the terminator of the
+    # last character entry (e.g. Wills 2012), stripping it above leaves the
+    # last entry without an explicit comma/semicolon. Treat end-of-block as
+    # an implicit terminator so the last entry is still parsed correctly.
+    if (length(stateStarts) == length(stateEnds) + 1L) {
+      stateEnds <- c(stateEnds, length(stateLines) + 1L)
+    }
     if (length(stateStarts) != length(stateEnds)) {
       warning("Could not parse character states; does each end with a ' or ;?.")
     } else {
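
The effect of the added branch can be checked in isolation. The following is a minimal sketch in R rather than the package's code: `PairStateEntries()` is a hypothetical helper, and the shape of `stateLines` (one token per line, with the block's closing `;` already stripped) is an assumption drawn from the comment in the patch.

```r
# Hypothetical helper (not part of TreeTools): pairs each entry's start line
# with its terminator, using the same three steps as the patched code.
PairStateEntries <- function(stateLines) {
  stateStarts <- grep("^\\d+", stateLines)   # lines opening with a character number
  stateEnds <- grep("[,;]$", stateLines)     # lines ending in ',' or ';'
  # One more start than end: treat end-of-block as the implicit terminator
  if (length(stateStarts) == length(stateEnds) + 1L) {
    stateEnds <- c(stateEnds, length(stateLines) + 1L)
  }
  if (length(stateStarts) != length(stateEnds)) {
    stop("Could not pair character-state entries")
  }
  # Labels sit strictly between each start line and its terminator
  lapply(seq_along(stateStarts), function(i) {
    idx <- stateStarts[i] + seq_len(stateEnds[i] - stateStarts[i] - 1L)
    gsub("^[\"']|[\"']$", "", stateLines[idx])  # drop surrounding quotes
  })
}

# Assumed input: the block's closing ';' has already been removed, so the
# second entry has no explicit terminator of its own.
stateLines <- c("1", "\"state zero\"", "\"state one\"", ",",
                "2", "\"absent\"", "\"present\"")
labels <- PairStateEntries(stateLines)
```

Without the implicit terminator, the two starts would be matched against a single end and the sketch would stop with an error, which mirrors the warning that the real function used to raise.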
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+#NEXUS
+BEGIN TAXA;
+DIMENSIONS NTAX=2;
+TAXLABELS
+TaxonA TaxonB
+;
+END;
+
+BEGIN CHARACTERS;
+DIMENSIONS NCHAR=2;
+FORMAT DATATYPE=STANDARD MISSING=? GAP=-;
+CHARLABELS
+'First character'
+'Second character'
+;
+STATELABELS
+1
+"state zero"
+"state one"
+,
+2
+"absent"
+"present"
+;
+MATRIX
+TaxonA 01
+TaxonB 10
+;
+END;

tests/testthat/test-parsers.R

Lines changed: 10 additions & 0 deletions
@@ -321,6 +321,16 @@ test_that("ReadCharacters() reads CHARSTATELABELS", {
 
 })
 
+test_that("ReadCharacters() handles STATELABELS with terminal semicolon", {
+  # Last STATELABELS entry terminated by block ';' rather than a comma
+  labels <- expect_no_warning(
+    ReadCharacters(system.file("extdata/tests/statelabels-semicolon.nex",
+                               package = "TreeTools"))
+  )
+  expect_equal(attr(labels, "state.labels"),
+               list(c("state zero", "state one"), c("absent", "present")))
+})
+
 test_that("MorphoBankDecode() decodes", {
   expect_equal("' -- x \n 1--2", MorphoBankDecode("'' - x^n 1-2"))
 })
