Commit 610778f

SPARQL String. Unicode escapes exclude surrogates
1 parent cca75e6 commit 610778f

1 file changed

+56
-30
lines changed

spec/index.html

@@ -292,7 +292,7 @@
292292
<h2>Abstract</h2>
293293
<p>
294294
RDF is a directed, labeled graph data model for representing information in the
295-
Web. This specification defines the syntax and semantics of the SPARQL query language for
295+
Web. This specification defines the syntax and semantics of the SPARQL Query Language for
296296
RDF. SPARQL can be used to express queries across diverse data sources, whether the data is
297297
stored natively as RDF or viewed as RDF via middleware. SPARQL contains capabilities for
298298
querying required and optional graph patterns along with their conjunctions and
@@ -319,11 +319,11 @@ <h2>Introduction</h2>
319319
RDF is a directed, labeled graph data model for representing information in the Web. RDF is
320320
often used to represent, among other things, personal information, social networks, metadata
321321
about digital artifacts, as well as to provide a means of integration over disparate sources of
322-
information. This specification defines the syntax and semantics of the SPARQL query language
322+
information. This specification defines the syntax and semantics of the SPARQL Query Language
323323
for RDF.
324324
</p>
325325
<p>
326-
The SPARQL query language for RDF is designed to meet the use cases and
326+
The SPARQL Query Language for RDF is designed to meet the use cases and
327327
requirements identified by the RDF Data Access Working Group in [[RDF-DAWG-UC]],
328328
the SPARQL 1.1 Working Group in [[SPARQL-FEATURES]], and the RDF-star Working Group.
329329
</p>
@@ -335,7 +335,7 @@ <h3>Document Outline</h3>
335335
</p>
336336
<p>
337337
This section of the document, <a href="#introduction">section 1</a>, introduces the SPARQL
338-
query language specification. It presents the organization of this specification document and
338+
Query Language specification. It presents the organization of this specification document and
339339
the conventions used throughout the specification.
340340
</p>
341341
<p>
@@ -5315,7 +5315,7 @@ <h4>Operator Extensibility</h4>
53155315
</section>
53165316
<section id="SparqlOps">
53175317
<h3>Function Definitions</h3>
5318-
<p>This section defines the operators and functions introduced by the SPARQL Query language.
5318+
<p>This section defines the operators and functions introduced by the SPARQL query language.
53195319
The examples show the behavior of the operators as invoked by the appropriate grammatical
53205320
constructs.</p>
53215321
<section id="func-forms">
@@ -10513,30 +10513,49 @@ <h4>Notes</h4>
1051310513
<h2>SPARQL Grammar</h2>
1051410514
<p>The SPARQL grammar covers both SPARQL Query and [[[SPARQL11-UPDATE]]].</p>
1051510515
<section id="queryString">
10516-
<h3>SPARQL Request String</h3>
10516+
<h3>SPARQL String</h3>
1051710517
<p>
10518-
A <dfn data-lt="SPARQLRequestString">SPARQL Request String</dfn> is
10519-
a <a>SPARQL Query String</a> or <a>SPARQL Update String</a> and is a Unicode character string
10520-
(c.f. section 6.1 String concepts of [[CHARMOD]]) in the language defined by the following
10521-
grammar.</p>
10518+
<span id="defn_SPARQLRequestString"></span>
10519+
A <dfn>SPARQL string</dfn> is an
10520+
<a data-cite="RDF12-CONCEPTS#dfn-rdf-string">RDF string</a> that
10521+
conforms to the grammar given in this section.
10522+
</p>
10523+
<p class="note">
10524+
An <a data-cite="RDF12-CONCEPTS#dfn-rdf-string">RDF string</a> is
10525+
a sequence of
10526+
<a data-cite="I18N-GLOSSARY#dfn-code-point" class="lint-ignore">Unicode code points</a>
10527+
which are <a data-cite="I18N-GLOSSARY#dfn-scalar-value" class="lint-ignore">Unicode scalar values</a>.
10528+
Unicode scalar values do not include the
10529+
<a data-cite="I18N-GLOSSARY#dfn-surrogate" class="lint-ignore">surrogate code points</a>.
10530+
</p>
1052210531
<p>
10523-
A <dfn data-lt="SPARQLQueryString">SPARQL Query String</dfn> starts
10524-
at the <a href="#rQueryUnit">QueryUnit</a> production.</p>
10532+
<span id="defn_SPARQLQueryString"></span>
10533+
A <dfn>SPARQL query string</dfn> is a
10534+
<a>SPARQL string</a> that conforms to the grammar starting at
10535+
the <a href="#rQueryUnit">QueryUnit</a> production.
10536+
</p>
1052510537
<p>
10526-
A <dfn data-lt="SPARQLUpdateString">SPARQL Update String</dfn> starts
10527-
at the <a href="#rUpdateUnit">UpdateUnit</a> production.</p>
10528-
<p>For compatibility with future versions of Unicode, the characters in this string may
10538+
<span id="defn_SPARQLUpdateString"></span>
10539+
A <dfn>SPARQL update string</dfn> is a
10540+
<a>SPARQL string</a> that conforms to the grammar starting at
10541+
the <a href="#rUpdateUnit">UpdateUnit</a> production.
10542+
</p>
10543+
<p>
10544+
For compatibility with future versions of Unicode, the characters in this string may
1052910545
include Unicode codepoints that are unassigned as of the date of this publication (see
1053010546
[[[UAX31]]] [[UAX31]] section 4 Pattern Syntax). For productions with excluded character
1053110547
classes (for example <code>[^&lt;&gt;'{}|^`]</code>), the characters are excluded from the
10532-
range <code>#x0 - #x10FFFF</code>.</p>
10548+
range <code>#x0 - #x10FFFF</code>.
10549+
</p>
1053310550
</section>
1053410551

1053510552
<section id="codepointEscape">
1053610553
<h3>Codepoint Escape Sequences</h3>
10537-
<p>A SPARQL Query String is processed for codepoint escape sequences before parsing by the
10554+
<p>
10555+
A <a>SPARQL string</a> is processed for codepoint escape sequences before parsing by the
1053810556
grammar defined in EBNF below. The codepoint escape sequences for a SPARQL query string
10539-
are:</p>
10557+
are:
10558+
</p>
1054010559
<span class="doc-ref" id="table68"></span>
1054110560
<table title="Codepoint escapes">
1054210561
<colgroup>
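Note on the hunk above: the new definition makes a SPARQL string an RDF string, that is, a sequence of Unicode scalar values with no surrogate code points. A minimal sketch of that check follows, assuming Python, whose `str` type can hold lone surrogates. The function name `is_rdf_string` is hypothetical and is not taken from the specification.

```python
def is_rdf_string(text: str) -> bool:
    """Return True if every code point in text is a Unicode scalar value.

    Scalar values are all code points except the surrogate range
    U+D800 to U+DFFF. (Hypothetical helper, for illustration only.)
    """
    return not any(0xD800 <= ord(ch) <= 0xDFFF for ch in text)
```

Under this sketch, an ordinary query string passes, while a string containing a lone surrogate such as U+D800 does not.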
@@ -10554,15 +10573,19 @@ <h3>Codepoint Escape Sequences</h3>
1055410573
<a href="#HEX">HEX</a> <a href="#HEX">HEX</a>
1055510574
</td>
1055610575
<td>A Unicode code point in the range U+0 to U+FFFF inclusive corresponding to the
10557-
encoded hexadecimal value.</td>
10576+
encoded hexadecimal value, excluding U+D800 to U+DFFF, the
10577+
<a data-cite="I18N-GLOSSARY#dfn-surrogate">surrogate code points</a>.
10578+
</td>
1055810579
</tr>
1055910580
<tr>
1056010581
<td>
1056110582
<span class="token">'\U'</span> <a href="#HEX">HEX</a> <a href="#HEX">HEX</a>
1056210583
<a href="#HEX">HEX</a> <a href="#HEX">HEX</a> <a href="#HEX">HEX</a> <a href="#HEX">HEX</a> <a href="#HEX">HEX</a> <a href="#HEX">HEX</a>
1056310584
</td>
1056410585
<td>A Unicode code point in the range U+0 to U+10FFFF inclusive corresponding to the
10565-
encoded hexadecimal value.</td>
10586+
encoded hexadecimal value, excluding U+D800 to U+DFFF, the
10587+
<a data-cite="I18N-GLOSSARY#dfn-surrogate">surrogate code points</a>.
10588+
</td>
1056610589
</tr>
1056710590
</tbody>
1056810591
</table>
@@ -10575,13 +10598,16 @@ <h3>Codepoint Escape Sequences</h3>
1057510598
&lt;ab\u00E9xy&gt; # Codepoint 00E9 is Latin small e with acute - é
1057610599
\u03B1:a # Codepoint x03B1 is Greek small alpha - α
1057710600
a\u003Ab # a:b -- codepoint x3A is colon</pre>
10578-
<p>Codepoint escape sequences can appear anywhere in the query string. They are processed
10601+
<p>
10602+
Codepoint escape sequences can appear anywhere in the query string. They are processed
1057910603
before parsing based on the grammar rules and so may be replaced by codepoints with
10580-
significance in the grammar, such as "<code>:</code>" marking a prefixed name.</p>
10604+
significance in the grammar, such as "<code>:</code>" marking a prefixed name.
10605+
</p>
1058110606
<p>These escape sequences are not included in the grammar below. Only escape sequences for
1058210607
characters that would be legal at that point in the grammar may be given. For example, the
1058310608
variable "<code>?x\u0020y</code>" is not legal (<code>\u0020</code> is a space and is not
10584-
permitted in a variable name).</p>
10609+
permitted in a variable name).
10610+
</p>
1058510611
</section>
1058610612
<section id="whitespace">
1058710613
<h3>White Space</h3>
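Note on the hunk above: codepoint escape sequences are processed before grammar parsing, and after this change neither `\u` nor `\U` may denote a surrogate code point. The following is a rough sketch of such a pre-processing pass in Python. The function name `decode_codepoint_escapes` is hypothetical. Raising `ValueError` is an assumption, since the text in this diff does not say how a processor should report a surrogate escape. The sketch also makes no special case for an escape that is itself preceded by a backslash.

```python
import re

# \u followed by exactly 4 hex digits, or \U followed by exactly 8.
# The letter is case sensitive; the hex digits may be either case.
_CODEPOINT_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")


def decode_codepoint_escapes(text: str) -> str:
    """Replace \\uXXXX and \\UXXXXXXXX escapes with the code points they name.

    Assumption: out-of-range values and surrogates are reported by raising
    ValueError. (Hypothetical helper, for illustration only.)
    """

    def replace(match: re.Match) -> str:
        value = int(match.group(1) or match.group(2), 16)
        if value > 0x10FFFF:
            raise ValueError(f"escape {match.group(0)} is above U+10FFFF")
        if 0xD800 <= value <= 0xDFFF:
            raise ValueError(f"escape {match.group(0)} names a surrogate code point")
        return chr(value)

    return _CODEPOINT_ESCAPE.sub(replace, text)
```

Applied to the examples in the specification text, `a\u003Ab` becomes `a:b` before the grammar sees it, which is why an escape can introduce a character with grammatical significance.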
@@ -10629,22 +10655,22 @@ <h3>Blank Nodes and Blank Node Identifiers</h3>
1062910655
<li><code><a href="#rDeleteData">DELETE DATA</a></code></li>
1063010656
<li>a <code><a href="#rDeleteClause">DeleteClause</a></code></li>
1063110657
</ul>
10632-
<p>in a <a data-cite="SPARQL11-UPDATE#terminology">SPARQL Update
10658+
<p>in a <a data-cite="SPARQL11-UPDATE#terminology">SPARQL update
1063310659
request</a>.
1063410660
</p>
1063510661
<p>
1063610662
<a data-cite="RDF12-CONCEPTS#dfn-blank-node-identifier">Blank node identifiers</a>
10637-
are scoped to the <a>SPARQL Request String</a> in which they occur.
10663+
are scoped to the <a>SPARQL string</a> in which they occur.
1063810664
Different uses of the same blank node identifier in a request
1063910665
string refer to the same blank node. Fresh blank nodes are generated for each request;
1064010666
blank nodes can not be referenced by identifier across requests.</p>
1064110667
<p>The same blank node identifier can not be used in:</p>
1064210668
<ul>
1064310669
<li>two separate basic graph patterns in a SPARQL Query</li>
10644-
<li>two <code><a href="#rModify">WHERE</a></code> clauses within a single SPARQL Update
10670+
<li>two <code><a href="#rModify">WHERE</a></code> clauses within a single SPARQL update
1064510671
request</li>
1064610672
<li>two <code><a href="#rInsertData">INSERT DATA</a></code> operations within a single
10647-
SPARQL Update request</li>
10673+
SPARQL update request</li>
1064810674
</ul>
1064910675
<p>Note that the same blank node identifier can occur in different
1065010676
<a href="#rQuadPattern">QuadPattern</a> clauses in a [[[SPARQL11-UPDATE]]] request.</p>
@@ -10723,8 +10749,8 @@ <h3>Grammar</h3>
1072310749
<li>Escape sequences are case sensitive.</li>
1072410750
<li>When tokenizing the input and choosing grammar rules, the longest match is chosen.</li>
1072510751
<li>The SPARQL grammar is LL(1) when the rules with uppercased names are used as terminals.</li>
10726-
<li>There are two entry points into the grammar: <code>QueryUnit</code> for SPARQL queries,
10727-
and <code>UpdateUnit</code> for SPARQL Update requests.</li>
10752+
<li>There are two entry points into the grammar: <code>QueryUnit</code> for the SPARQL query language
10753+
and <code>UpdateUnit</code> for the SPARQL update language.</li>
1072810754
<li>In signed numbers, no white space is allowed between the sign and the number.
1072910755
The <code><a href="#rAdditiveExpression">AdditiveExpression</a></code> grammar rule allows for this by
1073010756
covering the two cases of an expression followed by a signed number. These
@@ -12126,7 +12152,7 @@ <h3>Grammar</h3>
1212612152
<section id="conformance">
1212712153
<h2>Conformance</h2>
1212812154
<p>See Section <a href="#grammar">19 SPARQL Grammar</a> regarding conformance of
12129-
<a>SPARQL Query strings</a>, and section
12155+
<a>SPARQL query strings</a>, and section
1213012156
<a href="#QueryForms">16 Query Forms</a> for conformance of query results.
1213112157
See section <a href="#mediaType">22. Internet Media Type</a> for conformance
1213212158
to the application/sparql-query media type.</p>
