SHACL 1.2 Rules

This document defines SHACL Rules.

SHACL, the Shapes Constraint Language, is a language for describing the structure of RDF graphs. SHACL may be used for a variety of purposes such as validating, inferencing, modeling domains, generating ontologies to inform other agents, building user interfaces, generating code, and integrating data.

SHACL Rules provides inferencing with the generation of new RDF data from a combination of a set of rules and a base data graph. Rules can be expressed as RDF or in the SHACL Rules Language (SRL).

This specification is published by the Data Shapes Working Group.

Introduction

This document introduces inference rules for SHACL 1.2, a mechanism for deriving new RDF triples from existing RDF data through declarative rules. The document defines the syntax and semantics of rule-based inference.

Implementations of SHACL Rules provide two operations. The infer operation applies the rules to a given base graph and produces an inference graph containing the RDF triples derived by rule execution. Combining the inference graph with the base graph is optional and left to users. The query operation determines whether a given goal pattern can be derived from the base graph using the rules.

SHACL Rules can create new RDF terms, including blank nodes, for use in the triple templates in the head of rules.

SHACL Rules also support constructs, such as negation as failure, that could lead to different inferred graphs depending on the order in which rules are executed. To avoid this, rules are evaluated using the technique of stratification, which establishes a single, implicit ordering among rules, ensuring that the same inference graph is always produced.

Terminology

Connect to definitions in RDF 1.2 Concepts.

The following definitions from other specifications are used in this document: @@

Document Conventions

Some examples in this document use Turtle [[turtle]]. The reader is expected to be familiar with SHACL [[shacl]] and SPARQL [[sparql-query]].

Within this document, the following namespace prefix bindings are used:

Prefix	Namespace
`rdf:`	`http://www.w3.org/1999/02/22-rdf-syntax-ns#`
`rdfs:`	`http://www.w3.org/2000/01/rdf-schema#`
`sh:`	`http://www.w3.org/ns/shacl#`
`srl:`	`http://www.w3.org/ns/shacl-rules#`
`shnex:`	`http://www.w3.org/ns/shacl-node-expr#`
`xsd:`	`http://www.w3.org/2001/XMLSchema#`
`ex:`	`http://example.com/`

Throughout the document, color-coded boxes containing RDF graphs in Turtle will appear. These fragments of Turtle documents use the prefix bindings given above.

Version Labels
Version Label
"1.2"

SHACL Rules

SHACL rules infer new triples given a [=base graph=] and a [=rule set=]. The output of evaluation is an [=inference graph=] containing the derived triples that do not appear in the base graph.

Each [=rule=] has a pattern, called the [=body=], and a result template, called the [=head=]. A rule is executed by finding the values for variables in the body so that the body matches the combined base graph and any inferred triples from the execution up to this point. These values are then used to instantiate the triple templates in the rule head to produce new inferred triples.

The rules are executed until no more triples are inferred, and rules may be executed more than once as new inferred triples become available.

SHACL Rules execution is defined so that the order of rule execution does not lead to different outcomes when creating new RDF terms, including new blank nodes, nor when testing for the absence of a pattern. In other words, the same inference graph is produced regardless of the order of rule execution.

SHACL Rules has both an RDF syntax and a human-friendly syntax inspired by [[[SPARQL12-QUERY]]]. Rule set evaluation contains elements similar to SPARQL, with differences in the details to ensure that the same inference graph is produced regardless of the order of rule execution.

Basic Patterns

In this first example, we have the following data graph and rule set:

   :A :fatherOf :X .
   :B :motherOf :X .
   :C :motherOf :A .

RULE { ?x :childOf ?y } WHERE { ?y :fatherOf ?x }
RULE { ?x :childOf ?y } WHERE { ?y :motherOf ?x }

The above rules, applied to the data, will conclude that `:X` is the `:childOf` both `:A` and `:B`, and that `:A` is the `:childOf` `:C`:

   :X :childOf :A .
   :X :childOf :B .
   :A :childOf :C .

We can then derive `:descendedFrom` relationships by adding a rule that depends on `:childOf` triples produced by the other rules:

RULE { ?x :childOf ?y } WHERE { ?y :fatherOf ?x }
RULE { ?x :childOf ?y } WHERE { ?y :motherOf ?x }
RULE { ?x :descendedFrom ?y } WHERE { ?x :childOf ?y }

The outcome is:

   :A :descendedFrom :C .
   :X :descendedFrom :B .
   :X :descendedFrom :A .
   :X :childOf :B .
   :X :childOf :A .
   :A :childOf :C .

Recursion

We can add a rule that depends on `:descendedFrom` triples to infer that `:X` is `:descendedFrom` `:C`:

RULE { ?x :childOf ?y } WHERE { ?y :fatherOf ?x }
RULE { ?x :childOf ?y } WHERE { ?y :motherOf ?x }
RULE { ?x :descendedFrom ?y } WHERE { ?x :childOf ?y }
RULE { ?x :descendedFrom ?y } WHERE { ?x :childOf ?z . ?z :descendedFrom ?y }

giving:

   :A :descendedFrom :C .
   :X :descendedFrom :C .
   :X :descendedFrom :B .
   :X :descendedFrom :A .
   :X :childOf :B .
   :X :childOf :A .
   :A :childOf :C .

This adds the triple `:X :descendedFrom :C`.

This last rule is a recursive rule: the body of the rule depends on the head of the rule.
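The following non-normative sketch illustrates how repeated execution reaches a fixpoint for the family example above. It assumes rules whose bodies contain only triple patterns, represents triples as Python tuples and variables as strings starting with `?`, and all function names (`match`, `solutions`, `infer`) are hypothetical rather than part of this specification.

```python
# Illustrative sketch only: a naive fixpoint evaluator for rules whose
# bodies contain nothing but triple patterns. All names are hypothetical.

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`, or return None."""
    result = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if result.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return result

def solutions(body, graph):
    """All variable bindings that make every pattern in `body` match `graph`."""
    bindings = [{}]
    for pattern in body:
        bindings = [b2 for b in bindings for triple in graph
                    if (b2 := match(pattern, triple, b)) is not None]
    return bindings

def instantiate(template, binding):
    """Replace the variables of a triple template by their bound values."""
    return tuple(binding.get(term, term) for term in template)

def infer(rules, base):
    """Run the rules until nothing new is inferred; return triples not in base."""
    graph = set(base)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for binding in solutions(body, graph):
                for template in head:
                    triple = instantiate(template, binding)
                    if triple not in graph:
                        graph.add(triple)
                        changed = True
    return graph - set(base)

base = {(":A", ":fatherOf", ":X"), (":B", ":motherOf", ":X"), (":C", ":motherOf", ":A")}
rules = [
    ([("?x", ":childOf", "?y")], [("?y", ":fatherOf", "?x")]),
    ([("?x", ":childOf", "?y")], [("?y", ":motherOf", "?x")]),
    ([("?x", ":descendedFrom", "?y")], [("?x", ":childOf", "?y")]),
    ([("?x", ":descendedFrom", "?y")],
     [("?x", ":childOf", "?z"), ("?z", ":descendedFrom", "?y")]),
]
inferred = infer(rules, base)
```

The recursive rule only fires on a later pass, once `:A :descendedFrom :C` is available, which is why the loop repeats until no pass adds a triple.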

Filtering

We can use expressions in the body of rules to restrict the values of variables in the matching of the body. For example, given data about towns and their populations, we can infer a class for towns with a population greater than 1500:

:town1 :population 1000 .
:town2 :population 2000 .

RULE { ?x rdf:type :largeTown } WHERE { ?x :population ?p . FILTER(?p > 1500) }

:town2 rdf:type :largeTown .

`FILTER` evaluates an expression and keeps the current set of variable bindings if the expression evaluates to true; it discards the current set of variable bindings if the expression evaluates to false. This is the same as the `FILTER` operation of SPARQL. SHACL Rules provides many of the same functions and operators as SPARQL.
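As a non-normative illustration, the town example can be read as a loop that builds a binding for each matching triple and keeps it only when the filter expression is true. The function name `large_towns` and the tuple representation of triples are assumptions made for this sketch.

```python
# Hypothetical sketch: FILTER as a predicate over candidate bindings.
base = {(":town1", ":population", 1000), (":town2", ":population", 2000)}

def large_towns(graph, threshold=1500):
    inferred = set()
    for s, p, o in graph:
        if p != ":population":
            continue                       # triple pattern: ?x :population ?p
        binding = {"?x": s, "?p": o}
        if binding["?p"] > threshold:      # FILTER(?p > 1500)
            inferred.add((binding["?x"], "rdf:type", ":largeTown"))
    return inferred

result = large_towns(base)
```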

Negation

Negation allows you to specify a pattern that must not match. This is called "negation as failure".

In order to evaluate a negation element, the rules evaluation algorithm ensures that all the rules that could produce triples matching the pattern in the negation element have been completed. This is called [=stratification=] and ensures that the negation is based on all the relevant possible triples, whether from the data or inferred by other rules.

:X1 rdf:type :Place ;
    :population 1000 .

:X2 rdf:type :Place ;
    :population 2000 .

:X3 rdf:type :Place .

  RULE { ?x rdf:type :UnclassifiedSize } WHERE { 
      ?x rdf:type :Place .
      NOT { ?x :population ?p . }
  }

:X3 rdf:type :UnclassifiedSize .
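A non-normative sketch of the same example follows. It assumes the graph already contains every `:population` triple that any rule could produce, which is the guarantee stratification is meant to provide; the function name `unclassified` is hypothetical.

```python
# Hypothetical sketch: negation as failure over a graph that is assumed
# to be complete with respect to :population triples.
graph = {
    (":X1", "rdf:type", ":Place"), (":X1", ":population", 1000),
    (":X2", "rdf:type", ":Place"), (":X2", ":population", 2000),
    (":X3", "rdf:type", ":Place"),
}

def unclassified(graph):
    # ?x rdf:type :Place
    places = {s for s, p, o in graph if p == "rdf:type" and o == ":Place"}
    # Subjects for which the negated pattern ?x :population ?p has a match.
    has_population = {s for s, p, o in graph if p == ":population"}
    # Keep only the places where the negated pattern fails to match.
    return {(x, "rdf:type", ":UnclassifiedSize") for x in places - has_population}

result = unclassified(graph)
```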

Assignment and Creating RDF Terms

Assignment allows you to assign the result of an expression to a variable in the body of a rule. This can be used to create new RDF terms based on the data.

RULE { ?x :distanceKm ?kilometers }
WHERE { 
    ?x :distanceMiles ?miles .
    SET ( ?kilometers := ?miles * 1.60934 )
}

This can be combined with testing for the absence of a triple already recording the distance in kilometers.

RULE { ?x :distanceKm ?kilometers }
WHERE { 
    ?x :distanceMiles ?miles
    NOT { ?x :distanceKm ?km }
    SET ( ?kilometers := ?miles * 1.60934 )
}

Rules involving [=assignments=] and rules that create blank nodes in their [=rule head=] both create new RDF terms and are [=run-once rules=]. Such rules are run after all the rules that could produce data that they depend on, and before any rules that depend on the data they produce.

This condition ensures that rules that create RDF terms do not generate new terms multiple times, with potentially different outcomes, and also that such rules do not loop back to themselves and cause an unbounded number of RDF terms.

If evaluating the expression in an [=assignment=] causes an error, then the current solution mapping is rejected by the [=assignment=].
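The error behavior can be sketched, non-normatively, as follows. Solution mappings are modelled as Python dictionaries, the expression as a Python callable, and Python exceptions stand in for expression evaluation errors; the function name `assign` is hypothetical.

```python
# Hypothetical sketch: an assignment extends each solution mapping with a
# new variable, and rejects the solution if the expression raises an error.
def assign(solutions, variable, expression):
    kept = []
    for solution in solutions:
        try:
            value = expression(solution)
        except (TypeError, ValueError, KeyError, ZeroDivisionError):
            continue                      # error: reject this solution mapping
        kept.append({**solution, variable: value})
    return kept

solutions = [
    {"?x": ":route1", "?miles": 10},
    {"?x": ":route2", "?miles": "ten"},   # not a number, so the product fails
]
result = assign(solutions, "?kilometers", lambda s: s["?miles"] * 1.60934)
```

Only the first solution survives; the second is dropped rather than aborting the whole evaluation.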

Importing rules

A SHACL [=rule set=] can incorporate other rule sets by including their URLs in the [=rule imports=] of the rule set. This allows rules to be structured into libraries shared between rule sets.

The `IMPORTS` statements of a rule set are processed before any of the rules in the rule set are evaluated. During the importing step, if an imported rule set has its own imports, those are also processed recursively. Traversing `IMPORTS` statements during the processing of rule sets may lead to cyclic imports. A rule set is imported only once, so cycles in the graph of import statements do not lead to infinite loops.
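One way to realize import-once processing is a traversal that records visited URLs. The sketch below is non-normative; the `load` callback, which maps a rule set URL to the URLs it imports, and the example URLs are assumptions for illustration.

```python
# Hypothetical sketch: collect every rule set reachable through imports,
# visiting each URL at most once so that cycles terminate.
def collect_rule_sets(root_url, load):
    order = []
    visited = set()
    stack = [root_url]
    while stack:
        url = stack.pop()
        if url in visited:
            continue                      # already imported: breaks cycles
        visited.add(url)
        order.append(url)
        stack.extend(load(url))
    return order

imports = {
    "http://example.com/a": ["http://example.com/b"],
    "http://example.com/b": ["http://example.com/c", "http://example.com/a"],
    "http://example.com/c": [],
}
order = collect_rule_sets("http://example.com/a", lambda url: imports[url])
```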

Relationship between SHACL Rules and SPARQL

SHACL Rules and SPARQL have a close relationship. SHACL Rules are designed to be compatible with SPARQL, and many of the constructs in SHACL Rules are inspired by SPARQL. However, there are some differences:

@@ RULES have additional well-formedness conditions that ensure variables are always bound before use.
@@ `RULE` and `CONSTRUCT` - `RULE` have a restricted body syntax and output to the inference graph.
@@ `SET` and `BIND` - different error handling
@@ `NOT` and SPARQL `EXISTS`, `NOT EXISTS` - `NOT` has a restricted negation body
@@ Restricted property paths (no arbitrary length operators `*` and `+`)
@@ Restricted functions. No `COALESCE`, `BOUND` (rules well-formedness), `RAND` (different result each time) and no hash functions. `NOW()` is permitted and is defined to return the same point in time through a rule set evaluation, as it does in SPARQL.

Data Blocks

Data blocks provide a concise way to supply RDF triples directly to the rule set evaluation. Triples in data blocks are added to the inference graph and are available for matching in the body of rules.

@@examples to be updated

DATA {
  :father rdfs:subClassOf :familyRelationship .
  :mother rdfs:subClassOf :familyRelationship .
}

Rule Tuples

At risk:

Rule tuples are disjoint from triples. They are tuples of RDF terms (no variables) and exist only during evaluation of a rule set. They can be used to record intermediate results during rule evaluation and to pass data between rules.

Syntax of tuple patterns, templates and tuples:

`TUPLE(termOrVar , ...)`
Shorthand: `$(termOrVar , ...)`

Often, the first argument will be a fixed name.

There is a tuple store which holds tuples for the lifetime of the evaluation. The tuple store holds duplicate data tuples (unlike an RDF graph which is a set).

Shape Rules Abstract Syntax

The Shape Rules Abstract Syntax is the logical structure of SHACL Rules. It is used to define the execution algorithm of SHACL Rules. Each of the two concrete syntax forms of SHACL Rules, the SHACL Rules Language (SRL) and the RDF syntax (SRL/RDF), provides a way to express the abstract syntax.

Elements of the Abstract Syntax

Variable: A [=variable=] represents a possible [=RDF term=] in a triple pattern. Variables are also used in expressions.
Expression: An [=expression=] is a function or a functional form; the arguments are [=RDF terms=]. An expression is evaluated with respect to a [=solution mapping=], giving an [=RDF term=] as the result. Expressions are compatible with SHACL list parameter functions and with SPARQL expressions.
Condition: A [=condition=] is an [=expression=] that evaluates to true or false. [=Conditions=] are used to restrict the values of variables in pattern matching.
Data block: A [=data block=] is a set of triples. These triples are added to the inference graph as additional facts and are included in the inference process.
Triple template: A [=triple template=] is a 3-tuple where each element is either a [=variable=] or an [=RDF term=] (which might be a [=triple term=]). The second element of the tuple must be an [=IRI=] or a [=variable=]. [=Triple templates=] appear in the [=head=] of a [=rule=].
Triple pattern: A [=triple pattern=] is a 3-tuple where each element is either a [=variable=] or an [=RDF term=] (which might be a [=triple term=]). The second element of the tuple must be an [=IRI=] or a [=variable=]. [=Triple patterns=] can appear as [=triple pattern elements=] as well as inside [=negation elements=].
Condition element: A [=condition element=] is a [=condition=] that appears as a [=rule body element=].
Triple pattern element: A [=triple pattern element=] is a [=triple pattern=] used as a [=rule body element=].
Negation element: A [=negation element=] is a [=rule body element=] comprised of a sequence of [=triple patterns=] and [=conditions=].
Assignment element: An [=assignment element=] is a pair consisting of a variable, called the assignment variable, and an expression, called the assignment expression. [=Assignment elements=] appear in the [=body=] of a [=rule=].
Rule body element: A [=rule body element=] (often just "rule element") is any element that can appear in a [=rule body=], i.e., a [=triple pattern element=], a [=condition element=], a [=negation element=], or an [=assignment element=].
Rule head: A [=rule head=] is a sequence of [=triple templates=].
Rule body: A [=rule body=] is a sequence of [=rule body elements=].
Rule imports: [=Rule imports=] (often just "imports") are a collection of URLs for other rule sets that will be included during evaluation.
Rule: A [=rule=] is a pair of a [=rule head=] (often just "head") and a [=rule body=] (often just "body").
Run-once rule: A [=run-once rule=] is a rule that is run exactly once at a particular point in the evaluation of a rule set.
General rule: A [=general rule=] is a rule that is not a [=run-once rule=]. General rules may run more than once during rule set evaluation.
Rule set: A [=rule set=] is a collection of zero or more [=rules=], a collection of zero or more [=data blocks=], and a collection of zero or more [=rule imports=].
Base graph: A [=base graph=] is the [=RDF Graph=] given as input to the evaluation process.
Inference graph: An [=inference graph=] is an [=RDF Graph=] produced by evaluating a [=rule set=]. It contains all triples not present in the [=base graph=] that are inferred by applying the [=rule set=] to the [=base graph=].
Infer: [=Infer=] is the operation that applies a [=rule set=] to a given [=base graph=] and produces an [=inference graph=] containing inferred triples.
Query: [=Query=] is the operation that determines whether a given goal pattern can be derived from a [=base graph=] using the [=rule set=].

In a [=triple pattern=] or a [=triple template=], position 1 of the tuple is informally called the subject, position 2 is informally called the predicate, and position 3 is informally called the object.

Well-formedness Conditions

Well-formedness is a set of conditions on the abstract syntax of SHACL rules. Together, these conditions ensure that a [=variable=] in the [=head=] of a rule has a value defined in the [=body=] of the rule; that each variable in a [=condition element=] or [=assignment expression=] has a value at the point of evaluation; and that each assignment in a rule introduces a new variable, one that has not been used earlier in the rule body.

A [=rule=] is a well-formed rule if all of the following conditions are met:

For every [=variable=] appearing in a [=triple template=] of the [=head=] of the [=rule=], there is one or more occurrences of a [=variable=] of the same name in the [=triple patterns=] in the [=body=], or in an [=assignment element=] in the body, or in both.
For every [=variable=] in an [=expression=] at position i of the [=body=], there is a corresponding [=variable=] of the same name occurring in a [=triple pattern=] at some position j, or there is an [=assignment variable=] at some position j, where i > j.
For every [=variable=] in an expression in a [=negation element=] at position i of the [=body=], either there is a corresponding [=variable=] of the same name occurring in a [=triple pattern=] or an [=assignment variable=] at some position j, where i > j, or a [=variable=] of the same name occurring in some [=triple pattern=] or [=assignment variable=] of the [=negation element=] preceding the expression of the [=negation element=].
Each [=assignment variable=] is used in only one [=assignment element=] in the [=body=] of the [=rule=].
An [=assignment variable=] at position i of a [=rule body=] does not occur in any [=triple pattern=] at position j where i > j.

A [=rule set=] is "well-formed" if and only if all of the [=rules=] of the rule set are "well-formed".
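The conditions above can be checked in a single left-to-right pass over the rule body. The following non-normative sketch assumes a simplified encoding invented for this example: body elements are tuples `("pattern", triple)`, `("filter", vars)`, `("assign", var, vars)` or `("not", elements)`, where `vars` is the set of variables used by the expression. It also assumes that variables bound only inside a negation element do not count towards the head.

```python
# Hypothetical sketch of a well-formedness check over a simplified encoding.
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def pattern_vars(triple):
    return {t for t in triple if is_var(t)}

def is_well_formed(head, body):
    bound = set()          # variables bound by earlier patterns or assignments
    pattern_seen = set()   # variables seen in earlier top-level triple patterns
    assigned = set()       # assignment variables seen so far
    for element in body:
        kind = element[0]
        if kind == "pattern":
            pattern_seen |= pattern_vars(element[1])
            bound |= pattern_vars(element[1])
        elif kind == "filter":
            if not element[1] <= bound:
                return False               # expression uses an unbound variable
        elif kind == "assign":
            _, var, expr_vars = element
            if not expr_vars <= bound:
                return False               # expression uses an unbound variable
            if var in assigned or var in pattern_seen:
                return False               # assignment variable is not new
            assigned.add(var)
            bound.add(var)
        elif kind == "not":
            local = set(bound)             # bindings visible inside the negation
            for inner in element[1]:
                if inner[0] == "pattern":
                    local |= pattern_vars(inner[1])
                elif inner[0] == "filter" and not inner[1] <= local:
                    return False
    head_vars = set().union(*(pattern_vars(t) for t in head))
    return head_vars <= bound

head = [("?x", ":distanceKm", "?kilometers")]
good_body = [("pattern", ("?x", ":distanceMiles", "?miles")),
             ("assign", "?kilometers", {"?miles"})]
bad_body = [("assign", "?kilometers", {"?miles"}),
            ("pattern", ("?x", ":distanceMiles", "?miles"))]
```

Here `good_body` is accepted, while `bad_body` is rejected because `?miles` is used in the assignment expression before any triple pattern binds it.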

Revisit

Rule Dependency

A rule `R1` depends on a rule `R2` if the output of the second rule affects the evaluation of the body of the first rule. That is, the head of `R2` has a [=triple template=] that might generate a triple that matches a [=triple pattern=] in the body of `R1`, either as a [=triple pattern element=] or inside a [=negation element=].

There are two kinds of dependencies: [=closed dependencies=] and [=open dependencies=]. A closed dependency ensures that rule `R2` has generated all its possible output before rule `R1` is executed. If a rule dependency is not closed, it is an open dependency, which allows the first rule `R1` to be executed while the rule `R2` might be run again to generate further triples; these can then cause `R1` to be reevaluated with the new triples from `R2`.

In this first example, the first rule depends on the second rule. It is an open dependency.

    RULE { ?x :descendedFrom ?y } WHERE { ?x :childOf ?y }
    RULE { ?x :childOf ?y } WHERE { ?y :fatherOf ?x }

In this second example, the first rule depends on the second rule. It is a closed dependency.

    RULE @@

Triple pattern matching

A [=triple pattern=] matches a [=triple template=] if the triple template may generate a triple that matches the triple pattern.

Triple pattern dependency

A [=triple pattern=] depends on a [=triple template=] if the [=triple pattern=] could possibly match the [=triple template=].

A [=triple pattern=] depends on a [=rule=] if the [=triple pattern=] has dependency on any of the [=triple templates=] in the [=head=] of the rule.

Rule dependency

Rule `R1` [=depends on=] `R2` if any [=triple pattern=] in the body of `R1`, whether as a [=triple pattern element=] or inside a [=negation element=], depends on a [=triple template=] in the head of `R2`.

Closed dependency

A [=rule dependency=] of rule `R1` on rule `R2` is a [=closed dependency=] if any of the following conditions hold:

A [=triple pattern=] occurring inside a [=negation element=] of `R1` matches a [=triple template=] in the [=rule head=] of `R2`.
Rule `R1` [=depends on=] rule `R2` and `R1` has an [=assignment element=].
Rule `R1` [=depends on=] rule `R2` and the [=rule head=] of `R1` has a blank node.

Open dependency

A [=rule dependency=] of rule `R1` on rule `R2` is an [=open dependency=] if the dependency is not a [=closed dependency=]. That is, `R1` has no [=assignment element=], the [=rule head=] of `R1` has no blank node, and any [=triple pattern=] of `R1` that depends on `R2` occurs only as a [=triple pattern element=].

A [=triple template=] with components `tSubj`, `tPred`, `tObj` can possibly generate a triple with component RDF terms `s`, `p`, `o` if

`tSubj` is a variable, or `tSubj` is the same RDF term as `s`
`tPred` is a variable, or `tPred` is the same RDF term as `p`
`tObj` is a variable, or `tObj` is the same RDF term as `o`

In addition, if any pair of `tSubj`, `tPred`, and `tObj` are the same variable, then the corresponding pair of `s`, `p`, and `o` must be the same.
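The "can possibly generate" test can be sketched as follows. This is a non-normative illustration; the function name `can_possibly_generate` and the encoding of variables as strings starting with `?` are assumptions of the example.

```python
# Hypothetical sketch of the "can possibly generate" test between a triple
# template and a concrete triple.
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def can_possibly_generate(template, triple):
    binding = {}
    for t, value in zip(template, triple):
        if is_var(t):
            # A repeated variable must take the same value in every position.
            if binding.setdefault(t, value) != value:
                return False
        elif t != value:
            return False                  # constant terms must be identical
    return True

ok = can_possibly_generate(("?x", ":childOf", "?y"), (":X", ":childOf", ":A"))
repeated = can_possibly_generate(("?x", ":sameAs", "?x"), (":A", ":sameAs", ":B"))
```

The second call fails because the template uses `?x` in both subject and object positions while the triple has different terms there.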

Dependency Graph

The dependencies between rules are represented as a directed graph, called the [=dependency graph=]. The vertices of the graph are the rules of the rule set, and the edges are labeled either open or closed according to whether the dependency is an [=open dependency=] or a [=closed dependency=].

Dependency graph: A [=dependency graph=] of a [=rule set=] is a directed graph where each vertex is a [=rule=] in the rule set, and an edge exists from rule `R1` to rule `R2` if `R1` depends on `R2`. The edge is labeled either open or closed according to whether the dependency is an [=open dependency=] or a [=closed dependency=].
Transitive rule dependency: A rule `R1` has a [=transitive dependency=] on rule `R2` if there is a path in the [=dependency graph=] from `R1` to `R2`.
Recursive rule dependency: A rule `R` has a [=recursive dependency=] if there is a cyclic path in the [=dependency graph=] involving `R`.

The dependency graph is not affected by the data graph.

Dependency Graph Algorithm

The following algorithm gives one possible method for constructing the [=dependency graph=] from a [=rule set=]. Conformance depends on producing a dependency graph that meets the definitions of a dependency graph, not on the use of this procedure.


define mergeLabel(oldLabel, newLabel):
    ## Closed dependency overrides open dependency.
    if oldLabel == "open" and newLabel == "open":
        return "open"
    else:
        return "closed"
    endif
enddefine

## output -- Dependency graph with rule vertices and labeled edges.
define buildDependencyGraph(ruleSet):
    ## edgeLabelMap maps (R1, R2) to "open" or "closed"
    let edgeLabelMap be a map from pair (rule, rule) to label

    foreach rule R1 in ruleSet:
        ## Classify each triple pattern TP in the rule as requiring "open" or "closed"
        ## depending on whether it is in a negation element or not.
        let bodyDependencies = {}
        foreach rule body element RBE in the body of R1:
            if RBE is a negation element:
                foreach triple pattern TP in RBE:
                    let item be a pair (TP, "closed")
                    add item to bodyDependencies
                endfor
            else if RBE is a triple pattern element of triple pattern TP:
                let item be a pair (TP, "open")
                add item to bodyDependencies
            else if RBE is a condition element:
                ## Do nothing
            else if RBE is an assignment element:
                ## Do nothing
            endif
        endfor

        foreach pair (triple pattern TP, depLabel) in bodyDependencies:
            if R1 has an assignment element:
              set depLabel to "closed"
            endif
            if R1 has a triple template with a blank node:
              set depLabel to "closed"
            endif
            ## Find dependencies for this triple pattern element or negation element.
            foreach rule R2 in ruleSet:
                foreach triple template TT in head of R2:
                    ## "possibly generate" / matching is defined in
                    ## section 3.3 
                    if TT can possibly match triple pattern TP:
                        let key = (R1, R2)
                        if edgeLabelMap contains key:
                            let oldLabel = edgeLabelMap.get(key)
                            let merged = mergeLabel(oldLabel, depLabel)
                            edgeLabelMap.set(key, merged)
                        else:
                            edgeLabelMap.set(key, depLabel)
                        endif
                    endif
                endfor
            endfor
        endfor
    endfor

    let DP = { }
    foreach entry ((R1, R2), label) in edgeLabelMap:
        add edge (R1 -> R2) labeled label to DP
    endfor

    the result is DP
enddefine
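For readers who prefer executable code, the procedure above can be approximated in Python as shown below. This is a non-normative sketch under stated assumptions: rules are dictionaries with `head` and `body` keys, body elements use an ad hoc tuple encoding, blank nodes are strings starting with `_:`, and `may_match` is a conservative position-wise test that ignores repeated-variable constraints, so it may report more dependencies than the definitions strictly require.

```python
# Hypothetical Python rendering of buildDependencyGraph.
def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def is_bnode(term):
    return isinstance(term, str) and term.startswith("_:")

def may_match(template, pattern):
    # Conservative: compatible if, at each position, either side is a
    # variable or both sides are the same term.
    return all(is_var(a) or is_var(b) or a == b for a, b in zip(template, pattern))

def build_dependency_graph(rules):
    edges = {}   # (i, j) -> "open" | "closed": rule i depends on rule j
    for i, r1 in enumerate(rules):
        run_once = (any(e[0] == "assign" for e in r1["body"])
                    or any(is_bnode(t) for tt in r1["head"] for t in tt))
        deps = []
        for element in r1["body"]:
            if element[0] == "not":
                deps += [(tp, "closed") for tp in element[1]]
            elif element[0] == "pattern":
                deps.append((element[1], "closed" if run_once else "open"))
        for tp, label in deps:
            for j, r2 in enumerate(rules):
                if any(may_match(tt, tp) for tt in r2["head"]):
                    old = edges.get((i, j), "open")
                    # Closed overrides open when labels are merged.
                    edges[(i, j)] = "open" if old == label == "open" else "closed"
    return edges

rules = [
    {"head": [("?x", ":descendedFrom", "?y")],
     "body": [("pattern", ("?x", ":childOf", "?y"))]},
    {"head": [("?x", ":childOf", "?y")],
     "body": [("pattern", ("?y", ":fatherOf", "?x"))]},
    {"head": [("?x", "rdf:type", ":Root")],
     "body": [("pattern", ("?x", "rdf:type", ":Person")),
              ("not", [("?x", ":childOf", "?p")])]},
]
edges = build_dependency_graph(rules)
```

In this example the first rule has an open dependency on the second, and the third rule has a closed dependency on the second because `:childOf` appears inside its negation element.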

Examples:

            @@ Examples of triple pattern dependencies.

            @@ Examples of rule dependencies.

Stratification

[=Stratification=] is the process of partitioning a [=rule set=] into an ordered sequence of [=stratification layers=] (also known as "strata", singular "stratum"). Rules in lower [=strata=] are evaluated before rules in higher [=strata=].

[=Stratification=] imposes constraints on dependencies between [=rules=] to ensure that [=negation elements=], [=assignment elements=], and blank nodes created in a [=rule head=] depend only on results computed using earlier (lower) [=strata=] and the [=base graph=]. This guarantees a single, well-defined, and finite outcome from the evaluation of a [=rule set=] over a given [=base graph=].

A stratification process may also be used to make other evaluation decisions. This document describes the necessary conditions for consistent evaluation and gives one possible way to form a stratification. Implementations need to meet the conditions described here in order to get compatible behavior but they are not required to implement the algorithm as presented.

Stratification layer: A [=stratification layer=] `SL` is a pair of disjoint sets of rules (`SL.once`, `SL.general`). `SL.once` contains [=run-once rules=], which are rules that use [=assignment elements=] or produce blank nodes in the [=rule head=]; these rules are each evaluated exactly once at the start of evaluation of the [=stratification layer=]. `SL.general` contains the remaining rules, which are evaluated until no new triples are inferred.
Stratification: A [=stratification=] of a [=rule set=] is a sequence of [=stratification layers=]. Each rule in a [=rule set=] appears in exactly one of the sets of one of the [=stratification layers=].

Stratification Condition

[=Stratification=] is only defined when the following condition is satisfied. If a [=rule set=] does not meet this condition, then this specification does not define an outcome for the evaluation of such a [=rule set=].

Stratification Condition: The [=stratification condition=] requires that there is no [=recursive dependency=] involving a [=closed dependency=] in the [=dependency graph=] for a [=rule set=].

In other words, there is no `NOT` or run-once rule (assignment or rule [=triple template=] involving a blank node) in any transitive dependency cycle of the [=dependency graph=].
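A direct way to test the condition is to check, for every closed edge, whether its source is reachable from its destination. The sketch below is non-normative; it assumes the dependency graph is given as a dictionary from `(source, destination)` pairs to the labels `"open"` or `"closed"`, and the function names are hypothetical.

```python
# Hypothetical sketch: a rule set violates the stratification condition
# when some closed edge lies on a cycle of the dependency graph.
def reachable(start, goal, edges):
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(dst for (src, dst) in edges if src == node)
    return False

def satisfies_stratification_condition(edges):
    # A closed edge src -> dst is on a cycle exactly when src can be
    # reached again from dst.
    return not any(label == "closed" and reachable(dst, src, edges)
                   for (src, dst), label in edges.items())

open_cycle = {("R1", "R2"): "open", ("R2", "R1"): "open"}
closed_cycle = {("R1", "R2"): "closed", ("R2", "R1"): "open"}
```

A cycle made only of open edges is ordinary recursion and is allowed; the same cycle with one closed edge is rejected.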

Stratification Algorithm

The following algorithm gives one possible stratification based solely on the rule set.


## output -- Map: Integer -> Set of rules.

define stratification(ruleSet):

    let DP = Dependency graph for the rule set.
    let stratumMap be a map from rule to integer

    ## The dependency graph should satisfy the stratification condition.
    ## The check for unbounded stratification is a guard
    ## against a violation of the stratification condition.
    let limit = num rules + 1
    let maxStratum = 0

    ## initialize stratumMap
    foreach rule in ruleSet:
        stratumMap.set(rule, 0)
    endfor

    boolean changed = true;
    while changed:
        changed = false;
        foreach edge E in DP:
            ## Edge from pRule to qRule with a label
            let pRule = source of edge
            let qRule = destination of the edge
            let label = edge label

            if label == "open" :
                if stratumMap.get(pRule) < stratumMap.get(qRule) :
                    stratumMap.set(pRule, stratumMap.get(qRule))
                    changed = true;
                endif
            endif
            if label == "closed" :
                if stratumMap.get(pRule) <= stratumMap.get(qRule) :
                    let xStratum = 1 + stratumMap.get(qRule)
                    if xStratum > limit :
                        ## Stratification requirement violated
                        error "Stratification error"
                    endif
                    stratumMap.set(pRule, xStratum)
                    maxStratum = max(maxStratum, xStratum)
                    changed = true;
                endif
            endif
        endfor
    endwhile

    ## Initialize the result map.
    let stratumRules be a map from integer to rules.
    for i = 0 to maxStratum:
        stratumRules.set(i, {})
    endfor

    ## Gather rules in stratumMap with the same level number
    for rule R in map stratumMap:
        let stratumNum = stratumMap.get(R)
        add R to stratumRules.get(stratumNum)
    endfor

    ## Partition each level into once and general
    let stratumLevels be a sequence of pairs of sets of rules.
    for i = 0 to maxStratum:
        let rules = stratumRules.get(i)
        let once = { R in rules | R is a run-once rule }
        let general = rules \ once
        stratumLevels.set(i, pair(once, general))
    endfor

    the result is stratumLevels
enddefine

A consequence of the [=stratification condition=] is that once a [=run-once rule=] is evaluated, the data used to determine the outcome of the rule will not change during further evaluation.
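The stratification procedure can also be written as executable code. The following non-normative Python sketch mirrors the pseudocode; it assumes the dependency graph is a dictionary from `(pRule, qRule)` pairs to edge labels, that rules are hashable values, and that `is_run_once` is a caller-supplied predicate. These names are illustrative only.

```python
# Hypothetical Python rendering of the stratification algorithm.
def stratify(rules, edges, is_run_once):
    stratum = {rule: 0 for rule in rules}
    limit = len(rules) + 1                # guard against unbounded growth
    changed = True
    while changed:
        changed = False
        for (p, q), label in edges.items():
            if label == "open" and stratum[p] < stratum[q]:
                stratum[p] = stratum[q]
                changed = True
            elif label == "closed" and stratum[p] <= stratum[q]:
                if stratum[q] + 1 > limit:
                    raise ValueError("Stratification error")
                stratum[p] = stratum[q] + 1
                changed = True
    # Group rules by level and split each level into (once, general).
    layers = []
    for level in range(max(stratum.values(), default=0) + 1):
        members = {rule for rule in rules if stratum[rule] == level}
        once = {rule for rule in members if is_run_once(rule)}
        layers.append((once, members - once))
    return layers

rules = ["childOf", "descendedFrom", "root"]
edges = {("descendedFrom", "childOf"): "open", ("root", "childOf"): "closed"}
layers = stratify(rules, edges, lambda rule: False)
```

With these inputs the rule labelled `root`, which has a closed dependency on `childOf`, is placed one stratum above the other two rules.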

Concrete Syntax forms for Shapes Rules

There are two concrete syntaxes.

Shape Rules Language syntax
RDF Rules syntax

Shape Rules Language:

PREFIX : <http://example/>

DATA { :x :p 1 ; :q 2 . }

RULE { ?x :bothPositive true . }
WHERE { ?x :p ?v1  FILTER ( ?v1 > 0 )  ?x :q ?v2  FILTER ( ?v2 > 0 )  }

RULE { ?x :oneIsZero true . }
WHERE { ?x :p ?v1 ;  :q ?v2  FILTER ( ( ?v1 = 0 ) || ( ?v2 = 0 ) )  }

RDF Rules syntax:

PREFIX :       <http://example/>
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sh:     <http://www.w3.org/ns/shacl#>
PREFIX srl:    <http://www.w3.org/ns/shacl-rules#>
PREFIX sparql: <http://www.w3.org/ns/sparql#>

:ruleSet-1
  rdf:type srl:RuleSet;
  srl:data (
    [  srl:subject :x ; srl:predicate :p ; srl:object 1 ]
    [  srl:subject :x ; srl:predicate :q ; srl:object 2 ]
  );
  srl:rules (
    [
      rdf:type srl:Rule;
      srl:head (
        [ srl:subject [ srl:varName "x" ] ; srl:predicate :bothPositive ; srl:object true ]
      ) ;
      srl:body (
        [ srl:subject [ srl:varName "x" ]; srl:predicate :p ; srl:object [ srl:varName "v1" ] ]
        [ srl:filter [ sparql:greater-than ( [ srl:varName "v1" ] 0 ) ] ]
        [ srl:subject [ srl:varName "x" ] ; srl:predicate :q ; srl:object [ srl:varName "v2" ] ]
        [ srl:filter [ sparql:greater-than ( [ srl:varName "v2" ] 0 ) ] ]
      );
    ]
    [
      rdf:type srl:Rule;
      srl:head (
        [ srl:subject [ srl:varName "x" ] ; srl:predicate :oneIsZero ; srl:object true ]
      ) ;
      srl:body (
        [ srl:subject [ srl:varName "x" ] ; srl:predicate :p ; srl:object [ srl:varName "v1" ] ]
        [ srl:subject [ srl:varName "x" ] ; srl:predicate :q ; srl:object [ srl:varName "v2" ] ]
        [ srl:filter [ sparql:function-or (
              [ sparql:equals ( [ srl:varName "v1" ] 0 ) ]
              [ sparql:equals ( [ srl:varName "v2" ] 0 ) ]
            ) ]
        ]
      );
    ]
  ) .

Shape Rules Language syntax

The grammar is given below.

Mapping the AST to the abstract syntax.

Shape Rules Language Abbreviations

Additional helpers (short-hand abbreviations):

These allow for well-known rule patterns and also specialised implementations in basic engines.

TRANSITIVE(uri)
SYMMETRIC(uri)
INVERSE(uri, uri)

At risk:

`TRANSITIVE` has both implementation and concise expression advantages. Implementation advantages for `SYMMETRIC` and `INVERSE` are not clear.

RDF Rules Syntax

Vocabulary: rdf-syntax-vocab.ttl
SHACL shapes: rdf-syntax-shapes.ttl

Well-formedness:

All RDF lists are well-formed
exactly one each of subject, predicate, and object per body or head element
Well-formed, single-valued, list-argument node expressions
well-formed abstract syntax

Describe how the abstract model maps to triples.

Process : accumulators, bottom up/ Walk the structure.

Collect data triples
Map expressions
Map triple-patterns
Map triple-templates
Map assignments
Map to rule
Rule set

All triples not in the syntax are ignored. No other "srl:" predicates are allowed (??).

@@ Illustration: SHACL rule set in text and RDF syntaxes: all features:

PREFIX :        <http://example/>

DATA { :s :p :o }
RULE { ?x :q :o } WHERE { ?x :p :o }
RULE { ?x :q :o } WHERE { ?x :p :o1 ; :p :o2 }
RULE { ?x :q :o } WHERE { ?x :p ?o . FILTER (?o < 18) }
RULE { ?x :q ?o } WHERE { ?x :p :o . SET (18 AS ?o) }
RULE { ?x :q ?o } WHERE { ?x :p :o . NOT { ?s :p ?o . FILTER(?o < 18) } }

PREFIX :       <http://example/>
PREFIX rdf:    <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX sh:     <http://www.w3.org/ns/shacl#>
PREFIX sparql: <http://www.w3.org/ns/sparql#>
PREFIX srl:    <http://www.w3.org/ns/shacl-rules#>

:ruleSet-1
  rdf:type srl:RuleSet;
  srl:data (
    [ srl:subject :s ; srl:predicate :p; srl:object :o ; ]
  );
  srl:rules (
    [
      rdf:type srl:Rule;
      srl:body (
        [ srl:subject [ srl:varName "x" ] ; srl:predicate :p ; srl:object :o ; ]
      ) ;
      srl:head (
        [ srl:subject [ srl:varName "x" ] ; srl:predicate :q ; srl:object :o ; ]
      )
    ]
    [
      rdf:type srl:Rule ;
      srl:body (
        [ srl:subject [ srl:varName "x" ] ; srl:predicate :p ; srl:object :o1 ; ]
        [ srl:subject [ srl:varName "x" ] ; srl:predicate :p ; srl:object :o2 ; ]
      ) ;
      srl:head (
        [ srl:subject [ srl:varName "x" ] ; srl:predicate :q ; srl:object :o ; ]
      )
    ]
    [
      rdf:type srl:Rule ;
      srl:body (
        [ srl:subject [ srl:varName "x" ] ; srl:predicate :p ; srl:object [ srl:varName "o" ] ; ]
        [
          srl:filter [
            sparql:less-than (
              [ srl:varName "o" ]
              18
            )
          ]
        ]
      ) ;
      srl:head (
        [ srl:subject [ srl:varName "x" ] ; srl:predicate :q ; srl:object :o ; ]
      )
    ]
    [
      rdf:type srl:Rule ;
      srl:body (
        [ srl:subject [ srl:varName "x" ] ; srl:predicate :p ; srl:object :o ; ]
        [
          srl:assign [
            srl:assignValue 18 ;
            srl:assignVar [ srl:varName "o" ]
          ]
        ]
      ) ;
      srl:head (
        [ srl:subject [ srl:varName "x" ] ; srl:predicate :q ; srl:object [ srl:varName "o" ] ; ]
      )
    ]
    [
      rdf:type srl:Rule ;
      srl:body (
        [ srl:subject [ srl:varName "x" ] ; srl:predicate :p ; srl:object :o ; ]
        [
          srl:not (
            [ srl:subject [ srl:varName "s" ] ;     srl:predicate :p ;     srl:object [ srl:varName "o" ] ; ]
            [
              srl:filter [
                sparql:less-than (
                  [ srl:varName "o" ]
                  18
                )
              ]
            ]
          )
        ]
      ) ;
      srl:head (
        [ srl:subject [ srl:varName "x" ] ; srl:predicate :q ; srl:object [ srl:varName "o" ] ; ]
      )
    ]
  ) .

Rule Set Evaluation

This section defines the outcome of evaluating a rule set on given data. It does not prescribe a particular algorithm or method of implementation. An implementation can use any algorithm that generates the same outcome.

Inputs: data graph G, called the base graph, and a rule set RS.
Output: an RDF graph GI of inferred triples

The inferred triples do not include triples present in the set of triples of the [=base graph=].

Evaluation Definitions

Solution mapping

A solution mapping, μ, is a partial function μ : V → T, where V is the set of all variables and T is the set of all [=RDF terms=]. The domain of μ is denoted by dom(μ), and it is the subset of V for which μ is defined. We use the term [=solution=] where it is clear that a [=solution mapping=] is meant. We write μ₀ for the solution mapping whose domain, dom(μ₀), is the empty set.

Substitution function

A substitution function, or just a substitution, is a function subst(μ, [=triple pattern=]) that returns a [=triple pattern=] in which each occurrence of a variable that is in dom(μ) is replaced by the [=RDF term=] that the [=solution mapping=] μ gives for that variable. If the resulting triple pattern has no variables, then it is an [=RDF Triple=].
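The substitution function can be illustrated with a minimal Python sketch. The representation is an assumption made only for this illustration: RDF terms and variables are plain strings, variables start with `?`, and a solution mapping is a `dict`. The helper names `is_var` and `is_triple` are hypothetical and not part of this specification.

```python
from typing import Dict, Tuple

Term = str                       # assumption: RDF terms and variables are plain strings
TriplePattern = Tuple[Term, Term, Term]
Solution = Dict[str, Term]       # a solution mapping; its keys are dom(mu)


def is_var(term: Term) -> bool:
    """Assumed convention for this sketch: variables start with '?'."""
    return term.startswith("?")


def subst(mu: Solution, pattern: TriplePattern) -> TriplePattern:
    """Replace every variable of the pattern that is in dom(mu) by mu's value for it."""
    s, p, o = (mu.get(t, t) if is_var(t) else t for t in pattern)
    return (s, p, o)


def is_triple(pattern: TriplePattern) -> bool:
    """A pattern with no variables left is an RDF triple."""
    return not any(is_var(t) for t in pattern)


partial = subst({"?x": ":alice"}, ("?x", ":knows", "?y"))
ground = subst({"?x": ":alice", "?y": ":bob"}, ("?x", ":knows", "?y"))
print(partial, is_triple(partial))   # (':alice', ':knows', '?y') False
print(ground, is_triple(ground))     # (':alice', ':knows', ':bob') True
```

Variables outside dom(μ) are left in place, so the result of a partial substitution is still a triple pattern rather than a triple.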

Evaluation graph

An [=evaluation graph=] is an [=RDF Graph=] that combines the [=base graph=] and all triples produced during the evaluation of a rule set.

Graph match

A [=graph match=] finds the ways to map a triple pattern onto triples in an [=RDF Graph=].

Let G be an [=RDF graph=] and TP be a triple pattern. The function `graphMatch(G, TP)` returns the set of all solutions that, when applied to the triple pattern, produce a triple that is in G.

graphMatch(G, TP) = { μ | subst(μ, TP) is a triple in G }
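A minimal Python sketch of graph matching follows. It assumes, for illustration only, that terms are strings, that variables start with `?`, and that a graph is an iterable of 3-tuples. It returns only the solutions whose domain is exactly the set of variables in the pattern, which is a narrowing chosen for this sketch.

```python
from typing import Dict, Iterable, List, Tuple

Term = str
Triple = Tuple[Term, Term, Term]


def is_var(term: Term) -> bool:
    """Assumed convention for this sketch: variables start with '?'."""
    return term.startswith("?")


def graph_match(graph: Iterable[Triple], pattern: Triple) -> List[Dict[str, Term]]:
    """Return one solution per triple of the graph that the pattern maps onto."""
    solutions = []
    for triple in graph:
        mu: Dict[str, Term] = {}
        for p, t in zip(pattern, triple):
            if is_var(p):
                # A repeated variable must bind to the same term each time.
                if mu.setdefault(p, t) != t:
                    break
            elif p != t:
                break
        else:
            solutions.append(mu)
    return solutions


graph = [(":s", ":p", ":o"), (":t", ":p", ":o"), (":a", ":r", ":a")]
print(graph_match(graph, ("?x", ":p", ":o")))   # [{'?x': ':s'}, {'?x': ':t'}]
print(graph_match(graph, ("?x", ":r", "?x")))   # [{'?x': ':a'}]
print(graph_match(graph, ("?x", ":r", ":o")))   # []
```

A pattern without variables yields the single empty solution when the triple is in the graph, and no solution otherwise.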

Solution compatible

Two solutions μ₁ and μ₂ are [=compatible=] if they agree on the variables they have in common.

Let μ₁ and μ₂ be solutions.

compatible(μ₁, μ₂) = true
                      if forall v in dom(μ₁) intersection dom(μ₂)
                          μ₁(v) = μ₂(v)
compatible(μ₁, μ₂) = false otherwise

Solution sequence

A [=solution sequence=] is a multi-set of solutions. There is no defined order to the sequence. It is equivalent to an unordered list and it can contain duplicates.

Solution merge

If two solutions are compatible, the merge of the two solutions is the solution that maps each variable of either solution to the [=RDF term=] given by one or other of the solutions.

Let μ₁, μ₂ be compatible solution mappings, and S1 and S2 be solution sequences.

merge(μ₁, μ₂) = μ where
                    μ(v) = μ₁(v) if v in dom(μ₁)
                    μ(v) = μ₂(v) if v in dom(μ₂) and v not in dom(μ₁)

merge(S1, S2) = { μ |
                    μ₁ in S1, μ₂ in S2
                    and compatible(μ₁, μ₂)
                    and μ = merge(μ₁, μ₂) }

The domain of merge(μ₁, μ₂) is `dom(μ₁) ∪ dom(μ₂)`.

Two solutions that have no variables in common are always compatible.
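The compatibility test and the two forms of merge can be sketched in Python as follows. Solution mappings are assumed to be `dict` objects keyed by variable name, and a solution sequence is assumed to be a `list`, which keeps duplicates. The name `merge_sequences` is hypothetical and stands for the merge of two solution sequences.

```python
from typing import Dict, List

Solution = Dict[str, str]   # assumption: variable name -> RDF term, both as strings


def compatible(m1: Solution, m2: Solution) -> bool:
    """True if the two solutions agree on every variable they share."""
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def merge(m1: Solution, m2: Solution) -> Solution:
    """Merge two compatible solutions; the domain is dom(m1) union dom(m2)."""
    return {**m2, **m1}


def merge_sequences(s1: List[Solution], s2: List[Solution]) -> List[Solution]:
    """Merge every compatible pair drawn from the two solution sequences."""
    return [merge(m1, m2) for m1 in s1 for m2 in s2 if compatible(m1, m2)]


print(compatible({"?x": ":a"}, {"?y": ":b"}))   # True: no shared variables
print(compatible({"?x": ":a"}, {"?x": ":b"}))   # False: disagree on ?x
print(merge_sequences([{"?x": ":a"}, {"?x": ":b"}],
                      [{"?x": ":a", "?y": ":c"}]))
# [{'?x': ':a', '?y': ':c'}]
```

Merging with the sequence that contains only the empty solution leaves the other sequence unchanged, which is why rule body evaluation can start from that sequence.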

Preparation for Evaluation

The first step in evaluating a [=rule set=] is to prepare a single, valid rule set. This involves gathering all imported rule sets, building a single, combined rule set, and then calculating the stratification for the combined rule set.

Process Imports

@@Walk imports - visit only once@@

@@ TODO: consider defining a rule set as having two components and a separate set of imports. i.e. imports are in the parsing and sorted out to give a usable "rule set" of "rules + data"

A [=rule set=] `R` has three components: `R.rules`, `R.data`, and `R.imports`. The rule set merge of two rule sets `RS1` and `RS2`, written `rulesetMerge(RS1, RS2)`, is a rule set `MR` defined as follows:

                MR.rules = RS1.rules ∪︀ RS2.rules
                MR.data = merge(RS1.data, RS2.data)
                MR.imports = {}

where `merge` is the RDF merge operation.

define imports(rule set RS, set of URLs V), returning rule set
    let I = the set of import URLs declared for the rule set RS
    let RS2 = RS
    foreach URL x in I:
        if x ∉ V:
            V = V ∪︀ { x }
            read rule set RS3 from URL x
            RS2 = rulesetMerge(RS2, imports(RS3, V))
        endif
    endfor
    result is RS2
enddefine

let RS be a rule set
let V = {}
if RS has a location, V = { location of RS }
result is imports(RS, V)
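The import walk above can be sketched in Python under some simplifying assumptions: a rule set is a small dataclass, rules are opaque strings, data triples contain no blank nodes (so the RDF merge reduces to set union), and rule sets are read from an in-memory dictionary through a hypothetical `read` callback rather than from the network.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class RuleSet:
    rules: Set[str] = field(default_factory=set)     # rules kept as opaque strings
    data: Set[Triple] = field(default_factory=set)
    imports: Tuple[str, ...] = ()


def ruleset_merge(rs1: RuleSet, rs2: RuleSet) -> RuleSet:
    # Assumption: no blank nodes in data, so RDF merge is plain set union.
    return RuleSet(rs1.rules | rs2.rules, rs1.data | rs2.data, ())


def resolve_imports(rs: RuleSet, visited: Set[str],
                    read: Callable[[str], RuleSet]) -> RuleSet:
    """Merge rs with everything it imports, reading each URL at most once."""
    result = RuleSet(set(rs.rules), set(rs.data), ())
    for url in rs.imports:
        if url not in visited:
            visited.add(url)          # the visited set is shared across the whole walk
            result = ruleset_merge(result, resolve_imports(read(url), visited, read))
    return result


# Hypothetical store with a cycle: A imports B, B imports A and C.
store: Dict[str, RuleSet] = {
    "urn:A": RuleSet({"rA"}, {(":s", ":p", ":o")}, ("urn:B",)),
    "urn:B": RuleSet({"rB"}, set(), ("urn:A", "urn:C")),
    "urn:C": RuleSet({"rC"}, {(":s", ":q", ":o")}, ()),
}
combined = resolve_imports(store["urn:A"], {"urn:A"}, store.__getitem__)
print(sorted(combined.rules))   # ['rA', 'rB', 'rC']
```

Seeding the visited set with the location of the top-level rule set is what stops the cycle from re-reading it.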

Calculate Stratification

@@ See

Evaluation of an Expression

An expression, whether used in a [=condition element=] or an [=assignment element=], is evaluated with respect to a solution mapping that provides a value, which is an RDF term, for each variable in the expression. The well-formedness requirements ensure that all variables in the expression appear in the solution mapping.

define evalFunction(E, μ):
    # E is an expression; μ is a solution mapping.
    if E is an RDF term:
        return E
    if E is a variable x:
        ## By well-formedness, it is an error if x is not in dom(μ).
        return μ(x)
    if E is a function call F(expr1, expr2, ...):
        let a1 = evalFunction(expr1, μ), a2 = evalFunction(expr2, μ), ...
        if any of a1, a2, ... is an error: return error
        return F(a1, a2, ...)
    @@ evalFunction(FF(expr1, expr2), μ) = ... things that are not functions like `IF`
enddefine

The function `EBV(x)` returns the effective boolean value for an RDF term.
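Expression evaluation and the effective boolean value can be sketched in Python. This is only an illustration under several assumptions: RDF terms are stood in for by Python values, a function call is a tuple `(function, arg, ...)`, variables are strings starting with `?`, and an evaluation error is modelled as the hypothetical exception `EvalError`. The `ebv` helper covers only booleans, numbers and strings.

```python
import math
import operator


class EvalError(Exception):
    """Hypothetical error type standing in for an expression evaluation error."""


def eval_expr(expr, mu):
    """Evaluate an expression with respect to the solution mapping mu (a dict)."""
    if isinstance(expr, tuple):                       # function call: (function, arg, ...)
        fn, *args = expr
        return fn(*(eval_expr(a, mu) for a in args))  # an error in any argument propagates
    if isinstance(expr, str) and expr.startswith("?"):
        if expr not in mu:
            raise EvalError(f"unbound variable {expr}")
        return mu[expr]
    return expr                                       # a constant term


def ebv(value) -> bool:
    """Simplified effective boolean value over Python stand-ins for RDF terms."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return value != 0 and not math.isnan(value)
    if isinstance(value, str):
        return len(value) > 0
    raise EvalError("no effective boolean value")


# FILTER (?o < 18)
less_than_18 = (operator.lt, "?o", 18)
print(ebv(eval_expr(less_than_18, {"?o": 5})))    # True
print(ebv(eval_expr(less_than_18, {"?o": 20})))   # False
```

Using an exception for errors means an error raised while evaluating an argument surfaces from the enclosing call without extra checks.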

Evaluation of a Rule

A [=rule=] is evaluated by calculating a [=solution sequence=] from the [=rule body=] and then using each [=solution mapping=] of the [=solution sequence=] to generate triples using the [=rule head=].

let R be a well-formed rule.

let rule R = (H, B) where
             H is the sequence of triple templates in the head
             B is the sequence of triple pattern elements,
                condition elements, negation elements,
                and assignment elements in the body

# Solution sequence of one solution that does not map any variables.
let SEQ0: Solution sequence = { μ₀ }

let G = evaluation graph

# Evaluate rule body
# This function returns a sequence of solutions
define evalRuleElements(B, SEQ, G):

    for each rule element rElt in B:

        if rElt is a triple pattern TP:
            X = graphMatch(G, TP)
            SEQ1 = {}
            for each μ₁ in X:
                for each μ₂ in SEQ:
                    if compatible(μ₁, μ₂):
                        μ₃ = merge(μ₁, μ₂)
                        add μ₃ to SEQ1
                    endif
                endfor
            endfor
        endif

        if rElt is a condition element with expression F:
            SEQ1 = {}
            for each solution μ in SEQ:
                let x = evalFunction(F, μ)
                if EBV(x) is true:
                    add μ to SEQ1
                endif
            endfor
        endif

        if rElt is a negation element with body elements N:
            SEQ1 = {}
            for each solution μ in SEQ:
                S = sequence{ μ }
                NEG = evalRuleElements(N, S, G)
                if NEG is empty:
                    add μ to SEQ1
                endif
            endfor
        endif

        if rElt is an assignment element with variable V and expression expr:
            SEQ1 = {}
            for each solution μ in SEQ:
                let x = evalFunction(expr, μ)
                if x is not an error:
                    ## Add mapping V -> x to solution μ
                    let μ2 be a solution mapping μ ∪︀ { (V, x) }
                    add μ2 to SEQ1
                else
                    # Error: drop solution μ
                endif
            endfor
        endif

        if SEQ1 is empty:
            SEQ = {}
            return SEQ
        endif

        SEQ = SEQ1
    endfor

    return SEQ
enddefine

let SEQ = evalRuleElements(B, SEQ0, G)

# Evaluate rule head
let OUT = empty set
for each μ in SEQ:
    let S = {}
    for each triple template TT in H:
        let triple = subst(μ, TT)
        Add triple to S
    endfor
    OUT = OUT union S
endfor

result eval(R, G) is OUT

Note that `OUT` may contain triples that are already in the evaluation graph `G`.
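The rule evaluation procedure above can be sketched compactly in Python. The encoding is an assumption made for this illustration: body elements are tagged tuples (`"tp"`, `"filter"`, `"not"`, `"set"`), expressions are plain Python callables over a solution `dict`, and variables are strings starting with `?`. An error in an assignment drops the solution, as in the pseudocode. Errors in filters are not handled.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")


def subst(mu, pattern):
    return tuple(mu.get(t, t) if is_var(t) else t for t in pattern)


def graph_match(graph, pattern):
    out = []
    for triple in graph:
        mu = {}
        for p, t in zip(pattern, triple):
            if (mu.setdefault(p, t) != t) if is_var(p) else (p != t):
                break
        else:
            out.append(mu)
    return out


def compatible(m1, m2):
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())


def eval_body(body, seq, graph):
    """Thread the solution sequence through each body element in order."""
    for kind, *args in body:
        if kind == "tp":            # triple pattern
            seq1 = [{**m2, **m1} for m1 in graph_match(graph, args[0])
                    for m2 in seq if compatible(m1, m2)]
        elif kind == "filter":      # condition element
            seq1 = [mu for mu in seq if args[0](mu)]
        elif kind == "not":         # negation element: keep mu if the inner body fails
            seq1 = [mu for mu in seq if not eval_body(args[0], [mu], graph)]
        elif kind == "set":         # assignment element
            var, fn = args
            seq1 = []
            for mu in seq:
                try:
                    seq1.append({**mu, var: fn(mu)})
                except Exception:
                    pass            # error: drop this solution
        else:
            raise ValueError(f"unknown body element {kind!r}")
        if not seq1:
            return []
        seq = seq1
    return seq


def eval_rule(head, body, graph):
    """Instantiate every head template with every solution of the body."""
    return {subst(mu, tt) for mu in eval_body(body, [{}], graph) for tt in head}


# Hypothetical data and rule:
# RULE { ?x :type :Adult } WHERE { ?x :age ?v . NOT { ?x :age ?w . FILTER(?w < 18) } }
graph = {(":a", ":age", 20), (":b", ":age", 15)}
body = [("tp", ("?x", ":age", "?v")),
        ("not", [("tp", ("?x", ":age", "?w")), ("filter", lambda mu: mu["?w"] < 18)])]
print(eval_rule([("?x", ":type", ":Adult")], body, graph))
# {(':a', ':type', ':Adult')}
```

The negation branch evaluates its inner body starting from the single current solution, so variables already bound outside the negation constrain the match inside it.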

Evaluation of a Rule Set

Evaluation of a [=rule set=] is defined as the execution of each [=stratum=] of the [=stratification=] of the rule set, where each stratum is executed completely and in order before moving on to the next [=stratum=]. A [=stratum=] is evaluated by first evaluating each of the [=run-once rules=] of that stratum, and then evaluating [=general rules=] of the stratum repeatedly until no new triples are produced.

let G0 be the input base graph
let RS be the rule set
let D be the graph of all DATA triples in RS

Apply stratification to RS

let LS be the sequence of strata after stratification

# Inference graph
let GI = { t ∈ D | t ∉ G0 }

# Evaluation graph.
let GE = G0 ∪︀ D

for each stratum ST in LS:
    for each rule R in ST.once:
        let X = eval(R, GE)
        let Y = { t ∈ X | t ∉ GE }
        GI = Y ∪︀ GI
        GE = Y ∪︀ GE
    endfor

    let finished = false
    while !finished:
        finished = true
        for each rule R in ST.general:
            let X = eval(R, GE)
            let Y = { t ∈ X | t ∉ GE }
            if Y is not empty:
                finished = false
                GI = Y ∪︀ GI
                GE = Y ∪︀ GE
            endif
        endfor
    endwhile
endfor
the result is GI
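The stratum-by-stratum fixpoint above can be sketched in Python. To keep the sketch short, each rule is assumed to be a Python function from the evaluation graph to a set of triples, standing in for `eval(R, GE)`, and the stratification is assumed to be given as a list of `(once, general)` pairs. The rule functions `transitive_p` and `leaf` are hypothetical examples.

```python
def eval_rule_set(base, data, strata):
    """Return the inference graph: triples derived from the rules, not in the base graph."""
    inferred = set(data) - set(base)        # GI
    evaluation = set(base) | set(data)      # GE
    for once, general in strata:
        for rule in once:                   # run-once rules: a single pass
            new = rule(evaluation) - evaluation
            inferred |= new
            evaluation |= new
        finished = False
        while not finished:                 # general rules: repeat until nothing new
            finished = True
            for rule in general:
                new = rule(evaluation) - evaluation
                if new:
                    finished = False
                    inferred |= new
                    evaluation |= new
    return inferred


def transitive_p(g):
    # RULE { ?x :p ?z } WHERE { ?x :p ?y . ?y :p ?z }
    return {(x, ":p", z) for (x, p1, y) in g if p1 == ":p"
            for (y2, p2, z) in g if p2 == ":p" and y2 == y}


def leaf(g):
    # RULE { ?x :type :Leaf } WHERE { ?y :p ?x . NOT { ?x :p ?z } }
    subjects = {s for (s, p, o) in g if p == ":p"}
    return {(o, ":type", ":Leaf") for (s, p, o) in g if p == ":p" and o not in subjects}


base = {(":a", ":p", ":b"), (":b", ":p", ":c")}
data = {(":c", ":p", ":d")}
# The negation in `leaf` depends on :p, so it is placed in a later stratum.
result = eval_rule_set(base, data, [([], [transitive_p]), ([], [leaf])])
print(sorted(result))
```

Running `leaf` only after the `:p` stratum has reached its fixpoint is what makes the negation safe: no later rule can add a `:p` triple that would invalidate a derived `:Leaf` triple.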

Shapes Rules Language Grammar

A Shapes Rules Language document is an RDF string encoded in UTF-8 [[!RFC3629]]. Only Unicode scalar values, in the ranges U+0000 to U+D7FF and U+E000 to U+10FFFF, are allowed. This excludes surrogate code points, range U+D800 to U+DFFF.

White Space

White space (production WS) is used to separate two terminals which would otherwise be (mis-)recognized as one terminal. Rule names below in capitals indicate where white space is significant; these form a possible choice of terminals for constructing a Shapes Rules Language parser.

White space is significant in the production String.

Comments

Comments start with a # outside an IRIREF, STRING_LITERAL1, STRING_LITERAL2, STRING_LITERAL_LONG1, or STRING_LITERAL_LONG2, and continue to the end of line (marked by LF, or CR), or end of file if there is no end of line after the comment marker. Comments are treated as white space.

IRI References

Relative IRI references are resolved with base IRIs as per [[[RFC3986]]] [[RFC3986]] using only the basic algorithm in section 5.2. Neither Syntax-Based Normalization nor Scheme-Based Normalization (described in sections 6.2.2 and 6.2.3 of RFC3986) are performed. Characters additionally allowed in IRI references are treated in the same way that unreserved characters are treated in URI references, per section 6.5 of [[[RFC3987]]] [[RFC3987]].

The BASE directive defines the Base IRI used to resolve relative IRI references per [[RFC3986]] section 5.1.1, "Base URI Embedded in Content". Section 5.1.2, "Base URI from the Encapsulating Entity", defines how the In-Scope Base IRI may come from an encapsulating document, such as a SOAP envelope with an `xml:base` directive or a MIME multipart document with a `Content-Location` header. The "Retrieval URI" identified in 5.1.3, "Base URI from the Retrieval URI", is the URL from which a particular Shapes Rules Language document was retrieved. If none of the above specifies the Base URI, the default Base URI (section 5.1.4, "Default Base URI") is used. Each BASE directive sets a new In-Scope Base URI, relative to the previous one.
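How successive BASE directives chain can be sketched with Python's standard `urllib.parse.urljoin`, used here as an approximation of the RFC 3986 section 5.2 algorithm for the simple cases shown. It is not a conformance claim for edge cases. The class `BaseTracker` and the retrieval URL are hypothetical.

```python
from urllib.parse import urljoin


class BaseTracker:
    """Tracks the in-scope base IRI while a document is being parsed."""

    def __init__(self, retrieval_url: str):
        # With no BASE directive yet, the retrieval URL acts as the base.
        self.base = retrieval_url

    def set_base(self, reference: str) -> None:
        """Handle a BASE directive: the new base is resolved against the previous one."""
        self.base = urljoin(self.base, reference)

    def resolve(self, reference: str) -> str:
        """Resolve a possibly relative IRI reference against the in-scope base."""
        return urljoin(self.base, reference)


tracker = BaseTracker("http://example/dir/doc.srl")
print(tracker.resolve("x"))            # http://example/dir/x
tracker.set_base("rules/")             # BASE <rules/>
print(tracker.base)                    # http://example/dir/rules/
print(tracker.resolve("../y#frag"))    # http://example/dir/y#frag
print(tracker.resolve("%7Euser"))      # http://example/dir/rules/%7Euser
```

The last line shows that the percent-encoded sequence is carried through unchanged, in line with the statement that no syntax-based normalization is performed.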

Escape Sequences

There are three forms of escapes used in Shapes Rules Language documents:

A numeric escape sequence represents the value of a Unicode code point.

A numeric escape sequence MUST NOT produce a code point value in the range U+D800 to U+DFFF, which is the range for Unicode surrogates.

Escape sequence	Unicode code point
`\u` `hex` `hex` `hex` `hex`	A Unicode code point in the ranges `U+0000` to `U+D7FF` and `U+E000` to `U+FFFF`, corresponding to the value encoded by the four hexadecimal digits interpreted from most significant to least significant digit.
`\U` `hex` `hex` `hex` `hex` `hex` `hex` `hex` `hex`	A Unicode code point in the ranges `U+0000` to `U+D7FF` and `U+E000` to `U+10FFFF`, corresponding to the value encoded by the eight hexadecimal digits interpreted from most significant to least significant digit.

where hex is a hexadecimal character

HEX ::= [0-9] | [A-F] | [a-f]

A string escape sequence represents a character traditionally escaped in string literals:

Escape sequence	Unicode code point
`\t`	`U+0009`
`\b`	`U+0008`
`\n`	`U+000A`
`\r`	`U+000D`
`\f`	`U+000C`
`\"`	`U+0022`
`\'`	`U+0027`
`\\`	`U+005C`

A reserved character escape sequence consists of a \ followed by one of these characters ~.-!$&'()*+,;=/?#@%_, and represents the character to the right of the \.

Context where each kind of escape sequence can be used
	numeric escapes	string escapes	reserved character escapes
IRIs, used as RDF terms or in `PREFIX` or `BASE` declarations	yes	no	no
local names	no	no	yes
Strings	yes	yes	no

%-encoded sequences are in the character range for IRIs and are explicitly allowed in local names. These appear as a % followed by two hex characters and represent that same sequence of three characters. These sequences are not decoded during processing. A term written as <http://a.example/%66oo-bar> designates the IRI http://a.example/%66oo-bar and not IRI http://a.example/foo-bar. A term written as ex:%66oo-bar with a prefix PREFIX ex: <http://a.example/> also designates the IRI http://a.example/%66oo-bar.
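The escape handling described in this section can be sketched in Python. The function names are hypothetical. `unescape_string` handles the numeric and string escapes allowed in strings, and `unescape_local_name` handles the reserved character escapes allowed in local names. Each function assumes it receives only the body of the token, without the surrounding quotes or prefix.

```python
import re

STRING_ESCAPES = {"t": "\t", "b": "\b", "n": "\n", "r": "\r", "f": "\f",
                  '"': '"', "'": "'", "\\": "\\"}
RESERVED = set("~.-!$&'()*+,;=/?#@%_")

# One pass over the text, so that an escaped backslash followed by "u0041"
# is not mistaken for a numeric escape.
_STRING_ESC = re.compile(r"\\(?:u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{8})|(.))", re.S)
_LOCAL_ESC = re.compile(r"\\(.)", re.S)


def _code_point(hex_digits: str) -> str:
    cp = int(hex_digits, 16)
    if 0xD800 <= cp <= 0xDFFF:
        raise ValueError("numeric escape must not produce a surrogate")
    if cp > 0x10FFFF:
        raise ValueError("numeric escape is beyond U+10FFFF")
    return chr(cp)


def unescape_string(body: str) -> str:
    """Decode numeric and string escapes in the body of a string literal."""
    def repl(m):
        if m.group(3) is not None:
            if m.group(3) not in STRING_ESCAPES:
                raise ValueError(f"unknown escape \\{m.group(3)}")
            return STRING_ESCAPES[m.group(3)]
        return _code_point(m.group(1) or m.group(2))
    return _STRING_ESC.sub(repl, body)


def unescape_local_name(name: str) -> str:
    """Decode reserved character escapes; %-encoded sequences are left as they are."""
    def repl(m):
        if m.group(1) not in RESERVED:
            raise ValueError(f"not a reserved character escape: \\{m.group(1)}")
        return m.group(1)
    return _LOCAL_ESC.sub(repl, name)


print(unescape_string(r"caf\u00E9"))           # café
print(unescape_local_name(r"%66oo\-bar"))      # %66oo-bar
```

With the prefix `ex:` bound to `http://a.example/`, the second result corresponds to the IRI http://a.example/%66oo-bar, since the `%66` sequence is not decoded.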

Grammar

The EBNF used here is defined in XML 1.0 [[!EBNF-NOTATION]].

Notes:

Keywords are case-insensitive except for 'a' which is case-sensitive.
Escape sequences UCHAR and ECHAR are case-sensitive.
When tokenizing the input and choosing grammar rules, the longest match is chosen.
The Shapes Rules Language grammar is LL(1) and LALR(1) when the rules with uppercased names are used as terminals.
The entry point into the grammar is RuleSet.

A text version of this grammar is available here.

Selected Terminal Literal Strings

This document uses some specific terminal literal strings [[EBNF-NOTATION]]. To clarify the Unicode code points used for these terminal literal strings, the following table describes specific characters used in this section.

Code	Glyph	Description
`U+000A`	`LF`	Line feed
`U+000D`	`CR`	Carriage return
`U+0023`	`#`	Number sign
`U+0025`	`%`	Percent sign
`U+005C`	`\`	Backslash