Semantics of XPath expressions

XPath expressions can be added to any XPath query in Mendix to constrain the queried Mendix object. The XPath expression can be a combination of variables, operations, functions, keywords and system variables. This post will focus on the semantics of certain XPath expressions with the AND-operator. A seemingly small change in the notation of an XPath expression using this operator can have a huge impact on the retrieved data.

Different notations of XPath expressions

Consider the following scenario, a team has multiple members (employees) and each employee can speak multiple languages. The languages are stored as Boolean attributes on the employee entity. This is depicted in the following domain model. Furthermore a data set containing 2 teams, and 5 employees is configured.

DomainModel

Employee nameTeam nameSpeaksEnglishSpeaksFrenchSpeaksRussian
JohnAlphatruefalsetrue
FrankAlphafalsetruetrue
SteveBetatruefalsefalse
DanBetatruetruefalse
MikeBetafalsetruetrue

The requirement is to retrieve all employees who speak both English and Russian. The first XPath constraint follows the notation depicted below.

Xpath constraint - scenario 1

[SpeaksEnglish=true()]
[SpeaksRussian=true()]

The second XPath expression combines the constraints into one XPath constraint. A minor change in the notation.

XPath constraint - scenario 2

[SpeaksEnglish=true() and SpeaksRussian=true()]

As expected, both queries return the same result “John”. In both scenarios this employee matches the constrained XPath query. However was the query the same? Mendix translates the query in the Modeler to a database SQL query.

The SQL queries

SQL - scenario 1

SQL: SELECT "organization$employee"."id", "organization$employee"."name", "organization$employee"."speaksenglish", "organization$employee"."speaksrussian", "organization$employee"."speaksfrench" 
FROM "organization$employee" 
WHERE "organization$employee"."speaksenglish" = TRUE AND "organization$employee"."speaksrussian" = TRUE

SQL - scenario 2

SQL: SELECT "organization$employee"."id", "organization$employee"."name", "organization$employee"."speaksenglish", "organization$employee"."speaksrussian", "organization$employee"."speaksfrench"
FROM "organization$employee" 
WHERE ("organization$employee"."speaksenglish" = TRUE AND "organization$employee"."speaksrussian" = TRUE)

Note the small deviation in the WHERE statement of the SQL, the statement contains parentheses. With parentheses, constraints can be grouped to define priorities. Between scenario 1 and 2 it had no impact on the result of the query. For now, I won’t go into more detail on this difference in the SQL queries. Now, consider the following requirement: I wish to know which teams has an employee who speaks both English and Russian. The XPath constraint can be written as two separate constraints:

XPath constraint - scenario 3

[Organization.Employee_Team/Organization.Employee/SpeaksEnglish=true()]
[Organization.Employee_Team/Organization.Employee/SpeaksRussian=true()]

Or combined into one XPath constraint. It looks semantically the same, but will it return the same data?

XPath constraint - scenario 4

[Organization.Employee_Team/Organization.Employee[SpeaksEnglish=true() and SpeaksRussian=true()]]

The following SQL is generated in these scenario’s.

SQL - scenario 3

SQL: SELECT "organization$team"."id", "organization$team"."name" 
FROM "organization$team" 
WHERE "organization$team"."id" 
IN (SELECT "a1organization$employee_team"."organization$teamid" 
  FROM "organization$employee_team" "a1organization$employee_team" 
  INNER JOIN "organization$employee" ON "organization$employee"."id" = "a1organization$employee_team"."organization$employeeid" 
  INNER JOIN "organization$employee_team" "a2organization$employee_team" 
  ON "a2organization$employee_team"."organization$teamid" = "a1organization$employee_team"."organization$teamid" 
  INNER JOIN "organization$employee" "x2Organization.Employee" 
  ON "x2Organization.Employee"."id" = "a2organization$employee_team"."organization$employeeid" 
  WHERE "organization$employee"."speaksenglish" = TRUE 
  AND "x2Organization.Employee"."speaksrussian" = TRUE)

SQL - scenario 4

SQL: SELECT "organization$team"."id", "organization$team"."name" 
FROM "organization$team" WHERE "organization$team"."id" 
IN (SELECT "a1organization$employee_team"."organization$teamid" 
  FROM "organization$employee_team" "a1organization$employee_team" 
  INNER JOIN "organization$employee" 
  ON "organization$employee"."id" = "a1organization$employee_team"."organization$employeeid" 
  WHERE ("organization$employee"."speaksenglish" = TRUE AND "organization$employee"."speaksrussian" = TRUE))

The SQL in scenario 3 is quite different from the SQL in scenario 4. And more importantly it also returns different results. Scenario 3 returns teams “Alpha” and “Beta”, while scenario 4 only shows team “Alpha” as a result. This is caused by the different notation of the XPath expression, by changing the notation we changed the semantics of the expression.

Conclusion

In scenario 3 the constraint actually means, show the teams where someone speaks English AND where someone speaks Russian. The constraint in scenario 4 meant; show the teams where you can find someone who speaks both English and Russian. By introducing a sub-constraint on the original constraint the semantics of the query changed. It is important to decide whether the query should be constrained on the employee or not. In scenario 4 the language constraint applies to the employee, meaning it needs to be the same person who speaks both English and Russian. This shows how easily the semantics of an XPath query can change with a minor change in the notation.