Understanding XPath Injection Vulnerabilities

May 25, 2023 TH Author Trend Micro DevOps: Article, AWS, Azure, Cloud Native, Google Cloud Platform, How To

When developers need to query an XML database, they use XML Path Language (XPath) to construct these queries. An XPath query searches the XML document to find nodes that match a specified pattern or have particular attributes. XML databases remain a common means of storing user data. When a user supplies their login ID and password, they trigger a preconfigured XPath query, which searches the database for the matching credentials and grants access if the provided combination exists.

However, building an XPath query from user-supplied information introduces the risk of XPath injection attacks. These attacks occur when an attacker supplies input that changes the structure of the query, so that it reaches parts of the XML data the application never intended to expose. Malicious actors can take advantage of user input fields to inject arbitrary XPath code that accesses the XML document data and, depending on how the application uses the results, may help them modify it. This means that, even if the attacker cannot retrieve passwords (for example, because the database contains only password hash values), they can still use the discovered XML structure to cause additional harm.

In this article, you’ll view some example code to discover how XPath injection attacks work and learn some best practices for preventing and mitigating them.

The risks of XPath injection

XPath injection belongs to the broader family of injection flaws, which rank among the most prevalent and dangerous web application vulnerabilities. A successful attack can have several potential consequences, including:

Access to and exfiltration of sensitive data or personally identifiable information (PII).
Deletion, modification, or corruption of crucial business data.
Root access to a system, enabling actions that compromise system integrity.
Distribution of malware or other malicious code to internal and external users.

The consequences of such attacks can damage the reputation of your application and harm your users. Therefore, you need to be conscious of the risks associated with XPath injection attacks and take the necessary measures to mitigate them. In the next section, you’ll examine a sample XPath injection attack to learn how best to defend against such attacks.

How XPath injection works

This section reviews how XPath queries work, provides a hands-on demonstration of an XPath injection attack, and explains how to mitigate or prevent these attacks.

How XPath queries work

Imagine an application that enables users to search for items in an XML file. To enable this function, the application uses the following expression:

//*[contains(text(), $search)]

In this query, the user-supplied string replaces the $search variable. The query then searches the XML document for every element whose text contains this input. With ordinary input, the application works as expected.

However, this type of query is vulnerable to attackers who enter malicious XPath code that breaks out of the intended expression. Consequently, they may access or even modify the XML data in unforeseen and dangerous ways.

For example, suppose the application wraps the input in single quotes before substituting it for $search. A knowledgeable attacker can then use the user input form to inject the following XPath code:

a') or true() or contains(text(), 'a

As a result, the application constructs and executes the following XPath query:

//*[contains(text(), 'a') or true() or contains(text(), 'a')]

The injected quote and parenthesis close the contains() call early, so or true() becomes part of the predicate itself. A query of this type matches every element in the XML document and, depending on the program, might allow the attacker to access and modify any data the document contains.
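The way a quote in the input rewrites the query can be sketched with plain string concatenation. This is a minimal illustration, assuming the application wraps the input in single quotes inside contains(). The helper name buildSearchQuery and the sample payload are hypothetical, not taken from any particular application.

```javascript
// Hypothetical helper: builds the search expression by naive concatenation,
// which is the pattern that makes injection possible.
function buildSearchQuery(search) {
  return "//*[contains(text(), '" + search + "')]";
}

// Ordinary input stays inside the string literal.
const benign = buildSearchQuery("Xbsg");

// A quote and a parenthesis in the input close the literal and the
// contains() call, so the rest is parsed as XPath rather than as data.
const injected = buildSearchQuery("a') or true() or contains(text(), 'a");

console.log(benign);
// → //*[contains(text(), 'Xbsg')]
console.log(injected);
// → //*[contains(text(), 'a') or true() or contains(text(), 'a')]
```

Printing the final expression before evaluating it is a quick way to check whether user input can change the shape of a query.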

Additionally, an attacker can ascertain an XML document’s structure, which can potentially enable them to navigate among several layers of the contained data. When this type of access is the goal, malicious actors tend to use one of the following two XPath injection methods:

Booleanization: Boolean queries will generate different behaviors depending on whether they resolve into true or false conditions. An attacker could inject a Boolean query that returns true if a login request is successful and false if the login fails. This allows the attacker to retrieve a single bit of information (success or failure) with each query. Repeating this process enables an attacker to gain insight into the contents of the XML document.
XML crawling: Attackers can inject specially crafted queries that enable them to discover the structure of an XML document. These queries allow the attacker to “crawl” through an XML document without knowing its structure beforehand. By repeatedly sending such queries to the XML document and examining the responses, the attacker can gradually discover the structure of the document and the elements it contains. Eventually, they can piece together the gathered information to reconstruct the entire document. This approach can be an effective means for discovering sensitive information or exploitable vulnerabilities in the document structure.
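The one-bit-per-query idea behind Booleanization can be sketched as follows. This is an illustration under stated assumptions, not a working attack tool: makeOracle stands in for a vulnerable application that only reveals success or failure, and buildProbe shows one hypothetical payload shape based on the XPath substring() function. All function names and the sample path are assumptions made for this example.

```javascript
// Stand-in for the vulnerable application: it answers only true or false.
// A real attacker would send each probe through a login or search form.
function makeOracle(secret) {
  return (position, candidate) => secret.charAt(position - 1) === candidate;
}

// Hypothetical payload for a single probe. XPath substring() is 1-indexed.
function buildProbe(path, position, candidate) {
  return `' or substring(${path}, ${position}, 1) = '${candidate}`;
}

// Recover a value one character at a time from yes/no answers.
function extractValue(oracle, maxLength, alphabet) {
  let recovered = "";
  for (let position = 1; position <= maxLength; position++) {
    const hit = [...alphabet].find((ch) => oracle(position, ch));
    if (hit === undefined) break; // no candidate matched: end of the value
    recovered += hit;
  }
  return recovered;
}

const alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
const oracle = makeOracle("HS0282");

console.log(buildProbe("//orders[1]/reference", 1, "H"));
// → ' or substring(//orders[1]/reference, 1, 1) = 'H
console.log(extractValue(oracle, 16, alphabet));
// → HS0282
```

Each probe leaks a single bit, so the attack is slow but needs nothing more than a visible difference between success and failure. Rate limiting and monitoring for bursts of near-identical requests can help expose this pattern.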

Sample XPath injection vulnerabilities

To see how XPath injections emerge and function, you’ll create a demo application that is vulnerable to these attacks. The application uses user-supplied input to look up and return data from an XML document.

Say you’re working with an e-commerce platform and that you maintain a list of your customers (users), each of whom is identified by a username. To connect your orders with their purchasers, you use an application that searches for a username and returns the orders associated with it, as listed on the order status page.

Below is an XML file called orders_data.xml, which represents this scenario.

<users>

    <orders>
        <username>johndoe</username>
        <order>Custom Xbsg Item#3668</order>
        <timestamp>1671301351</timestamp>
        <reference>HS0282</reference>
        <code>gwr23d2has3fa13gs24</code>
    </orders>

    <orders>
        <username>michael_read</username>
        <order>Archep Cores Item#668</order>
        <timestamp>1671301351</timestamp>
        <reference>HS0282</reference>
        <code>gwr23d2has3fa13gs24</code>
    </orders>

    <orders>
        <username>adammrray</username>
        <order>Partial CoSum Item#92623</order>
        <timestamp>1671301351</timestamp>
        <reference>HS0282</reference>
        <code>gwr23d2has3fa13gs24</code>
    </orders>

</users>

You’ll use JavaScript to load the above data and create a basic application that allows users to query the data using inputs they supply.

In the same directory, run the following command to initialize a project for Node.js, a JavaScript runtime:

npm init -y

Then, you’ll need the following two packages to parse and query XML. Note that neither is bundled with Node.js, so both must be installed from npm:

xpath: A DOM Level 3 XPath implementation and helper for JavaScript that supports XPath query strings
xmldom: A JavaScript implementation of the W3C DOM for Node.js that provides the DOMParser interface

Open a terminal to your app directory and run the following command:

npm i xpath xmldom

In the same directory, create an index.js file and query the XML data as follows:

First, import the required dependencies:

const fs = require('fs');
const xpath = require('xpath');
const dom = require('xmldom').DOMParser;
const util = require('util');
const readline = require('readline').createInterface({
    input: process.stdin,
    output: process.stdout
});

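Then, create a simple helper that uses the readline interface to accept user-supplied input:

async function enterOrder(){
    // readline.question supports util.promisify, so it can be awaited
    let question = util.promisify(readline.question).bind(readline);
    try{
        const order = await question("Enter the username ");
        return order;
    }catch(err){
        console.error(err);
    }finally{
        // Close the interface so the process can exit
        readline.close();
    }
}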

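Create a function that runs the XPath query against the XML document and returns the data matching the input, using the code below:

async function xpath_injection_example(){
    // Function body inferred from the query and output described in this section
    const doc = new dom().parseFromString(fs.readFileSync('orders_data.xml', 'utf-8'), 'text/xml');
    const order = await enterOrder();
    // Vulnerable: user input is interpolated directly into the query
    const node = xpath.select1(`/users/orders[username = '${order}']/order`, doc);
    console.log(node ? node.textContent : "User does not exist");
}

// execute this function
xpath_injection_example()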

In the above example, the query /users/orders[username = '${order}']/order will be executed. It selects the order elements that are children of an orders element whose username child equals the value supplied in the user input.

[username = '${order}'] specifies a predicate condition that the selected elements must satisfy. In this case, the predicate specifies that the username element must have a value equal to the value of the order variable.

This way, the order element is selected only when its parent orders element satisfies the predicate. Now run the following command to execute this program:

node index.js

This prompts you to enter a username, just as a user would through a search form.

Enter a username from your orders_data.xml file, as shown below. This should display the associated order. If the username doesn’t exist, a “User does not exist” message is displayed instead.

Enter the username michael_read
Archep Cores Item#668

Exploring XPath injection vulnerabilities

This application is working as it should. However, an attacker with malicious intentions can run arbitrary XPath queries using the supplied user input to get access without needing a valid username.

The application is vulnerable to malicious code injection. The predicate [username = '${order}'] is the target for the attacker here. An attacker can construct input that forces this expression to satisfy its condition. With the injected code, the predicate evaluates to true, which allows the attacker to gain access without supplying a correct username.

Here are some arbitrary queries that can allow attackers to maneuver around the data hierarchy using the user-supplied input:

'or'1'='1
text' or '1' = '1
' or 1=1 or 'a'='a
' or ''='
a' or true() or '

Each of the above inputs is submitted in the same way. Here is the result using a' or true() or ' as the example:

Enter the username a' or true() or '
Archep Cores Item#668
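To see why these inputs succeed, it helps to print the expression the application ends up evaluating. The sketch below mirrors the interpolation used in the demo query. The helper name buildOrderQuery is hypothetical and exists only for this illustration.

```javascript
// Mirrors the vulnerable interpolation used in the demo application.
function buildOrderQuery(username) {
  return `/users/orders[username = '${username}']/order`;
}

const payloads = [
  "'or'1'='1",
  "text' or '1' = '1",
  "' or 1=1 or 'a'='a",
  "' or ''='",
  "a' or true() or '",
];

// Each payload closes the string literal early and appends a condition
// that is always true, so the predicate no longer depends on the username.
for (const payload of payloads) {
  console.log(buildOrderQuery(payload));
}
// The last line printed is:
// /users/orders[username = 'a' or true() or '']/order
```

In every case the text after the first injected quote is parsed as XPath rather than compared as a username, which is why the query returns an order for a user that does not exist.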

Mitigating XPath injections

As you can see in the example above, a well-constructed injection lets an attacker access restricted data all too easily. So, this section explores how you can patch common vulnerabilities and mitigate the risks associated with XPath injections.

This attack takes advantage of a lack of proper variable parameter binding in the application’s code. The application concatenates user-supplied input directly into an XPath query without adequately validating or sanitizing the input. This is where the attacker can insert malicious XPath statements into the query to access sensitive information from the XML database.

A first line of defense is to validate user input before it reaches the query. To accomplish this, you can use a regular expression that rejects any input containing characters outside a small allowed set, such as letters and numbers. This method makes it much harder for potential attackers to construct arbitrary queries.

To validate input and reduce the XPath injection risk in this example, use the following code:

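async function enterOrder(){
    let question = util.promisify(readline.question).bind(readline);
    // Match any character other than letters, digits, and underscores
    // (the underscore is allowed so that michael_read remains valid)
    const regex = /[^A-Za-z0-9_]/;
    try{
        const order = await question("Enter the username ");
        if(regex.test(order)){
            console.log("Invalid characters not allowed");
        }
        else{
            return order;
        }
    }catch(err){
        console.error(err);
    }finally{
        // Close the interface so the process can exit
        readline.close();
    }
}

async function xpath_injection_example(){
    // Function body inferred from the query and output described in this article
    const xml = fs.readFileSync('orders_data.xml', 'utf-8');
    const doc = new dom().parseFromString(xml, 'text/xml');
    const order = await enterOrder();
    // Run the query only when the input passed validation
    if(order){
        const node = xpath.select1(`/users/orders[username = '${order}']/order`, doc);
        console.log(node ? node.textContent : "User does not exist");
    }
}

// execute this function
xpath_injection_example()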

The above code uses a regular expression to detect characters outside the allowed set. If the supplied input contains such characters, the application skips the query and displays the message, “Invalid characters not allowed.” However, if the input passes this test, the application proceeds and executes the query with the supplied input. You can test the code with the arbitrary queries discussed in the section above.

Although you can sanitize user inputs, an attacker could still use other techniques to bypass this filter.

It’s virtually impossible to anticipate every potentially exploitable character using regular expressions. In cases like user authentication, the application requires users to provide passwords that may legitimately contain such characters. This means that sanitizing user inputs is not wholly reliable.
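A small self-contained check makes it easy to try a validation rule against the payloads from the previous section. The sketch below assumes a policy of 1 to 32 letters, digits, or underscores. The length limit, the underscore, and the name isValidUsername are assumptions chosen for this illustration, with the underscore included so that a username such as michael_read from the sample data passes.

```javascript
// Assumed policy: usernames are 1 to 32 letters, digits, or underscores.
// Anchoring with ^ and $ validates the whole string, not just a part of it.
const USERNAME_PATTERN = /^[A-Za-z0-9_]{1,32}$/;

function isValidUsername(input) {
  return typeof input === "string" && USERNAME_PATTERN.test(input);
}

const samples = ["michael_read", "a' or true() or '", "' or ''='", ""];
for (const sample of samples) {
  console.log(JSON.stringify(sample), isValidUsername(sample));
}
// → "michael_read" true
// → "a' or true() or '" false
// → "' or ''='" false
// → "" false
```

Describing what is allowed, rather than listing what is forbidden, tends to be easier to reason about, because anything the pattern does not explicitly permit is rejected.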

An alternative is to use parameterized XPath queries, as shown below:

const evaluator = xpath.parse('/users/orders[username = $order]/order');

const character = evaluator.select1({
    node: doc,
    variables: {
        order: order
    }
});

Note that this is safe only while the expression passed to xpath.parse is a fixed string. If any part of it is still assembled from user-supplied data through string interpolation (for example, by writing ${order} instead of $order), the query remains open to injection. Parameterization therefore protects you only when every user-supplied value reaches the query as a bound variable.

Using precompiled XPath queries

You can use a precompiled XPath query to avoid dynamic XPath expressions. This is achieved by defining the XPath expression once as a fixed string, separate from any user input, and then passing it to the parser when it is needed.

The precompiled XPath query uses a variable to represent the user-provided input. This ensures that the expression itself is never constructed from user-supplied data. Thus, an attacker cannot inject arbitrary XPath code to gain access. Here is an example of how to use a precompiled XPath query:

async function xpath_injection_example() {
    const xml = fs.readFileSync('orders_data.xml', 'utf-8');
    const doc = new dom().parseFromString(xml, 'text/xml');

    // Define the precompiled XPath query.
    // $order is an XPath variable reference, not a JavaScript placeholder.
    const xpathQuery = '/users/orders[username = $order]/order';

    const order = await enterOrder();
    if (order) {
      // Use the precompiled XPath query
      const evaluator = xpath.parse(xpathQuery);

      // Bind the user input as a variable value at evaluation time
      const character = evaluator.select1({
        node: doc,
        variables: {
          order: order
        }
      });
      if (character) {
        console.log(character.textContent);
      } else {
        console.log("User does not exist");
      }
    }
}

// Execute the function
xpath_injection_example();

This helps ensure that user-supplied input is treated as separate from the XPath query rather than as part of the query itself.
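The separation between expression and data can be made visible without running a query at all. In the sketch below, the expression is a constant and the hostile input travels only inside the variables object, in the shape passed to evaluator.select1() above. The helper name buildQueryOptions is hypothetical, and null stands in for the parsed document.

```javascript
// The expression is a constant. "$order" is an XPath variable reference,
// not a JavaScript template placeholder, so nothing is interpolated here.
const SAFE_EXPRESSION = "/users/orders[username = $order]/order";

// Hypothetical helper: packages user input as variable bindings.
function buildQueryOptions(doc, username) {
  return { node: doc, variables: { order: username } };
}

const hostile = "a' or true() or '";
const options = buildQueryOptions(null, hostile); // null stands in for the parsed document

console.log(SAFE_EXPRESSION);         // the same string for every input
console.log(options.variables.order); // the payload is carried as plain data
```

Because the payload never becomes part of the expression text, its quotes are compared as ordinary characters of a username instead of being parsed as XPath syntax.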

Conclusion

You have now learned some common strategies for executing XPath injection attacks and gained insight into their potential consequences. Fortunately, you also discovered some ways to mitigate the risks.

However, your application may still be vulnerable to XPath injection even after implementing the security measures you explored. To maximize application security, turning to tools like Web Application Firewalls (WAFs) or Web Application and API Protection (WAAP) can fill in the gaps that coding best practices may not address. Visit Trend Micro today to begin assessing your app’s security posture.
