Advanced Queries:

xAPI Advanced Search

The xAPI search interface from the official specification can be somewhat limiting. Our expanded search API gives you the language you need to express complex queries. Based on standard MongoDB query operators, your search can look something like this:

Find statements where the actor name is either "Harriet Chapman" or "Pearl Drake", or where the scaled score is greater than 0.75.

{
    "$or": [
        { "actor.name":
            { "$in": ["Harriet Chapman",
                      "Pearl Drake"] }
        },
        { "result.score.scaled":
            { "$gt": 0.75 }
        }
    ]
}

Find statements where the value of the extension field "https://w3id.org/device-type" includes "Mobile" and the actor name contains "Henrietta". Strings are automatically treated as regular expressions.

{
    "context.extensions.https://w3id.org/device-type": "Mobile",
    "actor.name": "Henrietta"
}

You can have multiple conditions on the same field with the $and operator. When your LRS enables it, you can also use loose JSON formatting. Notice the lack of quotation marks? This finds statements where the scaled score is between 0.75 and 0.80.

{
    $and: [
        { result.score.scaled:
            { $gt: 0.75 }
        },
        { result.score.scaled:
            { $lt: 0.80 }
        }
    ]
}

Using Advanced Search

The Advanced Search endpoint is an extension to the xAPI. Use the normal xAPI endpoint for your LRS and the typical Basic Auth keys. The key that you use needs a special permission before it can access the advanced query API: check the box called "Advanced Queries" in the key create or edit form. If my LRS were called "test lrs", I would issue a GET request to this address:

https://test-lrs.lrs.io/xapi/statements/search?query=             

The value of query should be the URL-encoded JSON representation of the query. Here's a query object and the associated URL:

{
    "$or": [
        { "actor.name":
            { "$in": ["Harriet Chapman",
                      "Pearl Drake"] }
        },
        { "result.score.scaled":
            { "$gt": 0.75 }
        }
    ]
}

https://test-lrs.lrs.io/xapi/statements/search?query=%7B%22%24or%22%3A%5B%7B%22actor.name%22%3A%7B%22%24in%22%3A%5B%22Harriet%20Chapman%22%2C%22Pearl%20Drake%22%5D%7D%7D%2C%7B%22result.score.scaled%22%3A%7B%22%24gt%22%3A0.75%7D%7D%5D%7D
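If you're building that URL programmatically, the encoding is just a compact JSON serialization followed by percent-encoding. A quick sketch with Python's standard library (the LRS name "test-lrs" comes from the example above):

```python
import json
from urllib.parse import quote

# The query object from the example above, as a Python dict.
query = {
    "$or": [
        {"actor.name": {"$in": ["Harriet Chapman", "Pearl Drake"]}},
        {"result.score.scaled": {"$gt": 0.75}},
    ]
}

# Serialize compactly, then percent-encode every reserved character.
encoded = quote(json.dumps(query, separators=(",", ":")), safe="")
url = "https://test-lrs.lrs.io/xapi/statements/search?query=" + encoded
print(url)
```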

If you've disabled the "strict" flag in your LRS settings, you can omit the quotation marks and the URL component encoding.

Regex

When you use a string as the value to test against an object path, we treat that string as a regex. This only applies when the string is a direct comparison; inside the $in array in the example above, the strings are treated literally. So, this example matches any statement whose actor.name includes "Harriet" or "Pearl":

{
    "$or": [
        { "actor.name": "Harriet" },
        { "actor.name": "Pearl" }
    ]
}

"Harriet" and "Pearl" are treated as regexes in the pattern above, so you'll also get back statements for "Tommy Pearlson" and "Harriet Reid".

{ 
    "actor.name": { 
        $in:  [
            "Harriet",
            "Pearl"
            ]
        }
}

This pattern won't return any statements! That's because the strings in the $in array are matched exactly. Include the full names to get statements back with this example.

Regexes can be used to get clever with your queries. This query matches "Harriet", but with any number of "r"s; it would return statements where the actor name is "Harriet", "Haiet", "Hariet", "Harrriet", and so on. You can read up on JavaScript regex syntax for the details.

{
    "actor.name": "Har*iet"
}
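You can check a pattern like this locally before querying. Here's a quick sketch in Python, whose regex syntax agrees with JavaScript's for a pattern this simple:

```python
import re

# "r*" means zero or more "r" characters, so this matches
# "Haiet", "Hariet", "Harriet", "Harrriet", and so on.
pattern = re.compile("Har*iet")

assert pattern.search("Harriet Reid")   # substrings match too
assert pattern.search("Haiet")
assert pattern.search("Harrriet")
assert not pattern.search("Henrietta")  # no "Ha" followed by "iet"
```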

In addition to regexes, there is a huge list of special query operators you can use, like $in, $gt (greater than), and $not. Read the MongoDB query guide for the full list.

Using Aggregation

The Aggregation API is the nuclear option for data queries. Using this feature, you can perform complex analysis over the entire database of xAPI statements without actually retrieving them from the server. You express the algorithm you wish to run over the statements and POST it to the server. The analysis is computed within the Veracity Learning database, returning only the results to you.

Because these computations can be massively complex, accounts using the Shared Hosting options may be limited in the number of queries and total processing time in order to ensure fair access for all users. Upgrade to dedicated infrastructure or an onsite install for unlimited data aggregation.

The Aggregation endpoint is an extension to the xAPI. Use the normal xAPI endpoint for your LRS and the typical Basic Auth keys. The key that you use needs a special permission before it can access the aggregation API: check the box called "Advanced Queries" in the key create or edit form. If my LRS were called "test lrs", I would issue a POST request to this address:

https://test-lrs.lrs.io/xapi/statements/aggregate

The Content-Type header of the POST must be application/json, and the POST body must include the JSON serialization of an aggregation pipeline. If the "strict" flag in the LRS settings is disabled, the payload may be the more permissive JSON5 encoding of the pipeline. An aggregation pipeline is a set of operations for transforming the data into a result.
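Issuing that POST is straightforward with any HTTP client. A minimal sketch using Python's standard library, with a hypothetical key pair and a small example pipeline (substitute your own credentials):

```python
import base64
import json
import urllib.request

# Hypothetical Basic Auth key pair -- substitute your own LRS key.
key, secret = "mykey", "mysecret"
auth = base64.b64encode(f"{key}:{secret}".encode()).decode()

# A small example pipeline: count non-voided statements per verb id.
pipeline = [
    {"$match": {"voided": False}},
    {"$group": {"_id": "$statement.verb.id", "count": {"$sum": 1}}},
]

req = urllib.request.Request(
    "https://test-lrs.lrs.io/xapi/statements/aggregate",
    data=json.dumps(pipeline).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Basic " + auth,
    },
    method="POST",
)
# urllib.request.urlopen(req) would send the request and
# return the JSON array of results.
```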

Here is an example of a pipeline that counts the number of statements by verb id within a date range.

[
    {
        $match:{
            $and:[
                { "statement.timestamp": { $lt: { $parseDate:{date:"Tue Mar 27 2018 16:25:40 GMT-0400 (Eastern Daylight Time)"}}}},
                { "statement.timestamp": { $gt: { $parseDate:{date:"Tue Mar 20 2017 16:25:40 GMT-0400 (Eastern Daylight Time)"}}}}
            ]
        }
    },
    {
        $group:{
            _id:"$statement.verb.id",
            count:{$sum:1}
        }
    }
]

In this (relatively simple) example, we first select all the statements whose timestamp is greater than one date and less than another. That data is then processed, summing up the number of statements with each verb id. The results of an aggregation call are always JSON arrays. In this case, the result will be:

[
    {
        "_id": "http://adlnet.gov/expapi/verbs/failed",
        "count": 524
    },
    {
        "_id": "http://adlnet.gov/expapi/verbs/responded",
        "count": 12798
    },
    {
        "_id": "http://adlnet.gov/expapi/verbs/terminated",
        "count": 3588
    },
    {
        "_id": "http://adlnet.gov/expapi/verbs/passed",
        "count": 535
    },
    {
        "_id": "http://adlnet.gov/expapi/verbs/completed",
        "count": 3588
    },
    {
        "_id": "http://adlnet.gov/expapi/verbs/initialized",
        "count": 3588
    }
]
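Because the results are plain JSON arrays, they're easy to post-process client-side. For instance, the verb counts above can be turned into a lookup table (a sketch; the sample data here is an excerpt of the result above):

```python
import json

# An excerpt of the aggregation result shown above.
results_json = '''[
  {"_id": "http://adlnet.gov/expapi/verbs/failed", "count": 524},
  {"_id": "http://adlnet.gov/expapi/verbs/passed", "count": 535}
]'''

# Map each verb id to its statement count.
counts = {row["_id"]: row["count"] for row in json.loads(results_json)}
print(counts["http://adlnet.gov/expapi/verbs/passed"])  # 535
```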

Special Operators

Our aggregation API differs only slightly from MongoDB's. Because we expose the API over a web service, it can be tricky to input certain data types that don't serialize into JSON nicely. To overcome this, we've added a few additional operators.

  1. $parseDate
  2. $parseNumber
  3. $parseRegex

Each of these commands accepts as a child an object with a special key/value pair. The value will be parsed into a Date, Number, or RegExp before the aggregation is run. It's important to understand that these are not part of the MongoDB aggregation pipeline - we parse the input using these conventions before sending it to the database.

{
    $match:{
        someDateKey:{$parseDate:{date:"This should be a date string"}},
        someNumberKey:{$parseNumber:{num:"This should be a number string"}},
        someRegexKey:{$parseRegex:{regex:"This should be a regexp string"}}
    }
}
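Conceptually, this pre-parsing step walks the submitted pipeline and swaps each marker object for a native value before the query reaches the database. A rough sketch of that convention (not the actual server implementation; the real server accepts JavaScript-style date strings, while this sketch uses ISO dates):

```python
import re
from datetime import datetime

def preparse(node):
    """Recursively replace $parseDate/$parseNumber/$parseRegex
    markers with native date, number, and regex values."""
    if isinstance(node, dict):
        if "$parseDate" in node:
            return datetime.fromisoformat(node["$parseDate"]["date"])
        if "$parseNumber" in node:
            return float(node["$parseNumber"]["num"])
        if "$parseRegex" in node:
            return re.compile(node["$parseRegex"]["regex"])
        return {k: preparse(v) for k, v in node.items()}
    if isinstance(node, list):
        return [preparse(v) for v in node]
    return node

stage = preparse({
    "$match": {
        "statement.timestamp": {"$gt": {"$parseDate": {"date": "2017-03-20"}}}
    }
})
```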

Restricted Pipeline Stages

While the data in each LRS is completely siloed and cannot be modified by other LRSs, we still worry that exposing the aggregation pipeline could let users break their account or leak information about the internal system configuration. We therefore only accept a limited subset of the MongoDB aggregation pipeline stages. These stages are safe, in that they don't reveal information about the database system internals or modify the state of the data in the system. We only allow:

    $addFields
    $bucketAuto
    $bucket
    $count
    $facet
    $geoNear
    $graphLookup
    $group
    $limit
    $lookup
    $match
    $project
    $redact
    $replaceRoot
    $sample
    $skip
    $sort
    $sortByCount
    $unwind

If you have an onsite install or dedicated cloud hosting, contact us about removing these restrictions.

Understanding the Schema

In order to write aggregation queries, you'll need to understand the format of our representation in the database. Each document in our database includes the original posted statement (modified slightly to make it uniform according to the spec). This statement value is exactly what you see in the Statement Viewer page, with a few exceptions. Actors and Authority have an additional id field. This allows us to aggregate over agents that have the same IFI but otherwise have different names. The statement.actor.id is the IFI of the actor, or a hash of the object when that object is a group. Keys that have a dot in the key name (usually extensions) will have the dot characters replaced by the string:

*`*

We also keep a set of indexes on the root document to aid in searching. These indexes are:

agent:[array:string] //the IFIs of all agents in this statement. Usually one entry, but multiple are possible when the agent is a group
verb:[array:string] //The verb ID only
activity:[array:string] //The activity IDs of activities that should match this statement
registration:[array:string] //the registrations from the statement context
relatedAgents:[array:string] //The list of all agent IFIs that match this statement when the xAPI query includes "relatedAgents=true"
relatedActivities:[array:string] //The list of all Activity IDs that match this statement when the xAPI query includes "relatedActivities=true"
voided:boolean //is this statement voided?
voids: string // the ID of the statement this statement voids
statement: object //The entire statement as posted, plus the modifications described above

You can use these fields to access data that would be hard to compute during queries, and is therefore generated when the statement is stored.
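For example, the root-level indexes can stand in for deep paths inside "statement". A hypothetical pipeline that counts non-voided statements per agent IFI using only the indexed fields might look like this (sketched as a Python structure ready for serialization):

```python
import json

# Hypothetical pipeline: count non-voided statements per agent IFI,
# using the root-level indexes rather than paths inside "statement".
pipeline = [
    {"$match": {"voided": False}},
    {"$unwind": "$agent"},       # agent is an array of IFIs
    {"$group": {"_id": "$agent", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]

# This is the JSON body you would POST to the aggregate endpoint.
body = json.dumps(pipeline)
```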