Veracity Logo

3577014
Statements

55794
Actors

387255
Activities

675
LRSs

User Manual

Advanced Search and Aggregation

Veracity exposes a few extensions to the xAPI interfaces that allow you additional options for querying and analyzing data.


Table Of Contents

xAPI Advanced Search

The xAPI search interface from the official specification can be somewhat.... limiting. Our expanded search API gives you a more powerful language to express complex queries. Based on standard MongoDB search operators, your search can look something like this:

Find statements where the actor name is either "Harriet Chapman" or "Pearl Drake", or where the scaled score is greater than .75.

{
    "$or":[ 
            { "actor.name": 
                { $in: ["Harriet Chapman",
                        "Pearl Drake"]}
            },
            { "result.score.scaled": 
                { "$gt": 0.75 }
            }
    ]
}

Find statements where the value of the extension field "https://w3id.org/device-type" includes "Mobile" and the actor name contains "Henrietta". Strings are automatically treated as regular expressions.

{

    "context.extensions.https://w3id.org/device-type":"Mobile",
    "actor.name":"Henrietta"

}

You can have multiple conditions on the same field with the $and operator. When your LRS enables it, you can also use loose JSON formatting. Notice the lack of quotations? This finds statements where the scaled score is between .75 and 80.

{
        
    $and:[
        { result.score.scaled: 
            { $gt: 0.75 }
        },
        { result.score.scaled: 
            { $lt: 0.80 }
        }
    ]
    
}

Using Advanced Search

This documentation is deprecated! Check the Mode V2 section for more info. The API described below will work, but expect it to change!

The Advanced Search endpoint is an extension to the xAPI. Use the normal xAPI endpoint for your LRS, and the typical Basic Auth keys. They key that you use needs a special permission before it can access the advanced query API. Check the box called "Advanced Queries" in the key create or edit form. If my LRS were called "test lrs", then I would issue a GET request to this address:

https://test-lrs.lrs.io/xapi/statements/search?query=             

The value of query should be the urlencoded json representation of the query. Here's a query object, and the associated URL:

{
    "$or":[ 
            { "actor.name": 
                { $in: ["Harriet Chapman",
                        "Pearl Drake"]}
            },
            { "result.score.scaled": 
                { "$gt": 0.75 }
            }
    ]
}

https://test-lrs.lrs.io/xapi/statements/search?query=%7B%22%24or%22%3A%5B%7B%22actor.name%22%3A%7B%22%24in%22%3A%5B%22Harriet%20Chapman%22%2C%22Pearl%20Drake%22%5D%7D%7D%2C%7B%22result.score.scaled%22%3A%7B%22%24gt%22%3A0.75%7D%7D%5D%7D

If you've disabled the "strict" flag in your LRS settings, you can omit the quotations and the url component encoding.

Regex

When you use a string as the value to test against an object path, we treat that string as a regex. This is only the case where the string is a direct comparison. In the $in field in the example above, these strings are treated normally. So, this example matches any statement who's actor.name includes "Harriet" or "Pearl."

{
    "$or":[ 
            { "actor.name": "Harriet" },
            { "actor.name": "Pearl" },
    ]
}

"Harriet" and "Pearl" are treated as regexes in this pattern, so you'll get back statements for "Tommy Pearlson" and "Harriet Reid."

{ 
    "actor.name": { 
        $in:  [
            "Harriet",
            "Pearl"
            ]
        }
}

This pattern won't return any statements! That's because it matches the strings in the $in array exactly. Include the last name to get statement with this example.

Regexes can be used to get clever with your queries. This query matches "Harriet", but with any number of "r"s. It would return statements where the actor name is "Harriet", "Haiet", "Hariet", "Harrriet" and so on. You can read up on JavaScript Regex syntax here.

{
    "actor.name": "Har*iet"
}

In addition to Regex, there is a huge list of special query operators you can use, like $in, $gt (greater than), $not... Read the MongoDB query guide for additional info.

V2 Mode

The above queries use an expansion algorithm to attempt to make it simpler to form queries. You'll notice in the documentation on Aggregation that our internal representation actually stores the statements as a child of the root document, where the child key name is "statement". Thus, the actual query for the first example should be:

{ 
    "statement.actor.name": { 
        $in:  [
            "Harriet",
            "Pearl"
            ]
        }
}

The default API for advanced search attempts to correct this for you automatically. However, there are some MongoDB queries that cannot be corrected automatically. For this reason, it is preferred that that you use the V2 mode for this api. You can activate V2 mode by supplying an additional "mode" parameter in the querystring, and setting the value to "v2"

https://test-lrs.lrs.io/xapi/statements/search?mode=v2&query= 

In v2 mode, the expansion of the query works exactly as the expansion of Aggregations. This means that you will use the $Regex operator to create a regex, because the behavior of automatically turning strings into regular expressions is not a feature of v2. Additionally, all queries should include the "statement." prefix.

Automatic escaping of extension is supported in v2.

Using Aggregation

The Aggregation API is the nuclear option for data queries. Using this feature, you can perform complex analysis over the entire database of xAPI statements without actually retrieving them from the server. You represent the algorithm you wish to run over the statements and POST it to the server. The analysis is computed within the Veracity Learning database, returning to you only the results.

Because of the possibility for massively complex computations, in order to ensure fair access for all users, accounts using the Shared Hosting options may be limited in the number of queries and total used processing time. Upgrade to a dedicated infrastructure or an onsite install for unlimited data aggregation.

The Aggregation endpoint is an extension to the xAPI. Use the normal xAPI endpoint for your LRS, and the typical Basic Auth keys. They key that you use needs a special permission before it can access the aggregation API. Check the box called "Advanced Queries" in the key create or edit form. If my LRS were called "test lrs", then I would issue a POST request to this address:

https://test-lrs.lrs.io/xapi/statements/aggregate

The content-type header of the post must be application/json, and the post body must include the JSON serialization of an Aggregation Pipeline. If the "strict" flag in the LRS settings is disabled, the payload my be the more permissive JSON5 encoding of a "pipeline". An aggregation pipeline is a set of operations for transforming the data into a result.

Here is an example of a pipeline that counts the number of statements by verb id with a date range.

[
    {    
        $match:{
            $and:[
                {statement.timestamp :{ $lt: { $parseDate:{date:"Tue Mar 27 2018 16:25:40 GMT-0400 (Eastern Daylight Time)"}}}},
                {statement.timestamp :{ $gt: { $parseDate:{date:"Tue Mar 20 2017 16:25:40 GMT-0400 (Eastern Daylight Time)"}}}},
            ]
        }
    },
    {
        $group:{
            _id:"$statement.verb.id",
            count:{$sum:1}
        }
    }
]  

In this (relatively simple) example, we first select all the statements who's timeStamp is greater and one date and less than another. That data is then processed, summing up the number of statements with each verb id. The results of an aggregation call are always JSON arrays. In this case, the result will be:

[
    {
        "_id": "http://adlnet.gov/expapi/verbs/failed",
        "count": 524
    },
    {
        "_id": "http://adlnet.gov/expapi/verbs/responded",
        "count": 12798
    },
    {
        "_id": "http://adlnet.gov/expapi/verbs/terminated",
        "count": 3588
    },
    {
        "_id": "http://adlnet.gov/expapi/verbs/passed",
        "count": 535
    },
    {
        "_id": "http://adlnet.gov/expapi/verbs/completed",
        "count": 3588
    },
    {
        "_id": "http://adlnet.gov/expapi/verbs/initialized",
        "count": 3588
    }
]

Special Operators

Our aggregation API differs only slightly from MongoDB. Because we expose the API over a web service, it can be tricky to input certain data types that don't parse into JSON nicely. To overcome this, we've added a few additional operators.

  1. $parseDate:
  2. $parseNumber:
  3. $parseRegex:

Each of these commands accepts a child as an object with a special key/value pair. The value will be parsed into a Date, Number or RegExp before the aggregation is run. It's important to understand that these are not part of the MongoDB aggregation pipeline - we parse the input using these conventions before sending them to the database.

{
    $match:{
        someKey:{$parseDate:{date:"This should be a date string"}},
        someKey:{$parseNum:{num:"This should be a number string"}},
        someKey:{$parseRegex:{regex:"This should be a regexp string"}}
    }
}

Escaping Extensions

The xAPI allows for custom fields in several places. Because the specification requires that the names of these extensions be fully qualified URLs, the format can look somewhat odd. The underlying database does not allow key names to include the "." symbol, but this symbol is required in a URL. Therefore, statements are "escaped" before they are stored. The "." in an extension is replaced by the character sequence "*`*". You may have seen some queries written as

{
    statement.context.extensions.http://activitystream*`*ms/attempt : "matchvalue"
}

You can manually escape values in this way if you prefer, but Veracity automatically escapes such queries as necessary. Note that this applies only to result, context and object definition extensions. Extension values for the context activity definitions are not currently escaped automatically. You can write the below query which will behave exactly as the one above, but may be easier to understand.

{
    statement.context.extensions.http://activitystream.ms/attempt : "matchvalue"
}

Correlation

Veracity exposes a special pipeline stage that can compute correlations between sets of documents.

{
    $group:{
        _id:{
            course:"$statement.context.contextActivities.grouping.id",
            actor:"$statement.actor.id"
        },
        courseActivity:{
            $sum:1
        }
    }
},
...
{
    $correlate:{
        join:{
            actor:"$_id.actor"
        },
        xval:"$courseActivity",
        yval:"$averageScore",
        groupId:"$_id.course",
        pipeline:[
            {
                $group:{
                    _id:1,
                    averageScore:{
                        $avg:"$statement.result.score.scaled"
                    }
                }
            }
        ]
    }
}

This pipeline stage will return a stream of documents where, for each course, the correlation between $courseActivity and $averageScore is given as "rho". The result stream would looks something like this

[
    {
        _id:"Course ID 1",
        rho: .1
    },
    {
        _id:"Course ID 2",
        rho: 0.9
    }
]

An english description of the algorithm:

  1. For each unique pair of contextActivities.grouping.id and actor.mbox, count up the number of statements and call that count courseActivity.
  2. For each courseActivity value, look up all statements where the actor in the statement is the actor for whom we computed the courseActivity
  3. Sum up those statements (for each actor) and compute the average score. Call that value averageScore.
  4. For each unique course for which we computed courseActivity, find the correlation between the averageScore field and the courseActivity field.

In other words: "For all courses, what is the correlation between the number of statements for each actor, and that actors average score." Note that for simplicity, we are actually taking the actors average score OVER ALL STATEMENTS, no just over that course.

The general pattern is that the stage will compute pairs of documents, where the value is joined by the join field. Before the correlation is performed, you can optionally transform the documents with a sub pipeline. These pairs are grouped up by the value of the groupId field, and the correlation between the xval and yval fields is returned. Thus, you'll get an output document for each unique value of the groupId. The pipeline field allows you to compute a complex query, the result of which is the input to the correlation computations.

  • join - you can join on either the actor, verb, or object. The fields here represent which value in the input stream identifies the actor ID. If the inputs to this stage are raw xAPI statements, then the value for "join.actor" should be the object path "$statement.actor.id". The same pattern applies for object and verb, so "join.verb" would be "$statement.verb.id" and "join.object" would be "$statement.object.id"

  • xval - the object path in the incoming document stream to correlate against the yval

  • yval - the object path in the sub pipeline results documents to correlate against the xval

  • groupId - after pairs are generated by joining the sub pipeline with the incoming stream, how should the pairs be grouped for correlation. If you expect to output only one correlation then you can pass in a "1"

  • pipeline - a query to transform the paired documents. If you wish to correlate against statements directly, just pass []. This is used so that you can correlate against computed values like averages, instead of simply against statement values.

Restricted Pipeline Stages

While the data in each LRS is completely siloed, and cannot be modified by other LRSs, we still worry that exposing the aggregation pipeline could lead users to break their account, or leak information about internal system configuration. We therefore only accept a limited subset of the MongoDB Aggregation pipeline stages. These stages are safe, in that they don't reveal information about the database system internals, or modify the state of the data in the system. We only allow the following:

    $addFields
    $bucketAuto
    $bucket
    $count
    $facet
    $geoNear
    $graphLookup
    $group
    $limit
    $lookup
    $match
    $project
    $redact
    $replaceRoot
    $sample
    $skip
    $sort
    $sortByCount
    $unwind

For users with onsite installs or dedicated cloud hosting, contact us to remove these restrictions.

Understanding the Schema

In order to write aggregation queries, you'll need to understand the format of our representation in the database. Each document in our database includes the original posted statement (modified slightly to make uniform according to the spec). This statement value is exactly what you see in the Statement Viewer page, with a few exceptions. Actors and Authority have an additional id field. This allows us to aggregate over agents that have the same IFI, but otherwise have different names. The statement.actor.id is the IFI of the actor, or a hash of the object when that object is a group. Keys that have a dot in the key name (usually extensions), will have the dot characters replaced by the string:

*`*

We also keep a set of indexes on the root document to aid in searching. These indexes are:

agent:[array:string] //the IFIs of all agents in this statement. Usually one entry, but multiple are possible when the agent is a group
verb:[array:string] //The verb ID only
activity:[array:string] //The activity IDs of activities that should match this statement
registration:[array:string] //registrations
relatedAgents:[array:string] //The list of all agent IFIs that match this statement when the xAPI query includes "relatedAgents=true"
relatedActivities:[array:string] //The list of all Activity IDs that match this statement when the xAPI query includes "relatedActivities=true"
voided:boolean //is this statement voided?
voids: string // the ID the the statement this statement voids
statement: object //The entire statement as posted, plus the modifications described above

You can use these fields to access data that can be hard to compute during queries, and therefore is generated on store.