Extracting data from an ensemble of trajectories

Overview

Collecting data such as displacements, velocities, and speeds is a common task when analyzing trajectory data. Yupi offers a collect() function that automatically iterates over an ensemble of trajectories and returns the requested data. It can also sample the data at a given time scale using moving windows. This tutorial provides a step-by-step walkthrough of the collect() function.

Let’s begin

Let us first create a toy ensemble of just two trajectories (for the sake of illustration) using the Trajectory class from yupi.

import numpy as np
from yupi import Trajectory
from yupi.stats import collect

# x-coordinate for the first/second trajectory
x1 = 2 * np.arange(5)
x2 = x1 + 1

# y-coordinates
y1 = x1**2
y2 = x2**2

# Time step and ids
dt = .5
id_traj1 = 'traj_01'
id_traj2 = 'traj_02'

# Instantiating the class
traj1 = Trajectory(x=x1, y=y1, dt=dt, traj_id=id_traj1)
traj2 = Trajectory(x=x2, y=y2, dt=dt, traj_id=id_traj2)

# Gather in an ensemble
trajs = [traj1, traj2]

With yupi, extracting time series from a single trajectory is straightforward. For example:

  • position: traj1.r

  • displacement: traj1.r.delta

  • velocity: traj1.v

  • speed: traj1.v.norm

  • distance: traj1.delta_r.norm
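To make the names in this list concrete, the displacement and distance series of the first trajectory can be reproduced with plain NumPy. This is only a sketch of what the quantities mean, assuming that a displacement is the difference between consecutive positions; it is not yupi's implementation. Velocity and speed are left out because they also depend on the differentiation scheme.

```python
import numpy as np

x1 = 2 * np.arange(5)
r = np.column_stack([x1, x1**2]).astype(float)  # positions, shape (5, 2)

# Displacement between consecutive samples: r[i + 1] - r[i]
delta_r = np.diff(r, axis=0)

# Distance covered in each step: Euclidean norm of every displacement
distance = np.linalg.norm(delta_r, axis=1)

print(delta_r)
print(distance)
```

Note that both series have one element fewer than the position series, since each value is computed from a pair of consecutive samples.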

But how can these be obtained from a whole ensemble? And how can they be extracted when the time scale of interest differs from the time step? The following sections answer these questions with illustrative examples and explain every parameter of the collect() function in detail.

Collect general function

By default, the collect() function takes a list of trajectories and returns an array with the positional data of all trajectories concatenated.

collect(trajs)
array([[ 0.,  0.],
       [ 2.,  4.],
       [ 4., 16.],
       [ 6., 36.],
       [ 8., 64.],
       [ 1.,  1.],
       [ 3.,  9.],
       [ 5., 25.],
       [ 7., 49.],
       [ 9., 81.]])

The following sections will describe all the parameters available that manipulate the resulting data within the collect() function.

The lag parameter

Suppose the trajectories in the ensemble are realizations of a process whose statistical properties differ across time scales. In that case, the lag parameter lets you sample the data at the scale of interest. If lag is an integer, it is taken as a number of samples. If lag is a float, it is taken as a time, in the units of the time array (i.e., traj.t).

If lag is not set, the default value lag=0 is assumed.

collect(trajs, lag=2)
array([[ 4., 16.],
       [ 4., 32.],
       [ 4., 48.],
       [ 4., 24.],
       [ 4., 40.],
       [ 4., 56.]])
collect(trajs, lag=1.0)
array([[ 4., 16.],
       [ 4., 32.],
       [ 4., 48.],
       [ 4., 24.],
       [ 4., 40.],
       [ 4., 56.]])
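The two calls above give the same result because a lag of 1.0 time units with dt = 0.5 spans two samples. The sketch below reproduces those numbers with plain NumPy. The helpers lag_to_samples and lagged_displacements are hypothetical names introduced for illustration, and rounding the time-to-samples conversion is an assumption rather than a statement about yupi's internals.

```python
import numpy as np

def lag_to_samples(lag, dt):
    """Hypothetical helper: ints are sample counts, floats are times."""
    if isinstance(lag, (int, np.integer)):
        return int(lag)
    return int(round(lag / dt))  # rounding is an assumption

def lagged_displacements(r, step):
    """Displacements r[i + step] - r[i] for every valid i (needs step >= 1)."""
    return r[step:] - r[:-step]

dt = 0.5
x1 = 2 * np.arange(5)
x2 = x1 + 1
r1 = np.column_stack([x1, x1**2]).astype(float)
r2 = np.column_stack([x2, x2**2]).astype(float)

step = lag_to_samples(1.0, dt)  # 2 samples, the same as lag=2
result = np.concatenate([lagged_displacements(r, step) for r in (r1, r2)])
print(result)
```

Each trajectory of five samples contributes three rows, since only three pairs of points are two samples apart.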

The concat parameter

As shown in the very first example, collect(trajs) returns an array with the positional data of all trajectories concatenated.

To keep the data split by realization instead, set the concat parameter to False.

collect(trajs, concat=False)
array([[[ 0.,  0.],
        [ 2.,  4.],
        [ 4., 16.],
        [ 6., 36.],
        [ 8., 64.]],

       [[ 1.,  1.],
        [ 3.,  9.],
        [ 5., 25.],
        [ 7., 49.],
        [ 9., 81.]]])
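The difference between the two layouts is easiest to see in the array shapes. The following NumPy sketch mimics both outputs for the example ensemble. It only illustrates the shapes and is not how yupi builds its result.

```python
import numpy as np

x1 = 2 * np.arange(5)
x2 = x1 + 1
r1 = np.column_stack([x1, x1**2]).astype(float)
r2 = np.column_stack([x2, x2**2]).astype(float)

flat = np.concatenate([r1, r2])  # like concat=True: one row per sample
split = np.stack([r1, r2])       # like concat=False: one block per trajectory

print(flat.shape)   # (10, 2)
print(split.shape)  # (2, 5, 2)

# Merging the first two axes of the split form recovers the flat form.
assert np.array_equal(split.reshape(-1, 2), flat)
```

Stacking works here only because both trajectories have five samples. For trajectories of different lengths, a list of per-trajectory arrays would be the natural container in this sketch.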

The warnings parameter

If the given lag is larger than the length of one of the trajectories, a warning message identifying that trajectory is shown and the collect() function skips it. To suppress these messages, set the warnings parameter to False.

# A trajectory with new dt
traj3 = Trajectory(x=x2, y=y2, dt=.01, traj_id="traj_03")
collect([traj3, traj2], lag=dt)
15:07:11 [WARNING] Trajectory traj_03 is shorten than 50 samples
array([[ 2.,  8.],
       [ 2., 16.],
       [ 2., 24.],
       [ 2., 32.]])
collect([traj3, traj2], lag=dt, warnings=False)
array([[ 2.,  8.],
       [ 2., 16.],
       [ 2., 24.],
       [ 2., 32.]])
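The skipping behavior can be sketched without yupi as follows. The function collect_lagged is a hypothetical stand-in, and both the exact skip condition and the rounding of the lag are assumptions. In the example, a lag of 0.5 time units corresponds to 50 samples for the trajectory with dt = 0.01, which has only five samples, so it is skipped.

```python
import logging
import numpy as np

def collect_lagged(positions, ids, dts, lag_time, warnings=True):
    """Hypothetical stand-in: lagged displacements, skipping short trajectories."""
    chunks = []
    for r, traj_id, dt in zip(positions, ids, dts):
        step = int(round(lag_time / dt))  # lag expressed in samples
        if step >= len(r):                # no pair of points is that far apart
            if warnings:
                logging.warning(
                    "Trajectory %s is shorter than %d samples", traj_id, step
                )
            continue
        chunks.append(r[step:] - r[:-step])
    if not chunks:                        # every trajectory was skipped
        return np.empty((0, positions[0].shape[1]))
    return np.concatenate(chunks)

x2 = 2 * np.arange(5) + 1
r2 = np.column_stack([x2, x2**2]).astype(float)

# Same positions, different time steps: 0.01 (skipped) and 0.5 (kept)
out = collect_lagged([r2, r2], ["traj_03", "traj_02"], [0.01, 0.5], 0.5,
                     warnings=False)
print(out)
```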

The velocity parameter

Sometimes it is useful to collect the velocity of the trajectories instead of their position. To do so, set the velocity parameter to True.

collect(trajs, velocity=True)
array([[ 4., 16.],
       [ 4., 16.],
       [ 4., 32.],
       [ 4., 48.],
       [ 4., 48.],
       [ 4., 24.],
       [ 4., 24.],
       [ 4., 40.],
       [ 4., 56.],
       [ 4., 56.]])

Additionally, if lag is set, the velocity is calculated over the given lag.

collect(trajs, lag=2, velocity=True)
array([[ 4., 16.],
       [ 4., 32.],
       [ 4., 48.],
       [ 4., 24.],
       [ 4., 40.],
       [ 4., 56.]])
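The numbers in both velocity examples are consistent with simple finite differences. The sketch below reproduces them for the first trajectory under two assumptions that are not confirmed by the text: without a lag, a centered difference is used and the end points reuse the nearest interior value; with a lag, the lagged displacement is divided by the lag time. The function names are hypothetical.

```python
import numpy as np

def centered_velocity(r, dt):
    """Centered differences; end points copy the nearest interior value."""
    v = np.empty_like(r, dtype=float)
    v[1:-1] = (r[2:] - r[:-2]) / (2 * dt)
    v[0], v[-1] = v[1], v[-2]
    return v

def lagged_velocity(r, dt, step):
    """Displacement over `step` samples divided by the elapsed time."""
    return (r[step:] - r[:-step]) / (step * dt)

dt = 0.5
x1 = 2 * np.arange(5)
r1 = np.column_stack([x1, x1**2]).astype(float)

v_plain = centered_velocity(r1, dt)
v_lagged = lagged_velocity(r1, dt, step=2)
print(v_plain)
print(v_lagged)
```

With dt = 0.5 and a lag of two samples, the lag time is exactly 1.0, which is why the lagged velocities coincide numerically with the lagged displacements shown earlier.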

The func parameter

All the examples described above return only raw data from the trajectories. To transform the data, set the func parameter to a function; it is applied to the vector of each trajectory before concatenation.

This is useful, for example, to extract the change in velocity between consecutive samples.

collect(trajs, velocity=True, func=lambda vec: vec.delta)
array([[ 0.,  0.],
       [ 0., 16.],
       [ 0., 16.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0., 16.],
       [ 0., 16.],
       [ 0.,  0.]])
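Applying the function before concatenation matters whenever the function mixes neighboring samples. The sketch below illustrates this with NumPy, using np.diff in place of vec.delta and the velocity values from the earlier example. The name collect_with is hypothetical and only mimics the order of operations described above.

```python
import numpy as np

def collect_with(vectors, func=None):
    """Hypothetical stand-in: apply func per trajectory, then concatenate."""
    if func is not None:
        vectors = [func(v) for v in vectors]
    return np.concatenate(vectors)

def delta(v):
    """Difference between consecutive rows."""
    return np.diff(v, axis=0)

v1 = np.array([[4, 16], [4, 16], [4, 32], [4, 48], [4, 48]], dtype=float)
v2 = np.array([[4, 24], [4, 24], [4, 40], [4, 56], [4, 56]], dtype=float)

per_traj = collect_with([v1, v2], func=delta)  # 8 rows, 4 per trajectory
naive = delta(collect_with([v1, v2]))          # 9 rows, one is spurious
print(per_traj)
print(naive[4])  # difference across the boundary between the trajectories
```

In the naive version, the fifth row is the difference between the last velocity of the first trajectory and the first velocity of the second one, which has no physical meaning.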

The at parameter

To extract the data at a certain time (or index), use the at parameter. If at is an integer, it is taken as the index. If at is a float, it is taken as the time (in this case the index is calculated using the trajectory’s dt value).

This parameter cannot be used together with the lag parameter. In addition, when the at parameter is used, the concat parameter is ignored.

collect(trajs, at=1)
array([[ 2.,  4.],
       [ 3.,  9.]])
collect(trajs, at=.5)
array([[ 2.,  4.],
       [ 3.,  9.]])
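Both calls return the same rows because a time of 0.5 with dt = 0.5 corresponds to index 1. A minimal NumPy sketch of this lookup follows. The name sample_at is hypothetical, and rounding the time-to-index conversion is an assumption.

```python
import numpy as np

def sample_at(positions, dts, at):
    """Hypothetical stand-in: one row per trajectory at index or time `at`."""
    rows = []
    for r, dt in zip(positions, dts):
        if isinstance(at, (int, np.integer)):
            index = int(at)               # integers are indices
        else:
            index = int(round(at / dt))   # floats are times (assumed rounding)
        rows.append(r[index])
    return np.array(rows)

x1 = 2 * np.arange(5)
x2 = x1 + 1
r1 = np.column_stack([x1, x1**2]).astype(float)
r2 = np.column_stack([x2, x2**2]).astype(float)

by_index = sample_at([r1, r2], [0.5, 0.5], 1)
by_time = sample_at([r1, r2], [0.5, 0.5], 0.5)
print(by_index)
print(by_time)
```

Since every trajectory contributes exactly one row, there is nothing left to concatenate or split, which is in line with the concat parameter being ignored.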

Collect specific functions

These functions are just specializations of the collect() function, and all of them call collect() internally. Which one to use depends on how the data should be extracted: at a certain time, at a certain step, or with a lag.