filters

nomad.filters.completeness(data, periods=1, freq='h', *, start=None, end=None, offset_col=0, relative=False, str_from_time=True, agg_freq=None, traj_cols=None, **kwargs)[source]

Measure trajectory completeness as the fraction of expected time intervals (‘buckets’) containing at least one observation.

Parameters:
  • data (pandas.Series or pandas.DataFrame) –

    Trajectory data containing timestamps, either as:

    • A pandas Series of Unix-second integers or datetime64 values.

    • A DataFrame, from which timestamp and user columns are identified via traj_cols or default column naming conventions.

  • periods (int, default 1) – Number of units of freq per bucket (must be ≥ 1). For example, periods=3, freq='h' results in 3-hour buckets.

  • freq ({'s', 'min', 'h', 'd', 'w'}, default 'h') – Time resolution used to define buckets: seconds ('s'), minutes ('min'), hours ('h'), days ('d'), or weeks ('w').

  • start (scalar, optional) – Explicit lower time bound of the bucket range. If omitted, it is inferred from the data. Ignored if relative=True.

  • end (scalar, optional) – Explicit upper time bound of the bucket range. If omitted, it is inferred from the data. Ignored if relative=True.

  • relative (bool, default False) – If False, completeness is measured within a common time span shared by all users. If True, each user’s completeness is computed only within their own individual time span (from their first to their last record).

  • offset_col (pandas.Series or int, default 0) – Offset in seconds to apply to timestamps (useful for handling time zones). If a tz_offset column is present in the data and indicated via traj_cols or kwargs, this argument is ignored.

  • agg_freq (str, optional) – Aggregation frequency (e.g., 'D' for daily, 'W' for weekly, 'M' for monthly). If specified, returns completeness aggregated at this frequency instead of overall completeness.

  • traj_cols (dict, optional) – Mapping from standard keys ('timestamp', 'datetime', 'user_id', 'tz_offset') to column names in data. If omitted, defaults are used.

  • **kwargs – Shorthand overrides for entries in traj_cols.

Returns:

  • If input is a single Series and agg_freq=None, returns a single float.

  • If input is a DataFrame and agg_freq=None, returns a Series indexed by user_id.

  • If agg_freq is specified, returns completeness aggregated by the specified frequency, either as a Series (single user) or DataFrame (rows per user, columns per aggregation bucket).

Return type:

float or pandas.Series or pandas.DataFrame
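
The bucket-counting idea behind completeness can be sketched in plain pandas. This is an illustration only, not nomad's implementation: the helper name bucket_completeness, the SECONDS_PER_UNIT table, and the choice to align buckets to the Unix epoch are assumptions made for the example.

```python
import pandas as pd

# Seconds per unit for the lower-case freq aliases (assumed lookup table).
SECONDS_PER_UNIT = {"s": 1, "min": 60, "h": 3600, "d": 86400, "w": 604800}


def bucket_completeness(unix_seconds, periods=1, freq="h", start=None, end=None):
    """Fraction of periods*freq buckets in [start, end] with >= 1 observation."""
    ts = pd.Series(unix_seconds, dtype="int64")
    width = periods * SECONDS_PER_UNIT[freq]
    # Infer missing bounds from the data.
    start = int(ts.min()) if start is None else int(start)
    end = int(ts.max()) if end is None else int(end)
    in_range = ts[(ts >= start) & (ts <= end)]
    # Buckets are aligned to the epoch (an assumption of this sketch).
    n_expected = end // width - start // width + 1
    n_observed = (in_range // width).nunique()
    return n_observed / n_expected


# Single series: pings fall in hourly buckets 0, 2 and 3 out of 0..3.
overall = bucket_completeness([0, 600, 7200, 10800])

# DataFrame: one value per user, measured over a common span.
df = pd.DataFrame({"user_id": ["a", "a", "b"], "timestamp": [0, 7200, 3600]})
per_user = df.groupby("user_id")["timestamp"].apply(
    bucket_completeness, start=0, end=7200
)
```

Passing the same start and end to every user mimics the relative=False behaviour described above; omitting them lets each user's span be inferred from their own records, as with relative=True.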

nomad.filters.coverage_matrix(data, periods=1, freq='h', start=None, end=None, offset_col=0, relative=False, str_from_time=False, traj_cols=None, **kwargs)[source]

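Return a matrix of 0/1 flags marking which buckets contain at least one observation. Rows correspond to users (or to the single input Series); columns correspond to bucket start times.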
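
A rough picture of such a matrix, built with plain pandas rather than with nomad itself. The column names, the hourly bucket width, and the use of Unix-second bucket starts as column labels are assumptions of this sketch.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": ["a", "a", "a", "b"],
    "timestamp": [0, 600, 7200, 3600],  # Unix seconds
})

width = 3600  # one-hour buckets
bucket_start = df["timestamp"] // width * width

# Every bucket between the first and last observed one, including empty ones.
all_buckets = list(range(int(bucket_start.min()), int(bucket_start.max()) + width, width))

matrix = (
    pd.crosstab(df["user_id"], bucket_start)      # counts per user and bucket
    .clip(upper=1)                                # counts -> 0/1 flags
    .reindex(columns=all_buckets, fill_value=0)   # add buckets nobody hit
)

# The row mean of the flags is each user's completeness over the common span.
row_means = matrix.mean(axis=1)
```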

nomad.filters.downsample(df, periods=1, freq='min', keep='first', traj_cols=None, verbose=False, **kwargs)[source]

Down-sample df so that each user contributes at most one row in every consecutive periods × freq window.

Parameters:
  • df (pandas.DataFrame) – The input data.

  • periods (int, default 1) – Size of the window expressed in multiples of freq; must be ≥ 1.

  • freq ({'s', 'min', 'h', 'd', 'w'}, default 'min') – Unit of the window: second, minute, hour, day, or week (lower-case aliases).

  • keep ({'first', 'last', False}, default 'first') – Which duplicate inside each window to retain, matching pandas.Series.duplicated semantics.

  • traj_cols (dict, optional) – Mapping from the standard keys 'timestamp', 'datetime', 'user_id', and 'tz_offset' to the actual column names in df. Any key may be absent if the corresponding column is not present.

  • verbose (bool, default False) – When True, prints the fraction of rows removed and the window size.

  • **kwargs – Shorthand overrides for entries in traj_cols.

Returns:

A view of df containing the surviving rows.

Return type:

pandas.DataFrame

Raises:
  • ValueError – If periods is not a positive integer or freq is invalid.

  • KeyError – If no suitable time column is found after parsing traj_cols.
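
The windowing rule can be illustrated with pandas' own duplicated method. This is a minimal sketch under stated assumptions, not the library's code: the helper name downsample_sketch, the fixed column names user_id and timestamp, and epoch-aligned windows are all choices made for the example.

```python
import pandas as pd

# Seconds per unit for the lower-case freq aliases (assumed lookup table).
SECONDS_PER_UNIT = {"s": 1, "min": 60, "h": 3600, "d": 86400, "w": 604800}


def downsample_sketch(df, periods=1, freq="min", keep="first"):
    """Keep at most one row per user and per periods*freq window."""
    if not isinstance(periods, int) or periods < 1:
        raise ValueError("periods must be a positive integer")
    if freq not in SECONDS_PER_UNIT:
        raise ValueError(f"invalid freq: {freq!r}")
    width = periods * SECONDS_PER_UNIT[freq]
    key = pd.DataFrame({
        "user_id": df["user_id"],
        "window": df["timestamp"] // width,  # window index since the epoch
    })
    # keep='first' / 'last' retain one row per window (in row order);
    # keep=False drops every row of a window that holds more than one row.
    return df[~key.duplicated(keep=keep)]


df = pd.DataFrame({
    "user_id": ["a", "a", "a", "a", "a", "b"],
    "timestamp": [0, 10, 59, 60, 125, 5],  # Unix seconds, sorted per user
})
first = downsample_sketch(df)
last = downsample_sketch(df, keep="last")
unique_only = downsample_sketch(df, keep=False)
```

Here user a has three pings in the first minute, so keep='first' retains the row at second 0, keep='last' the row at second 59, and keep=False discards all three.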

nomad.filters.is_within(df, within, poly_crs=None, data_crs=None, traj_cols=None, **kwargs)[source]

Compute a boolean mask marking the points of a DataFrame that lie within the given polygon.

Parameters:
  • df (pd.DataFrame or GeoDataFrame) – Trajectory data.

  • within (shapely Polygon/MultiPolygon or WKT string) – Polygon defining the spatial filter.

  • traj_cols (dict, optional) – Mapping of logical trajectory column names to actual columns.

  • poly_crs (CRS or str, optional) – CRS of the polygon.

  • data_crs (CRS or str, optional) – CRS of the DataFrame coordinates.

  • **kwargs – Additional parameters for trajectory columns resolution.

Returns:

Boolean mask indicating which points lie inside the polygon passed as within.

Return type:

pd.Series

nomad.filters.q_filter(df: DataFrame, qbar: float, traj_cols: dict = None, user_id: str = 'user_id', timestamp: str = 'timestamp')[source]

Compute the q statistic for each user as the number of unique hours with pings divided by the total observed hours (last hour - first hour), and keep only the users with q > qbar.

Parameters:
  • df (pd.DataFrame) – Input DataFrame with user_id and timestamp columns.

  • qbar (float) – The threshold q value; users with q > qbar will be retained.

  • traj_cols (dict, optional) – Dictionary containing column mappings, e.g., {"user_id": "user_id", "timestamp": "timestamp"}.

  • user_id (str, optional) – Name of the user_id column (default is "user_id").

  • timestamp (str, optional) – Name of the timestamp column (default is "timestamp").

Returns:

A Series containing the user IDs for users whose q_stat > qbar.

Return type:

pd.Series
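
The q statistic as worded above can be reproduced with a short pandas sketch. It assumes Unix-second timestamps and takes the span literally as last hour minus first hour, so a user whose pings all fall in one hour would divide by zero; how the library treats that case is not stated here. The function name q_filter_sketch is hypothetical.

```python
import pandas as pd


def q_filter_sketch(df, qbar, user_id="user_id", timestamp="timestamp"):
    """Return the IDs of users whose q statistic exceeds qbar."""
    hour = df[timestamp] // 3600               # hour index since the epoch
    grouped = hour.groupby(df[user_id])
    unique_hours = grouped.nunique()           # hours with at least one ping
    span = grouped.max() - grouped.min()       # last hour - first hour
    q = unique_hours / span
    return pd.Series(q.index[q > qbar], name=user_id)


hours_a = [0, 2, 5, 10]                  # 4 unique hours over a span of 10
hours_b = [0, 1, 2, 3, 5, 6, 7, 10]      # 8 unique hours over a span of 10
df = pd.DataFrame({
    "user_id": ["a"] * len(hours_a) + ["b"] * len(hours_b),
    "timestamp": [h * 3600 for h in hours_a + hours_b],
})
kept = q_filter_sketch(df, qbar=0.5)
```

With qbar=0.5, user a (q = 0.4) is dropped and user b (q = 0.8) is retained.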

nomad.filters.to_projection(data, crs_to, data_crs=None, traj_cols=None, **kwargs)[source]

Project coordinates from data_crs to crs_to, with robust column handling.

Warns if coordinate columns and CRS type appear mismatched.

Parameters:
  • data (pd.DataFrame) – Data to project.

  • crs_to (str or CRS) – Output CRS (required).

  • data_crs (str or CRS, optional) – Source CRS (default: inferred).

  • traj_cols (dict, optional) – Mapping of logical column names to actual columns.

  • **kwargs – Passed to trajectory column parsing.

Returns:

Projected x and y as Series, aligned to data.index

Return type:

pd.Series, pd.Series

Note

To assign directly, use np.column_stack. For example: df[['lon', 'lat']] = np.column_stack(to_projection(…))
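
The assignment pattern in the note can be tried without nomad installed. In the sketch below, placeholder_projection is a hypothetical stand-in that only mimics the return shape (two Series aligned to the input index); it does not perform a real projection.

```python
import numpy as np
import pandas as pd


def placeholder_projection(data):
    """Hypothetical stand-in: returns (x, y) Series aligned to data.index."""
    x = data["lon"] * 1000.0  # arbitrary scaling, NOT a real projection
    y = data["lat"] * 1000.0
    return x, y


df = pd.DataFrame({"lon": [1.0, 2.0], "lat": [3.0, 4.0]}, index=[10, 11])

# column_stack turns the pair of Series into an (n, 2) array, which can be
# written into two existing columns in a single assignment.
df[["lon", "lat"]] = np.column_stack(placeholder_projection(df))
```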

nomad.filters.to_tessellation(data, index, res, data_crs=None, traj_cols=None, **kwargs)[source]

Map coordinates to cells of a spatial index (tessellation) at resolution res, with robust column handling.

Parameters:
  • data (pd.DataFrame) – Data to project.

  • index (str) – One of 'h3', 'geohash', or 's2'.

  • data_crs (str or CRS, optional) – Source CRS (default: inferred).

  • traj_cols (dict, optional) – Mapping of logical column names to actual columns.

  • **kwargs – Passed to trajectory column parsing.

nomad.filters.to_timestamp(datetime, tz_offset=None)[source]

Convert a datetime Series or scalar into UNIX timestamps (seconds).

Parameters:
  • datetime (pd.Series, str, pd.Timestamp, or scalar)

  • tz_offset (pd.Series, optional)

Returns:

UNIX timestamps as nullable Int64 values (seconds since epoch) for non-scalar inputs. Returns scalar int if input was scalar.

Return type:

pd.Series or int
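
The kind of conversion described here can be approximated with plain pandas. This sketch treats naive datetimes as UTC and ignores tz_offset, both of which are assumptions; the helper name datetime_to_unix is hypothetical.

```python
import pandas as pd


def datetime_to_unix(values):
    """Convert datetime-like values to Unix seconds as nullable Int64."""
    dt = pd.to_datetime(pd.Series(values))
    # Whole seconds since the epoch; missing datetimes become NaN here.
    seconds = (dt - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
    # Int64 (capital I) is pandas' nullable integer dtype, so NaN becomes <NA>.
    return seconds.astype("Int64")


result = datetime_to_unix(["2024-01-01 00:00:00", None])
```

The nullable Int64 dtype is what lets missing datetimes survive the conversion without forcing the whole column to float.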

nomad.filters.to_yyyymmdd(time_values, tz_offset=None)[source]

Convert datetimes/timestamps to integer YYYYMMDD.

Accepts heterogeneous inputs and optional per-row timezone offsets. If tz_offset is provided (seconds), the date is computed in that local time; otherwise dates are computed in UTC.

Parameters:
  • time_values (pd.Series) – Series of datetime64, strings, pandas.Timestamp objects, or Unix seconds.

  • tz_offset (pd.Series or scalar, optional) – Seconds offset from UTC to local time (e.g., -18000 for UTC-5). If provided, the conversion uses local dates; otherwise UTC dates.

Returns:

Integer dates encoded as YYYYMMDD (dtype Int64, NA-friendly).

Return type:

pd.Series
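
The effect of tz_offset on the resulting date can be seen in a small pandas sketch. It handles only Unix-second input and is not the library's implementation; the helper name unix_to_yyyymmdd is hypothetical.

```python
import pandas as pd


def unix_to_yyyymmdd(unix_seconds, tz_offset=0):
    """Encode Unix seconds as integer YYYYMMDD, shifted by tz_offset seconds."""
    # Adding the offset before decoding yields the local calendar date.
    local = pd.to_datetime(pd.Series(unix_seconds) + tz_offset, unit="s")
    encoded = local.dt.year * 10000 + local.dt.month * 100 + local.dt.day
    return encoded.astype("Int64")


midnight_utc = [1704067200]  # 2024-01-01 00:00:00 UTC
utc_dates = unix_to_yyyymmdd(midnight_utc)
local_dates = unix_to_yyyymmdd(midnight_utc, tz_offset=-18000)  # UTC-5
```

Midnight UTC on 1 January 2024 is still 31 December 2023 at UTC-5, so the two calls return different dates for the same instant.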

nomad.filters.to_zoned_datetime(utc_timestamps, timezone_offset)[source]

nomad.filters.within(df, within, poly_crs=None, data_crs=None, traj_cols=None, **kwargs)[source]

Return a filtered DataFrame containing only rows within the given polygon.

All arguments are passed to is_within.