import pandas as pd
from seeq import spy
# Set the compatibility option so that you maximize the chance that SPy will remain compatible with your notebook/script
spy.options.compatibility = 193
Parameterized Jobs
The simple scheduling methods described in spy.jobs will often be adequate for your purposes.
But in some scenarios, you may wish to run a suite of jobs across an asset group or some other set of items. For this, you will use the spy.jobs.push() command.
This feature is only available for scheduling notebooks in Seeq Data Lab. You cannot use SPy to schedule content in Anaconda, AWS SageMaker, or any other Python environment.
Assemble a DataFrame with the Parameters
Let’s take the most common example, which is to schedule a series of jobs across a group of assets.
Search for the assets:
schedule_df = spy.search({
    'Path': 'Example >> Cooling Tower 1',
    'Type': 'Asset'
})
schedule_df
Now add a Schedule column, which will dictate how often the script will run.
For intervals more frequent than 1 hour, it is highly recommended that you use intervals that divide evenly into an hour, like ‘15 minutes’, ‘20 minutes’ or ‘30 minutes’.
schedule_df['Schedule'] = 'every 6 hours'
schedule_df
You can also use Quartz Cron expressions in place of the natural language phrasing above; the Online Cron Expression Generator is a handy tool for building them. As an example, the equivalent Quartz Cron expression for “every 6 hours” is 0 0 0/6 ? * * *.
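A cron expression is assigned to the Schedule column just like the natural language phrasing. As a minimal sketch, the equivalent assignment would be:
# Equivalent to 'every 6 hours', expressed as a Quartz Cron string
schedule_df['Schedule'] = '0 0 0/6 ? * * *'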
Sort your Schedule DataFrame
It’s important to sort the DataFrame so that the ordering of the items is not dependent on how the constituent data happened to be returned by Seeq or any other data source. (Each job is associated with its row position in the DataFrame, so a consistent ordering ensures that re-pushing the schedule updates the same jobs rather than shuffling their parameters.)
# If you have an ID column, it's easiest to sort by that. Otherwise pick something that
# will result in consistent ordering
schedule_df.sort_values('ID', inplace=True, ignore_index=True)
Push the jobs to Seeq
The final step is to push the schedule DataFrame to Seeq so that it can schedule the jobs.
It’s often desirable to “spread out” the execution of the jobs so that they don’t all execute simultaneously. In this example, we’re executing the jobs every 6 hours and we’ve asked spy.jobs.push() to spread them out evenly over those 6 hours. (In general, the spread parameter is the same as the frequency of your schedule, since you want all the jobs to execute within the time interval allocated.)
Execute the following cell (only) to schedule the set of jobs.
parameters = spy.jobs.push(schedule_df, spread='6 hours', interactive_index=1)
If you are a Seeq administrator, you can view these jobs by going to the Administration page and clicking on the Jobs tab. You will need to clear the Groups filter to see the Notebook jobs.
In the output of the cell above, you’ll notice that the current context is INTERACTIVE, which is the term we use for the scenario where you are executing cells in the notebook yourself via the Seeq Data Lab user interface. When you open an HTML file in the _Job Results folder, you’ll see that the same cell shows the current context as JOB. In the JOB context, parameters will be the row of the DataFrame that pertains to that job instance. In the INTERACTIVE context, parameters will be the row that corresponds to interactive_index.
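In either context, parameters behaves like a single row of the schedule DataFrame (a pandas Series), so its values can be looked up by column name. A quick illustrative check:
# `parameters` is one row of schedule_df; access its columns by name
print(parameters['Name'], parameters['Schedule'])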
We unschedule the jobs here so that your Seeq Data Lab isn’t loaded down with executing this tutorial.
spy.jobs.unschedule()
Do something cool
Now, based on the values in parameters, you can do something interesting. In this example we’ll push a condition to a new (small) asset tree.
parameters
Let’s pretend that we have a spiffy algorithm that can determine the health of our asset by looking at a couple of signals.
# Pull the Temperature signal for the asset assigned to this job
health_data_df = spy.pull(spy.search({
    'Asset': parameters['ID'],
    'Name': 'Temperature'
}), header='Name')

# Our "spiffy algorithm": the mean temperature determines the health status
health_indicator = health_data_df.mean()['Temperature']
health_status = 'HEALTHY' if health_indicator > 80 else 'UNHEALTHY'
metadata_df = pd.DataFrame([{
    'Path': 'Parameterized Jobs Tutorial',
    'Asset': f'{parameters["Name"]}',
    'Name': 'Job Executions',
    'Type': 'Condition',
    'Maximum Duration': '1h'
}])
metadata_df
import datetime

# Record this job execution as a 5-minute capsule starting now
start = datetime.datetime.now().isoformat()
end = (datetime.datetime.now() + datetime.timedelta(minutes=5)).isoformat()
capsule_data = pd.DataFrame([{
    'Capsule Start': pd.to_datetime(start),
    'Capsule End': pd.to_datetime(end),
    'Health': health_status
}])
capsule_data
spy.push(capsule_data, metadata=metadata_df)
Scheduling from a separate notebook
The spy.jobs.push() function accepts a datalab_notebook_url parameter, so that a job can be pushed to another notebook to which you have access. A common use case for this would be to enable a user of an Add-on Mode notebook to configure a scheduled notebook through form input. In such a scenario, the parameters specified by completion of the form would need to be passed to the scheduled notebook.
path_to_here = '/notebooks/SPy%20Documentation/Advanced%20Scheduling/Parameterized%20Jobs.ipynb'
this_notebook_url = f'{spy.utils.get_data_lab_project_url()}{path_to_here}'
spy.jobs.push(schedule_df, spread='6 hours', datalab_notebook_url=this_notebook_url)
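When the jobs for that other notebook are no longer needed, they can be removed the same way; this sketch assumes spy.jobs.unschedule() accepts the same datalab_notebook_url parameter as spy.jobs.push():
# Assumption: spy.jobs.unschedule() takes datalab_notebook_url, mirroring spy.jobs.push()
spy.jobs.unschedule(datalab_notebook_url=this_notebook_url)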
No additional work is needed to ensure the parameters are available in the target notebook. The schedule_df used in the call to spy.jobs.push() is automatically pickled to a .pkl file in the _Job DataFrames folder of the notebook being scheduled. To retrieve the parameters for a specific job in the jobs DataFrame from the scheduled notebook, just call spy.jobs.pull():
parameters = spy.jobs.pull(interactive_index=1)
parameters
The JOB and INTERACTIVE contexts still apply as described earlier in this tutorial. Use the interactive_index to control which row is returned by spy.jobs.pull() in the interactive context.
The push and pull methods can both be used with an additional label argument, which is useful for enabling reuse of a single notebook with different parameters. For example, if you want one schedule per user for a given notebook, the user’s ID could be used as a label. This ensures that two distinct users can schedule the same notebook, possibly with distinct parameters created from a separate notebook or from another application, without unscheduling each other’s jobs.
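As a minimal sketch of the per-user case (the label string below is purely illustrative):
# Illustrative only: any string that uniquely identifies the user works as a label
label = 'user=alice@example.com'
spy.jobs.push(schedule_df, spread='6 hours', label=label)

# In the scheduled notebook, pull with the same label to retrieve this job's parameters
parameters = spy.jobs.pull(label=label, interactive_index=1)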
Another use for a label would be enabling the scheduling of a single notebook from different Workbench Analyses using an Add-on Tool. In this case, a convenient label would be an encoding of the Workbook and Worksheet IDs of the origin worksheet, e.g., workbookId=77953A64-0675-47AE-826F-DEE1FD7AB4C5&worksheetId=5C83DF79-D725-4756-BBE6-4D2D1525D4FF.