import pandas as pd
from seeq import spy
# Set the compatibility option so that you maximize the chance that SPy will remain compatible with your notebook/script
spy.options.compatibility = 193
Parameterized Jobs
The simple scheduling methods described in spy.jobs will often be adequate for your purposes.
But in some scenarios, you may wish to run a suite of jobs across an asset group or some other set of items. For this, you will use the spy.jobs.push() command.
This feature is only available for scheduling notebooks in Seeq Data Lab. You cannot use SPy to schedule content in Anaconda, AWS SageMaker, or any other Python environment.
Assemble a DataFrame with the Parameters
Let’s take the most common example, which is to schedule a series of jobs across a group of assets.
Search for the assets:
schedule_df = spy.search({
    'Path': 'Example >> Cooling Tower 1',
    'Type': 'Asset'
})
schedule_df
Now add a Schedule column, which will dictate how often the script will run.
For intervals more frequent than 1 hour, it is highly recommended that you use intervals that divide evenly into an hour, like ‘15 minutes’, ‘20 minutes’ or ‘30 minutes’.
schedule_df['Schedule'] = 'every 6 hours'
schedule_df
You can also use Quartz Cron expressions in place of the natural language phrasing above; the Online Cron Expression Generator is a handy tool for building them. As an example, the equivalent Quartz Cron expression for “every 6 hours” is 0 0 0/6 ? * * *.
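A cron expression is assigned to the Schedule column just like the natural language phrasing. As a minimal sketch, the equivalent assignment would be:
# Equivalent to 'every 6 hours', expressed as a Quartz Cron string
schedule_df['Schedule'] = '0 0 0/6 ? * * *'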
Sort your Schedule DataFrame
It’s important to sort the DataFrame so that the ordering of the items is not dependent on how the constituent data happened to be returned by Seeq or any other data source. (Each job is associated with its row position in the DataFrame, so a consistent ordering ensures that re-pushing the schedule updates the same jobs rather than shuffling their parameters.)
# If you have an ID column, it's easiest to sort by that. Otherwise pick something that
# will result in consistent ordering
schedule_df.sort_values('ID', inplace=True, ignore_index=True)
Push the jobs to Seeq
The final step is to push the schedule DataFrame to Seeq so that it can schedule the jobs.
It’s often desirable to “spread out” the execution of the jobs so that they don’t all execute simultaneously. In this example, we’re executing the jobs every 6 hours and we’ve asked spy.jobs.push() to spread them out evenly over those 6 hours. (In general, the spread parameter is the same as the frequency of your schedule, since you want all the jobs to execute within the time interval allocated.)
Execute the following cell (only) to schedule the set of jobs.
parameters = spy.jobs.push(schedule_df, spread='6 hours', interactive_index=1)
If you are a Seeq administrator, you can view these jobs by going to the Administration page and clicking on the Jobs tab. You will need to clear the Groups filter to see the Notebook jobs.
In the output of the cell above, you’ll notice that the current context is INTERACTIVE, which is the term we use for the scenario where you are executing cells in the notebook yourself via the Seeq Data Lab user interface. When you open an HTML file in the _Job Results folder, you’ll see that the same cell shows the current context as JOB. In the JOB context, parameters will be the row of the DataFrame that pertains to that job instance. In the INTERACTIVE context, parameters will be the row that corresponds to interactive_index.
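In either context, parameters behaves like a single row of the schedule DataFrame (a pandas Series), so its values can be looked up by column name. A quick illustrative check:
# `parameters` is one row of schedule_df; access its columns by name
print(parameters['Name'], parameters['Schedule'])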
We unschedule the jobs here so that your Seeq Data Lab isn’t loaded down with executing this tutorial.
spy.jobs.unschedule()
Do something cool
Now, based on the values in parameters, you can do something interesting. In this example we’ll push a condition to a new (small) asset tree.
parameters
Let’s pretend that we have a spiffy algorithm that can determine the health of our asset by looking at a couple of signals.
# Pull the Temperature signal for the asset assigned to this job
health_data_df = spy.pull(spy.search({
    'Asset': parameters['ID'],
    'Name': 'Temperature'
}), header='Name')

# Our "spiffy algorithm": the mean temperature determines the health status
health_indicator = health_data_df.mean()['Temperature']
health_status = 'HEALTHY' if health_indicator > 80 else 'UNHEALTHY'
metadata_df = pd.DataFrame([{
    'Path': 'Parameterized Jobs Tutorial',
    'Asset': f'{parameters["Name"]}',
    'Name': 'Job Executions',
    'Type': 'Condition',
    'Maximum Duration': '1h'
}])
metadata_df
import datetime

# Record this job execution as a 5-minute capsule starting now
start = datetime.datetime.now().isoformat()
end = (datetime.datetime.now() + datetime.timedelta(minutes=5)).isoformat()
capsule_data = pd.DataFrame([{
    'Capsule Start': pd.to_datetime(start),
    'Capsule End': pd.to_datetime(end),
    'Health': health_status
}])
capsule_data
spy.push(capsule_data, metadata=metadata_df)
Scheduling from a separate notebook
The spy.jobs.push() function accepts a datalab_notebook_url parameter, so that a job can be pushed to another notebook to which you have access. A common use case for this would be to enable a user of an Add-on Mode notebook to configure a scheduled notebook through form input. In such a scenario, the parameters specified by completion of the form would need to be passed to the scheduled notebook.
path_to_here = '/notebooks/SPy%20Documentation/Advanced%20Scheduling/Parameterized%20Jobs.ipynb'
this_notebook_url = f'{spy.utils.get_data_lab_project_url()}{path_to_here}'
spy.jobs.push(schedule_df, spread='6 hours', datalab_notebook_url=this_notebook_url)
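When the jobs for that other notebook are no longer needed, they can be removed the same way; this sketch assumes spy.jobs.unschedule() accepts the same datalab_notebook_url parameter as spy.jobs.push():
# Assumption: spy.jobs.unschedule() takes datalab_notebook_url, mirroring spy.jobs.push()
spy.jobs.unschedule(datalab_notebook_url=this_notebook_url)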
No additional work is needed to ensure the parameters are available in the target notebook. The schedule_df used in the call to spy.jobs.push() is automatically pickled to a .pkl file in the _Job DataFrames folder of the notebook being scheduled. To retrieve the parameters for a specific job in the jobs DataFrame from the scheduled notebook, just call spy.jobs.pull():
parameters = spy.jobs.pull(interactive_index=1)
parameters
The JOB and INTERACTIVE contexts still apply as described earlier in this tutorial. Use the interactive_index to control which row is returned by spy.jobs.pull() in the interactive context.
The push and pull methods can both be used with an additional label argument, which is useful for enabling reuse of a single notebook with different parameters. For example, if you want one schedule per user for a given notebook, the user’s ID could be used as a label. This ensures that two distinct users can schedule the same notebook, possibly with distinct parameters created from a separate notebook or from another application, without unscheduling each other’s jobs.
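As a minimal sketch of the per-user case (the label string below is purely illustrative):
# Illustrative only: any string that uniquely identifies the user works as a label
label = 'user=alice@example.com'
spy.jobs.push(schedule_df, spread='6 hours', label=label)

# In the scheduled notebook, pull with the same label to retrieve this job's parameters
parameters = spy.jobs.pull(label=label, interactive_index=1)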
Another use for a label would be enabling the scheduling of a single notebook from different Workbench Analyses using an Add-on Tool. In this case, a convenient label would be an encoding of the Workbook and Worksheet IDs of the origin worksheet, e.g., workbookId=77953A64-0675-47AE-826F-DEE1FD7AB4C5&worksheetId=5C83DF79-D725-4756-BBE6-4D2D1525D4FF.