SPy Jobs
- seeq.spy.jobs.pull(datalab_notebook_url: str | None = None, label: str | None = None, interactive_index: int | None | str = None, all: bool = False, session: Session | None = None) → Series | None
Retrieves a jobs DataFrame previously created by a call to spy.jobs.push or spy.jobs.schedule. The DataFrame will have been stored as a pickle (.pkl) file in the _Job DataFrames folder within the parent folder of the Notebook specified by the datalab_notebook_url, or of the current Notebook if no datalab_notebook_url is specified.
- Parameters:
datalab_notebook_url (str, default None) – The URL of the Data Lab Notebook for which the scheduled jobs DataFrame is desired. If the value is not specified, the URL of the currently-running notebook is used.
label (str, default None) – The label that was used in scheduling, if any. In most circumstances, this parameter should not be specified, since the scheduled Notebook will use the label that was provided during scheduling.
interactive_index (int or str, default None) –
Used during notebook development to control which row of jobs_df is returned when NOT executing as a job. Change this value if you want to test your notebook in the Jupyter environment with various rows of parameters.
When the notebook is executed as a job, this parameter is ignored.
all (bool, default False) – If True, the entire DataFrame is returned, regardless of call context or the value of interactive_index.
session (spy.Session, optional) – If supplied, the Session object (and its Options) will be used to store the login session state. This is useful to log in to different Seeq servers at the same time or with different credentials.
- Returns:
The requested row of the DataFrame that was pushed for the specified Notebook and label using spy.jobs.push or spy.jobs.schedule, if called with all=False. If all=True, the entire DataFrame is returned.
- Return type:
pandas.Series or pandas.DataFrame
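The following is a minimal sketch of how a scheduled notebook might fetch its own parameter row, assuming the jobs DataFrame was pushed earlier with spy.jobs.push or spy.jobs.schedule. The helper name load_job_parameters and its pull argument are hypothetical; the pull argument exists only so the helper can be exercised outside Seeq Data Lab.

```python
def load_job_parameters(interactive_index=0, pull=None):
    """Return the parameter row for this notebook's current job.

    `pull` is a hypothetical injection point for testing; by default the
    real spy.jobs.pull is used.
    """
    if pull is None:
        # Imported lazily so the sketch can be loaded outside Seeq Data Lab.
        from seeq import spy
        pull = spy.jobs.pull

    # When the notebook runs as a job, interactive_index is ignored and the
    # row for that job is returned. When run interactively, the row at
    # interactive_index is returned instead.
    return pull(interactive_index=interactive_index)
```

Changing interactive_index lets you test the notebook against different rows of parameters. Calling spy.jobs.pull(all=True) instead returns the entire DataFrame.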
- seeq.spy.jobs.push(jobs_df: DataFrame, spread: str | None = None, datalab_notebook_url: str | None = None, label: str | None = None, user: str | None = None, interactive_index: int | None | str = None, suspend: bool = False, notify_on_skipped_execution: bool = True, notify_on_automatic_unschedule: bool = True, quiet: bool | None = None, status: Status | None = None, session: Session | None = None)
Schedules the automatic execution of a notebook and returns the row corresponding to the currently running job.
When used inside a Data Lab notebook, the current notebook is scheduled for execution. A notebook can also be scheduled by specifying its URL, and scheduling can be done on behalf of another user by a user with admin privileges.
Successive calls to push() for the same notebook and label but with different schedules will replace the previous schedule for that notebook-label combination.
Removing the scheduling is accomplished via unschedule().
A copy of the jobs DataFrame is automatically stored to a _Job DataFrames folder adjacent to the Notebook for which the job is scheduled.
- Parameters:
jobs_df (pandas.DataFrame) –
A DataFrame that contains the schedules in the form of schedule specification strings and optional parameters for each job.
The DataFrame must have an integer index and a column named ‘Schedule’ containing the scheduling specifications. If no column named ‘Schedule’ is found, the first column is used.
Examples of scheduling specification strings:
'every 15 minutes'
'every tuesday and friday at 6am'
'every fifth of the month'
The timezone used for scheduling is the one specified in the logged-in user’s profile.
You can also use Quartz Cron syntax. Use the following site to construct it: https://www.freeformatter.com/cron-expression-generator-quartz.html
spread (str, default None) – A time period over which to spread out the jobs. This should generally be the same value as the frequency of the jobs. For example, if you want the jobs to run every 6 hours, you should specify spread='6h' to dynamically space out the execution of the jobs throughout that 6-hour period so that the load on Seeq services isn't concentrated at the same time.
datalab_notebook_url (str, default None) – A Data Lab notebook URL. If the value is not specified, the URL of the currently running notebook is used.
label (str, default None) – A string used to enable scheduling of the Notebook by different users or from different Analysis Pages. Labels may contain letters, numbers, spaces, and the following special characters: !@#$^&-_()[]{}
user (str, default None) – Determines the user on whose behalf the notebook is executed. If the value is not specified, the currently logged-in user is used. The user can be specified as a username or a user's Seeq ID.
interactive_index (int or str, default None) –
Used during notebook development to control which row of jobs_df is returned when NOT executing as a job. Change this value if you want to test your notebook in the Jupyter environment with various rows of parameters.
When the notebook is executed as a job, this parameter is ignored.
suspend (bool, default False) – If True, unschedules all jobs for the specified notebook. This is used in scenarios where you wish to work with a notebook interactively and temporarily prevent job execution. Remove the argument (or change it to False) when you are ready to resume job execution.
notify_on_skipped_execution (bool, default True) – If True, the user on whose behalf the notebook is executed is notified when an execution is skipped, making it possible to investigate the problem and retry the execution if needed.
notify_on_automatic_unschedule (bool, default True) – If True, the user on whose behalf the notebook is executed is notified when the notebook is automatically unscheduled because of a system error, making it possible to investigate the problem and reschedule the notebook if needed.
quiet (bool, default False) – If True, suppresses progress output. Note that when status is provided, the quiet setting of the Status object that is passed in takes precedence.
status (spy.Status, optional) – If specified, the supplied Status object will be updated as the command progresses. It gets filled in with the same information you would see in Jupyter in the blue/green/red table below your code while the command is executed. The table itself is accessible as a DataFrame via the status.df property.
session (spy.Session, optional) – If supplied, the Session object (and its Options) will be used to store the login session state. This is useful to log in to different Seeq servers at the same time or with different credentials.
- Returns:
The row that corresponds to the currently executing job. If not executing in the context of a job, the row is selected according to the interactive_index parameter.
- Return type:
pandas.Series
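As an illustration of the jobs_df layout described above, here is a hedged sketch that schedules two jobs every 6 hours and staggers them with a matching spread. The 'Asset' column, its values, and the helper name push_jobs are hypothetical, and the string 'every 6 hours' is assumed to be accepted in the same way as the documented 'every 15 minutes'.

```python
# One row per job. 'Schedule' holds the schedule specification string;
# 'Asset' is a hypothetical extra column carrying a per-job parameter.
JOB_ROWS = [
    {"Schedule": "every 6 hours", "Asset": "Pump 1"},
    {"Schedule": "every 6 hours", "Asset": "Pump 2"},
]


def push_jobs(rows=None):
    """Build the jobs DataFrame and push it for the current notebook."""
    # Imported lazily so the sketch can be loaded outside Seeq Data Lab.
    import pandas as pd
    from seeq import spy

    # The default RangeIndex gives the integer index that push() requires.
    jobs_df = pd.DataFrame(JOB_ROWS if rows is None else rows)

    # spread='6h' matches the 6-hour frequency so the jobs are spaced out
    # across that period. interactive_index=0 selects the first row while
    # the notebook is being developed interactively.
    return spy.jobs.push(jobs_df, spread="6h", interactive_index=0)
```

Inside the notebook, the returned row can then drive the rest of the analysis, for example by reading row['Asset'].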
- seeq.spy.jobs.schedule(schedule_spec: str, datalab_notebook_url: str | None = None, label: str | None = None, user: str | None = None, suspend: bool = False, notify_on_skipped_execution: bool | None = True, notify_on_automatic_unschedule: bool | None = True, quiet: bool | None = None, status: Status | None = None, session: Session | None = None) → DataFrame
Schedules the automatic execution of a Seeq Data Lab notebook.
The current notebook is scheduled for execution unless datalab_notebook_url is supplied. Scheduling can be done on behalf of another user by a user with admin privileges.
Successive calls to schedule() for the same notebook and label but with different schedules will replace the previous schedule for that notebook-label combination.
Removing the scheduling is accomplished via unschedule().
A copy of the jobs DataFrame is automatically stored to a _Job DataFrames folder adjacent to the Notebook for which the job is scheduled.
- Parameters:
schedule_spec (str) –
A string that represents the frequency with which the notebook should execute.
Examples:
'every 15 minutes'
'every tuesday and friday at 6am'
'every fifth of the month'
The timezone used for scheduling can be specified in the current notebook using spy.options.default_timezone; otherwise the timezone specified in the logged-in user's profile is used.
Examples:
>>> spy.options.default_timezone = 'US/Pacific'
>>> spy.options.default_timezone = pytz.timezone('Europe/Amsterdam')
>>> spy.options.default_timezone = 'EST'
To set a fixed offset from UTC, use dateutil.tz.tzoffset() with a name of your choice and a datetime.timedelta object with your offset:
>>> spy.options.default_timezone = dateutil.tz.tzoffset("my_tzoffset", datetime.timedelta(hours=-8))
You can also use Quartz Cron syntax. Use the following site to construct it: https://www.freeformatter.com/cron-expression-generator-quartz.html
datalab_notebook_url (str, default None) – A Data Lab notebook URL. If the value is not specified, the URL of the currently running notebook is used.
label (str, default None) – A string used to enable scheduling of the Notebook by different users or from different Analysis Pages. Labels may contain letters, numbers, spaces, and the following special characters: !@#$^&-_()[]{}
user (str, default None) – Determines the user on whose behalf the notebook is executed. If the value is not specified, the currently logged-in user is used. The user can be specified as a username or a user's Seeq ID.
suspend (bool, default False) – If True, unschedules all jobs for the specified notebook. This is used in scenarios where you wish to work with a notebook interactively and temporarily “pause” job execution. Remove the argument (or change it to False) when you are ready to resume job execution.
notify_on_skipped_execution (bool, default True) – If True, the user on whose behalf the notebook is executed is notified when an execution is skipped, making it possible to investigate the problem and retry the execution if needed.
notify_on_automatic_unschedule (bool, default True) – If True, the user on whose behalf the notebook is executed is notified when the notebook is automatically unscheduled because of a system error, making it possible to investigate the problem and reschedule the notebook if needed.
quiet (bool, default False) – If True, suppresses progress output. Note that when status is provided, the quiet setting of the Status object that is passed in takes precedence.
status (spy.Status, optional) – If specified, the supplied Status object will be updated as the command progresses. It gets filled in with the same information you would see in Jupyter in the blue/green/red table below your code while the command is executed. The table itself is accessible as a DataFrame via the status.df property.
session (spy.Session, optional) – If supplied, the Session object (and its Options) will be used to store the login session state. This is useful to log in to different Seeq servers at the same time or with different credentials.
- Returns:
The jobs_df with an appended column containing a description of the schedule.
- Return type:
pandas.DataFrame
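Since schedule_spec also accepts Quartz Cron syntax, the sketch below shows what such a string might look like for the documented 'every tuesday and friday at 6am' example. The names QUARTZ_FIELDS, describe_cron, and schedule_notebook are hypothetical helpers, and the cron string is assumed, not confirmed, to be equivalent to the plain-English specification.

```python
# Quartz cron fields in order. An optional seventh field (year) is omitted.
QUARTZ_FIELDS = (
    "seconds", "minutes", "hours", "day_of_month", "month", "day_of_week",
)

# 06:00:00 on every Tuesday and Friday. '?' means "no specific value" for
# day-of-month because day-of-week is specified instead.
CRON_SPEC = "0 0 6 ? * TUE,FRI"


def describe_cron(spec):
    """Label each whitespace-separated field; not a validator."""
    return dict(zip(QUARTZ_FIELDS, spec.split()))


def schedule_notebook(spec=CRON_SPEC, timezone="US/Pacific"):
    """Schedule the current notebook using the given spec and timezone."""
    # Imported lazily so the sketch can be loaded outside Seeq Data Lab.
    from seeq import spy

    # Without this, the timezone from the user's profile is used.
    spy.options.default_timezone = timezone
    return spy.jobs.schedule(spec)
```

Calling schedule_notebook() again with a different spec replaces the earlier schedule, because the notebook and label are unchanged.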
- seeq.spy.jobs.unschedule(datalab_notebook_url: str | None = None, label: str | None = None, quiet: bool | None = None, status: Status | None = None, session: Session | None = None)
Unschedules ALL jobs for a particular notebook and label.
The current notebook is unscheduled unless datalab_notebook_url is supplied. Unscheduling can be done on behalf of another user by a user with admin privileges.
- Parameters:
datalab_notebook_url (str, default None) – A Data Lab notebook URL. If the value is not specified, the URL of the currently running notebook is used.
label (str, default None) –
A string used to enable scheduling of the Notebook by different users or from different Analysis Pages. Labels may contain letters, numbers, spaces, and the following special characters:
!@#$^&-_()[]{}
A value of ‘*’ will unschedule all jobs across all labels associated with the supplied notebook (or the current Notebook, if no datalab_notebook_url is supplied).
quiet (bool, default False) – If True, suppresses progress output. Note that when status is provided, the quiet setting of the Status object that is passed in takes precedence.
status (spy.Status, optional) – If specified, the supplied Status object will be updated as the command progresses. It gets filled in with the same information you would see in Jupyter in the blue/green/red table below your code while the command is executed. The table itself is accessible as a DataFrame via the status.df property.
session (spy.Session, optional) – If supplied, the Session object (and its Options) will be used to store the login session state. This is useful to log in to different Seeq servers at the same time or with different credentials.
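The label rules above can be checked before calling unschedule. The sketch below is one way to do that under two assumptions that the documentation does not state: "letters" and "numbers" are taken to mean ASCII characters, and an empty label is treated as invalid. The helper names is_valid_label and unschedule_notebook are hypothetical.

```python
import re

# Letters, numbers, spaces, and the special characters !@#$^&-_()[]{}
# (ASCII-only letters and digits are an assumption of this sketch).
_LABEL_PATTERN = re.compile(r"[A-Za-z0-9 !@#$^&\-_()\[\]{}]+")


def is_valid_label(label):
    """Return True if the label uses only the documented characters."""
    return _LABEL_PATTERN.fullmatch(label) is not None


def unschedule_notebook(label=None):
    """Unschedule jobs for the current notebook and the given label."""
    # Imported lazily so the sketch can be loaded outside Seeq Data Lab.
    from seeq import spy

    # '*' is the documented wildcard that covers all labels, so it is
    # allowed here even though '*' is not a valid label character.
    if label not in (None, "*") and not is_valid_label(label):
        raise ValueError(f"Label contains unsupported characters: {label!r}")
    spy.jobs.unschedule(label=label)
```

For example, unschedule_notebook('*') would remove the jobs for every label associated with the current notebook.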