import json
import os
import pandas as pd
import shutil
from seeq import spy

# Set the compatibility option so that you maximize the chance that SPy will remain compatible with your notebook/script
spy.options.compatibility = 193
# Log into Seeq Server if you're not using Seeq Data Lab:
spy.login(url='http://localhost:34216', credentials_file='../credentials.key', force=False)

Workbook Jobs

In spy.workbooks.ipynb, you can learn to push and pull workbooks (Workbench Analyses and Organizer Topics) to/from the Seeq service/server using SPy.

You may need to do this work “in bulk” in one of the following scenarios:

  • Re-mapping references (e.g. historian tags/signals) from one datasource to another, or one asset tree to another

  • Transferring work from one Seeq service/server to another, possibly including data

The set of functions in the spy.workbooks.job module are suitable for this work. Each function operates within a “job folder” that captures the state of the job. Unlike spy.workbooks.pull() and spy.workbooks.push(), the equivalent commands in spy.workbooks.job do not require all workbooks to be held in memory. This allows very large jobs to be executed (as long as there is sufficient disk space). All parts of the process are resumable, and SPy will pick up where it left off if the operation is interrupted for any reason (e.g. a network error).

This notebook will walk through the use of this module, referencing the scenarios above. In general, commands are executed in the following order:

  1. spy.workbooks.job.pull()

  2. spy.workbooks.job.data.pull() (optional)

  3. spy.workbooks.job.push()

  4. spy.workbooks.job.data.push() (optional)
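
For orientation, here is a condensed sketch of that full cycle. It reuses the job_folder and workbooks_df variables that are defined later in this notebook, and the push parameters shown are illustrative; each step is discussed in detail below.

# 1. Pull the workbook definitions to disk
spy.workbooks.job.pull(job_folder, workbooks_df)

# 2. (Optional) Pull the referenced data into the job folder
spy.workbooks.job.data.pull(job_folder)

# 3. Push the workbook definitions to the target
spy.workbooks.job.push(job_folder,
                       path='SPy Documentation Examples >> Workbook Jobs',
                       label=f'{spy.session.user.name} Workbook Job Example')

# 4. (Optional) Push the pulled data to any dummy items that were created
spy.workbooks.job.data.push(job_folder)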

Establish the Job Folder

The parameter that defines a job is a job folder. It is the first argument for all job functions, and it is managed entirely by SPy. The folder is laid out in an intuitive way that allows you to inspect it, and, in some troubleshooting cases, make modifications yourself.

job_folder = 'Output/My First Workbooks Job'

# Remove the job folder so that old files/artifacts don't affect the tutorial
if os.path.exists(job_folder):
    shutil.rmtree(job_folder)

Let’s Make Something to Work With…

We need some Analyses/Topics to work with for the purposes of demonstrating the functionality, so let’s make sure the example workbooks have been pushed.

example_workbooks = spy.workbooks.load('Support Files/Example Export.zip')
spy.workbooks.push(example_workbooks,
                   path='SPy Documentation Examples >> Workbook Job Import',
                   label=f'{spy.session.user.name} Workbook Job Example',
                   refresh=False,
                   errors='raise')

Pulling Workbooks

Start the job cycle by issuing the spy.workbooks.job.pull() command to grab a set of workbooks and write them to disk.

As with spy.workbooks.pull(), we create a DataFrame full of workbooks to pull by using the spy.workbooks.search() function. Then we can supply that DataFrame to spy.workbooks.job.pull(), which takes many of the same parameters as spy.workbooks.pull().

workbooks_df = spy.workbooks.search({
    'Path': 'SPy Documentation Examples >> Workbook Job Import'
})

# Store these in variables that we'll use later
example_analysis_workbook_id = workbooks_df[workbooks_df['Name'] == 'Example Analysis'].iloc[0]['ID']
example_topic_workbook_id = workbooks_df[workbooks_df['Name'] == 'Example Topic'].iloc[0]['ID']

workbooks_df
spy.workbooks.job.pull(job_folder, workbooks_df)

As mentioned earlier, jobs are resumable. If you execute the above cell again, you will see that the Result column indicates Already pulled.

If you would like to force a job to redo its work, supply the resume=False argument. You can also inspect the job folder’s Workbooks subfolder and selectively delete workbook folders therein to force SPy to re-pull workbooks.
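
As a sketch of both approaches (the deletion line is commented out because the exact workbook folder names depend on what was pulled; substitute a folder name from the listing):

# Force the entire pull to be redone, ignoring previously pulled workbooks
spy.workbooks.job.pull(job_folder, workbooks_df, resume=False)

# Or inspect the Workbooks subfolder and delete a specific workbook folder to re-pull just that one
workbooks_subfolder = os.path.join(job_folder, 'Workbooks')
print(os.listdir(workbooks_subfolder))
# shutil.rmtree(os.path.join(workbooks_subfolder, '<a workbook folder name from the listing above>'))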

Pushing Workbooks

As mentioned above, there are two primary scenarios where you want to push workbooks in bulk:

  • Re-mapping references (e.g. historian tags/signals) from one datasource to another, or one asset tree to another

  • Transferring work from one Seeq service/server to another, possibly including data

Datasource Maps

In either case, it’s important to understand the concept of datasource maps. These are JSON files that contain instructions for SPy as it maps the identifiers in the pulled workbook definitions to identifiers on the target system. These maps can incorporate relatively complex Regular Expression specifications that allow you to re-orient workbooks from one set of input data to another.

The spy.workbooks.job.pull() command will create a Datasource Maps folder inside the job folder. There will be one file for every datasource that was encountered during the pull operation – if a workbook touched a datasource in some way, there will be a file for it.

Here’s what a typical file looks like:

{
    "Datasource Class": "Time Series CSV Files",
    "Datasource ID": "Example Data",
    "Datasource Name": "Example Data",
    "Item-Level Map Files": [],
    "RegEx-Based Maps": [
        {
            "Old": {
                "Type": "(?<type>.*)",
                "Datasource Class": "Time Series CSV Files",
                "Datasource Name": "Example Data",
                "Data ID": "(?<data_id>.*)"
            },
            "New": {
                "Type": "${type}",
                "Datasource Class": "Time Series CSV Files",
                "Datasource Name": "Example Data",
                "Data ID": "${data_id}"
            }
        }
    ]
}

You can make modifications to these files by loading them into an editor, including Jupyter’s text editor. Generally the most common action is to add or change entries in the RegEx-Based Maps block. That section is a list of dictionaries, each with an Old and a New subsection. Within the Old block, you specify properties to match on: the key is the property name and the value is a regular expression, often employing a capture group. In the example above, the Data ID field is matched using the .* regex and stored in a capture group called data_id. The New block then contains the properties and values that are searched for on the target system in order to “map” to target items. In the example above, the "Data ID": "${data_id}" specification just means that the Data ID is used “as-is” without any alteration.

(If you happen to be familiar with Connector Property Transforms, this regex approach may feel familiar.)

Let’s look at a more complicated example:

{
    "Datasource Class": "Time Series CSV Files",
    "Datasource ID": "Example Data",
    "Datasource Name": "Example Data",
    "Item-Level Map Files": [],
    "RegEx-Based Maps": [
        {
            "Old": {
                "Type": "(?<type>.*)",
                "Datasource Class": "Time Series CSV Files",
                "Datasource Name": "Example Data",
                "Data ID": "(?<data_id>.*)"
            },
            "New": {
                "Type": "${type}",
                "Datasource Class": "Time Series CSV Files",
                "Datasource Name": "Example Data",
                "Data ID": "${data_id}"
            }
        },
        {
            "Old": {
                "Type": "(?<type>.*)",
                "Path": "Example >> Cooling Tower 1",
                "Asset": "Area (?<subarea>[ABC])",
                "Name": "(?<name>.*)"
            },
            "New": {
                "Type": "${type}",
                "Path": "Example >> Cooling Tower 2",
                "Asset": "Area ${subarea}",
                "Name": "${name}"
            }
        }
    ]
}

In this example there are two RegEx-Based Maps specified. The first map is identical to the previous example, and it will be tried first; if there is not a match on the Old regex specifications, SPy will move on to the next. The next map matches on a particular asset path (Example >> Cooling Tower 1) and a set of subareas (A, B, or C) and then maps them to the corresponding area underneath Example >> Cooling Tower 2.

In this manner, you can use arbitrarily complex mapping logic to accomplish the goal of re-mapping a workbook within the same Seeq server or properly mapping from one Seeq server to another.
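
If you prefer to make these modifications programmatically rather than in a text editor, a map file can be loaded with Python’s json module, altered, and written back. This is just a sketch; the file names found in the Datasource Maps folder depend on which datasources were encountered during the pull.

import glob

# List the datasource map files that spy.workbooks.job.pull() created
map_files = glob.glob(os.path.join(job_folder, 'Datasource Maps', '*.json'))
print(map_files)

# Load the first one, inspect its "RegEx-Based Maps" list, then (after making changes) write it back
with open(map_files[0], 'r') as f:
    datasource_map = json.load(f)

print(json.dumps(datasource_map['RegEx-Based Maps'], indent=4))

with open(map_files[0], 'w') as f:
    json.dump(datasource_map, f, indent=4)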

Datasource Mapping in Action

Let’s run through an actual mapping scenario to see how it works and how to troubleshoot it when it goes wrong.

First we have to grab a couple of signal IDs so that we can use them later to illustrate some functionality.

area_a_temperature_id = spy.search({'Datasource Name': 'Example Data', 'Name': 'Area A_Temperature'}).iloc[0]['ID']
area_a_optimizer_id = spy.search({'Datasource Name': 'Example Data', 'Name': 'Area A_Optimizer'}).iloc[0]['ID']

Now we will write out a datasource map file that has, as its first map, a New block that will map to a Name that does not exist. This will let us see what happens both when the mapping is successful and when there are errors.

datasource_map = {
    "Datasource Class": "Time Series CSV Files",
    "Datasource ID": "Example Data",
    "Datasource Name": "Example Data",
    "Item-Level Map Files": [],
    "RegEx-Based Maps": [
        {
            "Old": {
                "Type": "(?<type>.*)",
                "Datasource Class": "Time Series CSV Files",
                "Datasource Name": "Example Data",
                "Name": "Area A_Optimizer"
            },
            "New": {
                "Type": "${type}",
                "Datasource Class": "Time Series CSV Files",
                "Datasource Name": "Example Data",
                "Name": "Area NonExistent_Optimizer"
            },
            # In this contrived example, if we match on the "Old" criteria, we don't want to continue to the next regex map
            "On Match": "Stop"
        },
        {
            "Old": {
                "Type": "(?<type>.*)",
                "Datasource Class": "Time Series CSV Files",
                "Datasource Name": "Example Data",
                "Data ID": "(?<data_id>.*)"
            },
            "New": {
                "Type": "${type}",
                "Datasource Class": "Time Series CSV Files",
                "Datasource Name": "Example Data",
                "Data ID": "${data_id}"
            }
        }
    ]
}

with open(os.path.join(job_folder, 'Datasource Maps', 'Datasource_Map_Time Series CSV Files_Example Data_Example Data.json'), 'w') as f:
    json.dump(datasource_map, f)

Now we push to the server using a label that is guaranteed to differentiate our activity from that of other users. As you will see, there will be errors reported in the Result column because one item won’t be mapped.

push_df = spy.workbooks.job.push(job_folder,
                                 path='SPy Documentation Examples >> Workbook Jobs',
                                 label=f'{spy.session.user.name} Workbook Job Example',
                                 errors='catalog')

You can see the error, but it’s not formatted very well, so let’s use a troubleshooting tool: the “explain” function on the returned DataFrame:

print(push_df.spy.item_map.explain(area_a_optimizer_id))

This detailed explanation is intended to give you a starting point for troubleshooting. You can see the regex that was specified, the property values that were matched on, the Capture groups that resulted from the RegEx specifications, and the property values that were subsequently searched for. Since Area NonExistent_Optimizer does not exist, the explanation for RegEx-Based Map 0 says Item not found on server.

Now let’s look at the explanation for a successful map (Area A_Temperature):

print(push_df.spy.item_map.explain(area_a_temperature_id))

Dummy Items

In cases where we couldn’t map successfully, we can tell SPy to create “dummy” items. A dummy item is a signal, condition or scalar that has all the properties of the original item but has no data. (We’ll show how to push data to dummy items later…)

Note the use of create_dummy_items=True, and also note resume=False so that SPy tries to push the workbooks again:

push_df = spy.workbooks.job.push(
    job_folder,
    path='SPy Documentation Examples >> Workbook Jobs',
    label=f'{spy.session.user.name} Workbook Job Example',
    create_dummy_items=True,
    resume=False,
    errors='catalog')

Now there are no errors, because any item that couldn’t be mapped was replaced by a dummy item.

Find the Example Analysis row of the output table above and click on the link in the URL column to take a look at the resulting workbook. You’ll see that the Details Pane worksheet contains Area A_Optimizer, which is a blank “dummy item”.
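
If you would rather grab that link programmatically, the URL column of the returned DataFrame can be queried directly (a small convenience sketch, assuming the Name and URL columns shown in the output table above):

# Look up the pushed Example Analysis URL directly from the push results
print(push_df[push_df['Name'] == 'Example Analysis'].iloc[0]['URL'])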

We can look at what dummy items were created by inspecting push_df.spy.item_map.dummy_items. You can see that the Name is the same as the original and important properties like Maximum Interpolation have made their way to the dummy item.

push_df.spy.item_map.dummy_items

Including Data

Dummy items are helpful, but they are “blank”: they do not have any data associated with them. If you are transferring workbooks between servers and the destination server doesn’t have access to the same datasources, it is useful to be able to transfer the data itself from the source server to the dummy items on the destination server. A set of SPy functions is provided in spy.workbooks.job.data for this purpose.

As spy.workbooks.job.pull() pulls workbook information, it also tracks the usage of data items on Workbench Worksheets and in Organizer Topic Documents. This information is collated and saved to disk as the data manifest. You can inspect the manifest like so:

manifest_df = spy.workbooks.job.data.manifest(job_folder)

# Simplify the DataFrame so that it fits on the screen better
manifest_df[['ID', 'Path', 'Asset', 'Name', 'Start', 'End', 'Calculation']]

You can see Start and End columns that provide the overall time bounds that were detected in the workbook data references. There is also a Calculation column that refines this broad time period into specific “chunks” of data, defined as individual capsules and using the within() and touches() Seeq Formula functions to pull data only for those time periods – not just everything between Start and End.
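
To see one of those Calculation formulas in full, you can pull a single value out of the manifest DataFrame (a small sketch; the row selection is just an example, and not every row necessarily has a Calculation):

# Print the full Calculation entry for a single manifest row
print(manifest_df[manifest_df['Name'] == 'Area A_Optimizer'].iloc[0]['Calculation'])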

This manifest DataFrame can be fed directly into spy.pull() but it is recommended that you use spy.workbooks.job.data.pull() like so:

spy.workbooks.job.data.pull(job_folder)

Data has now been added to the job folder for the time periods identified in the manifest and can be pushed to the dummy items like so:

spy.workbooks.job.data.push(job_folder)

If you refresh the page of the Details Pane within the Example Analysis, you’ll now see data for Area A_Optimizer.

However, if you move the Display Range to the left, you’ll see that Area A_Optimizer data only exists for the time period that was originally on the screen.

If you want to expand how much data is pulled for a particular item, you can execute a command to alter the manifest and then pull/push again:

# Expand the time periods for Area A_Optimizer by 2 weeks on either side
spy.workbooks.job.data.expand(job_folder, {'Name': 'Area A_Optimizer'}, by='2w')

spy.workbooks.job.data.pull(job_folder, resume=False)
spy.workbooks.job.data.push(job_folder, resume=False)

There is a series of functions to alter the manifest:

  • spy.workbooks.job.data.expand() - Expand the existing time periods.

  • spy.workbooks.job.data.add() - Add a specific time period.

  • spy.workbooks.job.data.remove() - Remove a specific time period.

  • spy.workbooks.job.data.calculation() - Apply a specific calculation, such as resample().

Documentation for these functions is found under the Detailed Help section below.

Redoing Specific Workbooks/Items

In the process of pushing and pulling, you will usually use the errors='catalog' flag, which means that errors will be enumerated but the operation will keep going if at all possible. When you resume an operation, those items that had errors will not be re-attempted, because SPy (by default) assumes that you don’t care about them.

But you will often care about errors, and you will figure out how to fix them (say, by altering a Datasource Map). You can force SPy to redo a push or pull operation for a particular item or set of items using the redo family of functions:

spy.workbooks.job.redo(job_folder, example_analysis_workbook_id)
spy.workbooks.job.data.redo(job_folder, area_a_optimizer_id)

Zip/Unzip the Job Folder

If you are intending to transfer workbook information to another Seeq server, it is convenient to package up the job folder as a zip file. There are two functions for this purpose:

spy.workbooks.job.zip(job_folder, overwrite=True)
spy.workbooks.job.unzip(job_folder + '.zip', overwrite=True)
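
Putting this together for the server-to-server scenario, the rough sequence looks like the sketch below. The login parameters are placeholders for your destination server; the unzip and push steps are run while logged into that server.

# On the source server: zip the job folder (as above) and copy the resulting
# 'Output/My First Workbooks Job.zip' file to the destination system.

# On the destination system (placeholder URL and credentials file):
# spy.login(url='https://destination-seeq-server', credentials_file='credentials.key')

spy.workbooks.job.unzip(job_folder + '.zip', overwrite=True)
spy.workbooks.job.push(job_folder,
                       path='SPy Documentation Examples >> Workbook Jobs',
                       label=f'{spy.session.user.name} Workbook Job Example',
                       create_dummy_items=True,
                       errors='catalog')
spy.workbooks.job.data.push(job_folder)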

Detailed Help

All SPy functions have detailed documentation to help you use them. Just execute help(spy.<func>) like you see below.

Make sure you re-execute the cells below to see the latest documentation; otherwise the output might reflect an earlier version of SPy.

help(spy.workbooks.job.pull)
help(spy.workbooks.job.push)
help(spy.workbooks.job.data.pull)
help(spy.workbooks.job.data.manifest)
help(spy.workbooks.job.data.expand)
help(spy.workbooks.job.data.add)
help(spy.workbooks.job.data.remove)
help(spy.workbooks.job.data.calculation)
help(spy.workbooks.job.data.push)
help(spy.workbooks.job.redo)
help(spy.workbooks.job.data.redo)