import json
import os
import pandas as pd
import shutil
from seeq import spy
# Set the compatibility option so that you maximize the chance that SPy will remain compatible with your notebook/script
spy.options.compatibility = 193
# Log into Seeq Server if you're not using Seeq Data Lab:
spy.login(url='http://localhost:34216', credentials_file='../credentials.key', force=False)
Workbook Jobs
In spy.workbooks.ipynb, you can learn to push and pull workbooks (Workbench Analyses and Organizer Topics) to/from the Seeq service/server using SPy.
You may need to do something “in bulk,” in one of the following scenarios:
Re-mapping references (e.g. historian tags/signals) from one datasource to another, or one asset tree to another
Transferring work from one Seeq service/server to another, possibly including data
The set of functions in the spy.workbooks.job module is suitable for this work. Each function operates within a “job folder” that captures the state of the job. Unlike spy.workbooks.pull() and spy.workbooks.push(), the equivalent commands in spy.workbooks.job do not require all workbooks to be held in memory. This allows very large jobs to be executed (as long as there is sufficient disk space). All parts of the process are resumable, and SPy will pick up where it left off if the operation is interrupted for any reason (e.g. a network error).
This notebook will walk through the use of this module, referencing the scenarios above. In general, commands are executed in the following order:
spy.workbooks.job.pull()
spy.workbooks.job.data.pull() (optional)
spy.workbooks.job.push()
spy.workbooks.job.data.push() (optional)
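For orientation, here is a minimal sketch of a complete job cycle. The details of each step are covered below; the job folder, search DataFrame, path and label are the ones established later in this notebook.
# Sketch only: assumes job_folder and workbooks_df (from spy.workbooks.search) are already defined
spy.workbooks.job.pull(job_folder, workbooks_df)   # pull workbook definitions into the job folder
spy.workbooks.job.data.pull(job_folder)            # optional: pull the data referenced by the workbooks
spy.workbooks.job.push(job_folder,
                       path='SPy Documentation Examples >> Workbook Jobs',
                       label=f'{spy.session.user.name} Workbook Job Example')
spy.workbooks.job.data.push(job_folder)            # optional: push the pulled data to dummy items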
Establish the Job Folder
The parameter that defines a job is a job folder. It is the first argument for all job functions, and it is managed entirely by SPy. The folder is laid out in an intuitive way that allows you to inspect it, and, in some troubleshooting cases, make modifications yourself.
job_folder = 'Output/My First Workbooks Job'
# Remove the job folder so that old files/artifacts don't affect the tutorial
if os.path.exists(job_folder):
    shutil.rmtree(job_folder)
Let’s Make Something to Work With…
We need some Analyses/Topics to work with for the purposes of demonstrating the functionality, so let’s make sure the example workbooks have been pushed.
example_workbooks = spy.workbooks.load('Support Files/Example Export.zip')
spy.workbooks.push(example_workbooks,
                   path='SPy Documentation Examples >> Workbook Job Import',
                   label=f'{spy.session.user.name} Workbook Job Example',
                   refresh=False,
                   errors='raise')
Pulling Workbooks
Start the job cycle by issuing the spy.workbooks.job.pull() command to grab a set of workbooks and write them to disk.
As with spy.workbooks.pull(), we create a DataFrame full of workbooks to pull by using the spy.workbooks.search() function. Then we can supply that DataFrame to spy.workbooks.job.pull(), which takes many of the same parameters as spy.workbooks.pull().
workbooks_df = spy.workbooks.search({
    'Path': 'SPy Documentation Examples >> Workbook Job Import'
})
# Store these in variables that we'll use later
example_analysis_workbook_id = workbooks_df[workbooks_df['Name'] == 'Example Analysis'].iloc[0]['ID']
example_topic_workbook_id = workbooks_df[workbooks_df['Name'] == 'Example Topic'].iloc[0]['ID']
workbooks_df
spy.workbooks.job.pull(job_folder, workbooks_df)
As mentioned earlier, jobs are resumable. If you execute the above cell again, you will see that the Result column indicates Already pulled.
If you would like to force a job to redo its work, supply the resume=False argument. You can also inspect the job folder’s Workbooks subfolder and selectively delete workbook folders therein to force SPy to re-pull workbooks.
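For example, either of the following (a sketch using this notebook’s job folder) will cause workbooks to be pulled again:
# Force the whole pull to be redone, ignoring previously pulled workbooks
spy.workbooks.job.pull(job_folder, workbooks_df, resume=False)

# Or inspect the Workbooks subfolder and delete selected workbook folders so that
# SPy re-pulls just those workbooks on the next (resumed) pull
os.listdir(os.path.join(job_folder, 'Workbooks'))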
Pushing Workbooks
As mentioned above, there are two primary scenarios where you want to push workbooks in bulk:
Re-mapping references (e.g. historian tags/signals) from one datasource to another, or one asset tree to another
Transferring work from one Seeq service/server to another, possibly including data
Datasource Maps
In either case, it’s important to understand the concept of datasource maps. These are JSON files that contain instructions for SPy as it maps the identifiers in the pulled workbook definitions to identifiers on the target system. These maps can incorporate relatively complex Regular Expression specifications that allow you to re-orient workbooks from one set of input data to another.
The spy.workbooks.job.pull() command will create a Datasource Maps folder inside the job folder. There will be one file for every datasource that was encountered during the pull operation: if a workbook touched a datasource in some way, there will be a file for it.
Here’s what a typical file looks like:
{
    "Datasource Class": "Time Series CSV Files",
    "Datasource ID": "Example Data",
    "Datasource Name": "Example Data",
    "Item-Level Map Files": [],
    "RegEx-Based Maps": [
        {
            "Old": {
                "Type": "(?<type>.*)",
                "Datasource Class": "Time Series CSV Files",
                "Datasource Name": "Example Data",
                "Data ID": "(?<data_id>.*)"
            },
            "New": {
                "Type": "${type}",
                "Datasource Class": "Time Series CSV Files",
                "Datasource Name": "Example Data",
                "Data ID": "${data_id}"
            }
        }
    ]
}
You can make modifications to these files by loading them into an editor, including Jupyter’s text editor. Generally the most common action is to add or change entries in the RegEx-Based Maps block.
That section is a list of dictionaries that each have an Old and a New subsection. Within the Old block, you can specify properties to match on: the key is the property name and the value is a regular expression, often employing a capture group. In the example above, the Data ID field is matched using the .* regex and stored in a capture group called data_id. The New block then contains the properties and values to search for in order to “map” to target items. In the example above, the "Data ID": "${data_id}" specification just means that the Data ID is used “as-is” without any alteration.
(If you happen to be familiar with Connector Property Transforms, this regex approach may feel familiar.)
Let’s look at a more complicated example:
{
    "Datasource Class": "Time Series CSV Files",
    "Datasource ID": "Example Data",
    "Datasource Name": "Example Data",
    "Item-Level Map Files": [],
    "RegEx-Based Maps": [
        {
            "Old": {
                "Type": "(?<type>.*)",
                "Datasource Class": "Time Series CSV Files",
                "Datasource Name": "Example Data",
                "Data ID": "(?<data_id>.*)"
            },
            "New": {
                "Type": "${type}",
                "Datasource Class": "Time Series CSV Files",
                "Datasource Name": "Example Data",
                "Data ID": "${data_id}"
            }
        },
        {
            "Old": {
                "Type": "(?<type>.*)",
                "Path": "Example >> Cooling Tower 1",
                "Asset": "Area (?<subarea>[ABC])",
                "Name": "(?<name>.*)"
            },
            "New": {
                "Type": "${type}",
                "Path": "Example >> Cooling Tower 2",
                "Asset": "Area ${subarea}",
                "Name": "${name}"
            }
        }
    ]
}
In this example there are two RegEx-Based Maps specified. The first map is identical to the previous example, and it will be tried first; if there is not a match on its Old regex specifications, then SPy will move on to the next. The next map matches on a particular asset path (Example >> Cooling Tower 1) and a set of subareas (A, B, or C) and then maps them to the same area underneath Example >> Cooling Tower 2.
In this manner, you can use arbitrarily complex mapping logic to accomplish the goal of re-mapping a workbook within the same Seeq server or properly mapping from one Seeq server to another.
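To build intuition for what the second map does, here is a purely illustrative Python sketch of the same match-and-substitute logic (not SPy’s actual implementation). Note that Python’s named-group syntax is (?P<name>...), whereas the map files use (?<name>...).
import re

# Hypothetical item properties as they exist on the source system
old_item = {'Type': 'StoredSignal', 'Path': 'Example >> Cooling Tower 1',
            'Asset': 'Area A', 'Name': 'Temperature'}

# The "Old" block, translated to Python regex syntax
old_block = {'Type': r'(?P<type>.*)', 'Path': r'Example >> Cooling Tower 1',
             'Asset': r'Area (?P<subarea>[ABC])', 'Name': r'(?P<name>.*)'}

# The "New" block, with ${...} placeholders for the capture groups
new_block = {'Type': '${type}', 'Path': 'Example >> Cooling Tower 2',
             'Asset': 'Area ${subarea}', 'Name': '${name}'}

captures = {}
for prop, pattern in old_block.items():
    match = re.fullmatch(pattern, old_item[prop])
    if match is None:
        raise ValueError(f'No match on {prop}; SPy would move on to the next map')
    captures.update(match.groupdict())

# Substitute the captures into the "New" block to form the properties used to find the target item
new_item = {prop: re.sub(r'\$\{(\w+)\}', lambda m: captures[m.group(1)], value)
            for prop, value in new_block.items()}
print(new_item)  # {'Type': 'StoredSignal', 'Path': 'Example >> Cooling Tower 2', 'Asset': 'Area A', 'Name': 'Temperature'}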
Datasource Mapping in Action
Let’s run through an actual mapping scenario to see how it works and how to troubleshoot it when it goes wrong.
First we have to grab a couple of signal IDs so that we can use them later to illustrate some functionality.
area_a_temperature_id = spy.search({'Datasource Name': 'Example Data', 'Name': 'Area A_Temperature'}).iloc[0]['ID']
area_a_optimizer_id = spy.search({'Datasource Name': 'Example Data', 'Name': 'Area A_Optimizer'}).iloc[0]['ID']
Now we will write out a datasource map file that has, as its first map, a New block that will map to a Name that does not exist. This will let us see what happens both when the mapping is successful and when there are errors.
datasource_map = {
    "Datasource Class": "Time Series CSV Files",
    "Datasource ID": "Example Data",
    "Datasource Name": "Example Data",
    "Item-Level Map Files": [],
    "RegEx-Based Maps": [
        {
            "Old": {
                "Type": "(?<type>.*)",
                "Datasource Class": "Time Series CSV Files",
                "Datasource Name": "Example Data",
                "Name": "Area A_Optimizer"
            },
            "New": {
                "Type": "${type}",
                "Datasource Class": "Time Series CSV Files",
                "Datasource Name": "Example Data",
                "Name": "Area NonExistent_Optimizer"
            },
            # In this contrived example, if we match on the "Old" criteria, we don't want to continue to the next regex map
            "On Match": "Stop"
        },
        {
            "Old": {
                "Type": "(?<type>.*)",
                "Datasource Class": "Time Series CSV Files",
                "Datasource Name": "Example Data",
                "Data ID": "(?<data_id>.*)"
            },
            "New": {
                "Type": "${type}",
                "Datasource Class": "Time Series CSV Files",
                "Datasource Name": "Example Data",
                "Data ID": "${data_id}"
            }
        }
    ]
}
with open(os.path.join(job_folder, 'Datasource Maps', 'Datasource_Map_Time Series CSV Files_Example Data_Example Data.json'), 'w') as f:
    json.dump(datasource_map, f)
Now we push to the server using a label that is guaranteed to differentiate our activity from that of other users. As you will see, an error will be reported in the Result column because one item won’t be mapped.
push_df = spy.workbooks.job.push(job_folder,
                                 path='SPy Documentation Examples >> Workbook Jobs',
                                 label=f'{spy.session.user.name} Workbook Job Example',
                                 errors='catalog')
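If the output table is long, one way to home in on the problem rows is to filter the returned DataFrame. This sketch assumes the usual SPy convention that cleanly pushed rows have a Result value starting with 'Success':
# Show only the rows that did not push cleanly (Result values assumed)
push_df[~push_df['Result'].astype(str).str.startswith('Success')]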
You can see the error, but it’s not formatted very well. So let’s use a troubleshooting tool: the “explain” function on the returned DataFrame:
print(push_df.spy.item_map.explain(area_a_optimizer_id))
This detailed explanation is intended to give you a starting point for troubleshooting. You can see the regex that was specified, the property values that were matched on, the capture groups that resulted from the RegEx specifications, and the property values that were subsequently searched for. Since Area NonExistent_Optimizer does not exist, the explanation for RegEx-Based Map 0 says Item not found on server.
Now let’s look at the explanation for a successful map (Area A_Temperature):
print(push_df.spy.item_map.explain(area_a_temperature_id))
Dummy Items
In cases where we couldn’t map successfully, we can tell SPy to create “dummy” items. A dummy item is a signal, condition or scalar that has all the properties of the original item but has no data. (We’ll show how to push data to dummy items later…)
Note the use of create_dummy_items=True, and also note resume=False so that SPy tries to push the workbooks again:
push_df = spy.workbooks.job.push(
    job_folder,
    path='SPy Documentation Examples >> Workbook Jobs',
    label=f'{spy.session.user.name} Workbook Job Example',
    create_dummy_items=True,
    resume=False,
    errors='catalog')
Now there are no errors, because every item that couldn’t be mapped has been replaced by a dummy item.
Find the Example Analysis row of the output table above and click on the link in the URL column to take a look at the result. You’ll see that the Details Pane worksheet contains Area A_Optimizer, which is a blank “dummy item”.
We can look at what dummy items were created by inspecting push_df.spy.item_map.dummy_items. You can see that the Name is the same as the original and important properties like Maximum Interpolation have made their way to the dummy item.
push_df.spy.item_map.dummy_items
Including Data
Dummy items are helpful, but they are “blank”: they do not have any data associated with them. If you are transferring workbooks between servers and the destination server doesn’t have access to the same datasources, it is useful to be able to transfer the data itself from the source server to the dummy items on the destination server. A set of SPy functions is provided in spy.workbooks.job.data for this purpose.
As spy.workbooks.job.pull() pulls workbook information, it also tracks the usage of data items on Workbench Worksheets and in Organizer Topic Documents. This information is collated and saved to disk as the data manifest. You can inspect the manifest like so:
manifest_df = spy.workbooks.job.data.manifest(job_folder)
# Simplify the DataFrame so that it fits on the screen better
manifest_df[['ID', 'Path', 'Asset', 'Name', 'Start', 'End', 'Calculation']]
You can see Start and End columns that provide the overall time bounds that were detected in the workbook data references. There is also a Calculation column that refines this broad time period into specific “chunks” of data, defined as individual capsules and using the within() and touches() Seeq Formula functions to pull data only for those time periods, not just everything between Start and End.
This manifest DataFrame can be fed directly into spy.pull(), but it is recommended that you use spy.workbooks.job.data.pull() like so:
spy.workbooks.job.data.pull(job_folder)
Data has now been added to the job folder for the time periods identified in the manifest and can be pushed to the dummy items like so:
spy.workbooks.job.data.push(job_folder)
If you refresh the page of the Details Pane within the Example Analysis, you’ll now see data for Area A_Optimizer.
However, if you move the Display Range to the left, you’ll see that Area A_Optimizer data only exists for the time period that was originally on the screen.
If you want to expand how much data is pulled for a particular item, you can execute a command to alter the manifest and then pull/push again:
# Expand the time periods for Area A_Optimizer by 2 weeks on either side
spy.workbooks.job.data.expand(job_folder, {'Name': 'Area A_Optimizer'}, by='2w')
spy.workbooks.job.data.pull(job_folder, resume=False)
spy.workbooks.job.data.push(job_folder, resume=False)
There is a series of functions to alter the manifest:
spy.workbooks.job.data.expand() - Expand the existing time periods.
spy.workbooks.job.data.add() - Add a specific time period.
spy.workbooks.job.data.remove() - Remove a specific time period.
spy.workbooks.job.data.calculation() - Apply a specific calculation, such as resample().
Documentation for these functions is found under the Detailed Help section below.
Redoing Specific Workbooks/Items
In the process of pushing and pulling, you will usually use the errors='catalog' flag, which means that errors will be enumerated but the operation will keep going if at all possible. When you resume an operation, those items that had errors will not be re-attempted, because SPy (by default) assumes that you don’t care about them.
But you will often care about errors, and you will figure out how to fix them (say, by altering a Datasource Map). You can force SPy to redo a push or pull operation for a particular item or set of items using the redo family of functions:
spy.workbooks.job.redo(job_folder, example_analysis_workbook_id)
spy.workbooks.job.data.redo(job_folder, area_a_optimizer_id)
Zip/Unzip the Job Folder
If you are intending to transfer workbook information to another Seeq server, it is convenient to package up the job folder as a zip file. There are two functions for this purpose:
spy.workbooks.job.zip(job_folder, overwrite=True)
spy.workbooks.job.unzip(job_folder + '.zip', overwrite=True)
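For a cross-server transfer, a typical sequence is to zip the job folder on the source system, copy the zip file to the destination environment by whatever means you prefer, and then unzip and push it there. A sketch (the destination URL, credentials file, path and label shown here are hypothetical):
# On the source system
spy.workbooks.job.zip(job_folder, overwrite=True)

# ... copy 'Output/My First Workbooks Job.zip' to the destination environment ...

# On the destination system (hypothetical URL and credentials file)
spy.login(url='http://destination-server:34216', credentials_file='credentials.key')
spy.workbooks.job.unzip(job_folder + '.zip', overwrite=True)
spy.workbooks.job.push(job_folder,
                       path='Transferred Workbooks',
                       label='Transfer Example',
                       create_dummy_items=True)
spy.workbooks.job.data.push(job_folder)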
Detailed Help
All SPy functions have detailed documentation to help you use them. Just execute help(spy.<func>) like you see below.
Make sure you re-execute the cells below to see the latest documentation; otherwise it might be from an earlier version of SPy.
help(spy.workbooks.job.pull)
help(spy.workbooks.job.push)
help(spy.workbooks.job.data.pull)
help(spy.workbooks.job.data.manifest)
help(spy.workbooks.job.data.expand)
help(spy.workbooks.job.data.add)
help(spy.workbooks.job.data.remove)
help(spy.workbooks.job.data.calculation)
help(spy.workbooks.job.data.push)
help(spy.workbooks.job.redo)
help(spy.workbooks.job.data.redo)