Showing results for tags 'jedai'.

  1. User Guide

The Lucd JedAI Client is downloaded locally to your device and interfaces with the Lucd Platform. The client enables users to visualize, transform, and prepare data for use in modeling frameworks (TensorFlow, PyTorch, Scikit-learn, etc.), and models can be uploaded and trained in the platform. The client is touchscreen capable, but a touchscreen is not required.

System Requirements

The following specifications are required to run the client:

• Windows or MacOS
• 4 GB memory
• Modern CPU

Recommended Configuration

Although not required, we recommend the following to maximize the performance of the client:

• A GPU to support accelerated rendering
• 1600x900 display resolution minimum

Installation Instructions

The client is distributed via Lucd's Steam Store. A Steam account is required to access the client download.

Usage Instructions

Login

Log in to the client using the credentials provided to you:

• Username
• Password
• Domain – Cloud customers leave the domain field blank when logging in. Private build customers are provided a domain to use when logging in.
• Login – Click to submit login credentials and enter the application.
• New User – If this is your first time using Lucd, click here to register as a new user.

Register a new user

• Generate password – Have Lucd suggest a password that meets the password requirements.
• Password requirements – Hover to view the Lucd password requirements (a validation sketch follows this list):
  • Cannot reuse ANY old password.
  • 2 instances of all character classes: uppercase, lowercase, number, special (!@#$%^&*()).
  • No more than 2 consecutive characters from the same class (123 is invalid).
  • No repeating characters (33 is invalid).
• Register – Click to submit your details and return to the login screen. After registering a new user, you may immediately log in with that username.
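The rules above are easy to misread, so here is a minimal, hypothetical Python checker that mirrors them. The authoritative validation happens inside the Lucd platform and may differ; the reuse rule, in particular, requires server-side password history and is not checkable locally.

```python
# Hypothetical local checker mirroring the documented password rules;
# Lucd's own validation is authoritative and may differ in detail.
CLASSES = {
    "uppercase": str.isupper,
    "lowercase": str.islower,
    "number": str.isdigit,
    "special": lambda c: c in "!@#$%^&*()",
}

def password_problems(pw: str) -> list[str]:
    problems = []
    for name, test in CLASSES.items():                    # 2 of every class
        if sum(test(c) for c in pw) < 2:
            problems.append(f"needs at least 2 {name} characters")
    classes = [next((n for n, t in CLASSES.items() if t(c)), None) for c in pw]
    for i in range(len(pw) - 2):                          # "123" is invalid
        if classes[i] and classes[i] == classes[i + 1] == classes[i + 2]:
            problems.append(f"3 consecutive {classes[i]} characters")
    for a, b in zip(pw, pw[1:]):                          # "33" is invalid
        if a == b:
            problems.append(f"repeated character {a!r}")
    return problems

print(password_problems("aB3!xY7@"))   # [] -> satisfies the listed rules
```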
Projects

Immediately after login, the Projects view is displayed. A project is a handy way to group artifacts by data science problem.

• Available Projects – a list of all projects the logged-in user has access to open
• Global Status – the status of all artifacts on the currently logged-in system
• Federated Status – the number of online/offline artifacts of the given type
• Open Unallocated – begin using Lucd without an open project. Any artifacts created here are saved as unallocated and remain accessible from this button.
• Search – search for a project by name or description
• Grid/List view – change how the set of projects is displayed
• Project – select this item to see more details about the project
• Project Details – view the name, description, and artifact counts for the selected project
• Open Project – open the currently selected project

Hovering over a project item displays the 'Options' menu:

• Edit Details – change the name and description of the hovered project
• Change Cover – select a meaningful cover photo (optional)
• Show Details – view project details on the right side of the screen
• Delete – delete the project. Any artifacts allocated to that project are moved to the 'Unallocated' space and can still be used.

Navigation

Upon login, a user will see some form of this menu bar at the top of the screen, depending on which view is currently open.

• Projects – close the currently open project and return to the 'Projects' view
• Data – click to go to the Workflow space. Hover and select from:
  • Sources – view the sources visualization
  • Query Builder – go to the query building tool
  • Virtual Datasets – view a list of available virtual datasets for training
• Assets – click to view a list of available embeddings for training
• Modeling – click to go to the 'Modeling' view to easily start a training run. Hover and select from:
  • Models – view a list of available models for training
• Federation – click to view a list of currently connected federates and the artifacts associated with each
• Project name – the name of the currently open project
• Federate – hover to view the currently connected domain
• Username – the currently logged-in user
• Minimize – hide any open dockable panels to see the main view. Click again to unhide.
• View Log – click to view a list of status messages on the Lucd system
• Settings – click to edit various user and system settings

Sources

The Lucd client can show the user all available sources across the federation, as well as a data-ingestion-over-time visualization.

• Sources table – list of all sources in the federation
• Federate indicator – hover over to see which federates contain the source. If the indicator is missing, the source exists only on the logged-in domain.
• Refresh – refresh the displayed data
• Ingestion over time viz – color represents the relative number of records ingested during a given time period for a source. Click a box to zoom into that time period across all sources.
• Back – go up a time period (e.g., Month to Year)

Data

After selecting a project, a user is taken to the Data Transform view, where queries can have EDA operations added to them and then be built into a Virtual Dataset (VDS).

Left Sidebar

• Saved Workflows – each row represents a query that has been saved and is eligible to have EDA operations performed on it. Rows can be dragged and reordered in the 'Active Workflows' space.
• Federate – hover over to see the federates of all VDS' contained in the saved workflow. If the icon is orange, at least one VDS has an issue on a federate. If the icon is not visible, all VDS' on that workflow exist only on the logged-in domain.
• VDS – the number of VDS' created from the given workflow
• Workflow name – appears red if any operation within the workflow has returned an error. The error clears once the operation returns successfully.
• Quick add – click to add the workflow to the 3D visualize space
• Begin a new query – click to build a query inside the query builder so that it can be saved to the Transform space
• Available operations – click to begin adding an operation to a selected 3D node

Workflow 3D Space

• Zoom – click and drag to zoom in and out of the 3D space
• Arrow to node – click to move the selection to a different node. The arrow keys also work.
• Active Workflows – an ordered list of workflows currently displayed in the 3D space. Items can be dragged into a new order or clicked to zoom to the root node of the selected workflow.
• Selected node – select any node to see additional options
• Child node – children are displayed to the right of a parent node with connecting lines

Query node

• Remove – remove the selected query and accompanying workflow from the 3D space
• Delete – delete the selected query and accompanying workflow. This cannot be undone.
• Edit – reload the query parameters into the query builder so the query can be modified and saved as a new query
• Preview Data – execute the query and visualize the results
• Create VDS – begin the process of creating a Virtual Dataset used for training

Operation nodes

• Operation name
• Operation type
• Delete – delete the selected operation and all downstream operations in the workflow. This cannot be undone, and it will not execute if there is a VDS downstream.
• Preview Data – execute the query and all operations up to and including this one, and visualize the results
• Create VDS – begin the process of creating a Virtual Dataset used for training

Virtual Dataset node

• VDS Name
• Delete – delete the selected VDS. This cannot be undone.
• Preview Data – execute the query and all operations leading to this VDS, and visualize the results
• Create Embedding – only available with text models
• Merge VDS – click and drag to another Virtual Dataset to merge the two
• Start training – open the Modeling view to train with this VDS
• Federate – hover over to see the federates holding this VDS. If the icon is orange, at least one federate is returning an error with the VDS.

Navigating transforms with arrow keys: once a node is selected in the data transform space, you may use the arrow keys to navigate quickly between adjacent nodes.

Collapsing nodes with double click: double-clicking a node collapses all children downstream from that node and adds a superscript next to it indicating how many nodes were collapsed. This can be useful in large, spread-out trees.

Rearranging active workflows: transform workflows in the active list can be rearranged in any order. This is useful for comparing trees, or for bringing two Virtual Datasets closer together to perform a merge.

Preparing Text Data for Model Training

Lucd provides special operations for easily preparing text data for model training, saving a model developer valuable time otherwise spent hand-coding text transformation routines. After creating an EDA tree based on a query of a text data source, a developer can add a new operation to the tree based on NLP operations. NLP operations (e.g., stopword removal, whitespace removal, lemmatization) can be applied in any sequence. It is important to select the correct facet as the "text attribute." One can also elect to apply tokenization at the document level (i.e., create one sequence of tokens for the entire facet value per record) or at the sentence level (i.e., create a token sequence per sentence in the facet for a record).

Saving VDS with Processed Text

When a developer wants to create a new virtual dataset that includes the transformed text data, they must choose the "processed_text" facet as the sole feature of the virtual dataset. Currently, Lucd does not support text model training with multiple feature columns; only the "processed_text" facet may be selected.

Applying Custom Operations

Once custom operations have been defined and uploaded using the Lucd Python Client library, they are available in the GUI for use in data transformation. Clicking on a custom operation shows further details: the features the operation uses, as well as the actual source code defining the op. As described in the documentation for defining custom operations via the Lucd Python Client, one must select how the operation is applied, based on one of the following Dask dataframe approaches (a short sketch follows this list):

• apply
• map_partitions
• applymap
• apply_direct – apply the custom function directly on a Dask dataframe
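As a rough illustration of the difference between these approaches, here is a minimal Dask sketch. The op body and column names are invented for the example, and the Lucd Python Client upload step is not shown.

```python
# Minimal sketch of the Dask side of a custom operation; the op body and
# column names are invented, and the Lucd upload step is omitted.
import dask.dataframe as dd
import pandas as pd

def to_millimeters(part: pd.DataFrame) -> pd.DataFrame:
    """Runs once per pandas partition under map_partitions."""
    part = part.copy()
    part["petal_length"] = part["petal_length"] * 10  # cm -> mm
    return part

pdf = pd.DataFrame({"petal_length": [1.4, 4.7, 5.1]})
ddf = dd.from_pandas(pdf, npartitions=2)

# map_partitions: the function receives whole pandas partitions.
mm = ddf.map_partitions(to_millimeters)

# apply: element-wise over a Series (meta describes the output schema).
doubled = ddf["petal_length"].apply(lambda x: x * 2, meta=("petal_length", "f8"))

print(mm.compute())
print(doubled.compute())
```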
Applying Image Operations

To apply image operations, select the Image Ops tab within the New Op menu in an EDA tree. It is important to select an image facet as the "Feature." The currently provided operations are:

• Vertical and horizontal flips
• Grayscale
• Contrast normalization
• Normalize (0 mean and unit variance)
• Resize width & height
• Color inversion
• Crop borders
• Gaussian blur
• Rotate
• Min-max scaling
• To array (converts binary data to a NumPy array)
• Reshape dimensions
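For intuition, here is what a few of these operations amount to in plain NumPy. This is an illustrative sketch on a fake grayscale image, not Lucd's server-side implementation.

```python
# What some of the listed image operations amount to in plain NumPy;
# an illustrative sketch, not Lucd's server-side implementation.
import numpy as np

img = np.random.randint(0, 256, size=(8, 8)).astype(np.float64)  # fake grayscale image

flipped = img[:, ::-1]                                  # horizontal flip
inverted = 255.0 - img                                  # color inversion (8-bit range)
minmax = (img - img.min()) / (img.max() - img.min())    # min-max scaling to [0, 1]
standardized = (img - img.mean()) / img.std()           # normalize: 0 mean, unit variance

print(minmax.min(), minmax.max())                       # 0.0 1.0
print(round(standardized.mean(), 9), round(standardized.std(), 9))  # ~0.0 ~1.0
```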
Query Builder

The Lucd client offers a unique and intuitive way to query data, giving the user flexibility in how complex queries are strung together to retrieve exact results.

Left Sidebar

• Sources – a list of available sources to query. These can be dragged into the node editor window.
  • Quick add – click to add this source to the node editor window
  • Federate status – hover to see which federates hold the source. If this icon does not show, the source exists only on the currently logged-in domain.
• Data Models – a list of available data models to query. These can be dragged into the node editor window.
  • Quick add – click to add this data model to the node editor window
  • View stats – click to view statistics for this particular data model
  • View features – click to view the features of this particular data model
• Features – a list of features in this data model. These can be dragged into the node editor window.
  • Quick add – click to add this feature to the node editor window
• Federates – a list of available federates for filtering the query. Note: the currently logged-in domain will ALWAYS return results, regardless of whether it is selected.

Node Editor Window

• Global search parameters – click to view simple/advanced search filters
• Zoom – drag this slider or use the mouse wheel to zoom in and out of the node view
• Lucene syntax – a text representation of the search to be executed (an example appears after the grouping tips below)
• Copy Lucene syntax – click to copy the Lucene syntax. This can be pasted into the global search parameters to customize a search with features not supported by the node editor.
• Search – click to execute the search
• Save – save the search for use in a Transform workflow. Note that a search must be executed before it can be saved.
• Group – toggle, then click and drag around a set of nodes to add a grouping around them. This acts as a set of parentheses in the Lucene syntax. The same can be accomplished by holding Shift + left click + drag.
• Refresh – click to retrieve and repopulate the list of sources/data models/federates
• Exit – close the query builder. Any unsaved progress will be lost.
• Modify Node – change node filter settings
• Delete Node
• Node connection dropdown – click to select AND/OR/XOR
• Node connector – click and drag to connect to another node or grouping
• Statistics – click to view statistics for the last executed query

Advanced Search Parameters

• All these words – search results must include all these words
• Lucene query – add a Lucene query that takes the place of whatever is in the Node Editor Window
• This exact phrase – search results must include this exact phrase
• None of these words – search results must not contain any of these words
• Records per source/model – return this many records per source/model
• Total records to return – return at least this many total records
• Date range – search results must fall within this time period
• Randomize – return results in a random order
• All Sources/Models – include a sample from every applicable source and data model

Search Results

• Visualization panel – updates with each executed search
• Federate distribution – a bar chart showing how many records were returned from each applicable federate
• Query statistics – each returned feature shows relevant statistics and, if applicable, a box plot

Adding a node and changing connection logic: nodes can be dragged into the workspace or quickly added using the '+' button on the left. The dropdown connecting two nodes or groups can be changed to AND/OR/XOR.

Grouping nodes: nodes can be grouped together using the 'Group' toggle at the top or by holding Shift and dragging. Groupings add parentheses around the selected nodes in the Lucene output.

Manually connecting nodes: nodes can be manually connected and disconnected by clicking and dragging either of the two circles on the side of a node/group.
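To make the Lucene output concrete, here is a hypothetical example of what the node editor might emit for two grouped source nodes OR-ed together and AND-ed with a numeric range filter (the field names are invented for illustration):

```
(source:iris OR source:iris_backup) AND petal_length:[1.0 TO 5.0]
```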
Visualization

General

• Query name – the name that was saved with the query
• Record count – the number of records returned, out of the total number of records across the system that fit the query
• Visualization selector – click each to change the visualization
• Quick Add – click to add another visualization window of the same data slice
• Maximize – click to expand the panel to full screen

Table

• Feature/column names
• Histogram – lightweight visualization of a numeric field's distribution
• Top/unique value – for string types only
• Table row – click to see the list of feature values
• Paging controls – go forward or backward in the results

Scatterplot - 2D

• Axis selector – select the axes from a list of available features
• Filter knobs – drag these knobs to adjust the axis filter. Drag away from the plot to reset the axis.
• Remove plot – removes the plot from view
• Add new plot – adds a new plot to the view

Box Plot

• Feature selector – select the feature from a list of available features
• Remove plot – removes the plot from view
• Add new plot – adds a new plot to the view

Histogram

• Feature selector – select the feature from a list of available features
• Filter knobs – drag these knobs to adjust the axis filter. Drag away from the plot to reset the axis.
• Remove plot – removes the plot from view
• Add new plot – adds a new plot to the view

Scatterplot - 3D

• Axis selector – select the axes from a list of available features
• Filter knobs – drag these knobs to adjust the axis filter. Drag away from the plot to reset the axis.
• Scatterplot point – select to view details; double-click to focus on that point
• Reset view – click to move the camera back to the starting view
• Drag – orbit around the focal point
• Mouse wheel – zoom in/out
• Shift + Drag – pan the camera
• Ctrl + Drag – look around

Parallel Coordinate Plot

• Feature selector – select the feature from a list of available features
• Add Feature – click to add an additional feature to the visualization
• Remove Feature – click to remove a feature from the visualization
• Reorder Features – click and drag to reorder the feature list
• Maximum
• Minimum
• Feature name
• Reset view – click to move the camera back to the starting view
• Drag – orbit around the focal point
• Mouse wheel – zoom in/out
• Shift + Drag – pan the camera
• Ctrl + Drag – look around

Correlation Matrix

To see how each field relates to all the other fields, use a Correlation Matrix. Only numerical fields are displayed. Each bar is scaled on its y axis according to how its two contributing fields relate, on a scale of -1 (red) to 1 (blue).

• Feature name
• Matrix bar – click to see details about this specific feature pair
• Reset view – click to move the camera back to the starting view
• Drag – orbit around the focal point
• Mouse wheel – zoom in/out
• Shift + Drag – pan the camera
• Ctrl + Drag – look around

Statistics

Modeling

Left Sidebar

• Models – click and drag to the training template model slot to begin training
  • Model library (PyTorch, TensorFlow, XGBoost, SKLearn, Federated Learning)
  • Model framework (Simple, Advanced, Federated)
• Virtual Datasets – click and drag to the training template VDS slot to begin training
  • Federate status
• Assets – click and drag to the training template asset slot, once a text model has been added, to begin training
• Show/Hide artifacts
• Refresh data
• Upload Model

Training Template/Parameters

• VDS Slot – drag a VDS from the left sidebar to one of these slots to set it for that phase of training
• All VDS Slot – drag a VDS here to set it for all three phases of training
• Model Slot – drag a model here to set it for training
• Asset Slot – drag an asset here, when a text model has been selected, to set it for training
• Training Name – give the training a name to make it easier to find later
• Default – reset the value to the saved default
• Save Defaults – save all current values as the new default values
• Reset all to defaults – reset all changed values back to their saved defaults
• Clear saved defaults – reset all saved defaults back to factory settings
• Training Parameters – expand/collapse parameters

Dragging components to the training template: models and Virtual Datasets can be dragged to the training template, and items in VDS slots can be rearranged.

Right Sidebar

• Trainings – click to see additional details
  • Model library (PyTorch, TensorFlow, XGBoost, SKLearn, Federated Learning)
  • Status – if there is an error, click to see additional details
• Training Details
  • Start Train – click to reload the training parameters to begin a restart
  • Delete training
  • Download – click to download training artifacts as a .zip file
  • View Profile – click to see the training profile
• Show/Hide trainings

Modeling Graph

• Model node
• VDS node
• Asset node
• Training connector – click to use these artifacts in a new training
• Number of trainings – the number of trainings using this combination of artifacts

Training Profile

Performance Graph

• Available Plots Selector – choose from the list of available graphs
• Plot explanation – get a description of the selected graph type
• Update interval – how often the graph should update, in seconds (default 100). The number of points displayed is limited to 1000 to keep updates consistent.
• Line Toggle – disable this value
• Line Intersect – click to freeze in place; click again to unfreeze
Confusion

• Interactable Square – click a square to see details about actual and predicted values. Values are displayed in a square only if greater than 0.
• Show Records – toggle box values between percentages and record counts
• Histogram – displays all predicted values for an actual value. Clicking a bar updates the table beneath it.
• Table – a tabular view of sample results from the selected prediction

Explainability Analysis

Lucd provides the ability to visualize "explanations" of a model's output given specific inputs. Generally, explanations take the form of computed attribute weights indicating how much an attribute contributed to a model's decision. This supports both debugging a model and scrutinizing the data fed to it. This feature is backed by an integration of the LIME framework (a minimal LIME sketch follows these Explainability sections).

Explainability - Tabular/Regression

For analyzing a tabular model, the user enters sample(s) into the input text box as a list of lists of numbers, where each inner list is a single sample, then clicks the "Explain" button underneath the box. The time required to run the explanation analysis depends on the complexity of the model. Models of type tabular_classification or regression can explain tabular data predictions.

• Input Array – enter values to predict on. Must be valid JSON (a list of lists of numbers).
• % of Training Data – the percentage of training data used to build the explainer. Must be greater than 0 and less than or equal to 1.
• Number of Top Explanations – a positive integer denoting how many class explanations to show
• Inputs – colored to show how each influences the top class prediction
• Class Probabilities – class predictions and their corresponding likelihoods
• Explanation – how each input influences a positive or negative prediction

Explainability - Images

Models of type image_classification can explain image predictions.

• Sample Image – select a local image to explain
• Positive Only – if True, include only regions of the image contributing to the predicted label
• Hide Rest – if True, gray out the non-explanation part of the returned image
• Explanation – the returned colorized image with shaded regions of positive and negative influence. Red sections detract from the predicted class, while green sections contribute positively to it.
• Predicted Probabilities – class predictions and their corresponding likelihoods

Explainability - Text

For text models, simply type the raw string you would like your model to explain. Models of type text_classification can explain text predictions.

• Input Text – the text the user would like to predict and explain
• Output Text – the output text with class probabilities highlighted in positive (green) or negative (blue) colors
• Predicted Probabilities – the predicted class probabilities
• Explanation – words that contribute to a positive or negative correlation
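Since the panel is backed by LIME, a minimal stand-alone LIME example gives a feel for what the "Explanation" weights are. The model, dataset, and parameters below are stand-ins, not Lucd's serving pipeline.

```python
# A minimal LIME tabular example in the spirit of the panel described above;
# the model and data here are stand-ins, not Lucd's actual pipeline.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

iris = load_iris()
model = RandomForestClassifier(random_state=0).fit(iris.data, iris.target)

explainer = LimeTabularExplainer(
    iris.data,                       # roughly what "% of Training Data" samples from
    feature_names=iris.feature_names,
    class_names=iris.target_names,
    mode="classification",
)

# The GUI's "Input Array" is JSON like [[5.1, 3.5, 1.4, 0.2]]: one inner list per sample.
sample = np.array([5.1, 3.5, 1.4, 0.2])
exp = explainer.explain_instance(sample, model.predict_proba, num_features=4)
print(exp.as_list())   # per-feature weights: the "Explanation" panel contents
```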
Details

Federated

Lucd Release 6.5.0 introduces Federated Machine Learning to the Lucd platform. This capability adds new features that support the development of federated models. If your Lucd platform is set up as part of a federation, many of the operations you perform within the JedAI client are automatically federated, including:

• Query – if your query matches data on multiple systems, you will get results from all of those systems.
• EDA / search tree creation – saving your query into an EDA tree also creates the EDA tree on your other federates.
• VDS – a VDS created from a federated query is in turn created on all federates containing relevant data.
• Model definition – model definitions uploaded via your JedAI GUI are also created on the other federates.
• Training object – when training a federated model, a corresponding training object is created on all participating federates.

Virtual Datasets

• Open transform – opens the transform workflow that created this VDS
• Copy ID – useful for finding the VDS via REST API calls
• Create an embedding
• Delete – delete a VDS
• Refresh – retrieves the latest VDS data

Assets

• Delete an embedding
• Visualize – see embedding data on a PCA/t-SNE chart
• Refresh – retrieves the latest Asset data

PCA/TSNE

Embeddings can be viewed using PCA and t-SNE techniques for visualization (a minimal offline sketch follows this list).

• Style – when viewing an embedding's PCA/t-SNE, click to see terms instead of points
• Region Select – toggle to select a cluster of points using a bounding box
• Multiple Select – use to add multiple bounding boxes
• Search – search for a term. All matching terms are highlighted and shown in a list on the right until only one matching term remains.
• Filter – narrow the number of occurrences for a term to a range
• Technique Select – toggle between PCA and t-SNE
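For readers who want the offline equivalent, here is a minimal scikit-learn sketch of the two projection techniques, assuming a stand-in embedding matrix; Lucd computes these views server-side for Assets.

```python
# Rough offline equivalent of the PCA/t-SNE asset views, using a
# stand-in embedding matrix; Lucd computes these server-side.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 50))   # 200 terms, 50-dim embedding

pca_2d = PCA(n_components=2).fit_transform(embeddings)
tsne_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(embeddings)

print(pca_2d.shape, tsne_2d.shape)        # (200, 2) (200, 2)
```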
  2. Lucd Avicenna is the latest feature of the Lucd JedAI Client; look for the Epidemiology button on the application menu.

Avicenna Model Predicts Impact of Pandemic Downstream

Events can drive behavior just as behavior drives events. Machine learning models today, whether classic models like linear regression and random forests or 21st-century neural networks, can be extremely valuable predictive engines. One shortcoming of many predictive models, however, is the need for data -- lots of data. So what does one do when there's just not enough data to prime the pump, as it were? At Lucd, we suggest that other approaches are also quite useful depending on the context of the situation. In the case of Avicenna, a simulation model provides accurate predictions without the need for mountains of data.

Simulation Models

Simulation models can leverage valuable data to represent a discrete sequence of events in time. Whether it is pandemic response planning or preparedness for the next disruption an organization may face, simulation models can provide forward-looking insights. This can be done not only for the response during an event but also to address challenges post-event. As an example, how will hospitals and health organizations address their future? Lucd's Data Fusion, security, and scalability enable planning, actions, and mitigation for targeted and efficient command and control. Lucd is able to consume and ingest large data sets, including in real time, helping organizations turn that data into simulation models for the supply chain and countless other impacted areas. Lucd builds advanced Enterprise AI solutions including advanced pandemic event-driven simulation modeling, AI-powered workforce insights, supply chain analytics, and more. Lucd calls this model Avicenna. Imagine your organization, whether it is a retail outlet, a financial organization, a manufacturer, or any other industry, empowered with the ability to leverage proven simulation models to help plan and prepare.

Agent-Based Simulation Models

As an agent-based simulation model, Avicenna is capable of analyzing relevant data and variables to help businesses predict potential outcomes of events based on behavioral assumptions. Better planning for product releases, market entry decisions, staffing challenges, financial analysis, potential supply chain disruptions, and much more can be modeled to help predict potential outcomes with greater accuracy. How is this done? The Avicenna model leverages the industry-leading machine learning and deep learning capabilities that the Lucd Enterprise AI platform delivers. (A toy sketch of the agent-based idea follows this section.)
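To make "agent-based" concrete, here is a toy SIR-style simulation in Python. It is purely illustrative of the technique; it is not the Avicenna model, and its transmission and recovery rates are made-up behavioral assumptions.

```python
# A toy SIR-style agent simulation, purely to illustrate event-driven,
# agent-based modeling; this is NOT the Avicenna model.
import random

random.seed(0)
agents = ["S"] * 99 + ["I"]                      # 100 agents, one infected
P_TRANSMIT, P_RECOVER, CONTACTS = 0.05, 0.10, 4  # made-up behavioral assumptions

for day in range(1, 61):
    for i, state in enumerate(agents):
        if state != "I":
            continue
        # Each infected agent meets a few random contacts per day.
        for j in random.sample(range(len(agents)), CONTACTS):
            if agents[j] == "S" and random.random() < P_TRANSMIT:
                agents[j] = "I"
        if random.random() < P_RECOVER:
            agents[i] = "R"
    if day % 15 == 0:
        print(day, {s: agents.count(s) for s in "SIR"})
```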
Lucd JedAI Health Helps Hospitals and Health Organizations Plan and Prepare

The Pandemic Downstream. In the wake of a series of black swan events like those unleashed during something as pervasive and debilitating as a pandemic, how can health organizations cope and plan for the inevitable challenges they will face? How will they handle a surge of elective surgeries? Staff planning for procedures? Restrictions and mandates while delivering services? Through the JedAI Avicenna model and its simulation capabilities, Lucd ingests real-time COVID-19 data coupled with hospital patient information, bed counts, and fire department, police, and ambulance information in order to look forward and enable planning and preparedness. As hospital organizations attempt to plan for the new normal, the Avicenna event-driven simulation model provides these outlooks.

Lucd JedAI Retail

All industries and business sectors have been negatively impacted by the pandemic, and retail outlets have been hit especially hard. Most retailers were shut down, and reopening has meant a new environment. Conforming to new standards and regulations will impact customer engagement, customer planning, product delivery, and more. Retailers will need to transform the way they support consumers and their entire way of doing business. How can they plan, prepare, react, and gain better insight into the decisions they should make? Lucd JedAI Retail is able to analyze industry, market, and consumer mobility data for better planning and for delivering necessary solutions, even in the wake of industry-transforming events like a pandemic. Customer service, product availability, and supply chain are other examples of how Lucd JedAI Retail can use the Avicenna event-driven simulation model to help retailers plan for macro- or micro-market challenges.

Lucd JedAI In Summary

Organizations were not fully prepared for the impact of today's crisis. But hospital management is now able to manage, staff, and plan for the likely surge of elective surgeries as the current crisis progresses. Retailers will be better able to manage new and emerging consumer behavior and interactions as the disease runs its natural course. Lucd JedAI Retail solutions will reduce cost, maximize revenue, and mitigate risk under the emerging rules and behaviors that govern the post-pandemic world. Give your team the tools it needs to properly prepare. The Avicenna event-driven simulation model is now available as a free trial for your business: simply download the Lucd JedAI client and try the Avicenna model to give your business the tools it needs to make the best decisions. Lucd is pioneering the creation of Enterprise AI with its end-to-end platform.

About this Software

The Lucd JedAI Client provides an immersive, easy-to-use experience that facilitates a collaborative approach to Visual Analytics (understanding data) and Exploratory Data Analysis (preparing and transforming data for analysis), and it is the primary mechanism for accomplishing these tasks through a secure interface into the Lucd Unified Data Space (UDS). The stunning 3D UI accesses the UDS remotely via a secure network connection (SSL/TLS).

Get the Lucd JedAI Client
  3. Welcome to the Lucd JedAI Client tutorials and resources. The Lucd JedAI Client is the result of years of R&D, built on top of a future-proof, state-of-the-art game platform that will scale across a wide array of devices, eventually including virtual reality interfaces. We hope you enjoy learning to use our client as much as we have enjoyed creating it. Mastering this interface is something you will want to accomplish as your organization fully engages in the data revolution upon us. With the Lucd JedAI Client and our peerless security and edge-to-edge ML/AI capabilities under the hood, we are confident you will accelerate the discovery of sustainable competitive advantages that lie dormant in the data your organization owns and generates today.

Video Tutorials in this Series:

1. Client Overview: an overview of all the features and functions
2. Installation, Registration and Login: everything you need to get started on your own system
3. Starting and Managing Projects: projects are the high-level containers you will use
4. Workspaces: weaving assets together
5. Data Sources and Visualization: it's all about the data
6. Query Builder: SQL without the SQL
7. Data Visualization Features: seeing the data
8. Data Transformation: ETL with the Lucd Client
9. Merging Data: creating Virtual Data Sets
10. Modeling: creating prediction machines
11. Model Profiling: monitoring and controlling model performance

Additional Resources

• Lucd JedAI Client User Guide
• Release Notes
  5. Basic Exploratory Data Analysis in the Lucd Client Exploratory Data Analysis (EDA) is a key part of any model-building lifecycle. In this post, we will create a basic EDA workflow in the Lucd client platform. We will use the Iris dataset so that anyone with the Lucd client can follow along. The Lucd client is available as a free download on Steam. For a video walkthrough of many of these and other capabilities, refer to the JedAI client video tutorials. Preparation To generate a basic query on the Iris dataset for EDA, within the Lucd client, either within a project or after selecting "Open Unallocated" from the main landing page: Select “Data” > "Query Builder" from the horizontal menu at the top of the screen From the "Sources" menu, click the "+" next to "Iris" to add it to the grid From the "Data Models" menu, click the "+" next to "flower" to add it to the grid Select the “Search” button to display query results Once satisfied with the query results, click "X" to close (not shown) Select the "Save" icon and enter a name and description for the query, then click "Save" - we used "Iris_Query_1" as the name in this example The system will save the query and automatically drop it on the workspace. EDA in the Lucd Client Once the query is visible in the workspace: Preview the query output by selecting the "Preview Data" (eye) icon below the magnifying glass (Optional) Click the maximize button in the top right corner of the Visualize window output (not shown) The table of the Iris query should now appear, similar in appearance to that shown in the image on the left. Table View The table view shows the fields available in the dataset – in this example: petal_length petal_width sepal_length sepal_width species Additionally, the table view shows: Other fields created upon consumption of the dataset into the Lucd client The federates that are supplying the data – in this case, only p1.lucd.ai (only relevant if accessing data across multiple systems in a Federated Machine Learning scenario) 3D Scatterplot The 3D scatterplot is arguably the most powerful visualization in the Lucd client, showing points in three-dimensional space. This visualization is particularly helpful for spotting groupings (labels or classes) and seeing how they can potentially be predicted by different input variables. To view the 3D scatterplot, select the “Scatterplot - 3D” link. The user can click and drag the scatterplot in any direction to view how the points fall in three-dimensional space from any desired perspective. Thick dots in the scatterplot represent points in three-dimensional space; thin gray dots represent points on two-dimensional planes corresponding to the intersection of two axes. As such, any single thick dot has multiple corresponding thin dots. The 3D scatterplot allows the user to change axis and color variables as desired. Simply use the dropdowns in the lower-right corner to do so. The view shown in the image on the left effectively clusters the different species of Irises. To generate that view, select the following: X Axis: petal_length Y Axis: petal_width Z Axis: sepal_length Color: species Helpful Hints: Set the Y-axis or Color to the label/class and test the X- and Z-axes using different values to visualize how they work individually and together to call out class differences. Selected axis variables can be categorical / discrete; they need not be continuous. In the adjacent image, the user has selected the Y-axis variable to be the species. 
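For readers who want to sanity-check the same view outside the client, here is a minimal Python sketch that approximates it (an illustration assuming scikit-learn and matplotlib, not part of the Lucd platform). It reproduces the petal_length / petal_width / sepal_length view colored by species:

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame  # the four measurements plus a numeric 'target' column

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(
    df["petal length (cm)"],  # X axis: petal_length
    df["petal width (cm)"],   # Y axis: petal_width
    df["sepal length (cm)"],  # Z axis: sepal_length
    c=df["target"],           # color: species
)
ax.set_xlabel("petal_length")
ax.set_ylabel("petal_width")
ax.set_zlabel("sepal_length")
plt.show()

Rotating this plot by hand gives a rough feel for the click-and-drag exploration the client provides natively.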
Note how the platform has assigned the species into three distinct groupings along the Y-axis. The visualization to the left clearly illustrates between-species differences, particularly with the one group in the lower-left corner, whose points are bunched at one end of the petal_length axis. Rotating the plot (dragging right-to-left) shows a similar bunching along the petal_width axis. Users of the Lucd platform can quickly test different combinations when searching for the best predictors using the 3D scatterplot. In the image on the left, the Z-axis is sepal_width; in the image on the right, it is petal_width - everything else is the same. Comparing the two, it appears that the sepal_width variable does not provide as much segmentation along its axis as petal_width does, a visual indicator that petal_width may indeed be a better classifier. Correlation Matrix The Correlation Matrix is another useful EDA tool. The plot allows the user to visualize correlations and/or identify covariates that may be useful for modeling. To view the Correlation Matrix, select the "Correlation Matrix" link within the Visualize window. The Correlation Matrix is based on the Pearson correlation statistic between numeric variables: bar heights are equal to the value of the correlation between the indicated pairings. An example for the Iris data is shown to the left. Highly positive correlations are shown as purple bars. The tallest bars – running diagonally across the plot – represent a Pearson correlation coefficient of +1. The Iris dataset contains many fields with high positive correlations, so spotting the diagonal of +1 bars is not as intuitive as it is with many other datasets. One simple way to gauge a bar's height is to align a variable on the X-axis with the corresponding variable on the Y-axis and examine the resulting bar's height. Using the power of the Lucd client's Unity-based engine, the user can also rotate the plot to the desired view to check bar heights. Clicking on a specific bar will toggle the display of the Pearson correlation statistic for the two variables, ranging from -1 (perfect negative correlation) to +1 (perfect positive correlation). Short bars can be meaningful in these plots. They may indicate strong negative correlations. Scanning for colors that differ most from the purple of the +1 diagonal, we can spot progressively contrasting colors where more disagreement exists, indicating strongly negative Pearson correlation coefficients. Examine, for instance, the relationship in the lower-left corner between sepal_width and petal_length. Negative correlations can be useful in model-building - sometimes more useful than their positive counterparts. Histogram Histograms are useful for analyzing the distribution of a numeric variable. To view the Histogram, select the "Histogram" link within the Visualize window. Select the variable for which you would like to see a histogram using the dropdown at the top of the window. The user can add histograms by clicking the “+” button in the lower-right portion of the window. Any plot can be closed by clicking the “x” at the top right. The histogram to the left shows that the petal_length variable has a bit of a gap somewhere around the 2.4 value – this could be an indicator of a potentially useful predictor for the species label. 
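The same correlation structure is easy to verify numerically. A minimal sketch with pandas, mirroring the earlier snippet (again an illustration, not the client's internals):

from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame

# Pearson correlation matrix over the numeric Iris fields.
corr = df.drop(columns="target").corr(method="pearson")
print(corr.round(2))

# The diagonal is +1 by definition. The most negative pairing in Iris -
# sepal width vs. petal length - is the numeric analogue of the
# contrasting-color bars described above.
print(corr.loc["sepal width (cm)", "petal length (cm)"])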
2D Scatterplot 2D scatterplots are useful for visualizing the relationship between numeric variables. Visualizing across a third dimension is also possible using the Color variable. To view 2D Scatterplots, select the "Scatterplot - 2D" link within the Visualize window. An example is shown in the image to the left. To change the selected variable, click on the variable along either the X- or Y-axis – or the Color variable on the right side of the chart – and choose the desired selection. The example shows that categorical variables can be selected – and are particularly useful for the Color selection. Clicking the filter button below the selector for the color bar variable allows the user to filter for specific items within the selected variable. The user can add scatterplots by clicking the “+” to the right of the existing plot. Any plot can be closed by clicking the “x” below the plot. Box Plot Box plots are useful for analyzing the distribution of numeric variables, particularly when the user is interested in a visual representation of where the data points fall within quartiles. To view a box plot, select the "Box Plot" link within the Visualize window. Select the desired variable using the dropdown at the top of the plot. The user can add box plots by clicking the “+” to the right of the existing plot. Any plot can be closed by clicking the “x” adjacent to the variable selection. The box plot to the left shows that the data in the petal_length variable are heavily concentrated between the median (corresponding to a value of 4.05 units) and the third quartile (5.1). The user can quickly see that this gap of roughly one unit is small compared to the adjacent lower quartile, which spans nearly three full units – another potentially useful predictor for the species label. Parallel Coordinate Plot Parallel coordinate plots are powerful EDA tools. To view a Parallel Coordinate plot, select the "Parallel Coordinate Plot" link within the Visualize window. Select desired plot variables using the dropdowns at the lower right of the window. Add variables to the plot by clicking the “+” button just above the variable selection boxes. Rearrange the order of items in the list by clicking and dragging the handles on the far right of the list of selected fields. The view can be reset using the button just to the left of the "+" button. The Parallel Coordinate plot to the left shows the species field on the far right. Clearly, the three different species are separated and easily distinguishable – and categorical variables such as species can be useful, particularly at one end of a plot. From there, it is helpful to look for patterns in slopes, or for groups that clearly lie in different sections of the visual than others. We can see clearly that the versicolor species is generally in the middle of the petal_width and petal_length distributions, while the setosa species is at the bottom of both. The other species (virginica) – though not identified on the plot per se – typically has the highest petal_length and petal_width. So both these factors may be useful in a prediction model. Conversely, if we had seen these points all overlapping with inconsistent slopes from panel to panel of the plot, we would conclude that these variables would not be good species predictors. It's often helpful to rearrange the position of the class variable ("species" in this example) to more directly examine fields that correlate with it.
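Both of these views can be cross-checked in a few lines of Python. A short sketch under the same assumptions as the snippets above (exact quartile values may differ slightly from the client screenshots, depending on the data copy):

import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame

# Quartiles of petal_length, mirroring the box-plot reading above.
print(df["petal length (cm)"].quantile([0.25, 0.5, 0.75]))

# Parallel coordinate plot with species as the class column.
pc = df.copy()
pc["species"] = pc["target"].map(dict(enumerate(iris.target_names)))
parallel_coordinates(pc.drop(columns="target"), class_column="species")
plt.show()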
  6. Model Profiling, Maintenance and Explainability Creating a model is one thing. Measuring model performance is something else altogether. Profiling model performance is necessary during model creation, which is essentially an experimental process. But once placed into production, the performance of a model can decay over time as data changes. And just as important is our ability to explain why models behave as they do -- how did my model arrive at the specific prediction it did, given the data? Accommodating both the creation process and the maintenance phase, the Lucd Client gives us the visibility into model metrics needed to easily profile model performance. We can know if and when we've hit thresholds of acceptability, or if we need to keep refining data and training. By the same token, we can know when it's time to replace a production model with something a little more resilient. Something all of us in the industry are faced with is the need to explain how and why models made their predictions. Model Review Boards within large organizations are becoming more common. The Lucd Client Model Profiling interfaces make it much easier to provide the documentation and assurances such bodies require, simplifying review and regulatory reporting considerably. The video covers how to analyze and profile your models both during and after training.
  7. Merging Data In the last video we covered the ETL/ELT capabilities of the Lucd Client. But that was just the beginning. It's a given that a rich set of data transformation capabilities be included in any enterprise-worthy platform. Extracting data from a litany of sources, transforming data to better fit downstream analytics, loading data into long-term storage, extracting again, transforming again, reloading as needed -- those general capabilities are certainly enabled by the Lucd Client and underlying platform. But the real strength of our work isn't just our data transformation capabilities -- it's what we can do with an ensemble of transformed data once we have it. Virtual Data Sets (VDSs) are what you really want when it comes to exploring the potential energy inherent in your data. Virtual Data Sets A Virtual Data Set, or VDS, is an in-memory result of a set of queries, extractions, transformations, or workflows. In essence, a VDS is a fast, in-memory data structure containing the combined results of all the ETL/ELT work that came before it. The really great thing about ETL/ELT is that we can ingest and modify data to better fit downstream analytics purposes. But what if we want to use multiple data sources? Multiple queries? What if, by combining the data transformation results of several queries, we can create data workflows that quickly provide insights not possible without multiple intermediate storage steps and weeks of coding and testing? That's the real benefit of the VDS implementation we provide: fast, efficient, cleansed, transformed, combinatorial data sets -- in memory. From oil in the ground to gas in the tank. From wind on the sea to electricity in your socket. That's what a VDS can do. In this video of the Lucd client tutorial series, we cover data merging and discuss capabilities and features of VDSs.
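To make the idea concrete, here is a minimal sketch of the concept -- not the Lucd implementation -- using Dask, the dataframe library referenced later in the EDA documentation. Two hypothetical, already-transformed query results are merged into one in-memory dataset:

import dask.dataframe as dd
import pandas as pd

# Two hypothetical query results, already cleansed and transformed.
orders = dd.from_pandas(
    pd.DataFrame({"customer_id": [1, 2, 3], "total": [9.5, 20.0, 7.25]}),
    npartitions=1,
)
profiles = dd.from_pandas(
    pd.DataFrame({"customer_id": [1, 2, 3], "segment": ["a", "b", "a"]}),
    npartitions=1,
)

# The "virtual data set" idea: a merged, in-memory result that combines
# the upstream transformations, ready for downstream modeling.
vds = orders.merge(profiles, on="customer_id")
print(vds.compute())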
NEXT CHAPTER: MODELING
  8. Getting Started In the previous chapter we discussed a few design principles behind the implementation of FML and the backend architecture of the Lucd platform. What you see, however, is the Lucd Client -- the application that resides on your laptop or tablet, that allows you to create virtual data sets, train and deploy models, and weave together knowledge hidden in a wide array of enterprise data silos in a secure and cost-effective manner. The Lucd Client was envisioned as a future-proof solution for human interaction, working with today's devices and ready for tomorrow's innovations. If the Lucd Client looks different from what you may be used to in an enterprise software application, there's a reason for that: we designed it to be different. We wanted it to be scalable, just as our back-end architecture is scalable. But for the client, that means device flexibility as well as temporal scalability -- an interface that will work well on today's wide array of devices and be just as cool and awesome on future devices. Today you're reading this on your laptop or tablet or desktop screen. Maybe you're on a smartphone. But soon, large, wall-sized monitors will be needed to represent and contain the explosion of data sources, silos, and collections in the enterprise ecosystem. Soon, 3D, holodeck-like experiences will migrate to enterprise applications out of necessity. We recognize and welcome this arc of innovation from which we all benefit, and at Lucd we have created a client interface that will scale over time to meet the needs of the enterprise ecosystem today and tomorrow. Unity and Steam Built on the Unity gaming platform and deployed on the Steam Community gaming hub, the Lucd Client gives users a modern, familiar yet futuristic experience that will carry over to future devices and even immersive experiences. The game-like interface of the Lucd Client is not a nice-to-have; as an investment in the future, it's a must-have. Always improving, we're proud of the breakthroughs and innovations we have captured in the client, and we are confident it will play a key role in removing barriers to adoption that might be keeping many of your own enterprise users from becoming ML/AI contributors in a meaningful way. Downloading the Lucd Client 1. Download Steam from here. 2. Set up a Steam account. This can be done by running Steam on your machine and selecting “Create A New Account.” 3. Once Steam is set up, log in and open the “Store.” 4. Search for “Lucd.” There should only be one result. 5. Select “Install Game” and your download should start. 6. Go to your “Library” (next to Store), where you can launch the Lucd Client. Once the Lucd Client is installed on your system, you are ready to log in. Here is a quick video that shows how to log in to the application, explains what the Domain field is, and covers registration for new users. NEXT CHAPTER: PROJECTS
  9. The Importance of Data Visualization Human brains are said to process images as much as 60,000 times faster than text. So data visualization matters for how we humans learn. We humans are pattern seers. We see faces in clouds. We see symmetry and beauty and patterns in nature. And we're hard-wired to see patterns a whole lot faster than we can process text or manipulate numbers. Machines don't need data visualization to learn from data. We humans do. But we need machines that can help us produce visually consumable data, so that we might learn better from our machines. Data visualization is the representation of data or information in a graph, chart, or other visual image. Visualizing data communicates relationships in data using images. This is so important, especially today, because it allows trends and patterns to be more easily seen. With the rise of Big Data and the daily generation of mountains of new data, we need to be able to understand and interpret increasingly larger batches of data. ML/AI makes it easier to digest, analyze and predict, but if and only if we can represent all that data in succinct images. Data visualization is not only important for data scientists and data analysts; a working grasp of it is valuable in nearly any career. Whether you work in finance, marketing, tech, design, or any role in the modern organization, you need to visualize data. That fact underscores the importance of data visualization and points to one of the key differentiators of the Lucd Client. In the Lucd Client we have included some of the most fundamental visualization techniques used in data science today, baked in to provide fast and efficient understanding of patterns in data without having to write a line of code. In this video of the Lucd client tutorial series, we cover the available data visualization types and what they offer. We also cover how to make basic adjustments within these visualizations. NEXT CHAPTER: DATA TRANSFORMATION
  10. First, A Project You downloaded and installed the Lucd Client on your local system. You logged in. You're ready to dive in. Everything we do with the platform starts with a Project. Projects are the high-level containers which allow us to create and manage various assets (such as data, virtual data sets, workflows, data transformations and models) in order to achieve the results we desire. The Lucd Client comes with several example projects to help get you started. You'll want to know how to navigate through Projects and underlying assets from the Projects screen, which is the first screen you see when you log in. In this video we go over the Projects screen and the associated controls in that view. NEXT CHAPTER: WORKSPACES
  11. Workspaces The various assets that comprise a Project need to be well-managed and woven together so that complex combinations of ingredients can be easily and securely harnessed in a useful application. In today's enterprise, islands of data (let's call them Silos) exist, often separated by distance, governance policies, storage policies, encryption, security considerations, varying international regulations and intermittent network failures. Data silos have been both a requirement and a growing problem in enterprise IT implementations for decades, dating back to the early mainframe era. While more modern innovations such as data warehouses, Big Data, and cloud storage have, to some degree, softened the hard walls of many silos, large organizations still wrestle with the need for secrecy and, at the same time, the need for transparency. It is in this context -- the transformation of siloed data into secure ML/AI assets -- that the concept of Lucd Workspaces should be considered. In this video we explain the Workspace features of the Lucd Client and how to manage assets. NEXT CHAPTER: DATA SOURCES
  12. Data Sources and Visualization The importance of data in the most recent years of the Network Age cannot be overstated. Data drives everything: every innovation, every decision, every productivity increase, quality improvement or disruptive discovery is rooted in the data we can usefully gather, distill and assemble. The more an organization can collect and make use of the potential energy inherent in its data, the more that organization will thrive. Securing, managing, and clearly understanding the vast quantities of data each organization owns is a daunting problem. And only then does the opportunity to find value in that data become possible. The key to fully understanding data is our ability to visualize the story data can tell. The Lucd Client gives enterprise users the ability to manage hundreds of data sources from Silos all over the world. Add to that the immediate, baked-in visualization tools the platform provides, and the specific value of the Lucd Client, simply for gathering and visualizing data, is apparent. But that's just one small part of what we do. In this video, we go over the Lucd Client tools for managing data sources. We also cover some of the data visualizations of those sources. These are all methods that can help you keep track of your data usage and troubleshoot problems. NEXT CHAPTER: QUERY BUILDER
  13. The Lucd Client Query Builder What is a query? In human communication, a query is a request for information. In computer programming, it refers to something similar, except the information is retrieved from a database. In other words, a database query is a request for data from one or more databases. Writing a query, however, requires pre-defined syntax that the database can understand -- a query language. Structured Query Language, or SQL, has been around since the early 1970s. Typically SQL is used to query structured data sources. But in the last decade significant progress has been made toward integrating unstructured data with structured data in order to enrich applications. Structured data typically means data stored in a traditional relational database. Unstructured data is pretty much everything else. The vast majority of data in the world is unstructured -- many experts peg that number at between 80 and 90%. In other words, MOST of the data in your organization is not subject to traditional SQL queries. Although much progress has been made in the last decade bringing unstructured data into the more useful realm of structured data (think schema on read), that still leaves us with the requirement to employ highly skilled DBAs to master the complexities of SQL over large and diverse datasets. SQL is a great tool for a database or two or three. But try creating a SQL query over a hundred different databases at a dozen different locations, merged with another hundred unstructured data sources, and the limits of SQL quickly become apparent. And that sort of scenario is precisely the universe of data we would hope to harness in our intelligent applications. Hence: Query Builder In this tutorial video, we go over the Query Builder. We cover a few different query examples with increasing complexity. We also cover all of the informational displays and interfaces in this workspace. NEXT CHAPTER: DATA VISUALIZATION
  14. Data Transformation In computing, extract, transform, load (ETL) is the common practice of copying data from one or more sources into a destination system which represents the data differently from the source(s), or in a different context than the source(s). ETL processes became a popular concept in the 1970s, when data warehousing was becoming an important choice for organizations facing ever-increasing amounts of data. Even then, the problems and opportunities that accompany the generation of digital information were becoming paramount. With the advent of Big Data in the first decade of the 21st century, innovations like schema-on-read changed things up a bit. While the order of operations may have changed in some instances -- from ETL to ELT -- the basic concepts remain. Data extraction involves extracting data from homogeneous or heterogeneous sources. Data transformation processes data by cleansing it and transforming it into a proper storage format or structure for the purposes of querying and analysis. And finally, data loading describes the insertion of data into the final target database or other storage format, such as an operational data store, a data mart, a data lake or a data warehouse. ETL and/or ELT are essential capabilities for any organization that would take full advantage of the potential energy inherent in its data. As such, any ML/AI platform employed by the enterprise must include a rich set of prepackaged, easy-to-use, workflow-ready components that allow even non-technical users to perform those essential functions. With the Lucd Client, we've got you covered. This video is an introduction to the wide array of ETL/ELT capabilities we offer in the platform.
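As a toy illustration of the extract/transform/load pattern just described -- a generic sketch, not the Lucd pipeline; the file and column names are hypothetical:

import pandas as pd

# Extract: pull raw records from a source (a CSV file here).
raw = pd.read_csv("orders_raw.csv")

# Transform: cleanse and reshape for downstream analysis.
clean = (
    raw.dropna(subset=["customer_id"])  # drop unusable rows
       .assign(total=lambda d: d["quantity"] * d["unit_price"])
       .astype({"customer_id": "int64"})
)

# Load: write to the target store (a Parquet file here).
clean.to_parquet("orders_clean.parquet", index=False)

In an ELT arrangement, the load step would come first and the transformation would run inside the destination store; the ingredients are the same.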
NEXT CHAPTER: MERGING DATA
  15. Modeling Everything we have covered to this point has been pointing to this: models. The whole point of what we're doing is to harness the potential energy inherent in our data to create models that will accurately predict. Models are prediction machines. The entire thesis of ML/AI is this one salient fact: when we accurately predict, we efficiently move forward. What is ML/AI if not prediction machines? Granted, we human beings have had the ability to make fact-grounded predictions for a long time. Since the dawn of the Scientific Revolution in the early 17th century, we have learned how to gather information, and from that information reason our way through to conclusions regarding cause and effect. And from that foundation, make predictions. Our ability to make predictions has grown linearly since those humble beginnings, when Galileo had the temerity to suggest the earth was not the center of the universe. But with ML/AI, our ability to predict is no longer on a linear growth path -- it is entirely disruptive because, relative to previous eras and methods of prediction, it is orders of magnitude cheaper today than it was just a few years ago. And with that lower cost threshold for reasonable predictions based on information, the extraordinary potential of the potential energy inherent in our data becomes clear: Prediction is at the heart of making decisions under uncertainty. All organizations are riddled with such decisions. Prediction tools increase productivity -- operating machines, handling documents, communicating with stakeholders. Uncertainty constrains strategy. Better prediction creates opportunities for new business structures and strategies to improve and compete. Open Source Software Today open source software is available for anyone with a computer and Internet connection to download and use. Data itself is increasingly becoming an easy-to-acquire commodity. Notice we didn't use the word 'free' when describing these innovations. While there may be no- or low-cost options when it comes to applications and even data, there is nothing free about it. One way or another, there are costs involved -- hiring and/or training skilled programmers, data engineers, machine learning specialists, or analysts. No matter what set of tools you decide to embrace, there will always be costs. Of course it makes sense to leverage open source software when appropriate, but if and only if it makes sense in the context of your organization. Are you prepared to take full responsibility for the software you leverage? Are you ready to hire and maintain a crew of programmers and managers skilled in the nuances of the application in question? Or might you be better served focusing on matters more core to your mission and strategy, rather than the distraction of tools that may or may not be pertinent a few years down the road? Many organizations have made the mistake of thinking that open source software, by itself, can solve their problems. Nothing could be further from the truth. We know. How? Because we leverage open source software ourselves! And we are well aware of the organizational commitment it takes to integrate, maintain, and extend such tools in the broader scheme of things for our own organization. There is no organization today that does not use open source software in some way. Whether they know it or not, somewhere in their vast array of digital servers and devices, there will be some set of open source software. 
The Internet would not exist were it not for open source. But that's not to say it is free. Your Data Your Model Your Innovation What is your mission? Why do you exist? The answers to those questions will inform the decisions you make and the predictions you value. Where does your competitive advantage reside in a world where technology is distributed worldwide? It's not your technology, unless you are an inventor of technology. Your people? Maybe. But people are not permanent assets. People are temporal. Your buildings? Your Rolodex? None of those can or will provide a sustainable competitive advantage. But you do have something that will: your data. The data you own -- data no one else in the world can access without your permission -- is the key foundation upon which you will build an advantage over competitors that will never abate. From your data, models you own will be built. Predictions from those models that benefit you will be the result. And from there, the innovations you will inherit are actually unpredictable -- but the potential energy inherent in your data is far more than you can even imagine today. In this video of the Lucd Client tutorial series, we show you how to create and train your models in the Lucd client. NEXT CHAPTER: MODEL PROFILING
  16. The Lucd platform enables the user to perform model performance analysis. Easy-to-use tools allow the user to view and compare results from before and after training against a selected model. The platform also enables tracking of the critical governance and explainability information associated with the process.
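For intuition about what such an analysis involves, here is a minimal, generic sketch -- using scikit-learn purely for illustration, not as a statement about the platform's internals -- computing the kind of accuracy metric and confusion matrix the client surfaces in its Model Profile views:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
pred = model.predict(X_test)

# Headline metric plus the confusion matrix behind it.
print("accuracy:", accuracy_score(y_test, pred))
print(confusion_matrix(y_test, pred))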
  17. The Lucd platform provides necessary, easy-to-use tools for dataset query and data visualization. Query and visualization are important initial steps when preparing data for machine learning, and Lucd makes them a snap.
  18. Lucd enables users to transform large-scale data and provides an easy-to-use method for creating a "virtual dataset" from the data selected for training. This transformed data can then be used to train the required models.
  19. The Lucd JedAI Client is downloaded locally on your device and interfaces with the Lucd Platform. The client enables users to visualize, transform, and prepare data for use in modeling frameworks (TensorFlow, PyTorch, etc.). Models can be uploaded and trained in the platform, which is touchscreen capable (not required). System Requirements The following specifications are required in order to run the client. Windows, Linux or MacOS 4 GB Memory Modern CPU Recommended Configuration Although not required, we recommend the following specifications in order to maximize the performance of the client. A GPU to support accelerated rendering 1600x900 display resolution minimum Installation Instructions The client is distributed via Lucd's Steam Store. A user is required to obtain a Steam account in order to access the client download. Usage Instructions Login Log in to the client using the credentials provided to you. Username Password Domain Cloud customers will leave the domain field blank when logging in. Private build customers will be provided a domain to use when logging in. Login Exit Application Navigation After successful authentication, the user is brought to the Home screen. The buttons along the left edge navigate to other 2D screen overlays. The buttons in the right corner manipulate camera perspective and visualization behavior. Home Data Modeling Assets Governance Epidemiology Collapse Sidebar Options Logout Reset Perspective Home The Home screen has numerous features. The primary feature of the Home screen is the Sources histogram, displaying the ingested record sources, the number of records per source, and date/time information relating to the ingested records. List of currently visible ingested sources Source Histogram of ingestion timeline for each visible source The actual data of the records is not displayed in the histogram. When browsing the sources histogram, the Lucd JedAI Client makes it easy to drill down on a time range of ingestion across all sources, down to the hour. Click-and-Drag date filter To narrow the range of shown data, click and drag the date filters to adjust the window of time. Source Toggles Sometimes, a source may have had so much data ingested at a single time that it skews the histogram display scaling. In these cases, another useful function of the Home screen is the ability to hide that specific source from the histogram display by clicking its axis label. At this point, the chart will automatically re-scale the remaining visible data in the histogram, allowing a better, proportional chart display. Selectable bars To expand a single unit of time on the graph (e.g. see a single year of data), click on a bar to zoom in across all sources and change the axis scale to that unit of time. The Lucd JedAI Client allows scaling down to the day, so that a 24-hour period can be seen across all sources on the histogram. Active filters Data & Visualization The Data and Visualization screen is where users will query, visualize, perform Exploratory Data Analysis (EDA) functions, and transform the dataset into a Virtual Data Set (VDS) to be used with the machine learning model. The screen will initially open with a blank panel on the right and the Query option selected on the left. Query Data Query tab To execute a query, begin by navigating to the Query tab. The Lucd JedAI Client provides four ways of querying data: Sources, Facets, Keywords / Dates, and Concepts. These can be combined to get a very specific result set. 
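As the note under Execute Search below explains, selected sources are OR'd together and facet filters are AND'd together inside a parent 'must' or 'must_not' block. A rough sketch of what such a combined query body could look like -- an illustrative Elasticsearch-style structure written as a Python dict, not the client's actual wire format, with hypothetical source and facet names:

query = {
    "bool": {
        "must": [
            # Sources are OR'd together in their own sub-block...
            {"bool": {"should": [
                {"term": {"source": "iris"}},
                {"term": {"source": "flowers_2020"}},
            ]}},
            # ...while facet filters are AND'd together.
            {"bool": {"must": [
                {"range": {"petal_length": {"gte": 1.0}}},
                {"term": {"species": "setosa"}},
            ]}},
        ]
    }
}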
Data Sources To narrow which sources are queried, navigate to Sources and select the boxes of the desired sources. By default, none are selected, and so all sources will be queried. Facets filters To filter by data facet, navigate to Facets, select the drop-down of a data model, then click “Add Filter” next to the desired facet. Selected facet filters will show up below the available facets list. To remove a facet filter, click its red X button. Multiple filters on the same facet are possible. Keyword/Dates filter To search by keyword/date-time range, navigate to the Keywords/Dates tab and enter values in the desired fields. Concepts filter To search by concepts, navigate to the Concepts tab and enter a keyword into the first input field. Optionally, specify a similarity threshold in the second input field. Acceptable values range from 0 to 1. A list of concepts will display below the input field. Select one to see similar concepts in a list below the threshold input. Lucene Query Lucene queries can be run directly from the client. Execute Search Once query parameters have been specified, click search to see a basic table of the resulting dataset. Reset Parameters Click this button to reset search parameters. Note: Sources (2) are OR'd together and Facets (3) are AND'd together in their own sub-blocks before being combined with other parameters inside a parent 'must' or 'must_not' block of the query. Visualizations Visualize The Visualize tab provides numerous ways to view your data. Options To load a visualization, select it from the list. Table To see a table structure of all fields in the query results, use Table. This displays all fields of each record. To see more in-depth detail about a record, select it from the table. Scatterplot To see a structured plot of the query results, use Scatterplot. This can display numerical and categorical data in an interactive plot. Drag the plot to rotate it, and scroll/pinch to zoom. If you are on a non-touchscreen device, use the mouse scroll wheel or the keyboard shortcut for zooming. Data Points The red orbs are the data points in space. Projection The blue squares exist on all six walls of the plot and represent a projection of all data points on a given 2D plane. Each point can be selected to see the x, y, and z values of that point. Dialog Box Click on the dialog box to close it. Zoom Select the zoom button to focus on that point and see more of its facets. Numerical Range Filter Numerical fields can have a range applied to them by dragging the filter handles. Recenter To re-center the plot, select Recenter. Reset Filters To remove categorical filters, select Reset. Rotate To toggle the graph rotation, select Rotate. Axis Feature The x, y, and z fields can be changed by changing the value in their respective drop-downs. Categorical Filter Categorical data can be filtered by selecting Filter and toggling the desired values. Submit Click Submit to apply changes. Parallel Coordinate Plot To see trends in features across a result set, use a Parallel Coordinate Plot. Each line from end to end represents a record. Drag to rotate, and move along the length of the plot by holding shift and dragging or by dragging with two fingers. Field Each blue plane represents a field. Field minimum Field maximum Recenter To re-center the plot, click the Recenter button. Add Field To add an additional field, click the [+] button. Field Select Each field can be changed via its drop-down. Display Order A field can be moved up or down in its display order. 
Remove Field Histogram To see how data is distributed across values, use a Histogram. Add Histogram To add an additional field histogram, select the [+] button. At this time, only numerical fields will be automatically added. The collection of charts can be scrolled across by dragging. Field Select To change a chart's field, select it from the drop-down. Remove To remove the chart, select the [X] button. Filter Each chart can be filtered by dragging the yellow handles. These filters will be applied across all open charts. The new maximum and minimum will be displayed below. Box Plot To see the statistical distribution across values, use a Box Plot. Add Box Plot To add an additional field box plot, select the [+] button. At this time, only numerical fields will be automatically added. The collection of charts can be scrolled across by dragging. Field Select To change a chart's field, select it from the drop-down. Remove To remove the chart, select the red [X] button. 2D Scatterplot To see the values distributed on an XY plane, use a 2D Scatterplot. Add Scatterplot To add an additional scatterplot, select the blue [+] button. The collection of charts can be scrolled across by dragging. Field Select To change a chart's field, select it from the drop-down. Remove To remove the chart, select the red [X] button. Pearson Correlation To see how each field relates to all the other fields, use a Correlation Matrix. Only numerical fields are displayed. The matrix can be rotated by dragging. Each bar is scaled on its y axis according to how its two contributing fields relate, on a scale of -1 (red) to 1 (blue). Select a bar to see more information about it. Select it again to hide the details. Exploratory Data Analysis The Exploratory Data Analysis (EDA) tab is where data can be transformed and shaped before it is used to train a model. Once a query is run, its results can be shaped and filled using EDA. Create Tree To begin EDA on the most recent search, select the floppy disk icon. Existing Trees Saved searches for EDA will appear in the scroll view. Tabletop Once a saved search has been selected, it will show up on the tabletop. Operations EDA operations that have been added to the saved search will show up as white nodes. Menu Clicking a node will bring up a menu of available options for that node. Statistics To see overview statistics on a selected node, choose it from the dropdown. Save VDS When saving a Virtual Dataset, complete the creation process by entering a name and description, selecting features to include, and indicating whether the data should be persisted. New Op The Lucd JedAI Client provides flexible options for data transformation without having to leave the GUI. Operation Type When adding an operation to a saved search during EDA, choose between standard operations like Fill/Filter/Replace, NLP operations, custom defined operations, and image-specific operations. Operation Selection Select the desired operation from the dropdown. Operation Parameters Parameters must be specified before saving an operation. Preparing Text Data for Model Training Lucd provides special operations for easily preparing text data for model training, saving a model developer valuable time in manually coding routines for text transformation. After creating an EDA tree based on a query of a text data source, a developer can add a new operation to the tree based on NLP operations as shown above. 
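The NLP operations described next -- stopword removal, lemmatization, and tokenization at the document or sentence level -- correspond to steps that, hand-coded, might look roughly like the following sketch. NLTK is used here purely for illustration; it is an assumption, not a statement about what the platform uses internally:

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import sent_tokenize, word_tokenize

nltk.download("punkt")
nltk.download("stopwords")
nltk.download("wordnet")

text = "The petals were measured. The sepals were measured too."
stop = set(stopwords.words("english"))
lemma = WordNetLemmatizer()

# Document-level tokenization: one token sequence for the whole value.
doc_tokens = [lemma.lemmatize(w.lower())
              for w in word_tokenize(text)
              if w.isalpha() and w.lower() not in stop]

# Sentence-level tokenization: one token sequence per sentence.
sent_tokens = [[lemma.lemmatize(w.lower())
                for w in word_tokenize(s)
                if w.isalpha() and w.lower() not in stop]
               for s in sent_tokenize(text)]

print(doc_tokens)
print(sent_tokens)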
NLP operations (e.g., stopword removal, whitespace removal, lemmatization) can be applied in any sequence. It's important to select the correct facet as the “text attribute.” One can also elect to apply tokenization at the document level (i.e., create one sequence of tokens for the entire facet value per record) or the sentence level (i.e., create a token sequence per sentence in the facet for a record). Saving VDS with Processed Text When a developer wants to create a new virtual dataset including the transformed text data, they must choose the “processed_text” facet as the sole feature of the virtual dataset, as shown below. Currently, Lucd does not support text model training incorporating multiple feature columns; only the “processed_text” facet should be selected. Applying Custom Operations Once custom operations have been defined and uploaded using the Lucd Python Client library, they are available in the GUI for use in data transformation. As shown above, clicking on a custom operation will show further details, specifically the features the operation uses as well as the actual source code defining the op. As mentioned in the documentation for defining custom operations via the Lucd Python Client, one must select how to apply the operation based on one of the following three Dask dataframe approaches: apply map_partitions applymap Image Workflows The Lucd framework supports image-based workflows. Binary image data contained within fields of a record will automatically be rendered in the 3D client. The images below are from the Stanford Dogs dataset. Applying Image Operations To apply image operations, select the Image Ops tab within the New Op menu in an EDA tree. It's important to select an image facet as the “Feature.” The currently provided operations are as follows: Vertical and horizontal flips Grayscale Contrast normalization Normalize (0 mean and unit variance) Resize width & height Color inversion Crop borders Gaussian blur Rotate Min-max scaling To array (converts binary data to Numpy Array) Reshape dimensions Note: Operations can be applied to a percentage of a dataset instead of the entirety, and can also be used to augment existing data instead of operating in place.
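Before moving on to modeling, here is a minimal sketch of what a custom operation amounts to under the Dask approaches named above. The function and column names are hypothetical, chosen only to illustrate the map_partitions style:

import dask.dataframe as dd
import pandas as pd

df = dd.from_pandas(
    pd.DataFrame({"petal_length": [1.4, 4.7, 5.9]}), npartitions=1
)

# A custom op written against whole partitions (the map_partitions approach).
def add_petal_length_mm(part: pd.DataFrame) -> pd.DataFrame:
    part = part.copy()
    part["petal_length_mm"] = part["petal_length"] * 10  # cm -> mm
    return part

result = df.map_partitions(add_petal_length_mm)

# The same idea element-wise (applymap) or per-row/column (apply)
# simply swaps the application method.
print(result.compute())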
Image Workflows
The Lucd framework supports image-based workflows. Binary image data contained within fields of a record will automatically be rendered in the 3D client. The images below are from the Stanford Dogs dataset.

Applying Image Operations
To apply image operations, select the Image Ops tab within the New Op menu in an EDA tree. It is important to select an image facet as the “Feature.” The currently provided operations are as follows (a sketch of a few of these appears after this list):
Vertical and horizontal flips
Grayscale
Contrast normalization
Normalize (0 mean and unit variance)
Resize width & height
Color inversion
Crop borders
Gaussian blur
Rotate
Min-max scaling
To array (converts binary data to a NumPy array)
Reshape dimensions
Note: operations can be applied to a percentage of a dataset instead of the entirety, and can also be used to augment existing data instead of operating in place.
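To make a few of these operations concrete, the following is a minimal NumPy/Pillow sketch approximating the “to array,” grayscale, normalize, and min-max scaling steps. It is illustrative only, not the platform's actual implementation, and the file name is hypothetical:

```python
import numpy as np
from PIL import Image

# "To array": decode binary image data into a NumPy array of shape (H, W, 3).
img = np.asarray(Image.open("dog.jpg").convert("RGB"), dtype=np.float32)

# Grayscale: weighted channel average (ITU-R BT.601 luma coefficients).
gray = img @ np.array([0.299, 0.587, 0.114], dtype=np.float32)

# Normalize: zero mean and unit variance.
normalized = (gray - gray.mean()) / gray.std()

# Min-max scaling: rescale values into [0, 1].
scaled = (gray - gray.min()) / (gray.max() - gray.min())

print(img.shape, gray.shape, round(float(normalized.mean()), 6), scaled.min(), scaled.max())
```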
Modeling
The Lucd JedAI Client provides an intuitive and practical dashboard for data science/machine learning modeling.
View Select – On the Modeling screen, review available model definitions by selecting that option from the dropdown.
Model Upload – Button to upload new Python model files (TensorFlow, PyTorch, XGBoost, etc.).
Refresh – Select to retrieve model statuses from the backend and refresh the GUI.
Existing Model Definitions – Model definitions are displayed in the center.
Status Indicator Lights – Each model will indicate whether it has models in training, training complete, or errors.
Filters – Display only models matching the selected filters (TensorFlow, XGBoost, Classification, etc.).
Group/Sort – Drop-down boxes for defining definition grouping and sorting.
Distribution – Model library and type distribution can be seen at the bottom.
Model Details – The currently selected model's details can be seen on the right.
Train – To begin training the selected model, click “START TRAINING”.
Performance – See all training runs for a selected model by viewing the performance analysis.

Start Training
Training runs require the selection of a VDS and the specification of parameters/assets.
Asset – To set up a training run, begin by selecting an Asset to include, if any.
Virtual Dataset – Choose an existing VDS to train against.
Parameters – Set the parameters for the training run.

Trained Models
Trained models can also be inspected within the dashboard.
View Select – To review a training run, first select “Trained Models” from the dropdown.
Training Runs – Select a training run and view its details. The current status of the run is designated by the colored corner of the list item.
Training Artifact Files – Download run artifacts.
Model Profile View – See more metrics about the training run in real time.
Governance – Submit for governance approval.
Stop/Restart – Pause a model and restart it. Can also be used to begin a new run after a training run has completed.

Model Profile: Performance
Trained model performance can be viewed more closely in real time.
View Performance – To view training performance, first select the “Performance” tab.
Update Interval – Number of seconds between plot updates (default 10). The number of points displayed is limited to 1000 to keep updates consistent.
Available Plots Selector – Dropdown showing the available plots to display.
Selected Plot Description – Hover to view the user-input description for the plot.
Plot Legend – Color-codes individual lines in a plot for easy recognition.
Plot Line Toggles – Turn lines on or off; adjusts the axes as well.

Model Profile: Confusion Matrix
Trained models can generate confusion matrices for analysis.
View Confusion Matrix – To view a confusion matrix, first select the “Confusion Matrix” tab.
Interactable Square – Click a square to see details about actual and predicted values. Values are displayed in a square only if greater than 0.
Percentage Toggle – Display prediction percentages across a row's values.

Model Profile: Confusion Matrix Details
Confusion matrix boxes, when clicked, show more details about the values.
Predicted Value – Click any orange histogram bar to see sample predictions.
Sample Predictions – A table of sample predictions from a selected histogram bar.

Model Profile: Predict
Models with type tabular_classification or regression can predict output based on JSON input.
View Predictions – To view predictions, first select the “Predict” tab.
Enter JSON Input – Enter values to predict on; must be valid JSON.
Click Predict – Run the predict operation.
View Results in JSON – Output is formatted JSON.
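As an illustration of what such input might look like, here is a hedged sketch of building a JSON payload in Python. The feature names are hypothetical and would need to match the features of the VDS the model was trained on; the exact schema the Predict tab expects may differ:

```python
import json

# Hypothetical tabular records; keys must match the trained model's features.
records = [
    {"age": 42, "income": 71_000, "score": 0.78},
    {"age": 28, "income": 39_000, "score": 0.44},
]

# The Predict tab requires valid JSON; per the release notes below,
# multiple records may be submitted for prediction in one request.
payload = json.dumps(records, indent=2)
print(payload)
```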
Model Profile: Explain - Tabular/Regression
Models with type tabular_classification or regression can explain tabular data predictions.
View Explainability – To view explainability, first select the “Explainability” tab.
Enter JSON Input – Enter values to predict on; must be valid JSON, as shown above.
% of Training Data Used to Build Explainer – The percentage of training data used to build the explainer. Must be greater than 0 and less than or equal to 1.
Number of Top Explanations – A positive integer denoting how many class explanations to show.
Inputs and Corresponding Features – Colored to show how each influences the top class prediction.
Class Probabilities – Class predictions and their corresponding likelihoods.
Explanation – How each input influences a positive or negative prediction.

Model Profile: Explain - Images
Models with type image_classification can explain image predictions.
View Explainability – To view explainability, first select the “Explainability” tab.
Sample Image – Select a local image to explain.
Positive Only – If true, include only regions of the image contributing to the predicted label.
Hide Rest – If true, make the non-explanation part of the returned image gray.
Class Probabilities – Class predictions and their corresponding likelihoods.
Colorized Explained Image – The returned image, with shaded regions of positive and negative influence.

Model Profile: Explain - Text
Models with type text_classification can explain text predictions.
View Explainability – To view explainability, first select the “Explainability” tab.
Input Text – The text the user would like to predict and explain.
Predicted Probabilities – The class probabilities predicted.
Explanation – Words that contribute to a positive or negative correlation.
Output Text – The output text, with contributing words highlighted in positive (orange) or negative (blue) colors.
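Explainability in Lucd is supported by integration of the Lime framework (noted in the Governance section below). As a hedged, self-contained sketch of the underlying idea, the following trains a toy sentiment classifier and asks Lime to weight the words of an input text; the corpus and model are hypothetical stand-ins for a model trained in the platform:

```python
from lime.lime_text import LimeTextExplainer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny hypothetical sentiment corpus standing in for a platform-trained model.
texts = ["great movie, loved it", "wonderful acting", "terrible plot",
         "awful film, hated it", "loved the soundtrack", "boring and bad"]
labels = [1, 1, 0, 0, 1, 0]  # 1 = positive, 0 = negative

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Lime perturbs the input text and fits a local surrogate model,
# yielding per-word weights toward each class.
explainer = LimeTextExplainer(class_names=["negative", "positive"])
exp = explainer.explain_instance("a wonderful but boring film",
                                 model.predict_proba, num_features=4)
print(exp.as_list())  # [(word, weight), ...]
```

Positive weights correspond to the orange highlighting in the client and negative weights to the blue.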
Assets
The Assets page provides a singular look at all existing user “Assets” (e.g., VDS, Embeddings).
View Select – To see available Virtual Datasets, select it from the dropdown.
Usage – Counters and indicator lights displaying training-run usage of an Asset.
Pre-Op Heatmap – Heatmap before running the selected EDA operations.
Post-Op Heatmap – Heatmap after running the selected EDA operations.
Operations – EDA operations applied to the Asset.
EDA Tree View – The VDS can be viewed in the context of its parent saved search by clicking 3D.
Embedding – Create an embedding from the given VDS (discussed below).

Embeddings
The Lucd JedAI Client provides the ability to easily generate word embedding Assets for use in modeling.
View Select – To see available Embeddings, select it from the dropdown.
Download – Embeddings can be downloaded locally.
PCA/TSNE – View PCA/TSNE charts for the selected embedding.
Restart – Restart the embedding training here.

PCA/TSNE
Embeddings can be viewed using PCA/TSNE techniques for visualization.
Style – When viewing an embedding's PCA/TSNE, click to see terms instead of points.
Region Select – Toggle to select a cluster of points using a bounding box.
Multiple Select – Use to add multiple bounding boxes.
Word Search – Search for a term. All matching terms will be highlighted, as well as shown in a list to the right, until only one term matches.
Filter – Narrow the occurrences shown for a term to a range.
Technique Select – Toggle between PCA and TSNE.

Governance
The Governance view illustrates what data, data transformations, and assets (e.g., VDS, word embeddings) were used as inputs to training a given model. The value is that a user can quickly gain insight into what data caused a model to yield certain performance results. The following figure shows an overview of the Governance view. The main panel in the middle illustrates, for a selected model, what data and assets were used for training. The top half of the view shows information about the data used to create a virtual dataset for training the model.
Submitted Models – The main panel on the left-hand side displays which models are available for viewing in the Governance view. The dropdown menus at the top allow the user to select models based on their governance approval status (i.e., “pending approval,” “approved,” or “rejected”) as well as sort the models by various criteria.
Query – Represents the query that was used to generate the initial dataset, whether for model training data or for word embedding generation. Clicking the query will show query details at the bottom of the view.
Transformation – Represents the transformations performed on the initial dataset to establish either a virtual dataset (as in the case of training a model) or a word embedding. These are the same transformations that were applied in the exploratory data analysis section of the tool. Clicking the transformation box will show details at the bottom of the view, as shown in the figure below.
Heatmap – Visualization of selected attributes (or facets) of queried or transformed data for virtual datasets or word embeddings. Dropdown selectors underneath each visual enable a user to customize the visualization (“feature 1” selects data for the y-axis and “feature 2” selects data for the x-axis). The “metric” selector chooses which statistic of the selected data is used to define the heatmaps; in the current release, only total “counts” are available. Clicking “fetch metrics” will populate the visualization. Comparing data heatmaps (or distributions) before and after a set of transformations is helpful for governance purposes, since it can reveal, for example, whether data biases exist and which transformation operations might have introduced them.
Embedding Details – Shows the name of the asset resulting from word embedding generation. The bottom half of the view shows details about word embedding data for models that require embeddings for training.
Trained Model – Represents the trained model after all previous operational flows are complete.
Metadata & Performance Statistics – Information such as start/end time, model type, assets used, and training parameters is displayed here.
Submit Report – Clicking the green button enables the user to submit a governance report, either approving or rejecting the model for usage.

Explainability Analysis
Lucd provides the ability to visualize “explanations” of a model's output given specific inputs. Generally, explanations take the form of computed attribute weights indicating how much an attribute contributed to a model's decision. This supports the ability to either debug a model or scrutinize the data fed to the model. This feature is supported by integration of the Lime framework, as illustrated in the sketch shown earlier. The figure below illustrates the explainability panel in the Governance view. This panel is displayed when the user clicks the model element (6) in the Governance view. Currently, model explainability only works for text classification models; support for tabular and image data will be available soon.
Input Text – For analyzing a text classification model, the user enters sample text into the input text box and clicks the “explain text” button underneath the box. The time required to run explanation analysis depends on the amount of text entered and the complexity of the model.
Probability Output – A simple bar chart showing the probabilities of a given model's outputs. In the figure, the classes are “negative” and “positive”; however, more classes may be displayed depending on the model. The class labels are obtained from the labels returned by a user's model, as explained in the documentation for the Lucd modeling framework.
Features Output – Illustrates the weights of the most significant features determined to affect the model's output. For instance, in the figure, the tag "<UNKNOWN>" is highly indicative of a piece of text (in this case, a movie review) having a “negative” sentiment. The user is encouraged to try multiple examples to understand the explainability feature.
Output Text – The text on the right shows the major features (words) highlighted in the text. Note that the text shown is the text produced by the transformation operations used for embedding creation (which the user specified when applying NLP operations before creating the embedding set). This is so the user understands what is done to the text before it is input to a model, which might offer extra insight into the model's decision logic.

Epidemiology
Lucd provides the ability to visualize epidemics and supply chain breakdowns on a map. Trained models can predict future infection rates and supply shortages down to the census-tract level.
Train – To start training an Epidemic model, click “Train Model…”.
Trained Models – A list of previously trained models can be found here.

Train Epidemiology Model
This view appears after selecting “Train Model” in the previous view.
Dataset – To finalize the training setup, a dataset to train against must be selected.
Parameters – Enter any custom parameters for training.
Confirm – Confirm the selections to start training.

3D Map View
Selecting a trained Epidemiology model will display a 3D map view.
Map View – Census tracts, counties, and states can all be displayed.
Details – Information regarding a selected region on the map.
Disease Statistic – Selecting the disease statistic changes the value used when polygons are extruded.
Civilian Features – Selecting civilian features displays bar chart values on each census tract.
Search – The map can be searched to snap to a specific location.
Style – The map style can be changed via a drop-down menu.
Extent – Configuration for the extent of the map.
Terrain – Toggle switch for map terrain.
Save Settings – The current zoom level and location can be saved to the model object to reload later.
Polygon Extrude – Polygon extrusion can be toggled to make the underlying map easier to read.
  22. New Features
This release contains improvements to model profile features.
Additional tooltips
Model Profile improvements, including:
Tabular Data Predict – allows multiple JSON records of input to be predicted and output
Tabular/Regression Explain – provides further explanation of a tabular data prediction
Image Explain – allows users to upload images and generate explanations of predictions
Text Explain – allows users to send text input and receive an explanation of predictions
  23. This video introduces our Lucd client software, giving a basic overview of its primary features along with some scenarios where the software would be used.