Showing results for tags 'lucd client'.

Found 9 results

1. User Guide

The Lucd JedAI Client is downloaded locally on your device and interfaces with the Lucd Platform. The client enables users to visualize, transform, and prepare data for use in modeling frameworks (TensorFlow, PyTorch, Scikit-learn, etc.). Models can be uploaded and trained in the platform. The client is touchscreen capable, though a touchscreen is not required.

System Requirements

The following specifications are required to run the client:
- Windows or MacOS
- 4 GB memory
- Modern CPU

Recommended Configuration

Although not required, we recommend the following specifications to maximize the performance of the client:
- A GPU to support accelerated rendering
- 1600x900 display resolution minimum

Installation Instructions

The client is distributed via Lucd's Steam Store. A user is required to obtain a Steam account in order to access the client download.

Usage Instructions

Login

Log in to the client using the credentials provided to you.
- Username
- Password
- Domain – Cloud customers leave the domain field blank when logging in. Private build customers will be provided a domain to use when logging in.
- Login – Click to submit login credentials and enter the application.
- New User – If this is your first time using Lucd, click here to register as a new user.

Register a new user
- Generate password – Have Lucd suggest a password that meets the password requirements.
- Password requirements – Hover to view the Lucd password requirements:
  - Cannot reuse ANY old password.
  - 2 instances of all character classes: uppercase, lowercase, number, special (!@#$%^&*()).
  - No more than 2 consecutive characters from the same class (123 is invalid).
  - No repeating characters (33 is invalid).
- Register – Click to submit your details and return to the login screen. After registering a new user, you may immediately log in with that username.

Projects

Immediately after login, the Projects view is displayed. A project is a handy way to group artifacts around a data science problem.
- Available Projects – a list of all projects the logged-in user has access to open
- Global Status – the status of all artifacts on the currently logged-in system
- Federated Status – the number of online/offline artifacts of the given type
- Open Unallocated – begin using Lucd without an open project. Any artifacts created here will be saved as unallocated and remain accessible from this button.
- Search – search for a project by name or description
- Grid/List view – change how the set of projects is displayed
- Project – select this item to see more details about the project
- Project Details – view the name, description, and artifact counts for the selected project
- Open Project – open the currently selected project

Hovering over a project item will display the 'Options' menu:
- Edit Details – change the name and description of the hovered project
- Change Cover – select a meaningful cover photo (optional)
- Show Details – view project details on the right side of the screen
- Delete – delete the project. Any artifacts allocated to that project will be moved to the 'Unallocated' space and can still be used.

Navigation

Upon login, a user will see some form of this menu bar at the top of the screen, depending on which view is currently open.
- Projects – close the currently open project and return to the 'Projects' view
- Data – Click to go to the Workflow space.
  Hover and select from:
  - Sources – view the sources visualization
  - Query Builder – go to the query building tool
  - Virtual Datasets – view a list of available virtual datasets for training
- Assets – Click to view a list of available embeddings for training
- Modeling – Click to go to the 'Modeling' view to easily start a training run. Hover and select from:
  - Models – view a list of available models for training
- Federation – Click to view a list of currently connected federates and the artifacts associated with each
- Project name – the name of the currently open project
- Federate – Hover to view the currently connected domain
- Username – the currently logged-in user
- Minimize – Hide any open dockable panels to see the main view. Click again to unhide.
- View Log – Click to view a list of status messages on the Lucd system
- Settings – Click to edit various user and system settings

Sources

The Lucd client can show the user all available sources across the federation, as well as a data ingestion over time visualization.
- Sources table – list of all sources in the federation
- Federate indicator – hover over to see which federates contain the source. If the indicator is missing, the source only exists on the logged-in domain.
- Refresh – refresh the displayed data
- Ingestion over time viz – color represents the relative number of records ingested during a given time period for a source. Click a box to zoom into that time period across all sources.
- Back – Go up a time period (e.g., Month to Year)

Data

After selecting a project, a user is taken to the Data Transform view, where queries can have EDA operations added to them and then be built into a Virtual Dataset (VDS).

Left Sidebar
- Saved Workflows – Each row represents a query that has been saved and is eligible to have EDA operations performed on it. Rows can be dragged and reordered in the 'Active Workflows' space.
- Federate – hover over to see the federates of all VDSs contained in the saved workflow. If it is orange, at least one VDS has an issue on a federate. If this icon is not visible, all VDSs on that workflow only exist on the logged-in domain.
- VDS – The number of VDSs created from the given workflow
- Workflow name – This will appear red if any operations within the workflow have returned an error. The error will clear once the operation has returned successfully.
- Quick add – click to add the workflow to the 3D visualize space
- Begin a new query – Click to build a query inside the query builder so that it can be saved to the Transform space
- Available operations – Click to begin adding an operation to a selected 3D node

Workflow 3D Space
- Zoom – click and drag to zoom in and out of the 3D space
- Arrow to node – click to move selection to a different node. The arrow keys can also be used.
- Active Workflows – an ordered list of workflows currently displayed in the 3D space. Items can be dragged into a new order or clicked to zoom to the root node of the selected workflow.
- Selected node – Select any node to see additional options
- Child node – children are displayed to the right of a parent node with lines connecting them

Query node
- Remove – remove the selected query and accompanying workflow from the 3D space
- Delete – delete the selected query and accompanying workflow. This cannot be undone.
- Edit – Reloads the query parameters into the query builder so the query can be modified and saved as a new query
- Preview Data – Execute the query and visualize the results
- Create VDS – Begins the process of creating a Virtual Dataset used for training

Operation nodes
- Operation name
- Operation type
- Delete – delete the selected operation and all downstream operations in the workflow. This cannot be undone and will not execute if there is a VDS downstream.
- Preview Data – Execute the query and all operations up to and including this one, and visualize the results
- Create VDS – Begins the process of creating a Virtual Dataset used for training

Virtual Dataset node
- VDS Name
- Delete – delete the selected VDS. This cannot be undone.
- Preview Data – Execute the query and all operations leading to this VDS, and visualize the results
- Create Embedding – Only available with text models
- Merge VDS – Click and drag to another Virtual Dataset to merge them together
- Start training – Open the Modeling view to train with this VDS
- Federate – hover over to see the federates holding this VDS. If it is orange, at least one of the federates is returning an error with the VDS.

Navigating transforms with arrow keys

Once a node is selected in the data transform space, you may use the arrow keys to navigate quickly between adjacent nodes.

Collapsing nodes with double click

Double clicking a node will collapse all children downstream from that node and add a superscript next to it, indicating how many nodes were collapsed. This can be useful in large, spread-out trees.

Rearranging active workflows

Transform workflows in the active list can be rearranged in any order. This can be useful for comparing trees, or for bringing two Virtual Datasets closer together to perform a merge.

Preparing Text Data for Model Training

Lucd provides special operations for easily preparing text data for model training, saving a model developer valuable time otherwise spent manually coding text transformation routines. After creating an EDA tree based on a query of a text data source, a developer can add a new operation to the tree based on NLP operations. NLP operations (e.g., stopword removal, whitespace removal, lemmatization) can be applied in any sequence. It's important to select the correct facet as the "text attribute." One can also elect to apply tokenization at the document level (i.e., create one sequence of tokens for the entire facet value per record) or at the sentence level (i.e., create a token sequence per sentence in the facet for a record), as illustrated in the sketch below.
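
The exact tokenizer the platform uses is not documented here, but a minimal Python sketch of the difference between the two tokenization levels, using a simple punctuation-based sentence split as a stand-in, might look like this:

```python
import re

# One record's facet value (hypothetical text).
record_text = "Sepals are green. Petals are purple. The stem is long."

# Document-level tokenization: one token sequence for the entire facet value.
doc_tokens = record_text.split()
# ['Sepals', 'are', 'green.', 'Petals', 'are', 'purple.', 'The', 'stem', 'is', 'long.']

# Sentence-level tokenization: one token sequence per sentence in the facet value.
sentences = re.split(r"(?<=[.!?])\s+", record_text)
sentence_tokens = [sentence.split() for sentence in sentences]
# [['Sepals', 'are', 'green.'], ['Petals', 'are', 'purple.'], ['The', 'stem', 'is', 'long.']]
```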
Saving VDS with Processed Text

When a developer wants to create a new virtual dataset that includes the transformed text data, they must choose the "processed_text" facet as the sole feature of the virtual dataset. Currently, Lucd does not support text model training that incorporates multiple feature columns, so only the "processed_text" facet should be selected.

Applying Custom Operations

Once custom operations have been defined and uploaded using the Lucd Python Client library, they are available in the GUI for use in data transformation. Clicking on a custom operation shows further details, specifically the features the operation uses as well as the actual source code defining the op. As described in the documentation for defining custom operations via the Lucd Python Client, one must select how to apply the operation based on one of the following Dask dataframe approaches:
- apply
- map_partitions
- applymap
- apply_direct – apply the custom function directly on a Dask dataframe
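
For orientation, here is a minimal sketch of what a custom operation written against a Dask dataframe and applied with map_partitions could look like. The column name and the transformation are illustrative only, and the step of uploading the operation through the Lucd Python Client is not shown.

```python
import dask.dataframe as dd
import pandas as pd


def petal_length_cm_to_mm(partition: pd.DataFrame) -> pd.DataFrame:
    """Illustrative transform: convert a hypothetical petal_length column from cm to mm."""
    partition = partition.copy()
    partition["petal_length"] = partition["petal_length"] * 10.0
    return partition


# Small in-memory dataframe standing in for queried data.
pdf = pd.DataFrame({
    "petal_length": [1.4, 4.7, 5.1],
    "species": ["setosa", "versicolor", "virginica"],
})
ddf = dd.from_pandas(pdf, npartitions=2)

# map_partitions calls the function once per partition, passing a pandas DataFrame;
# apply and applymap instead operate row-wise or element-wise on the Dask dataframe.
result = ddf.map_partitions(petal_length_cm_to_mm).compute()
print(result)
```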
Applying Image Operations

To apply image operations, select the Image Ops tab within the New Op menu in an EDA tree. It's important to select an image facet as the "Feature." The currently provided operations are:
- Vertical and horizontal flips
- Grayscale
- Contrast normalization
- Normalize (0 mean and unit variance)
- Resize width & height
- Color inversion
- Crop borders
- Gaussian blur
- Rotate
- Min-max scaling
- To array (converts binary data to a NumPy array)
- Reshape dimensions

Query Builder

The Lucd client offers a unique and intuitive way to query data, giving a user flexibility in how complex queries are strung together to retrieve exact results.

Left Sidebar
- Sources – a list of available sources to query. These can be dragged into the node editor window.
- Quick add – click to add this source to the node editor window
- Federate status – Hover to see which federates hold the source. If this icon does not show, the source only exists on the currently logged-in domain.
- Data Models – a list of available data models to query. These can be dragged into the node editor window.
- Quick add – click to add this data model to the node editor window
- View stats – click to view statistics of this particular data model
- View features – click to view the features of this particular data model
- Features – a list of features in this data model. These can be dragged into the node editor window.
- Quick add – click to add this feature to the node editor window
- Federates – a list of available federates for filtering the query. Note: the currently logged-in domain will ALWAYS return results regardless of whether it is selected.

Node Editor Window
- Global search parameters – Click to view simple/advanced search filters
- Zoom – drag this slider or use the mouse wheel to zoom in and out of the node view
- Lucene syntax – a text representation of the search to be executed
- Copy Lucene syntax – click to copy the Lucene syntax. This can be pasted into the global search parameters to customize a search with features not supported by the node editor.
- Search – Click to execute the search
- Save – Save the search for use in a Transform workflow. Note that a search must be executed before it can be saved.
- Group – Toggle, then click and drag around a set of nodes to add a grouping around them. This acts as a set of parentheses in the Lucene syntax. The same result can be achieved by holding Shift + Left click + drag.
- Refresh – Click to retrieve and repopulate the list of sources/data models/federates
- Exit – Close the query builder. Any unsaved progress will be lost.
- Modify Node – Change node filter settings
- Delete Node
- Node connection dropdown – Click to select from AND/OR/XOR
- Node connector – click and drag to connect to another node or grouping
- Statistics – click to view statistics of the last executed query

Advanced Search Parameters
- All these words – search results must include all these words
- Lucene query – add a Lucene query that will take the place of whatever is in the Node Editor Window
- This exact phrase – search results must include this exact phrase
- None of these words – search results must not have any of these words
- Records per source/model – return this many records per source/model
- Total records to return – return at least this many total records
- Date range – search results must be from within this time period
- Randomize – results should be returned in a random order
- All Sources/Models – results should include a sample from every applicable source and data model

Search Results
- Visualization panel – this will update with each search executed
- Federate distribution – a bar chart showing how many records were returned from each applicable federate
- Query statistics – each returned feature will show relevant statistics and, if applicable, a box plot to visualize them

Adding a node and changing connection logic

Nodes can be dragged into the workspace, or quickly added using the '+' button on the left. The dropdown connecting two nodes or groups can be changed to AND/OR/XOR.

Grouping nodes

Nodes can be grouped together using the 'Group' toggle at the top or by holding Shift and dragging. Groupings add parentheses around the selected nodes in the Lucene output (see the example below).

Manually connecting nodes

Nodes can be manually connected and disconnected by clicking and dragging either of the two circles on the side of a node/group.
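
To make the parentheses behavior concrete, the kind of Lucene expression the node editor might emit when two feature nodes are grouped with OR and that group is connected to a third node with AND could look like the following. The field names and values are hypothetical, not taken from the product.

```python
# Hypothetical Lucene output for a grouped query; a string like this is what the
# "Copy Lucene syntax" button yields and what the global search parameters box accepts.
lucene_syntax = '(species:"setosa" OR species:"versicolor") AND petal_length:[1.0 TO 5.0]'
```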
Visualization

General
- Query name – the name that was saved with the query
- Record count – the number of records returned out of the total number of records across the system that fit the query
- Visualization selector – click each to change the visualization
- Quick Add – click to add another visualization window of the same data slice
- Maximize – click to expand the panel to full screen

Table
- Feature/column names
- Histogram – lightweight visualization of a numeric field's distribution
- Top/unique value – for string types only
- Table row – click to see the list of feature values
- Paging controls – Go forward or backward in the results

Scatterplot - 2D
- Axis selector – Select the axes from a list of available features
- Filter knobs – Drag these knobs to adjust the axis filter. Drag away from the plot to reset the axis.
- Remove plot – Removes the plot from view
- Add new plot – Adds a new plot to the view

Box Plot
- Feature selector – Select the feature from a list of available features
- Remove plot – Removes the plot from view
- Add new plot – Adds a new plot to the view

Histogram
- Feature selector – Select the feature from a list of available features
- Filter knobs – Drag these knobs to adjust the axis filter. Drag away from the plot to reset the axis.
- Remove plot – Removes the plot from view
- Add new plot – Adds a new plot to the view

Scatterplot - 3D
- Axis selector – Select the axes from a list of available features
- Filter knobs – Drag these knobs to adjust the axis filter. Drag away from the plot to reset the axis.
- Scatterplot point – Select to view details. Double click to focus in on that point.
- Reset view – Click to move the camera back to the starting view
- Drag – Orbit around the focal point
- Mouse wheel – Zoom in/out
- Shift + Drag – Pan camera
- Ctrl + Drag – Look around

Parallel Coordinate Plot
- Feature selector – Select the feature from a list of available features
- Add Feature – Click to add an additional feature to the visualization
- Remove Feature – Click to remove a feature from the visualization
- Reorder Features – Click and drag to reorder the feature list
- Maximum
- Minimum
- Feature name
- Reset view – Click to move the camera back to the starting view
- Drag – Orbit around the focal point
- Mouse wheel – Zoom in/out
- Shift + Drag – Pan camera
- Ctrl + Drag – Look around

Correlation Matrix

To see how each field relates to all the other fields, use a Correlation Matrix. Only numerical fields are displayed. Each bar is scaled on its y axis according to how its two contributing fields relate, on a scale of -1 (red) to 1 (blue).
- Feature name
- Matrix bar – Click to see details about this specific feature pair
- Reset view – Click to move the camera back to the starting view
- Drag – Orbit around the focal point
- Mouse wheel – Zoom in/out
- Shift + Drag – Pan camera
- Ctrl + Drag – Look around

Statistics

Modeling

Left Sidebar
- Models – Click and drag to the training template model slot to begin training.
  - Model library (PyTorch, TensorFlow, XGBoost, SKLearn, Federated Learning)
  - Model framework (Simple, Advanced, Federated)
- Virtual Datasets – Click and drag to the training template VDS slot to begin training.
  - Federate status
- Assets – Click and drag to the training template asset slot, once a text model has been added, to begin training
- Show/Hide artifacts
- Refresh data
- Upload Model

Training Template/Parameters
- VDS Slot – Drag a VDS from the left sidebar to one of these slots to set it for that phase of training
- All VDS Slot – Drag a VDS here to set it for all three phases of training
- Model Slot – Drag a model here to set it for training
- Asset Slot – Drag an asset here, when a text model has been selected, to set it for training
- Training Name – Give the training a name to make it easier to find later
- Default – Reset the value to the saved default
- Save Defaults – Save all current values as the new default values
- Reset all to defaults – Reset all changed values back to their saved defaults
- Clear saved defaults – Reset all saved defaults back to factory settings
- Training Parameters – Expand/Collapse parameters

Dragging components to the training template

Models and Virtual Datasets can be dragged to the training template. Items in VDS slots can be rearranged.

Right Sidebar
- Trainings – Click to see additional details
  - Model library (PyTorch, TensorFlow, XGBoost, SKLearn, Federated Learning)
  - Status – If there is an error, click to see additional details

Training Details
- Start Train – Click to reload the training parameters to begin a restart
- Delete training
- Download – Click to download training artifacts as a .zip file
- View Profile – Click to see the training profile
- Show/Hide trainings

Modeling Graph
- Model node
- VDS node
- Asset node
- Training connector – Click to use these artifacts in a new training
- Number of trainings – The number of trainings using this combination of artifacts

Training Profile

Performance Graph
- Available Plots Selector – Choose a plot from the list of available graphs
- Plot explanation – Get a description of the selected graph type
- Update interval – How often the graph should update, in seconds (default 100). The number of points displayed is limited to 1000 to keep updates consistent.
- Line Toggle – Disable this value
- Line Intersect – Click to freeze in place. Click again to unfreeze.

Confusion
- Interactable Square – Click a square to see details about actual and predicted values. Values are only displayed in a square if greater than 0.
- Show Records – Toggle box values between percentages and record counts
- Histogram – Displays all predicted values for an actual value. Clicking a bar will update the table beneath it.
- Table – A tabular view of sample results from the selected prediction

Explainability Analysis

Lucd provides the ability to visualize "explanations" of a model's output given specific inputs. Generally, explanations take the form of computed attribute weights, indicating how much an attribute contributed to the model's decision. This supports the ability to either debug a model or scrutinize the data fed to the model. This feature is supported by integration of the LIME framework. The explainability panel appears on the model profile view for the model types described below.

Explainability - Tabular/Regression

For analyzing a tabular model, the user enters sample(s) into the input text box as a list of lists of numbers, where each inner list is a single sample, then clicks the "Explain" button underneath the box. The time required to run the explanation analysis depends on the complexity of the model. Models with type tabular_classification or regression can explain tabular data predictions.
- Input Array – Enter values to predict on. Must be valid JSON (see the example below).
- % of Training Data – Percentage of training data used to build the explainer. Must be greater than 0 and less than or equal to 1.
- Number of Top Explanations – Positive integer denoting how many class explanations to show
- Inputs – Colored to show how each influences the top class prediction
- Class Probabilities – Class predictions and corresponding likelihood
- Explanation – How each input influences a positive or negative prediction
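
As an illustration, a hypothetical Input Array for a tabular model trained on four numeric features (for example, the four Iris measurements) would be a JSON list of lists, one inner list per sample:

```python
# Hypothetical value for the "Input Array" box: two samples, four features each.
# The string is valid JSON, which is what the panel expects.
input_array = "[[5.1, 3.5, 1.4, 0.2], [6.2, 2.8, 4.8, 1.8]]"
```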
Explainability - Images

Models with type image_classification can explain image predictions.
- Sample Image – Select a local image to explain
- Positive Only – If True, include only regions of the image contributing to the predicted label
- Hide Rest – If True, make the non-explanation part of the returned image gray
- Explanation – Returned colorized image with shaded regions of positive and negative influence. Red sections detract from the predicted class, while green sections contribute positively to it.
- Predicted Probabilities – Class predictions and corresponding likelihood

Explainability - Text

For text models, simply type the raw string you would like to have explained by your model. Models with type text_classification can explain text predictions.
- Input Text – Text the user would like to predict and explain
- Output Text – Output text with class probabilities highlighted in positive (green) or negative (blue) colors
- Predicted Probabilities – Class probabilities predicted
- Explanation – Words that contribute to a positive or negative correlation
- Details

Federated

Lucd Release 6.5.0 introduces Federated Machine Learning to the Lucd platform. This capability adds new features to support the development of federated models. Namely, if your Lucd platform is set up as part of a federation, many of the operations you perform within the JedAI client will automatically be federated. This includes:
- Query: if your query matches data on multiple systems, you will get results from all of those systems.
- EDA / search tree creation: saving your query into an EDA tree will also create the EDA tree on your other federates.
- VDS: a VDS created from a federated query will in turn be created on all federates containing relevant data.
- Model definition: model definitions uploaded via your JedAI GUI will also be created on other federates.
- Training object: when training a federated model, a corresponding training object will be created on all participating federates.

Virtual Datasets
- Open transform – opens the transform workflow that created this VDS
- Copy ID – useful for finding a VDS via REST API calls
- Create an embedding
- Delete – Delete a VDS
- Refresh – retrieves the latest VDS data

Assets
- Delete an embedding
- Visualize – See embedding data on a PCA/TSNE chart
- Refresh – retrieves the latest Asset data

PCA/TSNE

Embeddings can be viewed using PCA/TSNE techniques for visualization (a reference sketch follows the list below).
- Style – When viewing an embedding's PCA/TSNE, click to see terms instead of points
- Region Select – Toggle to select a cluster of points using a bounding box
- Multiple Select – Use to add multiple bounding boxes
- Search – Search for a term. All matching terms will be highlighted and shown in a list to the right until there is only one matching term.
- Filter – Narrow results to a range of term occurrence counts
- Technique Select – Toggle between PCA and TSNE
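
For reference outside the client, the two projection techniques offered here can be reproduced with scikit-learn. This is a minimal sketch on a made-up embedding matrix; the shapes and parameters are illustrative, not what the platform itself uses.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

# Toy stand-in for an uploaded embedding: 100 terms, each a 50-dimensional vector.
rng = np.random.default_rng(0)
embeddings = rng.random((100, 50))

# Linear projection to 2D with PCA.
pca_points = PCA(n_components=2).fit_transform(embeddings)

# Nonlinear projection to 2D with t-SNE.
tsne_points = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(embeddings)

print(pca_points.shape, tsne_points.shape)  # (100, 2) (100, 2)
```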

2. Lucd Avicenna

Lucd Avicenna is the latest feature of the Lucd JedAI Client; look for the Epidemiology button on the application menu.

Avicenna Model Predicts Impact of Pandemic Downstream

Events can drive behavior just as behavior drives events. Machine learning models today, whether classic models like linear regression and random forests or 21st-century neural networks, can be extremely valuable predictive engines. One shortcoming of many predictive models, however, is the need for data -- lots of data. So what do you do when there's just not enough data to prime the pump, as it were? At Lucd, we suggest that other approaches are also quite useful depending on the context of the situation. In the case of Avicenna, a simulation model provides accurate predictions without the need for mountains of data.

Simulation Models

Simulation models can leverage valuable data to represent a discrete sequence of events in time. Whether it is pandemic response planning or preparedness for the next disruption an organization may face, simulation models can provide forward-looking insights. This applies not only to the response during an event but also to the challenges that follow it. As an example, how will hospitals and health organizations address their future? Lucd's Data Fusion, security, and scalability enable planning, actions, and mitigation for targeted and efficient command and control. Lucd is able to consume and ingest large data sets, including in real time, helping organizations turn that data into simulation models. Lucd builds advanced Enterprise AI solutions including pandemic event-driven simulation modeling, AI-powered workforce insights, supply chain analytics, and more. Lucd calls this model Avicenna. Imagine your organization, whether it is a retail outlet, a financial organization, a manufacturer, or any other business, empowered with the ability to leverage proven simulation models to help plan and prepare.

Agent-Based Simulation Models

As an agent-based simulation model, Avicenna is capable of analyzing relevant data and variables to help businesses predict potential outcomes of events based on behavioral assumptions. Better planning for product releases, market entry decisions, staffing challenges, financial analysis, potential supply chain disruptions, and much more can be modeled to help predict potential outcomes with greater accuracy. How is this done? The Avicenna model leverages the machine learning and deep learning capabilities that the Lucd Enterprise AI platform delivers.

Lucd JedAI Health Helps Hospitals and Health Organizations Plan and Prepare

The Pandemic Downstream

In the wake of a series of black swan events like those unleashed during something as pervasive and debilitating as a pandemic, how can health organizations cope and plan for the inevitable challenges they will face? How will they handle a surge of elective surgeries? Staff planning for procedures? Restrictions and mandates while delivering services? Through the JedAI Avicenna model and its simulation capabilities, Lucd ingests real-time COVID-19 data coupled with hospital patient information, bed counts, fire department, police, and ambulance information, and more, in order to look forward and enable planning and preparedness. As hospital organizations attempt to plan for the new normal, the Avicenna event-driven simulation model provides these outlooks.

Lucd JedAI Retail

All industries and business sectors have been negatively impacted by the pandemic, and retail outlets have been hit especially hard. Most retailers were shut down, and reopening has brought a new environment. Conforming to new standards and regulations will impact customer engagement, customer planning, product delivery, and more. Retailers will need to transform the way in which they support consumers and their entire way of doing business. How can they plan, prepare, react, and gain better insight into the decisions they should make? Lucd JedAI Retail is able to analyze industry, market, and consumer mobility data for better planning and for delivering necessary solutions, even in the wake of industry-transforming events like a pandemic. Customer service, product availability, and supply chain are other examples of how Lucd JedAI Retail can use the Avicenna event-driven simulation model to enable retailers to plan for macro- or micro-market challenges.

Lucd JedAI In Summary

Organizations were not fully prepared for the impact of today's crisis. But hospital management is now able to manage, staff, and plan for the likely surge of elective surgeries as the current crisis progresses. Retailers will be better able to manage new and emerging consumer behavior and interactions as the disease runs its natural course. Lucd JedAI Retail solutions will reduce cost, maximize revenue, and mitigate risk amid the new rules and behaviors that govern the post-pandemic world. Give your team the tools it needs to properly prepare. The Avicenna event-driven simulation model is now available as a free trial for your business. Simply download the Lucd JedAI client and try the Avicenna model to give your business the tools it needs to make the best decisions. Lucd is pioneering the creation of Enterprise AI with its end-to-end platform.

About this Software

The Lucd JedAI Client provides an immersive, easy-to-use user experience that facilitates a collaborative approach to Visual Analytics (understanding data) and Exploratory Data Analysis (preparing and transforming data for analysis), and it is the primary mechanism for accomplishing these tasks through a secure interface into the Lucd Unified Data Space (UDS). The 3D UI accesses the UDS remotely via a secure network connection (SSL/TLS).

Get the Lucd JedAI Client

3. Basic Exploratory Data Analysis in the Lucd Client

Exploratory Data Analysis (EDA) is a key part of any model-building lifecycle. In this post, we will create a basic EDA workflow in the Lucd client platform. We will use the Iris dataset so that anyone with the Lucd client can follow along. The Lucd client is available as a free download on Steam. For a video walkthrough of many of these and other capabilities, refer to the JedAI client video tutorials.

Preparation

To generate a basic query against the Iris dataset for EDA, within the Lucd client, either within a project or after selecting "Open Unallocated" from the main landing page:
- Select "Data" > "Query Builder" from the horizontal menu at the top of the screen
- From the "Sources" menu, click the "+" next to "Iris" to add it to the grid
- From the "Data Models" menu, click the "+" next to "flower" to add it to the grid
- Select the "Search" button to display query results
- Once satisfied with the query results, click "X" to close (not shown)
- Select the "Save" icon and enter a name and description for the query, then click "Save" - we used "Iris_Query_1" as the name in this example

The system will save the query and automatically drop it on the workspace.

EDA in the Lucd Client

Once the query is visible in the workspace:
- Preview the query output by selecting the "Preview Data" (eye) icon below the magnifying glass
- (Optional) Click the maximize button in the top right corner of the Visualize window output (not shown)

The table of the Iris query should now appear, similar in appearance to that shown in the image on the left.

Table View

The table view shows the fields available in the dataset – in this example:
- petal_length
- petal_width
- sepal_length
- sepal_width
- species

Additionally, the table view shows:
- Other fields created upon consumption of the dataset into the Lucd client
- The federates supplying the data – in this case, only p1.lucd.ai (only relevant when accessing data across multiple systems in a Federated Machine Learning scenario)

3D Scatterplot

The 3D scatterplot is arguably the most powerful visualization in the Lucd client, showing points in three-dimensional space. This visualization is particularly helpful for spotting groupings (labels or classes) and how they can potentially be predicted by different input variables. To view the 3D scatterplot, select the "Scatterplot - 3D" link. The user can click and drag the scatterplot in any direction to view how the points fall in three-dimensional space from any desired perspective. Thick dots in the scatterplot represent points in three-dimensional space; thin gray dots represent points on the two-dimensional planes corresponding to the intersection of two axes. As such, any single thick dot has multiple corresponding thin dots. The 3D scatterplot allows the user to change axis and color variables as desired; simply use the dropdowns in the lower-right corner to do so. The view shown in the image on the left effectively clusters the different species of Irises. To generate that view, select the following:
- X Axis: petal_length
- Y Axis: petal_width
- Z Axis: sepal_length
- Color: species

Helpful Hints:
- Set the Y-axis or Color to the label/class and test the X- and Z-axes using different values to visualize how they work individually and together to call out class differences.
- Selected axis variables can be categorical/discrete; they need not be continuous. In the adjacent image, the user has selected the Y-axis variable to be the species.
Note how the platform has assigned the species into three distinct groupings along the Y-axis. The visualization to the left clearly illustrates between-species differences, particularly for the one group in the lower-left corner, whose points are bunched at one end of the petal_length axis. Rotating the plot (dragging right-to-left) shows a similar bunching along the petal_width axis.

Users of the Lucd platform can quickly test different combinations when searching for the best predictors using the 3D scatterplot. In the image on the left, the Z-axis is sepal_width; in the image on the right, it is petal_width - everything else is the same. Comparing the two, it appears that the sepal_width variable does not provide as much segmentation along its axis as petal_width does, a visual indicator that petal_width may indeed be a better classifier.

Correlation Matrix

The Correlation Matrix is another useful EDA tool. The plot allows the user to visualize correlations and/or identify covariates that may be useful for modeling. To view the Correlation Matrix, select the "Correlation Matrix" link within the Visualize window.

The Correlation Matrix is based on the Pearson correlation statistic between numeric variables: bar heights are equal to the value of the correlation between the indicated pairings. An example for the Iris data is shown to the left. Highly positive correlations are shown as purple bars. The tallest bars – running diagonally across the plot – represent a Pearson correlation coefficient of +1. The Iris dataset contains many fields with high positive correlations, so spotting the diagonal of +1 bars is not as intuitive as it is with many other datasets. One simple way to gauge a bar's height is to align a variable on the X-axis with the corresponding variable on the Y-axis and examine the resulting bar's height. Using the power of the Lucd client's Unity-based engine, the user can also rotate the plot to the desired view to check bar heights. Clicking on a specific bar will toggle the display of the Pearson correlation statistic for the two variables, ranging from -1 (perfect negative correlation) to +1 (perfect positive correlation).

Short bars can be meaningful in these plots: they may indicate strong negative correlations. Scanning for colors that differ most from the purple of the +1 bars along the diagonal, we can spot progressively contrasting colors where more disagreement exists, indicating strongly negative Pearson correlation coefficients. Examine, for instance, the relationship in the lower-left corner between sepal_width and petal_length. Negative correlations can be useful in model-building - sometimes more useful than their positive counterparts.

Histogram

Histograms are useful for analyzing the distribution of a numeric variable. To view the Histogram, select the "Histogram" link within the Visualize window. Select the variable for which you would like to see a histogram using the dropdown at the top of the window. The user can add histograms by clicking the “+” button in the lower-right portion of the window. Any plot can be closed by clicking the “x” at the top right. The histogram to the left shows that the petal_length variable has a bit of a gap somewhere around the 2.4 value – this could be an indicator of a potentially useful predictor for the species label.
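The Pearson values behind these bars – and the petal_length gap noted for the histogram – can also be cross-checked outside the client with a few lines of plain Python. The sketch below is not part of the Lucd Python Client API; it is a minimal, illustrative example that assumes pandas and scikit-learn are installed and uses scikit-learn's bundled copy of the Iris data, whose column names differ slightly from the Lucd "flower" field names used above.

# Minimal cross-check of the Iris Pearson correlations outside the Lucd client.
# Assumes pandas and scikit-learn are available; the column names come from
# scikit-learn's copy of the data, not from the Lucd "flower" data model.
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame  # numeric measurement columns plus an integer "target" column

# Pearson correlation matrix over the numeric columns (the diagonal is +1 by definition).
corr = df.drop(columns="target").corr(method="pearson")
print(corr.round(2))

# The sepal width vs. petal length pairing is clearly negative, matching the
# contrasting-color (short) bar discussed above.
print(corr.loc["sepal width (cm)", "petal length (cm)"])

# A quick numeric look at the petal length distribution shows the gap near 2.4
# that the histogram makes visible.
print(df["petal length (cm)"].value_counts(bins=10).sort_index())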
2D Scatterplot

2D scatterplots are useful for visualizing the relationship between two numeric variables. Visualizing across a third dimension is also possible using the Color variable. To view 2D Scatterplots, select the "Scatterplot - 2D" link within the Visualize window. An example is shown in the image to the left. To change the selected variable, click on the variable along either the X- or Y-axis – or the Color variable on the right side of the chart – and choose the desired selection. The example shows that categorical variables can be selected – and are particularly useful for the Color selection. Clicking the filter button below the color-bar variable selector allows the user to filter for specific items within the selected variable. The user can add scatterplots by clicking the “+” to the right of the existing plot. Any plot can be closed by clicking the “x” below the plot.

Box Plot

Box plots are useful for analyzing the distribution of numeric variables, particularly when the user is interested in a visual representation of where the data points fall within quartiles. To view a box plot, select the "Box Plot" link within the Visualize window. Select the desired variable using the dropdown at the top of the plot. The user can add box plots by clicking the “+” to the right of the existing plot. Any plot can be closed by clicking the “x” adjacent to the variable selection. The box plot to the left shows that the data in the petal_length variable are heavily concentrated between the median (a value of 4.05 units) and the third quartile (5.1). The user can quickly see that this gap of roughly one unit is small compared to the adjacent lower quartile, which spans nearly three full units – another potentially useful predictor for the species label.

Parallel Coordinate Plot

Parallel coordinate plots are powerful EDA tools. To view a Parallel Coordinate plot, select the "Parallel Coordinate Plot" link within the Visualize window. Select the desired plot variables using the dropdowns at the lower right of the window. Add variables to the plot by clicking the “+” button just above the variable selection boxes. Rearrange the order of items in the list by clicking and dragging the handles on the far right of the list of selected fields. The view can be reset using the button just to the left of the "+" button.

The Parallel Coordinate plot to the left shows the species field on the far right. Clearly, the three different species are separated and easily distinguishable – and categorical variables such as species can be useful, particularly on one end of a plot. From there, it is helpful to look for patterns in slopes, or for groups that clearly lie in different sections of the visual than others. We can see clearly that the versicolor species is generally in the middle of the petal_width and petal_length distributions, while the setosa species is at the bottom of both. The other species (virginica) – though not identified on the plot per se – typically has the highest petal_length and petal_width. So both of these factors may be useful in a prediction model. Conversely, if we had seen these points all overlapping, with inconsistent slopes from panel to panel of the plot, we would conclude that these variables would not be good species predictors. It is often helpful to rearrange the position of the class variable ("species" in this example) to more directly examine fields that correlate with it.
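As with the correlation check above, the box-plot quartiles and the parallel-coordinate separation can be reproduced outside the client for a quick sanity check. The sketch below is illustrative only and is not the Lucd client's implementation; it assumes pandas, matplotlib, and scikit-learn are installed and again uses scikit-learn's copy of the Iris data.

# Sketch: petal length quartiles and a parallel coordinates plot for the Iris data,
# computed outside the Lucd client with pandas/matplotlib.
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={"target": "species"})
# Map the integer class labels to species names (setosa, versicolor, virginica).
df["species"] = df["species"].map(dict(enumerate(iris.target_names)))

# Quartiles for petal length: compare the median-to-Q3 span with the Q1-to-median span,
# as discussed for the box plot above.
print(df["petal length (cm)"].quantile([0.25, 0.50, 0.75]))

# Parallel coordinates colored by species; setosa separates cleanly on the petal measurements.
parallel_coordinates(df, class_column="species", colormap="viridis")
plt.show()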
  6. User Guide

The Lucd Python Client provides capabilities for data scientists and AI model developers to prototype AI model solutions before uploading them to the Lucd Unity Client for extended training and performance analysis. The Lucd Python Client provides the following features:

functions for accessing raw data and other assets in Lucd for general analysis and custom visualization;
functions for uploading user-defined feature transformation operations to Lucd, which can then be applied in the Lucd Unity Client to create a virtual dataset;
functions for ingesting data into TensorFlow and PyTorch models, which can be used for prototyping models.

Installation

The lucd-python-client Python package should be installed using pip with a Python wheel file. Instructions are as follows:

Download or clone the lucd-python-client package (unzip if needed) from here: Lucd Python Client Project, then open a command prompt and change to the package directory.
At the command prompt, type python setup.py bdist_wheel. The wheel file will appear in the dist directory.
Switch to the dist directory and type pip install <wheel_filename>

Requirements

Python 3.6.5 is required for custom feature operations to work appropriately.

APIs

The Lucd Python Client uses Python and REST APIs. Code examples using both API types are available in the examples directory of the project.

Lucd Python Client API Examples

Example code illustrating how to perform tasks such as authenticating to Lucd, performing queries, obtaining virtual datasets, and training models resides in the examples directory of the project. Below are specific examples of how to access Lucd data using the client, as well as how to create and upload a custom feature transformation operation.

Accessing Data

from lucd import LucdClient, log
from eda.int import asset
from eda.int import vds
from eda.int import uds
from eda.lib import lucd_uds

if __name__ == "__main__":
    username = 'xxx'
    password = 'xxx'
    domain = 'xxx'

    client = LucdClient(username=username,
                        password=password,
                        domain=domain)
    log.info("Connected to Lucd platform.")

    # Queries follow the Elasticsearch API.
    # See: https://www.elastic.co/guide/en/elasticsearch/reference/6.8/query-dsl.html
    query = \
        {
            "query": {
                "bool": {
                    "must": [
                        {
                            "bool": {
                                "should": [
                                    {
                                        "match_phrase": {
                                            "source": "iris"
                                        }
                                    }
                                ]
                            }
                        },
                        {
                            "bool": {
                                "should": []
                            }
                        }
                    ],
                    "filter": [
                        {
                            "bool": {
                                "filter": []
                            }
                        }
                    ]
                }
            },
            "size": 2000,
            "dataset": "iris"
        }

    results, http = uds.search(query)
    print(f"Search Results ({http}):\n{results}\n")

    hits, stats = client.search_to_dataframe(results)
    print(f"Search Results:\n{hits.head()}\n")
    print(f"Search Statistics:\n{stats}\n")

    all_models, http = client.rest('lucd/model/read', {"uid": username})
    print(f"All Models ({http}):\n{all_models}\n")

    all_vds, http = vds.read({"uid": username})
    print(f"All Virtual Datasets ({http}):\n{all_vds}\n")

    all_assets, http = asset.read({"uid": username})
    print(f"All Asset Embeddings ({http}):\n{all_assets}\n")

    #
    # Lucd library calls to fetch assets and VDSes
    #

    # When limiting asset size, you could encounter issues with missing index entries.
    embeddings_index, embedding_matrix, embedding_size, word_index_mapping, word_index_mapping_padded = \
        lucd_uds.get_asset("xxx", limit=100)
    print(embeddings_index, embedding_matrix, embedding_size, word_index_mapping, word_index_mapping_padded)

    # When limiting data size, keep in mind that bringing back large amounts of data
    # over the network causes delays and can run the client out of memory.
    all_vds, http = vds.read({"uid": None})
    print(f"All Virtual Datasets ({http}):\n{all_vds}\n")

    df = lucd_uds.get_dataframe("xxx", limit=100)
    print(f"Dataframe Data\n{df.head(20)}")

    client.close()

Custom Feature Transformation

from eda.int import custom_operation
import lucd


def create_greater_than_mean_column(df):
    column_mean = df["flower.petal_length"].mean()
    df["flower.petal_length_Mean"] = df["flower.petal_length"] > column_mean
    return df


if __name__ == "__main__":
    client = lucd.LucdClient(domain="xxx",
                             username="xxx",
                             password="xxx",
                             login_domain="xxx")

    data = {
        "operation_name": "create_greater_than_mean_column_JBstyle",
        "author_name": "J. Black",
        "author_email": "j.black@lucd.ai",
        "operation_description": "Sample operation",
        "operation_purpose": "add a new column",
        "operation_features": ["flower.petal_length"],
        "operation_function": create_greater_than_mean_column
    }
    response_json, rv = custom_operation.create(data)

    client.close()

Federated Endpoints

To support federated machine learning, much of the high-level REST functionality operates in a federated manner. This means that unless otherwise specified, these actions will be performed/created/deleted on all federates. The affected features include:

Queries
EDA trees
Custom Ops
VDS objects
Model definitions
FATE Training objects

Users may include a block in their JSON specifying which federates to operate on, which looks like the following:

"federation": {"federates": ["domain_name1", "domain_name2"]}
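For example, to restrict the custom operation upload above to two specific federates rather than all of them, the federation block can be added to the request payload. The placement shown below – at the top level of the JSON body, alongside the other fields – is an assumption for illustration; the federate domain names are placeholders, and the snippet continues from the Custom Feature Transformation example above.

# Hypothetical sketch: adding a "federation" block to the custom operation payload
# from the example above, so the operation is created only on the named federates.
# The top-level placement of "federation" and the domain names are assumptions.
data = {
    "operation_name": "create_greater_than_mean_column_JBstyle",
    "author_name": "J. Black",
    "author_email": "j.black@lucd.ai",
    "operation_description": "Sample operation",
    "operation_purpose": "add a new column",
    "operation_features": ["flower.petal_length"],
    "operation_function": create_greater_than_mean_column,
    "federation": {"federates": ["domain_name1", "domain_name2"]}
}
response_json, rv = custom_operation.create(data)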
  8. Hau

    0. LUCD Overview.mp4

    From the album: LUCD Unity Client Tutorials

    This is the first video in a nine-part video tutorial series for our Lucd 2.0 client. This video provides a summary of features within our application and

ABOUT US

Lucd is an AI software platform company that supports multiple industry verticals, allowing its users to build enterprise-ready AI solutions with Low Code / No Code development practices. Lucd supports the entire AI lifecycle, allowing for the secure fusing of structured and unstructured data and empowering data analysts as well as business professionals to work collaboratively, reducing the time needed to uncover new opportunities and solutions.
