
River Invertebrate Classification Tool (RICT)

Machine Learning Build Guide - GB Summer Single Year

March 2020

V1


Contents:

1. Introduction

2. Purpose of this document

3. Experiment 1: GB Prediction and Classification Summer Single Year

Version History

Version 1: First version, 06/03/2020, Nick Irvine


1. Introduction

River Invertebrate Classification Tool (RICT) is a web application that implements the RIVPACS IV predictive model. The tool is maintained by the UK’s environment agencies: the Scottish Environment Protection Agency (SEPA), the Environment Agency (EA), Natural Resources Wales (NRW) and the Northern Ireland Environment Agency (NIEA).

2. Purpose of this document

The purpose of this document is to outline how a user can build RICT on Microsoft Machine Learning Studio. It is a stand-alone guide with no dependencies on other documents. It is therefore the only document required to build the GB summer single year prediction and classification experiment in Machine Learning Studio.

This document should be used as a companion to the RICT technical specification and user guide.

This document outlines how the GB summer single year prediction and classification experiment can be created. Separate documents describe how to build the Northern Ireland single year and GB multi-year experiments in Machine Learning Studio.


3. Experiment 1: GB Prediction and Classification Summer Single Year

The boxes of the GB summer single year prediction and classification experiment should be linked as shown in the experiment diagram. Each box is described below.

1. Dataset input

The user should remove this box when they start an experiment and replace it with the input file they wish to use for the experiment.

For the standard experiment published to the gallery, the test dataset is added here.

To add this dataset, press NEW at the bottom left of the screen and upload the CSV input file.

Select the uploaded file and drag it onto the experiment.

This step is explained in the user guide in section 8.

2. Enter Data Manually

This box is created by selecting Data Input and Output > Enter Data Manually from the options on the left.

Once the box has been added, click it and set the options in the right-hand panel as follows:

Has header: ticked

Data Format: CSV

[Figure: experiment diagram showing boxes 1 to 10 and the links between them]


The Data box should read:

Row 1: header
Row 2: parameter
Row 3: <leave blank>
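You can check the Data box content locally before pasting it in. The sketch below is a minimal Python illustration built on two assumptions: the rows contain the literal text `header` and `parameter` shown above, and the module reads the box as ordinary CSV with a header row. `parse_data_box` is a hypothetical helper, not part of Machine Learning Studio.

```python
import csv
import io

# Text for the Data box of the Enter Data Manually module (box 2), assuming
# "Has header" is ticked and Data Format is CSV. Row 3 is left blank.
DATA_BOX = "header\nparameter\n"


def parse_data_box(text):
    """Split the Data box text into its header row and data rows."""
    reader = csv.reader(io.StringIO(text))
    rows = [row for row in reader if row]  # drop any blank rows
    header, *body = rows
    return header, body


header, body = parse_data_box(DATA_BOX)
print(header)
print(body)
```

Under these assumptions the box yields a one-column dataset named `header` that holds the single value `parameter`.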

3. Prediction Support Files

These are the support files needed by the prediction Execute R Script block (box 4).

They are supplied as a single zip file, which should contain the following files:

AirTempGrid.csv
DFCOEFF_GB685.DAT
DFMEAN_GB685.DAT
EndGrp_AssessScores.csv
Helperfunctionsv1.R
MeanAirTempAirTempRangeASFunction.R
PredictionfunctionsV1.R
rnrfa_1.4.0.zip
TAXAAB.csv
TAXAPRAB.csv
Test_Data_End_Point_Means_Copy.csv
x103EndGroupMeans(FORMATTED).csv

The zip file is embedded below.

Further details about the function of each support file in the zip are given in the technical specification.

The zip file is added using the same steps as the input dataset in box 1: upload the zip file, then drag it onto the experiment.
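The prediction support zip can also be assembled with a short script rather than by hand. This is a minimal Python sketch using the standard `zipfile` module. It assumes the twelve files listed above sit in one local folder and belong at the top level of the archive. The function names are hypothetical.

```python
import zipfile
from pathlib import Path

# Files listed in this guide for the prediction support zip (box 3).
PREDICTION_SUPPORT_FILES = [
    "AirTempGrid.csv",
    "DFCOEFF_GB685.DAT",
    "DFMEAN_GB685.DAT",
    "EndGrp_AssessScores.csv",
    "Helperfunctionsv1.R",
    "MeanAirTempAirTempRangeASFunction.R",
    "PredictionfunctionsV1.R",
    "rnrfa_1.4.0.zip",
    "TAXAAB.csv",
    "TAXAPRAB.csv",
    "Test_Data_End_Point_Means_Copy.csv",
    "x103EndGroupMeans(FORMATTED).csv",
]


def find_missing(source_dir, required=PREDICTION_SUPPORT_FILES):
    """Return the required file names that are not present in source_dir."""
    source = Path(source_dir)
    return [name for name in required if not (source / name).is_file()]


def build_support_zip(source_dir, zip_path, required=PREDICTION_SUPPORT_FILES):
    """Write the required files into zip_path, refusing to build a partial zip."""
    missing = find_missing(source_dir, required)
    if missing:
        raise FileNotFoundError(f"Missing support files: {missing}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in required:
            # arcname=name stores each file at the top level, with no folder path.
            zf.write(Path(source_dir) / name, arcname=name)
    return zip_path
```

For example, `build_support_zip("rict_prediction_support", "PredictionSupportFiles.zip")` would build the archive; both names are illustrative. The `rnrfa_1.4.0.zip` package is stored unchanged as a nested zip.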

4. Execute R Code Script

This is the first Execute R Script block. It runs the prediction R code.

This block is created by selecting R Language Modules > Execute R Script and dragging it onto the workspace.

Once the block is created, click it, delete all the content of the R Script box in the right-hand panel, and paste in the code from the file below.


The Random Seed box should be left empty.

R Version should be set to Microsoft R Open 3.4.4.

Boxes 1, 2 and 3 should then be linked to this box as shown in the diagram at the start of this section.

5. Select Columns in Dataset

This box enables the user to select a different output if desired.

It is found on the left-hand side under Data Transformation > Manipulation > Select Columns in Dataset.

Once the box is created, choose the columns to keep. Click the box, click Launch column selector, then set Begin With to No Columns.

Underneath, select Include Column Names and enter the following column names. Enter each one individually and press Enter after each addition:

SITE
LATITUDE
LONGITUDE
LOG.ALTITUDE
LOG.DISTANCE.FROM.SOURCE
LOG.WIDTH
LOG.DEPTH
MEAN.SUBSTRATUM
DISCHARGE.CATEGORY
ALKALINITY
LOG.ALKALINITY
LOG.SLOPE
MEAN.AIR.TEMP
AIR.TEMP.RANGE
SuitCode
SuitText
BelongsTo_endGrp
TL2_WHPT_NTAXA_AbW_DistFam_spr
TL2_WHPT_ASPT_AbW_DistFam_spr
TL2_WHPT_NTAXA_AbW_DistFam_aut
TL2_WHPT_ASPT_AbW_DistFam_aut
WATERBODY
YEAR
SPR_SEASON_ID
SPR_TL2_WHPT_ASPT (ABW,DISTFAM)
SPR_TL2_WHPT_NTAXA (ABW,DISTFAM)
SPR_NTAXA_BIAS
SUM_SEASON_ID
SUM_TL2_WHPT_ASPT (ABW,DISTFAM)
SUM_TL2_WHPT_NTAXA (ABW,DISTFAM)
SUM_NTAXA_BIAS
AUT_SEASON_ID
AUT_TL2_WHPT_ASPT (ABW,DISTFAM)
AUT_TL2_WHPT_NTAXA (ABW,DISTFAM)
AUT_NTAXA_BIAS

This box should be linked to the left-hand output of box 4.
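The same column selection can be reproduced outside Machine Learning Studio, for example to check a downloaded prediction output. This is a minimal Python sketch using the standard `csv` module. It assumes the output is a CSV whose header row uses exactly the names listed above. `select_columns` is a hypothetical helper. It writes the columns in the listed order, which may differ from the order the module itself produces.

```python
import csv
import io

# Site and model columns listed for box 5, in the order given in this guide.
BASE_COLUMNS = [
    "SITE", "LATITUDE", "LONGITUDE", "LOG.ALTITUDE", "LOG.DISTANCE.FROM.SOURCE",
    "LOG.WIDTH", "LOG.DEPTH", "MEAN.SUBSTRATUM", "DISCHARGE.CATEGORY",
    "ALKALINITY", "LOG.ALKALINITY", "LOG.SLOPE", "MEAN.AIR.TEMP",
    "AIR.TEMP.RANGE", "SuitCode", "SuitText", "BelongsTo_endGrp",
    "TL2_WHPT_NTAXA_AbW_DistFam_spr", "TL2_WHPT_ASPT_AbW_DistFam_spr",
    "TL2_WHPT_NTAXA_AbW_DistFam_aut", "TL2_WHPT_ASPT_AbW_DistFam_aut",
    "WATERBODY", "YEAR",
]

# The same four columns repeat for spring (SPR), summer (SUM) and autumn (AUT).
SEASON_COLUMNS = [
    f"{season}_{suffix}"
    for season in ("SPR", "SUM", "AUT")
    for suffix in (
        "SEASON_ID",
        "TL2_WHPT_ASPT (ABW,DISTFAM)",
        "TL2_WHPT_NTAXA (ABW,DISTFAM)",
        "NTAXA_BIAS",
    )
]

SELECTED_COLUMNS = BASE_COLUMNS + SEASON_COLUMNS  # 35 names in total


def select_columns(csv_text, columns=SELECTED_COLUMNS):
    """Keep only the named columns, in the given order, and return CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    available = reader.fieldnames or []
    missing = [c for c in columns if c not in available]
    if missing:
        raise KeyError(f"Columns not found in prediction output: {missing}")
    out = io.StringIO()
    # extrasaction="ignore" drops every column that was not selected.
    writer = csv.DictWriter(out, fieldnames=columns, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()
```

Raising an error for a missing name catches typing mistakes such as a missing space in `(ABW,DISTFAM)`. Without that check, the mistake would silently produce a narrower output.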

6. Execute R Code Script

This box runs the R code that processes the classification.

This block is created by selecting R Language Modules > Execute R Script and dragging it onto the workspace.

Once the block is created, click it, delete all the content of the R Script box in the right-hand panel, and paste in the code from the file below.

The Random Seed box should be left empty.

R Version should be set to Microsoft R Open 3.4.4.

Box 6 should be connected to boxes 9 and 10 as shown in the diagram above.

7. Classification Support Files

These are the support files needed by the classification Execute R Script block (box 6).

They are supplied as a single zip file, which should contain the following files:

adjustParams_ntaxa_aspt.csv
ClassificationfunctionsV2.R
EndGrp_AssessScores.csv
observed_aspt.csv
observed_ntaxa.csv

The required zip file is embedded below.


Further details about the function of each support file in the zip are given in the technical specification.

The zip file is added using the same steps as the input dataset in box 1: upload the zip file, then drag it onto the experiment.

This box should be connected to boxes 6 and 10 as shown in the diagram above.
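Before you upload the classification zip, you can check it against the file list above. This minimal Python sketch uses the standard `zipfile` module and assumes the files are stored at the top level of the archive. `missing_files` and `check_support_zip` are hypothetical helper names.

```python
import zipfile

# Files listed in this guide for the classification support zip (box 7).
CLASSIFICATION_SUPPORT_FILES = {
    "adjustParams_ntaxa_aspt.csv",
    "ClassificationfunctionsV2.R",
    "EndGrp_AssessScores.csv",
    "observed_aspt.csv",
    "observed_ntaxa.csv",
}


def missing_files(names, required=CLASSIFICATION_SUPPORT_FILES):
    """Return, sorted, the required names absent from an archive listing.

    Names are compared exactly, so a file stored inside a sub-folder
    (for example "support/observed_aspt.csv") counts as missing.
    """
    return sorted(set(required) - set(names))


def check_support_zip(zip_path, required=CLASSIFICATION_SUPPORT_FILES):
    """Open zip_path and report which required files are missing."""
    with zipfile.ZipFile(zip_path) as zf:
        return missing_files(zf.namelist(), required)
```

An empty list from `check_support_zip("ClassificationSupportFiles.zip")` means every listed file is present. The file name here is illustrative.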

8. Convert to CSV

This box allows the user to download the prediction output of the experiment.

It is created by selecting Data Format Conversions > Convert to CSV.

This block should then be connected to box 5.

9. Convert to CSV

This box allows the user to download the classification output of the experiment.

It is created by selecting Data Format Conversions > Convert to CSV.

This block should then be connected to box 6.

10. Execute R Code Script

This box runs the R code that produces the compare output, which can then be used in the compare experiment.

This block is created by selecting R Language Modules > Execute R Script and dragging it onto the workspace.

Once the block is created, click it, delete all the content of the R Script box in the right-hand panel, and paste in the code from the file below.

The Random Seed box should be left empty.

R Version should be set to Microsoft R Open 3.4.4.

Box 10 should be connected to boxes 6 and 7 as shown in the diagram above.
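Collecting the links from boxes 1 to 10 in one place makes it easier to check a rebuilt experiment against this guide. This Python sketch records each stated link as a (source, destination) pair. The direction of each link is inferred from the box descriptions. This section does not say which box supplies the data input of box 6, so no such link is included. The sketch uses the standard `graphlib` module, which needs Python 3.9 or later.

```python
from collections import defaultdict
from graphlib import TopologicalSorter

BOX_NAMES = {
    1: "Dataset input",
    2: "Enter Data Manually",
    3: "Prediction support files (zip)",
    4: "Execute R Script (prediction)",
    5: "Select Columns in Dataset",
    6: "Execute R Script (classification)",
    7: "Classification support files (zip)",
    8: "Convert to CSV (prediction output)",
    9: "Convert to CSV (classification output)",
    10: "Execute R Script (compare output)",
}

# (source box, destination box); directions are assumed from the descriptions.
CONNECTIONS = [
    (1, 4), (2, 4), (3, 4),  # inputs to the prediction script
    (4, 5),                  # left-hand output of box 4 into Select Columns
    (5, 8),                  # selected prediction columns to Convert to CSV
    (7, 6),                  # classification support zip into box 6
    (6, 9),                  # classification output to Convert to CSV
    (6, 10), (7, 10),        # inputs to the compare script
]


def inputs_by_box(connections):
    """Map each box that has inputs to the sorted boxes feeding into it."""
    inputs = defaultdict(list)
    for source, target in connections:
        inputs[target].append(source)
    return {box: sorted(sources) for box, sources in inputs.items()}


def run_order(connections):
    """Return one valid execution order; raises graphlib.CycleError on a loop."""
    graph = {box: set() for box in BOX_NAMES}
    for source, target in connections:
        graph[target].add(source)  # graphlib expects node -> predecessors
    return list(TopologicalSorter(graph).static_order())


for box, sources in sorted(inputs_by_box(CONNECTIONS).items()):
    print(f"Box {box} ({BOX_NAMES[box]}) <- boxes {sources}")
```

`run_order` returns only one possible order. Independent branches, such as the prediction and classification outputs, can run in either order.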