NSF ADBC Digitization TCN-TTD Plants, Herbivores, and Parasitoids A Model System for the study of Tri-Trophic Associations Ten months later… presentation

  • Slide 1

NSF ADBC Digitization TCN-TTD
Plants, Herbivores, and Parasitoids: A Model System for the Study of Tri-Trophic Associations
Ten months later
Presentation by
  Randall Schuh, American Museum of Natural History
  Rob Naczi, New York Botanical Garden
  Christiane Weirauch, University of California Riverside
  Katja Seltmann, American Museum of Natural History
http://tcn.amnh.org

  • Slide 2

The Tri-Trophic Approach: Capturing Data for the Nearctic Biota
85% of 11,000 Hemiptera from the Nearctic are herbivorous, with high host specificity
Bias in plant groups attacked, e.g., Pinaceae, Poaceae, Asteraceae, Chenopodiaceae, Rosaceae
Some serious agricultural pests (armored scales, mealybugs, potato leafhoppers, Lygus bugs)
Vectors of viral and bacterial diseases (green peach aphid is a vector of over 100 plant viruses)
Parasitic Hymenoptera are beneficial as biological control agents

  • Slide 3

Botanical Institutions: MICH, MO, NYBG, EMC, WIS, MIN, KANU, ISC, COLO, MAINE, MU, TEX, ILL, ILLS

  • Slide 4

Botanical Institutions: MICH, MO, NYBG, EMC, WIS, MIN, KANU, ISC, COLO, MAINE, MU, TEX, ILL, ILLS
Botanical Data Providers: SEINET, CCH, CPNH

  • Slide 5

Botanical Institutions: MICH, MO, NYBG, EMC, WIS, MIN, KANU, ISC, COLO, MAINE, MU, TEX, ILL, ILLS
Botanical Data Providers: SEINET, CCH, CPNH
Entomological Collections: AMNH, CDFA, UCRC, CAS, BPBM, MEM, CMNH, INHS, CUIC, CSUC, TAMU, OSAC, NCSU, SEMC, UDCC, EMEC, UMEC, UKIC

  • Slide 6

Project management
Steering Committee of 10 PIs + Project Manager
  Decision-making on overall project goals, directions, and progress
Full-time Project Manager at AMNH (Katja Seltmann)
  Day-to-day project management, technical capability, data analysis, training of entomology partners, vetting and upload of authority files, centralized georeferencing
Full-time Project Coordinator at NYBG (Kim Watson)
  Training of botany partners, barcoding of NYBG specimens, and label-data capture for all partner institutions

  • Slide 7

Entomological Databasing

  • Slide 8

Streamlined Interface for Rapid Data Entry
Taxon names
Locality data
Collection Events
Specimen Data
Host names

  • Slide 9

Database Attributes
  Web enabled
  Open-source software
  Centralized data storage, backup, and management
Database Benefits
  Single-product management
  Simplified user training
  Centralized authority-file management
  Centralized georeferencing
  Data aggregation shifted to HUB and DiscoverLife.org

  • Slide 10

Authority Files
Botanical
  Tropicos database used across entire project
Entomological
  Published catalogs and unpublished lists from specialists
Objectives
  Present uniform, up-to-date taxonomy
  Reduce decision making by data-entry personnel
  Limit entry of new names by data-entry personnel

  • Slide 11

Data Aggregation and Dissemination: leveraging DiscoverLife.org

  • Slide 12

  • Slide 13

  • Slide 14

Approaches to Outreach
AMNH Short Course in Collection Databasing Fundamentals
  Train graduate students through participant-support funding
  Involve students from multiple graduate programs
  Provide fundamentals, including database options, data structures, unique specimen identification, specimen handling, georeferencing, research tools, data dissemination
Undergraduate Research Projects
  REU projects joining project data to student research involvement
Community Outreach
  http://research.amnh.org/pbi/heteropteraspeciespage/

  • Slide 15

Rob Naczi
New York Botanical Garden

  • Slide 16

Botanical Specimen Imaging

  • Slide 17

Insect Specimen Imaging
Image representative specimens for each species
Use existing imaging stations at partner institutions
About 30% of Hemiptera are already imaged
Expect to produce about 20,000 new images

  • Slide 18

Use of OCR for Populating Botanical Records
Workflow
  jpgs of specimen sheets batch-cropped to labels
  labels saved as new set of jpgs, then exported to ABBYY FineReader 11 Corporate Edition
  overnight, labels batch-processed through ABBYY
  each OCR output file saved as individual text file tied to barcode no.
  individual text files merged into Excel spreadsheet, in which data can be searched, grouped, and parsed
  parsed fields pushed to database
Challenges
  increasing accuracy of parsing
  hand-written labels (now experimenting with out-sourcing)

  • Slide 19

Data Storage Issues
Botany
  botanical images are valuable products of our digitization efforts, but also a challenge because of their storage demands
  our concern is with long-term storage (archiving) of uncompressed, original images
  we have encouraged the home institutions of our partners to step up, but some are unable or unwilling
  our solution for now is storage on portable drives, but this is a tenuous fix and not reliable enough for truly archival storage
Entomology
  no major issues

  • Slide 20

Christiane Weirauch
University of California Riverside

  • Slide 21

Subcontract Management
Setup: 7 collaborating institutions, 27 subawards
Benefit: long-term data capture across >30 institutions
Issues
  1) Delays: administrative and accounting issues
  2) Database selection: which one to use?
  3) Training: onsite versus remote training?
  4) Tracking productivity of subawards not using PBI database
Solutions/suggestions
  1) Streamlined administrative and accounting procedures
  2) Encourage use of a default database; more discussion
  3) Combination of onsite and remote training and monitoring
  4) Regular contact with subawards

  • Slide 22

Unique Specimen Identifiers (USIs)
AMNH Matrix-code labels
Setup: Matrix codes (barcode scanner) and a string of prefix and 8-digit number (human eye) encode the same unique identifier
Benefit: Tracking of specimens; connect images to records
Format: Prefix (8 characters): acronym and identifier, e.g., UCRC_ENT XXXXXXXX
Non-standard USIs: accepted in the database
Exceptions: collections that were previously databased without USIs (e.g., Aphidoidea, certain mirid taxa)

  • Slide 23

Collection Staging
Organizing, sorting, and identifying specimens in preparation for databasing
Importance: highest identification level and accuracy will yield most useful data for future applications
Priority: well-curated and well-identified collections
TTD: limited budget for staging by experts; very successful for, e.g., Miridae and Membracidae
Issue: routine staging more time-consuming than anticipated
Possible solution: budget for graduate students or postdocs to help with staging (and training/supervision of databasing crew)

  • Slide 24

Tri-trophic concept: Hemiptera, plants, parasitoids
Capture of host data
  New TTD records: 26% with host records (compared to 24% previously databased); added >800 new hosts
Challenges of integrating parasitoid data
  Level of identification of parasitoids (undescribed species; accurate identification requires skilled personnel)
  Level of host identification (e.g., whitefly)
  Incorporation of host information from secondary sources (e.g., taxonomic literature)?
On the right track; prioritize specimens with quality host records & integrate secondary host information

  • Slide 25

Katja Seltmann
The American Museum of Natural History

  • Slide 26

Efficiency of Data Capture: Insects
Total as of October 17, 2012 = 198,409
  Includes Illinois, Texas, and Kansas
All 20 subcontracts are digitizing now
53 contributors to the TTD-TCN project
Numbers from NHCR database (central database at AMNH; 11 subcontracts)
  $20,000 in equipment costs
  Average time per specimen: 3-3.5 min (range 1.2-6)
  Cost per specimen: $0.93 (includes equipment)
  Peak in July (more hours digitizing)
  65 collecting events on Christmas Day

  • Slide 27

Efficiency of Data Capture: Plants
All but three institutions up and running
As of October 9, 2012 have 102,651 images
3 of 15 institutions not yet begun
4 plant collections report:
  $30,482.51 equipment costs
  $0.73 per specimen image
The unmentioned curator volunteerism
  4-8 hrs/week depending on institution/taxon
  ~19 hours a week total

  • Slide 28

Training Methods: Insects (NHCR Database)
Curators also training (sexing specimens, database)
Online training via Skype
Digitizers clubhouse (building community)
Online manuals
Online videos
Remote training
Using the central database, data quality can be assessed
  Flag when new name is entered
  Flag when more than 10 specimens entered in one min by one person
  Flag when exact duplicate collecting events or localities (check training)

  • Slide 29

Training Methods: Plants
Site visits to subcontract institutions
  Kim Watson, Melissa Tulig
  Install imaging equipment
  Personal involvement

  • Slide 30

Quality Assessment of Transformed Records (NHCR)
Determination
Completeness
Note Language
(A,B,B); (A,A,A); (A,C,B)

  • Slide 31

Georeferencing: NHCR database
130,000 specimen records

Present total   1487   9134
Canada            14     96
USA             1441   8564
Mexico            32    474

  • Slide 32

Georeferencing: NHCR database
GEOLocate (North America)
Discover Life validation
Centralized and controlled georeferencing (NYBG, AMNH)
Volunteer georeferencing

  • Slide 33

Difficult data
Issues: specimen relationships

  • Slide 34

Difficult data
Issues: means for curation?

  • Slide 35

Summary and Predictions:
over 50,000 locality records from NHCR
will reach 1 million new specimen records for insects (harder to predict for plants at the moment)
less than $1 a specimen (inclusive)
Arthropod (NHCR) data concerns will become more central as other groups come online

  • Slide 36

Thanks to
National Science Foundation
co-PIs and collaborators
http://tcn.amnh.org
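
Note on the data-entry flags (Slide 28): the three flags listed for the central database (new name entered, more than 10 specimens entered in one minute by one person, exact duplicate collecting events or localities) could be checked along the lines of the minimal Python sketch below. This is not the NHCR database's code. The record layout (person, ISO timestamp, taxon name), the function names, and the use of whole clock minutes rather than a sliding window are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical entry layout: (person, ISO timestamp, taxon name).
# None of these names come from the NHCR database itself.


def flag_new_names(entries, authority):
    """Return taxon names in the entries that are absent from the authority file."""
    return sorted({taxon for _, _, taxon in entries if taxon not in authority})


def flag_fast_entry(entries, max_per_minute=10):
    """Return people who entered more than max_per_minute specimens in one clock minute."""
    per_minute = Counter()
    for person, stamp, _ in entries:
        # Truncate the timestamp to the minute so entries group by clock minute.
        minute = datetime.fromisoformat(stamp).replace(second=0, microsecond=0)
        per_minute[(person, minute)] += 1
    return sorted({person for (person, _), n in per_minute.items() if n > max_per_minute})


def flag_duplicates(records):
    """Return locality or collecting-event strings that occur more than once, verbatim."""
    counts = Counter(records)
    return sorted(rec for rec, n in counts.items() if n > 1)
```

For example, eleven entries by one person stamped within the same minute would put that person in the result of `flag_fast_entry`, while ten would not. A production check would more likely run as a query inside the database and might use a sliding time window.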