CPCC CCT Program Distributed Processing I Block1 Wrapping Your Nugget Around Distributed Processing



CCT 8501 Distributed Processing I

CPCC CCT Program
Distributed Processing I, Block 1
Wrapping Your Nugget Around Distributed Processing
CPCC CCT

Introductions

Course paperwork - payment

Plan for this module

Credit by exam for CCT 242

What is Distributed Processing?
According to AccessData's Configuration Guide (you need to get this guide and read it):
http://accessdata.com/downloads/media/Configuring%20Distributed%20Processing%20with%20FTK%203.pdf

Distributed processing is a functionality that exists within the FTK 3 application that allows users to create a distributed processing cluster with up to four total nodes (workers): 1 local and 3 distributed. These additional processing nodes (workers) will function together in a cluster to increase productivity and decrease overall processing time.

Why do we need it?
Drives are getting huge
1-2 TB drives are becoming common
3 TB drives available
Media-driven computing society

Artifact counts are very high
Over 1 million artifacts is very common

Processing time is increasing rapidly

Some more reasons why
To counter the processing time increase, we spend more on stand-alone systems

That still take too long

That still become unstable when resource limits are reached.

Keeps gear tied up

How Distributed Helps
Simple as Little Gun versus Big Gun

How Distributed Helps
Faster

More Stable

More efficient

Allow a hardware migration cycle

Real World Results
We have a small case (Sluix)
FRED = 4 hours
Barney = 94 minutes
DP = 44 minutes

Real World Results
We have a medium-size case (Testforsafeboot.eo1)
FRED = 20 hours
Barney = 7 hours 53 minutes
DP = 1 hour 13 minutes

Real World Results
We have a large case (HTI-001, Test.001)
FRED = CRASH at 75-100 hours, repeatable
Barney = 28 hours
DP = 3 hours
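The three results slides are easier to compare as speedup ratios. The sketch below only does the arithmetic on the timings quoted above; the dictionary layout and the function names are illustrative assumptions, and the crashed FRED run is recorded as None rather than given a made-up time.

```python
# Timings copied from the three "Real World Results" slides, in minutes.
# None marks the run that crashed instead of finishing.
results = {
    "small (Sluix)": {"FRED": 4 * 60, "Barney": 94, "DP": 44},
    "medium (Testforsafeboot.eo1)": {"FRED": 20 * 60, "Barney": 7 * 60 + 53, "DP": 60 + 13},
    "large (HTI-001, Test.001)": {"FRED": None, "Barney": 28 * 60, "DP": 3 * 60},
}


def speedup(baseline_minutes, dp_minutes):
    """Return how many times faster DP was, or None if the baseline never finished."""
    if baseline_minutes is None:
        return None
    return baseline_minutes / dp_minutes


def summarize(all_results):
    """Map each case to its DP speedup over FRED and over Barney."""
    return {
        case: {machine: speedup(times[machine], times["DP"]) for machine in ("FRED", "Barney")}
        for case, times in all_results.items()
    }


if __name__ == "__main__":
    for case, ratios in summarize(results).items():
        for machine, ratio in ratios.items():
            text = "did not finish" if ratio is None else f"DP was {ratio:.1f}x faster"
            print(f"{case}, vs {machine}: {text}")
```

On these numbers DP comes out roughly 5 to 16 times faster than FRED and roughly 2 to 9 times faster than Barney, with the gap widening as the case gets larger.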

Terms and Definitions
There is a lot to know before you do this

Terminology

Requirements

Technical skills

We'll go over each

Terms
Header (Examiner machine)
Workers (Helper machines)
Oracle (where the database is stored; could be on the examiner machine or separate)
Evidence share (also called Imaging Server at CPCC)
Case share (could be implemented numerous ways)
We'll discuss all in detail later

What it looks like

A Little Better

Terminology - How DP works
How does distributed processing work?

Evidence processing tasks assigned to the engine by the user are called Jobs.

The FTK application submits the job to the processing engine.

Each job is divided into small packets called Work Units.

Each work unit is handed to a service called ADProcessor.exe (and ADIndexer, if you've chosen to index), which actually does the work.

Terminology - How DP works
There are two components in distributed processing:

1. Processing Manager: The Processing Manager embedded in FTK manages Jobs and Work distribution. It also handles status updates and Job progress.

2. Processing Engine: The Processing Engine manages the processing resources of a particular computer/node.

Every machine that participates in a processing cluster runs a Processing Engine.

It decides how many jobs can be concurrently processed by that node.

The processing engine also manages the ADProcessor.exe/ADIndexer.exe processes that actually do the processing work.

Examiner Hardware Requirements
Based on the CPCC recommended ring (more later)
Examiner machine (should be your most powerful machine)
See http://accessdata.com/downloads/media/FTK_3x_System_Specifications_Guide.pdf
CPCC recommends:
8-core or better processor, 12 GB RAM, 64-bit Win 7, 80 GB or more SSD for OS
1 Gbps NIC

Oracle Hardware Requirements
Based on the CPCC recommended ring (more later)
Oracle machine (should be a powerful machine)
See http://accessdata.com/downloads/media/FTK_3x_System_Specifications_Guide.pdf
CPCC recommends:
8-core or better processor, 12 GB RAM, 64-bit Win 7, 80 GB or more SSD for OS
160 GB SSD or RAID 0 config with spinning drives (4 7200 RPM Vraptors min); NO RAID 5
1 Gbps NIC
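To make the earlier "How DP works" terminology concrete, here is a toy model of a job being cut into work units and handed to Processing Engines that each accept a limited number of units at a time. This is a sketch under stated assumptions, not how FTK is implemented: the class and function names are hypothetical, the "rounds" are a simplification of real concurrent scheduling, and the unit size of three is arbitrary.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Engine:
    """Toy stand-in for the Processing Engine on one node (hypothetical class)."""
    name: str
    max_concurrent: int  # work units this node accepts per scheduling round
    completed: list = field(default_factory=list)


def split_job(items, unit_size):
    """Divide a job (a list of evidence items) into fixed-size work units."""
    return [items[i:i + unit_size] for i in range(0, len(items), unit_size)]


def run_job(items, engines, unit_size=3):
    """Toy Processing Manager: hand out work units round by round until none remain.

    Returns the number of rounds needed, a rough proxy for elapsed time.
    """
    queue = deque(split_job(items, unit_size))
    rounds = 0
    while queue:
        rounds += 1
        for engine in engines:
            for _ in range(engine.max_concurrent):
                if not queue:
                    break
                engine.completed.append(queue.popleft())
    return rounds


if __name__ == "__main__":
    evidence = [f"file{i}" for i in range(20)]
    cluster = [Engine("examiner", 2), Engine("worker1", 1), Engine("worker2", 1)]
    print("cluster rounds:", run_job(evidence, cluster))
    print("standalone rounds:", run_job(evidence, [Engine("standalone", 1)]))
```

With 20 items in units of three there are seven work units. The three-engine cluster above clears them in two rounds, while a single engine taking one unit at a time needs seven.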

Workers
Based on the CPCC recommended ring (more later)
Worker machine (can be a little less powerful; if you don't have good ones, add what you have as long as it meets the minimums)
See http://accessdata.com/downloads/media/FTK_3x_System_Specifications_Guide.pdf

CPCC recommends:
8-core or better processor, 8 GB RAM, 64-bit Win 7, Vraptor or better drive
1 Gbps NIC
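When deciding which spare machines to add to the ring, it can help to write the CPCC recommendations down as data and compare candidates against them. The sketch below is only an illustration: the figures are the ones on the slides, but the dictionary keys, the role names, and the shortfalls function are assumptions, and it checks only cores, RAM, and NIC speed, not drive type or operating system.

```python
# CPCC-recommended figures copied from the slides. The layout and the check
# itself are illustrative assumptions, not an AccessData tool.
RECOMMENDED = {
    "examiner": {"cores": 8, "ram_gb": 12, "nic_gbps": 1},
    "oracle": {"cores": 8, "ram_gb": 12, "nic_gbps": 1},
    "worker": {"cores": 8, "ram_gb": 8, "nic_gbps": 1},
}


def shortfalls(role, machine):
    """Return the spec names on which `machine` falls below the recommendation for `role`."""
    wanted = RECOMMENDED[role]
    return [key for key, minimum in wanted.items() if machine.get(key, 0) < minimum]


if __name__ == "__main__":
    candidate = {"cores": 4, "ram_gb": 8, "nic_gbps": 1}  # hypothetical spare machine
    for role in RECOMMENDED:
        print(role, "falls short on:", shortfalls(role, candidate) or "nothing")
```

An empty list means the machine meets the recommendation for that role on the three figures checked.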

Other requirements
.NET 3.5 Service Pack 1 (on the Application ISO or, if connected to the Internet, it will attempt to download)

Windows 2008 R2 requires that you manually install .NET 3.5 SP1 using the "Roles and Features" tool.

AccessData Processing Engine installation executable

The Evidence Processing Engine (Regular FTK) IS NOT TO BE INSTALLED in distributed mode on the FTK examiner machine.

Additional Considerations
AccessData says:

The machines that store the evidence and case folder become a bottleneck.

Processing evidence is very disk IO intensive. As a result evidence should be stored on fast drives. With many large machines in a processing group, it is possible that the file sharing service in Windows will run out of kernel memory and fail to provide the evidence data across the network.

If you use the CPCC ring, you will be fine (more later)

Additional Considerations
AccessData says:

The machine that runs the Processing Manager may become a bottleneck during the discovery phase. Discovery is the process of enumerating all the actual files in a piece of evidence. Information about these files is stored in the database and the Distributed Processing Engines work on processing them. This discovery phase always runs in the Processing Engine located on the same machine as the Processing Manager. Since it produces much of the work that other Processing Engines work on, it needs to be one of the fastest machines (CPU speed) in the processing group.

If you use the CPCC ring, you will be fine (more later)

Additional Considerations
AccessData says:

Distributed processing produces a lot of network traffic. There is control traffic between the engine components, but primarily the network is used to read evidence and write results to the case folder and database. It is very easy to saturate a gigabit network for extended periods of time while processing a large image. Please use the fastest network technologies available to you, at a minimum 100 Mb switched.

NO, use 1 Gbps only!

We strongly recommend that the case folder and image location are on separate drives, AND on a separate machine from the examiner (more later).
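A quick back-of-the-envelope calculation shows why 100 Mb is not a realistic floor. The sketch below estimates how long it takes just to read an image once over the network at the ideal line rate. The function name and the efficiency parameter are assumptions for illustration, sizes use decimal terabytes, and real throughput will be lower because of protocol overhead and the result traffic going back to the case folder and database.

```python
def transfer_hours(size_tb, link_mbps, efficiency=1.0):
    """Hours to move size_tb terabytes (decimal) over a link of link_mbps megabits/s.

    efficiency is the fraction of line rate actually achieved (1.0 = ideal).
    """
    bits = size_tb * 1e12 * 8
    seconds = bits / (link_mbps * 1e6 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    for link in (100, 1000):
        hours = transfer_hours(2, link)
        print(f"2 TB image over {link} Mbps: about {hours:.1f} hours")
```

For a 2 TB image this works out to about 44 hours at 100 Mbps versus about 4.4 hours at 1 Gbps, before any processing overhead is counted.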

If you use the CPCC ring, you will be fine (more later)

Lab Considerations
For maximum throughput we will disable a lot of security stuff
Firewalls, A/V
Permissions and shares will be VERY open
THE LAB must be isolated from the Internet and any corporate LANs
VLAN separation may be OK depending on details