Sai Charan Paluru
1447 Whitewood Ct, San Jose, CA – 95131
+1 984-255-4669
http://www.linkedin.com/in/saicharanpaluru
spaluru@ncsu.edu

Actively seeking full-time positions in Computer Architecture & ASIC Design/Verification

EMPLOYMENT

Mobile Systems Architecture Intern, Micron Technology Inc. (May 2016 - Present)
• Contributed to research on SoC-generated hard silicon traces, analyzing memory access patterns to optimize future DRAM and non-DRAM memory architectures
• Contributed to extensive analysis of LPDDR4 traffic to identify DRAM page architectural enhancements

System Architecture – Hardware Engineer, Nvidia Inc. (July 2014 - July 2015)
• Contributed to the C-model design of two co-processors, the Boot and Power Management processor and the Safety and Camera co-processor, both to be featured in future Tegra SoCs
• Primary language: C++
• Contributed to the architecture review and implementation of the I2C device C-model supporting normal and packet modes, scheduled to be featured in future Tegra SoCs
• Contributed to the verification of the Clocking and Reset C-model infrastructure on an Nvidia proprietary full-chip simulator
• Implemented the DMA and Address Space Translation engine C-models in both R5 clusters of a future Tegra SoC

System Architecture Analyst, Intern, Nvidia Inc. (July 2013 - December 2013)
• Contributed to the C-model design of the AMBA APB bridge for a next-generation SoC, which required a good working knowledge of the APB protocol and C++
• Was instrumental in performance studies of an existing chip to help reduce bottlenecks in future chips. This involved measuring parameters such as memory bandwidth, FPS, and latency and comparing them with existing benchmarks
• Was actively involved in redesigning the wiring system between the graphics host and its clients for a next-generation SoC. This required a thorough understanding of a proprietary hardware debug tool that made use of the observation bus

LANGUAGES AND TECHNOLOGIES

• Programming languages: C, C++, Python, Perl, Bash scripting, OpenMP, CUDA
• Simulators: GPGPU-Sim, 721 Sim (in-house cycle-accurate superscalar simulator)
• Hardware description / verification languages: Verilog, SystemVerilog
• Design software: PSpice, MATLAB, Cadence Virtuoso, LabVIEW, Keil uVision5, ModelSim, Synopsys Design Vision
• Assembly languages: LC-3 assembly, MASM 6.11 (8086 assembly)

EDUCATION (current/past)

North Carolina State University, Raleigh, North Carolina
MS in Computer Engineering, August 2015 - present, GPA: 4.0/4.0
• Specializing in Microarchitecture and ASIC Design/Verification with a focus on GPU and CPU architecture
• Graduate coursework: ASIC Verification, Digital ASIC Design, Advanced Microarchitecture, GPGPU Architecture, Architecture of Parallel Computers, Computer Design Technology

Projects
• LC-3 CPU ASIC Verification: Verified the data and control paths of a pipelined 5-stage LC-3 microcontroller with a comprehensive instruction set. Achieved optimum functional coverage through constrained random testing
• Bellman-Ford HW Accelerator: Built a Verilog-based ASIC hardware accelerator that runs the Bellman-Ford algorithm on a given graph and a given set of source and destination inputs
• Superscalar Processor Renamer: Integrated a renamer class into the core of an in-house cycle-accurate superscalar simulator (721 Sim) and modelled industry-standard structures such as the Architectural Map Table, Rename Map Table, Free List, Active List, and shadow maps
• Trace Processor: Contributed to modelling a trace processor on top of an existing in-house cycle-accurate superscalar simulator (721 Sim), using a trace cache and a trace predictor in the front end and a distributed execution paradigm in the back end
• GPGPU Thread Block Scheduling: Implemented alternative thread block scheduling policies such as Lazy CTA and Block CTA on GPGPU-Sim
• Dynamic Instruction Scheduling in an Out-of-Order Superscalar Processor: Modelled a C++ functional simulator for an out-of-order superscalar processor that fetches and issues N instructions per cycle. Perfect branch prediction and perfect caches were assumed
• Generic Cache Simulator: Implemented a generic C++ cache simulator that uses a write-back, write-allocate policy with LRU replacement and can model any level of the memory hierarchy. Validated the model against existing traces and validation runs
• Dynamic Branch Prediction: Modelled a C++ branch prediction simulator for gshare, bimodal, and hybrid predictors, and used instruction traces with actual branch outcomes to compare the misprediction rates of the predictors
• Multiprocessor Cache Coherence: Modelled a shared-memory multiprocessor simulator using a generic C++ cache class and implemented cache coherence protocols such as MESI, MSI, and Dragon in the multiprocessor memory system

Birla Institute of Technology and Science, Pilani, K K Birla Goa Campus, Zuarinagar, Goa, India
BS in Electronics and Instrumentation, 2010 - 2014, GPA: 3.8/4.0
• Coursework: Embedded System Design, Analog and Digital VLSI Design, Digital Electronics & Computer Organization, Microelectronic Circuits, Signals & Systems, Microprocessors Programming & Interfacing


