Disentangling the Data
Organizing, graphing, and analyzing rotorcraft and flight weather data
by
Makenzie B. Allen
A senior thesis submitted to the faculty of
Brigham Young University – Idaho
in partial fulfillment of the requirements for the degree of
Bachelor of Science
Department of Physics and Astronomy
Brigham Young University – Idaho
Fall 2017
BRIGHAM YOUNG UNIVERSITY – IDAHO
DEPARTMENT APPROVAL
of a Senior Thesis submitted by
Makenzie B. Allen
This thesis has been reviewed by the research advisor, research committee, and the
department chair and has been found to be satisfactory.
___________________ _______________________________________________
Date Stephen Turcotte, Advisor
___________________ _______________________________________________
Date Jon Johnson, Committee Member
___________________ _______________________________________________
Date Evan Hansen, Senior Thesis Coordinator
___________________ _______________________________________________
Date Stephen McNeil, Department Chair
ABSTRACT
Disentangling the Data
Organizing, graphing, and analyzing rotorcraft and flight weather data
Makenzie B. Allen
Department of Physics and Astronomy
Bachelor of Science
When an experiment is run, large and complex amounts of data are often collected. This thesis focuses on two projects relating to the organization and processing of data sets. The first project focuses on data generated from the 2010 UH-60A rotorcraft wind tunnel tests; the use of documentation and Microsoft Excel allowed categories for the data sets to be created, and these categories will be used to generate a catalogue or report with graphs of the data. The second project focuses on the extraction and processing of weather data from Wunderground, for which Python code was written to generate graphs of weather condition frequencies.
ACKNOWLEDGEMENTS
A thanks to NASA Ames Research Center – Aeromechanics Branch for allowing
this research project to be performed: Dr. William Warmbrodt, Dr. Thomas
Norman, Michelle Dominguez, and the Summer 2017 Data Quality and Reporting
Team.
Also, a thanks to Stephen Turcotte, my thesis advisor, Evan Hansen, and my thesis
committee.
Table of Contents

ACKNOWLEDGEMENTS
Table of Contents
List of Figures
1 Introduction
2 UH-60A Blackhawk Wind Tunnel Test Data Categorization
  2.1 Introduction
  2.2 Methods
    2.2.1 The Stored WT Data
    2.2.2 The Categorization and Organization of the WT Data Using Microsoft Excel
  2.3 Results
  2.4 Status
3 Updating a Weather Database and Filling in the Holes – The Jet Fumes Project
  3.1 Introduction
  3.2 Methods
    3.2.1 A Computational Approach for Organizing, Updating, and Processing the Weather Database: Data Extraction and Web Scraping
    3.2.2 Calculating Frequencies
    3.2.3 Changes to the Original Weather Program
  3.3 Results
  3.4 Conclusion
4 Conclusion
5 References
List of Figures

Figure 2.1 – The UH-60 Blackhawk helicopter at NASA Ames Research Center.
• Image credit: personal image
Figure 2.2 – The UH-60A Airloads rotor installed on the Large Rotor Test Apparatus (LRTA) in the NFAC 40-by-80-Foot Wind Tunnel at NASA Ames Research Center.
• Image credit: NASA Ames Research Center, Aeromechanics Branch (Norman)
Figure 2.3 – Image of the inside of the RDMS database. The left-hand column contains all possible channels to extract data from; the right-hand column contains those channels selected for extraction.
• Image credit: NASA Ames Research Center, Aeromechanics Branch
Figure 2.4 – A screenshot of an Excel table with the sorting feature pulled up.
• Image credit: Microsoft Excel
Figure 2.5 – A Microsoft Excel table used to illustrate the manner in which the Channels were organized.
• Image credit: Microsoft Excel
Figure 2.6 – An example of the types of graphs to be generated for the wind tunnel test data. This particular graph was generated from the flight test data; it plots a specific category of data points versus "Counters". In the wind tunnel data, the equivalent of "Counters" is "Points".
• Image credit: NASA Ames Research Center, Aeromechanics Branch
Figure 3.1 – A screenshot of a table of weather data collected in Wunderground.
• Image credit: Weather Underground ("Wunderground", wunderground.com)
Figure 3.2 – An example of a .csv-type file. The left-hand side shows a possible layout of a .csv file; the right-hand side shows how a spreadsheet editor views this file type.
1 Introduction
The collection of data for scientific experiments often leads to a large database containing several different pieces and types of data. This data must be both organized and processed in order for it to be used practically in the future.
This thesis delves into two projects that both required this data processing. The first is the UH-60 rotorcraft wind tunnel test database, which involved the use of Microsoft Excel to categorize the data types in order to create a future catalogue. The second is the creation of a weather database, which involved the use of computer programs to process the raw weather data into graphical representations.
Each project had its own set of challenges to overcome and procedures to
undertake. These specific challenges and procedures are described within their
respective chapters.
The first step in each project was to retrieve the data. For each project, some prior data had already been analyzed; this older data was known to be correct and was to be reproduced in the future, so the newer data being processed and analyzed was always compared to it. Computational methods make these comparisons possible: graphs generated through programs such as Python or Microsoft Excel display the expected results alongside the new data so the two can be compared.
2 UH-60A Blackhawk Wind Tunnel Test Data Categorization
2.1 Introduction
Plans for the testing of the UH-60A Blackhawk rotorcraft’s aerodynamics
were first envisioned in the early 1980s. This vision included testing the limits of this
rotorcraft while in flight as well as within a wind tunnel.
The flight tests (FT) were carried out in 1993. These tests relied on various sensors, including sensors mounted upon the blades and other assemblies, to collect data. This data was collected and stored within a database, and the FT data was later categorized and graphed.
In 2010, the UH-60A's rotor was removed from the rotorcraft and placed on the Large Rotor Test Apparatus within the 40- by 80-foot wind tunnel at the ARC
Figure 2.1 – The UH-60 Blackhawk helicopter at NASA Ames Research Center.
(See Figure 2.2). This data was stored within the Rotor Database Management System (RDMS database). The measurements were taken over a series of 'Runs', each consisting of several 'Points', or data points within the run. This data was stored for future processing and use. This project focuses on the organization and analysis of the extracted wind tunnel test (WT) data.
2.2 Methods
2.2.1 The Stored WT Data
The WT data was collected through Runs, Channels, and Points, and stored in the Rotor Database Management System (RDMS database). Each 'Run' represents one occasion on which a WT test occurred: every time the WT was turned on, a new Run was created to store the data.
Figure 2.2 – The UH-60A Airloads rotor installed on the Large Rotor Test Apparatus (LRTA) in the NFAC 40-by-80-Foot Wind Tunnel at NASA Ames Research Center.
To understand Channels and Points, imagine a camera that takes pictures of
two balls, red and blue, every few seconds. The position of the red ball would vary
with time and would not necessarily be the same as the blue ball’s position in the
same snapshot. Much like these two balls, each Run took snapshots of the experiment as it progressed, for different data sets and types. These snapshots can be compared to Points: each Point at which data is collected is consistent across all the data sets. Channels, then, are comparable to the red and blue balls. The

Figure 2.3 - Image of the inside of the RDMS database. The left-hand column contains all possible channels to extract data from; the right-hand column contains those channels selected for extraction.
Channels are the various sets of data being taken at a given time. For example, one Channel collected data on a rotor blade's position while another measured the pressure upon the blade. Each Channel contained a different type of data set or sensor (such as flap angles, blade lead and lag, and blade pressures). Within the WT data, there are 1399 different Channels.
The 1993 FT collected data in a similar way, with the Runs labeled as Flights and the Points as Counters; the Channels category was unaffected in this regard. The Channels collected in the FT and the WT test each included unique data sets as well as data sets common to both.

After collecting raw data, it is important to process and organize it so that it is easier to understand. The 1993 flight test data had previously been categorized and graphed, with each graph including more than one related Channel from a category. Similar categories needed to be generated for the WT data, and comparing the Channels and categories from the FT data allowed the WT data to be categorized.
2.2.2 The Categorization and Organization of the WT Data Using Microsoft Excel
The RDMS database also includes descriptions of each Channel. These
descriptions include what kind of data the sensor collected and any modifications
that were performed upon this data. For example, some of the channels were time
shifted to zero azimuth (this shifted the data points for various blades to show as if at
the same location in time) or generated as a backup of another data set. Further
descriptions were collected from written copies of documentation from these tests.
Figure 2.4 - A screenshot of an Excel table with the sorting feature pulled up.
The Channels and their descriptions were placed within a Microsoft Excel spreadsheet table, chosen for Excel's ability to easily parse through columns and select specific categories. A filter can be applied to each column in an Excel table to select for the desired data set, which allowed Channels sharing the same descriptions in a column to be filtered together. When a category was generated, the filters were used to select only the Channels pertinent to that category, or only the blanks. These filters were applied to each column to select for various description types; only the selected entries remain visible in a given column. (See Figure 2.4.) With this capability, any Channels sharing the same descriptions in a given column could be filtered into or out of a specific grouping, allowing the Channels to be categorized by similarities in their descriptions.
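This column-filtering workflow is not unique to Excel. As an illustration of the same idea, a boolean mask on a pandas dataframe in Python performs the equivalent selection; the channel names and descriptions below are hypothetical, not the actual RDMS data:

```python
import pandas as pd

# Hypothetical channel table mirroring the Excel spreadsheet layout.
channels = pd.DataFrame({
    "Channel": ["C001", "C002", "C003", "C004"],
    "Description": ["blade pressure", "flap angle",
                    "blade pressure", "blade lead-lag"],
})

# An Excel column filter corresponds to a boolean mask on that column:
# only rows whose Description matches the chosen value remain visible.
pressures = channels[channels["Description"] == "blade pressure"]
print(list(pressures["Channel"]))  # the Channels sharing this description
```

As in Excel, applying further masks on other columns narrows the grouping step by step.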
Initial categories for the data were drawn from the FT categories and included data sets similar to the FT data sets. Other categories were generated based on similarities between data types. Some categories were borrowed from a documentation catalogue that both categorized many of the Channels and provided descriptions for each one.

A catalogue of the expected parameters was also compared and contrasted against the spreadsheet. If any parameter or Channel lacked documentation, it was flagged for someone more familiar with the data sets to document. Collaboration with this individual aided in determining relationships between various Channels.
A Microsoft Excel spreadsheet was generated to contain and categorize all the data types from the WT tests. This program was chosen for its ability to handle large tables and sort data by category type. Filters were also placed to highlight any channels that may have been associated with more than one FT channel.
This spreadsheet contains filters and additional information about each Channel. Channel categories were determined through comparison to the Flight Test data as well as to the instrumentation document.

A table containing all WT Channels was generated in Microsoft Excel. For each Channel, columns were added to hold its associated descriptions. Two columns recorded the FT Channel associated with each WT Channel and the FT category it originated from; a column for the Channel's own category was also added. The spreadsheet was periodically filtered (using the filter tool on an Excel table) to determine which Channels had not yet been sorted. Any WT Channels assigned an FT Channel that had previously been assigned to another were also flagged.
The number of categorized Channels in each category was counted with Excel's COUNTIF function, which looks for a specific string or number and counts each instance of it within a given range. This serves as a secondary check that all Channels have been categorized, as well as a primary check of how many Channels fall within each category.

Figure 2.5 - A Microsoft Excel table used to illustrate the manner in which the Channels were organized.
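The same per-category tally can be cross-checked outside Excel. A minimal Python sketch, using hypothetical category labels rather than the actual WT categories:

```python
from collections import Counter

# Hypothetical "Channel Category" column from the spreadsheet.
categories = ["Blade Pressure", "Flap Angle", "Blade Pressure",
              "Blade Motion", "Blade Pressure"]

# Counter tallies each distinct string, much as COUNTIF counts
# instances of a string within a range.
counts = Counter(categories)
print(counts["Blade Pressure"])  # number of Channels in this category
print(sum(counts.values()))      # total Channels categorized
```

Comparing the total against the number of Channels gives the same completeness check described above.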
2.3 Results
The WT Channels were categorized, with four of the categories reused from the FT. Figure 2.5 shows a list of all categories created for the WT tests. The 'Channel Category' column shows the category created for the data, and the number of Channels within each category is listed in the second column. A check of whether each category came from the FT was also included.
2.4 Status
Admittedly, the categories created and the Channels sorted into them are not without human error. Some of the categories may need to be divided up or merged with others, and some Channels might have been placed in a category where they do not fit. However, teamwork will help eliminate these errors and improve the categorization for the future. Talking with someone who knows the WT tests greatly helped the generation of these categories, and more input might help catch errors. Future individuals who work with this set of categories might find it useful to create subcategories. Teamwork is important for determining whether an error has been made and correcting it, and communication amongst team members is likewise important to understanding this data.
The categories used for the FT data helped organize graphical representations
of the data (See Figure 2.6). These have proven useful in understanding the data
better. The UH-60A wind tunnel data will be used to generate similar graphs. These
graphs will be used to further understand the data.
Figure 2.6 - This is an example of the types of graphs to be generated for the wind tunnel test data. This particular graph was generated from the flight test data; it plots a specific category of data points versus "Counters". In the wind tunnel data, the equivalent of "Counters" is "Points".
3 Updating a Weather Database and Filling in the Holes –
The Jet Fumes Project
3.1 Introduction
Since 2008, noxious jet fume odors have been reported from various buildings at NASA Ames Research Center (ARC). These fumes are viewed as a possible health hazard. In an effort to understand how weather patterns affect these complaints, a weather database was compiled containing weather information from 1945-2014. This database is useful for determining the frequencies of various weather conditions and wind directions over the course of a day, which can then be compared with the timing of complaints.
The weather data for 2015 – 2016 needed to be added to this database. The
original weather database was extracted from Wunderground, an online open-source
weather database. However, the code used to extract this raw data had not been
saved; only the processed data files and the program designed to graph them remained.

Figure 3.1 - A screenshot of a table of weather data collected in Wunderground.

The following project explores the extraction and processing of this raw weather data.
3.2 Methods
Three main programs were created to extract, process, and graphically
represent the Wunderground data. The extraction process takes the raw data from a
Wunderground table (See Figure 3.1) and converts it into .csv file format. The
second step takes this raw data and calculates averages and frequencies of weather conditions for each year, each month, each time of day, and across all years. The final
program takes this processed data and graphs it.
3.2.1 A Computational Approach for Organizing, Updating, and Processing the Weather Database: Data Extraction and Web Scraping
The first step required is to retrieve the raw weather data from
Wunderground. The original raw data files had been extracted from Wunderground
as comma-delimited (.csv) files. A .csv file stores data with one record per row and the fields within each row separated by commas; each row corresponds to a row of a table, and each comma-separated item fills in a column. (See Figure 3.2.) The code for extracting and processing these .csv files was not present.
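For illustration, Python's built-in csv library parses such a file directly. The weather values below are made up, not actual Wunderground data:

```python
import csv
import io

# A minimal comma-delimited file: one record per row,
# fields within each row separated by commas.
raw = "Time,Temperature,Wind Direction\n07:00,54.1,NW\n08:00,56.3,W\n"

# csv.reader splits each row on commas, recovering the table structure
# a spreadsheet editor would display.
rows = list(csv.reader(io.StringIO(raw)))
print(rows[0])  # header row: column names
print(rows[1])  # first data row
```

The same reader handles quoting and embedded commas, which is why it is preferable to splitting on "," by hand.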
A Python script designed to web scrape Wunderground was created to generate these .csv files. Web scraping is the process of extracting data from a website. In Python 2, this is achieved through the urllib2 library: urllib2.urlopen("<url>"), where "<url>" is a string containing the web address, returns the webpage in HTML format.
13
The BeautifulSoup Python library (imported as bs4 in its fourth version) is designed to easily parse HTML data. It generates a "soup" through the command BeautifulSoup(<html-file/data>, 'lxml'), where 'lxml' selects an HTML parser designed to be faster than Python's built-in html.parser. The soup prints as a single string without any newline characters; applying the .prettify() method makes it readable by inserting a newline for each HTML tag, so a printout of the HTML data can be viewed by the user.
This prettified output was inspected to locate the information to be extracted. Code was then written to parse the soup string and extract the .csv-like data, which was written into a .csv document using Python's csv library.
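The project used BeautifulSoup for this parsing step. As a self-contained illustration of the same idea, Python's built-in html.parser can extract table cells from an HTML fragment; the markup below is hypothetical, not Wunderground's actual page structure:

```python
from html.parser import HTMLParser

class CellExtractor(HTMLParser):
    """Collect the text content of every <td> cell in an HTML table."""
    def __init__(self):
        super().__init__()
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self.in_cell:
            self.cells.append(data.strip())

# Hypothetical weather-table fragment (one row of a time/temperature table).
html = "<table><tr><td>07:00</td><td>54.1</td></tr></table>"
parser = CellExtractor()
parser.feed(html)
print(parser.cells)
```

The extracted cell values can then be joined with commas and written out as the .csv-like rows described above.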
These .csv files were stored within a folder containing all raw data. Within
this directory, subdirectories were created for each year, and within each year, for
each month. The program checks to see if these directories exist through using the
os.walk(<main_directory_path>) command. Os.mkdir(<filePath>), where
Figure 3.2 - An example of a .csv-type file. The left-hand side shows a possible layout of a .csv file. The right-hand side shows how a spreadsheet editor views this file type.
14
<filePath> is the name of the new file, was used to generate a new directory if the
desired directory for the year or month did not exist.
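A minimal sketch of this check-then-create logic, using os.path.isdir and os.makedirs as stand-ins for the os.walk and os.mkdir calls described above; the year/month layout is hypothetical:

```python
import os
import tempfile

# Hypothetical layout: <raw data folder>/<year>/<month>/
base = tempfile.mkdtemp()          # stands in for the raw-data folder
target = os.path.join(base, "2016", "07")

# Create the year/month directory only if it does not already exist,
# mirroring the existence check before os.mkdir described above.
if not os.path.isdir(target):
    os.makedirs(target)            # creates intermediate directories too

print(os.path.isdir(target))
```

os.makedirs builds the whole nested path in one call, whereas os.mkdir creates one level at a time, which is why the original needed a separate directory per year and per month.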
3.2.2 Calculating Frequencies
A separate program was created to determine the frequencies of various wind directions and weather conditions over specific months of the year as well as during certain hours of the day. A useful Python library called pandas allows the user to interact with the .csv files in a manner similar to a spreadsheet table by creating a 'dataframe'. In a pandas dataframe, the rows and columns are indexed by either number or a given name, and the user can fill this table with pertinent values. The .add() method of a pandas dataframe adds the values in identically named rows and columns to one another. However, if a location being added to does not contain a value (that is, the value is null), the addition requires extra handling to replace the null values.
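One common way to handle the null case is the fill_value argument of .add(), which substitutes a default for missing entries before adding. A sketch with made-up frequency tables:

```python
import pandas as pd

# Two hypothetical frequency tables with partially overlapping labels.
a = pd.DataFrame({"count": [3, 5]}, index=["NW", "W"])
b = pd.DataFrame({"count": [2]}, index=["NW"])

# Plain a.add(b) would leave NaN wherever "W" is missing from b;
# fill_value=0 treats the absent entry as zero so totals stay numeric.
total = a.add(b, fill_value=0)
print(total.loc["NW", "count"])  # 3 + 2
print(total.loc["W", "count"])   # 5 + 0
```

This keeps running totals well defined even when a month's table lacks some wind direction entirely.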
The os library was again used to create folders to store the frequency data, allowing the processed files to be written into the database storage organized by frequency type.
The graphs generated for past years were compared to those produced previously for the same years. Once they matched, the code was determined to work correctly, and the information for 2016 was added to the database.
As a side note, the program was also changed to read the computer's internal clock and determine the month before the current month. It then asks the user whether they wish to extract and analyze the data up to that month. The weather program will read as much data as has already been processed and produce the graphs accordingly.
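The previous-month lookup can be done with Python's standard datetime library; one possible sketch, with a fixed date standing in for the computer's internal clock:

```python
import datetime

def previous_month(today):
    """Return (year, month) of the month before the given date."""
    first_of_month = today.replace(day=1)
    # Stepping back one day from the first of the month always lands
    # in the previous month, even across a year boundary.
    last_of_previous = first_of_month - datetime.timedelta(days=1)
    return last_of_previous.year, last_of_previous.month

# A fixed date stands in for datetime.date.today().
print(previous_month(datetime.date(2017, 1, 15)))  # (2016, 12)
```

The one-day step avoids the special-casing that subtracting 1 from the month number would require in January.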
The first challenge in studying the weather database was to determine the format in which the raw data had originally been extracted. Investigation showed these files were comma-delimited (.csv) and had originally been obtained from the online open-source website Wunderground.
3.2.3 Changes to the Original Weather Program
The original weather program only graphed weather data for the years 1945-2014. It was slightly modified to check for the last directory within the database, so the user can graph the newly added years without the program needing to be altered by a programmer.
3.3 Results
The associated .csv files were generated so that graphs of the weather data could be produced. These graphs are currently unavailable for viewing. The program extracts the data for all months up to the current one, processes it for frequencies of various weather conditions, and generates graphs to visualize the data.
3.4 Conclusion
Computer programming skills were vital to updating and understanding this database. Graphs generated from it give the individuals studying it a greater awareness of the weather trends that occur around them at specific times of year, months of the year, and times of day.
This weather data has proven useful for correlating wind direction, wind speed, and time of day with jet fume complaints. With over seventy years' worth of data, this database may further be used to study the impacts of global warming at Moffett Field, California.
4 Conclusion
Collecting, storing, and preparing data for future use is important within the field of physics. Databases allow individuals to develop an understanding of the information collected and further aid investigation within their fields. However, this data can be unwieldy to study and understand. In order to process the data, it must first be stored in a manner that reflects its usage and sorted into categories that allow it to be utilized in the future. Computer programs allow this data to be stored and sorted in a timely manner.
Sorting and categorizing data groups it in a way that allows users to find and understand the pertinent data. Processing this data furthers understanding through graphical representation and also allows for quicker data analysis.
Even with the aid of computers, human error is present in processing and sorting data. Each project required teamwork to minimize these types of errors.

This teamwork included consulting those who were more knowledgeable about various parts of the project. For example, in the UH-60 project, the categories were created based on information given by those who worked on the original flight test data or were present for the wind tunnel data collection. This process allowed the data to be sorted more easily than it would have been without others' input. Teamwork also helped in the Wunderground project: the correct data files and versions of earlier graphs were provided, and direction came from someone who understood the website better and found a portion of it that was an easier access point than the one originally being used. This individual helped explain what a .csv file was while allowing the freedom to work out how to create the required .csv files for the program to run.
5 References
Allen, M. (2017). Untangling the Data: The Jet Fumes Project – The Wunderground Database. Unpublished manuscript.

Davis, S. J. (1981, March). Predesign Study for a Modern 4-Bladed Rotor for the RSRA. National Aeronautics and Space Administration.

Norman, T. R., Shinoda, P., Peterson, R. L., & Datta, A. (2011, May). Full-Scale Wind Tunnel Test of the UH-60A Airloads Rotor.

UH-60A Airloads Wind Tunnel Test Summary. (n.d.). Retrieved from Aeromechanics - NASA Ames Research Center: https://rotorcraft.arc.nasa.gov/Research/Programs/uh_60_test_summary.html

Weather Underground. (n.d.). Retrieved from Weather Underground: www.wunderground.com