FTP versus HTTPS in EOSDIS Data Access WGISS 40 – September 30, 2015 Andrew Mitchell 1


FTP versus HTTPS in EOSDIS Data Access

WGISS 40 – September 30, 2015

Andrew Mitchell


Agenda

• User Registration System (URS)
  – Earthdata Login
• Requiring Registration for Data Access at EOSDIS
  – FTP/HTTP Comparison
• URS Guidance and Policy
• FTP retirement at Data Centers
  – Lessons Learned
• Backup: File Transfer Protocol (FTP/HTTP)
  – Engineering Perspective
  – Performance Study


NASA USER REGISTRATION – EARTHDATA LOGIN


Earthdata Login


Capturing User’s Area of Interest

Study Areas & Application Domains

NASA - Primary study area*: Air sea interaction, Atmospheric aerosols, Biological Oceanography, Clouds, Cryospheric studies, Geophysics, Global biosphere, Human dimensions of global change, Hydrologic cycle, Land processes, Physical Oceanography, Polar processes, Radiation budget, Sea ice, Tropospheric chemistry, Upper atmospheric composition, Upper atmospheric dynamics, Other

ESA - Primary Application Domain*: Atmosphere, Sea-Ice, Geodesy, Geology, Hazards, Hydrology, Ice, Land Environment, Methods, Oceanography, Renewable Resources, Topographic Mapping, Other, Calibration/Validation, Coastal Zones


Federated User Identity Study

• Performing a study of other (non-OAuth2) Single Sign-On technologies that will allow Earthdata Login to become interoperable with user registration systems from other systems and agencies.

Architecture

Components shown in the diagram:

• LDAP store
• LDAP proxy (via LDAP store)
• HTTP-accessible RESTish API
• FTP clients
• HTTP clients
• Web-based user maintenance


REQUIRING REGISTRATION FOR DATA ACCESS AT EOSDIS

FTP and HTTP comparison

Impact of requiring authentication with FTP at DAACs

Advantages | Disadvantages

Minimal impact to existing users | Multiple flavors deployed at the data centers (5 different ftp servers)

Minimal impact to data centers | No direct support for LDAP authentication on some of the flavors.

No changes to firewall rules or similar configuration | Not authenticated securely: some flavors unable to support secure authentication.

*Direct support for anonymous access | Prohibited at LP DAAC due to DoI regulations

Maturity of capability / protocol | Does not integrate well with REST API for support of OpenID or OGC


Impact of requiring authentication with HTTP at DAACs

Advantages | Disadvantages

Comprehensive support from the user community: protocol is well established and mature, all data centers use the same http server (apache) | End user scripts will have to change, as will manual access to the files they access

Modules can be applied to support many extensions and metrics gathering unavailable to certain ftpds | Data center configurations will have to change (on the firewall and the apache server)

Easily accommodates a REST API and provides well established LDAP modules for simple configuration and integration | DAACs custom code will have to change

Permitted as a transfer protocol by the DoI | Data Center customizations and extensions will need to be modified

Supports a secure authentication mechanism (https) |


URS GUIDANCE & POLICY


Guidance for EOSDIS DAACs, Subsystems and Applications

Purpose: To provide guidance and clarify the integration requirements for the URS into EOSDIS systems and components.

Scope: This guidance applies to all EOSDIS DAACs, subsystems (ECHO, GCMD, Earthdata, GIBS, etc.) and related EOSDIS services and applications (including Reverb, ASTER GDEM Explorer, ASF Vertex, etc.).

• Guidance: URS will be implemented by DAACs, subsystems and related services for the following capabilities:
  – Downloading science data files from HTTP, HTTPS and FTP services.
  – Web services and tools allowing access to science data files (e.g. OPeNDAP, Web Coverage Services, analysis tools, DAAC-unique ordering tools).
  – Online collaboration and comment tools (e.g. Wikis, Forums, Code Repositories).
  – Other tools and services that currently have optional or required user registration.

• Registration is NOT required for:
  – Read-access to Web pages and documentation.
  – Data discovery services such as Reverb, Earthdata Search Client (EDSC), Global Change Master Directory keyword services, CMR and DAAC-unique search clients.
    • Note: This portion of the policy applies up until the point where science data downloads are performed or write operations such as saving search parameters, inputting or updating metadata records are performed.


Evolution and Transition Planning

• URS is available and this guidance will go into immediate effect.
  – A staggered approach will be used to implement URS throughout DAACs, subsystems and applications.
  – Schedules and transition plans for implementation will be negotiated between affected systems and ESDIS.

• Milestones and Timeline
  – In 2015, HTTPS Access with URS 4 (SSO) must be available for all current equivalent FTP/HTTP Access.
  – DAACs, subsystems and applications are allowed to run HTTPS access and FTP/HTTP* access in parallel.


FTP RETIREMENT AT DATA CENTERS

Lessons Learned


Near Real Time Data Access (LANCE)

HTTPS File Distribution Requirements for LANCE

• LANCE Elements shall integrate with the URS and restrict access to NRT data to users with valid URS accounts.
• URL structure should be decided by the data providers.
• From a user's perspective, it should be possible to get all the files simply by using curl or wget.
  – e.g.: wget -r https://foo.nasa.gov/data/OMI/OMTO3/2007/05/11
    which would download all the OMTO3 data files and the Manifest for the date 2007/05/11.
  – To get the entire month, use: wget -r https://foo.nasa.gov/data/OMI/OMTO3/2007/05
  – To get the entire year, use: wget -r -nd https://foo.nasa.gov/data/OMI/OMTO3/2007
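For scripted access without wget, the same "fetch everything under a date directory" idea can be sketched in Python. This is a minimal sketch, assuming the server returns an Apache-style HTML index page for each directory; the slides do not specify the listing format, and `LinkCollector` and `list_files` are hypothetical helper names. It only turns a listing page into a list of file URLs; fetching the page and the files is left to the caller.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags in a directory-listing page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def list_files(base_url, listing_html):
    """Return absolute URLs of the files linked directly under base_url.

    Assumes an HTML index page (hypothetical format): column-sort links
    start with "?", and subdirectory or parent links end with "/".
    """
    parser = LinkCollector()
    parser.feed(listing_html)
    if not base_url.endswith("/"):
        base_url += "/"
    urls = []
    for href in parser.hrefs:
        if href.startswith("?") or href.endswith("/"):
            continue  # not a file
        url = urljoin(base_url, href)
        if url.startswith(base_url):  # stay inside the requested directory
            urls.append(url)
    return urls
```

Called with the example directory URL above and the HTML of its index page, `list_files` returns one URL per data file plus the manifest, which a script can then download one by one.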


LP DAAC migration to HTTP

• The LP DAAC switched from FTP to HTTP for data access on June 4, 2013. This change was advertised on the LP DAAC Web site as a News item. We encourage users who do not regularly visit our page to consider subscribing to the RSS News Feed (https://lpdaac.usgs.gov/news_feed) so as not to miss out on future announcements.

  The News item for the FTP-to-HTTP switch is available at (https://lpdaac.usgs.gov/lp_daac_discontinue_anonymous_ftp_june_4_2013). Note: The cURL command handles HTTP and has been used by some to update their scripted access to Data Pool.

• LP DAAC provides a good model for HTTPS data distribution: https://lpdaac.usgs.gov/data_access/data_pool


User Feedback

“I think that the data should be delivered by a ftp server, because in my case, here in PARAGUAY the internet signal is not stable. During downloads, my connection was interrupted many times forcing me to restart the request process and download it again.”

“We used to receive order by email as ftp, currently it is only http, which is taking more time in downloading, can we go back to ftp option ?”

“The problem I have with the http protocol is I don't know how to automate my wget script to get new data. With ftp I can use a wildcard at the end of the full file path. With the current naming of the .hdf files, MYD11C1.A2013153.005.2013155051730.hdf

I don't know the filenames ahead of time, so I cannot even use a brute force, name every file to get approach. Is there some way you can recommend to automatically get these data? Can I request an automatic push to my incoming ftp site?”
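One possible answer to the wildcard question in the last comment is to fetch the directory listing first and apply the wildcard on the client side. The sketch below assumes the script has already obtained the file names in a directory (for example by parsing the HTML index page); `match_granules` is a hypothetical helper name, and every file name in the usage note other than the one quoted above is made up for illustration.

```python
from fnmatch import fnmatchcase


def match_granules(filenames, pattern):
    """Return the names matching a shell-style wildcard pattern, sorted.

    The pattern uses the same "*" and "?" wildcards a user would have
    typed in an FTP client; matching is case-sensitive.
    """
    return sorted(name for name in filenames if fnmatchcase(name, pattern))
```

For instance, `match_granules(names, "MYD11C1.A2013153.*.hdf")` selects the granule for day 153 of 2013 without knowing its production timestamp ahead of time, and it skips companion files such as a hypothetical `.hdf.xml` metadata file because the pattern must match the whole name.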


Summary

• Understanding that many of our users use scripts to get data from our anonymous FTP servers, this will require social as well as technical changes.  

• We are gathering use cases and lessons learned from other DAACs in addition to providing ‘recipes’, reference software to automate authenticated HTTPS downloads, bulk download web clients, user tutorials and documentation.
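As an illustration of what such a recipe for authenticated HTTPS downloads might look like, here is a minimal sketch using only the Python standard library. It is not the reference software mentioned above. It assumes that the data server redirects to a login host that accepts HTTP Basic credentials and then sets a session cookie; the login host name `urs.earthdata.nasa.gov` is an assumption not stated in these slides, and `make_opener` and `download` are hypothetical names.

```python
from http.cookiejar import CookieJar
from urllib.request import (
    HTTPBasicAuthHandler,
    HTTPCookieProcessor,
    HTTPPasswordMgrWithDefaultRealm,
    build_opener,
)

# Assumption: the host that performs the login; not stated in the slides.
LOGIN_HOST = "https://urs.earthdata.nasa.gov"


def make_opener(username, password, login_host=LOGIN_HOST):
    """Build an opener that answers login challenges and keeps session cookies."""
    passwords = HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, login_host, username, password)
    return build_opener(
        HTTPBasicAuthHandler(passwords),
        HTTPCookieProcessor(CookieJar()),
    )


def download(opener, url, dest_path, chunk_size=1 << 20):
    """Stream url to dest_path in blocks; return the number of bytes written."""
    written = 0
    with opener.open(url) as response, open(dest_path, "wb") as out:
        while True:
            block = response.read(chunk_size)
            if not block:
                break
            out.write(block)
            written += len(block)
    return written
```

Reusing one opener for many files means the login exchange happens once and later requests ride on the stored cookie, which is the behaviour a script replacing anonymous FTP would most likely want.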


Summary

• URS is also being enhanced to work with multiple web services (e.g. OGC, OAI-PMH, OPeNDAP, REST/SOAP).

• How to get HTTPS directory listings fast:

https://wiki.earthdata.nasa.gov/display/HDD/HTTP+Data+Distribution+Home

• Some DAACs will be exempt from the HTTP requirement (via waivers)
  – Our CDDIS DAAC is serving over 1.8M files and 380 Gbytes/day to over 13K distinct users via FTP.


FILE TRANSFER PROTOCOL ENGINEERING PERSPECTIVE

Backup - FTP versus HTTP


FTP/HTTP Comparison

FTP | HTTP

Contains notion of file format: allows transfers of data to be ASCII or binary (+) | Always sends data binary (neutral)

No metadata is provided with files (-) | HTTP provides metadata with files (+)

Does not provide headers since no metadata is transferred (-) | Transfers with headers that contain information such as last modified date, character encoding, server name and version, etc. (+)

FTP allows requesting multiple files to get transferred in parallel using the same control connection (-) | Supports pipelining: clients are able to ask for the next transfer before the previous one has ended (+)

Since FTP doesn't utilize pipelining, new TCP connections are required for each transfer, so performance metrics are affected (-) | Pipelining allows multiple documents to get sent without a round-trip delay between documents, which helps with speed optimization (+)

Clients must send commands for the server to respond to, and a single transfer can involve a long series of commands. This has a negative impact since there is a round-trip delay for each command; retrieving a single FTP file can easily take up to 10 round-trips. (-) | Uses one request and one response for each document (+)

Uses two connections, where the second connection uses dynamic port numbers. Requires firewall admins to understand FTP at the application protocol layer to work well (-) |

If both parties are behind Network Address Translations, you cannot use FTP (-) |

Since firewalls need to understand FTP to open ports for the secondary connection, there is a huge problem with encryption (FTP-SSL, or FTPS) since the control connection is sent encrypted and firewalls cannot interpret the commands that deal with creating the second connection (-) |

Not as many options available to prevent FTP from sending passwords as plain text (-) | HTTP has commonly used authentication methods that avoid sending passwords as plain text, and HTTPS encrypts the whole exchange (+)
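The round-trip argument in the comparison above can be made concrete with a little arithmetic. The sketch below uses the slide's figure of up to 10 command round-trips per FTP file and one request/response per HTTP document on an already open connection; the file count and the RTT in the usage note are illustrative assumptions, and `command_overhead_seconds` is a hypothetical helper name. It counts only time spent waiting on command exchanges, not the data transfer itself.

```python
def command_overhead_seconds(files, round_trips_per_file, rtt_ms):
    """Time spent waiting on command round-trips, excluding data transfer.

    files                -- number of files fetched one after another
    round_trips_per_file -- command exchanges needed before data flows
    rtt_ms               -- network round-trip time in milliseconds
    """
    return files * round_trips_per_file * rtt_ms / 1000
```

With an assumed 1000 files and a 100 ms RTT, `command_overhead_seconds(1000, 10, 100)` gives the FTP-style waiting time and `command_overhead_seconds(1000, 1, 100)` the HTTP-style one, a tenfold difference that grows linearly with both file count and latency.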


FTP/HTTP Comparison (cont’d)

FTP | HTTP

Resumed transfers for FTP that start beyond the 2GB position have been known to cause trouble (-) | Supports more advanced byte ranges (+)

FTP must create a new connection for each new data transfer. Repeatedly doing this is bad for performance due to new handshakes/connections all the time (-) | Client can maintain a single connection to a server and keep using that for any number of transfers (+)

Does not use chunked encoding (-) | Utilizes chunked encoding, where the sending party sends a stream of data blocks until there is no more data to send, then sends a zero-size chunk to signal the end of it. (+)

FTP uses plain closing of the connection, which makes it more difficult to detect premature connection shutdowns (-) | Chunked encoding helps in granting the ability to detect premature connection shutdowns (+)

FTP offers an official "built-in" run length encoding that compresses the amount of data to send, but not by a great enough amount on ordinary binary data (neutral) | Allows client and server to negotiate and choose among several compression algorithms (+)

FTP supports "third party transfers" wherein a client is allowed to ask a server to send data to a third host, a host that isn't the same as the client. This is typically disabled in modern FTP servers due to security implications (-) | Does not support "third party transfers" (FXP) (+)

Many FTP servers do not have the ability to support IPv6 (-) | HTTP supports IPv6 (+)

Cannot do name-based virtual hosting at all (-) | Easily host many sites on the same server that are all differentiated by name (+)

FTP has commands for listing directory contents of the remote server (+) | Concept does not exist in HTTP (-)

FTP has not been standardized for proxies, so this functionality is generally done in lots of different ad-hoc approaches (-) | HTTP has built-in support for proxies natively. (+)

Legend: Performance (speed), Security
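To show why chunked encoding makes premature shutdowns detectable, here is a minimal sketch of a decoder for an HTTP/1.1 chunked body that has already been read into memory. It is an illustration, not production code: it ignores trailer headers after the final chunk, and `decode_chunked` and `IncompleteTransfer` are hypothetical names. The key point is that a clean transfer must end with a zero-size chunk, so a stream that stops earlier is recognisably incomplete.

```python
class IncompleteTransfer(Exception):
    """Raised when the stream ends before the zero-size terminating chunk."""


def decode_chunked(raw: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body; raise if it was cut short."""
    body = bytearray()
    pos = 0
    while True:
        line_end = raw.find(b"\r\n", pos)
        if line_end == -1:
            raise IncompleteTransfer("stream ended inside a chunk-size line")
        # The chunk size is hexadecimal; ignore any ";extension" after it.
        size = int(raw[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            return bytes(body)  # zero-size chunk: sender finished cleanly
        chunk = raw[pos:pos + size]
        if len(chunk) < size or raw[pos + size:pos + size + 2] != b"\r\n":
            raise IncompleteTransfer("stream ended inside a chunk")
        body += chunk
        pos += size + 2
```

By contrast, an FTP data connection that closes halfway through a file looks the same to the client as one that closed after the last byte, unless the client separately knows the expected size.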


FILE TRANSFER PROTOCOL PERFORMANCE STUDY

Backup


Study Background

• Sending files over a high-speed network doesn’t guarantee that the end-to-end performance will match the network capacity or meet user expectations. When transferring data, network latency (round-trip time or RTT) and packet loss can impact the transmission rate in conjunction with the file transfer protocol used, and the characteristics and tuning parameters of the end systems.  

• EOSDIS performed a study of a set of file transfer protocols from ESDIS Networks to determine how each one performed in different network environments
  – All protocols studied use TCP for transport


Study Summary

• High speed networks don’t come with high speed end-to-end performance guarantees
  – File transfer protocol performance impacted by file size, host buffer size and TCP behavior
    • Network latency (round-trip time, RTT) and packet loss
• Most common file transfer protocols were designed when network capacity was much less than today
  – FTP over TCP/IP was developed in the 1980s
  – Single TCP stream
• New file transfer protocols are designed to better adapt to changes in high speed network environments
  – Multiple, parallel TCP streams
• Other strategies are being employed to increase performance
  – Increasing packet size
  – Encrypting only sensitive data
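The link between host buffer size, RTT and parallel streams can be illustrated with a standard rule of thumb that is not taken from the study itself: a single TCP stream can have at most one buffer (window) of data in flight per round trip, so its throughput is bounded by buffer size divided by RTT. The sketch below applies that bound and assumes each parallel stream gets its own buffer; packet loss, which lowers throughput further, is ignored. `max_throughput_mbps` is a hypothetical helper name and the numbers in the usage note are illustrative.

```python
def max_throughput_mbps(buffer_bytes, rtt_ms, streams=1, link_mbps=None):
    """Upper bound on throughput from the window/RTT limit, in Mbit/s.

    buffer_bytes -- TCP buffer (window) available to each stream
    rtt_ms       -- round-trip time in milliseconds
    streams      -- number of parallel TCP streams, each with its own buffer
    link_mbps    -- optional link capacity that caps the result
    """
    per_stream_bps = buffer_bytes * 8 / (rtt_ms / 1000)
    total_mbps = streams * per_stream_bps / 1e6
    return total_mbps if link_mbps is None else min(total_mbps, link_mbps)
```

With an assumed 64 KiB buffer and a 100 ms RTT, one stream is limited to roughly 5 Mbit/s regardless of link speed, while eight parallel streams raise the bound about eightfold. This is one way to see why multi-stream protocols help on long-RTT paths.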


Study Conclusions

• No single file transfer protocol works best in every network environment

• Data delivery requirements should be used to determine choice of file transfer protocol
  – Multi-stream protocols (bbFTP and GridFTP) are best at sending larger files over WANs (long RTT, higher packet loss)
  – Efficient, single stream protocols (FTP, HTTP) work best at sending smaller files over LANs (short RTT, lower packet loss)
  – Encryption processing software overhead lowers throughput
    • Increased CPU load