SQL to Hadoop and back again, Part 3: Direct transfer and live data exchange
Martin C. Brown ([email protected])
Director of Documentation
22 October 2013
Big data is a term that has been used regularly now for almost a decade, and it, along
with technologies like NoSQL, is seen as the replacement for the long-successful
RDBMS solutions that use SQL. Today, DB2, Oracle, Microsoft SQL Server, MySQL, and
PostgreSQL dominate the SQL space and still make up a considerable proportion of the
overall market. In this final article of the series, we will look at more automated solutions
for migrating data to and from Hadoop. In the previous articles, we concentrated on methods
that take exports or otherwise formatted and extracted data from your SQL source, load that
into Hadoop in some way, then process or parse it. But if you want to analyze big data, you
probably don't want to wait while exporting the data. Here, we're going to look at some
methods and tools that enable a live transfer of data between your SQL and Hadoop environments.
View more content in this series
Using Sqoop

Like some solutions we've seen earlier, Sqoop is all about taking data, usually wholesale, from a
database and inserting it into Hadoop in the format required for your desired use. For example,
Sqoop can take raw tabular data (a whole database, table, view, or query) and insert
it into Hadoop using a native JSON-style format, CSV format, or tab-delimited format, or Sqoop can
import it in a format suitable for using the data in Hive or HBase.
The elegance of Sqoop is that it handles the entire extraction, transfer, data type translation, and
insertion process for you, in either direction. Provided you have organized your data
appropriately, Sqoop also handles the incremental transfer of information. This means you can
perform a load and, 24 hours later, perform an additional load that imports only the rows that have
changed.
InfoSphere BigInsights

InfoSphere BigInsights makes integrating between Hadoop and SQL databases much
simpler, as it provides the necessary tools and mechanics to export and import data between
different databases. Using InfoSphere BigInsights, you can define database sources, views,
queries, and other selection criteria, and then automatically convert that into a variety of
formats before importing that collection directly into Hadoop (see Resources for more
information).
For example, you can create a query that extracts the data and populates a JSON array with
the record data. Once exported, a job can be created to process and crunch the data before
either displaying it or exporting the processed data back into DB2.
Download BigInsights Quick Start Edition, a complimentary, downloadable version of
InfoSphere BigInsights.
Sqoop is an efficient method of swapping data, since it uses multithreaded transfers to extract,
convert, and insert the information between databases. This approach can be more efficient for
data transfer than the export/import methods shown previously. The limitation of Sqoop is that it
automates aspects of the data exchange that, if they were configurable, could be better tailored
to your data and its expected uses.
Importing to Hadoop using Sqoop
Sqoop works very simply: it takes all the data in a table (effectively SELECT * FROM tablename),
or the output of a submitted query, and runs a MapReduce load job that writes the
content out into HDFS within Hadoop.
The basic Sqoop tool accepts a command, import, then a series of options that define the JDBC
interface, along with configuration information, such as the JDBC driver, authentication information,
and table data. For example, here's how to import the Chicago Bus data from a MySQL source:
$ sqoop import --connect jdbc:mysql://192.168.0.240/hadoop --username root \
    --table chicago
Sqoop likes to use the primary key of the table as an identifier for the information, because
each row of data will be inserted into HDFS as a CSV row in a file. The primary key is also the
better method to use for append-only data, such as logs. Using the primary key is also handy
when performing incremental imports, because we can use it to identify which rows have already
been imported.
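If the table has no usable primary key, the import fails, as the first attempt in Listing 1 below
shows. A minimal sketch of the two standard workarounds, assuming the chicago table carries a
reasonably evenly distributed id column:

# Split the work across map tasks by an explicit column:
$ sqoop import --connect jdbc:mysql://192.168.0.240/hadoop --username root \
    --table chicago --split-by id

# Or fall back to a single, sequential map task:
$ sqoop import --connect jdbc:mysql://192.168.0.240/hadoop --username root \
    --table chicago -m 1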
The output of the command actually goes a long way to describe the underlying process (see
Listing 1).
Listing 1. Output of the sqoop import command
$ sqoop import --connect jdbc:mysql://192.168.0.240/hadoop
--username root --table chicago
13/08/20 18:45:46 INFO manager.MySQLManager: Preparing to use a MySQL
streaming resultset.
13/08/20 18:45:46 INFO tool.CodeGenTool: Beginning code generation
13/08/20 18:45:47 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `chicago` AS t LIMIT 1
13/08/20 18:45:47 INFO manager.SqlManager: Executing SQL statement:
SELECT t.* FROM `chicago` AS t LIMIT 1
13/08/20 18:45:47 INFO orm.CompilationManager: HADOOP_MAPRED_HOME
is /usr/lib/hadoop-mapreduce
13/08/20 18:45:47 INFO orm.CompilationManager: Found hadoop core jar at:
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar
Note: /tmp/sqoop-cloudera/compile/2a66b88e152785acb3688bb530daa957/chicago.java
uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/08/20 18:45:49 INFO orm.CompilationManager: Writing jar file:
/tmp/sqoop-cloudera/compile/2a66b88e152785acb3688bb530daa957/chicago.jar
13/08/20 18:45:49 WARN manager.MySQLManager: It looks like you are
importing from mysql.
13/08/20 18:45:49 WARN manager.MySQLManager: This transfer can be faster!
Use the --direct
13/08/20 18:45:49 WARN manager.MySQLManager: option to exercise a
MySQL-specific fast path.
13/08/20 18:45:49 INFO manager.MySQLManager: Setting zero DATETIME
behavior to convertToNull (mysql)
13/08/20 18:45:49 ERROR tool.ImportTool: Error during import: No primary key
could be found for table chicago. Please specify one with --split-by or perform
a sequential import with '-m 1'.
[cloudera@localhost ~]$ sqoop import --connect jdbc:mysql://192.168.0.240/hadoop
--username root --table chicago
13/08/20 18:48:55 INFO manager.MySQLManager: Preparing to use a MySQL
streaming resultset.
13/08/20 18:48:55 INFO tool.CodeGenTool: Beginning code generation
13/08/20 18:48:55 INFO manager.SqlManager: Executing SQL statement:
SELECT t.* FROM `chicago` AS t LIMIT 1
13/08/20 18:48:55 INFO manager.SqlManager: Executing SQL statement:
SELECT t.* FROM `chicago` AS t LIMIT 1
13/08/20 18:48:55 INFO orm.CompilationManager: HADOOP_MAPRED_HOME
is /usr/lib/hadoop-mapreduce
13/08/20 18:48:55 INFO orm.CompilationManager: Found hadoop core jar at:
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar
Note: /tmp/sqoop-cloudera/compile/3002dc39075aa6746a99e5a4b27240ac/chicago.java
uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/08/20 18:48:58 INFO orm.CompilationManager: Writing jar file:
/tmp/sqoop-cloudera/compile/3002dc39075aa6746a99e5a4b27240ac/chicago.jar
13/08/20 18:48:58 WARN manager.MySQLManager: It looks like you are importing
from mysql.
13/08/20 18:48:58 WARN manager.MySQLManager: This transfer can be faster!
Use the --direct
13/08/20 18:48:58 WARN manager.MySQLManager: option to exercise a
MySQL-specific fast path.
13/08/20 18:48:58 INFO manager.MySQLManager: Setting zero DATETIME
behavior to convertToNull (mysql)
13/08/20 18:48:58 INFO mapreduce.ImportJobBase: Beginning import of chicago
13/08/20 18:48:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing
the arguments. Applications should implement Tool for the same.
13/08/20 18:49:00 INFO db.DataDrivenDBInputFormat: BoundingValsQuery:
SELECT MIN(`id`), MAX(`id`) FROM `chicago`
13/08/20 18:49:00 INFO mapred.JobClient: Running job:
job_201308151105_0012
13/08/20 18:49:01 INFO mapred.JobClient: map 0% reduce 0%
13/08/20 18:49:51 INFO mapred.JobClient: map 100% reduce 0%
13/08/20 18:49:53 INFO mapred.JobClient: Job complete:
job_201308151105_0012
13/08/20 18:49:53 INFO mapred.JobClient: Counters: 23
13/08/20 18:49:53 INFO mapred.JobClient: File System Counters
13/08/20 18:49:53 INFO mapred.JobClient: FILE: Number of bytes read=0
13/08/20 18:49:53 INFO mapred.JobClient: FILE: Number of bytes
written=695444
13/08/20 18:49:53 INFO mapred.JobClient: FILE: Number of read operations=0
13/08/20 18:49:53 INFO mapred.JobClient: FILE: Number of large read
operations=0
13/08/20 18:49:53 INFO mapred.JobClient:     FILE: Number of write operations=0
13/08/20 18:49:53 INFO mapred.JobClient: HDFS: Number of bytes read=433
13/08/20 18:49:53 INFO mapred.JobClient: HDFS: Number of bytes written=97157691
13/08/20 18:49:53 INFO mapred.JobClient: HDFS: Number of read operations=4
13/08/20 18:49:53 INFO mapred.JobClient: HDFS: Number of large
read operations=0
13/08/20 18:49:53 INFO mapred.JobClient: HDFS: Number of write operations=4
13/08/20 18:49:53 INFO mapred.JobClient: Job Counters
13/08/20 18:49:53 INFO mapred.JobClient: Launched map tasks=4
13/08/20 18:49:53 INFO mapred.JobClient: Total time spent by all maps in
occupied slots (ms)=173233
13/08/20 18:49:53 INFO mapred.JobClient: Total time spent by all reduces in
occupied slots (ms)=0
13/08/20 18:49:53 INFO mapred.JobClient: Total time spent by all maps waiting
after reserving slots (ms)=0
13/08/20 18:49:53 INFO mapred.JobClient: Total time spent by all reduces
waiting after reserving slots (ms)=0
13/08/20 18:49:53 INFO mapred.JobClient: Map-Reduce Framework
13/08/20 18:49:53 INFO mapred.JobClient: Map input records=2168224
13/08/20 18:49:53 INFO mapred.JobClient: Map output records=2168224
13/08/20 18:49:53 INFO mapred.JobClient:     Input split bytes=433
13/08/20 18:49:53 INFO mapred.JobClient:     Spilled Records=0
13/08/20 18:49:53 INFO mapred.JobClient: CPU time spent (ms)=24790
13/08/20 18:49:53 INFO mapred.JobClient: Physical memory (bytes)
snapshot=415637504
13/08/20 18:49:53 INFO mapred.JobClient: Virtual memory (bytes)
snapshot=2777317376
13/08/20 18:49:53 INFO mapred.JobClient: Total committed heap usage
(bytes)=251133952
13/08/20 18:49:53 INFO mapreduce.ImportJobBase: Transferred 92.6568 MB
in 54.4288 seconds (1.7023 MB/sec)
13/08/20 18:49:53 INFO mapreduce.ImportJobBase: Retrieved 2168224 records.
Once transferred, the data is stored, by default, as comma-separated values and added to a
directory named after the imported table, with the data split across several files (see Listing 2).
Listing 2. Storing the data
$ hdfs dfs -ls chicago
Found 6 items
-rw-r--r-- 3 cloudera cloudera 0 2013-08-20 18:49 chicago/_SUCCESS
drwxr-xr-x - cloudera cloudera 0 2013-08-20 18:49 chicago/_logs
-rw-r--r-- 3 cloudera cloudera 23904178 2013-08-20 18:49 chicago/part-m-00000
-rw-r--r-- 3 cloudera cloudera 24104937 2013-08-20 18:49 chicago/part-m-00001
-rw-r--r-- 3 cloudera cloudera 24566127 2013-08-20 18:49 chicago/part-m-00002
-rw-r--r-- 3 cloudera cloudera 24582449 2013-08-20 18:49 chicago/part-m-00003
To change the directory where the information is stored, use --target-dir to specify the directory
location within HDFS.
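For example, a minimal sketch that places the same import under an explicit path (the directory
name here is illustrative):

$ sqoop import --connect jdbc:mysql://192.168.0.240/hadoop --username root \
    --table chicago --target-dir /data/chicago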
The file format can be explicitly modified using command-line arguments, but the options are
limited. For example, you can't migrate tabular data into a JSON record with Sqoop.
A more complex alternative is to use the SequenceFile format, which translates the raw data into
a binary format that can be reconstituted within the Java environment of Hadoop as a Java
class, with each column of the table data as a property of each instantiated class record. As an
alternative, you can use Sqoop to import data directly into an HBase- or Hive-compatible table.
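As a sketch, the relevant switches look like this (all are standard Sqoop options; the HBase
column family name is an assumption for illustration):

# Store the rows as binary SequenceFile records instead of CSV:
$ sqoop import --connect jdbc:mysql://192.168.0.240/hadoop --username root \
    --table chicago --as-sequencefile

# Import straight into a Hive table:
$ sqoop import --connect jdbc:mysql://192.168.0.240/hadoop --username root \
    --table chicago --hive-import

# Load into an HBase table, creating it if necessary:
$ sqoop import --connect jdbc:mysql://192.168.0.240/hadoop --username root \
    --table chicago --hbase-table chicago --column-family data --hbase-create-table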
Importing using a query
Wholesale table transfers are useful, but one of the primary benefits of the SQL environment is the
ability to join and reformat the input into a more meaningful stream of columnar data.
By using a query, you can extract entire tables, table fragments, or complex table joins. I tend to
use queries when the source data is from multiple SQL tables and I want to crunch the data as a
single source table within Hadoop.
To use a query, the --query argument must be specified on the command line. The query must
include a WHERE clause containing the variable $CONDITIONS, which Sqoop populates when
splitting the source content. A free-form query also requires an explicit --target-dir, plus
either a --split-by column or -m 1 (see Listing 3).
Listing 3. Using the --query argument
$ sqoop import --connect jdbc:mysql://192.168.0.240/hadoop --username root \
    --target-dir logsensor --split-by log.id \
    --query "SELECT log.id,log.daterec,sensor.logtype,sensor.value FROM log
    JOIN sensor ON (sensor.logid = log.id) WHERE \$CONDITIONS"
The basic process is the same; we're just being selective about the data being exchanged.
Internally, Sqoop merely executes the query and takes the tabular output.
A good thing to remember here is that the size of the data being transferred is
(comparatively) unimportant. Also bear in mind that during processing within Hadoop, you will only
have access to the information in the files that are imported; you won't be able to run a join or
other lookup to find the information you need, as you would in a multi-table SQL environment.
Therefore, duplication of information on a row basis (for example, repeating a string, ID, or date
identifier column) that you might ordinarily dedupe can safely be included.
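For example, a sketch that deliberately repeats a descriptive value on every row (the sensor.name
column is hypothetical), so that no lookup is needed once the data is inside Hadoop:

$ sqoop import --connect jdbc:mysql://192.168.0.240/hadoop --username root \
    --target-dir logsensor_named --split-by log.id \
    --query "SELECT log.id, log.daterec, sensor.name, sensor.logtype, sensor.value
    FROM log JOIN sensor ON (sensor.logid = log.id) WHERE \$CONDITIONS"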
Incremental imports
The incremental import is an attempt by Sqoop to handle the fact that source data is unlikely to be
static. The process is not automatic, and you must be prepared to keep a record of the last data
that was imported.
The incremental system operates in two ways: either using a lastmodified approach, or using an
append approach:

The lastmodified approach requires changing your SQL table structure and application,
as it performs a comparison on a date that is then used to determine which records have
changed since the last import was made. This is best used for data from the SQL side that is
updated, but you must adapt your application and database structure to include a column that
contains the date and time when the record was inserted or updated. Most databases include
a timestamp data type for exactly this purpose (see the sketch after this list).

The append approach uses a simpler check field. This can be used in a number of ways,
but the most obvious is one where an auto-incremented column is used to hold data and
is, therefore, better suited to data that is permanently appended, rather than created and
updated. Another option is to use a column that is updated to a new value for each insert or
update, but this requires more hoop-jumping than the auto_increment value.
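A minimal sketch of the lastmodified variant, assuming the table carries a TIMESTAMP column
named updated (the column name is illustrative):

$ sqoop import --connect jdbc:mysql://192.168.0.240/hadoop --username root \
    --table chicago --incremental lastmodified \
    --check-column updated --last-value "2013-08-20 00:00:00"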
For either system, the fundamental approach is the same: tell Sqoop which column contains the
data to be checked and the check value, then import as normal. For example, to import all the
data added since the original Chicago bus data import, we specify the auto_increment ID of the
last row imported (see Listing 4).
Listing 4. Specifying the auto_increment ID
$ sqoop import --connect jdbc:mysql://192.168.0.240/hadoop --username root \
--table chicago --check-column id --incremental append --last-value=2168224
...
13/08/20 19:39:01 INFO tool.ImportTool: Incremental import complete!
To run another incremental import of all data following this import, supply the
following arguments:
13/08/20 19:39:01 INFO tool.ImportTool:  --incremental append
13/08/20 19:39:01 INFO tool.ImportTool: --check-column id
13/08/20 19:39:01 INFO tool.ImportTool: --last-value 4336573
13/08/20 19:39:01 INFO tool.ImportTool: (Consider saving this with
'sqoop job --create')
One useful feature of the incremental process is that the job outputs the command-line values you
would need to use for the next import, but it's easier to save this as a job (see Listing 5).
Listing 5. Saving output as a job
$ sqoop job --create nextimport -- import --incremental append \
    --check-column id --last-value 4336573 \
    --connect jdbc:mysql://192.168.0.240/hadoop \
    --username root --table chicago
Now you can run the next import by running $ sqoop job --exec nextimport.
The incremental import process is great for jobs that aggregate data from your SQL store
into Hadoop over time, while deleting the active data in the SQL store. The same basic premise
can be used to perform near-live updates of information into Hadoop from an SQL source, since
the information is so easily transferred across.
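Because a saved job records the last value imported after each run, scheduling a near-live
transfer can be as simple as a cron entry; a sketch, assuming the nextimport job from Listing 5
and that sqoop is on the PATH:

# Run the saved incremental import at the top of every hour
0 * * * * sqoop job --exec nextimport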
Exporting to SQL using Sqoop

The export process just converts the data in Hadoop back into a table. Sqoop achieves this export
by loading the data into a staging table, then importing the data into the target table. The target
table has to exist, and its structure has to match the information being exported from Hadoop (see
Listing 6).
Listing 6. Sqoop export command
$ sqoop export --connect jdbc:mysql://192.168.0.240/hadoop --username root
--table chicago2 --export-dir=chicago
13/08/20 20:08:44 INFO manager.MySQLManager: Preparing to use a MySQL
streaming resultset.
13/08/20 20:08:44 INFO tool.CodeGenTool: Beginning code generation
13/08/20 20:08:46 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `chicago2` AS t LIMIT 1
13/08/20 20:08:47 INFO manager.SqlManager: Executing SQL statement:
SELECT t.* FROM `chicago2` AS t LIMIT 1
13/08/20 20:08:47 INFO orm.CompilationManager: HADOOP_MAPRED_HOME
is /usr/lib/hadoop-mapreduce
13/08/20 20:08:47 INFO orm.CompilationManager: Found hadoop core jar at:
/usr/lib/hadoop-mapreduce/hadoop-mapreduce-client-core.jar
Note: /tmp/sqoop-cloudera/compile/5f6d818f5d78c0e4349b5fc3924f87da/
chicago2.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
13/08/20 20:08:49 INFO orm.CompilationManager: Writing jar file:
/tmp/sqoop-cloudera/compile/5f6d818f5d78c0e4349b5fc3924f87da/chicago2.jar
13/08/20 20:08:50 INFO mapreduce.ExportJobBase: Beginning export of chicago2
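The export above assumes that chicago2 already exists with a structure matching the original
chicago table. In MySQL, a minimal sketch of one way to guarantee that, using the mysql client:

$ mysql -h 192.168.0.240 -u root hadoop \
    -e "CREATE TABLE chicago2 LIKE chicago"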
Sqoop2

With the next generation of the tool, Sqoop2, the model changes: you predefine jobs and
structures that describe how the data interface between the two systems should exist,
then use this definition to run explicit transfers of the data. Sqoop2 is still in its early stages (the
first stable version was made available in March 2013), and convenience features like Hive
and HBase support are not complete. The real benefit will come with a forthcoming UI, which is
expected toward the end of 2013.
Incorporating Hadoop in your application
For performing a more regular, measured transfer of data, the best approach is to make the data
exchange part of your application framework. This is the only way to be sure that the exchange of
information is wired correctly into your application workflow. One solution is to use Hive,
since it allows SQL queries to be used directly across the two systems.
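For instance, assuming the chicago table was imported with --hive-import as described earlier,
the application can issue the same style of query on the Hadoop side; a sketch using the Hive CLI:

$ hive -e "SELECT COUNT(*) FROM chicago"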
However, this kind of integration is risky because of the potential for bad data. The main benefit
of using SQL is its transactional nature, which guarantees data has been written to disk. Exporting
the data (via Sqoop, CSV, etc.) is much safer and works much better with the append-only nature of
HDFS and Hadoop in general.
The exact method is dependent on your application, but writing the same data to both systems is
not without its pitfalls. In particular:

Writing data to two databases is complicated. To guarantee that no data is lost, you would
need to confirm both transactions and reject the write (for the application to resubmit) in the
event that either the SQL or Hadoop write had failed.

Updates are more complicated in Hadoop. HDFS is append-only by nature, so you would
have to handle the process by writing a correction record into HDFS, then using MapReduce
to compress the insert and update operations during processing.

Deletes are fundamentally the same as updates: we just kill the data during MapReduce
processing.

Any changes (updates or deletes) may cause problems if you are doing streaming updates
and compression of the data in Hadoop.

If you are reading the data back into SQL in a summarized form for near-line processing,
you'll need to plan the process (as described in Part 2) very carefully.
The data life cycle between SQL and Hadoop (export, process, import) is generally much easier
to handle (disk space issues aside). If you need near-live transfer, then the Sqoop incremental
transfer is much more efficient.
Conclusion
Live data transfer between SQL and Hadoop is not a sensible option, but with Sqoop, we can do
the next best thing by using incremental updates to load the most recent data into Hadoop on a
regular basis. The alternative, live updates through your application, is so risky
from a data quality and reliability point of view that it should be avoided. Regular swapping of
information between SQL and Hadoop is safer and allows the data to be managed more effectively
during the transfer.
Throughout this "SQL to Hadoop and back again" series, the focus has been on trying to
understand that life cycle and how the transfer and exchange of information operates. The format
of the data is relatively simple, but knowing how and when to effectively exchange the information
and process it in a way that matches your data needs is the challenge. There are lots of solutions
out there to move the data, but it is still up to your application to understand the best way to make
use of that exchange.
Resources

Learn
Find resources to help you get started with InfoSphere BigInsights, IBM's Hadoop-based
offering that extends the value of open source Hadoop with features like Big SQL, text
analytics, and BigSheets.
Follow these self-paced tutorials (PDF) to learn how to manage your big data environment,
import data for analysis, analyze data with BigSheets, develop your first big data application,
develop Big SQL queries to analyze big data, and create an extractor to derive insights from
text documents with InfoSphere BigInsights.
Find resources to help you get started with InfoSphere Streams, IBM's high-performance
computing platform that enables user-developed applications to rapidly ingest, analyze, and
correlate information as it arrives from thousands of real-time sources.
Stay current with developerWorks technical events and webcasts.
Follow developerWorks on Twitter.
Get products and technologies
Get Hadoop 0.20.1 from Apache.org.
Get Hadoop MapReduce.
Get Hadoop HDFS.
Download InfoSphere BigInsights Quick Start Edition, available as a native software
installation or as a VMware image.
Download InfoSphere Streams, available as a native software installation or as a VMware
image.
Use InfoSphere Streams on IBM SmartCloud Enterprise.
Build your next development project with IBM trial software, available for download directly
from developerWorks.
Discuss
Ask questions and get answers in the InfoSphere BigInsights forum.
Ask questions and get answers in the InfoSphere Streams forum.
Check out the developerWorks blogs and get involved in the developerWorks community.
IBM big data and analytics on Facebook.
About the author
Martin C. Brown
A professional writer for over 15 years, Martin (MC) Brown is the author of and
contributor to more than 26 books covering an array of topics, including the recently
published Getting Started with CouchDB. His expertise spans myriad development
languages and platforms: Perl, Python, Java, JavaScript, Basic, Pascal, Modula-2, C,
C++, Rebol, Gawk, Shellscript, Windows, Solaris, Linux, BeOS, Microsoft WP, Mac
OS, and more. He currently works as the director of documentation for Continuent.
Copyright IBM Corporation 2013
(www.ibm.com/legal/copytrade.shtml)
Trademarks
(www.ibm.com/developerworks/ibm/trademarks/)