

Aug - Dec 2019

TRINITY INSTITUTE OF PROFESSIONAL STUDIES Dwarka, Sector-9, New Delhi

Advisors Dr. R.K. Tandon Chairman,

TIPS, Dwarka

Ms. Reema Tandon Vice Chairperson

TIPS, Dwarka

Editor-in-Chief
Dr. Barkha Bahl, Director, TIPS Dwarka

Editorial Board

Prof. (Dr.) Sunil Kumar Khatri, Director, AIIT, Amity University, Noida

Prof. Prashant Johri, Director, Galgotia University

Prof. Naveen Kumar, Associate Professor, IGNOU

Prof. (Dr.) Saurabh Gupta, HOD (CSE) Dept., NIEC

Ms. Ritika Kapoor, Assistant Professor, TIPS, Dwarka

Contents

Artificial Intelligence and Its role in modern era ... 3
Internet of things ... 6
Cloud Computing for Big Data Analytics ... 7
ETL tools for Data Warehousing ... 10

Vol 5, Issue 2

Trinity Tech Review


Vol 5 Issue 2 Page 1

Trinity Tech Review Aug Dec 2019

An ISO 9001:2008 Certified Institution

Sector-9, Dwarka, New Delhi-110075

www.tips.edu.in, [email protected]

Disclaimer: The views and opinions presented in the articles, case studies, research work and other contributions published in Trinity Tech Review (TTR) are solely those of the respective authors. TTR shall not be liable for these opinions, for any inadequacy of the information, or for any mistakes or inaccuracies.

Copyright © March 2015 Trinity Institute of Professional Studies, Dwarka. All rights reserved. No part of this publication may be reproduced, distributed, or transmitted in any form or by any means, including photocopying, recording, or other electronic or mechanical methods, without the prior written permission of the undermentioned.

Trinity Institute of Professional Studies

(Affiliated to Guru Gobind Singh Indraprastha University, Delhi)

Ph: 45636921/22/23/24, Telefax : 45636925

TRINITY INSTITUTE OF PROFESSIONAL STUDIES

“A+” Ranked Institution by SFRC, Govt. of NCT of Delhi. Recognised under section 2(f) of the UGC Act, 1956

& NAAC Accredited “B++” Grade Institution

(Affiliated to Guru Gobind Singh Indraprastha University, Delhi)


Page 2

STATEMENT ABOUT OWNERSHIP AND OTHER DETAILS OF TTR/TMR

FORM 5 (RULE 8)

1. Printer's Name : Dr. R.K. Tandon

Nationality : Indian

Address : Trinity Institute of Professional Studies

Sector-9, Dwarka, New Delhi 110075

2. Place of Publication : Delhi

3. Periodicity of Publication : Quarterly

4. Publisher's Name : Dr. R.K. Tandon

Nationality : Indian

Address : Trinity Institute of Professional Studies

Sector-9, Dwarka, New Delhi 110075

5. Editor's Name : Dr. Barkha Bahl

Nationality : Indian

Address : Trinity Institute of Professional Studies

Sector-9, Dwarka, New Delhi 110075

6. Name and Address of the individual who owns the journal and partners or shareholders holding more than one per cent of the capital : Chairman, Trinity Institute of Professional Studies, Sector-9, Dwarka, New Delhi 110075

7. Hosted at (url) : www.tips.edu.in

I, Dr. R.K. Tandon, hereby declare that the particulars given above are true to the best of my

knowledge and belief.

Dr. R.K. Tandon



Page 3

AI was founded as an academic discipline in 1956. Since then, AI techniques have become an essential part of the technology industry. Different types of AI-powered robots are being developed around the world, including in the US, China, Japan, Korea and India. As per reports, two-thirds of global investment in AI has poured into China, helping that country's AI industry grow 67 per cent last year alone.

AI is becoming a disruptive force that is redefining the modern industry. This article features some exciting applications of AI, along with a glimpse into the future, illustrating how AI will continue to transform industries and our lives.

AI technologies
The latest AI technologies include natural language generation, speech recognition, virtual agents, machine learning, deep learning, biometrics and AI-optimised hardware. AI experts break the field down into three broad categories: artificial narrow intelligence (ANI), artificial general intelligence (AGI) and artificial super intelligence (ASI). Complex, intelligent algorithms and good sensory systems could make AI robots perform even better. With improved machine learning and deep learning algorithms, future AI could be far more efficient, powerful and smarter than today's systems.
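The learning loop at the heart of machine learning can be shown in a few lines. The following Python sketch (written purely for illustration; the training data and learning rate are invented) fits a one-parameter linear model by gradient descent, i.e. a model that improves from examples rather than from explicit rules:

```python
# Minimal illustration of machine learning: a one-parameter linear model
# that "learns" the slope of y = 2x from examples via gradient descent.

data = [(x, 2 * x) for x in range(1, 6)]  # training examples (x, y)

w = 0.0    # initial guess for the slope
lr = 0.01  # learning rate

for _ in range(1000):           # repeat over the training data
    for x, y in data:
        error = w * x - y       # prediction error on this example
        w -= lr * error * x     # gradient step on the squared error

print(round(w, 2))  # converges towards 2.0, the true slope
```

The same idea, scaled up to millions of parameters and examples, underlies the deep learning systems mentioned above.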

Applications of AI

There are many emerging applications of AI. We can find AI in robotics, healthcare, education, business and on our mobile devices, to name a few. We have narrowed the list down to a few areas, each illustrating how AI will continue to transform industries and our lives.

Internet applications

AI finds many useful applications in Internet-related technologies, such as digital marketing, creating and generating online content, digital advertising, Web searches, Web design, chatbots, the Internet of Things (IoT) and others.

Virtual assistants

These are software agents that provide a wide variety of services. Amazon Alexa, Google Assistant and Siri are some of the most popular AI assistants. Virtual assistant software is now installed on smartphones as well, to serve users better.

Chatbots

Consumers already use chatbots to chat the way they would with friends and colleagues, without waiting long for a response. Chatbots automate responses to potential buyers' frequently asked questions and give them a way to search for the product or service they are looking for. These bots use natural language processing and machine learning techniques to find the correct responses. Many brands have started using these techniques to communicate with prospective customers through messenger applications like Facebook Messenger, WhatsApp and Slack.

Digital marketing

This field has revolutionised modern business. As the amount of information on potential consumers grows, AI-related technology will be of utmost importance in making data-based decisions. AI helps find people and customers based on their interests, demographics and other aspects, to learn and detect the best audience for particular brands.

Content creation

There are areas where content created by AI can be useful and can help attract visitors to a website. AI can also write reports and news based on data and information. Hundreds of articles can be created quickly with AI technology, which can save a lot of time and resources.

Online searches

Old ways of performing online searches no longer hold true. Two AI-driven developments that have revolutionised Internet searches and search engine optimisation (SEO) are voice search and Google's RankBrain algorithm. These have changed the way marketers create and optimise their Web content.

Web design

With AI, websites can be built without the help of programmers and designers. Applications such as Grid use AI to design websites based on information provided by users, like images, text and calls-to-action. AI can make websites look professional in very little time and at a much lower cost.
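The question-matching that chatbots perform can be sketched very simply. This Python toy (the FAQ entries are invented; real chatbots use the natural language processing and machine learning techniques described above rather than plain word overlap) picks the stored answer whose question shares the most words with the user's message:

```python
# Toy FAQ chatbot: picks the stored answer whose question shares the
# most words with the user's message. A stand-in for the NLP/ML
# matching that production chatbots use; all data here is invented.

faq = {
    "what are your opening hours": "We are open 9am to 6pm, Monday to Saturday.",
    "how do i track my order": "Use the tracking link sent to your email.",
    "do you ship internationally": "Yes, we ship to over 40 countries.",
}

def reply(message: str) -> str:
    words = set(message.lower().split())
    # score each stored question by its word overlap with the message
    best = max(faq, key=lambda q: len(words & set(q.split())))
    return faq[best]

print(reply("how do i track my order"))
# prints: Use the tracking link sent to your email.
```

A real system would add spelling tolerance, synonyms and learned embeddings, but the select-best-match structure is the same.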

Artificial Intelligence and Its role in modern era

Assistant Professor, CS & IT Dept., TIPS Dwarka



Page 4

Cyber Security

The main concern in today's digital world is cyber security. Malware and virus attacks are common, and there is a constant threat to data security not just for individuals and corporates but also for government sectors. AI, along with machine learning, is used to protect data: it can automate threat detection and response without human involvement, and it has been used for password protection and authenticity detection.

The IoT

AI is used to manage huge data flows and storage in the IoT network. With high-speed Internet networks and advanced sensors integrated into microcontrollers (MCUs), AI along with the IoT is creating a new wave of disruptive technologies. With the explosion of the IoT, there are problems of data storage, delay, channel limitation and network congestion. One solution is to use AI for data mining and for managing and controlling congestion in networks; the AI techniques used include fuzzy logic and neural networks in conjunction with the IoT network.

Finance and economics

Wall Street, the financial district of the US, uses complex computer programs that run on their own to do heavy jobs. A few years ago, stock markets plummeted, wiping out a trillion dollars of market value, due to a malfunctioning ANI program. Likewise, when you deposit a cheque using your mobile banking app, it runs through a refined ANI system that can read the cheque much faster than humans can. When we shop online, we are essentially feeding data into ANI systems.

Art and design

AI has been used to algorithmically generate objects that can be rendered digitally. It can generate new patterns with high speed, good efficiency and verisimilitude. Algorithm-driven design tools help construct user interfaces and content, and personalise user experiences. Publishing tools such as Readymag and Squarespace have simplified the work to the extent that you can get many high-quality templates and designs without having to pay for a designer. There are many other algorithm-driven design tools for graphic design, including identity, drawing and illustration. AI solution providers offer tools and libraries to manipulate images and photos, and soon AI will drive the next generation of apps for visual arts and creative design. An AI model called CRAFT (Composition, Retrieval and Fusion Network) is being developed by researchers from the University of Illinois, USA, and the Allen Institute for Artificial Intelligence, USA. It can convert text descriptions into video clips of an animated series: put simply, AI matches videos with word descriptions, builds a set of parameters and generates scenes.

Education

AI is being used to improve education systems. Traditional techniques may be replaced by personalised, adaptive learning tailored to individual students' strengths and weaknesses. Machine learning can be used to identify struggling students and focus extra resources on them. AI-based robo-readers are being used for essay grading in schools; the approach pairs human intelligence with AI to improve the overall grading system and help students accomplish more.

Exploration and research

The Defense Advanced Research Projects Agency (DARPA), USA, is working on its Artificial Intelligence Exploration (AIE) programme, a key component of the agency's broader AI effort. AI in space exploration is gathering momentum, too. Over the next few years, new missions will rely on AI as we voyage to the Moon and the planets and explore possibilities in space; AI is also being used in NASA's next rover mission to Mars. The National Geographic Society and Microsoft are partnering to explore how AI can help us understand, engage and protect Earth. AI is helping the oil and gas industry preserve the ecosystem while discovering new resources, and robotics and AI have been replacing traditional research and exploration methods in ocean science and technology.

Automotive

ANI-assisted automotive technology is being employed in driverless cars. Recently, IBM developed an IoT platform for automotive, a program intended to eliminate driver errors through connectivity. Since many accidents are caused by human error, researchers are trying to find ways to minimise such errors using AI algorithms. In the future, your car is likely to run well-packaged, complex computer programs: an automated vehicle uses AI, sensors and global positioning system coordinates to drive itself without a human operator.

Video games

You might already know about Deep Blue, IBM's chess-playing supercomputer, which beat international grandmaster Garry Kasparov in the late 1990s. Chinook, a program developed at the University of Alberta, Canada,
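The automated threat detection mentioned under Cyber Security can be illustrated with a toy anomaly detector. In this Python sketch, the IP addresses, failure counts and threshold are all invented for illustration; real systems use learned models rather than a fixed statistical rule:

```python
# Toy automated threat detection: flag hosts whose login-failure count
# sits far above the average. A crude stand-in for the ML-based
# detection described in the text; all data here is invented.

from statistics import mean, pstdev

failures = {
    "10.0.0.1": 1, "10.0.0.2": 2, "10.0.0.3": 2, "10.0.0.4": 3,
    "10.0.0.5": 1, "10.0.0.6": 2, "10.0.0.7": 3, "10.0.0.8": 2,
    "10.0.0.9": 2, "10.0.0.10": 50,
}

mu = mean(failures.values())
sigma = pstdev(failures.values())

# flag anything more than two standard deviations above the mean
suspicious = [ip for ip, n in failures.items() if n > mu + 2 * sigma]
print(suspicious)  # prints: ['10.0.0.10']
```

No human is in the loop: the rule fires automatically, which is the point made in the paragraph above.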



Page 5

can beat any human player at the game of checkers, and there is a computer program called Maven for the game of Scrabble. These are perfect examples of AI. More recently, AlphaZero, an ANI developed by DeepMind, won 100 games in a row against the world's best chess program. Almost every modern video game has an AI component, and video game reviews often weigh the quality of that AI.

Healthcare

A new type of AI algorithm is being used by Google's Medical Brain to predict the likelihood of death among hospital patients. This technology is the latest attempt to revolutionise healthcare. AI programs are being developed and used in diagnosis, treatment, drug development, and patient monitoring and care. Today, many doctors use ANI programs that assist them in accurately diagnosing cancer and various other diseases. There is even a robot chemist that uses machine learning to study new molecules and reactions.

Military

AI and robotics are enabling new military capabilities and strategies, including intelligence, surveillance and even nuclear weapon systems. AI is used in autonomous weapons and sensing systems, and a great deal of military AI research and development is under way around the world to keep pace with fast-moving advances in machine learning.

India and AI

As per a report by Accenture, AI holds the potential to add US$ 957 billion, or 15 per cent of India's current gross value, by 2035. Union Budget 2018 announcements include national programmes to conduct research and development on technologies like machine learning and AI. NITI Aayog, the nation's think-tank and premier policy-making body, is working on new technologies for the development of the economy. The government is working on AI initiatives to put India on the global AI map and to promote AI in the health, education and agricultural sectors. India-made Miko by Emotix, a companion robot, is an example of a technology that incorporates AI: Miko engages, educates and entertains children, talks to them and plays games with them, and is equipped with answers to basic questions related to general knowledge and academics. There is also a humanoid robot called Rashmi, developed recently in India; it is the first Hindi-speaking robot and has been hosting a show on Red FM since December 2018.

Sophisticated Applications of AI

Sophia, an artificial intelligence (AI) humanoid, was in the news recently for becoming the first robot ever to have a nationality: in October 2017, Sophia was granted Saudi Arabian citizenship. This AI-powered robot is famous for speaking at the United Nations and for interviewing celebrities and world leaders. Sophia, developed by Hanson Robotics, is among the most sophisticated AI-powered robots built in recent times. It can imitate human gestures and facial expressions, and can make conversation by answering certain questions and initiating discussions on predefined topics. China developed its first human-like female robot, Jia Jia, in 2016 at its University of Science and Technology, and Erica, a Japanese female robot created by Hiroshi Ishiguro Laboratories, is considered by some the most beautiful robot in the world.

The way forward

AI is becoming a disruptive force that is redefining modern industry, and its importance is being felt across a broad spectrum of industries. From voice-powered personal assistants like Alexa to behavioural algorithms, suggestive Internet search algorithms and autonomous vehicles, there is a lot of scope for applications of AI today. Robots built to look and act like humans are getting a lot of attention and making splashy headlines and appearances. As far as India is concerned, government initiatives, the roadmap for national AI programmes and private players are together expected to bring a revolution to Indian industry.

AI is still in its developing stages. The market is not easily quantifiable, yet there are plenty of opportunities for AI. The hope is that AI applications will keep serving humans in the most beneficial way going forward.



Page 6

What is the Internet of Things?

The Internet of Things is the concept of connecting any device (so long as it has an on/off switch) to the Internet and to other connected devices. The IoT is a giant network of connected things and people, all of which collect and share data about the way they are used and about the environment around them.

That includes an extraordinary number of objects of all shapes and sizes: from smart microwaves, which automatically cook your food for the right length of time, to self-driving cars, whose complex sensors detect objects in their path, to wearable fitness devices that measure your heart rate and the number of steps you've taken that day, then use that information to suggest exercise plans tailored to you. There are even connected footballs that can track how far and fast they are thrown and record those statistics via an app for future training purposes.

How does it work?

Devices and objects with built-in sensors are connected to an Internet of Things platform, which integrates data from the different devices and applies analytics to share the most valuable information with applications built to address specific needs. These powerful IoT platforms can pinpoint exactly what information is useful and what can safely be ignored, and that information can be used to detect patterns, make recommendations, and spot possible problems before they occur.

For example, if I own a car manufacturing business, I might want to know which optional components (leather seats or alloy wheels, for example) are the most popular. Using Internet of Things technology, I can:

• Drill down into the available sales data to identify which components are selling fastest;

• Use sensors to detect which areas in a showroom are the most popular, and where customers linger longest;

• Automatically align sales data with supply, so that popular items don't go out of stock.

The information picked up by connected devices enables me to make smart decisions about which components to stock up on, based on real-time information, which helps me save time and money.

With the insight provided by advanced analytics comes the power to make processes more efficient. Smart objects and systems mean you can automate certain tasks, particularly when these are repetitive, mundane, time-consuming or even dangerous. Let's look at some examples to see what this looks like in real life.

Scenario #1: IoT in your home

Imagine you wake up at 7am every day to go to work. Your alarm clock does the job of waking you just fine. That is, until something goes wrong. Your train's cancelled and you have to drive to work instead. The only problem is that it takes longer to drive, and you would have needed to get up at 6.45am to avoid being late. Oh, and it's pouring with rain, so you'll need to drive slower than usual. A connected or IoT-enabled alarm clock would reset itself based on all these factors to ensure you got to work on time. It could recognise that your usual train is cancelled, calculate the driving distance and travel time for your alternative route to work, check the weather and factor in slower travelling speed because of heavy rain, and work out when it needs to wake you up so you're not late. If it's super-smart, it might even sync with your IoT-enabled coffee maker, to ensure your morning caffeine's ready to go when you get up.

Scenario #2: IoT in transport

Having been woken by your smart alarm, you're now driving to work. On comes the engine light. You'd rather not head straight to the garage, but what if it's something urgent? In a connected car, the sensor that triggered the check-engine light would communicate with others in the car. A component called the diagnostic bus collects data from these sensors and passes it to a gateway in the car, which sends the most relevant information to the manufacturer's platform. The manufacturer can use data from the car to offer you an appointment to get the part fixed, send you directions to the nearest dealer, and make sure the correct replacement part is ordered so it's ready for you when you show up.
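The pipeline described above (sensors feeding a platform that integrates the data and shares only the valuable information) can be sketched in miniature. In this illustrative Python example, the device names, readings and temperature limit are invented:

```python
# Toy IoT platform: readings from several "devices" flow in, the
# platform keeps the latest reading per device (integration), then
# surfaces only what matters (analytics -> alerts). All values are
# invented for illustration.

readings = [
    {"device": "kitchen", "temp_c": 21.5},
    {"device": "garage", "temp_c": 35.2},
    {"device": "bedroom", "temp_c": 19.8},
    {"device": "garage", "temp_c": 36.1},
]

LIMIT_C = 30.0

# integrate: latest reading per device (later readings overwrite earlier)
latest = {}
for r in readings:
    latest[r["device"]] = r["temp_c"]

# analyse: share only the valuable information, i.e. rooms over the limit
alerts = {device: t for device, t in latest.items() if t > LIMIT_C}
print(alerts)  # prints: {'garage': 36.1}
```

A real platform adds transport (MQTT, HTTP), storage and learned models, but the ingest-integrate-filter shape is the same.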

Internet of Things

Assistant Professor, CS & IT Dept., TIPS Dwarka



Page 7

According to this definition, Cloud Computing is a technology that allows on-demand, convenient and ubiquitous network access to computing resources that can be configured with minimal management effort and minimal interaction with the service provider. The NIST definition also lists the five key characteristics, deployment models and delivery models of Cloud Computing. There are three main components of the Cloud Computing ecosystem, namely the end-user or consumer, the distributed server and the data center; the cloud provider provisions IT resources to the end-user with the help of distributed servers and data centers.

Delivery Models

It is evident from the discussion that the cloud provider offers services to consumers. These IT services need to be packaged into combinations to make self-provisioning possible, and this pre-packaging of services is called the cloud delivery model. There are three main delivery models used as standards, described below. Several other delivery models have also been created on the basis of the applications and functionality they support, such as Database-as-a-Service, Security-as-a-Service and Testing-as-a-Service, among others.

1. Infrastructure-as-a-Service (IaaS)

In this delivery model, cloud-based tools and interfaces allow access to and management of infrastructure-centric IT resources. Resources like hardware, network, connectivity and raw IT resources are included in this model, and the user is free to configure them as desired. Popular IaaS providers include Rackspace and Amazon EC2.

2. Platform-as-a-Service (PaaS)

While infrastructure forms the base layer and requirement for any kind of development or usage, developers may also require pre-deployed and pre-configured IT resources. This gives them a complete environment to work in directly, which saves time and effort. Some of the most popular products in this category are the .NET-based environment provided by Microsoft Azure and the Python-based environment of the Google App Engine.

3. Software-as-a-Service (SaaS)

The shared cloud service can also host software solutions that consumers can use directly on a need basis. Products like Google Docs, a shared service that provisions documentation software and storage to consumers, are examples of SaaS.

Hybrid Models

Apart from the three formal models used for delivery of cloud solutions, the user also has the option of using any combination of these models; in fact, all three models may be combined and provisioned to the end-user.

Deployment Models

The cloud environment has three main characteristics, namely size, ownership and access. On the basis of these three characteristics, the deployment model for the cloud environment is determined.

1. Public Cloud

In such a cloud environment, access is open to all and a third-party cloud provider owns the cloud. Creation and maintenance of the cloud environment is therefore the sole responsibility of the cloud provider, and the size of the cloud is large owing to its 'accessible to everyone' nature.

2. Community Cloud

The Community Cloud is an adaptation of the Public Cloud. The only difference between the two is that the community cloud restricts access to its services: only a small community of cloud consumers can use them. Such a cloud may also involve joint ownership by the community and a third-party cloud provider.

3. Private Cloud

When an individual or organisation owns a cloud and limits access to members of the organisation only, the deployment model is known as a Private Cloud. The size of such a cloud is much smaller, and staff are employed to administer and manage it. In this deployment model, the cloud provider and the consumer are both owners of the cloud.
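One way to see the difference between the three delivery models is as a split of who manages which layer of the stack. The following Python sketch encodes the commonly cited responsibility split (the layer names are a simplification for illustration, not taken from any particular provider):

```python
# Who manages what under each delivery model: the provider takes on
# progressively more layers from IaaS to PaaS to SaaS, and the user
# manages whatever remains. Layer names are an illustrative
# simplification of the usual IaaS/PaaS/SaaS responsibility split.

LAYERS = ["hardware", "network", "operating system", "runtime", "application"]

managed_by_provider = {
    "IaaS": {"hardware", "network"},
    "PaaS": {"hardware", "network", "operating system", "runtime"},
    "SaaS": set(LAYERS),
}

def user_manages(model: str) -> list:
    """Return the layers left to the consumer under a delivery model."""
    return [layer for layer in LAYERS if layer not in managed_by_provider[model]]

print(user_manages("IaaS"))  # prints: ['operating system', 'runtime', 'application']
print(user_manages("SaaS"))  # prints: []
```

This also makes the hybrid case easy to state: a hybrid deployment simply mixes rows of this table across different workloads.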

Cloud Computing for Big Data Analytics

Assistant Professor, CS & IT Dept., TIPS Dwarka



What makes Cloud Computing an Ideal Match for Big Data?

Big data solutions have two fundamental requirements. First, the size of the data is 'big', so a large, scalable storage space is needed. Second, standard analytics algorithms are computing-intensive, so an infrastructural solution that can support this level of computation is needed. The cloud meets both requirements well. There are several low-cost storage solutions available in the cloud, and the user pays only for the services used, which makes the solution all the more cost-effective. Cloud solutions also offer commodity hardware, which allows effective and efficient processing of large datasets. For these two reasons, Cloud Computing is considered an ideal infrastructural solution for big data analytics.

Big Data Analytics and Cloud Computing Architecture

Many platforms can be used to perform big data operations on the cloud, but one of the best platforms for Hadoop on the cloud is Cloudera: Hadoop comes pre-installed in Cloudera's environment, which also supports the HDFS file system.

Hadoop on the Cloud

One of the most popular frameworks used for big data computing is Hadoop. It is an implementation of MapReduce that allows distributed processing of large, heterogeneous datasets. There are many solutions that allow moving Hadoop to the cloud; Cloudera is a leading example of a platform used to solve big data problems on the cloud.

Installing Hadoop on a cloud-based Linux platform involves setting the following environment variables:

#HADOOP VARIABLES START
export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-amd64"
export HADOOP_INSTALL="/home/swastik/Desktop/hadoop/hadoop-2.7.2"
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END

These variables configure the Hadoop HDFS system on the cloud or operating system. To set up a NameNode in HDFS, format it with:

hadoop namenode -format

After that, start the Hadoop daemons with:

start-all.sh

Finally, check the Hadoop installation in a Web browser at:

localhost:50070
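Hadoop's programming model, MapReduce, can be demonstrated on a single machine. The following Python sketch performs the classic word count in explicit map, shuffle and reduce steps; Hadoop runs the same logical steps distributed across a cluster (the sample documents here are invented):

```python
# MapReduce word count in plain Python, to illustrate the programming
# model Hadoop implements at scale. This sketch runs on one machine;
# Hadoop distributes the same map/shuffle/reduce steps over a cluster.

from collections import defaultdict

documents = ["big data on the cloud", "hadoop on the cloud"]

# map: emit a (word, 1) pair for every word in every document
pairs = [(word, 1) for doc in documents for word in doc.split()]

# shuffle: group the emitted values by key (word)
groups = defaultdict(list)
for word, count in pairs:
    groups[word].append(count)

# reduce: sum the counts for each word
counts = {word: sum(vals) for word, vals in groups.items()}
print(counts["cloud"], counts["the"])  # prints: 2 2
```

Each stage is independently parallelisable, which is why the model maps so naturally onto the elastic cluster provisioning described above.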

Page 8


Ÿ Cost Effectiveness Running any software on the cloud reduces the cost of the operation considerably. Therefore, this model provides a cost-effective solution for Hadoop-based applications.

Ÿ Scalable Resource Procurement One of the key benefits of the cloud paradigm is that it allows scalable provisioning of resources. This feature is best put to use by the Hadoop framework.

Ÿ Support for Efficient Batch Processing Hadoop is a batch-processing framework. In other words, jobs submitted to Hadoop are collected and sent for execution at a fixed and temporal basis. Therefore, the resources are not loaded with work all the time. The cloud allows on-demand resource provisioning. Therefore, resources can be procured and elastically increased and decreased in capability depending on the phase of Hadoop operation, making the solution cost-effective yet efficient.

Ÿ Handles Variable Requirements of Hadoop Ecosystem Different jobs performed by Hadoop vary in the quality and quantity of resources they require. For instance, some jobs require I/O bandwidth while others need more memory support. Whenever a physical setup for Hadoop is created, homogenous machines that can support the highest capabilities are used, resulting in wastage of resources. The cloud solution allows formation of a cluster of heterogeneous machines to support the heaviest tasks yet optimize usage.

Ÿ Processing Data Where It Resides Most of the organizations keep their data on the cloud. Therefore, processing it on the cloud makes more sense. Migrating the data to a different execution environment will not only be time-consuming, but it will also be inefficient. The cloud can be used for creation of different types of clusters, with each component of the cluster having a different configuration and characteristics. The available options for running Hadoop in the Cloud environment have been described below.

Some of the popular solutions are the ones provided by Amazon's Elastic MapReduce (EMR) and Rackspace. Running Hadoop on the cloud is gaining immense popularity for the reasons listed above.

Deploying Hadoop in Public Cloud

Providers like Hortonworks, Cloudera and BigInsights offer Hadoop distributions which can be deployed and run on public clouds provided by Rackspace, Microsoft Azure and Amazon Web Services. Such a configuration is typically referred to as 'Hadoop-as-a-Service'. The issue with such solutions is that they use the Infrastructure-as-a-Service (IaaS) offerings of the cloud providers; in other words, the IT resources being used are shared between many customers. This gives the user little control over the configuration of the cluster, so there is no concept of rack awareness that the user can configure and access. Besides this, the availability and performance of the cluster depend on the VMs (Virtual Machines) being used. On all of the available options other than Amazon EMR, the user is required to install and configure Hadoop. Amazon EMR provides what can be referred to as 'MapReduce-as-a-Service': users can directly run MapReduce jobs on the EMR-powered cluster, making the whole exercise fairly simple and easy for the developer.

If the developer does not wish to use HDFS as the default storage solution, the Hadoop cluster can also be used with S3. Although S3 is not as efficient as HDFS, it provides some unparalleled features like data-loss protection, elasticity and bucket versioning. Some applications may require these features, which makes S3 an irreplaceable storage solution. Besides this, Hadoop with S3 may be the storage solution of choice for organizations that already have their data stored on S3; Netflix is a well-known example of this configuration in commercial use.
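To make the 'MapReduce-as-a-Service' idea concrete, the sketch below builds the kind of request one might pass to EMR through the AWS SDK for Python (the field names follow boto3's `run_job_flow` call; the instance types, bucket names and streaming step are illustrative assumptions, and the request is only built, never sent):

```python
# Sketch of an EMR cluster request (fields follow boto3's EMR run_job_flow).
# All concrete values below are illustrative assumptions, not a real deployment.
emr_request = {
    "Name": "wordcount-demo",
    "ReleaseLabel": "emr-5.29.0",
    "Instances": {
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 3,
    },
    "Steps": [{
        "Name": "wordcount",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            # Input/output can live on S3 instead of HDFS, as discussed above.
            "Args": ["hadoop-streaming", "-input", "s3://my-bucket/in",
                     "-output", "s3://my-bucket/out"],
        },
    }],
}

# With boto3 installed and credentials configured, this would be submitted as:
#   boto3.client("emr").run_job_flow(**emr_request)
print(sorted(emr_request))
```

Note that the user never installs or configures Hadoop here; cluster sizing and the job step are declared in one request, which is exactly what makes EMR simpler than the IaaS-based alternatives.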

Deploying Hadoop in Private Cloud

The private cloud allows the user better control over the configuration of Hadoop in the cloud. Several providers offer PaaS solutions with a pre-built setup for convenient deployment of Hadoop, such as IBM's SmartCloud for deployment of InfoSphere BigInsights, and IBM's PureData System. The advantages of using Hadoop on a private cloud are as follows:

• Better control and visibility of the cluster
• Better mitigation of data privacy and security concerns.

The usual HDFS shell commands can be used to place private data on the cloud cluster, for example:

Ex: hadoop fs -put abcd.txt /hive
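Beyond `-put`, the `hadoop fs` family includes standard sub-commands such as `-ls`, `-get` and `-chmod`. A minimal Python sketch (the wrapper function is hypothetical; the underlying sub-commands are standard Hadoop shell commands) that assembles such commands for scripted use:

```python
import subprocess

def hdfs_cmd(subcommand, *args, dry_run=True):
    """Build (and optionally run) an HDFS shell command like `hadoop fs -put`."""
    cmd = ["hadoop", "fs", f"-{subcommand}", *args]
    if not dry_run:
        # Actually running it requires a configured Hadoop client on PATH.
        subprocess.run(cmd, check=True)
    return cmd

# Upload a file into the /hive directory, then restrict its permissions.
print(hdfs_cmd("put", "abcd.txt", "/hive"))
print(hdfs_cmd("chmod", "700", "/hive/abcd.txt"))
```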

Key Considerations for Deployment

There are obvious advantages of running Hadoop on the cloud. However, it is important to understand that this does not come without problems and potential issues. Some of the things that must be paid heed to before using Hadoop on the cloud are given below.

• The security provided by the Hadoop cluster is very limited in its capability. Therefore, the security requirements and the criticality of data being shared with the Hadoop cluster need to be carefully examined in advance.

• Hadoop can never be viewed as a standalone solution. When it comes to designing big data analytics applications, you will need to look beyond the Hadoop cluster and see if the cloud solution supports visualization tools like Tableau and R, to serve your purpose in totality.

• Typically, Hadoop runs on Linux. However, Hortonworks also provides a Hadoop distribution that works with Windows and is available on Microsoft's Azure cloud. It is important to identify the operating system requirements and preferences before choosing a cloud-based Hadoop solution.

• An important consideration that is usually overlooked is data transmission. Is the data already on the cloud, or will it have to be loaded from an internal system? If the application needs data transferred from one public cloud to another, transmission fees may apply.

• A VM-based Hadoop cluster may suffer from performance issues. Such arrangements are good solutions only for development and testing purposes, or when performance is not a concern.

References

[1] Big Data Computing Using Cloud-Based Technologies: Challenges and Future Perspectives, S. Khan, K. A. Shakil and M. Alam.
[2] Educational Intelligence: Applying Cloud-Based Big Data Analytics to the Indian Education Sector, S. Khan, K. A. Shakil and M. Alam.
[3] Comparison Study of Different NoSQL and Cloud Paradigm for Better Data Storage Technology, P. Lathar (CBP Government Engineering College, India), K. G. Srinivasa (CBP Government Engineering College, India), A. Kumar (M. S. Ramaiah Institute of Technology, India) and N. Siddiqui (M. S. Ramaiah Institute of Technology, India).


1. Introduction

Data Warehouse (DW), defined by Inmon [1] as a "collection of integrated, subject-oriented databases designated to support the decision making process", aims to improve the decision process by supplying unified access to several sources. A data warehouse contains historical as well as current data. It is populated using an ETL (Extraction, Transformation and Loading) tool, which extracts data from various sources, transforms that data into the desired state by cleaning it, and loads it into the target data warehouse. ETL tools are central to Business Intelligence, and selecting the right ETL tool is a fundamental step in achieving your strategic goals. The ETL process requires active inputs from various stakeholders, including developers, analysts, testers and top executives, and is technically challenging.

2. ETL Process

ETL (Extract, Transform and Load) is the process in data warehousing responsible for pulling data out of different source systems and placing it into a data warehouse. It involves the following steps:

1. Extraction: data extracted from the different source systems is converted into one consolidated format, ready for transformation.

2. Transformation: transforming the data into a standard format may involve the following tasks:

• Cleaning the data (e.g., mapping NULL to 0, or "Male" to "M" and "Female" to "F")
• Joining together data from different sources (e.g., lookup, merge)
• Applying new business rules (derivations, e.g., calculating new dimensions and measures)
• Filtering the data (e.g., selecting only specific columns to load)

3. Loading: writing the transformed data into the target data warehouse.
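The transformation tasks listed above can be sketched in a few lines of Python. This is an illustrative toy step, not a reference to any particular ETL product; the field names and rules are invented for the example:

```python
# Toy transformation step: clean, derive and filter rows before loading.
def transform(rows):
    """Apply cleaning, derivation and filtering rules to source rows."""
    out = []
    for row in rows:
        # Cleaning: map NULL (None) to 0 and normalise gender codes.
        qty = row.get("quantity") or 0
        gender = {"Male": "M", "Female": "F"}.get(row.get("gender"), row.get("gender"))
        # Business rule (derivation): compute a new measure.
        revenue = qty * row.get("unit_price", 0)
        # Filtering: load only the columns the warehouse needs.
        out.append({"id": row["id"], "gender": gender, "revenue": revenue})
    return out

source = [
    {"id": 1, "gender": "Male", "quantity": None, "unit_price": 10.0},
    {"id": 2, "gender": "Female", "quantity": 3, "unit_price": 2.5},
]
print(transform(source))
```

A real ETL tool applies the same kinds of rules, but declaratively and at scale, with the load step writing the result into the warehouse rather than printing it.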

ETL Tools for Data Warehousing
Ritika Kapoor
Assistant Professor, CS & IT Dept., TIPS Dwarka


3. ETL Tools

• Improvado: Improvado is a "data pipeline" tool, pulling data from your marketing platforms (Facebook, Google Analytics, ad servers, CRMs, email platforms, etc.) and "piping" it into any data warehouse or visualization tool you choose. Brands and agencies use Improvado because it saves them hours of manual reporting time and millions in wasted marketing spend.

• Amazon Redshift: Amazon Redshift is a data warehouse tool. It is a simple and cost-effective way to analyze all types of data using standard SQL and existing BI tools, and it allows running complex queries against petabytes of structured data.

• MarkLogic: MarkLogic is a data warehousing solution which makes data integration easier and faster using an array of enterprise features. It can query different types of data such as documents, relationships and metadata.

• Oracle: Oracle is the industry-leading database. It offers a wide range of data warehouse solutions both on-premises and in the cloud, and helps optimize customer experiences by increasing operational efficiency.

4. Real-time data warehousing and ETL techniques

Increasingly there is a need to support and make business decisions on operational data, so the current ETL process needs to move away from periodic refreshes to continuous updates. In traditional ETL tools, loading is done periodically during downtime, and during this time no one can access the data in the data warehouse. The separation between querying and updating clearly simplifies several aspects of the data warehouse implementation, but it has a major disadvantage: the data warehouse is not continuously updated. One approach to the general architecture of a near-real-time data warehouse consists of the following elements:

(a) Data Sources hosting the data production systems that populate the data warehouse,
(b) an intermediate Data Processing Area (DPA) where the cleaning and transformation of the data takes place, and
(c) the Data Warehouse.

5. Evolution of ETL

With the evolution of Business Intelligence, ETL tools have undergone advances, and there are three distinct generations of ETL tools.

First-generation ETL tools were written in the native code of the operating system platform and would only execute on that operating system. The most commonly generated code was COBOL, because first-generation data was stored on mainframes. These tools made the data integration process easy, since native-code performance was good, but they posed a maintenance problem.

Second-generation ETL tools have proprietary ETL engines to execute the transformation processes. They simplified the job of developers, who only need to know one programming language, i.e. ETL programming. Data coming from different heterogeneous sources passes through the ETL engine row by row and is stored on the target system. This was a slow process, and this generation of ETL programs suffered from a high performance overhead.

Third-generation ETL tools have a distributed architecture with the ability to generate native SQL, which eliminates the hub server between the source and target systems. The distributed architecture reduces network traffic to improve performance, distributes the load among database engines to improve scalability, and supports all types of data sources. Third-generation ETL uses a relational DBMS for data transformations, so the transformation phase processes data in sets rather than row by row as in the second generation.

6. References

[1] W. Inmon, D. Strauss and G. Neushloss, "DW 2.0: The Architecture for the Next Generation of Data Warehousing", Morgan Kaufmann, 2007.
[2] https://www.guru99.com/etl-extract-load-process.html
[3] https://www.springpeople.com/blog/data-warehousing-essentials-what-is-etl-tool-what-are-its-benefits/
[4] https://www.dataintegration.info/etl
