Kirill: This is episode number 43 with AI Researcher Deblina
Bhattacharjee.
(background music plays)
Welcome to the SuperDataScience podcast. My name is Kirill
Eremenko, data science coach and lifestyle entrepreneur.
And each week we bring you inspiring people and ideas to
help you build your successful career in data science.
Thanks for being here today and now let’s make the complex
simple.
(background music plays)
Welcome, welcome, welcome to the SuperDataScience
podcast. Hope you're having a great week, and today we've
got a very interesting guest, Deblina Bhattacharjee. She is
calling in from Seoul, which is in South Korea, and she is an AI
researcher working at one of the universities there, or doing
her degree at one of the universities there. And a very, very
interesting conversation that we had. It's all about AI, all
about artificial intelligence, the different types of algorithms,
different types of tools, different types of problems. So in this
podcast, you will learn what an optimization problem is and
the different approaches to the optimization problem. You
will also learn about the important tools for a data scientist
to learn now to prepare for the future of the field of data
science and artificial intelligence. You'll also learn about the
important techniques which are going to be valued in the
near future.
And of course, Deblina will tell us about the research project
that she is doing. Very interesting, it's about artificial
intelligence, but it's different to neural networks. It's a
different approach. It's not inspired by the human brain, it's
inspired by something else. And what exactly, you'll find out
inside this podcast.
And of course, we'll talk about many, many other things. So
we'll talk about Hadoop, we'll talk about Strata, Scala,
Spark, all the different tools, all the different applications,
and you will even see how Deblina's algorithm can be and is
used in health to actually help people have better and
healthier lives and even sometimes save people's lives.
So there we go. That's what this podcast is all about. And
can't wait for you to check out all the interesting and
insightful and even cool concepts that we're going to be
discussing. And without further ado, I bring to you Deblina
Bhattacharjee.
(background music plays)
Welcome everybody to the SuperDataScience podcast. Today
I've got a very interesting and exciting guest with us, Deblina
Bhattacharjee. Deblina, how are you going, and where are
you calling from today?
Deblina: Hello Kirill. Thanks for inviting me to be a part of this
podcast today. I'm doing great and I'm calling from Seoul
right now, in South Korea.
Kirill: Yeah, in South Korea. Wow, we've never had anybody from
South Korea on this podcast. What brings you to South
Korea?
Deblina: What happened was I was working on this automated
intelligence project for healthcare during my Bachelors. So I
just sent out the project proposal to a couple of universities
for pursuing my higher education. So at that time, one of the
universities -- I got offers, but then there was this particular
university which was exactly working on what I wanted to
work on in the future. And also, the kind of opportunities
that Seoul was giving me was really enticing. So I chose
Seoul and ended up as an AI Researcher over here.
Kirill: Ok, that's very cool. And we'll get to that part in a second.
But just out of curiosity, do you speak Korean? How do you
get by in Seoul?
Deblina: Yeah, you require Korean.
Kirill: Yeah?
Deblina: Yeah, you do require. But then my Korean is really bad. I am
just getting my grip. It has been a year since I started
learning Korean, so yeah.
Kirill: Do you know like kamsahamneeda?
Deblina: Oh yeah, kamsahamneeda, that's like thank you.
Kirill: Kamsahamneeda! I also know how to count, I think. [Counts
in Korean.]
Deblina: Oh yeah, exactly! Wow!
Kirill: That's pretty much all. How long did it take you to learn
Korean?
Deblina: So I told you I started like a year back. I know how to read
and write, but then my vocabulary is like really bad. So I
need to like pick up words whenever I come across people,
and there is this blank look that they give me and I give
them when we really can't get across what we want to say to
each other. So yeah, things happen. But then I pick up a bit
from here, and then I listen to people talking. I manage.
Kirill: Ok, alright, that's pretty cool. That's awesome to hear, and
it's a big jump to learn a new language, to move to a country
for your dream work. That's awesome. But give us a bit of
background. Where did you start? You obviously didn't start
in Korea. Where did you start and how did your life take you
here? What events happened in your life, what did you study
in high school, and just walk us through how you ended up
in Korea.
Deblina: Ok, so what happened was when I was like 8, my granddad,
he left me a treasure of close to 40 books on mathematics
puzzles and those things on pattern analysis that we used to
solve as kids. So that really drew my interest towards the
field of math and science. And those books were by the
famous Indian mathematician Shakuntala Devi, I don't
know whether you have heard of her or not, but those books
were really something and it drew me towards that field, and
I used to relentlessly solve patterns and used to look for
patterns around, and basically do anything that's related to
numbers, which is all data. And lots of finding patterns out.
So yeah, machine learning happened.
Also, the second thing that happened was at around 2003,
my dad gifted me a computer. So I was taken aback by the
amazing stuff and awesome, cool stuff that I can build using
the computer. So I started doing my pet projects at around
14, I guess. After that, I used to take part in the National
Olympiads. So one of the national cyber-olympiad in my
country --
Kirill: Sorry, this is in India, right?
Deblina: In India, yeah. So I topped it.
Kirill: You topped it. Congratulations. That's awesome. 14.
Deblina: Yeah. Thank you so much. Yeah, it's been a great journey
since then, and at 14, after that, I believed that maybe I
could code and take this up as a career. And the Bachelors
happened, and thereafter my Masters.
Kirill: Nice.
Deblina: In Machine Learning and AI. Yeah.
Kirill: That's awesome. And so what languages did you start to
code in when you were 14?
Deblina: Ok so the first language I started to code in was C, which is --
Kirill: Yeah, me too! That was my favourite!
Deblina: Yeah, exactly! After C, Java, and I used to do C and Java
with the advent of the standard template library. And C++, I
started coding with C++.
Kirill: Okay, beautiful. But your Bachelors, did you also do that in
C, C++ or did you move on to other languages?
Deblina: During my Bachelors, it was really diverse, because
depending on what I’m building, what class I’m taking, I
used to switch between languages because again it was like
a course requirement. So it ranged from everything —
sometimes I was doing C, C++, C#, sometime just using the
platform of Visual Studio exploring everything to F#. And
then I got into Python and R, I think in my junior year.
That’s the third year in my Bachelors. After that, all these
database related languages, too, SQL-related languages and
Hadoop. Yeah, I used to do all of them.
Kirill: Okay. And in your Bachelors, you said you studied machine
learning. Is that correct?
Deblina: Okay, so for Bachelors I didn’t have a specialty because in
India you need to study approximately 42 courses. You have
to do all of them, but at the end, in my senior year, you have
these electives. So, during that time, I went through
whatever can be the possible choices which is related to data
crunching and applying algorithms or models to solve them.
Machine learning was the best thing which was coming close
to it. And not only machine learning, I was always interested
in building intelligence systems. So I wanted to do
something really cool in artificial intelligence, so I took that
up and thought of doing my Masters.
Kirill: Okay. And just before the podcast, you were telling me about
how you scored this opportunity in Seoul and I think that
can be very useful to some of our students, or some of our
listeners who are still learning and maybe want to pursue a
Masters. Tell us a bit more. How did you go about finding
this great opportunity for yourself in Seoul? Did it just fall
on you?
Deblina: No, what happened was I used to always look up — I was
always a part of the communities online which are related to
machine learning and data science. I’m strictly speaking
about communities like “Data Science Central” and different
kind of opportunities on academic fronts, all those postings.
So I came across every possible lab, because for a Masters,
what you need to do is you need to not only file an
application to the university, but also send a separate
application to your supervisor, under whom you will be
working, and also to a different lab with respect to your
specialty.
Because of that, I screened across 200-300 opportunities
and finally, I struck gold at the 250th one, I don’t really
remember. (Laughs) I just saw — this lab, the work ranges
from designing intelligent traffic systems to modelling smart
cities, building intelligent health care solutions, everything
which is related to wearable sensors, machine to machine
communication, and Internet of Things basically.
This was what I wanted to do because this is a very high
level overview of the names that I just said. But internally
what we do is real-time big data analytics and along with
that, building algorithms or even tuning our models to build
these systems up. This was all I wanted to do and this lab
was a perfect fit and there are so many opportunities in
Seoul and the best part is that this country has amazing
technological advancement. Coming from India, I didn’t
really know, and I don’t even think most of the people
around the world know, what this country has to offer.
Everything is really well-organized. Only language is a bit of
a barrier, but everything else is super fine.
Kirill: Okay. Wow, that’s really cool. Why were you always so
interested in artificial intelligence?
Deblina: As I said, I loved pattern analysis. And after that what
happened was, when I was building my project which I
started off at around the time I was doing my Bachelors, it
was something like an automated health care solution. And I
used to see this everywhere. Something like fever, or any
kind of first-time diagnosis that a person wants, and he is
not having a doctor around or doesn’t have the luxury of
visiting a doctor. So for such people I wanted to create a
really affordable or, if possible, free automated health care
solution which can be like your on-call doctor and you can
use that platform and type in whatever is your problem and
get the first-hand diagnosis.
There have been such projects by Microsoft and the likes out
in the world. But I also wanted to form a recommendation
engine for nearby doctors, just in case of emergency. So I
made that and then I thought, “Okay, now that I can do
that, why not AI and contribute further to this field?” So
that’s how AI happened.
Kirill: Wow, that’s really cool. Tell us then about what are you
doing now in your research. You mentioned that you’re
about to graduate very soon, right?
Deblina: Yeah, just two months away.
Kirill: Oh, wow! Congratulations. It must have been a very long
journey.
Deblina: Yeah.
Kirill: All right. So, what are you doing in your research?
Deblina: As I said, in my lab we work from modelling all these
intelligence systems and basically designing this smart city
concept which is right now happening around the world. My
work specifically, if you ask us to, is to evaluate the different
models and techniques in machine learning and apply them
to solve these problems. These problems range from
sometimes just automating a traffic system or the health
care, but the thing is we need to integrate this all together
and make it an end-to-end system, the vision is a city of
future and totally smart.
So my work is to make sense out of the data that we receive
in real time, which is huge, and also design solutions,
sometimes mathematical models for solving these problems.
So first, what I do is I evaluate whatever existing techniques
are and sometimes I come up with my own models. Recently
I developed an entire algorithm from scratch and its
inspiration is quite interesting. If you might ask, I would tell
more about it.
Kirill: Yeah. Tell us more about it. So you basically built a whole
library, is that right?
Deblina: Yeah, I built a whole library. First of all, the entire design
because it wasn’t there. So I designed the model because
before building a library, I need to make sense of it
mathematically so that they completely understand.
Kirill: Yeah. What language was this library in?
Deblina: This was basically in C because the kernel of any language
is always C, the lowest kernel that it’s built on. So in order
to understand properly, I always start with C. So the thing
that happened was — if you look around and you look at the
way how trees branch themselves out, you would see that
there’s a pattern in how they branch themselves out.
Kirill: Yeah. I guess it depends on the type of tree, but yeah.
Deblina: Actually, not even the type of tree. You can even look at a
cactus, any species of tree in that entire kingdom, the plant
kingdom. You can see there’s a pattern of branching. That
pattern is basically the Fibonacci series, which is 1, 1, 2, 3, 5,
8, 13. What happens is, the ratio of the two numbers, if you
would divide it, it becomes a golden ratio which is prevalent
everywhere, like even how our galaxy spirals out. So from
there, I thought “Wait a minute. These trees do not have a
brain, so to speak, they do not look intelligent. So how do
they know exactly in which direction, angle to grow in in
such dynamic environments, something that they don’t
know?” The environment can be really non-stationary. I’m
using really technical terms—
Kirill: Yeah. Basically, how do the trees know what the Fibonacci
ratio is?
Deblina: Yeah, exactly. And somehow they just gain that overall
stability. Even if they’re slanted above the ground, they still
have the stability. So I decided to dig in about their
mechanism and what works, and then something blew my
mind. I didn’t know this, but they can communicate, see,
hear, and even have a memory of 40 days. It’s all there, the
biologists have researched it. The best thing is that they can
learn and they have 13 different discovered sensors and we
humans just have 5. So, the kind of sophistication that
these trees have is just mind-blowing. I just decided to
model their intelligence and design an algorithm based on
this. So it’s just strictly nature-based, like any other nature-
based algorithms of soft computing. So I built this algorithm
and I made it to solve optimization problems in the
applications that we work in our lab.
Kirill: Okay. So how well is it doing? Is it beating other algorithms?
Deblina: Yes, yes, perfectly. With respect to accuracy, it’s definitely
beating other algorithms but not so much with respect to
time. We already have better algorithms because the speed
of any such nature-inspired algorithms is hindered a bit
because it has an enormous number of parameters on which
such algorithms are based. So, the parameter tuning is
required, and that takes a bit of time.
Kirill: Okay. All right, just for everybody out there, I just wanted to
say, in terms of Fibonacci numbers, I have already heard
about them, but if you haven’t, what Deblina is describing is
very interesting, they are all over the world indeed. The
sequence is basically 1, 1, and then you just keep adding.
So 1+1 is 2, 2+1 is 3, 3+2 is 5, 5+3 is 8, 8+5 is 13 and so
on. And if you divide one number by the one before it and
you take the limit of that, it will be 1.618 something.
Basically, that number 1.618 is called the golden ratio.
You can see it all over the world.
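[Editor's note: The arithmetic described here can be checked in a few lines of Python. This is a minimal sketch added for illustration, not something from the episode; the function names are made up. It builds the sequence by repeated addition and shows that the ratio of consecutive terms settles near (1 + sqrt(5)) / 2, which is about 1.618.]

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers, starting 1, 1, 2, 3, 5, ..."""
    seq = [1, 1]
    while len(seq) < n:
        seq.append(seq[-1] + seq[-2])  # each term is the sum of the previous two
    return seq[:n]


def consecutive_ratios(seq):
    """Divide each term by the term before it."""
    return [b / a for a, b in zip(seq, seq[1:])]


fib = fibonacci(20)
ratios = consecutive_ratios(fib)
golden_ratio = (1 + 5 ** 0.5) / 2  # closed form of the limit

print(fib[:7])  # [1, 1, 2, 3, 5, 8, 13]
print(ratios[-1], golden_ratio)  # the two values agree to several decimal places
```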
Basically, right now, if you pause this podcast and you take
a ruler and you measure the distance between the tip of
your middle finger to your wrist, so your hand, and then you
measure the distance between your wrist and your elbow, I
think it is, you will find that the ratio between them is
roughly 1.618. How crazy is that? Even us humans, we are
designed by that ratio. And the fact that trees grow based on
that ratio is no coincidence. It’s just anything that is
natural, like starfish grow in 1.618, galaxies spiral in
1.618.
There’s lots of debate about which galaxies spiral like that
and which don’t but nevertheless, you can see it all over the
world. It’s a real mystery, but I’m not surprised when
Deblina says that an algorithm that is based on the golden
ratio can outperform others just because it takes into
account something that is so fundamental and all around
us. Yeah, that’s pretty interesting. But when you developed
this algorithm, and you’re saying you’ve come up with some
applications, can you talk us a bit more through the
applications or possible applications of this library that
you’ve written?
Deblina: Okay, so what I’ve done, it’s basically an optimization
algorithm so you can solve optimization problems using this
algorithm. Whenever I have presented my papers based on
this algorithm, there have been a lot of curious eyes around
and equally questionable minds. Some of them couldn’t
really get a grip on it. I totally understand that. They just
said it might be for Law School. So after the results that I
presented, and there were successful demonstrations where
I just selected an application like medical imaging and I
processed numerous CT scan images to find the location and
area of growth of tumours.
That was one application that I did and the results were
phenomenal because I just got it presented at one of the top
AI conferences in the United States this year and it was well
received. I’ve also applied it to other applications because we
do a lot of sensor data processing, so to find the optimal
features from that sensor data I have used this algorithm of
mine.
Kirill: Okay, that’s very interesting. And I want to slowly start
getting into the more applied area of artificial intelligence
and data science. To start off, can you please describe for
our listeners what is an optimization problem?
Deblina: So an optimization problem is — there are two types of
optimization. One is local optimization and the other is
global optimization. So when you’re looking for something
and you know what will be the result, the final result
becomes your global optimal solution. But when you move
towards that trajectory to get that final result, you come across
some local best results—
Kirill: Local maximums, yeah?
Deblina: Yeah, local maximums, exactly. I’m trying to just break it
down in a non-technical manner, which is a bit difficult for
me.
Kirill: Thank you for that. So, you have a global maximum which
you’re trying to find, but you have local maximums that are
possibly going to look like the global maximum and you
might think that that is the best option.
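[Editor's note: The difference between a local and a global maximum can be seen with a simple greedy search. The sketch below is an illustration only, with a made-up list of heights and a hypothetical hill_climb helper. The search always steps to the higher neighbour and stops when neither neighbour is higher, so where it ends up depends on where it starts.]

```python
# A made-up landscape with a small peak (index 2) and a taller peak (index 7).
heights = [1, 3, 5, 4, 2, 3, 6, 9, 7, 4]


def hill_climb(values, start):
    """Step to the higher neighbour until no neighbour is higher."""
    i = start
    while True:
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
        best = max(neighbours, key=lambda j: values[j])
        if values[best] <= values[i]:
            return i  # no uphill move left: a local maximum
        i = best


global_best = max(range(len(heights)), key=lambda j: heights[j])

print(hill_climb(heights, 0))  # 2: stuck on the smaller peak
print(hill_climb(heights, 5))  # 7: reaches the taller peak
print(global_best)             # 7
```

Starting from the left, the search stops at height 5 and never sees the height 9 further along, which is the trap described above.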
Deblina: Yeah. That’s what basically any optimization algorithm does.
It finds those solutions. Again, there are different kinds;
single-objective optimization where you just have one objective like,
“Okay, I need to go to the grocery and get some stuff and I
need to get this product.” So that’s like one objective. Multi-
objective optimization becomes like, how many objectives
you are addressing. So it’s like, “I will go to the grocery store,
get that product, but then it has to be of the minimum
possible price.” I have two kinds of things to look onto, so
that becomes multi-objective. These algorithms have
sometimes conflicting objectives, like something increases
and something decreases, sometimes both are increasing so
you have a maximization problem, many different objectives
and then based on that, the algorithms are built.
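[Editor's note: The grocery example can be written down as a small two-objective problem. This is a sketch under assumed data: the stores, travel times and prices are invented, and the weights in weighted_cost are arbitrary. It shows two common ways to handle conflicting objectives: keep every option that no other option beats on both counts (the Pareto front), or fold the objectives into one function with weights.]

```python
# (store, travel time in minutes, price of the product) - all values are made up
options = [
    ("Store A", 5, 4.50),
    ("Store B", 12, 3.20),
    ("Store C", 20, 3.00),
    ("Store D", 15, 3.80),
]


def dominates(a, b):
    """a dominates b if it is no worse on both objectives and better on one."""
    no_worse = a[1] <= b[1] and a[2] <= b[2]
    better = a[1] < b[1] or a[2] < b[2]
    return no_worse and better


def pareto_front(items):
    """Keep the options that no other option dominates."""
    return [x for x in items if not any(dominates(y, x) for y in items)]


def weighted_cost(item, w_time=0.1, w_price=1.0):
    """Fold both objectives into a single number to minimise."""
    return w_time * item[1] + w_price * item[2]


front = [name for name, _, _ in pareto_front(options)]
best = min(options, key=weighted_cost)

print(front)    # ['Store A', 'Store B', 'Store C']
print(best[0])  # Store B
```

Store D drops out because Store B is both closer and cheaper. Which of the remaining three is "best" depends entirely on the weights chosen.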
Kirill: Okay. All right. That makes sense. And then the more
objectives you have — for instance, when you have one
objective like “Get to the store,” then you have lots of
different ways to get there. You have lots of different paths
that you can take. That’s an optimization problem.
Deblina: Exactly, yeah.
Kirill: But then when you have multiple objectives, for instance,
“Get to a store and buy the cheapest butter,” you have so
many more. You have so many different types of butter to
choose from, so many different paths to take, and then you
can also go to different stores. That’s even three optimization
problems, but basically, is it correct that you have to
multiply all of the options? It’s not just a simple addition, it’s
a multiplication of all of the options.
Deblina: It’s a multiplication of the options and then you have to form
a single function of all those options together.
Kirill: Hence something called “the curse of dimensionality,” right?
Like, if it takes you 0.01 seconds to solve one optimization
problem, then when you have a thousand of them, it doesn’t
mean it’s going to take you like 10 seconds to solve those
problems. You have to multiply that. It’s going to take you
like a million years to solve them altogether. That’s why it’s
such a big deal in artificial intelligence that you cannot just
brute force through these problems. Even given the
computational power that we have now, you just cannot
simply brute force through optimization problems. You have
to come up with smart ways of solving them.
Deblina: Yeah, because brute force for such kind of a problem would
definitely take two million years for sure. You have so many
parameters to take care of.
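[Editor's note: The point about multiplying rather than adding options can be made concrete. The sketch below uses invented routes, butters and stores, and a hypothetical brute_force_years helper that assumes 0.01 seconds per candidate plan, the figure mentioned in the conversation. The growth it shows is usually called combinatorial explosion.]

```python
from itertools import product

# Made-up choices for each decision.
routes = ["main road", "back street", "highway"]
butters = ["brand X", "brand Y", "brand Z", "store brand"]
stores = ["Store A", "Store B"]

# Every combination of one route, one butter and one store.
plans = list(product(routes, butters, stores))
print(len(plans))  # 24 = 3 * 4 * 2, not 3 + 4 + 2


def brute_force_years(choices_per_decision, decisions, seconds_per_plan=0.01):
    """Years needed to evaluate every combination one at a time."""
    combos = choices_per_decision ** decisions
    return combos * seconds_per_plan / (365 * 24 * 3600)


for decisions in (3, 6, 12, 24):
    print(decisions, brute_force_years(10, decisions))
```

Each extra decision multiplies the work by the number of choices it offers, so the running time grows exponentially with the number of decisions.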
Kirill: Yeah, exactly. And the funny thing is that, our whole lives,
all of our lives, whatever we are doing in life, is an
optimization problem. You have to get to school, what time
do you wake up in the morning, do you pick up your kids
and then you go to work or do you go to work and then pick
up your kids. In what order and how do you do certain
things, what routes and paths you take, that’s all an
optimization problem.
And funnily enough, if we want to build artificial intelligence
that can rival us in terms of intellect, it has to be able to
solve optimization problems as well as we do. As humans,
somehow we can solve these optimization problems. Natural
selection has given us this ability and this amazing tool
called the brain which allows us to solve these optimization
problems. I think that’s why we’re trying to build these
neural networks, these deep learning techniques, because
they can mimic the human brain in the hopes that then
robots will be able to do the same. Is your algorithm based
on neural nets or are you taking a different approach?
Deblina: My algorithm is not based on neural nets because clearly
trees do not have a brain and a central nervous system.
Kirill: That makes sense.
Deblina: Yeah. But then, I have totally worked on something that you
just mentioned – natural selection. So, guided by natural
selection, there’s a continuous reinforcement loop of penalty
and reward and the system builds on that. You know how
natural selection works, right? I do something, it’s a good
thing for me, and I will continue doing that. If it’s a bad
thing, I won’t do that. That’s the thing. My algorithm, the
library that’s built is just that. There’s a loop, an underlying
reinforcement algorithm, which guides this for natural
selection. That’s how it functions. But I do understand how
neural nets and all possible deep learning techniques work
on because again, they are inspired from a human brain.
Their inspirations are different but the basic natural
selection theory is the same for both of them.
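[Editor's note: A reward-and-penalty loop of the kind described can be sketched in a few lines. This is not Deblina's algorithm, whose details are not given in the episode. It is a generic illustration with invented action names and a made-up feedback function: the loop always repeats the action with the best score so far, and each outcome raises or lowers that score.]

```python
def reinforce(actions, feedback, rounds=5):
    """Pick the best-scoring action each round and update it with the feedback."""
    scores = {a: 0.0 for a in actions}
    history = []
    for _ in range(rounds):
        choice = max(actions, key=lambda a: scores[a])  # ties go to the first action
        scores[choice] += feedback(choice)              # reward is +1, penalty is -1
        history.append(choice)
    return scores, history


# Hypothetical environment: growing toward light is rewarded, toward shade penalised.
def feedback(action):
    return 1.0 if action == "grow toward light" else -1.0


actions = ["grow toward shade", "grow toward light"]
scores, history = reinforce(actions, feedback)

print(history[0])  # grow toward shade (tried once, penalised)
print(scores)      # {'grow toward shade': -1.0, 'grow toward light': 4.0}
```

After one penalty the loop abandons the bad action and keeps repeating the rewarded one, which is the "if it's good I continue, if it's bad I stop" behaviour in its simplest form.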
Kirill: Okay. Very, very interesting. And have you ever compared
your algorithm solving an optimization problem against a
neural network solving the same optimization problem?
Deblina: Yes, definitely. I have to do that because I get questioned at
conferences. I have compared it to other existing algorithms.
I would name a few, if you may.
Kirill: Yeah, sure.
Deblina: Okay. The particle swarm optimization, the artificial bee
colony optimization, and the ant colony optimization are
some of the really great optimization algorithms which are
out there from soft computing field. And in the deep learning
field, I have compared it to — because obviously there are
training and testing involved, that’s a different approach for
the deep learning field. Again, I need to subdivide and show
why I’m applying it to deep learning. That time, I compared
it with recurrent neural networks, and I think the last time I
compared it in one of my applications it was with restricted
Boltzmann machines.
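[Editor's note: Of the baselines named here, particle swarm optimization is the easiest to show briefly. The sketch below is a textbook form of the method, not the setup used in Deblina's experiments. The test function (sphere), the parameter values and every name in the code are assumptions chosen for illustration. Each particle is pulled toward the best point it has seen and the best point the whole swarm has seen.]

```python
import random


def sphere(p):
    """Simple test function with its minimum of 0 at the origin."""
    return sum(x * x for x in p)


def pso(objective, dim=2, particles=20, iterations=100, seed=0,
        inertia=0.7, c_personal=1.5, c_social=1.5, bound=5.0):
    """Minimise objective with a basic global-best particle swarm."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-bound, bound) for _ in range(dim)]
           for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    best_pos = [p[:] for p in pos]             # best point seen by each particle
    best_val = [objective(p) for p in pos]
    g = min(range(particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # best point seen by the swarm

    for _ in range(iterations):
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_social * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


best_point, best_value = pso(sphere)
print(best_point, best_value)  # a point close to the origin and a value close to 0
```

Even this small version has several parameters to set (inertia, the two pull strengths, swarm size, iteration count), which gives a sense of the tuning cost mentioned earlier for nature-inspired methods.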
One of the things that I saw, and it was quite controversial
in one of my presentations, it was like recurrent neural
networks and it was also having a fuzzy logic base with that
neural network scheme. With increasing number of
generations of run, I mean, when it was running for more
than 400 generations – I’m talking about image processing –
it did not cover the exact regions of interest on that image.
But rather the contour that we wanted to select on an
image, it got scattered all over. And this was deep learning
doing it. So somehow there was some problem, but then
when I did it with my algorithm, maybe because it was
having a continuous feedback loop—I mean, I know deep
learning has that, but this was more based on experiences.
So this library took more time but it gave better results, I
mean exactly the regions of interest on the image. So that’s
how I compared it.
Kirill: Interesting. You’re saying that your contour was contiguous?
Deblina: Yeah.
Kirill: Okay. Very interesting. I thought the deep learning area for
image recognition is convolutional neural nets?
Deblina: Yeah, it is, but then what I did was I used this particular
fuzzy neural net system. Exactly, it was fuzzy convolutional
neural nets.
Kirill: Fuzzy convolutional neural nets. Okay.
Deblina: Yeah, exactly. So that was the one which I compared it with.
Maybe because I was working with R last night and I got
confused—anyway, that was how I did it. So, the contours of
interest were scattered for FCNN and not for the algorithm
which we developed.
Kirill: Cool. So your algorithm is pretty up at the top there.
Interesting. We might be studying that very soon. If you
write a Python library, maybe.
Deblina: Yeah, sure. Maybe. (Laughs) I will do that.
Kirill: All right, cool. And once you finish your degree in two
months, where do you think that will take you?
Deblina: Right now, I really don’t know because I’m totally focusing
on this graduation stuff. I’ve been writing my thesis. And
after that I’m headed to Intel for two months in their R&D
section for some work on Internet of Everything. That’s IOx,
a new thing. And after that I will just be open to
opportunities. I don’t really know, but I would definitely love
to learn and just keep doing what I’m doing.
Kirill: Okay. And do you have some kind of a dream, some problem
that you want to solve in the world using artificial
intelligence?
Deblina: As I told you, for me, health care is one of the things that I
really want to get out there and totally automate it. I’m not
saying like perform surgeries and stuff, but for the first-time
diagnosis, or even helping the doctors, making their work
easy. So that would be great, so that the time for diagnosis
is saved considerably and the accuracy of your prognosis,
when you’re doing it, that will be much better. I want to do
that.
As of now, I haven’t really thought long-term what I’m going
to do, but it’s going to be everything related to AI. Another
problem is that right now, the field of AI has a lot of
capabilities like NLP, text and speech, knowledge recovery,
image processing, separately, but none of the work
happening around the world right now brings them
together. We need to integrate it and make it
one single system. That hasn’t been done as of even 2017.
The day that we build an end-to-end AI system, that would
be great, with all these functionalities. That would be the
ultimate aim of any AI researcher.
Kirill: Okay. That’s a big undertaking, definitely. Let’s talk a bit
about where the field of AI and data science is going in
general. From what you’ve seen around the world and from
the research you’ve done, what do you think the future is of
artificial intelligence?
Deblina: As far as I know, from the opinions of the scientists and the
researchers with whom I’ve met in conferences around the
world, what all of us are thinking is that the field of data
science is headed towards a fusion with intelligence systems
to create smart cities of the future. That’s the main vision. I
also strongly believe that with the on-going research with
real-time big data and Internet of Everything right now, data
science is going to explode in the future with a lot of stuff
happening. As of today – I just read this this morning when I
got up – Strata and Scala have been replaced by Hadoop
already—I’m sorry, Strata and Scala have replaced Hadoop.
Kirill: (Laughs) Just a bit of a different direction.
Deblina: Yeah. And there are these DataOps tools which are being
developed to help data engineers, like DevOps tools which
used to be previously for all the developers. Now there are
DataOps. They have been built by companies like Nexla and
DataKitchen. It’s really great, how the data field is
progressing. And also the automated predictive analytics,
which is the thing which is happening right now. This
predictive analytics had been automated last year and the
data robot was created and people were like, “Okay, by 2025
everyone is going to be out of jobs.” But then it was a bit
soon to say, because the data robot as of now just speeds up
model development for any model that you’re building, it’s
like the one-stop solution to speed up whatever you’re
implementing in the industry. It has a long way to go,
definitely, it’s a budding field. Both AI and data science
together are going to be really powerful in the future.
Kirill: Yeah, I agree. I don’t think data scientists will be out of a
job. I think that’s going to be the last industry to go.
Deblina: I really don’t feel so because as of yesterday there were more
than 4 million jobs out there for data scientists.
Kirill: Wow! Everybody listening, do you hear that? 4 million!
Deblina: Yeah, anyone with skills like Python, R, SQL, Hadoop, you’re
good to go. You are looking at good jobs straight down the
line for 25 years, a stable career.
Kirill: Stable and explosive career. What would you say are the
most important tools for data scientists?
Deblina: As I said, definitely Python, R, SQL. Spark right now
because obviously it has taken over Hadoop. Basically the
entire scikit-learn/numpy/TensorFlow of Python. If you can
do that, that would be great. So these are some tools, and
even I use that on quite a regular basis. Among the
techniques, if you might ask, there are clustering, regression,
neural nets and decision trees. Most importantly, there are
two things, which are support vector machines and ensemble
learning, that you need to learn if you really want to get into
data science because all the companies out there work with
ensemble learning. Everything is an ensemble.
Kirill: Okay. That’s important. And why would you say SVMs are
an important tool?
Deblina: Yeah, SVMs are really powerful and they work very
differently than the existing clustering or regression
techniques. The way how they work is really beautiful, the
accuracy of the results that they get because of that
mechanism. And from that accuracy, it has been applied to
a lot of product designing, modelling in most of the corporate
sectors that I’ve come across. So that’s why it’s viewed as an
important tool in your career.
Kirill: All right, give us a five sentence breakdown of how SVMs
work. What is their main advantage?
Deblina: Okay, say a set of data is there and you need to classify it
into two classes. So, for example—Kirill, if you could give me
two classes?
Kirill: Apples and oranges.
Deblina: Okay, so apples and oranges. Great! Your machine needs to
know—
Kirill: I like to participate in your examples. I can see how you’re a
great researcher.
Deblina: Okay. (Laughs) So, you have a bunch of apples/oranges
combinations and your machine should classify which one is
apple and which one is orange. That’s the objective. So
thereafter, the next step is how will the machine know that.
The model of SVM, what it does is it builds something called
a hyperplane. To be non-technical, I would say that’s the
margins between the two classes. So those margins, what
happens is any other algorithm would find a similarity, like
which is an orange for class ‘orange’ and which is an apple
for class ‘apple’. But what SVM does is, among the apples, it
will select which has the most similarity with an orange, just
the opposite. And with the orange, it will select which has
the most striking resemblance with an apple. It finds out the
outliers or the mistakes in a very non-technical manner and
puts that as your margins. And based on that, the remaining
data is classified. That’s how SVM works.
Kirill: Yeah. It’s very counterintuitive if you’re thinking about it in
terms of the other algorithms, where they look for the most
apple-y apple or the most orange-y orange, and then they
build their classes based on that.
Whereas here, you’re looking for the really cool orange which
actually looks like an apple and a real rebel apple, which
looks like an orange, and based on that you’re like, “Oh, so
those are my boundaries.” And then you’re like – bam!
Hyperplane in-between them and that’s it. That’s a
completely different approach to classifying.
Deblina: Yeah.
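The idea described above can be sketched in a few lines of plain Python. This is a one-dimensional toy, not how a real SVM library works: it assumes each fruit is reduced to a single made-up "orangeness" score and that the two classes do not overlap. The function names are hypothetical. In that simplified setting, the widest-margin boundary sits exactly halfway between the most orange-like apple and the most apple-like orange, which are the points an SVM would call support vectors.

```python
# Toy one-dimensional illustration of the maximum-margin idea.
# Assumption: one made-up "orangeness" score per fruit, classes separable.

def fit_max_margin_1d(apples, oranges):
    """Return (boundary, margin) for two non-overlapping groups of scores."""
    # The boundary cases: the apple that looks most like an orange and
    # the orange that looks most like an apple (the support vectors).
    edge_apple = max(apples)
    edge_orange = min(oranges)
    if edge_apple >= edge_orange:
        raise ValueError("classes overlap; this sketch needs separable data")
    boundary = (edge_apple + edge_orange) / 2   # halfway between the two
    margin = (edge_orange - edge_apple) / 2     # distance to each edge case
    return boundary, margin

def classify(score, boundary):
    """Label a new fruit by which side of the boundary it falls on."""
    return "orange" if score > boundary else "apple"

apples = [0.5, 1.0, 2.0]
oranges = [4.0, 6.0, 7.0]
boundary, margin = fit_max_margin_1d(apples, oranges)

print(boundary, margin)
print(classify(2.5, boundary), classify(3.5, boundary))
```

Only the two edge cases (2.0 and 4.0) determine the boundary; the remaining points could move around without changing it. With many features or overlapping classes, a library implementation such as scikit-learn's `SVC` solves an optimization problem to find the hyperplane instead.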
Kirill: Okay. That’s cool. Another interesting thing you
mentioned—just to summarize for the guys listening, tools of
the future are Python, R, SQL, Spark, scikit-learn and
TensorFlow, and techniques of the future are clustering,
regression analysis, neural networks, support vector
machines and ensemble learning among others.
And other interesting things you mentioned, and these are
just from before on this show, Strata and Scala are replacing
Hadoop and Spark has taken over Hadoop. Can you go into
a bit more detail on that? Like, Hadoop is such a trendy
buzzword, everybody wants to learn Hadoop. Does that
mean that listeners on this show shouldn’t be learning
Hadoop and they should be learning Strata, Scala and Spark
instead?
Deblina: I wouldn’t say that, but right now—again, it depends on
what the listeners want to do and what they’re looking for
with their model to solve. But why I’m saying that Strata,
Scala and Spark have replaced Hadoop is because right now
what the researchers are doing, in all these conferences that
I travelled to, I saw that Hadoop has been there for quite
some time. And right around from 2007 to today, it has
almost been replaced by these technologies and the
companies are also looking towards these technologies,
obviously for the real-time analysis of these technologies.
Right now, I don’t think that listeners should just stop
learning Hadoop because even I use Hadoop on a regular
basis. But I find Spark much easier, and I find it has more
parts to it rather than Hadoop strictly because of the real-
time processing that it can do with big data. I don’t know so
much about how companies or even academic organizations
are using Strata and Scala because I don’t have full
knowledge of that, but I can speak for Spark for sure.
Kirill: Okay. That’s very interesting. And what would you say to
somebody out there who runs a business who is using
Hadoop right now? Do you think they should start
considering switching to Spark, or are they fine for the next
couple of years?
Deblina: That depends on what business that person is running, but
definitely you should be starting to make a transition to
Spark. I strongly feel so. Again, it’s not a personal opinion;
it’s like speaking the minds of everyone who I’ve come across
in the past year.
Kirill: Let’s say they’re running an online store, so they have a lot
of OLTP/OLAP type of things. What’s the main advantage of
Spark over Hadoop?
Deblina: Okay, if they’re running an online store, basically it’s
neater, it’s nicer in the way it works with respect to
handling and processing the data and also the kind of
intuition it has towards modelling the data into different—if
you’re looking towards classification and stuff like that, put
into clusters, Spark is better. Those are certain advantages. I
don’t know so much about speed and stuff because right
now even I am in a dilemma, like “What should I be using?
Hadoop or Spark?” Right now I’m trying my hand at both.
So the moment I get to a proper thing, I will put that up on
my LinkedIn profile.
Kirill: Okay, sounds good. We’ll be looking forward to that. I’ve got
a few quick questions for you, rapid-fire type of questions.
Are you ready?
Deblina: Okay.
Kirill: All right. What’s the biggest challenge you’ve ever had as a
data scientist or machine learning expert or AI researcher?
Deblina: That would be handling unstructured data from all possible
sources and giving it a proper structure. That’s very
important.
Kirill: Okay. That’s a very deep challenge. I can totally appreciate
that. What’s a recent win that you can share with us that
you’ve had in your role, something that you’re proud of?
Deblina: It would be the completion of my recent project, the
intelligent health care system for CT scan detection
that I presented at that artificial intelligence conference in
the United States.
Kirill: Do you think that can have a real world application and
soon we will be using those?
Deblina: Yeah, because the project got acquired by a hospital.
Kirill: Oh, nice. Very cool.
Deblina: Yeah, it’s giving results.
Kirill: Congratulations. That’s awesome.
Deblina: Thank you.
Kirill: It reminds me of the podcast with Damian Mingle, I think it
was number 13, where he came up with a machine learning
algorithm to predict sepsis. It’s always very cool to see
people using artificial intelligence for good.
Deblina: Yeah, I remember that. I heard that.
Kirill: Yeah, that’s awesome. Thank you. Now I have two people
who are saving lives. That’s awesome. You never mentioned,
what’s the name of your library that you’ve developed, if we
can look it up or something later on?
Deblina: I told you it’s not on Python yet, so—
Kirill: Yeah, but even in C, did you give it a name like a codename?
Cobra or something like that?
Deblina: No, because right now I haven’t put up the name. It’s still in
the beta version. So once I do that, I will keep you posted.
Kirill: All right, sounds good. Next one is, what’s your one most
favourite thing about being in the field of data or being a
data scientist, being an AI researcher?
Deblina: I really like the power that we have to build awesome and
cool stuff with data, making machines think more like us,
and it’s just the beginning. We can create a huge impact on
creating a smarter tomorrow. Take, for example, Alexa and
Google Home – it’s just the beginning. I really like that about
our field, and also how in demand it is among the various
sectors around the world.
Kirill: Yeah, totally. But on that, this is kind of my question that I
really wanted to get your opinion on after we warmed up
with everything else. What do you think about a lot of people
saying that AI is a threat, that not only are we going to
develop smart homes or smart cities and help people in
health care, but actually we’re going to create super
intelligence or artificial general intelligence which will have a
prerogative of its own which will eventually decide that
humans are not meant to be on this planet? What are your
thoughts about that?
Deblina: I totally understand that because even we keep thinking that
and that’s one of the issues that I discussed whenever there
are meet-ups and business things. I second that. I feel,
however, it can be solved by going through—okay, we need
stricter security. You know how it’s coming because just in
2017 every person’s personality can be assessed from his
online data. So imagine from all the data sensor services and
commodities that a single person uses, how easy it will be to
know everything about a person. And I’m not talking about
an end-to-end integrated AI bot. That’s too much into the
future. I’m just talking about simple intelligent machines.
Those are equally powerful and dangerous. So what we need is
a data architect or scientist with a solid information security
background. He will be really indispensable. If we can build
a security mechanism around it, it’s good to go, yeah.
Kirill: All right. That’s important. It’s good that you have the
confidence that we’ll be safe. (Laughs)
Deblina: Yes. (Laughs) You need to stay positive. Whatever you’re
doing, you need to just enjoy it and think that it’s going to
work. That’s how I look at it.
Kirill: That’s true. Okay, it’s been a real pleasure having you on the
show. Thank you so much for coming on.
Deblina: Thank you so much, Kirill.
Kirill: So how can our listeners follow you or maybe connect with
you, maybe even ask you some questions if they’d like to
learn more about your career?
Deblina: Okay, I’m on LinkedIn. I go by my name – Deblina
Bhattacharjee. And you can also connect with me via Gmail.
I go by deblinafordata@gmail.com, so I would be definitely
open to sharing ideas, discussions, basically learn. Yeah.
Kirill: Beautiful. Thank you. And one last question I have for you
today is, what is a book that you can recommend to our
listeners to become better in the space of data science or
artificial intelligence?
Deblina: Okay. This book would be the one with which I started off:
“An Introduction to Statistical Learning” by James, Witten,
Hastie and Tibshirani. In my opinion, it was a great read. The book is free
and it’s just so good. Also, if you might allow me, I would
just recommend another book which is “Applied Predictive
Modelling” which has thorough examples and explanations.
So, these two books, yeah.
Kirill: Okay, beautiful. So, “An Intro to Statistical Learning” and
“Applied Predictive Modelling.” Actually, I also wanted to
mention or reiterate that author that impacted you at the
very beginning of your journey. If anybody is interested to
see how Deblina started out, her name was Shakuntala
Devi, right?
Deblina: Yeah.
Kirill: Can you pronounce that for us? How do you pronounce it
correctly?
Deblina: Shakuntala.
Kirill: Shakuntala Devi. Do you think they’re available in English,
like those pattern recognition books?
Deblina: Yeah, all of them are in English.
Kirill: Okay. I’m really curious to check that out. You know, it’s
always interesting to go back to the source, where everything
started out.
Deblina: Yeah, sure.
Kirill: Yeah. Once again, thank you so much for coming on the
show and sharing all this wealth of knowledge with all of our
listeners.
Deblina: Thank you so much for inviting me, Kirill. It has been a
pleasure.
Kirill: So there you have it. I hope you enjoyed today’s
presentation. It was quite an overwhelming discussion,
actually. There were lots of interesting things. You can tell
that Deblina is very well-versed and very knowledgeable
about all of these subjects and has a lot of experience in all
these different tools and techniques. And it was just a great
pleasure that she was able to share these things with us.
Perhaps the biggest takeaway for me from this
episode was what Deblina said about her algorithm and how
it’s structured. I always thought that neural networks are
the most powerful thing and they’re the endgame for
humanity in terms of artificial intelligence, but in reality it
actually turns out that they’re not. It’s very interesting that
such a forward-looking researcher as Deblina has chosen a
different approach. Rather than an approach inspired by
human consciousness and the human mind, Deblina chose
an approach inspired by the kingdom of plants and the
natural selection that has been happening there.
Based on some of the tests that she’s done, her algorithm is
performing at least as well as the existing ones out there.
Basically, it shows that there are lots of avenues for artificial
intelligence, not just neural networks, and also kind of
underlines how broad this field is and how many
opportunities there are and how interesting it can be. So
pretty much, as long as you have the passion and have the
drive to learn the programming skills that you need, the
world is your oyster. You can come up with any type of
inspiration and code that and see how that goes.
So there we go. That was our podcast on artificial
intelligence. I hope you got some very valuable takeaways. If
anything, now you know which tools to focus on and which
techniques to study to prepare for the world of the future.
You can find all of the resources mentioned on this podcast
including ways to connect with Deblina at
www.superdatascience.com/43. Also there, you can get the
transcript for this episode. And definitely make sure to
connect with Deblina on LinkedIn and follow her career.
And by the way, I just had a look at Shakuntala Devi, and
she’s considered a human computer. This is a person who
can multiply two 13-digit numbers within like several
seconds. We’re going to include that in the show notes as
well. I think that could be a very interesting thing to have a
look at as well. And on that note, thank you so much for
being here. I really appreciate you and I can’t wait to see you
next time. Until then, happy analyzing.