My works on GitHub, etc.

20161216

Works on GitHub, etc. (except the code for DRL on Montezuma's Revenge; see the other slides for that)

Takayoshi Iitsuka (Staff Service Engineering, Hitachi Ltd OB)


Analysis of the intermediate layer of VAE (1)

Usually the intermediate layer of a VAE (Variational Autoencoder) is visualized as a 2D figure like the following (MNIST example).

But in general, the dimension of the intermediate layer is much higher.

A high-dimensional analysis of the structure of the intermediate layer seems important.

https://github.com/Itsukara/vae-hidden-layer








An experiment showed that only 11 dimensions are active in the intermediate layer, even though the intermediate layer has 30 dimensions. (The experiment code does not include any sparseness terms.)
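One way to count active dimensions can be sketched as follows. This is only an illustration: it assumes a dimension is "active" when its variance over the encoded images exceeds a threshold, which may differ from the criterion used in the experiment. The function name and the threshold value are hypothetical.

```python
def active_dimensions(codes, threshold=1e-2):
    """Return the indices of latent dimensions whose variance exceeds threshold.

    codes: list of latent vectors (one per image), all of the same length.
    threshold: hypothetical cut-off; not taken from the original experiment.
    """
    n = len(codes)
    active = []
    # zip(*codes) yields one tuple per latent dimension (a column).
    for index, column in enumerate(zip(*codes)):
        mean = sum(column) / n
        variance = sum((x - mean) ** 2 for x in column) / n
        if variance > threshold:
            active.append(index)
    return active
```

With 30-dimensional codes, the length of the returned list would be the number of active dimensions under this assumed criterion.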









Approximated the distribution of the images of each digit by a 30D sphere. => 10.7% error









Approximated the distribution by a multivariate normal distribution (spheroid) => 4.8% error (much better!)
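The difference between the sphere and the spheroid approximation can be sketched as below. This is a simplified illustration, not the code in the repository: it assumes that "error" means classification error when each point is assigned to the best-fitting class shape, and it uses an axis-aligned (diagonal covariance) normal distribution instead of a full multivariate normal. All function names are hypothetical.

```python
import math

def fit_class(points):
    """Fit a per-dimension mean and variance to the points of one class."""
    n = len(points)
    columns = list(zip(*points))
    mean = [sum(col) / n for col in columns]
    # A small constant keeps the variance strictly positive.
    var = [sum((x - m) ** 2 for x in col) / n + 1e-6
           for col, m in zip(columns, mean)]
    return mean, var

def sphere_score(x, model):
    """Squared Euclidean distance to the class center (sphere model)."""
    mean, _ = model
    return sum((a - m) ** 2 for a, m in zip(x, mean))

def spheroid_score(x, model):
    """Twice the negative log-likelihood (up to a constant) of a
    diagonal-covariance normal distribution (axis-aligned spheroid)."""
    mean, var = model
    return sum((a - m) ** 2 / v + math.log(v)
               for a, m, v in zip(x, mean, var))

def classify(x, models, score):
    """Return the label of the class whose score for x is smallest."""
    return min(models, key=lambda label: score(x, models[label]))
```

A class that is elongated along one axis is poorly described by a sphere: a point far out along that axis may be closer to the center of a neighboring class, while the spheroid score still assigns it to the elongated class.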








Analysis of the structure of MNIST data set

Assumption: The simple structure of the intermediate layer of the VAE comes from the simplicity of the structure of the MNIST data set.

Result: Assumption CONFIRMED!

The 784 (= 28 * 28) dimensional space of the MNIST data set has a rather simple structure that can be approximated by 10 spheroids.

https://github.com/Itsukara/vae-hidden-layer

In this analysis, 50,000 images were used.

The previous analysis used 10,000 images; rerunning it with 50,000 images gave a 5.8% error.

So the original structures in 784D space are almost as compact as the 30D structures in the intermediate layer of the VAE.

This result seems natural, because a VAE is trained by unsupervised learning and adds no additional information.







Scripts to fully utilize GCP preemptible VM

Background: GCP (Google Cloud Platform) preemptible VMs are very cheap (about 1/3 of the cost), but they may stop at any time.

=> Some control is mandatory.

The scripts published on GitHub make full use of GCP preemptible VMs, for people who try my A3C+OHL code.

Effect: The scripts enable full use of the IT resources.

They can use 4 vCPU x 8 VMs under the free trial conditions (2 months, $300).

https://github.com/Itsukara/async_deep_reinforce/tree/master/gcp-preemptible-VM-instaces

Figure: 8 GCP preemptible VMs with 4 vCPU each (2 months, $300 free trial) and 1 AWS VM with 1 vCPU (1 year free trial).

Periodically watch the VMs and restart stopped VMs (once per minute), and create a web page summarizing the status of training (once per 5 minutes).
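The watch-and-restart cycle described above can be sketched as a simple loop. This is a minimal illustration and not the published scripts: the callables `is_running`, `restart`, and `write_summary` are hypothetical placeholders for whatever actually queries the VM, restarts it, and writes the status web page.

```python
import time

def watch_once(names, is_running, restart):
    """Restart every VM that is not running; return the names restarted."""
    restarted = []
    for name in names:
        if not is_running(name):
            restart(name)
            restarted.append(name)
    return restarted

def watch_loop(names, is_running, restart, write_summary,
               check_interval_s=60, summary_every=5,
               sleep=time.sleep, max_ticks=None):
    """Check the VMs once per interval and write a summary every few checks.

    With the defaults, VMs are checked once per minute and the summary is
    written once per 5 minutes. max_ticks=None means run until interrupted.
    """
    tick = 0
    while max_ticks is None or tick < max_ticks:
        watch_once(names, is_running, restart)
        if tick % summary_every == 0:
            write_summary()
        tick += 1
        sleep(check_interval_s)
```

Passing the three operations in as functions keeps the loop independent of any particular cloud tool, so it can be tried locally with fake VMs before being connected to real ones.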









K-means classification of MNIST dataset

In the Do2dl study group, we were reading a book on AI that explained the k-means classification method. I said that it might be interesting to apply k-means to the MNIST dataset.

Because nobody other than me had the time, I wrote the code and uploaded it to GitHub.

In fact, the following pages of the book had a chapter on a much more sophisticated classification method, the EM algorithm.

I compared both results. When starting from random images, k-means was better (50% correct) than the EM algorithm (less than 50% correct).

When starting from images created from the center of 20 images of each digit, the EM algorithm became better (71.5%) than k-means (60.5%).
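For reference, the k-means iteration itself can be sketched in a few lines. This is a generic illustration in plain Python, not the code in the repository; the function names are hypothetical. Passing the initial centers explicitly makes it possible to compare a random start with a start from per-digit centers, as in the comparison above.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, initial_centers, n_iter=20):
    """Run a fixed number of k-means iterations.

    points: list of vectors (for MNIST, 784 pixel values per image).
    initial_centers: starting centers, e.g. random images or per-digit means.
    Returns (centers, labels), where labels[i] is the cluster of points[i].
    """
    centers = [list(c) for c in initial_centers]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(len(centers)),
                      key=lambda k: squared_distance(p, centers[k]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels
```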

https://github.com/Itsukara/ml4se




Tools for Renaming titles in BD-recorder

Background: My BD-recorder has a web interface for renaming the titles stored in it.

But renaming many titles takes time.

I developed tools that rename titles in the BD-recorder using renaming rules.

A renaming rule replaces a string or regexp in a title with another string.

https://github.com/Itsukara/diga-rename

1. Determine new titles by setting renaming rules (renaming rules are automatically saved and reused)

2. Automatically rename titles in the BD-recorder using the web interface of the BD-recorder
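The idea of renaming rules in step 1 can be sketched as an ordered list of (regexp, replacement) pairs applied to each title. This is a Python illustration only, not the code of the tools in the repository; the rules and the sample title are made up.

```python
import re

def apply_rules(title, rules):
    """Apply each (pattern, replacement) rule to the title, in order."""
    for pattern, replacement in rules:
        title = re.sub(pattern, replacement, title)
    return title.strip()

# Hypothetical rules, for illustration only.
rules = [
    (r"\[(new|rerun)\]", ""),                  # drop broadcast markers
    (r"\s+", " "),                             # collapse repeated spaces
    (r"^Drama (.+) #(\d+)$", r"\1 ep\2"),      # reorder series and episode
]

print(apply_rules("Drama  Sample Show [new] #3", rules))
```

Because the rules are plain data, they can be saved to a file and reused for the next batch of titles.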





Executable version of Python Tutorial

After I learned Python from the Python Tutorial, I felt that it would be convenient if I could execute the examples in the tutorial directly.

So I extracted the examples in the tutorial as Python scripts and published them on GitHub one week after I started learning Python.

They can be edited and executed directly.

They output the example code and its execution results (including errors).

https://github.com/Itsukara/Python-Tutorial-Scripts

After that, I learned Jupyter Notebook. So I converted the entire tutorial to Jupyter notebooks and published it on GitHub one week later.

https://github.com/Itsukara/Python-Tutorial-Ipython
























Program template for scraping with NodeJS and Selenium

I developed a tool to download the content of an internet school (dotinstall.com) to enable offline study, published it on GitHub, and announced it on Twitter.

I received a complaint from dotinstall.com, so I deleted it immediately.

But I think that a general program framework for scraping is useful for many people and is not illegal. So I published a program template for scraping with NodeJS and Selenium on GitHub.

https://github.com/Itsukara/Selenium-Scraping-Template







Pico-os of MicroPython

I got an ESP8266 (a very cheap microprocessor with Wi-Fi, i.e. less than $5) with MicroPython installed on it. (The presenter of an introductory Python seminar gave two ESP8266s to the audience.)

Unfortunately, the method the presenter used to write MicroPython to the EEPROM of the ESP8266 was not complete (I could not write any file to the filesystem on the EEPROM).

So I built an environment for writing to the EEPROM myself and rewrote MicroPython to the ESP8266.

I also measured the performance of MicroPython on the EEPROM.

(Roughly 1/1000 of an Intel CPU. The memory size is also roughly 1/1000.)

To make experiments with MicroPython easy, I wrote a very, very small interface library for using MicroPython on the ESP8266. I named it "Pico-os" and uploaded it to GitHub.

I also wrote about it on my blog and on Twitter.

https://github.com/Itsukara/MicroPython-pos






1000 times speedup of re-calculation in big EXCEL sheet

There was a very big EXCEL sheet containing over 120,000 rows.

Recalculating the sheet took several hours.

The sheet was recalculated 13 times every month, and human intervention was needed in the middle of each recalculation.

I investigated the expressions in the cells and found that repeated use of COUNTA and of VLOOKUP in exact-match mode appeared to be the cause. (Both functions need O(n) time when they search n rows.)

I reduced the use of COUNTA to a single call and used VLOOKUP in approximate-match mode. (The latter needs only O(log(n)) time. In practice, a more complex expression is needed.)

With these changes, the recalculation time dropped to a few seconds.

(More than 1,000 times speedup)
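The reason for the speedup can be illustrated outside of EXCEL. The sketch below assumes the lookup keys are sorted, which approximate-match lookup requires. It shows an exact-match lookup built from a binary search followed by an equality check. This check is one possible reading of the "more complex expression" and is an assumption, not the formula actually used in the sheet.

```python
from bisect import bisect_right

def lookup_linear(keys, values, target):
    """Exact-match lookup by scanning every row: O(n) per lookup."""
    for key, value in zip(keys, values):
        if key == target:
            return value
    return None

def lookup_sorted(keys, values, target):
    """Exact-match lookup on sorted keys by binary search: O(log n).

    The binary search finds the last key <= target (what an approximate
    match returns); the equality check then rejects keys that are only
    close to the target.
    """
    i = bisect_right(keys, target) - 1
    if i >= 0 and keys[i] == target:
        return values[i]
    return None
```

For 120,000 rows, a linear scan inspects up to 120,000 keys per lookup, while the binary search needs about 17 comparisons.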


Thank you for listening.
