FeatFinder Presentation Amazon Web Services / Microsoft Developer Contest


The Idea (1/3)

• Have you heard of the six degrees of separation theory?

“Six degrees of separation is the theory that anyone on earth can be connected to any other person on the planet through a chain of acquaintances that has no more than five intermediaries.” Definition from Wikipedia

The Idea (2/3)

In 1967, Stanley Milgram devised a way to test the theory, which he called "the small-world problem". He randomly selected people in the American Midwest to send packages to a stranger located in Massachusetts, several thousand miles away. The senders knew the recipient's name, occupation, and general location. They were instructed to send the package to a person they knew on a first-name basis who they thought was most likely, out of all their friends, to know the target personally. That person would do the same, and so on, until the package was personally delivered to its target recipient. It only took (on average) between five and seven intermediaries to get each package delivered.

The Idea (3/3)

FeatFinder's purpose is to study the relationships between musical artists in a fun way.

Much of the general public doesn't like abstract mathematical concepts.

To make the application genuinely interesting, 4 sub-concepts have been implemented:

Sub-Concepts (1/4)

The FeatFactor is the “distance” between two different artists: the number of featurings on the shortest chain that links them.

Examples:

• Madonna has made a song featuring Britney Spears

• Britney Spears has made a song featuring Jamie Foxx

• FeatFactor(Madonna, Britney Spears) = 1

• FeatFactor(Madonna, Jamie Foxx) = 2
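
The FeatFactor examples above can be reproduced with a breadth-first search over the featuring graph. This is only a minimal sketch in Python (FeatFinder itself is written in C#); the names `build_graph` and `feat_factor` are hypothetical and are not taken from the project.

```python
from collections import deque

# Sample data taken from the two featurings listed in the examples.
FEATURINGS = [
    ("Madonna", "Britney Spears"),
    ("Britney Spears", "Jamie Foxx"),
]

def build_graph(featurings):
    """Build an undirected adjacency map from (artist, artist) pairs."""
    graph = {}
    for a, b in featurings:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def feat_factor(graph, start, target):
    """Return the number of featurings on a shortest chain, or None."""
    if start not in graph or target not in graph:
        return None
    if start == target:
        return 0
    distance = {start: 0}
    queue = deque([start])
    while queue:
        artist = queue.popleft()
        for neighbour in graph[artist]:
            if neighbour not in distance:
                distance[neighbour] = distance[artist] + 1
                if neighbour == target:
                    return distance[neighbour]
                queue.append(neighbour)
    return None  # the two artists are not connected

graph = build_graph(FEATURINGS)
print(feat_factor(graph, "Madonna", "Britney Spears"))  # 1
print(feat_factor(graph, "Madonna", "Jamie Foxx"))      # 2
```

Because every featuring counts as one step, a plain breadth-first search is enough; no edge weights are needed.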

Sub-Concepts (2/4)

The FeatWay is the shortest possible path between two different artists.

Example: how to get from Céline Dion to 50 Cent?
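
A FeatWay can be sketched as a breadth-first search that remembers, for each artist, which artist it was reached from. The Python below is an illustration only: `feat_way` is a hypothetical name, and the sample graph reuses the Madonna / Britney Spears / Jamie Foxx featurings from the FeatFactor slide, since the actual chain between Céline Dion and 50 Cent is not given in the text.

```python
from collections import deque

def feat_way(graph, start, target):
    """Return one shortest list of artists from start to target, or None."""
    if start not in graph or target not in graph:
        return None
    parent = {start: None}  # artist -> artist it was reached from
    queue = deque([start])
    while queue:
        artist = queue.popleft()
        if artist == target:
            # Walk the parent links back to the start, then reverse.
            path = []
            while artist is not None:
                path.append(artist)
                artist = parent[artist]
            return path[::-1]
        for neighbour in sorted(graph[artist]):
            if neighbour not in parent:
                parent[neighbour] = artist
                queue.append(neighbour)
    return None

graph = {
    "Madonna": {"Britney Spears"},
    "Britney Spears": {"Madonna", "Jamie Foxx"},
    "Jamie Foxx": {"Britney Spears"},
}
print(feat_way(graph, "Madonna", "Jamie Foxx"))
# ['Madonna', 'Britney Spears', 'Jamie Foxx']
```

The FeatFactor of two artists is then simply the length of this path minus one.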

Sub-Concepts (3/4)

A FeatRing is a path that leads from any given artist back to that same artist (without visiting any other artist twice).

Example: FeatRing for Madonna (with size 7)

Sub-Concepts (4/4)

With the FeatGraph, you can easily explore details of the overall graph through a graphical representation.

The Demo!

• Enough theory, it’s time to watch it for real!

• Note: FeatFinder is made of 7 different modules. This demo only shows the “visible” part. The “calculation” part will be shown later in this presentation.

• Open the file “FeatFinder-demo.htm” in the root folder of the archive.

How does it work?

• OK, we’ve seen what it looks like.

• Let's see how FeatFinder works!

• Note: If you’re from the Marketing Department, you can jump directly to the next section.

• FeatFinder has been developed in C# and uses an SQL Server Express database.

• FeatFinder is structured in modules to facilitate independent development.

1 – Business Objects

• These classes represent objects such as an Artist, an Album, a Featuring, etc.

• They are used to store information and share it between modules.

• This part was developed very quickly, mainly by making full use of IntelliSense and autocompletion.
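
To give an idea of what such business objects look like, here is a minimal sketch in Python (the real classes are written in C#). Only the class names Artist, Album and Featuring come from the slide; every field shown here is an assumption made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Artist:
    name: str  # hypothetical field

@dataclass(frozen=True)
class Album:
    title: str       # hypothetical field
    artist: Artist   # hypothetical field: the album's main artist

@dataclass(frozen=True)
class Featuring:
    host: Artist      # hypothetical field: artist who owns the track
    guest: Artist     # hypothetical field: featured artist
    track_title: str  # hypothetical field

madonna = Artist("Madonna")
britney = Artist("Britney Spears")
link = Featuring(host=madonna, guest=britney, track_title="Example Track")
print(link.guest.name)  # Britney Spears
```

Plain immutable value objects like these are easy to pass between modules because they carry data and no behaviour.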

2 – Web Services

• FeatFinder uses the 3 web services offered by Amazon that were available at the launch of the developer contest.

• Amazon E-Commerce Service

– Amazon E-Commerce Service is used to get information about artists and albums.

– This information is used in the Client Interface and during the calculation of the graph.

• Alexa Web Information Service

– This web service is used in the Client Interface to give information to the user.

– Advantage: Alexa offers an impressive amount of information on websites, collected by intensive crawling of the web and from data gathered with the Alexa Toolbar.

– Problem: Search results are not very relevant. Searching for an artist name rarely returns the official website as the first result.

– Solution: Using the artist name with the “.com” extension appended gives good results in most cases.

– Problem #2: Website thumbnails are not currently available through Alexa Web Information Service.

– Solution #2: Immense thanks to Geoff Mack from Alexa for providing me with a way to access these thumbnails.
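
The “artist name plus .com” workaround could look like the sketch below. It is an assumption-laden illustration in Python, not the project's code: the function name `guess_official_site`, the accent stripping, the removal of spaces and punctuation, and the `www.` prefix are all choices made here, and nothing checks that the resulting site actually exists.

```python
import re
import unicodedata

def guess_official_site(artist_name):
    """Guess a host name by appending ".com" to the normalised artist name."""
    # Assumption: strip accents so that "Céline" becomes "Celine".
    ascii_name = (
        unicodedata.normalize("NFKD", artist_name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Assumption: keep only letters and digits, lower-cased.
    slug = re.sub(r"[^a-z0-9]", "", ascii_name.lower())
    if not slug:
        return None
    return f"www.{slug}.com"

print(guess_official_site("Britney Spears"))  # www.britneyspears.com
print(guess_official_site("Céline Dion"))     # www.celinedion.com
print(guess_official_site("50 Cent"))         # www.50cent.com
```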

• Amazon Simple Queue Service (Beta)

– This web service has been used to share data between modules.

• Disadvantages:

– As this service is still in Beta, there is a limit of 4,000 queue entries at a time. FeatFinder sometimes enqueues data in “packs” of ten so as not to be limited by this maximum.

– A request can take seconds to be processed. This can be annoying, particularly when a project such as FeatFinder uses the remote queue extensively.

• Advantages:

– FeatFinder can easily be distributed across different places in the world, since the Simple Queue Service provides data persistence.

– The application can be stopped abruptly. When it is restarted, the same state is recovered because the data is stored remotely.
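
The “packs of ten” idea mentioned among the disadvantages can be sketched as follows: several artist names are joined into one message so that fewer queue entries are used. This Python sketch is hypothetical throughout; the function names, the `|` separator and the assumption that names never contain that separator are not taken from FeatFinder, and no real queue service is called.

```python
def pack_names(names, pack_size=10, separator="|"):
    """Group names into messages that each carry up to pack_size names."""
    messages = []
    for i in range(0, len(names), pack_size):
        messages.append(separator.join(names[i:i + pack_size]))
    return messages

def unpack_message(message, separator="|"):
    """Recover the individual names from one packed message."""
    return message.split(separator)

names = [f"Artist {i}" for i in range(25)]
messages = pack_names(names)
print(len(messages))                     # 3
print(len(unpack_message(messages[-1]))) # 5
```

With this grouping, 25 names occupy 3 queue entries instead of 25, which also reduces the number of slow remote requests.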

• This project was my first opportunity to use web services. I was surprised by how easy they are to integrate, in 3 steps:

1. Adding the web reference in Visual Studio

2. Setting parameters

3. Getting the result

3 – FeatDB : the DataBase

• Creating and accessing data has been done in 3 steps.

1. Create the table structure with the designer

2. Create the DataSet and SQL queries with the designer

3. Write the code: nothing could be easier!

• The database is stored in an “mdf” file, so that deployment can be instantaneous.

4 – FreeDB

• A large number of Amazon’s music products contain track information …

• … but only some of the most recent albums contain featuring data.

• The chosen solution uses FreeDB’s data to retrieve information about featurings.

• Problem: When extracted, the FreeDB files take 1.7 GB of disk space.

• A lot of the stored data is not relevant:

• Solution: FeatFinder contains a module that changes the data format:

• Result:

– Only relevant tracks and albums are stored. (Total size is now only 6 MB)

– A search in the local FreeDB is now extremely fast. (0.06 seconds)
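
As an illustration of how relevant tracks might be picked out of a FreeDB entry, here is a rough Python sketch. It relies on the `DTITLE=Artist / Album` and `TTITLEn=Track` lines of the FreeDB text format; the rule that a featuring is written as “feat.” or “featuring” in the track title is an assumption, as are the names `FEAT_PATTERN` and `parse_entry` and the placeholder entry. Titles continued over several lines are not handled, and a title such as “Feat of Strength” would produce a false match.

```python
import re

# Assumption: a featuring appears in the track title as "feat. X" or "featuring X".
FEAT_PATTERN = re.compile(r"\b(?:featuring|feat\.?)\s+([^()\[\]]+)", re.IGNORECASE)

def parse_entry(text):
    """Return (album_artist, album_title, [(track_title, featured_artist)])."""
    artist, album, featurings = None, None, []
    for line in text.splitlines():
        if line.startswith("#") or "=" not in line:
            continue  # skip comments and malformed lines
        key, value = line.split("=", 1)
        if key == "DTITLE" and " / " in value:
            artist, album = (part.strip() for part in value.split(" / ", 1))
        elif key.startswith("TTITLE"):
            match = FEAT_PATTERN.search(value)
            if match:
                featurings.append((value.strip(), match.group(1).strip()))
    return artist, album, featurings

# Placeholder entry, not real FreeDB data.
SAMPLE_ENTRY = """# xmcd
DTITLE=Artist A / Album One
TTITLE0=First Song
TTITLE1=Second Song (feat. Artist B)
"""
print(parse_entry(SAMPLE_ENTRY))
# ('Artist A', 'Album One', [('Second Song (feat. Artist B)', 'Artist B')])
```

Keeping only the entries for which the list of featurings is non-empty is one way to shrink the data to the tracks that matter for the graph.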

5 – Calculation

• FeatFinder is based on a big graph linking the artists of the database.

Note: this graph image is a capture from bonus.svg. You can view this graph and zoom in on it using Internet Explorer and this plugin.

• Calculation is the service that generates this graph, using Amazon E-Commerce data, FreeDB and the remote Queue Service.

• Here is a brief overview of how it works:

• Step 1: Read the next artist from the remote queue

• Step 2: Get the list of albums from Amazon (example artist: Missy Elliott)

• Step 3: Take the first album and search for it in FreeDB

• Step 4: Extract featuring information:

– Store it in the database

– Put unknown artists in the remote queue

• Step 5: Here is the new state of the remote queue:

• And the local album queue:

• Step 6: Do the same for each album of the local queue and each artist of the remote queue.

• When the remote queue is empty, the calculation is finished.

• This simple algorithm takes only 2 hours to fill the graph with 2,500 artists.
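
The six steps above can be condensed into a short loop. This is a sketch in Python under several assumptions: an in-memory `deque` stands in for the remote queue, a set of pairs stands in for the database, and the two lookup functions passed as parameters (`get_albums`, `get_featurings`) are hypothetical stand-ins for the Amazon and FreeDB searches.

```python
from collections import deque

def calculate_graph(seed_artists, get_albums, get_featurings):
    """Crawl featurings starting from seed_artists and return the graph edges."""
    remote_queue = deque(seed_artists)  # stand-in for the remote queue
    known = set(seed_artists)           # artists already queued once
    edges = set()                       # stand-in for the database
    while remote_queue:                              # stop when the queue is empty
        artist = remote_queue.popleft()              # step 1
        album_queue = deque(get_albums(artist))      # step 2: local album queue
        while album_queue:                           # step 6: every album
            album = album_queue.popleft()            # step 3
            for guest in get_featurings(artist, album):  # step 4
                if guest == artist:
                    continue
                edges.add(frozenset((artist, guest)))    # store the featuring
                if guest not in known:                   # unknown artist
                    known.add(guest)
                    remote_queue.append(guest)
    return edges

# Placeholder data, not real albums or featurings.
ALBUMS = {"A": ["a1", "a2"], "B": ["b1"], "C": []}
FEATS = {("A", "a1"): ["B"], ("A", "a2"): ["B", "C"], ("B", "b1"): ["A"]}

edges = calculate_graph(
    ["A"],
    lambda artist: ALBUMS.get(artist, []),
    lambda artist, album: FEATS.get((artist, album), []),
)
print(sorted(sorted(edge) for edge in edges))  # [['A', 'B'], ['A', 'C']]
```

Tracking which artists have already been queued is what guarantees that the loop ends: each artist enters the queue at most once.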

6 – The BootStrap

• We’ve seen how to calculate the graph when artists are waiting in the remote queue.

• But where to start?

• BootStrap’s goal is to initialize the remote queue with a lot of artist names.

• These names can come from many different sources, but should be stored in XML.

• For example, I’ve used data from Amazon’s “Best of 2005” page.
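
Reading the seed names from XML could be done along these lines. The XML layout used here (an `<artists>` root with `<artist>` children) is purely an assumption, since the slides do not show the real file format; `read_artist_names` is a hypothetical name, and the resulting list would then be pushed to the remote queue.

```python
import xml.etree.ElementTree as ET

# Hypothetical file layout; the real BootStrap format is not shown in the slides.
SAMPLE_XML = """<artists>
  <artist>Madonna</artist>
  <artist>Britney Spears</artist>
  <artist> Madonna </artist>
</artists>"""

def read_artist_names(xml_text):
    """Return the distinct, trimmed artist names in document order."""
    root = ET.fromstring(xml_text)
    names = []
    seen = set()
    for element in root.iter("artist"):
        name = (element.text or "").strip()
        if name and name not in seen:
            seen.add(name)
            names.append(name)
    return names

print(read_artist_names(SAMPLE_XML))  # ['Madonna', 'Britney Spears']
```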

7 – GraphBox

• GraphBox is the module that “manipulates” the graph.

• This graph is unweighted and undirected.

• All algorithms used in FeatFinder (FeatRing, FeatGraph, FeatFactor and FeatWay) are implemented in this module.

• The hardest algorithm to implement was the FeatRing.

• First I wrote a depth-first search algorithm. It was very efficient on small graphs, but with a large number of artists it can take a very long time to find a ring.

• Now the FeatRing uses a breadth-first search algorithm. It is very quick on large graphs, but can use a lot of memory.
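
One way to search for a ring breadth-first is to keep whole paths in the queue and extend them one artist at a time, which also shows why memory use can grow quickly. The sketch below is a Python illustration of that idea, not the actual GraphBox algorithm; `feat_ring`, the `min_size` parameter and the sample graph with placeholder names are assumptions.

```python
from collections import deque

def feat_ring(graph, start, min_size=3):
    """Return the smallest ring through start with at least min_size artists."""
    if min_size < 3:
        raise ValueError("a ring needs at least 3 distinct artists")
    queue = deque([[start]])  # each queue entry is a whole path
    while queue:
        path = queue.popleft()
        for neighbour in sorted(graph.get(path[-1], ())):
            if neighbour == start and len(path) >= min_size:
                return path  # the last artist links back to the start
            if neighbour not in path:  # never visit an artist twice
                queue.append(path + [neighbour])
    return None

# Placeholder graph: a square A-B-C-D-A with an extra artist E linked to A.
graph = {
    "A": {"B", "D", "E"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"A", "C"},
    "E": {"A"},
}
print(feat_ring(graph, "A"))  # ['A', 'B', 'C', 'D']
print(feat_ring(graph, "E"))  # None
```

Since paths are explored in order of length, the first ring found is the shortest one that meets the size requirement; the price is that every partial path of the current length is held in memory at once.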

8 – Client Interface

• The user interface has been developed in WinForms using the graphical designer of Visual Studio.

• Note regarding the contest rules:

– The client interface uses 2 open source libraries: Netron Graph lib for displaying graphs, and TreeView for displaying FeatFactors.

– The authors of these libraries gave me explicit confirmation that using these 2 libraries complies with the official rules of the Amazon developer contest.

Archive organisation

• Here are the folder contents of the archive:

The BIN directory contains all the executables.

The FeatFinder directory contains all the source code. There is one solution file (FeatFinder.sln) and one project for each module.

• How to run the User Interface?

1. Open Common.dll.config and replace the values with valid Amazon Access Key Identifiers

2. Double-click on FeatFinderForm.exe

• How to run the Calculation Service?

1. Open Calculation.exe.config and write the valid path of your FreeDB directory

2. Double-click on Calculation.exe

Possible Improvements

• FeatFinder is a technology demonstration that illustrates the original idea.

• The project is fully functional,

• but some compromises were made due to lack of time.

• Here are two possible improvements:

1. Use a web service between GraphBox and the User Interface

• With this improvement, we can have 1 server and several clients on different computers around the world. All users will access the same centralized database.

2. Building on the first improvement, we can easily add another User Interface written with WebForms in ASP.NET.

• With this improvement, FeatFinder can be used all over the world with a lightweight client, such as a web browser.

Conclusion

• FeatFinder was my first experience with .NET. I was impressed by how simple it is to use web services, WinForms and databases with Visual Studio.

• Thank you for reading this presentation.

• Now it’s time to see FeatFinder for real.

• I hope you will enjoy using this application!