www.monash.edu.au Prepared by: Stephen Edmonds December 2004 Developing the Monash Research Directory


What is it?

• A searchable web based directory of research publications and researchers at Monash University.

• Developed using Perl and open source modules.


Search form


Author search results


Publication search results


Author details


Publication details


Why?

• Each year the research activities at Monash University produce a significant amount of output in the form of:

– Journal articles

– Books

– Conference papers

– and more…

• Unfortunately only a limited number of people are aware of the full range of output.


Why?

• A publicly available directory could potentially raise the profile of research activities at the University.

• Additionally the Monash Research Directory would be the first of a series of research oriented tools for:

– Researchers at Monash

– People interested in research


Initial requirements

• Publicly available through the Monash website.

• Restricted access interface through the my.monash staff and student portal.

• Utilise existing information from systems around the University.

• Present the most up to date information possible.

• Only display research output generated by current staff members of the University.


Research Master

• A commercial product used to track research activities around the University.

• Information regarding the research activities is entered by representatives from each faculty within the University.

• Within Research Master one module contains details of the research output.


Research Master

• … and another contains details of the authors of the research output.

• 30,000 publications covering 8 years.

• 25,000 distinct authors.

• The information is stored in an Oracle database for use with a client application.


Monash Directory Service

• Contains an entry for each current student or member of staff of the University.

• Automatically updated from a number of sources such as the payroll system or the internal telephone directory.

• Staff members have the ability to enter additional information into their entry such as:

– Research interests

– Professional associations

– Biography

– Photograph (as a JPEG)

• A standard LDAP service.


Public Monash website

• Farm of Linux boxes running Apache web servers.

• Perl CGI is one of many technologies available.


my.monash portal

• An integrated view of the University for both staff members and students.

• Uses HTML::Mason, a dynamic web site authoring system written in Perl.


The problem so far…

• Two backend systems:

– Research Master (Oracle database)

– Monash Directory Service (LDAP service)

• Two frontend environments:

– my.monash portal (Perl through HTML::Mason)

– Public website (Perl CGI)


The problem so far…

• Some kind of glue is required between these four systems:


And the answer was…

• A module or set of modules.

• Written in Perl.


But how?

• The preliminary analysis showed that an author:

– Has a variety of details.

– Relates to one or more publications.

• While a publication:

– Has a variety of details.

– Relates to one or more authors.


But how?

• This data can be represented by a simple hierarchy:


But how?

• This complete encapsulation of business logic within classes means that the usage code is simply:

    my $research = Monash::ResearchDirectory->new( ... );

    if ($research->search('name' => 'john smith')) {
        foreach my $author ($research->authors()) {
            print $author->name(), "\n";

            foreach my $publication ($author->publications()) {
                print $publication->title(), "\n";
            }
        }
    }
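The usage code above implies two small classes that hold references to each other. The following is a minimal sketch of that shape only. The package names `Sketch::Author` and `Sketch::Publication`, the constructors, and the sample data are assumptions for illustration and are not taken from the real module.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the author class (name is an assumption).
package Sketch::Author;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, publications => [] }, $class;
}
sub name         { return $_[0]{name} }
sub publications { return @{ $_[0]{publications} } }
sub add_publication {
    my ($self, $publication) = @_;
    push @{ $self->{publications} }, $publication;
    return;
}

# Hypothetical stand-in for the publication class (name is an assumption).
package Sketch::Publication;

sub new {
    my ($class, %args) = @_;
    return bless { title => $args{title}, authors => [] }, $class;
}
sub title   { return $_[0]{title} }
sub authors { return @{ $_[0]{authors} } }
sub add_author {
    my ($self, $author) = @_;
    push @{ $self->{authors} }, $author;
    return;
}

package main;

# Link one author and one publication in both directions.
my $author = Sketch::Author->new(name => 'John Smith');
my $paper  = Sketch::Publication->new(title => 'An example title');
$author->add_publication($paper);
$paper->add_author($author);

print $author->name(), "\n";
print "  ", $_->title(), "\n" for $author->publications();
```

The two-way links form a reference cycle. In a persistent environment one side of the link would typically be weakened with `Scalar::Util::weaken` so that the objects can be freed after each request.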


Publication data issues

• The data contained within the Monash Directory Service is clearly defined.

• However the data stored in Research Master for a publication can vary from category to category

• … and even from year to year.


Publication data issues


Publication data issues

• A solution was to retrieve the field labels from the database and then generalise the access methods on the publication class:

    foreach my $field ($publication->fields()) {
        # field() returns the display label and the stored value
        my ($label, $value) = $publication->field($field);

        if ($value) {
            print $label, "\t", $value, "\n";
        }
    }
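One way such generalised accessors could be backed is by a pair of lookups, one from field key to label and one from field key to value. This is a sketch under that assumption. The package name `Sketch::FieldPublication`, the field keys `f1` to `f3`, and the sample labels are hypothetical; in the real system the labels would come from the Research Master database.

```perl
use strict;
use warnings;

# Hypothetical publication class with label-driven field access.
package Sketch::FieldPublication;

sub new {
    my ($class, %args) = @_;
    return bless { labels => $args{labels}, values => $args{values} }, $class;
}

# Field keys in a stable (sorted) order.
sub fields {
    my ($self) = @_;
    return sort keys %{ $self->{labels} };
}

# Return the (label, value) pair for one field key.
sub field {
    my ($self, $key) = @_;
    return ($self->{labels}{$key}, $self->{values}{$key});
}

package main;

my $publication = Sketch::FieldPublication->new(
    labels => { f1 => 'Journal', f2 => 'Volume', f3 => 'ISBN' },
    values => { f1 => 'Example Journal', f2 => '12' },   # f3 has no value
);

# Collect only the fields that carry a value for this publication.
my @lines;
foreach my $field ($publication->fields()) {
    my ($label, $value) = $publication->field($field);
    push @lines, "$label\t$value" if defined $value && length $value;
}
print "$_\n" for @lines;
```

The sketch tests `defined` and `length` rather than plain truth, so a legitimate value of `0` would still be shown. Whether that matters depends on the data.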


Internals

• As already stated the act of encapsulating as much business logic as possible in the classes means that the CGI script and HTML::Mason component aspects become trivial.

• At first it appeared to be the opposite case for the internals of the classes

• … however it fortunately did not become as complicated as feared.


Publication title search

• Walkthrough of some of the interesting parts of the publication title search process when the following call is made:

    $research->search('name' => 'john smith');


Querying Research Master

• Simplified by being able to query the backend Oracle database directly.

• A compromise between performance and maintenance resulted in a single SQL query.

• Unfortunately information is now duplicated in the results …


Querying Research Master

• … which can be selectively ignored during processing:

    while (my $row = $sth->fetchrow_hashref('NAME_lc')) {
        my $author      = $self->_find_or_create_author($row);
        my $publication = $self->_find_or_create_publication($row);

        $author->add_publication($publication);
        $publication->add_author($author);
    }
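The find-or-create step is what lets the duplicated information be ignored. A minimal sketch of the idea follows, using plain hashes in place of objects and a hard-coded list in place of the statement handle. The column names (`author_id`, `pub_id` and so on) and the sample rows are assumptions, not the Research Master schema.

```perl
use strict;
use warnings;

# Made-up rows in the shape a joined author/publication query might
# return; each author and publication can appear on several rows.
my @rows = (
    { author_id => 1, name => 'J. Smith', pub_id => 10, title => 'First paper'  },
    { author_id => 1, name => 'J. Smith', pub_id => 11, title => 'Second paper' },
    { author_id => 2, name => 'A. Jones', pub_id => 10, title => 'First paper'  },
);

my (%authors, %publications);

foreach my $row (@rows) {
    # ||= keeps the first record created for a key and reuses it for
    # every later row, so repeated details are simply skipped.
    my $author = $authors{ $row->{author_id} }
        ||= { name => $row->{name}, publications => [] };
    my $publication = $publications{ $row->{pub_id} }
        ||= { title => $row->{title}, authors => [] };

    push @{ $author->{publications} }, $publication;
    push @{ $publication->{authors} }, $author;
}

printf "%d authors, %d publications\n",
    scalar(keys %authors), scalar(keys %publications);
```

Three rows collapse into two authors and two publications, with each pairing recorded exactly once.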


Querying the Monash Directory Service

• A filter is constructed from the results obtained by querying Research Master:

    my @numbers = map { $_->employeenumber() || () } $self->authors();

    # join() needs explicit parentheses so the closing q{)} is not
    # swallowed into its argument list
    my $ldap_filter = q{(|}
        . join(q{}, map { qq{(employeenumber=$_)} } @numbers)
        . q{)};

• Which is then used to query the Monash Directory Service using Net::LDAP.
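The filter construction can be tried in isolation. The sketch below wraps it in a helper and prints the resulting string. The function name `build_employee_filter` and the sample employee numbers are hypothetical. Because `join` is a list operator, its arguments are parenthesised so that the closing bracket is appended to the joined string rather than absorbed into the list.

```perl
use strict;
use warnings;

# Build an LDAP "or" filter that matches any of the given employee
# numbers. Returns nothing for an empty list, because "(|)" alone is
# not a useful filter.
sub build_employee_filter {
    my @numbers = @_;
    return unless @numbers;
    return '(|' . join('', map { "(employeenumber=$_)" } @numbers) . ')';
}

my $ldap_filter = build_employee_filter(1001, 1002, 1003);
print "$ldap_filter\n";
# prints (|(employeenumber=1001)(employeenumber=1002)(employeenumber=1003))
```

With Net::LDAP the string would be passed as the `filter` argument of a `search` call. Values that are not plain digits would need LDAP filter escaping first.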


Correlating results

• Results from the Monash Directory Service are then attached to the appropriate author object:

    foreach my $author ($self->authors()) {
        my $entry = $self->_get_ldap_entry($author->employeenumber());

        $author->set_ldap_entry($entry) if $entry;
    }
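The presentation does not show how `_get_ldap_entry` finds an entry. One plausible approach is to index the directory results by employee number once and then look each author up in that index. This sketch assumes that approach, with plain hashes standing in for Net::LDAP entries and author objects; all sample data is made up.

```perl
use strict;
use warnings;

# Made-up directory entries and authors; the real code works with
# Net::LDAP entries and author objects instead of plain hashes.
my @entries = (
    { employeenumber => 1001, mail => 'jsmith@example.edu' },
    { employeenumber => 1003, mail => 'ajones@example.edu' },
);
my @authors = (
    { name => 'J. Smith', employeenumber => 1001 },
    { name => 'External Author' },                  # no employee number
    { name => 'A. Jones', employeenumber => 1003 },
);

# Index the entries once so each author lookup is a single hash access.
my %entry_for = map { ($_->{employeenumber} => $_) } @entries;

foreach my $author (@authors) {
    my $number = $author->{employeenumber};
    next unless defined $number && exists $entry_for{$number};
    $author->{ldap_entry} = $entry_for{$number};
}

my @matched = grep { $_->{ldap_entry} } @authors;
print scalar(@matched), " of ", scalar(@authors), " authors matched\n";
```

Authors without an employee number, or without a matching directory entry, are left without an entry.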


Correlating results

• The publications which do not have at least one current staff member of the University as an author are now removed from the results:

    foreach my $publication ($self->publications()) {
        unless (grep { $_->is_monash() } $publication->authors()) {
            $self->destroy_publication($publication);
        }
    }


Correlating results

• Finally all the authors without any publications are removed from the results:

    foreach my $author ($self->authors()) {
        unless ($author->publications()) {
            $self->remove_author($author);
        }
    }
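The two pruning passes can be sketched together on plain data structures. This is an illustration of the rule only, not of the real methods. The `is_monash` flag stands in for whether a directory entry was found, and the names and titles are made up.

```perl
use strict;
use warnings;

# Made-up data: is_monash marks authors found in the directory service.
my %authors = (
    smith => { name => 'J. Smith', is_monash => 1 },
    jones => { name => 'A. Jones', is_monash => 0 },
    brown => { name => 'B. Brown', is_monash => 0 },
);
my @publications = (
    { title => 'Joint paper',    authors => [ @authors{qw(smith jones)} ] },
    { title => 'External paper', authors => [ @authors{qw(jones brown)} ] },
);

# Pass 1: keep only publications with at least one current staff author.
@publications = grep {
    my $publication = $_;
    grep { $_->{is_monash} } @{ $publication->{authors} };
} @publications;

# Pass 2: keep only authors still listed on a remaining publication.
my %still_listed;
for my $publication (@publications) {
    $still_listed{ $_->{name} } = 1 for @{ $publication->{authors} };
}
my @remaining = grep { $still_listed{$_} }
                sort map { $_->{name} } values %authors;

print "$_\n" for @remaining;
```

Here the external co-author A. Jones stays in the results because the joint paper survives the first pass, while B. Brown is dropped along with the external paper.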


Results

• At this point the search object holds sufficient author and publication objects to enable the search results to be displayed:

    $research->search('name' => 'john smith');

    foreach my $author ($research->authors()) {
        print $author->name(), "\n";

        foreach my $publication ($author->publications()) {
            print $publication->title(), "\n";
        }
    }


Limitations

• At no point do the author or publication objects in existence represent the entire Research Directory.

• Which means that a fresh search is required for the various pages in the interface.

• Not much of an issue due to the stateless nature of the web.


Complicated scientific formula in titles

• Plain text:

– 2]

• Rich text formatted:

– {\rtf1\ansi\deff0{\fonttbl{\f0\fswiss Arial;}{\f1\fnil\fcharset2 Symbol;}} \viewkind4\uc1\pard\lang1033\f0\fs24 2] \fs18 Unprecedented \f1\fs24 m-h\up5\fs14 2:\up0\fs24 h\up5\fs14 2\up0\f0\fs18 - pyrazolate coordination in [\{Yb(\f1\fs24 h\up5\f0\fs14 2\up0\fs18 - \f1\fs24\'a6\f0\fs18 Bu\dn5\fs14 2\up0\fs18 pz)(\f1\fs24 m\f0\fs18 -\f1\fs24 h\up5\f0\fs14 2\up0\fs18 :\f1\fs24 h\up5\f0\fs14 2\up0\fs18 -\f1\fs24\'a6\f0\fs18 Bu\dn5\fs14 2\up0\fs18 pz)(thf)\}\dn5\fs14 2\up0\fs18 ] \par }

• Correctly rendered:

– 2] Unprecedented μ-η²:η²- pyrazolate coordination in [{Yb(η²- ƒBu₂pz)(μ-η²:η²-ƒBu₂pz)(thf)}₂]


Complicated scientific formula in titles

• Unfortunately this cannot be reliably rendered using HTML.

• The Perl module RTF::HTML::Converter is able to convert the RTF above to:

– 2] Unprecedented m-h2:h2- pyrazolate coordination in [{Yb(h2 - ¦Bu2pz)(m-h2:h2 -¦Bu2pz)(thf)}2]

• While not perfect it is a significant improvement and deemed satisfactory.


Conclusion

• A practical example of how Perl can be used to draw information from two sources, one a commercial application, and present the information in two similar but disparate environments.

• All by using two widely used modules:

– DBI (and DBD::Oracle)

– Net::LDAP

• And a third publicly available module:

– RTF::HTML::Converter


Thank you

• Any questions?

• The publicly available version of the Monash Research Directory is available at:

– http://monash.edu/research/directory/