GBIF France architecture (TDWG meeting in China)

Architecture diagram: DwC archive, gbiffrance-harvest, NoSQL DB (MongoDB), ECAT

HARVESTING
The gbiffrance-harvest tool is based on the HIT (Harvesting and Indexing Toolkit) core developed by the GBIF Secretariat. It harvests data published as DwC-Archives or in XML, and takes all Darwin Core concepts (and their ABCD equivalents) into account. Depending on the version used, the data is stored in a SQL or a NoSQL database, the latter being better suited to large volumes of data. The tool is written in Java and is open source: http://github.com/michaelakbaraly/gbiffrance-harvest
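The poster does not show the harvester's internals. As a rough illustration of the first step of DwC-Archive harvesting, the Python sketch below (a stand-in, not the project's Java/HIT code) opens an archive, reads its `meta.xml` descriptor to map column indexes to Darwin Core terms, and yields core records as dictionaries keyed by term name. It assumes a UTF-8, tab-delimited core file and ignores extension files; the function name `read_core_records` is hypothetical.

```python
import csv
import io
import itertools
import xml.etree.ElementTree as ET
import zipfile

# Namespace of the DwC-A meta.xml descriptor (Darwin Core text guidelines).
DWC_TEXT = "{http://rs.tdwg.org/dwc/text/}"


def read_core_records(archive_bytes):
    """Yield core rows of a DwC-Archive as {term name: value} dictionaries.

    Sketch only (hypothetical helper): assumes a UTF-8, tab-delimited core
    file and ignores extensions and the fieldsTerminatedBy/encoding attributes.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        core = ET.fromstring(archive.read("meta.xml")).find(DWC_TEXT + "core")
        location = core.find(f"{DWC_TEXT}files/{DWC_TEXT}location").text
        header_lines = int(core.get("ignoreHeaderLines", "0"))
        # Map each column index to its short term name, e.g.
        # "http://rs.tdwg.org/dwc/terms/country" -> "country".
        # Constant-valued fields declared without an index are skipped.
        columns = {
            int(field.get("index")): field.get("term").rsplit("/", 1)[-1]
            for field in core.findall(DWC_TEXT + "field")
            if field.get("index") is not None
        }
        text = archive.read(location).decode("utf-8")
    rows = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in itertools.islice(rows, header_lines, None):
        yield {name: row[i] for i, name in columns.items() if i < len(row)}
```

Each yielded dictionary is one occurrence, ready to be stored as a row of the single occurrence table described under INDEXING.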

INDEXING
We chose to keep the database schema as simple as possible and to avoid any post-harvest data normalization: all occurrences are stored in one large table. The taxonomic information is enriched using the ECAT taxonomic reference. The table is then replicated to an ElasticSearch server, which provides a full-text search engine that makes all available data searchable.
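A minimal sketch of this indexing idea, assuming occurrences arrive as flat dictionaries: each record is merged with higher-taxonomy fields looked up by scientific name, then serialised as an ElasticSearch `_bulk` request body (newline-delimited JSON, one action line plus one document line per record). `ECAT_SAMPLE` is a hypothetical in-memory stand-in for the ECAT lookup, and the index name is illustrative.

```python
import json

# Hypothetical stand-in for the ECAT lookup (scientific name -> higher taxa);
# a real deployment would look names up in ECAT rather than in a dict.
ECAT_SAMPLE = {
    "Vulpes vulpes": {"kingdom": "Animalia", "family": "Canidae"},
}


def enrich(record, checklist):
    """Return a flat copy of an occurrence with higher-taxonomy fields added."""
    flat = dict(record)
    flat.update(checklist.get(record.get("scientificName", ""), {}))
    return flat


def bulk_payload(docs, index_name):
    """Serialise {doc_id: flat record} as an ElasticSearch _bulk NDJSON body."""
    lines = []
    for doc_id, record in docs.items():
        # One action line naming the target index and id, then the document.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"  # _bulk bodies must end with a newline
```

Keeping every record flat is what makes the replication step simple: no joins are needed before a document can be sent to the search engine. The resulting body would be POSTed to the server's `_bulk` endpoint with `Content-Type: application/x-ndjson`.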

PUBLISHING
Through a simple search system, the portal offers an efficient way to browse the index content and to obtain more detailed information than the international data portal provides. It also integrates web services such as Yahoo Geo and EOL, and generates dynamic statistics to enhance the end-user browsing experience.
http://github.com/michaelakbaraly/gbiffrance-portal
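As an illustration of how a search with live statistics could be expressed against ElasticSearch, the sketch below builds a request body that combines a full-text `query_string` query with `terms` aggregations, and turns the aggregation buckets of a response into (value, count) pairs for display. The field names `kingdom` and `country`, the aggregation names, and both helper functions are assumptions; `terms` aggregations also need those fields to be mapped as keywords.

```python
def search_body(text, size=20):
    """Build a request body: full-text search plus facet counts (hypothetical helper)."""
    return {
        # Without an explicit field list, query_string searches the index's
        # default fields.
        "query": {"query_string": {"query": text}},
        "size": size,
        # Each terms aggregation returns the most frequent values of a field
        # among the matching documents, usable as on-the-fly statistics.
        "aggs": {
            "by_kingdom": {"terms": {"field": "kingdom"}},
            "by_country": {"terms": {"field": "country"}},
        },
    }


def facet_counts(response, name):
    """Extract (value, document count) pairs from a terms aggregation result."""
    buckets = response["aggregations"][name]["buckets"]
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]
```

For example, a response whose `by_country` buckets are `[{"key": "FR", "doc_count": 12}]` yields `[("FR", 12)]`, which the portal could render as a per-country count next to the results.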

The new GBIF France architecture
To meet French users' needs efficiently, GBIF France is continuing the work started last year on developing its new IT architecture, which offers tools to the scientific community and contributes actively to the GBIF decentralization policy.

Architecture diagram: Sources (IPT, BioCASe, TAPIR, DiGIR), XML, PostGIS, gbiffrance-mapViewer, ElasticSearch, gbiffrance-portal

Free and open access to biodiversity data

Nodes capacity building and collaboration

Collaboration between participants is vital for GBIF to act as one global community. Several approaches are in place to encourage and support such collaboration.

Regional nodes meetings
In …, three independently financed regional nodes meetings were organized in Africa (South Africa), Europe (France) and Latin America (Uruguay). The focus of this regional approach is to identify regional priorities and targets in relation to GBIF's work programme. The aim is also to strengthen cooperation between participant nodes.

Mentoring programme
The mentoring programme builds partnerships between participant nodes to transfer experiences and expertise. Since its start in …, the mentoring programme has supported … projects (involving … countries, funded by small grants).

Capacity Enhancement Programme for Developing Countries (CEPDEC)
The establishment of national biodiversity information facilities in developing countries, to enable science and decision making, is the aim of GBIF's capacity enhancement projects. The current projects are:

• support to TanBIF in Tanzania (financed by the Danish government) and

• support to … SEP countries in Africa and Southeast Asia (financed by the French government) and coordinated by the Institut de Recherche pour le Développement in collaboration with the project Sud Expert Plantes and GBIF.

GBIF encourages governments and organizations to support new projects.

www.sud-expert-plantes.ird.fr

About GBIF

The Global Biodiversity Information Facility (GBIF) was established by governments in 2001 to encourage free and open access to biodiversity data, via the Internet. Through a global network of national and thematic nodes, and a Secretariat based in Copenhagen, Denmark, GBIF promotes and facilitates the mobilization, access, discovery and use of information about the occurrence of organisms over time and across the planet.

Vision: A world in which biodiversity information is freely and universally available for science, society, and a sustainable future.
Mission: To be the foremost global resource for biodiversity information, and engender smart solutions for environmental and human well-being.

www.gbif.org

Mentoring programmes since …

Mentored GBIF Participant Node(s):
• Ghana (GhaBIF)
• Central African Republic (GBIF Centrafrique)
• Uganda (UgaBIF)
• Chile
• India (Wildlife Institute of India)
• Togo (Togo BIF)
• Kenya (KenBIF, National Museums of Kenya)
• Cuba (Centro Nacional de Biodiversidad)
• Mauritania (Mauritanian National Node)
• Argentina (Argentine National Node)
• Colombia (GBIF Colombia)
• Pakistan (Museum of Natural History)
• Peru (Instituto de Investigaciones de la Amazonía Peruana)
• Argentina
• Nicaragua
• Ghana

Mentor GBIF Participant Node(s):
• The Netherlands (NLBIF)
• GBIF France and GBIF Cameroon (CamBIF)
• The Netherlands (ETI Bioinformatics)
• Costa Rica (INBio)
• Australia (Atlas of Living Australia)
• France (GBIF France)
• Finland (GBIF Finland)
• Colombia (GBIF Colombia), GBIF Spain and Costa Rica (INBio)
• Belgium (BeBIF)
• Colombia (GBIF Colombia) and USA (UC Berkeley)
• Spain (GBIF.ES) and the Netherlands (NLBIF)
• Australia (ABIF, Australian Biological Resources Study)
• Costa Rica (INBio)
• Costa Rica (INBio)
• Denmark (DanBIF)

AUTHORS: Michael AKBARALY, Pere ROCA-RISTOL, Anne-Sophie ARCHAMBEAU and Régine VIGNES-LEBBE
GBIF France, 43 rue Buffon, 75005 Paris cedex 05

Contact : [email protected]