Springer for R&D and MarkLogic | 15.04.2023 | 2
Who is Springer?
• Leading global scientific publisher
• 6,000 employees in 25 countries
• 890 million EUR in turnover
• 2,000 journals / 7,000 new book titles published every year
• 50,000 eBooks
• Largest open access portfolio worldwide (over 300 open access journals)
MarkLogic cluster
RESTful APIs
realtime.springer.com
citations.springer.com
iPhone apps
The business goal
“Double the sales from Corporate market …in 5 years”
• How?
• Hire more sales people
• Investment in dedicated content platform
-- Derk Haank, CEO
What we used to do
Sure, we’ve got what you want.
In here somewhere.
Professional R&D researchers are different from academic researchers
R&D researchers need content categorized according to their needs
They have unique collaboration…
…and security needs
They must satisfy fundamentally different stakeholders
That require flexible business models to suit each customer’s situation
Goals are prioritized (top to bottom) and stories are prioritized (left to right)
Velocity is measured every week, allowing us to accurately forecast when a certain level of work can be completed
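The forecasting idea can be sketched as simple arithmetic. This is only an illustration: the function name `forecast_weeks`, the use of a plain average, and all the numbers are assumptions, not something shown in the talk.

```python
import math


def forecast_weeks(remaining_points, weekly_velocities):
    """Estimate how many weeks the remaining work will take.

    Uses the mean of the measured weekly velocities. Both inputs are
    illustrative values, not figures from the talk.
    """
    if not weekly_velocities:
        raise ValueError("need at least one week of measured velocity")
    average = sum(weekly_velocities) / len(weekly_velocities)
    if average <= 0:
        raise ValueError("average velocity must be positive")
    # Round up: a partly used week still counts as a week on the calendar.
    return math.ceil(remaining_points / average)


# Hypothetical backlog: 120 story points left, four measured weeks.
weeks = forecast_weeks(120, [18, 22, 20, 20])
print(weeks)
```

Re-running the estimate each week with the latest velocity keeps the forecast current as the backlog changes.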
We track how much work we are doing against each goal
Story #150 - Abstract page default area (Articles only)
• As Henry, I want to see a quick summary of article information so that I can decide if the article is relevant to me without having to sift through lots of irrelevant content.
What’s specific to Corporate customers
Content organized according to the way customers see the world
Show how their peers are using the content
Extra security/reporting for Deposit Accounts
Some cool new enhancements
We have rasterized all pages of all documents (over 60 million pages)
Limit search results to only accessible content
Links directly to sections of HTML
Auto-suggest based on Google search terms
What have we done with MarkLogic that’s cool?
• Indexed 5.6 million XML metadata files (2TB)
• Faceted search
• Transform XML on the fly
• Related documents
• Local-Disk Failover
• Customized search library
• Store Entitlements as queries
Coolness factor
Search customizations
• Exact phrase match weighs a lot
• Titles weigh a lot
• Abstracts weigh some
• References are excluded
• Publication level weighs more than document level
• Full-text weighs some
• Search customized to browser language
• Future search enhancements:
• Highly cited weigh more
• Highly downloaded weigh more
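The weighting rules above can be illustrated with a toy scorer. This is a sketch, not Springer's search library: the numeric weights are hypothetical (the slide only says "a lot", "some" and "excluded"), and the publication-level and browser-language rules are not modeled.

```python
# Hypothetical weights; the slide gives only relative wording.
FIELD_WEIGHTS = {
    "title": 8.0,       # titles weigh a lot
    "abstract": 2.0,    # abstracts weigh some
    "full_text": 2.0,   # full text weighs some
    "references": 0.0,  # references are excluded
}
EXACT_PHRASE_BONUS = 16.0  # exact phrase match weighs a lot


def score(document, query):
    """Score one document (a dict of field name -> text) against a query."""
    terms = query.lower().split()
    phrase = " ".join(terms)
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = document.get(field, "").lower()
        if weight == 0.0 or not text:
            continue  # excluded or missing fields contribute nothing
        words = text.split()
        # Each query term found in the field adds the field's weight.
        total += weight * sum(1 for term in terms if term in words)
        # The whole multi-word query appearing verbatim earns a bonus.
        if len(terms) > 1 and phrase in text:
            total += EXACT_PHRASE_BONUS
    return total


doc_a = {"title": "Carbon nanotube growth", "abstract": "We study growth."}
doc_b = {"title": "Growth of thin films",
         "abstract": "A carbon nanotube appears here."}
doc_c = {"title": "Thin films", "references": "carbon nanotube review"}

for name, doc in (("a", doc_a), ("b", doc_b), ("c", doc_c)):
    print(name, score(doc, "carbon nanotube"))
```

A title hit outranks an abstract hit, and a document that mentions the phrase only in its references scores zero.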
Storing Entitlements in MarkLogic
Content (2TB)
• <content> Journal_ID: 0001 ContentType: Article DatePublished: 4/4/2012 Subject: Mathematics Author: John Smith Language: English Keywords: “k theory”
Entitlements
• <material_ID=“001”> Subject: Engineering
• <material_ID=“002”> Journal_ID: 0001-0099
• <material_ID=“003”> Subject: Engineering SearchTerm: “carbon nanotube” DatePublished: 2000-2012
Customers
• <customer=“001”> material_ID: 001
These are stored as serialized queries
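To make the idea of an entitlement stored as a serialized query concrete, here is a minimal sketch. The talk does not show the stored form, so JSON is used as a stand-in; the field names and the `matches` function are hypothetical and do not represent the MarkLogic API.

```python
import json

# Hypothetical serialized form of material 003 from the slide.
MATERIAL_003 = json.dumps({
    "subject": "Engineering",
    "search_term": "carbon nanotube",
    "published_from": 2000,
    "published_to": 2012,
})


def matches(serialized_query, document):
    """Return True if the document satisfies every criterion in the query."""
    query = json.loads(serialized_query)
    if "subject" in query and document["subject"] != query["subject"]:
        return False
    term = query.get("search_term")
    if term is not None and term.lower() not in document["text"].lower():
        return False
    year = document["year"]
    # A missing bound defaults to the document's own year, so it never fails.
    if year < query.get("published_from", year):
        return False
    if year > query.get("published_to", year):
        return False
    return True


inside = {"subject": "Engineering", "year": 2008,
          "text": "Growth of a carbon nanotube forest"}
too_old = {"subject": "Engineering", "year": 1998,
           "text": "Early carbon nanotube results"}

print(matches(MATERIAL_003, inside))
print(matches(MATERIAL_003, too_old))
```

Because the entitlement is a set of criteria rather than a list of document IDs, nothing has to be updated when a matching document is published later.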
Benefits of this approach
• Each query can be arbitrarily complex and completely customized
• We’ll come back to this in a second
• The stored queries automatically select any new content that matches the query's criteria
• Every day we insert thousands of new documents
• These documents are immediately available on the site
• And immediately included in the query, so customers have access
• Many materials, potentially tens of thousands, can be associated with a user
• all looked up and combined into a single request
• on the fly
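The points above can be sketched as follows: every material associated with a customer is looked up and combined into one filter, which is then evaluated against whatever content exists at request time. The in-memory dictionaries, the OR-combination, and all names are assumptions made for illustration only.

```python
# Hypothetical stand-ins for the stored materials and customer records.
MATERIALS = {
    "001": lambda doc: doc["subject"] == "Engineering",
    "002": lambda doc: 1 <= doc["journal_id"] <= 99,
}
CUSTOMER_MATERIALS = {"customer-001": ["001", "002"]}


def entitlement_filter(customer_id):
    """Combine all of a customer's materials into one OR-ed predicate."""
    material_ids = CUSTOMER_MATERIALS.get(customer_id, [])
    predicates = [MATERIALS[m] for m in material_ids]
    return lambda doc: any(p(doc) for p in predicates)


def accessible(customer_id, documents):
    """Apply the combined predicate to the documents that exist right now."""
    allowed = entitlement_filter(customer_id)
    return [doc["id"] for doc in documents if allowed(doc)]


documents = [
    {"id": "d1", "subject": "Engineering", "journal_id": 500},
    {"id": "d2", "subject": "Mathematics", "journal_id": 42},
    {"id": "d3", "subject": "Chemistry", "journal_id": 700},
]
before = accessible("customer-001", documents)

# A newly inserted document is covered without touching the entitlements.
documents.append({"id": "d4", "subject": "Engineering", "journal_id": 800})
after = accessible("customer-001", documents)

print(before, after)
```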
Scout Diagnostics are interested in:
• Cerebrospinal fluid
• Alzheimer’s disease
• Peptides
• But:
– “Published:1990-2012”
– AND Only in “Subject: biomedical”
Which can be described as:
• http://rd.springer.com/search?query=peptides+%22Cerebrospinal+fluid%22+%22Alzheimer%27s+disease%22+&facet-start-year=1990&facet-discipline=%22Biomedical+Sciences%22&facet-end-year=2012
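As a quick check that the URL above encodes the stated interests, it can be decoded with Python's standard library. The names of the keys in `criteria` are chosen here for illustration; only the URL itself comes from the slide.

```python
import shlex
from urllib.parse import parse_qs, urlsplit

URL = ("http://rd.springer.com/search?query=peptides+%22Cerebrospinal+fluid%22"
       "+%22Alzheimer%27s+disease%22+&facet-start-year=1990"
       "&facet-discipline=%22Biomedical+Sciences%22&facet-end-year=2012")

# parse_qs percent-decodes values and turns "+" into spaces.
params = parse_qs(urlsplit(URL).query)

# shlex.split keeps double-quoted phrases together as single terms.
terms = shlex.split(params["query"][0])

criteria = {
    "terms": terms,
    "discipline": params["facet-discipline"][0].strip('"'),
    "start_year": int(params["facet-start-year"][0]),
    "end_year": int(params["facet-end-year"][0]),
}
print(criteria)
```

The decoded terms, discipline, and year range line up with the Scout Diagnostics requirements listed earlier.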
Or…
Custom query model
• Do a search for what you want
• Facet to your heart’s content
• Your price is determined by the number of documents
• Database licensing model
• Access, not ownership
• You automatically get access to new documents that match your query
• At the end of a year we see how much the package has grown
• …and re-negotiate
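The model above can be sketched as counting the documents a saved query selects, once at signing and again at renewal. The talk gives no pricing formula, so the linear `package_price` function, the per-document rate, and the tiny corpus are all hypothetical.

```python
def package_size(documents, query):
    """Count the documents currently selected by a saved query (a predicate)."""
    return sum(1 for doc in documents if query(doc))


def package_price(document_count, price_per_document):
    """Hypothetical linear pricing; the talk gives no actual formula."""
    return document_count * price_per_document


def biomedical_1990_2012(doc):
    """Saved query: biomedical content published from 1990 to 2012."""
    return (doc["discipline"] == "Biomedical Sciences"
            and 1990 <= doc["year"] <= 2012)


corpus = [
    {"discipline": "Biomedical Sciences", "year": 2005},
    {"discipline": "Biomedical Sciences", "year": 1985},
    {"discipline": "Engineering", "year": 2010},
]
size_at_signing = package_size(corpus, biomedical_1990_2012)

# During the year, newly published matching documents join the package.
corpus.append({"discipline": "Biomedical Sciences", "year": 2012})
size_at_renewal = package_size(corpus, biomedical_1990_2012)

growth = size_at_renewal - size_at_signing
print(size_at_signing, size_at_renewal, growth)
```

The growth figure is what would feed the end-of-year renegotiation.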
Springer for R&D
• Conceived: May 1, 2011
• Born: Nov. 15, 2011
• Weight: 11TB
• 2 TB XML
• 3.5 TB PDF
• 5.5 TB Images
• Proud parents:
Summary
• Springer needed to:
• Build a content platform from scratch
• Dedicated to a particular market segment
• MarkLogic allowed us to:
• Leverage our significant XML assets
• Increase development speed by having the database do more of the heavy lifting
• Solve a difficult technical problem related to granting access
• Offer completely new business models tailored to our market
• Have a highly performant solution that operates at scale
Thank you.
Brian Bishop, VP of Platform Development
@mochasteak