Vision and Politics
Graduate course: Advanced Research Topics in Databases
Έλενα Σαρρή
Professor: Τ. Σελλής
A. Silberschatz, M. Stonebraker, J. Ullman, "Database Research: Achievements and Opportunities Into the 21st Century," ACM SIGMOD Record, March 1996.
summary of the workshop held in May of 1995
• Key role in creating technological infrastructure
• Areas of database research: support for multimedia objects, distribution of information, new database applications, workflow and transaction management, and ease of database management and use.
• New capabilities provided by the technology developments in hardware capability, hardware capacity, and communication.
need for industrial support
Points of interest
Two themes:
• New demands require new solutions
• The ability to put ideas to practical use is historically confirmed
So the viability of the db research community is vital
The Changing World of Database Management
• A db system is a computerized record keeping system
• Stores and provides access to information
• Basic components: data, hardware, software (considerations: scope, magnitude, complexity)
• Hardware impact: cost, speed, components (multiprocessors), capacity
Every human enterprise includes computerized information
The Case for DBMS Research
Goal of this report:
• financial support of db research is a worthwhile investment
• Illustrating the pay off from funding db research
Recent Research Achievements
• According to the 1990 report, the great majority of the market belongs to US-owned companies
• products from research prototypes
outline of the new developments
Object-Oriented and Object-Relational Database Systems
In 1990 there were few research prototypes of OODBs, and the relationship between OODBs and relational systems was questionable.
A few research prototypes combined features of relational DBMSs (SQL access to simple data types) with OODBs (modelling of complex data) to create ORDBs (object-relational database systems) and DOODs (deductive object-oriented database systems).
Today there is a variety of commercial OODBs. It is a $75M/year market growing at about 50% per year.
Support for New Data Types
• Research attempts of last decade concerning spatial and temporal data types are now part of commercial DBMSs and GIS
Transaction Processing
A DBMS supports coordination of many users of shared information
• Traditional transaction management is not adequate for today’s distributed information systems
New db Applications
EOSDIS: Earth Observing System Data and Information System
EOS is a collection of satellites gathering information regarding the atmosphere, oceans and land. They return 1/3 petabyte/year of data that are integrated in EOSDIS.
Challenges are:
• Providing on-line access to PB-sized databases
• Supporting thousands of information consumers
• Providing effective mechanisms for browsing and searching for the desired data
Electronic Commerce
There are thousands of projects supporting electronic purchasing of goods. E-commerce involves a very large number of participants interacting over the network.
Unlike EOSDIS there are many suppliers and many consumers. Among the challenges are:
• Heterogeneous information sources must be integrated
• E-commerce needs reliable, distributed authentication and funds transfer
Health Care Information Systems
Physicians need to draw on different kinds of information, such as:
• Medical records at various hospitals
• Information about drugs
• Procedures
• Diagnostic tools
Transforming the health-care sector will have a major impact on cost and quality. Challenges are:
• Integration of heterogeneous forms of information
• Access control to preserve confidentiality of medical records
• Interfaces to information appropriate for health-care professionals
Digital Publishing
Storage of books & articles in electronic form and delivery through high speed networks offering new features like audio & video.
The education industry draws much closer to publishing and offers facilities like interactive learning. Challenges are:
• Management and delivery of extremely large bodies of data at very high rates
• Protection of intellectual property
Trends That Affect Database Research
Technological Trends
Over the last 50 years, exponential improvement: by a factor of 10 or more every 10 years in:
· # of machine instructions executable in a second
· processor cost
· amount of secondary storage per unit cost
· amount of main memory per unit cost
Improvement in price/performance leads to new products and services
In the last few years, also:
· # bits transmitted / unit cost
· # bits transmitted / sec
so we are able to deal with terabytes and complex queries cost-effectively
Database Architecture Trends
Changes in db structure and use:
• The relational approach is today ubiquitous (from very large parallel architectures to home computers)
• Client-server architectures will become progressively more common, with database servers accessed remotely over networks.
• Traditional data has been joined by various kinds of multimedia data. This trend is fuelling the success of ORDBs
Information Highway
• The # of Web bits carried by the Internet grows by 15-20% per month, or about a factor of 10 per year.
• Databases will play a critical role in this information explosion.
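The relation between the monthly and the yearly growth figures can be checked with a short compound-growth calculation. This is only an arithmetic sketch: it assumes the monthly rate stays constant over twelve months, and the function name annual_factor is made up for illustration.

```python
def annual_factor(monthly_rate: float, months: int = 12) -> float:
    """Growth factor after compounding a constant monthly rate."""
    return (1.0 + monthly_rate) ** months


# The two ends of the 15-20% per month range quoted on the slide.
for rate in (0.15, 0.20):
    print(f"{rate:.0%} per month -> x{annual_factor(rate):.1f} per year")
```

Under that assumption, 15-20% per month compounds to roughly 5x to 9x per year, so "a factor of 10" reads best as a round order-of-magnitude figure.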
New Research Directions
• Putting multimedia objects into DBMSs
• Distribution of information
• New uses of db
• New transaction models
• Easy use and management of db
Support for Multimedia Objects
Areas of research in multimedia data:
Tertiary Storage
· a new level of the storage hierarchy
· access is made by buffering selected data on secondary storage, just as access to secondary storage is made by buffering selected data from disk in main memory
Tertiary storage devices are orders of magnitude slower than secondary storage (disks), yet also of vastly greater capacity.
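The buffering idea can be sketched as a chain of levels, each keeping a small buffer of blocks fetched from the slower level below it. This is a toy model, not how any particular DBMS does it: the class name BufferedLevel, the capacities, and the LRU eviction policy are all assumptions made for illustration.

```python
from collections import OrderedDict


class BufferedLevel:
    """One storage level that buffers blocks read from a slower level."""

    def __init__(self, capacity, backing):
        self.capacity = capacity      # number of blocks this level can hold
        self.backing = backing        # slower level: BufferedLevel or plain dict
        self.buffer = OrderedDict()   # insertion order doubles as LRU order
        self.misses = 0

    def read(self, key):
        if key in self.buffer:
            self.buffer.move_to_end(key)      # mark as most recently used
            return self.buffer[key]
        self.misses += 1                      # must go to the slower level
        if isinstance(self.backing, BufferedLevel):
            value = self.backing.read(key)
        else:
            value = self.backing[key]
        self.buffer[key] = value
        if len(self.buffer) > self.capacity:
            self.buffer.popitem(last=False)   # evict least recently used (assumed policy)
        return value


# Hypothetical hierarchy: main memory buffers disk, disk buffers tertiary storage.
tertiary = {f"frame{i}": f"<data {i}>" for i in range(1000)}
disk = BufferedLevel(capacity=100, backing=tertiary)
memory = BufferedLevel(capacity=10, backing=disk)

first = memory.read("frame7")    # misses in memory and on disk
second = memory.read("frame7")   # served from the main-memory buffer
```

The second read never reaches the disk or tertiary level, which is the whole point of buffering when the lowest level is orders of magnitude slower.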
New Data Types
To support multimedia objects
QoS
· delivering multimedia data to many users creates a bottleneck
· different needs (movie / lecture video)
· optimize access based on predicted use
User Interface Support
Requirement for new interfaces other than SQL
E.g. querying an image db needs an interface that allows description of color, shape and other characteristics
E.g. course video: sample frames, text-based indexes, segment search
Distribution of Information
new environment facilitated by the Web requires rethinking of the concepts in current distributed database technology
Degree of autonomy
Db sources connected through a network are owned by different participants (health care system, Web)
· May refuse connection
· Different system capabilities
Accounting and Billing
· Client payments for each access to remote data
· Querying strategies vs. billing rates. Are clients willing to pay?
Security and Privacy
• Flexible authentication and authorization systems
• Sale of information to anonymous users
Replication and Reconciliation
· Nodes are disconnected
· Data are often duplicated
· Copies are reconciled at connection
· This is a frequent event
· Need for high-speed protocols and algorithms
E.g. call routing system
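Reconciliation at reconnection can be sketched as a merge of two copies. The slide does not name a reconciliation rule, so the sketch below assumes the simplest one, "newest write wins", with each entry carrying a timestamp. The function name reconcile and the routing-table data are hypothetical.

```python
def reconcile(replica_a, replica_b):
    """Merge two replicas of the form key -> (timestamp, value).

    Assumed rule: for each key, the entry with the newer timestamp wins.
    """
    merged = dict(replica_a)
    for key, (timestamp, value) in replica_b.items():
        if key not in merged or timestamp > merged[key][0]:
            merged[key] = (timestamp, value)
    return merged


# Hypothetical call-routing tables held by two nodes while disconnected.
node_a = {"555-0100": (1, "switch-A"), "555-0101": (5, "switch-B")}
node_b = {"555-0100": (3, "switch-C"), "555-0102": (2, "switch-D")}

merged = reconcile(node_a, node_b)
print(merged)
```

A timestamp rule is cheap enough to run on every reconnection, but it silently drops the older of two conflicting updates. Real protocols may need something richer.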
Data Integration and Conversion
Information sources have a variety of formats and models
Use of mediators like agents
Information Retrieval and Discovery
Problems with information from informally connected sources / heterogeneous data:
• Changes without notice
• Unclear definitions
• Need for techniques to support searches, as in db technology (indexes)
Data Quality
Different sources with different reliability
• Evaluate and query the reliability or the lineage (origin)
Data Mining
• Extraction of information from large bodies
• Decision makers
• Fast response
• Formulate query
• Optimization techniques for complex queries
• Use by non-expert users
Data Warehouses
Huge collections of data, mainly used for decision support systems. They hold copies of data from one or more databases.
Issues:
• Tools for data pumps (modules for obtaining updates and translating them)
• Methods for data scrubbing (making data consistent, identifying different representations of the same value)
• Creating a metadictionary (how data were obtained)
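Data scrubbing, in the sense of mapping different representations of the same value to one canonical form, can be sketched as below. The lookup table, the country codes, and the function name scrub are invented for illustration. A real warehouse would need far richer matching rules.

```python
# Hypothetical table of known spellings -> canonical value.
CANONICAL = {
    "usa": "US",
    "u.s.a.": "US",
    "united states": "US",
    "us": "US",
    "greece": "GR",
    "hellas": "GR",
    "gr": "GR",
}


def scrub(value):
    """Return the canonical form of a value, or the trimmed value if unknown."""
    key = value.strip().lower()
    return CANONICAL.get(key, value.strip())


records = ["USA", " united states ", "Hellas", "Greece", "France"]
cleaned = [scrub(r) for r in records]
print(cleaned)
```

Values the table does not know (here "France") pass through unchanged rather than being guessed at, which keeps the scrubbing step from inventing data.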
Repositories
storing and managing both data and metadata
They must
• Obtain an evolving set of representations of the same or similar information (a module represented as source code, object code, flow diagram, etc.)
• Support versions (snapshots of an element evolving over time) and configurations (versioned collection of versions)
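One way to picture versions and configurations is a small in-memory model: each element accumulates numbered snapshots, and a configuration is itself a numbered collection of version keys. The class and method names below (Version, Repository, add_version, add_configuration) are assumptions for illustration, not an existing repository API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    """A snapshot of one element in one representation."""
    element: str
    number: int
    representation: str   # e.g. "source code", "object code", "flow diagram"
    content: str


@dataclass
class Repository:
    versions: dict = field(default_factory=dict)        # (element, number) -> Version
    configurations: dict = field(default_factory=dict)  # (name, number) -> tuple of version keys

    def add_version(self, element, representation, content):
        # Next snapshot number for this element.
        number = 1 + max((n for (e, n) in self.versions if e == element), default=0)
        self.versions[(element, number)] = Version(element, number, representation, content)
        return (element, number)

    def add_configuration(self, name, version_keys):
        # A configuration is versioned too: each call creates a new numbered one.
        number = 1 + max((n for (c, n) in self.configurations if c == name), default=0)
        self.configurations[(name, number)] = tuple(version_keys)
        return (name, number)


repo = Repository()
v1 = repo.add_version("parser", "source code", "def parse(): ...")
v2 = repo.add_version("parser", "source code", "def parse(text): ...")
o1 = repo.add_version("lexer", "object code", "<binary>")
c1 = repo.add_configuration("release", [v1, o1])
c2 = repo.add_configuration("release", [v2, o1])
```

Here the two "release" configurations share the same lexer snapshot but point at different parser snapshots, which is what "versioned collection of versions" amounts to in this toy model.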
Ease of use
• Improved interfaces for the end user, the application programmer and the administrator
• Easier installation and upgrade of db management systems
Database Metatheory: Asking the Big Queries
Christos H. Papadimitriou
University of California San Diego
Theory and its Function
• In the context of an applied science, theory in the broad sense is the use of significant abstraction in scientific research: the suppression of low-level details of the object or artifact being studied or designed.
Solution to complexity imposed by theoreticians:
• (a) They develop mathematical models of the artifact: Turing machines, formal languages, and the relational model
• (b) Abstract models can become reality: solutions (typically, algorithms and representational schemes) are derived from the mathematical models. This function of theory is what we usually mean by “synthesis” or “positive results.” Such results must actually be verified by experiments.
• (c) They analyze the mathematical models to predict the outcome of the experiments (and calibrate the models).
• (d) They explore. They develop and study extensions and alternative applications of the model, and they seek its ultimate limitations.
They introduce and apply more and more sophisticated mathematical techniques, and build a theoretical body of knowledge and a mathematical methodology that transcend the motivating artifact and model.
Exploration is usually guided by aesthetics, taste, and sense of what is “important” and “relevant”.
Uncontroversial, necessary parts of the research and discovery process in any science of the artificial:
(a) model building
(b) synthesis
(c) analysis
• Exploration is the part criticized most and, predictably, the one best liked by theoreticians
arguments in defense of exploration:
• (1) It has been historically beneficial to computer science;
• (2) in reasonable doses, it promotes the field’s health and connectivity;
• (3) exploration and proving elegant theorems are natural and attractive activities, and so it would be wrong and futile to repress them.
Drawbacks
(1) can disorient the field and lead it into crisis, when it is disproportionately extensive in comparison to model building, synthesis, and analysis
(2) will not thrive if it consistently ignores practice
(3) requires true discipline and honesty in its exposition, especially in avoiding frivolous and unchecked claims of relevance and applicability.
On Negative Results:
In computer science theorems are judged (as in mathematics) by
• Elegance
• Depth
• Importance in long-term research
But here also by
• Whether the result is complexity-reducing, or points out a setback in this regard.
• negative results are the only possible self-contained theoretical results
• Positive results (complexity-reducing solutions such as algorithms and representation schemes) must be validated experimentally and can therefore be considered as mere invitations to experiment.
• delimitation is the ultimate success in exploration
What is “Good Theory”?
Paul Feyerabend: “Science is an essentially anarchic enterprise. […] There is no idea that is not capable of improving our knowledge. […] The only principle that does not inhibit progress is ‘anything goes’.”
• although there is no such thing as “bad science”, success is an important aspect
• Not just an inner process driven by methodology and results but a much more complex predicate of the social dynamics of the field and its environment, and of course open to circumstances and chance.
• Adopting a metaphor from other sciences is essential in computer science and increases its prestige and propagandistic value.
What does this all mean for theoreticians?
• Free-style exploratory theoretical research
• Its success will depend mainly on its propagandistic value, its ability to contaminate its environment, and especially on its potential to influence practice
• A theoretician should be an expositor and popularizer who brings his or her results to the attention of the experimentalist and the practitioner, and convinces them of their value by arguments that are measured, rigorous, and credible
• Doing your own experiments helps a lot
• ultimate success of a scientific idea is, of course, the launching of a victorious scientific revolution
Paradigms and Revolution
The stages of the scientific process according to Thomas Kuhn for natural science:
Immature science
Normal science
Crises
Revolution
• Long periods of “normal science,” in which the field progresses incrementally within a broadly accepted framework that includes not only scientific assumptions and theories, but also conventions about what are appropriate questions to ask and how further development should proceed. Such a framework is called a paradigm. Copernicus’ model and Einstein’s general theory seem to be the most frequently mentioned paradigms.
• scientists consider it their duty to defend the paradigm and show that it works
Natural Science
• But cruel facts that do not fit in the paradigm accumulate, despite the community’s ingenious efforts to sweep them under the rug; the paradigm creaks and staggers, and we enter a stage of “science in crisis”
• New kinds of ingenuity and imagination develop and compete. Eventually, and typically, one of them triumphs and becomes the next paradigm; this is the stage of “scientific revolution”
• ultimate success of a scientific idea is, of course, the launching of a victorious scientific revolution
Adaptation to applied science and the sciences of the artificial
Kuhn’s model: the object of study is static / eternal
The sciences of the artificial study artifacts, which keep changing while they are studied, or because they are being studied: a tight closed-loop interaction between a science and its object.
In the case of computer science, the stages of Kuhn’s model are much accelerated
Crises in natural science are caused by the accumulation of anomalies: observations of the objective reality that cannot fit the current paradigm. In contrast, in computer science we have no objective reality against which to judge our scientific work.
The operational analog of falsifiability in computer science:
“research units” (researchers, papers, research groups, results, or subfields) influence each other
Connection
Autistic behavior is the exception that tests the wisdom of the rule
Most of theory is within a few hops from practice, and vice-versa.
Bottom snapshot:
The local situation seems unchanged (say, the average degree is the same)
Connectivity is low
Tangents and introverted components are the rule
The little connectivity that exists is via long paths
Practitioners stop communicating with relevant theory
Interaction is unpleasant, unfriendly, and defensive in style
The field is in crisis.
Revolution as in natural science
Practitioners (having given up on theory) develop and use their own abstractions, models, and mathematical techniques
while theoreticians make their own attempts to reconnect to practice (responding to “pressures” from within their community and outside).
The uninspiring practical problems and the unresponsive theoretical work that triggered the crisis become less central, and new small research traditions blossom.
Well-targeted exploratory theory connects several of them, and a new healthy state emerges from the ashes. A successfully championed new research paradigm may then take over
Why this model applies to the relational model in computer science
(1) It was a powerful and attractive proposal (whose plausibility was expertly supported by theoretical arguments)
(2) it was explicitly open-ended, a whole framework for research problems, applications, and experiments;
(3) it came as the result of a crisis (or was it “immature science”?);
(4) it was indeed followed by a period of normal science.
we are now in the blues of a crisis, or even in the flames of an on-going revolution
A Brief History of Practice
Ancient Greek tradition strongly favors theory over practice (Aristotle)
Before the last century, an inventor could become famous only if he was a moonlighting major theoretician or artist (Archimedes, Aristarchus, Leonardo da Vinci) or if his invention helped in the spreading of theoretical knowledge (Gutenberg).
Practice starts obtaining a measure of respectability with Galileo (1564-1642) (and later under the influence of the British empiricist philosophers)
However, only after James Watt (1736-1819) did sophisticated theoretical knowledge come to the assistance of practice and invention, thus launching the industrial age and the traditions of applied science and engineering
• Respect for practice is so universal today
• Theory and practice have collaborated for two centuries, with theory dominating important domains in applied science due to its academic prowess and prestige
• Serious and systematic ideological attack against the value and necessity of theory in applied science seems to be a novel and disturbing phenomenon of the last decade or so.
• The history of computer science is a miniature of the history of science
• The strongest influence came from mathematics (and less from electrical engineering and physics),