Writing up


Outline

• The scope of a paper
• Storytelling
• Paper Organization
• Mathematics
• Algorithms

The scope of a paper

• Which results are the most surprising?
• What is the one result that other researchers might adopt in their work?
• Does it make sense to explain the new algorithms first, followed by a description of the previous algorithms in terms of how they differ from the new work?
• Or is the contribution of the new work more obvious if the old approaches are described first, to set the context?
• What is the key background work that has to be discussed?
• Who is the readership? For example, are you writing for specialists in your area, your examiners, or a general computer science audience?

The scope of a paper

• Example: an investigation of external sorting in database systems.

• A large relation (tens of millions of records, constituting several gigabytes) must be sorted on a field specified in a query.

• Costs include processing time for sorting and merging, transfer time to and from disk, and temporary space requirements.

• The balance between these costs is governed by available in-memory buffer space, as large blocks are expensive to sort but cheap to merge.
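The trade-off between sorting and merging can be illustrated with a minimal sketch of external merge sort. It assumes that an in-memory list stands in for the relation on disk; the names `external_sort` and `buffer_size` are hypothetical and chosen only for this example.

```python
import heapq
from typing import Iterable, Iterator, List


def external_sort(records: Iterable[int], buffer_size: int) -> Iterator[int]:
    """Sort records using runs of at most buffer_size items (illustrative)."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be positive")
    runs: List[List[int]] = []
    buffer: List[int] = []
    for record in records:
        buffer.append(record)
        if len(buffer) == buffer_size:
            # Sorting phase: each full buffer becomes one sorted run.
            # A real system would write the run to temporary disk space.
            runs.append(sorted(buffer))
            buffer = []
    if buffer:
        runs.append(sorted(buffer))
    # Merging phase: a k-way merge over all sorted runs.
    return heapq.merge(*runs)
```

A larger `buffer_size` gives fewer, longer runs: more work in each sort, but fewer runs to merge.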

The content of a paper is determined by the readership.

• A paper on machine learning for computer vision may have entirely different implications for the two fields, and thus different aspects of the results might be emphasized.

• An expert on vision cannot be assumed to have any experience with machine learning.

The publication venue determines the scope of the paper

• Is there a page limit?
• Are there specific conventions to be observed?
• Are the other papers in that venue primarily theoretical or experimental?
• What prior knowledge or background is a reader likely to have?
• Do the editors require that your code be available online?

Telling a story

• A paper is a sequence of concepts, building from a foundation of knowledge assumed to be common to all readers up to new ideas and results.

• There are several common ways of structuring the body of a paper, including as a chain, by specificity, by example, and by complexity.

Structure as a chain. Example: compression for fast external sorting

• The problem statement consists of an explanation of external sorting and an argument that disk access costs are a crucial bottleneck.

• The review explains standard compression methods and why they cannot be integrated into external sorting.

• The new solution is the compression method developed in the research.

• The demonstration is a series of graphs and tables based on experiments that compare the cost of sorting with and without compression.

Telling a story

• Structure by specificity
– An explanation of a retrieval system.
– Such systems generally have several components:
– Parsing, indexing, query, …

• Structure by example
• Structure by complexity

Organization

• Describe the work in the context of accepted scientific knowledge.

• State the idea that is being investigated, often as a theory or hypothesis.

• Explain what is new about the idea, what is being evaluated, or what contribution the paper is making.

• Justify the theory, by methods such as proof or experiment.

Organization

• Title and author
• Abstract
• Introduction
• Body
• Literature review
• Conclusions
• Bibliography
• Appendices

Body

• Introduction-Methods-Results-Discussion
• Use of fixed headings may prohibit development of a complex explanation in stages
• Example: "Compression for external sorting"
– 1. Introduction
– 2. External sorting
– 3. Compression techniques for database systems
– 4. Sorting with compression
– 5. Experimental setup
– 6. Results and discussion
– 7. Conclusions

Literature review

• A literature review, or survey, is used to compare the new results to similar previously published results, to describe existing knowledge, and to explain how it is extended by the new results.

• In many papers the literature review material is not gathered into a single section, but is discussed where it is used

From draft to submission

• Brainstorm
– Write down in point form what has been learnt, what has been achieved, and what the results are
– Prepare a skeleton, choosing results to emphasize and discarding material that on reflection seems irrelevant
– Choose the section titles before writing any text
– When the structure is complete, each section can be sketched in perhaps 20 to 200 words

From draft to submission

• When the body and the closing summary are complete, the introduction usually needs substantial revision

• With a reasonably thorough draft completed, it is time to review the paper content and contribution

• For a novice writer who doesn't know where to begin, a good starting point is imitation

Mathematics

Mathematical Clarity

• Mathematics gives solidity to abstract concepts.

• There are well-established conventions of presentation for mathematics and mathematical concepts.

• In mathematical writing it is essential to be precise.

Clarity

• X An inverted list for a given term is a sequence of pairs, where the first element in each pair is a document identifier and the second is the frequency of the term in the document to which the identifier corresponds.

• √ An inverted list for a term t is a sequence of pairs of the form (d, f), where each d is a document identifier and f is the frequency of t in d.
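The precise definition maps directly onto a data structure. The following is a minimal sketch, assuming documents are held in a dictionary keyed by integer identifier and that terms are separated by whitespace; the function name `inverted_list` is hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def inverted_list(t: str, documents: Dict[int, str]) -> List[Tuple[int, int]]:
    """Return the pairs (d, f) for term t, ordered by document identifier d."""
    pairs = []
    for d in sorted(documents):
        # Whitespace tokenisation is an assumption made for this sketch.
        f = Counter(documents[d].lower().split())[t]
        if f > 0:
            pairs.append((d, f))
    return pairs
```

Each pair carries exactly the two quantities named in the definition, so the code needs no further explanation of what "first element" and "second element" mean.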

Mathematical terms

• Normal, usual
• Definite, strict, proper, all, some. Avoid "definite", "strict", and "proper" in their non-mathematical meanings, and be careful with "all" and "some".
• Intractable. An algorithm or problem is "intractable" only if it is NP-hard.
• Formula, equation
– A "formula" is not necessarily an "equation"; the latter involves an equality.
• Equivalent, similar
• Average, mean. "Average" is used loosely to mean typical. Only use it in the formal sense (of arithmetic mean) if it is clear to the reader that the formal sense is intended. Otherwise use "mean" or even "arithmetic mean".

Theorems

• The details of the proof may not be important to the reader and can often be omitted.
• A common mistake is to unnecessarily include mechanical algebraic transformations.
• Theorems, definitions, lemmas, and propositions should be numbered.

• State the main theorem first, then state and prove the lemmas before giving the main proof.

• Explain the structure of long proofs before getting to the detail, and explain how each part of the proof relates to the structure.

Readability (1)

Mathematics should not be used at the start of a sentence.

Give the type of each variable every time it is used, so that the reader doesn't have to remember as many details.

X The values are represented as a list of numbers L.
√ The values are represented as a list L of numbers.

Readability (2)

Break down expressions to make them more readable, especially if doing so enlarges small symbols.

Mathematical expressions should not run together.

Notation

• Ensure that the symbols you use will be correctly understood by, and familiar to, the reader

• The symbols ∽ and ≈ are both used to mean "approximately equal to".

• The symbol ≌ means "is congruent to", not "approximately equal to".

• Use ≤, not <=, for "less than or equal to".

Ranges and sequences

• Ranges of real numbers
– [a,b], [a,b), …

• Ranges of integers
• It is common practice to use an ellipsis to describe a sequence of integers; thus m,...,n represents all integers between m and n inclusive.
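The inclusive convention is worth stating explicitly because programming notation often differs from it. A minimal sketch in Python, where the helper name `integer_sequence` is hypothetical:

```python
from typing import List


def integer_sequence(m: int, n: int) -> List[int]:
    """Return the integers m,...,n, with both endpoints included."""
    # range() excludes its upper bound, so n + 1 is needed to include n.
    return list(range(m, n + 1))
```

A reader who knows only the half-open programming convention could misread m,...,n as excluding n, which is one reason to say "inclusive" in the text.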

Alphabets

• Use of characters from the Greek alphabet to denote variables and quantities can add clarity to mathematical writing.

Some mathematical symbols and characters from other alphabets have a superficial resemblance to more familiar symbols.

Line breaks

Numbers

• In technical writing, numbers should usually be written as figures, not spelt out.

• The common exceptions are
– Approximate numbers
– Numbers up to twenty, unless they are literal values or part of an expression of measurement
– Numbers at the start of a sentence, although it is generally better to recast the sentence so that the number is elsewhere

• Percentages should always be in figures

Numbers

• X 1024 computers were linked into the ring.
• X Partial compilation gave a 4-fold improvement.
• X The increase was over five per cent.
• X The method requires 2 passes.
• √ There were 1024 computers linked into the ring.
• √ Partial compilation gave a four-fold improvement.
• √ The increase was over 5 per cent.
• √ The increase was over 5%.

Numbers, Percentages

• X There were between four and 32 processors in each machine.
• √ There were between 4 and 32 processors in each machine.
• X There were 14 512-Kb sets.
• √ There were fourteen 512-Kb sets.
• Avoid the phrase "orders of magnitude".
• X The new algorithm is at least two orders of magnitude faster.
• √ The new algorithm is at least a hundred times faster.

Numbers, Percentages

• In the "orders of magnitude" example, is the unit of magnitude binary or decimal? It would be better to be explicit.

"There are 10 kinds of people in the world, those that understand binary and those that don't."
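The size of the ambiguity is easy to compute. The following sketch uses a hypothetical helper, `orders_of_magnitude`, to show what "two orders of magnitude" means under each reading of the base.

```python
def orders_of_magnitude(count: int, base: int) -> int:
    """Return the factor implied by `count` orders of magnitude in `base`."""
    return base ** count


decimal_factor = orders_of_magnitude(2, base=10)  # 100
binary_factor = orders_of_magnitude(2, base=2)    # 4
# The string "10" read in base 2 is the number two, as in the joke.
ten_in_binary = int("10", 2)                      # 2
```

The same phrase can therefore mean a factor of 100 or a factor of 4, which is why "a hundred times faster" is the safer wording.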

• X The likelihood of failure is 2:1.
• √ The likelihood of failure is one in three.
• √ The likelihood of failure is about 30%.

Units of measurement

The larger units, especially "Pb", "Eb", "Zb", and "Yb", are unfamiliar to most readers and should be written in full at least once, preferably with an explanation.

Algorithms

Presentation of algorithms

• You must demonstrate that the algorithm is a worthwhile contribution
– Show that it is correct (given appropriate input, it terminates with appropriate results)
– Show, by proof, experiment, or both, that it meets some claimed performance bound.

Presentation of algorithms

• The steps that make up the algorithm.
• The input and output, and the internal data structures used by the algorithm.
• The scope of application of the algorithm and its limitations.
• The properties that will allow demonstration of correctness, such as preconditions, postconditions, and loop invariants.
• A demonstration of correctness.
• A complexity analysis, for both space and time requirements.
• Experiments confirming the theoretical results.
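Preconditions, postconditions, and loop invariants can be stated alongside the steps they govern. The sketch below uses binary search purely as an illustration (it is not an algorithm from the source), with the three kinds of property marked in comments.

```python
from typing import Sequence


def binary_search(items: Sequence[int], target: int) -> int:
    """Return an index of target in items, or -1 if it is absent."""
    # Precondition: items is sorted in non-decreasing order.
    assert all(items[i] <= items[i + 1] for i in range(len(items) - 1))
    lo, hi = 0, len(items)
    while lo < hi:
        # Loop invariant: if target occurs in items, it occurs in items[lo:hi].
        mid = (lo + hi) // 2
        if items[mid] < target:
            lo = mid + 1
        elif items[mid] > target:
            hi = mid
        else:
            # Postcondition (found): items[mid] == target.
            return mid
    # Postcondition (not found): the candidate range is empty.
    return -1
```

Termination follows because hi - lo strictly decreases on every iteration that does not return.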

Formalisms

Common formalisms for presenting algorithms:

• The list style, in which the algorithm is broken down into a series of numbered or named steps and loops
• Pseudocode, in which the algorithm is presented as if written in a block-structured language
• A better option is to use what might be called prosecode: number each step, never break a loop over several steps, use sub-numbering for the parts of a step, and include explanatory text.

pseudocode

prosecode

• WeightedEdit(s,t) compares two strings s and t, of lengths k_s and k_t respectively, to determine the edit distance, that is, the minimum cost in insertions, deletions…
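For comparison with the prose description, here is a minimal sketch of a weighted edit distance computed by dynamic programming. The slide's description is truncated, so this is a generic formulation and not necessarily the WeightedEdit being described; the cost parameters `insert_cost`, `delete_cost`, and `replace_cost` are assumptions introduced for the example.

```python
def weighted_edit(s: str, t: str, insert_cost: float = 1.0,
                  delete_cost: float = 1.0, replace_cost: float = 1.0) -> float:
    """Return the minimum cost of edits that transform s into t."""
    ks, kt = len(s), len(t)
    # dist[i][j] is the minimum cost of transforming s[:i] into t[:j].
    dist = [[0.0] * (kt + 1) for _ in range(ks + 1)]
    for i in range(1, ks + 1):
        dist[i][0] = dist[i - 1][0] + delete_cost
    for j in range(1, kt + 1):
        dist[0][j] = dist[0][j - 1] + insert_cost
    for i in range(1, ks + 1):
        for j in range(1, kt + 1):
            same = s[i - 1] == t[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + delete_cost,
                dist[i][j - 1] + insert_cost,
                dist[i - 1][j - 1] + (0.0 if same else replace_cost),
            )
    return dist[ks][kt]
```

With unit costs this reduces to the familiar edit distance; raising `replace_cost` above the sum of the other two makes a deletion followed by an insertion the cheaper way to change a character.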

Level of detail

Notation

• Mathematical notation is preferable to programming notation for presentation of algorithms.

• Use "x_i" rather than "x[i]".
• Mathematics provides many handy conventions and symbols that can be used in description of algorithms, including set notation, subscripts and superscripts.

Environment of algorithms

• The data structures on which it operates
• Input and output data types
• Factors such as properties of the underlying operating system and hardware.
• Describe data structures carefully.
• Use, say, a simple mathematical notation to unambiguously specify the structure.
• √ Each element is a triple (string, length, positions) in which positions is a set of byte offsets at which string has been observed.
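A specification this precise translates almost directly into a type. The sketch below is one possible rendering, assuming the strings are byte strings; the names `Element` and `make_element` are hypothetical.

```python
from typing import FrozenSet, NamedTuple


class Element(NamedTuple):
    """A triple (string, length, positions), as specified in the text."""
    string: bytes
    length: int
    positions: FrozenSet[int]  # byte offsets at which string was observed


def make_element(data: bytes, string: bytes) -> Element:
    """Build the triple for string by scanning data (hypothetical helper)."""
    offsets = set()
    start = data.find(string)
    while start != -1:
        offsets.add(start)
        # Advance by one byte so that overlapping occurrences are recorded.
        start = data.find(string, start + 1)
    return Element(string, len(string), frozenset(offsets))
```

Using a set for `positions` follows the text, which says "a set of byte offsets" and so implies neither order nor duplicates.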

Performance of algorithms

• Basis of evaluation

• Processing time

• Memory and disk requirements

• Disk and network traffic

• Power and Energy Consumption

Performance of algorithms

• Basis of evaluation. The basis of evaluation should be made explicit.
– Processing time. Time (or speed) over some given input is one of the principal resources used by algorithms
– Memory and disk requirements
– Disk and network traffic

• Applicability. Algorithms can be compared not only with regard to their resource requirements, but with regard to functionality.

Asymptotic complexity

• Big-O notation
• A function f(n) is said to be O(g(n)), that is, g(n) is an upper bound of f(n), if for some constants c and k we have f(n) ≤ c · g(n) for all n > k.
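As a worked instance of the definition, with an illustrative function chosen for this example, take f(n) = 3n² + 5n and g(n) = n². The constants c = 4 and k = 5 witness that f(n) is O(n²):

```latex
f(n) = 3n^2 + 5n, \qquad g(n) = n^2, \qquad c = 4, \qquad k = 5:
\qquad
3n^2 + 5n \le 4n^2 \iff 5n \le n^2 \iff n \ge 5 \quad (n > 0),
```

so f(n) ≤ c · g(n) holds for all n > k. Other choices of c and k work equally well; the definition requires only that some pair exists.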

Asymptotic complexity

• If f(n) is O(g(n)) and g(n) is O(f(n)), then f(n) is Θ(g(n)).

• A certain algorithm might require O(n log n) comparisons and O(n) disk accesses. In principle the complexity of the algorithm is O(n log n), but, given that a disk access may require 5 milliseconds and a comparison less than a nanosecond, in practice the cost of the disk accesses might well dominate for any possible application.
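The claim about disk accesses can be checked with rough arithmetic. The sketch below assumes that the hidden constants are 1, that a comparison costs a full nanosecond, and that the logarithm is base 2; all three are assumptions made for illustration, and the function name `estimated_costs` is hypothetical.

```python
import math

DISK_ACCESS_SECONDS = 5e-3   # 5 milliseconds per disk access, as in the text
COMPARISON_SECONDS = 1e-9    # upper bound of 1 nanosecond per comparison


def estimated_costs(n: int) -> tuple:
    """Return (comparison_seconds, disk_seconds), taking hidden constants as 1."""
    comparison_seconds = n * math.log2(n) * COMPARISON_SECONDS
    disk_seconds = n * DISK_ACCESS_SECONDS
    return comparison_seconds, disk_seconds


# Comparisons overtake disk accesses only when log2(n) exceeds this ratio.
crossover_log2_n = DISK_ACCESS_SECONDS / COMPARISON_SECONDS
```

Under these assumptions the comparison cost exceeds the disk cost only when log2(n) is greater than about five million, which is far beyond any realistic input size.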

The logic of asymptotic claims

• Amdahl's law states that the lower bound for the time taken for an algorithm to complete is determined by the part of the algorithm that is inherently sequential.

• It has been claimed that Amdahl's law was broken by, for a certain algorithm, increasing both the size of the input data and the number of processors.

• These changes had minimal impact on the sequential part of the algorithm.
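The reasoning can be made concrete with the usual formula for Amdahl's law, in which a fraction s of the work is inherently sequential and the rest is spread over p processors. The sketch below assumes s is fixed; the function and parameter names are chosen for this example.

```python
def amdahl_speedup(sequential_fraction: float, processors: int) -> float:
    """Speedup predicted by Amdahl's law for a fixed-size problem."""
    if not 0.0 <= sequential_fraction <= 1.0:
        raise ValueError("sequential_fraction must lie in [0, 1]")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parallel_fraction = 1.0 - sequential_fraction
    return 1.0 / (sequential_fraction + parallel_fraction / processors)
```

With s = 0.1 the speedup can never reach 10, however many processors are used. Enlarging the input while the sequential part stays roughly constant reduces s itself, so a larger speedup on the larger problem does not contradict the law.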

• Sometimes a formal analysis is inappropriate or only a minor consideration.
– Analytical results often say nothing about constant factors
– or about behavior in practice, where CPU and cache can interact in unpredictable ways