Chapter 11a
Other Approaches to I/O

Nick Maclaren
http://www.ucs.cam.ac.uk/docs/course-notes/unix-courses/CPLUSPLUS

This was written by me, not Bjarne Stroustrup

Warning: Heresy
Many people don't like C++'s approach to I/O much, and you will often need to use other ones, so here is a very brief summary of the most important alternative approaches, covering three aspects:
  As far as textual input is concerned
  As far as textual output is concerned
  The basic file and data transfer level
These objections have NOTHING to do with the C++ language as such, and little with the book's approach
  They are entirely to do with the I/O library design

Stroustrup/Programming

Textual Input
No language handles this very well
  It's a fundamentally hard problem, on many levels
  The C++ approach works as well as any other
  If you have serious trouble, consider alternatives
One common alternative is using C, but:
  C scanf is dangerous and is described under output
  If possible, avoid this – it's trickier to get right than most people think

Textual Input
You can just read characters, and do all decoding yourself, as covered in the course
  Painful, but you can decode any text format
  If it is done in the right way, it is not all that painful
  If it is not done in the right way, you will introduce serious bugs
  You have already covered all of the techniques that you need, and the book explains how to use them
  Don't forget the techniques in chapters 6 and 7 – this is a form of parsing, after all
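
As an illustration of decoding characters yourself, here is a minimal sketch that reads an unsigned decimal integer one character at a time. The name read_unsigned and its interface are assumptions made for this example, not something taken from the course or the book.

```cpp
#include <cctype>
#include <istream>
#include <limits>
#include <sstream>   // only needed for the usage shown below

// Hypothetical helper: decode an unsigned decimal integer by hand.
// Returns false, leaving 'value' unchanged, if there are no digits or
// the number would overflow an unsigned long.
bool read_unsigned(std::istream& in, unsigned long& value)
{
    // Skip leading white space, looking at each character before taking it.
    while (std::isspace(in.peek())) in.get();

    if (!std::isdigit(in.peek())) return false;      // no digits at all

    const unsigned long max = std::numeric_limits<unsigned long>::max();
    unsigned long result = 0;
    while (std::isdigit(in.peek())) {
        const unsigned long digit =
            static_cast<unsigned long>(in.get() - '0');
        // Check BEFORE multiplying, so the arithmetic itself never wraps.
        if (result > (max - digit) / 10) return false;
        result = result * 10 + digit;
    }
    value = result;
    return true;
}
```

With std::istringstream in("  42,rest"), a call to read_unsigned(in, v) sets v to 42 and leaves the comma as the next unread character, so the caller decides what a separator means.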

Regular Expressions
These are extremely useful for decoding input formats
  They are introduced in chapter 23 and C++11 includes a <regex> header
  Most compilers have it, but there may well be some that don't, or have a slightly non-standard one
  There is also an open source C library, PCRE, which currently is more portable
  I give a two-afternoon course on them, best attended after you have worked through chapter 23
  They are not covered further in my version of the course
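
For a flavour of what the C++11 <regex> header offers, the sketch below decodes a line of three fields. The record layout (symbol, atomic number, weight), the struct entry and the function decode_entry are all assumptions invented for this example.

```cpp
#include <regex>
#include <string>

// Hypothetical record: an element symbol, atomic number and atomic weight.
struct entry {
    std::string symbol;
    int number;
    double weight;
};

// Decode a line such as "Fe 26 55.845".  The pattern deliberately limits
// the number of digits, so the conversions below cannot go out of range.
bool decode_entry(const std::string& line, entry& out)
{
    static const std::regex pattern(
        R"(\s*([A-Z][a-z]?)\s+(\d{1,3})\s+(\d{1,4}(?:\.\d{1,6})?)\s*)");
    std::smatch m;
    if (!std::regex_match(line, m, pattern)) return false;  // whole line must match
    out.symbol = m[1].str();
    out.number = std::stoi(m[2].str());
    out.weight = std::stod(m[3].str());
    return true;
}
```

The pattern does the checking and the standard conversions do the decoding, so a line like "Fe twenty 55.8" is rejected as a whole rather than partly read.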

Using a Preprocessor
This does NOT mean the C preprocessor!
  This means using another program to check the data and convert it to a format that is easy to handle in C++
  I recommend this for Fortran, which is very bad at reading free-format, but it could equally well be used for C++
  The best language is almost always Python - or perhaps Perl, but only if you already know Perl
  But you could also use Fortran to read Fortran-style or fixed-format data – it may sound perverse, but is not
  Etc.

Textual Output
My dislike of C++'s approach is mainly its use of I/O manipulators (e.g. setw) to control formatting
  It would be much cleaner to separate formatting (which is simple character manipulation) from I/O (which is about data transfer)
  The way that they are implemented makes them very painful to use for formatting compared to Fortran, C or even Cobol
  E.g. 'stickiness' is typically not what is wanted
  They make formatted output about as painful as unformatted input in Fortran, which is saying something!
  Many C++ programmers also find them inconvenient, and sometimes use worse methods
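
The 'stickiness' point can be seen in a few lines. In standard C++, setw affects only the next item written, whereas fixed and setprecision stay in force until they are changed. The function name sticky_demo is an assumption made for this sketch.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical demonstration: the format is set up once, for the first
// number only, yet part of it leaks into the items that follow.
std::string sticky_demo()
{
    std::ostringstream out;
    out << std::fixed << std::setprecision(2) << std::setw(8) << 1.5;
    // No manipulators from here on.  The width has been reset, but
    // fixed and precision 2 still apply to the next double.
    out << '|' << 2.5 << '|' << 3;
    return out.str();
}
```

The result is "    1.50|2.50|3": the second number is not padded to 8 columns, but it is still printed with two decimal places, which is rarely what the writer of the second statement expects.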

C printf, scanf etc.
Many people use C I/O in C++, but:
  It works for built-in types only (i.e. not your own classes)
  It is type-unsafe, so you get little static checking, though gcc does better than most compilers
  It is size-unsafe (i.e. bad data can overwrite anything and cause chaos) – this is particularly so for scanf – and you won't get any help finding such errors
If you want to do this, then:
  Encapsulate your I/O in operators, functions and methods, as described in the book (or as below)
  Use C I/O only within those functions and methods
  Check all of your types and sizes before doing the transfer
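
One way to follow that advice is sketched below: the C call is confined to a single function, the argument types are fixed by the C++ signature, and the sizes are checked before and after the conversion. The name c_fixed, the argument limits and the buffer size are assumptions chosen for this example.

```cpp
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: format a double in fixed notation using C I/O,
// but only behind a type-checked C++ interface.
std::string c_fixed(double val, int width, int prec)
{
    // Check the options before they reach the C library.
    if (width < 0 || width > 64 || prec < 0 || prec > 32)
        throw std::invalid_argument("c_fixed: bad width or precision");

    char buffer[512];
    // snprintf never writes more than sizeof buffer bytes; it returns
    // the length that the complete result needs.
    const int n = std::snprintf(buffer, sizeof buffer, "%*.*f",
                                width, prec, val);
    if (n < 0 || static_cast<std::size_t>(n) >= sizeof buffer)
        throw std::length_error("c_fixed: result does not fit");
    return std::string(buffer, static_cast<std::size_t>(n));
}
```

For example, c_fixed(1.23, 10, 2) gives "      1.23". The format string and the argument list sit next to each other in one place, which is where a mismatch between them would have to be found.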

Another Approach
This is to encapsulate the formatting in a single function, which takes the value and options, and returns a string
  You can have several such functions for a single class
  They can have a single name, if their arguments are different
  The slight extra cost can be removed by the use of advanced C++ techniques, if it matters
This works very well together with almost all of the standard C++ I/O facilities
  I.e. everything except the formatting manipulators
  It is a very small change from the descriptions in the book and entirely compatible with the book's approach!

Using Functions
With this approach:

    cout << fixed << setprecision(2) << setw(10) << 1.23
         << scientific << setprecision(3) << setw(12) << 9.8e7
         << general << setprecision(6);

becomes:

    cout << put_fix(1.23,10,2) << put_sci(9.8e7,12,3);

and the first function becomes:

    string put_fix (double val, int width, int prec)
    {
        ostringstream str;
        str << fixed << setprecision(prec) <<
            setw(width) << val;
        return str.str();
    }
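
The slide defines only put_fix. A put_sci along the same lines might look like the sketch below, which repeats put_fix with explicit std:: qualification so that the block compiles on its own. The body of put_sci is an assumption modelled on put_fix, not taken from the slides.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// As on the slide, with the standard headers and std:: spelled out.
std::string put_fix(double val, int width, int prec)
{
    std::ostringstream str;
    str << std::fixed << std::setprecision(prec)
        << std::setw(width) << val;
    return str.str();
}

// Assumed companion: identical except for the notation selected.
std::string put_sci(double val, int width, int prec)
{
    std::ostringstream str;
    str << std::scientific << std::setprecision(prec)
        << std::setw(width) << val;
    return str.str();
}
```

Because each function formats into its own ostringstream, nothing sticks to cout, so there is no need to restore the general format and default precision afterwards.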

Using Methods
The following:

    class wombat { . . . };
    wombat a;
    cout << setw(...) << fixed(...) << a;

becomes:

    class wombat { . . . };
    wombat a;
    cout << a.format(...);

You can overload the method format()
And you can have more than one such method

A Possible Example
So the following:

    cout << number.fixed();
    cout << number.fixed(5);
    cout << number.fixed(10,5);

could be the equivalent of:

    printf("%f",number);
    printf("%.5f",number);
    printf("%10.5f",number);

or:

    WRITE('G0') number
    WRITE('F0.5') number
    WRITE('F10.5') number

Implementing Those
You have already learnt everything you need
  You can use built-in C++ I/O inside the functions to do the conversion, and handle only the options yourself
  You can use C I/O instead, if it is easier – but take care!
  Or you can convert it to characters the hard way
A summary of the differences is:
  You don't define an 'ostream & operator<<'
  You do define functions/methods that return strings
  Everything else in the book applies unchanged
When you learn them (chapters 13 to 15), default arguments and derived classes will simplify the task
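
A minimal sketch of such a class, using built-in C++ I/O for the conversion, follows. The class name number and its three fixed() overloads are assumptions for illustration; the default of six decimal places is chosen to match printf's "%f".

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical class: a value that knows how to format itself as a string.
class number {
public:
    explicit number(double v) : value(v) {}

    // Like "%f": no minimum width, six decimal places.
    std::string fixed() const { return fixed(0, 6); }

    // Like "%.5f" when called as fixed(5).
    std::string fixed(int prec) const { return fixed(0, prec); }

    // Like "%10.5f" when called as fixed(10, 5).
    std::string fixed(int width, int prec) const
    {
        std::ostringstream str;
        str << std::fixed << std::setprecision(prec)
            << std::setw(width) << value;
        return str.str();
    }

private:
    double value;
};
```

For number n(3.14159265), the calls n.fixed(), n.fixed(5) and n.fixed(10,5) return "3.141593", "3.14159" and "   3.14159" respectively. Only the two-argument overload does any work, so options are handled in one place.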

An Exercise
File “elements.txt” contains element properties, from “http://en.wikipedia.org/wiki/List_of_elements”
  The first line is a header

    Z Sym Element Group Period Weight Density Melt Boil Heat Neg Abundance

  It then contains the values, for the first 92 elements, sanitised
Do the following:
  Define an element class, with methods to return each property as a string, using traditional conventions
  Read the data into a vector of elements, and print a table of selected properties (your choice), properly aligned
  Left-justify strings, right-justify integers, and align the decimal point of real numbers
  Use fixed or scientific, at choice, but provide a precision argument for each method for a real property

Don't Believe Me?
Then you should try the exercise both using I/O manipulators directly and using my preferred approach
  Neither is “right” or “wrong” - it's your choice
  The principles of encapsulation, error checking, and the use of classes, are the same for both
  Lots of other people, libraries and languages use similar approaches to the one I describe – and have done for many decades
  I find that this approach is easier to write, use and debug, and more flexible – so do many other people

Boost::format
The boost library has a format module, which is a sort of hybrid between C++ manipulators, C printf and the approach described above
  Unlike C printf, it is type- and size-safe
  It doesn't handle user-defined classes well, but it does handle them (unlike C printf)
  Unfortunately, boost is a bit of a problem (well, more than a bit), and is described later
  However, if this does what you want, and especially if you use boost anyway, it's worth considering

File and Transfer Model
This probably won't affect you until you write 'production' scientific programs
C's and C++'s model is derived from Unix, and has two major flaws
  Its error handling is extremely dangerous
  Its handling of non-trivial files is very poor
The book covers some of the first aspect, but I am repeating it and stressing it here
You don't use non-trivial files? Think again
  'Terminals', pipes and I/O through ssh are all non-trivial
  Even files kept on a file server are often non-trivial

Error Handling
C's and C++'s model fails unsafe, which is very bad software engineering
  If you don't test for failure, it will carry on regardless; throwing an exception would be better, in all cases
  Also, clearing an error state is wrong; it should be cleared as a side-effect of recovering from the failure
  There are also very nasty issues to do with real I/O errors (i.e. ones generated by the hardware or operating system)
For all those reasons, you should always encapsulate your I/O in functions or methods and include your own checking there
  When you hit a problem (and you may well), you can easily add more error detection and (if needed) fixup code
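
A minimal sketch of that kind of encapsulation is shown below: one function does the read, tests the stream state itself, and turns every failure into an exception. The name read_double and the wording of the messages are assumptions for this example, and the classification of failures is deliberately simple.

```cpp
#include <istream>
#include <sstream>   // only needed for the usage shown below
#include <stdexcept>
#include <string>

// Hypothetical wrapper: read one double, or throw.  The caller can no
// longer carry on regardless after a failed read.
double read_double(std::istream& in)
{
    double value;
    if (in >> value) return value;

    // Something went wrong; say what, as far as the stream can tell us.
    if (in.bad())
        throw std::runtime_error("read_double: I/O error");
    if (in.eof())
        throw std::runtime_error("read_double: unexpected end of input");
    throw std::runtime_error("read_double: malformed number");
}
```

Reading from std::istringstream in("2.5 7") returns 2.5 and then 7, and a third call throws because the input is exhausted. Further checks (range limits, fixup code) can later be added inside this one function without touching its callers.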

Non-trivial File Types
There are also very serious problems with non-trivial file types; this is not just a C++ issue, or even a C/C++ one – it's basic to the Unix file model
The good news: effectively all file types can be used as simplex streams (i.e. input or output, but not both)
  Don't use repositioning, and don't close and reopen
  That will work on any system and almost any file type that you will encounter
End of problem (in 99% of cases)

Non-trivial File Types
Simple disk files on local file systems can be both repositioned and used for duplex streams (i.e. both input and output), as in the book's chapter 11
For files held on file servers, as on most departmental systems and all HPC systems, use only the options I describe in my extra slides on chapter 11
  Many systems' file servers allow rather more, but some do not, or allow them but implement them incompatibly
  Very weird things can happen when you push them too far
If you want to know more, please ask

Unix-like File Models
Unix was designed as a computer science researcher's workbench in 1970
  It was never designed for production use or reliability
  The file model was designed for simplicity on the near-trivial I/O needed for that kind of computer science
  It was designed for systems with local disks only
  Its model doesn't match other systems (e.g. mainframes or embedded) or even non-trivial file types very well
  C++'s basic file and I/O model is derived from Unix
Microsoft systems' modern file model has been derived from Unix, but is not quite the same
  A line separator is two characters (CR-LF) and not one as in Unix (LF) or classic MacOS (CR)
  There are some other minor, but significant, differences
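
The line separator difference shows up when a file written on a Microsoft system is read on a Unix one: std::getline splits at LF and leaves the CR on the end of each line. A minimal sketch of one way to cope follows; the name read_line is an assumption, and CR-only files are not handled.

```cpp
#include <istream>
#include <sstream>   // only needed for the usage shown below
#include <string>

// Hypothetical helper: read one line, accepting either LF or CR-LF as
// the separator.  Returns false when there is nothing more to read.
bool read_line(std::istream& in, std::string& line)
{
    if (!std::getline(in, line)) return false;
    // A CR-LF file read as LF-separated text leaves a trailing CR.
    if (!line.empty() && line.back() == '\r') line.pop_back();
    return true;
}
```

Applied to the text "one\r\ntwo\nthree", successive calls deliver "one", "two" and "three", and the fourth call returns false.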

An Example: Sockets
Sockets are streams, but not just byte streams, and not at all file-like in their advanced properties
  The 'packet' boundaries are semantically significant
  There are control operations that can be embedded in the stream
  They cannot be repositioned in any way
  They cannot even be closed and reopened
  They are duplex streams, but their input and output are mostly separate
The same applies to pipes and 'terminal' I/O, which are closely related to sockets
  Use all of these as simplex streams, and no problem
  Beyond that needs real expertise

Specialist Files
Many programs use specialist data formats, usually accessed through a special library
  Not all – e.g. the book mentions XML
Any files that are very unlike disks are almost always accessed through a special library
  You can use C++ I/O for such uses, but it's dependent on unspecified and unreliable implementation details
  The best approach is to find and use the right library – but that's easier said than done
  Be warned that it will often behave differently from C++ I/O, and will usually be type- and size-unsafe

Other I/O Interfaces
POSIX
  Facilities for sockets, shared-memory segments, SCSI and more – often poorly specified and hard to use reliably
MPI (Message Passing Interface)
  Can be used for specialist inter-process I/O for HPC
CUDA
  Can be used for specialist CPU-GPU I/O
HDF (Hierarchical Data Format)
  Widely used interface for storing numerical data
  There are a zillion other such formats in use, especially in the commercial arena

Next Lecture
Graphical output
  Creating a window
  Drawing graphs
IMPORTANT
  You need to download and install the rest of the materials
  On any non-Linux system, this is likely to be much harder than for the basic materials
  It may also be for some Linux systems, depending on what other software is already installed
  http://www.ucs.cam.ac.uk/docs/course-notes/unix-courses/CPLUSPLUS