Chapter 11a: Other Approaches to I/O
Nick Maclaren
http://www.ucs.cam.ac.uk/docs/course-notes/unix-courses/CPLUSPLUS
This was written by me, not Bjarne Stroustrup
Warning: Heresy
  Many people don't like C++'s approach to I/O much, and you will often need to use other ones, so here is a very brief summary of the most important alternative approaches, covering three aspects
    As far as textual input is concerned
    As far as textual output is concerned
    The basic file and data transfer level
  These objections have NOTHING to do with the C++ language as such, and little with the book's approach
    They are entirely to do with the I/O library design
Stroustrup/Programming
Textual Input
  No language handles this very well
    It's a fundamentally hard problem, on many levels
    The C++ approach works as well as any other
    If you have serious trouble, consider alternatives
  One common alternative is using C, but:
    C scanf is dangerous and is described under output
    If possible, avoid this – it's trickier to get right than most people think
Textual Input
  You can just read characters, and do all decoding yourself, as covered in the course
    Painful, but you can decode any text format
    If it is done in the right way, it is not all that painful
    If it is not done in the right way, you will introduce serious bugs
  You have already covered all of the techniques that you need, and the book explains how to use them
    Don't forget the techniques in chapters 6 and 7 – this is a form of parsing, after all
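As a minimal sketch of decoding by hand, the following reads characters one at a time and builds a decimal integer, rejecting overflow rather than wrapping silently. The names read_int and demo_read_int are illustrative assumptions; they are not taken from the course or the book.

```cpp
#include <cassert>
#include <cctype>
#include <istream>
#include <limits>
#include <sstream>

// Hypothetical helper (not from the course): decode one decimal integer
// by reading characters. Returns false, leaving 'result' alone, on failure.
bool read_int(std::istream& in, long& result)
{
    // peek() returns either an unsigned char value or EOF, both of which
    // are valid arguments for the <cctype> functions.
    while (std::isspace(in.peek()))
        in.get();

    bool negative = false;
    if (in.peek() == '+' || in.peek() == '-')
        negative = (in.get() == '-');

    if (!std::isdigit(in.peek()))
        return false;                      // no digits at all

    const long max = std::numeric_limits<long>::max();
    long value = 0;
    while (std::isdigit(in.peek())) {
        const int digit = in.get() - '0';
        if (value > (max - digit) / 10)
            return false;                  // would overflow: reject it
        value = value * 10 + digit;
    }
    // Simplification: the most negative long is rejected as an overflow.
    result = negative ? -value : value;
    return true;
}

// Illustrative use: leading blanks are skipped, and decoding stops at the
// first character that cannot be part of the number.
void demo_read_int()
{
    std::istringstream in("  -123 rest");
    long n = 0;
    const bool ok = read_int(in, n);
    assert(ok && n == -123);
}
```

Every failure path returns false explicitly, which is the point of doing the decoding yourself: nothing is accepted by accident.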
Regular Expressions
  These are extremely useful for decoding input formats
    They are introduced in chapter 23 and C++11 includes a <regex> header
    Most compilers have it, but there may well be some that don't, or have a slightly non-standard one
    There is also an open source C library, PCRE, which currently is more portable
    I give a two-afternoon course on them, best attended after you have worked through chapter 23
    They are not covered further in my version of the course
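To give a flavour of decoding an input format with the C++11 <regex> header, here is a small sketch. The record layout (symbol, atomic number, weight) and the names Record and decode_record are assumptions made up for this illustration.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical record: an element symbol, atomic number and atomic weight.
struct Record {
    std::string symbol;
    int number;
    double weight;
};

// Decode one line such as "Fe 26 55.845"; returns false if the whole line
// does not match the expected layout.
bool decode_record(const std::string& line, Record& out)
{
    // A symbol, up to 3 digits, and a bounded decimal number,
    // separated by white space.
    static const std::regex pattern(
        R"(\s*([A-Z][a-z]?)\s+([0-9]{1,3})\s+([0-9]{1,4}\.[0-9]{1,6})\s*)");
    std::smatch match;
    if (!std::regex_match(line, match, pattern))
        return false;
    out.symbol = match[1].str();
    out.number = std::stoi(match[2].str());   // at most 3 digits: cannot overflow
    out.weight = std::stod(match[3].str());
    return true;
}

// Illustrative use.
void demo_decode_record()
{
    Record r;
    const bool ok = decode_record("Fe 26 55.845", r);
    assert(ok && r.symbol == "Fe" && r.number == 26);
}
```

Because the pattern bounds the number of digits, the conversions afterwards cannot overflow; the regular expression does the checking and the conversion functions only do the arithmetic.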
Using a Preprocessor
  This does NOT mean the C preprocessor!
  This means using another program to check the data and convert it to a format that is easy to handle in C++
  I recommend this for Fortran, which is very bad at reading free-format, but it could equally well be used for C++
  The best language is almost always Python - or perhaps Perl, but only if you already know Perl
  But you could also use Fortran to read Fortran-style or fixed-format data – it may sound perverse, but is not
  Etc.
Textual Output
  My dislike of C++'s approach is mainly its use of I/O manipulators (e.g. setw) to control formatting
    It would be much cleaner to separate formatting (which is simple character manipulation) from I/O (which is about data transfer)
  The way that they are implemented makes them very painful to use for formatting compared to Fortran, C or even Cobol
    E.g. 'stickiness' is typically not what is wanted
    They make formatted output about as painful as unformatted input in Fortran, which is saying something!
  Many C++ programmers also find them inconvenient, and sometimes use worse methods
C printf, scanf etc.
  Many people use C I/O in C++, but:
    It works for built-in types only (i.e. not your own classes)
    It is type-unsafe, so you get little static checking, though gcc does better than most compilers
    It is size-unsafe (i.e. bad data can overwrite anything and cause chaos) – this is particularly so for scanf – and you won't get any help finding such errors
  If you want to do this, then:
    Encapsulate your I/O in operators, functions and methods, as described in the book (or as below)
    Use C I/O only within those functions and methods
    Check all of your types and sizes before doing the transfer
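One way to follow that advice is sketched here: the C call is confined to a single function, the argument types are fixed by the C++ signature, and the size is checked before the result is used. The name format_fixed and the 512-byte buffer are assumptions chosen for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical wrapper: format a double using C I/O, with the C call kept
// inside one function and the size checked before the result is trusted.
std::string format_fixed(double value, int width, int precision)
{
    // Negative values change the meaning of '*' in printf, so forbid them.
    assert(width >= 0 && precision >= 0);

    char buffer[512];
    // snprintf never writes more than sizeof buffer bytes, and returns the
    // length that the complete result would have needed.
    const int needed = std::snprintf(buffer, sizeof buffer, "%*.*f",
                                     width, precision, value);
    if (needed < 0 || static_cast<std::size_t>(needed) >= sizeof buffer)
        throw std::runtime_error("format_fixed: result does not fit");
    return std::string(buffer, static_cast<std::size_t>(needed));
}
```

A call such as format_fixed(1.23, 10, 2) returns the value right-justified in a field of 10 characters; a result that would not fit raises an exception instead of being truncated quietly.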
Another Approach
  This is to encapsulate the formatting in a single function, which takes the value and options, and returns a string
    You can have several such functions for a single class
    They can have a single name, if their arguments are different
    The slight extra cost can be removed by the use of advanced C++ techniques, if it matters
  This works very well together with almost all of the standard C++ I/O facilities
    I.e. everything except the formatting manipulators
    It is a very small change from the descriptions in the book and entirely compatible with the book's approach!
Using Functions
  With this approach:
    cout << fixed << setprecision(2) << setw(10) <<
        1.23 << scientific << setprecision(3) <<
        setw(12) << 9.8e7 << general << setprecision(6);
  becomes:
    cout << put_fix(1.23,10,2) << put_sci(9.8e7,12,3);
  and the first function becomes:
    string put_fix (double val, int width, int prec)
    {
        ostringstream str;
        str << fixed << setprecision(prec) << setw(width) << val;
        return str.str();
    }
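For completeness, here is a self-contained sketch of both functions with the headers they need. The slide gives only put_fix; the body of put_sci is an assumption that simply mirrors it, and demo_put is an illustrative check.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// As on the slide: fixed-point formatting into a private string stream.
std::string put_fix(double val, int width, int prec)
{
    std::ostringstream str;
    str << std::fixed << std::setprecision(prec) << std::setw(width) << val;
    return str.str();
}

// Assumed counterpart for scientific notation (not shown on the slide).
std::string put_sci(double val, int width, int prec)
{
    std::ostringstream str;
    str << std::scientific << std::setprecision(prec) << std::setw(width) << val;
    return str.str();
}

// The caller's stream keeps its own settings, because all of the
// manipulators were applied to the temporary stream inside each function.
void demo_put()
{
    std::ostringstream out;
    out << put_fix(1.23, 10, 2) << put_sci(9.8e7, 12, 3);
    assert(out.precision() == 6);   // still the default
}
```

This is why no trailing reset of the precision or notation is needed in the calling code: there is nothing 'sticky' left behind to undo.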
Using Methods
  The following:
    class wombat { . . . };
    wombat a;
    cout << setw(...) << fixed(...) << a;
  becomes:
    class wombat { . . . };
    wombat a;
    cout << a.format(...);
  You can overload the method format()
  And you can have more than one such method
A Possible Example
  So the following:
    cout << number.fixed();
    cout << number.fixed(5);
    cout << number.fixed(10,5);
  could be the equivalent of:
    printf("%f",number);
    printf("%.5f",number);
    printf("%10.5f",number);
  or:
    WRITE (*,'(G0)') number
    WRITE (*,'(F0.5)') number
    WRITE (*,'(F10.5)') number
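A minimal sketch of a class that would support those three calls follows. The class name Number, its single double member and the default precision of 6 (chosen to match printf's %f) are assumptions for illustration only.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical class with three overloads of fixed(), each returning a string.
class Number {
public:
    explicit Number(double v) : value(v) {}

    // Like "%f": no minimum width, 6 digits after the decimal point.
    std::string fixed() const { return fixed(0, 6); }

    // Like "%.5f" when called as fixed(5).
    std::string fixed(int prec) const { return fixed(0, prec); }

    // Like "%10.5f" when called as fixed(10, 5).
    std::string fixed(int width, int prec) const
    {
        std::ostringstream str;
        str << std::fixed << std::setprecision(prec)
            << std::setw(width) << value;
        return str.str();
    }

private:
    double value;
};

// Illustrative use.
void demo_number()
{
    const Number number(3.14159265);
    assert(number.fixed() == "3.141593");
    assert(number.fixed(5) == "3.14159");
    assert(number.fixed(10, 5) == "   3.14159");
}
```

Only the three-argument form does any work; the shorter overloads just supply the options that were left out.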
Implementing Those
  You have already learnt everything you need
    You can use built-in C++ I/O inside the functions to do the conversion, and handle only the options yourself
    You can use C I/O instead, if it is easier – but take care!
    Or you can convert it to characters the hard way
  A summary of the differences is:
    You don't define an 'ostream & operator<<'
    You do define functions/methods that return strings
    Everything else in the book applies unchanged
  When you learn them (chapters 13 to 15), default arguments and derived classes will simplify the task
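As a hedged illustration of how default arguments might simplify the task, one function can stand in for several overloads. The name fmt_fixed and the argument order (precision before width, so that a single extra argument sets the precision) are assumptions of this sketch and differ from the order used in the earlier slides.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical function: omitted arguments fall back to the defaults
// (6 digits after the decimal point, no minimum width).
std::string fmt_fixed(double val, int prec = 6, int width = 0)
{
    std::ostringstream str;
    str << std::fixed << std::setprecision(prec) << std::setw(width) << val;
    return str.str();
}

// Illustrative use: one, two or three arguments.
void demo_fmt_fixed()
{
    assert(fmt_fixed(2.5) == "2.500000");
    assert(fmt_fixed(2.5, 2) == "2.50");
    assert(fmt_fixed(2.5, 2, 8) == "    2.50");
}
```

Default arguments can only be omitted from the right, so the argument you expect to leave out most often has to come last; that is a design choice to make once per function.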
An Exercise
  File “elements.txt” contains element properties, from “http://en.wikipedia.org/wiki/List_of_elements”
    The first line is a header
      Z Sym Element Group Period Weight Density Melt Boil Heat Neg Abundance
    It then contains the values, for the first 92 elements, sanitised
  Do the following:
    Define an element class, with methods to return each property as a string, using traditional conventions
    Read the data into a vector of elements, and print a table of selected properties (your choice), properly aligned
    Left-justify strings, right-justify integers, and align the decimal point of real numbers
    Use fixed or scientific, at choice, but provide a precision argument for each method for a real property
Don't Believe Me?
  Then you should try the exercise both using I/O manipulators directly and using my preferred approach
    Neither is “right” or “wrong” - it's your choice
    The principles of encapsulation, error checking, and the use of classes, are the same for both
  Lots of other people, libraries and languages use similar approaches to the one I describe – and have done for many decades
  I find that this approach is easier to write, use and debug, and more flexible – so do many other people
Boost::format
  The boost library has a format module, which is a sort of hybrid between C++ manipulators, C printf and the approach described above
    Unlike C printf, it is type- and size-safe
    It doesn't handle user-defined classes well, but it does handle them (unlike C printf)
    Unfortunately, boost is a bit of a problem (well, more than a bit), and is described later
  However, if this does what you want, and especially if you use boost anyway, it's worth considering
File and Transfer Model
  This probably won't affect you until you write 'production' scientific programs
  C's and C++'s model is derived from Unix, and has two major flaws
    Its error handling is extremely dangerous
    Its handling of non-trivial files is very poor
  The book covers some of the first aspect, but I am repeating it and stressing it here
  You don't use non-trivial files? Think again
    'Terminals', pipes and I/O through ssh are all non-trivial
    Even files kept on a file server are often non-trivial
Error Handling
  C's and C++'s model fails unsafe, which is very bad software engineering
    If you don't test for failure, it will carry on regardless; throwing an exception would be better, in all cases
    Also, clearing an error state is wrong; it should be cleared as a side-effect of recovering from the failure
    There are also very nasty issues to do with real I/O errors (i.e. ones generated by the hardware or operating system)
  For all those reasons, you should always encapsulate your I/O in functions or methods and include your own checking there
    When you hit a problem (and you may well), you can easily add more error detection and (if needed) fixup code
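One possible shape for such an encapsulating function is sketched below: it reads numbers from a stream and turns every failure into an exception, so that the caller cannot carry on regardless. The name read_all_doubles and the choice of std::runtime_error are assumptions of this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read white-space separated numbers until the end of
// the input, throwing on anything that is not a complete, valid number.
std::vector<double> read_all_doubles(std::istream& in)
{
    std::vector<double> values;
    std::string token;
    while (in >> token) {
        std::size_t used = 0;
        double x = 0.0;
        try {
            x = std::stod(token, &used);
        } catch (const std::exception&) {
            used = 0;                       // not a number, or out of range
        }
        if (used != token.size())
            throw std::runtime_error("bad number in input: " + token);
        values.push_back(x);
    }
    // The loop also ends on a real I/O error, so check for that separately.
    if (in.bad())
        throw std::runtime_error("I/O error while reading numbers");
    return values;
}

// Illustrative use.
void demo_read_all_doubles()
{
    std::istringstream in("1 2.5 -3e2\n");
    const std::vector<double> v = read_all_doubles(in);
    assert(v.size() == 3 && v[1] == 2.5 && v[2] == -300.0);
}
```

All of the checking lives in one place, so extra detection or fixup code can be added later without touching the callers.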
Non-trivial File Types
  There are also very serious problems with non-trivial file types; this is not just a C++ issue, or even a C/C++ one – it's basic to the Unix file model
  The good news: effectively all file types can be used as simplex streams (i.e. input or output, but not both)
    Don't use repositioning, and don't close and reopen
    That will work on any system and almost any file type that you will encounter
  End of problem (in 99% of cases)
Non-trivial File Types
  Simple disk files on local file systems can be both repositioned and used for duplex streams (i.e. both input and output), as in the book's chapter 11
  For files held on file servers, as on most departmental systems and all HPC systems, use only the options I describe in my extra slides on chapter 11
    Many systems' file servers allow rather more, but some do not, or allow them but implement them incompatibly
    Very weird things can happen when you push them too far
  If you want to know more, please ask
Unix-like File Models
  Unix was designed as a computer science researcher's workbench in 1970
    It was never designed for production use or reliability
    The file model was designed for simplicity on the near-trivial I/O needed for that kind of computer science
    It was designed for systems with local disks only
    Its model doesn't match other systems (e.g. mainframes or embedded) or even non-trivial file types very well
    C++'s basic file and I/O model is derived from Unix
  Microsoft systems' modern file model has been derived from Unix, but is not quite the same
    A line separator is two characters (CR-LF) and not one as in Unix (LF) or classic (pre-OS X) MacOS (CR)
    There are some other minor, but significant, differences
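The line separator difference shows up when a file written on a Microsoft system is read on a Unix one: each line then ends with a stray CR. A minimal sketch of a line reader that accepts either LF or CR-LF follows; the name get_portable_line is an assumption, and CR-only files are deliberately not handled.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: read one line, accepting LF or CR-LF as terminator.
// Returns false when there is nothing more to read.
bool get_portable_line(std::istream& in, std::string& line)
{
    if (!std::getline(in, line))
        return false;
    // A CR-LF file read on a system that uses LF leaves the CR at the end.
    if (!line.empty() && line.back() == '\r')
        line.pop_back();
    return true;
}

// Illustrative use: mixed terminators in one input.
void demo_get_portable_line()
{
    std::istringstream in("one\r\ntwo\nthree");
    std::string line;
    bool ok = get_portable_line(in, line);
    assert(ok && line == "one");
    ok = get_portable_line(in, line);
    assert(ok && line == "two");
}
```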
An Example: Sockets
  Sockets are streams, but not just byte streams, and not at all file-like in their advanced properties
    The 'packet' boundaries are semantically significant
    There are control operations that can be embedded in the stream
    They cannot be repositioned in any way
    They cannot even be closed and reopened
    They are duplex streams, but their input and output are mostly separate
  The same applies to pipes and 'terminal' I/O, which are closely related to sockets
    Use all of these as simplex streams, and no problem
    Beyond that needs real expertise
Specialist Files
  Many programs use specialist data formats, usually accessed through a special library
    Not all – e.g. the book mentions XML
  Any files that are very unlike disks are almost always accessed through a special library
    You can use C++ I/O for such uses, but it's dependent on unspecified and unreliable implementation details
    The best approach is to find and use the right library – but that's easier said than done
    Be warned that it will often behave differently from C++ I/O, and will usually be type- and size-unsafe
Other I/O Interfaces
  POSIX
    Facilities for sockets, shared-memory segments, SCSI and more – often poorly specified and hard to use reliably
  MPI (Message Passing Interface)
    Can be used for specialist inter-process I/O for HPC
  CUDA
    Can be used for specialist CPU-GPU I/O
  HDF (Hierarchical Data Format)
    Widely used interface for storing numerical data
  There are a zillion other such formats in use, especially in the commercial arena
Next Lecture
  Graphical output
    Creating a window
    Drawing graphs
  IMPORTANT
    You need to download and install the rest of the materials
    On any non-Linux system, this is likely to be much harder than for the basic materials
    It may also be for some Linux systems, depending on what other software is already installed
    http://www.ucs.cam.ac.uk/docs/course-notes/unix-courses/CPLUSPLUS