How To Write An Input Loop In C++

Introduction

This note gives students examples of, and some understanding of, the simple art of writing appropriate input loops in the C++ language. While this may seem trivial, it is done incorrectly so often that the note seems necessary. It is a variant of the input loop in C note from CIS4282. It is well worth remembering that most of the ravages of the Internet worm of 1988 could have been avoided if these rules had been followed.

Rule 1: Do not allow input to overflow your input buffers.
Rule 2: Do not allow illegal or undesirable input to put you into an infinite loop.
Rule 3: Do not allow unexpected end-of-file or device errors to put you into an infinite loop.
Rule 4: Determine an input policy and stick to it.

Note that under C++, there are a large number of variations on the input and output stream functions, allowing white space to be skipped, and buffers to be read into variables of non-char type.

Think of these like scanf() variations; that is, use them as convenient, but don't assume they relieve you of the responsibility to robustly parse your input.

Character-oriented input

Character-oriented input is appropriate to several forms of processing including statistical input analysis, input transformation, form filling, and pattern matching applications. Character-oriented input is not usually appropriate for command line interfaces.

The main function available for character-oriented input is istream::get().

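This function returns values of type int, which represent the character read as a small non-negative number (its value as an unsigned char).

If an error occurs, or if end-of-file is encountered, the symbolic constant EOF is returned. EOF is a negative value, so it can never be confused with a character. It is defined in <cstdio> (<stdio.h> in C); pre-standard compilers also defined it in <iostream.h>.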

Every input loop must check for these conditions.

#include <cstdio>   // EOF
#include <iostream>
...
int ch; /* Must be int, not char, so that EOF is distinct from every character */
...
while ( (ch = std::cin.get()) != EOF ) // Similarly for any istream
{
  ... /* Work with the input character ch */
}

Line-oriented input

Line-oriented input is most widely used in command-line or command-language type applications. Line-oriented input is not usually appropriate to stream-based or communication-based applications.

The main function available for line-oriented input is istream::getline().

The getline() member function corrects the drawbacks of the gets() and fgets() functions in <stdio.h>. Its overloads fill a buffer up to a provided size and stop at a provided delimiter ('\n' by default). istream::get() has overloads with the same signatures, but the behavior differs slightly: getline() puts the stream into a failed state if the buffer fills before a delimiter is found, whereas get() does not; and getline() extracts and discards the delimiter, whereas get() leaves it in the stream for the next reader.

A line-oriented processing algorithm might look as follows:

char buffer[BUF_SIZE];
...
// The stream tests false at end-of-file, on error, or on a line too long for buffer
while ( std::cin.getline(buffer, sizeof(buffer)) )
{
  ... /* Process the current line */
}

Word-oriented input

Word-oriented processing is mostly used when scanning input generated from another application (e.g., publishing, technical papers, communications lines).

The definition of a word varies widely; you will probably need to write your own word-recognition procedure.

Token is the more formal term for the recognition-worthy components of an input stream.

If your words are nicely delimited, istream::get() or istream::getline() with a suitable delimiter argument will work well for you.

A very nice tool for this in a larger program is lex. Lex (or its more modern derivative flex) will generate a scanner capable of recognizing an easily-modified set of tokens based on regular expression matching.

A simple word scanner is presented below, with the policy that a word consists of any series of alphabetic characters or any series of numeric characters or any single non-space character.

The scanner returns the value of the first character in the word. The caller provides the buffer to store the word in; the word is truncated if the buffer is not large enough. EOF is returned if end-of-file or an error is encountered before a word has been read.

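#include <cctype>   // isalpha, isdigit, isspace
#include <cstdio>   // EOF
#include <istream>

int getword ( std::istream& is, char *buffer, int maxlen )
{
  enum { other, alpha, numeric } type = other;
  int i = 0, ch, first;

  if ( maxlen > 0 )
    buffer[0] = '\0';
  /* Skip white space, which separates words but is never part of one */
  while ( (ch = is.get()) != EOF && std::isspace (ch) )
    ;
  if ( ch == EOF )
    return EOF; /* End-of-file or error before any word was read */

  first = ch;
  if ( std::isalpha (ch) )
    type = alpha;
  else if ( std::isdigit (ch) )
    type = numeric;

  for (;;)
  {
    if ( i < maxlen - 1 ) /* Leave room for the '\0' */
      buffer[i++] = static_cast<char>(ch);
    if ( type == other )
      break; /* A single non-space character is a complete word */
    ch = is.peek(); /* Look ahead without consuming */
    if ( ch == EOF || ( type == alpha ? !std::isalpha (ch) : !std::isdigit (ch) ) )
      break; /* The next character belongs to the next word */
    is.get(); /* Consume the character just examined */
  }
  if ( maxlen > 0 )
    buffer[i] = '\0';
  return first;
}

An algorithm for using word-oriented input might look as follows: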

char buffer[BUF_SIZE];
int word;
...
while ( ( word = getword(std::cin, buffer, sizeof(buffer)) ) != EOF )
{ 
  ... /* Process the current word */
}