Doc. no.   WG21/N1841=05-0101
Date:        2005-08-23
Project:     Programming Language C++
Reply to:   Beman Dawes <[email protected]>

Filesystem Library Proposal for TR2

Introduction
Motivation and Scope
Impact on the Standard
Important Design Decisions
Proposed Text for TR2
    Introductory chapter
    Filesystem library chapter
        Definitions
        Requirements
            Requirements on programs
            Requirements on implementations
        Header <filesystem> synopsis
        Path traits
        Class template basic_path
            Pathname formats
            Pathname grammar
            Filename conversion
            Requirements
            basic_path constructors
            basic_path assignments
            basic_path comparisons
            basic_path modifiers
            basic_path operators
            basic_path observers
            basic_path iterators
        Class template basic_filesystem_error
            basic_filesystem_error constructors
            basic_filesystem_error observers
        Class template basic_directory_entry
            basic_directory_entry constructors
            basic_directory_entry modifiers
            basic_directory_entry observers
            basic_directory_entry comparisons
        Class template basic_directory_iterator
            basic_directory_iterator constructors
        Class template basic_recursive_directory_iterator
        Non-member operational functions
            Status functions
            Predicate functions
            Attribute functions
            Other operations functions
            Convenience functions
        Additions to header <cerrno>
        Additions to header <fstream>
Suggestions for <fstream> implementations
Path decomposition table
Issues
Acknowledgements
References

Introduction

This paper proposes the addition of a filesystem library component to the C++ Standard Library Technical Report 2. The proposal is based on the Boost Filesystem Library (see www.boost.org/libs/filesystem).

The library provides portable facilities to query and manipulate paths, files, and directories. The Boost version of the library is widely used. It would be a pure addition to the C++ standard, leaving in place existing standard library functionality in the relatively few areas where there is overlap.

Users say they prefer the Boost Filesystem Library interface to native operating system or POSIX API's, even in code without portability requirements, because the design follows modern C++ practice.

The proposed text includes an example of a program using the library.

Motivation and Scope

Why is this important?

The motivation for the library is the desire to perform safe, portable, script-like filesystem operations from within C++ programs. Because the C++ Standard Library contains no facilities for such filesystem tasks as directory iteration or directory creation, programmers must currently rely on operating-system-specific interfaces, making it difficult to write portable programs.

The intent is not to compete with Python, Perl, or shell scripting languages, but rather to provide file system operations where C++ is already the language of choice. The design encourages, but does not require, safe and portable usage.

What kinds of problems does it address, and what kinds of programmers is it intended to support?

The library addresses everyday needs, for both application programs and libraries. It is useful across every application domain that uses files. It is intended to be useful to all levels of programmers, from rank beginners to seasoned experts.

Is it based on existing practice?

Yes, very much so. The proposal is based on the Boost Filesystem Library, which has been in use since 2002 and by now is in very wide use. For example, current versions of Adobe Systems products such as Adobe Reader use the Boost Filesystem Library on the many platforms they support.

Note, however, that until recently all the Boost experience was with a narrow-character only version of the library. The internationalized version as described in this proposal is just starting to be used, and will not be fully released until Boost release 1.34.

The underlying mechanisms have been in use for decades on the world's most widespread operating systems, such as POSIX, Windows, and various mainframe operating systems. What this proposal brings to the table is an approach that is C++ Standard Library friendly and fully internationalized.

Is there a reference implementation?

Yes. The Boost Filesystem Library is freely and publicly available. The Boost library will track the TR2 proposed library as the proposal evolves.

Impact on the Standard

What does it depend on, and what depends on it?

It depends on some standard library components, such as basic_string. No other proposals depend on it.

If a revision to the Code Conversion Proposal (See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2004/n1683.html) is accepted, it may be advantageous for the Filesystem Library to use that library rather than the current code conversion facilities proposed below.

Is it a pure extension, or does it require changes to standard components?

Most of the proposed library is a pure extension.

There are additions to header <cerrno>. Since the critical portions that might require change to C headers (always a sore point) are already mandated for POSIX compliance, and codify existing practice for many non-POSIX implementations such as for Windows, it is not expected that they will cause any problems.

There are additions to header <fstream>.  These have been carefully specified to avoid breaking existing code in common operating environments such as POSIX, Windows, and OpenVMS. See Suggestions for <fstream> implementations for techniques to avoid breaking existing code in other environments, particularly on operating systems allowing slashes in filenames.

Can it be implemented using today's compilers, or does it require language features that will only be available as part of C++0x?

It can be (and has been) implemented with today's compilers.

There is one minor function that can best be implemented by an addition to current C++ runtime libraries, although an acceptable workaround is documented.

On operating systems with built-in support for wide-character file names, such as Windows, a high-quality implementation of the header <fstream> additions requires an addition to the C++ Standard Library implementation. The addition is relatively small and localized. There is a workaround that avoids modifying the standard library, but it is very much a hack and depends on a Windows feature (8.3 filename support) which some users disable, thereby disabling the workaround. The issue doesn't affect implementations on operating systems which only support narrow character file names.

Important Design Decisions

Why did you choose the specific design that you did?

Many of the specific design decisions were driven by the desire to provide a modern C++ interface that works well with the C++ Standard Library. The intent is that Standard Library users can become comfortable with the Filesystem Library in very short order.

The proposed library encourages both syntactic and semantic portability, yet does not force implementors into heroic efforts on hopeless systems. This balances the benefits to users of both code and knowledge portability with the realities faced by implementors on some operating systems.

Because of the desire to support simple "script-like" usage, use cases often drove design choices. For example, users can write if (exists("foo")) rather than the lengthier if (exists(path("foo"))).

Because filesystem operations often encounter unexpected runtime errors, the library reports runtime errors via C++ exceptions, and ensures enough information is provided for meaningful error messages, including internationalized error messages.

What alternatives did you consider, and what are the tradeoffs?

Additional observers and modifiers for file system attributes. Attribute functions which cannot supply portable semantics are not provided, avoiding the illusion of portability in cases where it cannot in fact exist.

A larger number of operational convenience functions. Convenience functions (functions which can be portably created by composition from basic functions) were not provided unless there was widespread agreement on usefulness and need.

Compile-time or run-time options for operational functions. Numerous trial implementations were abandoned because the added complexity outweighed the benefits, and because consensus could not be reached on the feature set.

Automatic path name checking. This feature, supplied by the Boost library for several years, allowed users to specify both default and per-constructor path name checking, so that the desired degree of portability could be enforced automatically. This implicit name checking was abandoned because of user confusion and complaints.

Separate path types for regular file and directory pathnames. Pathname formats that use different syntax for regular pathnames versus directory pathnames are passing into extinction. Why prolong the agony at the cost of torturing those using modern systems? It is perhaps significant that one of the few web sites dedicated to preserving a dual pathname format operating system is named Deathrow (http://deathrow.vistech.net/).

Single path type which can at runtime accept narrow or wide character pathnames. Although certainly interesting, and possibly superior, such a design would not interoperate well with the current Standard Library's compile-time typed basic_string. A new runtime polymorphic string class would be the best place to experiment with this concept, not a path class.

What are the consequences of your choices, for users and implementors?

The design has evolved over a period of four years of actual experience by Boost users, and the most frequent causes of user complaints (such as enforced name-checking and several over-strict preconditions) were eliminated. The TR process will allow further refinement. The intent is to ensure user needs are met.

Because the Boost implementation is tested and used in a wide range of POSIX and Windows environments, many implementation concerns have already been addressed.

What decisions are left up to implementors?

Because implementations of the library are dependent on facilities of the underlying operating system, implementors are given unusual freedom to redefine semantics of the library. That being said, implementors are given strong normative encouragement to provide the semantics described in the TR whenever feasible.

If there are any similar libraries in use, how do their design decisions compare to yours?

There are a number of libraries which address the problem domain. Most of the C/C++ libraries have C, rather than C++ interfaces. For example, see the Apache Portable Runtime Project (http://apr.apache.org). The ACE toolkit (http://www.cs.wustl.edu/~schmidt/ACE.html) uses a C++ approach, but doesn't mesh well with the C++ Standard Library. For example, the ACE directory iterator differs greatly from Standard Library iterator requirements.

Proposed Text for Technical Report 2

Gray-shaded italic text is commentary on the proposal. It is not to be added to the TR.

Italic text is editorial guidance. It is not to be added to the TR.

Add to the introductory section of the TR:

The following standard contains provisions which, through reference in this text, constitute provisions of this Technical Report. At the time of publication, the editions indicated were valid. All standards are subject to revision, and parties to agreements based on this Technical Report are encouraged to investigate the possibility of applying the most recent editions of the standard indicated below. Members of IEC and ISO maintain registers of currently valid International Standards.

ISO/IEC 9945:2003, with the indicated corrections, is hereinafter called POSIX.

Some library behavior in this Technical Report is defined by reference to POSIX. How such behavior is actually implemented is unspecified.

[Note: This constitutes an "as if" rule for implementation of operating system dependent behavior. Presumably implementations will actually call native operating system API's. --end note]

Implementations are encouraged, but not required, to support such behavior as it is defined by POSIX. Implementations shall document any behavior that differs from the POSIX defined behavior. Implementations that do not support exact POSIX behavior are encouraged to provide behavior as close to POSIX behavior as is reasonable given the limitations of actual operating systems. If an implementation cannot provide any reasonable behavior, the implementation shall report an error in an implementation-defined manner.

[Note: Such errors might be reported by an #error directive, a static_assert, a basic_filesystem_error exception, a special return value, or some other manner. --end note]

Footnote 1: POSIX® is a registered trademark of The IEEE.

Footnote 2: UNIX® is a registered trademark of The Open Group.

Add a new clause to the TR:


Chapter (tbs) - Filesystem library


This clause describes components that C++ programs may use to interrogate and manipulate files (including directories), and certain of their attributes.

This clause applies only to hosted implementations (C++ Std, 1.4, Implementation compliance [intro.compliance]).

[Note: This clause applies to any hosted implementation. Specific operating systems such as OpenVMS (footnote 3), UNIX, and Windows (footnote 4) are mentioned only for purposes of illustration or to give guidance to implementors. No slight to other operating systems is implied or intended. --end note]

Unless otherwise specified, all components described in this clause are declared in namespace std::tr2::sys.

[Note: The sys subnamespace prevents collisions with names already in the standard library and emphasizes reliance on the operating system dependent behavior inherent in file system operations. -- end note]

The Effects and Postconditions of functions described in this clause may not be achieved in the presence of race conditions. No diagnostic is required.

If the possibility of race conditions makes it unreliable for a program to test for a precondition before calling a function described in this clause, Requires is not specified for the condition. Instead, the condition is specified as a Throws condition.

[Note: As a design practice, preconditions are not specified when it is unreasonable for a program to detect them prior to calling the function. -- end note]
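The practical consequence for programs can be sketched as follows. Because the proposed std::tr2::sys components are not generally available, this sketch assumes C++17 std::filesystem as a stand-in; the helper name size_or_minus_one is hypothetical. Rather than testing exists(p) first, which another process could invalidate before the next call, the program attempts the operation and handles the thrown error.

```cpp
#include <cassert>
#include <cstdint>
#include <filesystem>

namespace fs = std::filesystem;  // assumption: stand-in for the proposed std::tr2::sys

// Hypothetical helper: returns the size of the file at p, or -1 if the
// size could not be obtained (for example, because the file is missing).
// No exists() pre-check is made; a pre-check could be invalidated by a
// race before file_size() runs, so the failure is handled where it occurs.
std::intmax_t size_or_minus_one(const fs::path& p) {
    try {
        return static_cast<std::intmax_t>(fs::file_size(p));
    } catch (const fs::filesystem_error&) {
        return -1;
    }
}
```

A caller treats -1 as "not available at the time of the call" and makes no assumption that the file's state is unchanged afterwards.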

Some error conditions, such as empty path function arguments, are specified both in Requires and in Throws elements.

[Note: This dual specification is employed when an error condition is trivially detectable by the C++ program, is not subject to race conditions, and is either a serious error or will be detected by most operating system API calls in any case. -- end note]

Footnote 3: OpenVMS® is a registered trademark of Hewlett-Packard Development Company.

Footnote 4: Windows® is a registered trademark of Microsoft Corporation.

Definitions

The following definitions shall apply to this clause:

File: An object that can be written to, or read from, or both. A file has certain attributes, including type. File types include regular file, symbolic link, and directory. Other types of files may be supported by the implementation.

File system: A collection of files and certain of their attributes.

Filename: The name of a file. The format is as specified by the POSIX Filename base definition.

Path: A sequence of elements which identify a location within a filesystem. The elements are the root-name, root-directory, and each successive filename. See Pathname grammar.

Pathname: A character string that represents a path.

Link: A directory entry object that associates a filename with a file. On some file systems, several directory entries can associate names with the same file.

Hard link: A link to an existing file. Some file systems support multiple hard links to a file. If the last hard link to a file is removed, the file itself is removed.

[Note: A hard link can be thought of as a shared-ownership smart pointer to a file. -- end note]

Symbolic link: A type of file with the property that when the file is encountered during pathname resolution, a string stored by the file is used to modify the pathname resolution.

[Note: A symbolic link can be thought of as a raw pointer to a file. If the file pointed to does not exist, the symbolic link is said to be a "dangling" symbolic link. -- end note]

Slash: The character '/', also known as solidus.

Dot: The character '.', also known as period.

Race condition: The condition that occurs when multiple threads, processes, or computers interleave access and modification of the same object within a file system.

Requirements

Requirements on programs

The arguments for template parameters named Path, Path1, or Path2 described in this clause shall be of type basic_path, or a class derived from basic_path, unless otherwise specified.

Requirements on implementations

Some function templates described in this clause have a template parameter named Path, Path1, or Path2. When called with a function argument s of type char* or std::string, the implementation shall treat the argument as if it were coded path(s). When called with a function argument s of type wchar_t* or std::wstring, the implementation shall treat the argument as if it were coded wpath(s). For functions with two arguments, implementations shall not supply this treatment when Path1 and Path2 are different types.

[Note: This "do-the-right-thing" rule allows users to write exists("foo"), taking advantage of class basic_path's string conversion constructor, rather than the lengthier and more error-prone exists(path("foo")). This is particularly important for the simple, script-like programs which are an important use case for the library. Calling two-argument functions with different types is a very rare usage, and may well be a coding error, so automatic conversion is not supported for such cases.

The implementation technique is unspecified. One possible implementation technique, using exists() as an example, is:

template <class Path>
  typename boost::enable_if<is_basic_path<Path>,bool>::type exists(const Path& p);
inline bool exists(const path& p) { return exists<path>(p); }
inline bool exists(const wpath& p) { return exists<wpath>(p); }

The enable_if will fail for a C string or std::basic_string argument, which will then be automatically converted to a basic_path object via the appropriate basic_path conversion constructor. -- end note]
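The overload resolution described in the note can be tried out with a self-contained sketch. Everything here is a hypothetical stand-in: toy_basic_path, is_toy_path, and uses_wide_chars are not part of the proposal, and std::enable_if (C++11) is used in place of boost::enable_if. The sketch only shows that a string argument removes the template from consideration and selects the matching non-template overload.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for basic_path, with string conversion constructors.
template <class String>
struct toy_basic_path {
    typedef typename String::value_type value_type;
    String text;
    toy_basic_path(const String& s) : text(s) {}
    toy_basic_path(const value_type* s) : text(s) {}
};
typedef toy_basic_path<std::string>  toy_path;
typedef toy_basic_path<std::wstring> toy_wpath;

// Hypothetical stand-in for is_basic_path.
template <class T> struct is_toy_path : std::false_type {};
template <> struct is_toy_path<toy_path>  : std::true_type {};
template <> struct is_toy_path<toy_wpath> : std::true_type {};

// Constrained template: participates only when Path is a toy path type.
template <class Path>
typename std::enable_if<is_toy_path<Path>::value, bool>::type
uses_wide_chars(const Path&) {
    return sizeof(typename Path::value_type) > sizeof(char);
}

// Non-template overloads: these accept string arguments through the
// converting constructors once the template has been discarded.
inline bool uses_wide_chars(const toy_path& p)  { return uses_wide_chars<toy_path>(p); }
inline bool uses_wide_chars(const toy_wpath& p) { return uses_wide_chars<toy_wpath>(p); }
```

With these declarations, uses_wide_chars("foo") resolves to the toy_path overload and uses_wide_chars(L"foo") to the toy_wpath overload, without the caller naming a path type.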

The two overloads are not given in the normative text because:

Implementations of functions described in this clause are permitted to call the applications program interface (API) provided by the operating system. If such an operating system API call results in an error, implementations shall report the error by throwing exception basic_filesystem_error, unless otherwise specified.

[Note: Such exceptions and the conditions that cause them to be thrown are not explicitly described in each Throws element within this clause. Because hardware failures, network failures, race conditions, and a plethora of other errors occur frequently in file system operations, users should be aware that any file system operation, no matter how apparently innocuous, may throw an exception. -- end note]

Header <filesystem> synopsis

namespace std
{
  namespace tr2
  {
    namespace sys
    {
      template <class String, class Traits> class basic_path;

      struct path_format_t{};
      extern path_format_t portable;
      extern path_format_t native;
      
      struct path_traits;
      struct wpath_traits;

      typedef basic_path< std::string, path_traits >    path;
      typedef basic_path< std::wstring, wpath_traits >  wpath;

      template<class Path> struct is_basic_path;
      template<class Path> struct slash { static const char value = '/'; };
      template<class Path> struct dot { static const char value = '.'; };

      typedef int errno_type;  // type is determined by the C standard
      typedef implementation-defined system_error_type; // usually int

      template <class Path> class basic_filesystem_error;

      typedef basic_filesystem_error<path> filesystem_error;
      typedef basic_filesystem_error<wpath> wfilesystem_error;

      typedef bitmask-type status_flags; // C++ std, 17.3.2.1.2 Bitmask types [lib.bitmask.types]

      //  values are for exposition only; actual values are unspecified
      static const status_flags  error_flag(1);
      static const status_flags  not_found_flag(1<<1);
      static const status_flags  directory_flag(1<<2);
      static const status_flags  regular_flag(1<<3);
      static const status_flags  other_flag(1<<4);
      static const status_flags  symlink_flag(1<<5);

      struct symlink_t{};
      extern symlink_t symlink;

      template <class Path> class basic_directory_entry;

      typedef basic_directory_entry<path> directory_entry;
      typedef basic_directory_entry<wpath> wdirectory_entry;

      template <class Path> class basic_directory_iterator;

      typedef basic_directory_iterator<path> directory_iterator;
      typedef basic_directory_iterator<wpath> wdirectory_iterator;

      template <class Path> class basic_recursive_directory_iterator;

      typedef basic_recursive_directory_iterator<path> recursive_directory_iterator;
      typedef basic_recursive_directory_iterator<wpath> wrecursive_directory_iterator;

      //  status functions
      template <class Path>
        status_flags status(const Path& p, system_error_type* ec=0);
      template <class Path>
        status_flags status(const Path& p, const symlink_t&, system_error_type* ec=0);

      //  predicate functions
      template <class Path> bool exists(const Path& p);
      template <class Path> bool is_directory(const Path& p);
      template <class Path> bool is_regular(const Path& p);
      template <class Path> bool is_other(const Path& p);
      template <class Path> bool is_symlink(const Path& p);
      template <class Path> bool is_empty(const Path& p);
      template <class Path1, class Path2>
        bool equivalent(const Path1& p1, const Path2& p2);

      //  attribute functions
      template <class Path> Path current_path();
      template <class Path> const Path& initial_path();
      template <class Path> intmax_t file_size(const Path& p);
      template <class Path> std::time_t last_write_time(const Path& p);
      template <class Path>
        void last_write_time(const Path& p, const std::time_t new_time);

      //  operations functions
      template <class Path> bool create_directory(const Path& dp);
      template <class Path1, class Path2>
        void create_hard_link(const Path1& old_fp, const Path2& new_fp);
      template <class Path> bool remove(const Path& p);
      template <class Path1, class Path2>
        void rename(const Path1& from_p, const Path2& to_p);
      template <class Path1, class Path2>
        void copy_file(const Path1& from_fp, const Path2& to_fp);
      template <class Path> Path system_complete(const Path& p);
      template <class Path> Path complete(const Path& p, const Path& base=initial_path<Path>());
      errno_type lookup_errno(system_error_type code);
      void system_message(system_error_type code, std::string & target);
      void system_message(system_error_type code, std::wstring & target);

      //  convenience functions
      template <class Path> bool create_directories(const Path & p);
      template <class Path> typename Path::string_type extension(const Path & p);
      template <class Path> typename Path::string_type basename(const Path & p);
      template <class Path>
        Path replace_extension(const Path & p, const typename Path::string_type & new_extension);

    } // namespace sys
  } // namespace tr2
} // namespace std
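As an illustration of how the status_flags constants in the synopsis might be combined and tested, the following sketch models the bitmask type as unsigned int. That representation and the helper has_flag are assumptions made for illustration only; the proposal leaves the actual type and values unspecified.

```cpp
#include <cassert>

// Assumption: unsigned int models the bitmask type; the real type is unspecified.
typedef unsigned int status_flags;

// Values mirror the exposition-only values in the synopsis.
static const status_flags error_flag(1);
static const status_flags not_found_flag(1 << 1);
static const status_flags directory_flag(1 << 2);
static const status_flags regular_flag(1 << 3);
static const status_flags other_flag(1 << 4);
static const status_flags symlink_flag(1 << 5);

// Hypothetical helper: true if any bit of 'flag' is set in 'status'.
inline bool has_flag(status_flags status, status_flags flag) {
    return (status & flag) != 0;
}
```

Because the flags are distinct bits, a single status value can report more than one property, and each property is tested independently with a bitwise AND.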

Path traits

This subclause defines requirements on classes representing path behavior traits, and defines two classes that satisfy those requirements for paths based on string and wstring. It also defines several additional path traits structure templates, and defines several specializations of them.

Class template basic_path defined in this clause requires additional types, values, and behavior to complete the definition of its semantics.

For purposes of exposition, Traits behaves as if it is a class with private members bool m_locked, initialized false, and std::locale m_locale, initialized

Path Behavior Traits Requirements

Expression: Traits::external_string_type
Requirements: A typedef which is a specialization of basic_string. The value_type is a character type used by the operating system to represent pathnames.

Expression: Traits::internal_string_type
Requirements: A typedef which is a specialization of basic_string. The value_type is a character type to be used by the program to represent pathnames. Required to be the same type as the basic_path String template parameter.

Expression: Traits::to_external( p, is )
Requirements: is, converted by the m_locale codecvt facet to external_string_type.

Expression: Traits::to_internal( p, xs )
Requirements: xs, converted by the m_locale codecvt facet to internal_string_type.

Expression: Traits::imbue(loc)
Requirements:
    Effects: if m_locked, throw. Otherwise, m_locked = true; m_locale = loc;
    Returns: void
    Throws: basic_filesystem_error

Expression: Traits::imbue(loc, std::nothrow)
Requirements:
    Effects: if (!m_locked) m_locale = loc; bool temp(m_locked); m_locked = true;
    Returns: temp
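The locking behavior of the two imbue forms can be sketched with a toy traits class. The class name toy_traits and its private accessors are hypothetical, and std::runtime_error stands in for basic_filesystem_error, which is not defined at this point in the sketch. The code conversion members are omitted.

```cpp
#include <cassert>
#include <locale>
#include <new>        // std::nothrow_t, std::nothrow
#include <stdexcept>

// Hypothetical traits class modelling only the exposition-only members
// m_locked and m_locale and the two imbue forms.
struct toy_traits {
    // Throwing form: the locale may be imbued once; later calls throw.
    static void imbue(const std::locale& loc) {
        if (locked_flag())
            throw std::runtime_error("locale already imbued");  // stand-in exception
        locked_flag() = true;
        stored_locale() = loc;
    }

    // Non-throwing form: returns the previous lock state. The locale is
    // replaced only if it was not already locked.
    static bool imbue(const std::locale& loc, const std::nothrow_t&) {
        if (!locked_flag()) stored_locale() = loc;
        bool temp(locked_flag());
        locked_flag() = true;
        return temp;
    }

private:
    static bool& locked_flag() { static bool value = false; return value; }
    static std::locale& stored_locale() { static std::locale value; return value; }
};
```

A return value of true from the non-throwing form therefore tells the caller that the requested locale was not installed.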

Type is_basic_path shall be a UnaryTypeTrait (TR1, 4.1). The primary template shall be derived directly or indirectly from std::tr1::false_type. Type is_basic_path shall be specialized for path, wpath, and any user-specialized basic_path types, and such specializations shall be derived directly or indirectly from std::tr1::true_type.

Structure templates slash and dot are supplied with values of type char. If a user-specialized basic_path has a value_type which is not convertible from char, the templates slash and dot shall be specialized to provide a value whose type is convertible to basic_path::value_type.
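A minimal sketch of such a specialization follows. The types code_unit and my_path are hypothetical; my_path merely stands in for a user-specialized basic_path whose value_type cannot be initialized from char. The primary templates are repeated from the synopsis so that the sketch is self-contained.

```cpp
#include <cassert>

// Primary templates, as in the header synopsis.
template <class Path> struct slash { static const char value = '/'; };
template <class Path> struct dot   { static const char value = '.'; };

// Hypothetical character type that is not convertible from char.
struct code_unit { unsigned short code; };

// Hypothetical stand-in for a user-specialized basic_path.
struct my_path { typedef code_unit value_type; };

// Specializations supplying values of the path's own character type.
template <> struct slash<my_path> { static const code_unit value; };
template <> struct dot<my_path>   { static const code_unit value; };

const code_unit slash<my_path>::value = { 0x2F };  // '/'
const code_unit dot<my_path>::value   = { 0x2E };  // '.'
```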

Class template basic_path

namespace std
{
  namespace tr2
  {
    namespace sys
    {
      template <class String, class Traits> class basic_path
      {
      public:
        typedef basic_path<String, Traits> path_type;
        typedef String string_type;
        typedef typename String::value_type value_type;
        typedef Traits traits_type;
        typedef typename Traits::external_string_type external_string_type; 

        // constructors/destructor
        basic_path();
        basic_path(const basic_path& p);
        basic_path(const string_type& s, path_format_t=portable);
        basic_path(const value_type* s, path_format_t=portable);
        template <class InputIterator>
          basic_path(InputIterator first, InputIterator last, path_format_t=portable);

       ~basic_path();

        // assignments
        basic_path& operator=(const basic_path& p);
        basic_path& operator=(const string_type& s);
        basic_path& operator=(const value_type* s);
        template <class InputIterator>
          basic_path& assign(InputIterator first, InputIterator last, path_format_t=portable);

        // comparisons
        bool operator<(const basic_path& that) const;
        bool operator==(const basic_path& that) const;
        bool operator!=(const basic_path& that) const;
        bool operator>(const basic_path& that) const;
        bool operator<=(const basic_path& that) const;
        bool operator>=(const basic_path& that) const;

        // modifiers
        basic_path& operator/=(const basic_path& rhs);
        basic_path& operator/=(const string_type& s);
        basic_path& operator/=(const value_type* s);
        template <class InputIterator>
          basic_path& append(InputIterator first, InputIterator last, path_format_t=portable);

        basic_path& remove_leaf();

        // observers
        const string_type string() const;
        const string_type file_string() const;
        const string_type directory_string() const;

        const external_string_type external_file_string() const;
        const external_string_type external_directory_string() const;

        string_type  root_name() const;
        string_type  root_directory() const;
        basic_path   root_path() const;
        basic_path   relative_path() const;
        string_type  leaf() const;
        basic_path   branch_path() const;

        bool empty() const;
        bool is_complete() const;
        bool has_root_name() const;
        bool has_root_directory() const;
        bool has_root_path() const;
        bool has_relative_path() const;
        bool has_leaf() const;
        bool has_branch_path() const;

        // iterators
        class iterator;
        typedef iterator const_iterator;

        iterator begin() const;
        iterator end() const;

        // operators
        basic_path operator/(const basic_path& rhs) const;
        basic_path operator/(const string_type& s) const;
        basic_path operator/(const value_type* s) const;
        template <class InputIterator>
          basic_path concat(InputIterator first, InputIterator last, path_format_t=portable);
      };

    } // namespace sys
  } // namespace tr2
} // namespace std

A basic_path object stores a possibly empty path. The internal form of the stored path is unspecified.

Functions described in this clause which access files or their attributes do so by resolving a basic_path object into a particular file in a file hierarchy. The pathname, suitably converted to the string type, format, and encoding required by the operating system, is resolved as if by the POSIX Pathname Resolution mechanism. The encoding of the resulting pathname is determined by the Traits::to_external conversion function.

[Note: There is no guarantee that the path stored in a basic_path object is valid for a particular operating system or file system. -- end note]

Some functions in this clause return basic_path objects for paths composed partly or wholly of pathnames obtained from the operating system. Such pathnames are suitably converted from the actual format and string type supplied by the operating system. The encoding of the resulting path is determined by the Traits::to_internal conversion function.

For member functions described as returning "const string_type" or "const external_string_type", implementations are permitted to return "const string_type&" or  "const external_string_type&" respectively.

[Note: This allows implementations to avoid unnecessary copies. Return-by-value is specified as const to ensure programs won't break if moved to a return-by-reference implementation. -- end note]
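To give a feel for the modifiers and observers declared in the class synopsis, the following sketch uses C++17 std::filesystem::path as a stand-in for basic_path. This is an assumption made so the example can be compiled; in std::filesystem the observers corresponding to leaf() and branch_path() are named filename() and parent_path(). The helper names are hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;  // assumption: stand-in for the proposed std::tr2::sys

// Hypothetical helper: builds a path with operator/= and returns it in
// the generic (slash-separated) format.
std::string join_portable(const std::string& root,
                          const std::string& first,
                          const std::string& second) {
    fs::path p(root);
    p /= first;
    p /= second;
    return p.generic_string();
}

// Hypothetical helper: last element of the path (leaf() in the proposal).
std::string leaf_of(const fs::path& p) {
    return p.filename().generic_string();
}

// Hypothetical helper: the path without its last element
// (branch_path() in the proposal).
std::string branch_of(const fs::path& p) {
    return p.parent_path().generic_string();
}
```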

Pathname formats

There are two formats for string or sequence arguments that describe a path:

basic_path constructors with a path_format_t argument of native accept the native pathname format.

Implementations may define additional path_format_t argument values and associated formats.

All other string or sequence arguments that describe a path accept the portable pathname format. Implementations are encouraged to also accept the native pathname format if it is possible to distinguish the two in cases where interpretation differs. An implementation shall document whether or not the native pathname format is also accepted.

[Example:

-- OpenVMS: "SYS1::DISK1:[JANE.TYLER.HARRY]" is treated as a native pathname with a system name, drive name, and three directory filenames, rather than a portable pathname with one filename.

-- Windows: "c:\\jane\\tyler\\harry" is treated as a native pathname with a drive letter, root-directory, and three filenames, rather than a portable pathname with one filename.

-- Counter-example 1: An operating system that allows slashes in filenames and uses dot as a directory separator. Distinguishing between portable and native format argument strings or sequences is not possible as there is no other distinguishing syntax. The implementation does not accept native format pathnames unless the native argument is present.

-- Counter-example 2: An operating system that allows slashes in filenames and uses some unusual character as a directory separator. The implementation does accept native format pathnames without the additional native argument, which only has to be used for native format arguments containing slashes in filenames.

-- end example]

[Note: This duck-rule ("if it looks like a duck, walks like a duck, and quacks like a duck, it must be a duck") eliminates format confusion as a source of programmer error and support requests. -- end note]

If both the portable and native formats are accepted, implementations shall document what characters or character sequences are used to distinguish between portable and native formats.

[Note: Windows implementations are encouraged to define colons and backslashes as the characters which distinguish native from portable formats. --end note]
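As a minimal sketch of the distinguishing rule suggested in the note above, the following helper treats an argument as native format when it contains a colon or a backslash. The name looks_native_windows is hypothetical and is not part of the proposed interface; an actual implementation would document its own rule.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the proposed interface.
// Assumes the rule suggested for Windows: a colon or a backslash marks
// the string as native format; anything else is treated as portable.
inline bool looks_native_windows(const std::string& s)
{
  return s.find_first_of(":\\") != std::string::npos;
}
```

Under this rule "c:\\jane\\tyler\\harry" is classified as native, while "jane/tyler/harry" is classified as portable.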

Pathname grammar

The grammar for the portable pathname format is as follows:

pathname:
            root-nameopt root-directoryopt relative-pathopt

root-name:
            implementation-defined

root-directory:
            slash
            root-directory slash
            implementation-defined

relative-path:
            filename
            relative-path slash
            relative-path slash filename

filename:
            name
            dot
            dot dot

slash:
            slash<Path>::value

dot:
            dot<Path>::value

The grammar is aligned with the POSIX  Filename, Pathname and Pathname Resolution definitions. Any conflict between the grammar and POSIX is unintentional. This technical report defers to POSIX.

The form of the above wording was taken from POSIX, which uses it in several places to defer to the C standard.

[Note: Windows implementations are encouraged to define slash slash name as a permissible root-name. POSIX permits, but does not require, implementations to do the same. Windows implementations are encouraged to define an additional root-directory element root-directory name. It is applicable only to the slash slash name form of root-name.

Windows implementations are encouraged to recognize a name followed by a colon as a native format root-name, and a backslash as a format element equivalent to slash. -- end note]

Filename conversion

When converting filenames to the native operating system format, implementations are encouraged, but not required, to convert otherwise invalid characters or character sequences to valid characters or character sequences. Such conversions are implementation-defined.

[Note: Filename conversion allows much wider portability of both programs and filenames than would otherwise be possible.

Implementations are encouraged to base conversion on existing standards or practice. Examples include the Uniform Resource Locator escape syntax of a percent sign ('%') followed by two hex digits representing the character value. On OpenVMS, which does not allow percent signs in filenames, a dollar sign ('$') followed by two hex digits is the existing practice, as is converting lowercase letters to uppercase. -- end note.]
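The percent-sign escape mentioned in the note can be sketched as follows. This is an illustration only: the function name escape_filename and the caller-supplied set of invalid characters are assumptions, and the code uses a C++11 range-based for loop for brevity. The escape character itself is also escaped so that the conversion stays unambiguous.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: replaces each character of `name` that appears in
// `invalid`, and the escape character '%' itself, with '%' followed by
// two uppercase hex digits representing the character value.
inline std::string escape_filename(const std::string& name,
                                   const std::string& invalid)
{
  static const char hex[] = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : name)
  {
    if (c == '%' || invalid.find(static_cast<char>(c)) != std::string::npos)
    {
      out += '%';
      out += hex[c >> 4];    // high nibble
      out += hex[c & 0x0F];  // low nibble
    }
    else
      out += static_cast<char>(c);
  }
  return out;
}
```

An OpenVMS-style variant would substitute '$' for '%' and could additionally fold letters to uppercase.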

The Boost implementation for Windows currently does not map invalid characters. Pending feedback from the LWG, Boost may settle on % hex hex as the preferred escape sequence. If so, should there be normative encouragement?

Requirements

The argument for the template parameter named String shall be a class that includes members with the same names, types, values, and semantics as class template basic_string.

The argument for the template parameter named Traits shall be a class that satisfies the requirements specified in the Path Behavior Traits Requirements table.

The argument for template parameters named InputIterator shall satisfy the requirements of an input iterator (C++ Std, 24.1.1, Input iterators [lib.input.iterators]) and shall have a value type convertible to basic_path::value_type.

Some function templates with a template parameter named InputIterator also have non-template overloads. Implementations shall only select the function template overload if the type named by InputIterator is not path_format_t.

[Note: This "do-the-right-thing" rule ensures that the overload expected by the user is selected. The implementation technique is unspecified - implementations may use enable_if or other techniques to achieve the effect. -- end note]

basic_path constructors

basic_path();

Postconditions: empty().

basic_path(const string_type& s, path_format_t=portable);
basic_path(const value_type * s, path_format_t=portable);
template <class InputIterator>
  basic_path(InputIterator first, InputIterator last, path_format_t=portable);

Remarks: The format of string s and sequence [first,last) is described in Pathname formats.

Effects: The path elements in string s or sequence [first,last) are stored.

basic_path assignments

basic_path& operator=(const string_type& s);
basic_path& operator=(const value_type* s);
template <class InputIterator>
  basic_path& assign(InputIterator first, InputIterator last, path_format_t=portable);

Remarks: The format of string s and sequence [first,last) is described in Pathname formats.

Effects: The path elements in string s or sequence [first,last) are stored.

Returns: *this

basic_path comparisons

[Note: Path equality and path equivalence have different semantics.

Equality is determined by basic_path's operator==, which considers the two paths' lexical representations only. Paths "abc" and "ABC" are never equal.

Equivalence is determined by the equivalent() non-member function, which determines if two paths resolve to the same file system entity. Paths "abc" and "ABC" may or may not resolve to the same file, depending on the file system. 

Programmers wishing to determine if two paths are "the same" must decide if "the same" means "the same representation" or "resolve to the same actual file", and choose the appropriate function accordingly. -- end note]
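The distinction can be illustrated with a short sketch. Because std::tr2::sys is only proposed here, the sketch assumes the C++17 std::filesystem library as a stand-in; the helper names same_representation and same_file are hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;  // C++17 stand-in, not std::tr2::sys

// Lexical equality: compares representations only, never touches the disk.
inline bool same_representation(const fs::path& a, const fs::path& b)
{
  return a == b;
}

// Equivalence: asks the file system whether both paths resolve to the
// same entity. Returns false if the query reports an error (for example,
// when neither path exists).
inline bool same_file(const fs::path& a, const fs::path& b)
{
  std::error_code ec;
  const bool result = fs::equivalent(a, b, ec);
  return !ec && result;
}
```

With these helpers, "." and "./." have different representations but resolve to the same directory, whereas "abc" and "ABC" never have the same representation regardless of the file system.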

bool operator<(const basic_path& that) const;

Returns: std::lexicographical_compare(begin(), end(), that.begin(), that.end())

[Note: Relational operators ease specifying paths as keys in associative containers. Lexicographical comparison is specified because although not full-fledged containers, paths are enough like containers to merit meeting container comparison requirements (23.1 table 65). -- end note]

bool operator==(const basic_path& that) const;

Returns: !(*this < that) && !(that < *this)

bool operator!=(const basic_path& that) const;

Returns: !(*this == that)

bool operator>(const basic_path& that) const;

Returns: that < *this

bool operator<=(const basic_path& that) const;

Returns: !(that < *this)

bool operator>=(const basic_path& that) const;

Returns: !(*this < that)
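The way the remaining relational operators follow from operator< can be sketched with a stand-in type. The struct demo_path below is hypothetical and models a path only as its sequence of elements; it is not the proposed basic_path.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a path: just its sequence of elements.
struct demo_path
{
  std::vector<std::string> elements;
};

// Element-wise lexicographical comparison, as in the Returns clause
// for operator<.
inline bool operator<(const demo_path& a, const demo_path& b)
{
  return std::lexicographical_compare(a.elements.begin(), a.elements.end(),
                                      b.elements.begin(), b.elements.end());
}

// The other operators are expressed purely in terms of operator<.
inline bool operator==(const demo_path& a, const demo_path& b) { return !(a < b) && !(b < a); }
inline bool operator!=(const demo_path& a, const demo_path& b) { return !(a == b); }
inline bool operator>(const demo_path& a, const demo_path& b)  { return b < a; }
inline bool operator<=(const demo_path& a, const demo_path& b) { return !(b < a); }
inline bool operator>=(const demo_path& a, const demo_path& b) { return !(a < b); }
```

Because everything derives from a single strict weak ordering, such a type can be used directly as a key in std::set or std::map.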

basic_path modifiers

basic_path& operator/=(const basic_path& rhs);

Effects: The path stored in rhs is appended to the stored path.

Returns: *this

basic_path& operator/=(const string_type& s);
basic_path& operator/=(const value_type* s);
template <class InputIterator>
basic_path& append(InputIterator first, InputIterator last, path_format_t=portable);

Remarks: The format of string s and sequence [first,last) is described in Pathname formats.

Effects: The path elements in string s or sequence [first,last) are appended to the stored path.

Returns: *this

basic_path& remove_leaf();

Effects: If has_branch_path() then remove the last filename from the stored path. If that leaves the stored path with one or more trailing slash elements not representing  root-directory, remove them.

Returns: *this

[Note: This function is needed to efficiently implement basic_directory_iterator. It is made public to allow additional uses. -- end note]

basic_path observers

See the Path decomposition table for examples for values returned by decomposition functions.

const string_type string() const;

Returns: The stored path, formatted according to the Pathname grammar rules.

const string_type file_string() const;

Returns: The stored path, formatted according to the operating system rules for regular file pathnames, with any Filename conversion applied.

[Note: For some operating systems, including POSIX and Windows, the native format for regular file pathnames and directory pathnames is the same, so file_string() and directory_string() return the same string. On OpenVMS, however, the expression path("/cats/jane").file_string() would return the string "[CATS]JANE" while path("/cats/jane").directory_string() would return the string "[CATS.JANE]". -- end note]

const string_type directory_string() const;

Returns: The stored path, formatted according to the operating system rules for directory pathnames, with any Filename conversion applied.

const external_string_type external_file_string() const;

Returns: The stored path, formatted according to the operating system rules for regular file pathnames, with any Filename conversion applied, and encoded by the Traits::to_external conversion function.

const external_string_type external_directory_string() const;

Returns: The stored path, formatted according to the operating system rules for directory pathnames, with any Filename conversion applied, and encoded by the Traits::to_external conversion function.

string_type root_name() const;

Returns: root-name, if the stored path includes root-name, otherwise string_type().

string_type root_directory() const;

Returns: root-directory, if the stored path includes root-directory, otherwise string_type().

If root-directory is composed of slash name, slash is excluded from the returned string.

basic_path root_path() const;

Returns: root_name() / root_directory()

basic_path relative_path() const;

Returns: A basic_path composed from the stored path, if any, beginning with the first filename after root-path. Otherwise, an empty basic_path.

string_type leaf() const;

Returns: empty() ? string_type() : *--end()

basic_path branch_path() const;

Returns: (string().empty() || begin() == --end()) ? path_type("") : br, where br is constructed as if by starting with an empty basic_path and successively applying operator/= for each element in the range begin(), --end().

bool empty() const;

Returns: string().empty().

bool is_complete() const;

Returns: true, if the elements of root_path() uniquely identify a directory, else false.

bool has_root_path() const;

Returns: !root_path().empty()

bool has_root_name() const;

Returns: !root_name().empty()

bool has_root_directory() const;

Returns: !root_directory().empty()

bool has_relative_path() const;

Returns: !relative_path().empty()

bool has_leaf() const;

Returns: !leaf().empty()

bool has_branch_path() const;

Returns: !branch_path().empty()
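The decomposition observers can be tried out with the following sketch. It assumes C++17 std::filesystem as a stand-in for the proposed interface, where filename() and parent_path() are the closest counterparts of leaf() and branch_path(); results may differ from this proposal in edge cases such as trailing slashes. The struct decomposition and the function decompose are hypothetical names.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace fs = std::filesystem;  // C++17 stand-in, not std::tr2::sys

// Hypothetical record holding the pieces of a decomposed path.
struct decomposition
{
  std::string root_name, root_directory, relative_path, leaf, branch_path;
};

inline decomposition decompose(const fs::path& p)
{
  decomposition d;
  d.root_name      = p.root_name().generic_string();
  d.root_directory = p.root_directory().generic_string();
  d.relative_path  = p.relative_path().generic_string();
  d.leaf           = p.filename().generic_string();     // closest to leaf()
  d.branch_path    = p.parent_path().generic_string();  // closest to branch_path()
  return d;
}
```

For "/cats/jane" this yields a root directory of "/", a relative path of "cats/jane", a leaf of "jane", and a branch path of "/cats".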

basic_path iterators

A basic_path::iterator is a constant iterator satisfying all the requirements of a bidirectional iterator (C++ Std, 24.1.4 Bidirectional iterators [lib.bidirectional.iterators]). Its value_type is string_type.

Calling any non-const member function of a basic_path object invalidates all iterators referring to elements of the object.

The forward traversal order is as follows:

The backward traversal order is the reverse of forward traversal.

iterator begin() const;

Returns: An iterator for the first present element in the traversal list above. If no elements are present, the end iterator.

iterator end() const;

Returns: The end iterator.

basic_path operators

basic_path operator /(const basic_path& rhs) const;
basic_path operator /(const string_type& s) const;
basic_path operator /(const value_type* s) const;
template <class InputIterator>
basic_path concat(InputIterator first, InputIterator last, path_format_t=portable);

Remarks: The format of string s and sequence [first,last) is described in Pathname formats.

Returns: basic_path(*this) with rhs, s, or [first,last) appended, as if by operator/= or append.

Class template basic_filesystem_error

namespace std
{
  namespace tr2
  {
    namespace sys
    {
      template <class Path> class basic_filesystem_error : public std::runtime_error
      {
      public:
        typedef Path path_type;

        explicit basic_filesystem_error(const std::string& msg, system_error_type ec=0);
        basic_filesystem_error(const std::string& msg, const path_type& p1, system_error_type ec);
        basic_filesystem_error(const std::string& msg, const path_type& p1, const path_type& p2, system_error_type ec);
        basic_filesystem_error(const basic_filesystem_error& bfe);
        basic_filesystem_error& operator=(const basic_filesystem_error& bfe);
       ~basic_filesystem_error();

        const std::string& message() const;
        system_error_type system_error() const;
        const path_type& path1() const;
        const path_type& path2() const;
      };

    } // namespace sys
  } // namespace tr2
} // namespace std

The class template basic_filesystem_error defines the type of objects thrown as exceptions to report file system errors from functions described in this clause.

basic_filesystem_error constructors

explicit basic_filesystem_error(const std::string& msg, system_error_type ec=0);

Postconditions:

Expression         Value
message()          Reference to stored copy of msg
system_error()     ec
path1().empty()    true
path2().empty()    true

basic_filesystem_error(const std::string& msg, const path_type& p1, system_error_type ec);

Postconditions:

Expression         Value
message()          Reference to stored copy of msg
system_error()     ec
path1()            Reference to stored copy of p1
path2().empty()    true

basic_filesystem_error(const std::string& msg, const path_type& p1, const path_type& p2, system_error_type ec);

Postconditions:

Expression         Value
message()          Reference to stored copy of msg
system_error()     ec
path1()            Reference to stored copy of p1
path2()            Reference to stored copy of p2

basic_filesystem_error observers

const std::string& message() const;

Returns: Reference to copy of  msg stored by the constructor, or, if none, an empty string.

system_error_type system_error() const;

Returns: The value of ec stored by the constructor.

const path_type& path1() const;

Returns: Reference to copy of p1 stored by the constructor, or, if none, an empty path.

const path_type& path2() const;

Returns: Reference to copy of p2 stored by the constructor, or, if none, an empty path.
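A minimal sketch of an exception class meeting these postconditions is shown below. The names demo_filesystem_error and demo_error_type are hypothetical stand-ins; in particular, system_error_type is defined elsewhere in the proposal and is merely assumed to be an integer here.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

typedef int demo_error_type;  // assumption: stands in for system_error_type

template <class Path>
class demo_filesystem_error : public std::runtime_error
{
public:
  explicit demo_filesystem_error(const std::string& msg, demo_error_type ec = 0)
    : std::runtime_error(msg), m_msg(msg), m_ec(ec) {}
  demo_filesystem_error(const std::string& msg, const Path& p1, demo_error_type ec)
    : std::runtime_error(msg), m_msg(msg), m_p1(p1), m_ec(ec) {}
  demo_filesystem_error(const std::string& msg, const Path& p1, const Path& p2,
                        demo_error_type ec)
    : std::runtime_error(msg), m_msg(msg), m_p1(p1), m_p2(p2), m_ec(ec) {}

  // Observers return references to the copies stored by the constructor;
  // paths that were not supplied remain default-constructed (empty).
  const std::string& message() const { return m_msg; }
  demo_error_type system_error() const { return m_ec; }
  const Path& path1() const { return m_p1; }
  const Path& path2() const { return m_p2; }

private:
  std::string     m_msg;
  Path            m_p1;
  Path            m_p2;
  demo_error_type m_ec;
};
```

Any type with a default constructor that yields an empty value can serve as Path in this sketch, including std::string.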

Class template basic_directory_entry

namespace std
{
  namespace tr2
  {
    namespace sys
    {
      template <class Path> class basic_directory_entry
      {
      public:
        typedef Path path_type;
        typedef typename Path::string_type string_type;

        // constructors
        basic_directory_entry();
        explicit basic_directory_entry(const path_type& p, status_flags sf=0, status_flags symlink_sf=0);

        // modifiers
        void assign(const path_type& p, status_flags sf=0, status_flags symlink_sf=0);
        void replace_leaf(const string_type& s, status_flags sf=0, status_flags symlink_sf=0);

        // observers
        const Path& path() const;
        operator const Path&() const;

        status_flags  status(system_error_type* ec=0) const;
        status_flags  status(const symlink_t&, system_error_type* ec=0) const;

        bool exists() const;
        bool is_directory() const;
        bool is_regular() const;
        bool is_other() const;
        bool is_symlink() const;

        // comparisons
        bool operator<(const basic_directory_entry<Path>& rhs);
        bool operator==(const basic_directory_entry<Path>& rhs);
        bool operator!=(const basic_directory_entry<Path>& rhs);
        bool operator>(const basic_directory_entry<Path>& rhs);
        bool operator<=(const basic_directory_entry<Path>& rhs);
        bool operator>=(const basic_directory_entry<Path>& rhs);

      private:
        path_type             m_path;           // for exposition only
        mutable status_flags  m_status;         // for exposition only; stat()-like
        mutable status_flags  m_symlink_status; // for exposition only; lstat()-like
      };

    } // namespace sys
  } // namespace tr2
} // namespace std

A basic_directory_entry object stores a basic_path object, a status_flags object for non-symbolic link status, and a status_flags object for symbolic link status. The status_flags objects act as value caches.

[Note: Because status() may be a very expensive operation, caching of status flags can result in significant time savings. Cached and non-cached results may differ in the presence of race conditions. -- end note]

Actual cold-boot timing of iteration over a directory with 15,047 entries was six seconds for non-cached status queries versus one second for cached status queries, measured on Windows XP with a 3.0 GHz processor and a moderately fast hard drive. A similar speedup is expected on Linux and BSD-derived Unix variants that provide status during directory iteration.

basic_directory_entry constructors

basic_directory_entry();

Postconditions:

Expression         Value
path().empty()     true
status()           0
status(symlink)    0

explicit basic_directory_entry(const path_type& p, status_flags sf=0, status_flags symlink_sf=0);

Postconditions:

Expression         Value
path()             p
status()           sf
status(symlink)    symlink_sf

basic_directory_entry modifiers

void assign(const path_type& p, status_flags sf=0, status_flags symlink_sf=0);

Postconditions:

Expression         Value
path()             p
status()           sf
status(symlink)    symlink_sf

void replace_leaf(const string_type& s, status_flags sf=0, status_flags symlink_sf=0);

Postconditions:

Expression         Value
path()             path().branch_path() / s
status()           sf
status(symlink)    symlink_sf

basic_directory_entry observers

const Path& path() const;
operator const Path&() const;

Returns: m_path

status_flags status(system_error_type* ec=0) const;

Effects: if m_status is zero, set m_status to sys::status(m_path, ec)

Returns: m_status

status_flags status(const symlink_t&, system_error_type* ec=0) const;

Effects: if m_symlink_status is zero, set m_symlink_status to sys::status(m_path, symlink, ec)

Returns: m_symlink_status
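The caching behavior of the two status() observers can be sketched in isolation. The class cached_status and the typedef demo_status_flags are hypothetical; the expensive operating system query is modeled by a caller-supplied function object, and a stored value of zero is assumed to mean "not yet determined", as in the Effects clauses above.

```cpp
#include <cassert>
#include <functional>

typedef unsigned demo_status_flags;  // assumption: zero means "not yet known"

class cached_status
{
public:
  explicit cached_status(std::function<demo_status_flags()> query)
    : m_query(query), m_status(0) {}

  // Runs the (possibly expensive) query only while the cache holds zero,
  // then keeps returning the cached value.
  demo_status_flags status() const
  {
    if (m_status == 0)
      m_status = m_query();
    return m_status;
  }

private:
  std::function<demo_status_flags()> m_query;
  mutable demo_status_flags          m_status;  // mutable: filled in lazily
};
```

A consequence of this design is that a cached value is never refreshed, which is why cached and non-cached results can differ when the file system changes concurrently.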

bool exists() const;

Returns: this->status() != not_found_flag

bool is_directory() const;

Returns: (this->status() & directory_flag) != 0

bool is_regular() const;

Returns: (this->status() & regular_flag) != 0

bool is_other() const;

Returns: (this->status() & other_flag) != 0

bool is_symlink() const;

Returns: (this->status(symlink) & symlink_flag) != 0

basic_directory_entry comparisons

bool operator<(const basic_directory_entry<Path>& rhs);

Returns: path()<rhs.path()

bool operator==(const basic_directory_entry<Path>& rhs);

Returns: path()==rhs.path()

bool operator!=(const basic_directory_entry<Path>& rhs);

Returns: path()!=rhs.path()

bool operator>(const basic_directory_entry<Path>& rhs);

Returns: path()>rhs.path()

bool operator<=(const basic_directory_entry<Path>& rhs);

Returns: path()<=rhs.path()

bool operator>=(const basic_directory_entry<Path>& rhs);

Returns: path()>=rhs.path()

Class template basic_directory_iterator

namespace std
{
  namespace tr2
  {
    namespace sys
    {
      template <class Path>
      class basic_directory_iterator :
        public iterator<input_iterator_tag, basic_directory_entry<Path> >
      {
      public:
        typedef Path path_type;

        // constructors
        basic_directory_iterator();
        explicit basic_directory_iterator(const Path& dp);
        basic_directory_iterator(const basic_directory_iterator& bdi);
        basic_directory_iterator& operator=(const basic_directory_iterator& bdi);
       ~basic_directory_iterator();

        // other members as required by
        //  C++ Std, 24.1.1 Input iterators [lib.input.iterators]
      };

    } // namespace sys
  } // namespace tr2
} // namespace std

basic_directory_iterator satisfies the requirements of an input iterator (C++ Std, 24.1.1, Input iterators [lib.input.iterators]).

A basic_directory_iterator reads successive elements from the directory for which it was constructed, as if by calling POSIX readdir_r(). After a basic_directory_iterator is constructed, and every time operator++ is called, it reads and stores a value of basic_directory_entry<Path> and possibly stores associated status values. operator++ is not equality preserving; that is, i == j does not imply that ++i == ++j.

[Note: The practical consequence of not preserving equality is that directory iterators can be used only for single-pass algorithms. --end note]

If the end of the directory elements is reached, the iterator becomes equal to the end iterator value. The constructor basic_directory_iterator() with no arguments always constructs an end iterator object, which is the only legitimate iterator to be used for the end condition. The result of operator* on an end iterator is not defined. For any other iterator value a const basic_directory_entry<Path>& is returned. The result of operator-> on an end iterator is not defined. For any other iterator value a const basic_directory_entry<Path>* is returned.

Two end iterators are always equal. An end iterator is not equal to a non-end iterator.

The above wording is based on the Standard Library's istream_iterator wording. Commentary was shortened and moved into a note.

The result of calling the path() member of the basic_directory_entry object obtained by dereferencing a basic_directory_iterator is a reference to a basic_path object composed of the directory argument from which the iterator was constructed with filename of the directory entry appended as if by operator/=.

[Example: This program accepts an optional command line argument, and if that argument is a directory pathname, iterates over the contents of the directory. For each directory entry, the name is output, and if the entry is for a regular file, the size of the file is output.

#include <iostream>
#include <filesystem>

using namespace std::tr2::sys;
using std::cout;

int main(int argc, char* argv[])
{
  path p(argc <= 1 ? "." : argv[1]);

  if (is_directory(p))
  {
    for (directory_iterator itr(p); itr!=directory_iterator(); ++itr)
    {
      cout << itr->path().leaf() << ' '; // display filename only
      if (itr->is_regular()) cout << " [" << file_size(itr->path()) << ']';
      cout << '\n';
    }
  }
  else cout << (exists(p) ? "Found: " : "Not found: ") << p.string() << '\n';

  return 0;
}

-- end example]

Directory iteration shall not yield directory entries for the current (dot) and parent (dot dot) directories.

The order of directory entries obtained by dereferencing successive increments of a basic_directory_iterator is unspecified.

[Note: Programs performing directory iteration may wish to test if the path obtained by dereferencing a directory iterator actually exists. It could be a symbolic link to a non-existent file. Programs recursively walking directory trees for purposes of removing and renaming entries may wish to avoid following symbolic links.

If a file is removed from or added to a directory after the construction of a basic_directory_iterator for the directory, it is unspecified whether or not subsequent incrementing of the iterator will ever result in an iterator whose value is the removed or added directory entry. See POSIX readdir_r(). --end note]

basic_directory_iterator constructors

basic_directory_iterator();

Effects: Constructs the end iterator.

explicit basic_directory_iterator( const Path & dp );

Effects: Constructs an iterator with a value representing the first entry in the directory resolved to by dp, or, if the directory is empty, the end iterator value.

[Note: To iterate over the current directory, write directory_iterator(".") rather than directory_iterator(""). -- end note]

Class template basic_recursive_directory_iterator

namespace std
{
  namespace tr2
  {
    namespace sys
    {
      template <class Path>
      class basic_recursive_directory_iterator :
        public iterator<input_iterator_tag, basic_directory_entry<Path> >
      {
      public:
        typedef Path path_type;

        // constructors
        basic_recursive_directory_iterator();
        explicit basic_recursive_directory_iterator(const Path& dp);
        basic_recursive_directory_iterator(const basic_recursive_directory_iterator& brdi);
        basic_recursive_directory_iterator& operator=(const basic_recursive_directory_iterator& brdi);
       ~basic_recursive_directory_iterator();

        // observers
        int level() const;

        // modifiers
        void pop();
        void no_push();

        // other members as required by
        //  C++ Std, 24.1.1 Input iterators [lib.input.iterators]

      private:
        int m_level; // for exposition only
      };

    } // namespace sys
  } // namespace tr2
} // namespace std

The behavior of a basic_recursive_directory_iterator is the same as a basic_directory_iterator unless otherwise specified.

[Note: One of the uses of no_push() is to prevent unwanted recursion into symlinked directories. This may be necessary to prevent loops on some operating systems. -- end note]
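Pruning a recursive walk can be sketched as follows, assuming C++17 std::filesystem as a stand-in, where depth() and disable_recursion_pending() are the counterparts of level() and no_push(). The function walk_pruned and its max_depth parameter are hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <vector>

namespace fs = std::filesystem;  // C++17 stand-in, not std::tr2::sys

// Collects every entry reachable from root, but refuses to descend into
// symbolic links or into directories at or below max_depth.
inline std::vector<fs::path> walk_pruned(const fs::path& root, int max_depth)
{
  std::vector<fs::path> found;
  for (fs::recursive_directory_iterator it(root), end; it != end; ++it)
  {
    found.push_back(it->path());
    if (it->is_symlink() || it.depth() >= max_depth)
      it.disable_recursion_pending();  // counterpart of no_push()
  }
  return found;
}
```

The pruning call affects only the next increment, so it has to be repeated for each entry that should not be entered.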

Non-member operational functions

Status functions

template <class Path> status_flags status(const Path& p, system_error_type* ec=0);
template <class Path> status_flags status(const Path& p, const symlink_t&, system_error_type* ec=0);

Returns:

If p.empty(): If the symlink_t argument is not present, determine the attributes of p as if by POSIX stat(), else determine the attributes as if by POSIX lstat().

[Note: For symbolic links, stat() continues pathname resolution using the contents of the symbolic link, lstat() does not. -- end note]

If the attribute determination reports an error:

Otherwise:

[Note: directory_flag implies basic_directory_iterator on the file would succeed, and regular_flag implies appropriate <fstream> operations would succeed, assuming no hardware, permission, or access errors and no race conditions. For regular_flag, the converse is not true; lack of regular_flag does not necessarily imply <fstream> operations would fail on a directory or other file. -- end note]

Predicate functions

template <class Path> bool exists(const Path& p);

Effects: Determines status_flags sf, as if by status(p).

Throws: basic_filesystem_error<Path> if sf == error_flag.

Returns: sf != not_found_flag

template <class Path> bool is_directory(const Path& p);

Effects: Determines status_flags sf, as if by status(p).

Throws: basic_filesystem_error<Path> if sf == error_flag.

Returns: (sf & directory_flag) != 0

template <class Path> bool is_regular(const Path& p);

Effects: Determines status_flags sf, as if by status(p).

Throws: basic_filesystem_error<Path> if sf == error_flag.

Returns: (sf & regular_flag) != 0

template <class Path> bool is_other(const Path& p);

Effects: Determines status_flags sf, as if by status(p).

Throws: basic_filesystem_error<Path> if sf == error_flag.

Returns: (sf & other_flag) != 0

template <class Path> bool is_symlink(const Path& p);

Effects: Determines status_flags sf, as if by status(p, symlink).

Throws: basic_filesystem_error<Path> if sf == error_flag.

Returns: (sf & symlink_flag) != 0

template <class Path> bool empty(const Path& p);

Effects: Determines status_flags sf, as if by status(p).

Throws: basic_filesystem_error<Path> if sf == error_flag || sf == not_found_flag || sf == other_flag.

Returns: (sf & directory_flag) != 0
          ? basic_directory_iterator<Path>(p) == basic_directory_iterator<Path>()
          : file_size(p) == 0;
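The emptiness test can be sketched with C++17 std::filesystem as a stand-in: a directory is empty when an iterator constructed for it immediately compares equal to the end iterator, and any other file is empty when its size is zero. The function name is_empty_sketch is hypothetical.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;  // C++17 stand-in, not std::tr2::sys

// Throws fs::filesystem_error if p cannot be examined.
inline bool is_empty_sketch(const fs::path& p)
{
  return fs::is_directory(p)
    ? fs::directory_iterator(p) == fs::directory_iterator()  // no entries at all
    : fs::file_size(p) == 0;
}
```

The comparison is against a default-constructed iterator because that is the only value that represents the end condition.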

template <class Path1, class Path2> bool equivalent(const Path1& p1, const Path2& p2);

Requires: Path1::external_string_type and Path2::external_string_type are the same type.

Effects: Determines status_flags sf1 and sf2, as if by status(p1) and status(p2), respectively.
Then, throws if sf1 == error_flag || sf2 == error_flag || (sf1 == not_found_flag && sf2 == not_found_flag) || (sf1 == other_flag && sf2 == other_flag).

Throws: basic_filesystem_error<Path1>

Returns: true, if sf1 == sf2 and p1 and p2 resolve to the same file system entity, else false.

Two paths are considered to resolve to the same file system entity if two candidate entities reside on the same device at the same location. This is determined as if by the values of the POSIX stat structure, obtained as if by stat() for the two paths, having equal st_dev values and equal st_ino values.

[Note: POSIX requires that "st_dev must be unique within a Local Area Network". Conservative POSIX implementations may also wish to check for equal st_size and st_mtime values. Windows implementations may use GetFileInformationByHandle() as a surrogate for stat(), and consider "same" to be equal values for dwVolumeSerialNumber, nFileIndexHigh, nFileIndexLow, nFileSizeHigh, nFileSizeLow, ftLastWriteTime.dwLowDateTime, and ftLastWriteTime.dwHighDateTime. -- end note]
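On a POSIX system the underlying comparison can be sketched directly in terms of stat(). The function same_entity_posix is a hypothetical name, and the sketch simplifies error handling by returning false where the specification above would throw in some cases.

```cpp
#include <cassert>
#include <sys/stat.h>  // POSIX only

// Returns true if both paths resolve to the same device and inode.
// Simplification: any stat() failure yields false instead of an exception.
inline bool same_entity_posix(const char* p1, const char* p2)
{
  struct stat s1, s2;
  if (::stat(p1, &s1) != 0 || ::stat(p2, &s2) != 0)
    return false;
  return s1.st_dev == s2.st_dev && s1.st_ino == s2.st_ino;
}
```

Because stat() follows symbolic links, a link and its target compare as the same entity under this test.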

Attribute functions

[Note: A strictly limited number of attribute functions are provided because few file system attributes are even somewhat portable. Even the functions provided will be impossible to implement on some file systems. --end note.]

template <class Path> const Path& initial_path();

Returns: current_path() at the time of entry to main().

[Note: These semantics turn a dangerous global variable into a safer global constant. --end note]

[Note: Full implementation requires runtime library support. Implementations which cannot provide runtime library support are encouraged to instead store the value of current_path() at the first call of initial_path(), and return this value for all subsequent calls. Programs using initial_path() are encouraged to call it immediately on entrance to main() so that they will work correctly with such partial implementations. --end note]
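The partial implementation described in the note, which captures the current path on the first call, can be sketched with a function-local static. The sketch assumes C++17 std::filesystem as a stand-in, and the name initial_path_sketch is hypothetical.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;  // C++17 stand-in, not std::tr2::sys

// Captures current_path() on the first call and returns the same stored
// object on every later call, even if the current path changes afterwards.
inline const fs::path& initial_path_sketch()
{
  static const fs::path initial = fs::current_path();
  return initial;
}
```

As the note advises, a program relying on such an implementation would call the function at the very start of main() so that the captured value matches the path at program entry.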

template <class Path> Path current_path();

Returns: The current path, as if by POSIX getcwd().

Postcondition: current_path().is_complete()

[Note: The current path as returned by many operating systems is a dangerous global variable. It may be changed unexpectedly by third-party or system library functions, or by another thread. Although dangerous, the function is useful in dealing with other libraries. For a safer alternative, see initial_path(). The current_path() name was chosen to emphasize that the return is a complete path, not just a single directory name. -- end note]

template <class Path> intmax_t file_size(const Path& p);

Returns: The size in bytes of the file p resolves to, determined as if by the value of the POSIX stat structure member st_size obtained as if by POSIX stat().

template <class Path> std::time_t last_write_time(const Path& p);

Returns: The time of last data modification of p, determined as if by the value of the POSIX stat structure member st_mtime  obtained as if by POSIX stat().

template <class Path> void last_write_time(const Path& p, const std::time_t new_time);

Effects: Sets the time of last data modification of the file resolved to by p to new_time, as if by POSIX stat() followed by POSIX utime().

[Note: The apparent postcondition last_write_time(p) == new_time is not specified since it would not hold for many file systems due to coarse time mechanism granularity. -- end note]

Other operations functions

template <class Path> bool create_directory(const Path& dp);

Requires: !dp.empty()

Effects: Attempts to create the directory dp resolves to, as if by POSIX mkdir() with a second argument of S_IRWXU|S_IRWXG|S_IRWXO.

Throws: basic_filesystem_error<Path> if Effects fails for any reason other than because the directory already exists.

Returns: True if a new directory was created, otherwise false.

Postcondition: is_directory(dp)
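The meaning of the return value can be demonstrated with C++17 std::filesystem as a stand-in, whose create_directory likewise returns true only when a new directory was created. The wrapper ensure_directory is a hypothetical name; it spells out the Requires and Postcondition clauses as assertions.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;  // C++17 stand-in, not std::tr2::sys

// Returns true if the directory was newly created, false if it already
// existed. Other failures surface as fs::filesystem_error.
inline bool ensure_directory(const fs::path& dp)
{
  assert(!dp.empty());                            // Requires
  const bool created = fs::create_directory(dp);
  assert(fs::is_directory(dp));                   // Postcondition
  return created;
}
```

Calling the function twice in a row on the same path therefore yields true and then false, without an exception on the second call.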

template <class Path1, class Path2>
  void create_hard_link(const Path1& old_p, const Path2& new_p);

Requires: Path1::external_string_type and Path2::external_string_type are the same type. !old_p.empty() && !new_p.empty()

Effects: Establishes the postcondition, as if by POSIX link().

Postcondition:

[Note: Many operating systems do not support hard links or support them only for regular files. Some operating systems limit the number of links per file to a fairly small value - 1023 on Windows NTFS, for example. Operating systems cannot support hard links on file systems that do not support them - the FAT system used on floppy discs, memory cards and flash drives, is a common example. Thus hard links should be avoided if wide portability is a concern. -- end note]

template <class Path> bool remove(const Path& p);

Precondition: !p.empty()

Effects:  Attempts to delete the file p resolves to, as if by POSIX remove().

Returns: The value of exists(p) prior to the establishment of the postcondition.

Postcondition: !exists(p)

Throws: basic_filesystem_error<Path> if:

[Note: A symbolic link is itself removed, rather than what it resolves to being removed. -- end note]

template <class Path1, class Path2> void rename(const Path1& from_p, const Path2& to_p);

Requires: Path1::external_string_type and Path2::external_string_type are the same type. !from_p.empty() && !to_p.empty()

Effects: Renames from_p to to_p, as if by POSIX rename().

Postconditions: !exists(from_p) && exists(to_p), and the contents and attributes of the file originally named from_p are otherwise unchanged.

[Note: If from_p and to_p resolve to the same file, no action is taken. Otherwise, if to_p resolves to an existing file, it is removed. A symbolic link is itself renamed, rather than the file it resolves to being renamed. -- end note]

template <class Path1, class Path2> void copy_file(const Path1& from_fp, const Path2& to_fp);

Requires: Path1::external_string_type and Path2::external_string_type are the same type. !from_fp.empty() && !to_fp.empty()

Effects: The contents and attributes of the file from_fp resolves to are copied to the file to_fp resolves to.

Throws: basic_filesystem_error<Path> if from_fp.empty() || to_fp.empty() ||!exists(from_fp) || !is_regular(from_fp) || exists(to_fp)

template <class Path> Path complete(const Path& p, const Path& base=initial_path<Path>());

Requires: base.is_complete() && (p.is_complete() || !p.has_root_name())

Effects: Composes a complete path from p and base, using the following rules:

                      p.has_root_directory()    !p.has_root_directory()
p.has_root_name()     p                         precondition failure
!p.has_root_name()    base.root_name() / p      base / p

Returns: The composed path.

Postcondition: For the returned path, rp, rp.is_complete() is true.

Throws: On precondition failure (see clause introduction).

[Note: When portable behavior is required, use complete(). When operating system dependent behavior is required, use system_complete().

Portable behavior is useful when dealing with paths created internally within a program, particularly if the program should exhibit the same behavior on all operating systems.

Operating system dependent behavior is useful when dealing with paths supplied by user input, reported to program users, or when such behavior is expected by program users. -- end note]
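The composition rules for complete() can be sketched as follows. This is only an illustration under stated assumptions: toy_path is a hypothetical stand-in for basic_path that stores the three parts of a pathname separately, is_complete() is simplified to "has a root directory", and std::invalid_argument is a placeholder for whatever the clause introduction specifies for precondition failures.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Toy decomposed pathname; a stand-in for basic_path, not the proposed class.
struct toy_path {
    std::string root_name;      // e.g. "c:" or ""
    bool has_root_directory;    // true if a root "/" follows the root name
    std::string relative_path;  // e.g. "foo/bar" or ""

    bool is_complete() const { return has_root_directory; }  // simplification
    std::string string() const {
        return root_name + (has_root_directory ? "/" : "") + relative_path;
    }
};

// Follows the four cases of the rules table for complete(p, base).
toy_path complete_sketch(const toy_path& p, const toy_path& base) {
    assert(base.is_complete());
    if (!p.root_name.empty()) {
        if (p.has_root_directory) return p;                    // p is already complete
        throw std::invalid_argument("precondition failure");   // e.g. "c:foo"
    }
    if (p.has_root_directory)                                  // base.root_name() / p
        return toy_path{base.root_name, true, p.relative_path};
    // base / p: append p's relative part to base
    std::string rel = base.relative_path;
    if (!rel.empty() && !p.relative_path.empty()) rel += "/";
    rel += p.relative_path;
    return toy_path{base.root_name, base.has_root_directory, rel};
}
```

For a base of "c:/work", the relative path "foo" would compose to "c:/work/foo", while "/foo" would take only the root name from base and compose to "c:/foo".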

template <class Path> Path system_complete(const Path& p);

Requires: !p.empty()

Effects: Composes a complete path from p, using the same rules used by the operating system to resolve a path passed as the filename argument to standard library open functions.

Returns: The composed path.

Postcondition: For the returned path, rp, rp.is_complete() is true.

Throws: On precondition failure (see clause introduction).

[Note: For POSIX, system_complete(p) has the same semantics as complete(p, current_path()).

For Windows, system_complete(p) has the same semantics as complete(p, current_path()) if p.is_complete() || !p.has_root_name() or p and base have the same root_name(). Otherwise it acts like complete(p, kinky), where kinky is the current directory for the p.root_name() drive. This will be the current directory of that drive the last time it was set, and thus may be residue left over from a prior program run by the command processor! Although these semantics are often useful, they are also very error-prone.

See complete() note for usage suggestions. -- end note]

errno_type to_errno( system_error_type code );

Returns: The value of the errno error number which corresponds to the operating system's error code code. The exact correspondence is implementation defined. Implementations are only required to support error codes reported by basic_filesystem_error exceptions thrown by functions defined in this clause.
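Since the correspondence is implementation defined, the following is only a sketch of what a mapping might look like on Windows. The type names with the _sketch suffix, the function name, and the value chosen for the EOTHER placeholder are assumptions made for illustration; the numeric case labels are the documented values of the named Windows error codes.

```cpp
#include <cerrno>

// Assumed stand-ins for the proposal's system_error_type and errno_type.
typedef unsigned long system_error_type_sketch;
typedef int errno_type_sketch;

// EOTHER is a proposed addition to <cerrno>; this value is a placeholder.
const errno_type_sketch EOTHER_sketch = 9999;

// Maps a few Windows error codes to errno values; everything else is "other".
errno_type_sketch to_errno_sketch(system_error_type_sketch code) {
    switch (code) {
        case 2:   return ENOENT;  // ERROR_FILE_NOT_FOUND
        case 3:   return ENOENT;  // ERROR_PATH_NOT_FOUND
        case 5:   return EACCES;  // ERROR_ACCESS_DENIED
        case 80:  return EEXIST;  // ERROR_FILE_EXISTS
        case 183: return EEXIST;  // ERROR_ALREADY_EXISTS
        default:  return EOTHER_sketch;
    }
}
```

On POSIX the mapping would presumably be the identity, since the operating system's error codes already are errno values.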

void system_message( system_error_type ec, std::string & target );
void system_message( system_error_type ec, std::wstring & target );

Effects: Appends a message corresponding to ec to target.

[Note: Implementations are encouraged to supply a localized message. -- end note]
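A minimal sketch of the narrow-string overload for a POSIX-like system, assuming the error code can be treated as an errno value and that std::strerror supplies acceptable message text. The function name is hypothetical. The point of the sketch is that the message is appended, so any text already in target is preserved.

```cpp
#include <cstring>
#include <string>

// POSIX-flavoured sketch: interpret ec as an errno value.
void system_message_sketch(int ec, std::string& target) {
    target += std::strerror(ec);  // append; existing contents are kept
}
```

A caller might therefore build a prefix such as "cannot open file: " first and then let the function complete the message.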

Convenience functions

template <class Path> bool create_directories(const Path & p);

Requires: p.empty() ||
forall px: px == p || is_parent(px, p): is_directory(px) || !exists( px )

Returns: The value of !exists(p) prior to the establishment of the postcondition.

Postcondition: is_directory(p)

Throws:  basic_filesystem_error<Path> if exists(p) && !is_directory(p)
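The return value and postcondition of create_directories can be illustrated against a toy in-memory model, where the "file system" is just the set of directory pathnames that exist. All names below are hypothetical. The sketch requires a non-empty argument and does not model the Throws clause, since the model contains no non-directory files.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy "file system": the set of directory pathnames that currently exist.
typedef std::set<std::string> dir_set;

// Hypothetical helper: parent of "a/b/c" is "a/b"; parent of "a" is "".
std::string parent_of(const std::string& p) {
    std::string::size_type slash = p.rfind('/');
    return slash == std::string::npos ? std::string() : p.substr(0, slash);
}

// Creates p and any missing ancestors. Returns the value of !exists(p)
// as it was before the call.
bool create_directories_sketch(dir_set& dirs, const std::string& p) {
    assert(!p.empty());
    const bool existed = dirs.count(p) != 0;
    for (std::string px = p; !px.empty(); px = parent_of(px))
        dirs.insert(px);  // no effect for directories that already exist
    return !existed;
}
```

Calling the function twice with the same argument would return true the first time and false the second, with the postcondition holding after both calls.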

template <class Path> typename Path::string_type extension(const Path & p);

Returns: if p.leaf() contains a dot, returns the substring of p.leaf() starting at the rightmost dot and ending at the string's end. Otherwise, returns an empty string.

[Note: The dot is included in the return value so that it is possible to distinguish between no extension and an empty extension.

Implementations are permitted but not required to define additional behavior for file systems which append additional elements to extensions, such as alternate data stream or partitioned dataset names. -- end note]

template <class Path> typename Path::string_type basename(const Path & p);

Returns: if p.leaf() contains a dot, returns the substring of p.leaf() starting at its beginning and ending at the last dot (the dot is not included). Otherwise, returns p.leaf().
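The Returns clauses for extension() and basename() can be sketched directly on a leaf string. The functions below are illustrative stand-ins operating on std::string rather than on a Path; their names are not part of the proposal.

```cpp
#include <string>

// Substring from the rightmost dot to the end, or empty if there is no dot.
std::string extension_sketch(const std::string& leaf) {
    std::string::size_type dot = leaf.rfind('.');
    return dot == std::string::npos ? std::string() : leaf.substr(dot);
}

// Substring from the beginning up to (not including) the rightmost dot,
// or the whole leaf if there is no dot.
std::string basename_sketch(const std::string& leaf) {
    std::string::size_type dot = leaf.rfind('.');
    return dot == std::string::npos ? leaf : leaf.substr(0, dot);
}
```

For the leaf "archive.tar.gz" these yield ".gz" and "archive.tar". For "README" they yield an empty string and "README", while for "file." they yield "." and "file", which is how no extension is distinguished from an empty extension.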

template <class Path>
  Path replace_extension(const Path & p, const typename Path::string_type & new_extension);

Postcondition: basename(return_value) == basename(p) && extension(return_value) == new_extension

[Note: It follows from the semantics of extension() that new_extension should include the dot to achieve reasonable results. -- end note]
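The proposal specifies replace_extension only by its postcondition. One simple way to satisfy it, shown here as a sketch on leaf strings with hypothetical function names, is to concatenate the basename and the new extension. The helper postcondition_holds makes it possible to check why the dot matters.

```cpp
#include <string>

// Leaf-string stand-ins for extension() and basename(), as specified above.
std::string extension_sketch(const std::string& leaf) {
    std::string::size_type dot = leaf.rfind('.');
    return dot == std::string::npos ? std::string() : leaf.substr(dot);
}
std::string basename_sketch(const std::string& leaf) {
    std::string::size_type dot = leaf.rfind('.');
    return dot == std::string::npos ? leaf : leaf.substr(0, dot);
}

// Assumed implementation strategy: basename followed by the new extension.
std::string replace_extension_sketch(const std::string& leaf,
                                     const std::string& new_extension) {
    return basename_sketch(leaf) + new_extension;
}

// Evaluates the stated postcondition for the sketch's return value.
bool postcondition_holds(const std::string& leaf,
                         const std::string& new_extension) {
    const std::string r = replace_extension_sketch(leaf, new_extension);
    return basename_sketch(r) == basename_sketch(leaf) &&
           extension_sketch(r) == new_extension;
}
```

With new_extension ".txt" the leaf "report.doc" becomes "report.txt" and the postcondition holds. With "txt" the result is "reporttxt", which has no extension at all, so the postcondition cannot be met.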

Additions to header <cerrno>

The header <cerrno> shall include an additional symbolic constant macro for each of the values returned by the to_errno function. The macro names shall be as defined in POSIX errno.h, with the additions below.

This codifies existing practice. The required names are only a subset of those defined by POSIX, and are usually already supplied in <errno.h> (as wrapped by <cerrno>) as shipped with POSIX and Windows compilers. These implementations require no changes to their underlying C headers to conform with the above requirement.

Name          Meaning
EBADHANDLE    Bad operating system handle.
EOTHER        Other error.

Additions to header <fstream>

These additions have been carefully specified to avoid breaking existing code in common operating environments such as POSIX, Windows, and OpenVMS. See Suggestions for <fstream> implementations for techniques to avoid breaking existing code in other environments, particularly on operating systems allowing slashes in filenames.

[Note: The "do-the-right-thing" rule from Requirements on implementations does apply to header <fstream>.

The overloads below are specified as additions rather than replacements for existing functions. This preserves existing code (perhaps using a home-grown path class) that relies on an automatic conversion to const char*. -- end note]
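The proposal leaves the mechanism to implementations, but one possible way for a templated overload to coexist with the existing const char* overload is to constrain the template to path types. The sketch below is an assumption about how that might be done, not a description of any implementation; toy_path, legacy_path, is_toy_path, and toy_filebuf are all hypothetical names.

```cpp
#include <string>
#include <type_traits>

struct toy_path { std::string s; };   // stands in for a basic_path

struct legacy_path {                  // home-grown class in existing code
    std::string s;
    operator const char*() const { return s.c_str(); }
};

// Hypothetical trait: true only for types that are paths.
template <class T> struct is_toy_path : std::false_type {};
template <> struct is_toy_path<toy_path> : std::true_type {};

struct toy_filebuf {
    std::string last;  // records which overload was selected

    // Existing overload, kept rather than replaced.
    void open(const char* s) { last = std::string("char*:") + s; }

    // Added overload, only viable when Path really is a path type.
    template <class Path>
    typename std::enable_if<is_toy_path<Path>::value>::type
    open(const Path& p) { last = "path:" + p.s; }
};
```

Under this sketch a legacy_path argument still reaches the const char* overload through its conversion operator, as does a string literal, while a toy_path argument selects the new overload. An unconstrained template would instead be an exact match for legacy_path and would capture the call.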

In 27.8.1.1 Class template basic_filebuf [lib.filebuf] synopsis preceding paragraph 1, add the function:

template <class Path> basic_filebuf<charT,traits>* open(const Path& p, ios_base::openmode mode);

In 27.8.1.3 Member functions [lib.filebuf.members], add the above to the signature preceding paragraph 2, and replace the sentence:

It then opens a file, if possible, whose name is the NTBS s (“as if” by calling std::fopen(s ,modstr )).

with:

It then opens, if possible, the file that p or path(s) resolves to, “as if” by calling std::fopen() with a second argument of modstr.

In 27.8.1.5 Class template basic_ifstream [lib.ifstream] synopsis preceding paragraph 1, add the functions:

template <class Path> explicit basic_ifstream(const Path& p, ios_base::openmode mode = ios_base::in);
template <class Path> void open(const Path& p, ios_base::openmode mode = ios_base::in);

In 27.8.1.6 basic_ifstream constructors [lib.ifstream.cons] add the above constructor to the signature preceding paragraph 2, and in paragraph 2 replace

rdbuf()->open(s, mode | ios_base::in)

with

rdbuf()->open(path(s), mode | ios_base::in) or rdbuf()->open(p, mode | ios_base::in) as appropriate

In 27.8.1.7 Member functions [lib.ifstream.members] add the above open function to the signature preceding paragraph 3, and in paragraph 3 replace

rdbuf()->open(s, mode | ios_base::in)

with

rdbuf()->open(path(s), mode | ios_base::in) or rdbuf()->open(p, mode | ios_base::in) as appropriate

In 27.8.1.8 Class template basic_ofstream [lib.ofstream] synopsis preceding paragraph 1, add the functions:

template <class Path> explicit basic_ofstream(const Path& p, ios_base::openmode mode = ios_base::out);
template <class Path> void open(const Path& p, ios_base::openmode mode = ios_base::out);

In 27.8.1.9 basic_ofstream constructors [lib.ofstream.cons] add the above constructor to the signature preceding paragraph 2, and in paragraph 2 replace

rdbuf()->open(s, mode | ios_base::out)

with

rdbuf()->open(path(s), mode | ios_base::out) or rdbuf()->open(p, mode | ios_base::out) as appropriate

In 27.8.1.10 Member functions [lib.ofstream.members] add the above open function to the signature preceding paragraph 3, and in paragraph 3 replace

rdbuf()->open(s, mode | ios_base::out)

with

rdbuf()->open(path(s), mode | ios_base::out) or rdbuf()->open(p, mode | ios_base::out) as appropriate

In 27.8.1.11 Class template basic_fstream [lib.fstream] synopsis preceding paragraph 1, add the functions:

template <class Path> explicit basic_fstream(const Path& p, ios_base::openmode mode = ios_base::in|ios_base::out);
template <class Path> void open(const Path& p, ios_base::openmode mode = ios_base::in|ios_base::out);

In 27.8.1.12 basic_fstream constructors [lib.fstream.cons] add the above constructor to the signature preceding paragraph 2, and in paragraph 2 replace

rdbuf()->open(s, mode)

with

rdbuf()->open(path(s), mode) or rdbuf()->open(p, mode) as appropriate

In 27.8.1.13 Member functions [lib.fstream.members] add the above open function to the signature preceding paragraph 3, and in paragraph 3 replace

rdbuf()->open(s, mode)

with

rdbuf()->open(path(s), mode) or rdbuf()->open(p, mode) as appropriate

End of proposed text.

Path decomposition table

The table is generated by a program compiled with the Boost implementation.

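Where POSIX and Windows implementations yield different results, the POSIX result is given first and the Windows result follows in parentheses. Columns are separated by |.

Constructor argument | Elements found by iteration | string() | file_string() | root_path().string() | root_name() | root_directory() | relative_path().string() | branch_path().string() | leaf()
"" | "" | "" | "" | "" | "" | "" | "" | "" | ""
"." | "." | "." | "." | "" | "" | "" | "." | "" | "."
".." | ".." | ".." | ".." | "" | "" | "" | ".." | "" | ".."
"foo" | "foo" | "foo" | "foo" | "" | "" | "" | "foo" | "" | "foo"
"/" | "/" | "/" | "/" (Windows: "\") | "/" | "" | "/" | "" | "" | "/"
"/foo" | "/","foo" | "/foo" | "/foo" (Windows: "\foo") | "/" | "" | "/" | "foo" | "/" | "foo"
"foo/" | "foo","." | "foo/" | "foo/" (Windows: "foo\") | "" | "" | "" | "foo/" | "foo" | "."
"/foo/" | "/","foo","." | "/foo/" | "/foo/" (Windows: "\foo\") | "/" | "" | "/" | "foo/" | "/foo" | "."
"foo/bar" | "foo","bar" | "foo/bar" | "foo/bar" (Windows: "foo\bar") | "" | "" | "" | "foo/bar" | "foo" | "bar"
"/foo/bar" | "/","foo","bar" | "/foo/bar" | "/foo/bar" (Windows: "\foo\bar") | "/" | "" | "/" | "foo/bar" | "/foo" | "bar"
"///foo///" | "/","foo","." | "///foo///" | "///foo///" (Windows: "\foo\\\") | "/" | "" | "/" | "foo///" | "///foo" | "."
"///foo///bar" | "/","foo","bar" | "///foo///bar" | "///foo///bar" (Windows: "\foo\\\bar") | "/" | "" | "/" | "foo///bar" | "///foo" | "bar"
"/." | "/","." | "/." | "/." (Windows: "\.") | "/" | "" | "/" | "." | "/" | "."
"./" | ".","." | "./" | "./" (Windows: ".\") | "" | "" | "" | "./" | "." | "."
"/.." | "/",".." | "/.." | "/.." (Windows: "\..") | "/" | "" | "/" | ".." | "/" | ".."
"../" | "..","." | "../" | "../" (Windows: "..\") | "" | "" | "" | "../" | ".." | "."
"foo/." | "foo","." | "foo/." | "foo/." (Windows: "foo\.") | "" | "" | "" | "foo/." | "foo" | "."
"foo/.." | "foo",".." | "foo/.." | "foo/.." (Windows: "foo\..") | "" | "" | "" | "foo/.." | "foo" | ".."
"foo/./" | "foo",".","." | "foo/./" | "foo/./" (Windows: "foo\.\") | "" | "" | "" | "foo/./" | "foo/." | "."
"foo/./bar" | "foo",".","bar" | "foo/./bar" | "foo/./bar" (Windows: "foo\.\bar") | "" | "" | "" | "foo/./bar" | "foo/." | "bar"
"foo/.." | "foo",".." | "foo/.." | "foo/.." (Windows: "foo\..") | "" | "" | "" | "foo/.." | "foo" | ".."
"foo/../" | "foo","..","." | "foo/../" | "foo/../" (Windows: "foo\..\") | "" | "" | "" | "foo/../" | "foo/.." | "."
"foo/../bar" | "foo","..","bar" | "foo/../bar" | "foo/../bar" (Windows: "foo\..\bar") | "" | "" | "" | "foo/../bar" | "foo/.." | "bar"
"c:" | "c:" | "c:" | "c:" | "" (Windows: "c:") | "" (Windows: "c:") | "" | "c:" (Windows: "") | "" | "c:"
"c:/" | "c:","." (Windows: "c:","/") | "c:/" | "c:/" (Windows: "c:\") | "" (Windows: "c:/") | "" (Windows: "c:") | "" (Windows: "/") | "c:/" (Windows: "") | "c:" | "." (Windows: "/")
"c:foo" | "c:foo" (Windows: "c:","foo") | "c:foo" | "c:foo" | "" (Windows: "c:") | "" (Windows: "c:") | "" | "c:foo" (Windows: "foo") | "" (Windows: "c:") | "c:foo" (Windows: "foo")
"c:/foo" | "c:","foo" (Windows: "c:","/","foo") | "c:/foo" | "c:/foo" (Windows: "c:\foo") | "" (Windows: "c:/") | "" (Windows: "c:") | "" (Windows: "/") | "c:/foo" (Windows: "foo") | "c:" (Windows: "c:/") | "foo"
"c:foo/" | "c:foo","." (Windows: "c:","foo",".") | "c:foo/" | "c:foo/" (Windows: "c:foo\") | "" (Windows: "c:") | "" (Windows: "c:") | "" | "c:foo/" (Windows: "foo/") | "c:foo" | "."
"c:/foo/" | "c:","foo","." (Windows: "c:","/","foo",".") | "c:/foo/" | "c:/foo/" (Windows: "c:\foo\") | "" (Windows: "c:/") | "" (Windows: "c:") | "" (Windows: "/") | "c:/foo/" (Windows: "foo/") | "c:/foo" | "."
"c:/foo/bar" | "c:","foo","bar" (Windows: "c:","/","foo","bar") | "c:/foo/bar" | "c:/foo/bar" (Windows: "c:\foo\bar") | "" (Windows: "c:/") | "" (Windows: "c:") | "" (Windows: "/") | "c:/foo/bar" (Windows: "foo/bar") | "c:/foo" | "bar"
"prn:" | "prn:" | "prn:" | "prn:" | "" (Windows: "prn:") | "" (Windows: "prn:") | "" | "prn:" (Windows: "") | "" | "prn:"
"c:\" | "c:\" (Windows: "c:","/") | "c:\" (Windows: "c:/") | "c:\" | "" (Windows: "c:/") | "" (Windows: "c:") | "" (Windows: "/") | "c:\" (Windows: "") | "" (Windows: "c:") | "c:\" (Windows: "/")
"c:foo" | "c:foo" (Windows: "c:","foo") | "c:foo" | "c:foo" | "" (Windows: "c:") | "" (Windows: "c:") | "" | "c:foo" (Windows: "foo") | "" (Windows: "c:") | "c:foo" (Windows: "foo")
"c:\foo" | "c:\foo" (Windows: "c:","/","foo") | "c:\foo" (Windows: "c:/foo") | "c:\foo" | "" (Windows: "c:/") | "" (Windows: "c:") | "" (Windows: "/") | "c:\foo" (Windows: "foo") | "" (Windows: "c:/") | "c:\foo" (Windows: "foo")
"c:foo\" | "c:foo\" (Windows: "c:","foo",".") | "c:foo\" (Windows: "c:foo/") | "c:foo\" | "" (Windows: "c:") | "" (Windows: "c:") | "" | "c:foo\" (Windows: "foo/") | "" (Windows: "c:foo") | "c:foo\" (Windows: ".")
"c:\foo\" | "c:\foo\" (Windows: "c:","/","foo",".") | "c:\foo\" (Windows: "c:/foo/") | "c:\foo\" | "" (Windows: "c:/") | "" (Windows: "c:") | "" (Windows: "/") | "c:\foo\" (Windows: "foo/") | "" (Windows: "c:/foo") | "c:\foo\" (Windows: ".")
"c:\foo/" | "c:\foo","." (Windows: "c:","/","foo",".") | "c:\foo/" (Windows: "c:/foo/") | "c:\foo/" (Windows: "c:\foo\") | "" (Windows: "c:/") | "" (Windows: "c:") | "" (Windows: "/") | "c:\foo/" (Windows: "foo/") | "c:\foo" (Windows: "c:/foo") | "."
"c:/foo\bar" | "c:","foo\bar" (Windows: "c:","/","foo","bar") | "c:/foo\bar" (Windows: "c:/foo/bar") | "c:/foo\bar" (Windows: "c:\foo\bar") | "" (Windows: "c:/") | "" (Windows: "c:") | "" (Windows: "/") | "c:/foo\bar" (Windows: "foo/bar") | "c:" (Windows: "c:/foo") | "foo\bar" (Windows: "bar")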

Suggestions for <fstream> implementations

The change in semantics to functions taking const char* arguments can break existing code, but only on operating systems where implementations don't implicitly accept native format pathnames or operating systems that allow slashes in filenames. Thus on POSIX, Windows, and OpenVMS, for example, there is no problem if the implementation follows encouraged behavior.

For most of the Filesystem Library, there is no existing code, so the issue of preserving existing code that uses slashes in filenames doesn't arise. New code simply must use basic_path constructors with path_format_t arguments of native. To preserve existing fstream code that uses slashes in filenames, an implementation may wish to provide a mechanism such as a macro to control selection of the old behavior.

Implementations are already required by the TR front-matter to provide a mechanism such as a macro to control selection of the old behavior (useful to guarantee protection of existing code) or new behavior (useful in new code, and code being ported from other systems) for headers. Because use of the rest of the Filesystem Library is independent of use of the <fstream> additions, affected implementations are encouraged to allow disabling the <fstream> additions separately from other TR features.

A rejected alternative was to supply new fstream classes in namespace sys, inheriting from the current classes, overriding the constructors and opens taking pathname arguments, and providing the additional overloads. In Lillehammer LWG members indicated lack of support for this alternative, feeling that costs outweigh benefits.

Issues

1. Return type of certain basic_path members returning strings. [Howard Hinnant]

For member functions described as returning "const string_type" or "const external_string_type", implementations are permitted to return "const string_type&" or  "const external_string_type&" respectively.

This allows implementations to avoid unnecessary copies. Return-by-value is specified as const to ensure programs won't break if moved to a return-by-reference implementation.

For example, the Boost implementation keeps the internal representation of a pathname in the portable format, so string() returns by reference and is inlined:

const string_type & string() const { return m_path; }

Howard Hinnant comments: This may inhibit optimization if rvalue reference is accepted.  Const-qualified return types can't be moved from.  I'd rather see either the return type specified as const string_type& or string_type.

Beman Dawes comments: I can't make up my mind. Removing the const will bite users, but not very often. OTOH, excessive copying is a real concern, and if move semantics can alleviate that, I'm all for it. What does the LWG think?
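Howard Hinnant's concern can be demonstrated in C++11 and later, where rvalue references were adopted. The sketch below uses a hypothetical class, tracked, that counts which assignment operator runs. A const-qualified return value cannot bind to the move assignment operator's non-const rvalue reference parameter, so the copy assignment operator is selected instead.

```cpp
// Hypothetical type that counts how it is assigned to.
struct tracked {
    int copy_assignments = 0;
    int move_assignments = 0;

    tracked() = default;
    tracked(const tracked&) = default;
    tracked(tracked&&) = default;

    tracked& operator=(const tracked&) { ++copy_assignments; return *this; }
    tracked& operator=(tracked&&) noexcept { ++move_assignments; return *this; }
};

// Return by const value, as the proposal currently specifies for strings.
const tracked by_const_value() { return tracked(); }

// Return by non-const value, one of the alternatives suggested above.
tracked by_value() { return tracked(); }
```

Assigning the result of by_const_value() to an existing object would increment copy_assignments, while assigning the result of by_value() would increment move_assignments. For a string type, the former means a full copy of the character data.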

2. Basic_path canonize() and normalize() removed. [Beman Dawes]

The Boost implementation has basic_path functions canonize() and normalize() which return cleaned up string representations of a pathname. They have been removed from the proposal as messy to specify and implement, not hugely useful, and possible to implement by users as non-member functions without any loss of functionality or efficiency. There was also a concern the proposal was getting a bit large.

These functions can be added later as convenience functions if the LWG so desires.

3. Filename checking functions. [Beman Dawes]

Boost has a set of predicate functions that determine if a filename is valid for a particular operating or file system. These can be used as building blocks for functions that determine if an entire pathname is valid for a particular operating or file system.

Users can use these functions to ensure that pathnames are in fact portable to target operating or file systems, without having to actually test on the target systems.
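To give a flavor of such predicates, here is a minimal sketch of a filename check based on the POSIX portable filename character set (letters, digits, period, underscore, and hyphen), together with a pathname check built on top of it. The function names are hypothetical and the sketch is not claimed to match the Boost functions in detail. The pathname check handles only relative pathnames that use "/" as the separator.

```cpp
#include <string>

// True if name is non-empty and uses only characters from the POSIX
// portable filename character set.
bool portable_posix_name_sketch(const std::string& name) {
    static const std::string allowed =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789._-";
    return !name.empty() &&
           name.find_first_not_of(allowed) == std::string::npos;
}

// Building-block use: a relative pathname is accepted only if every
// "/"-separated element passes the filename check.
bool portable_posix_pathname_sketch(const std::string& pathname) {
    std::string::size_type begin = 0;
    while (begin <= pathname.size()) {
        std::string::size_type end = pathname.find('/', begin);
        if (end == std::string::npos) end = pathname.size();
        if (!portable_posix_name_sketch(pathname.substr(begin, end - begin)))
            return false;
        begin = end + 1;
    }
    return true;
}
```

A program could run such a check on pathnames it generates, rejecting for example "src/my file.cpp" because of the space, without ever running on the target system.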

These functions are not included in the proposal because of lack of time, and uncertainty as to their fit with the Standard Library. They can be added later if the LWG so desires.

4. Request for operation to determine available disk space. [Beman Dawes]

There have been requests from two Boost users (Steve Hartmann, Thomas Matelich) for a function to return available disk space. For POSIX and Windows, this looks both useful and trivial to implement, but I'm reluctant to propose an untested operational function. My intent is to propose it later as an addition, assuming a trial implementation turns up no showstoppers.

Acknowledgements

This Filesystem Library is dedicated to my wife, Sonda, who provided the support necessary to see both a trial implementation and the proposal itself through to completion. She gave me the strength to continue after a difficult year of cancer treatment in the middle of it all.

Many people contributed technical comments, ideas, and suggestions to the Boost Filesystem Library. See http://www.boost.org/libs/filesystem/doc/index.htm#Acknowledgements.

Dietmar Kühl contributed the original Boost Filesystem Library directory_iterator design. Peter Dimov, Walter Landry, Rob Stewart, and Thomas Witt were particularly helpful in refining the library.

The create_directories, extension, basename, and replace_extension functions were developed by Vladimir Prus.

Howard Hinnant and John Maddock reviewed a draft of the proposal, and identified a number of mistakes or weaknesses, resulting in a more polished final document.

References

[ISO-POSIX] ISO/IEC 9945:2003, IEEE Std 1003.1-2001, and The Open Group Base Specifications, Issue 6. Also known as The Single Unix® Specification, Version 3. Available from each of the organizations involved in its creation. For example, read online or download from www.unix.org/single_unix_specification/. The ISO JTC1/SC22/WG15 - POSIX homepage is www.open-std.org/jtc1/sc22/WG15/
[Abrahams] Dave Abrahams, Error and Exception Handling, www.boost.org/more/error_handling.html

© Copyright Beman Dawes, 2002-2005

Revised 2005-08-23