List Directories: Implementations 1.x
All implementations for searching directories named List_Dir_1x are based on a recursion pattern, as described by Jugal Kalita in
On Perl, for Students and Professionals (Chapter 6.3).
The following ZIP archives (Perl code, use case) are available for download from the Perl Project Sources table:
Nr. | Download Ref. | Short Description
1 | List_Dir_1_1 | Version 1.1: Initial version 1.1 is merely a proof of concept. Keywords: Recursive tree-search, ALGO 1, ALGO 2, All Files Found Listed. For more information, see chapter List_Dir_1_1.
2 | List_Dir_1_3 | Version 1.3: Extended version 1.3 is recommended for practical use. Keywords: Recursive tree-search, ALGO 1, File Filter, File Counting, Recursion Control. For more information, see chapter List_Dir_1_2.
-
By nature, recursive procedures implement a depth-first search: once List_Dir_1_1.pl starts tracking a directory, it resolves that directory's content completely before treating the rest of the resolution stack. To illustrate this point, I implemented an ALGO option in List_Dir_1_1.pl that alters the behavior of procedure RecursiveResolveContentOf:
ALGO_1 If the element to resolve is a directory, the recursion line reads:
    return (
        $FirstElement,
        RecursiveResolveContentOf(@ListOfContent_Added),
        RecursiveResolveContentOf(@ListOfContent_Old)
    );
ALGO_2 If the element to resolve is a directory, the recursion line reads:
    return (
        $FirstElement,
        RecursiveResolveContentOf(@ListOfContent_Old),
        RecursiveResolveContentOf(@ListOfContent_Added)
    );
Command List_Dir_1_1.pl DirName displays the following output:
Start processing... algo=[ALGO_1]
sub RecursiveResolveContentOf: Treat dir DirName
sub RecursiveResolveContentOf: Treat dir DirName/Dir_1
sub RecursiveResolveContentOf: Treat dir DirName/Dir_2
sub RecursiveResolveContentOf: Treat file DirName/Dir_2/fil_2_1.txt
sub RecursiveResolveContentOf: Treat file DirName/Dir_2/fil_2_2.txt
sub RecursiveResolveContentOf: Treat dir DirName/Dir_3
sub RecursiveResolveContentOf: Treat dir DirName/Dir_3/Dir_3_1
sub RecursiveResolveContentOf: Treat dir DirName/Dir_3/Dir_3_1/Dir_3_1_1
sub RecursiveResolveContentOf: Treat file DirName/Dir_3/Dir_3_1/fil_3_1_1.txt
sub RecursiveResolveContentOf: Treat dir DirName/Dir_3/Dir_3_2
sub RecursiveResolveContentOf: Treat dir DirName/Dir_3/Dir_3_3
sub RecursiveResolveContentOf: Treat file DirName/Dir_3/Dir_3_3/fil_3_3_1.txt
sub RecursiveResolveContentOf: Treat file DirName/Dir_3/fil_3_1.txt
sub RecursiveResolveContentOf: Treat file DirName/fil_1.txt
sub RecursiveResolveContentOf: Treat file DirName/fil_2.txt
sub RecursiveResolveContentOf: Treat file DirName/fil_3.txt
Listing completed within 0.00 seconds
Command List_Dir_1_1.pl algo=ALGO_2 DirName displays the following output:
Start processing... algo=[ALGO_2]
sub RecursiveResolveContentOf: Treat dir DirName
sub RecursiveResolveContentOf: Treat dir DirName/Dir_1
sub RecursiveResolveContentOf: Treat dir DirName/Dir_2
sub RecursiveResolveContentOf: Treat dir DirName/Dir_3
sub RecursiveResolveContentOf: Treat file DirName/fil_1.txt
sub RecursiveResolveContentOf: Treat file DirName/fil_2.txt
sub RecursiveResolveContentOf: Treat file DirName/fil_3.txt
sub RecursiveResolveContentOf: Treat dir DirName/Dir_3/Dir_3_1
sub RecursiveResolveContentOf: Treat dir DirName/Dir_3/Dir_3_2
sub RecursiveResolveContentOf: Treat dir DirName/Dir_3/Dir_3_3
sub RecursiveResolveContentOf: Treat file DirName/Dir_3/fil_3_1.txt
sub RecursiveResolveContentOf: Treat file DirName/Dir_3/Dir_3_3/fil_3_3_1.txt
sub RecursiveResolveContentOf: Treat dir DirName/Dir_3/Dir_3_1/Dir_3_1_1
sub RecursiveResolveContentOf: Treat file DirName/Dir_3/Dir_3_1/fil_3_1_1.txt
sub RecursiveResolveContentOf: Treat file DirName/Dir_2/fil_2_1.txt
sub RecursiveResolveContentOf: Treat file DirName/Dir_2/fil_2_2.txt
Listing completed within 0.00 seconds
ALGO_1 and ALGO_2 produce somewhat different output. However, the fundamental characteristic remains: the tree search is 'depth first'. (In a sense, ALGO_1 represents the 'pure' variant.)
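The two recursion lines can be tried out with a minimal sketch. It is not the code of List_Dir_1_1.pl: the helper ContentOf, the $Algo parameter and the temporary copy of the DirName tree are assumptions made for illustration. Only the name RecursiveResolveContentOf and the two return statements follow the text above.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Hypothetical helper: sorted content of a directory as paths, without '.' and '..'.
sub ContentOf {
    my ($Dir) = @_;
    opendir(my $Handle, $Dir) or return ();
    my @Names = sort grep { $_ ne '.' && $_ ne '..' } readdir($Handle);
    closedir($Handle);
    return map { "$Dir/$_" } @Names;
}

# Resolve the first element of the stack, then recurse on what is left.
# $Algo (an assumed parameter) selects the order of the two recursive calls.
sub RecursiveResolveContentOf {
    my ($Algo, $FirstElement, @ListOfContent_Old) = @_;
    return () unless defined $FirstElement;    # empty stack: recursion ends
    # A plain file adds no content, so both variants treat it the same way.
    my @ListOfContent_Added = (-d $FirstElement) ? ContentOf($FirstElement) : ();
    if ($Algo eq 'ALGO_2') {
        return ($FirstElement,
                RecursiveResolveContentOf($Algo, @ListOfContent_Old),
                RecursiveResolveContentOf($Algo, @ListOfContent_Added));
    }
    return ($FirstElement,
            RecursiveResolveContentOf($Algo, @ListOfContent_Added),
            RecursiveResolveContentOf($Algo, @ListOfContent_Old));
}

# Rebuild the DirName use case in a temporary directory (assumed test setup).
my $Root = tempdir(CLEANUP => 1);
make_path(map { "$Root/DirName/$_" }
          qw(Dir_1 Dir_2 Dir_3/Dir_3_1/Dir_3_1_1 Dir_3/Dir_3_2 Dir_3/Dir_3_3));
for my $File (qw(fil_1.txt fil_2.txt fil_3.txt Dir_2/fil_2_1.txt Dir_2/fil_2_2.txt
                 Dir_3/fil_3_1.txt Dir_3/Dir_3_1/fil_3_1_1.txt Dir_3/Dir_3_3/fil_3_3_1.txt)) {
    open(my $Out, '>', "$Root/DirName/$File") or die "Cannot create $File: $!";
    close($Out);
}

my %Result;
for my $Algo (qw(ALGO_1 ALGO_2)) {
    print "algo=[$Algo]\n";
    for my $Path (RecursiveResolveContentOf($Algo, "$Root/DirName")) {
        my $Kind = (-d $Path) ? 'dir ' : 'file';
        (my $Shown = $Path) =~ s/^\Q$Root\E\///;    # hide the temporary prefix
        push @{ $Result{$Algo} }, $Shown;
        print "Treat $Kind $Shown\n";
    }
}
```

Under these assumptions, the order of the printed paths should correspond to the two listings above. Note that the sketch follows symbolic links to directories, which a practical program would have to guard against.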
-
List_Dir_1_2 or later
Note: Unless stated otherwise, the descriptions below also apply to later versions of the program (the latest version is currently List_Dir_1_3.pl).
Recursion is an elegant programming solution. The code adequately reflects the nature of the problem to tackle. It is also usually shorter than the alternatives, because it shifts much of the work behind the scenes onto the interpreter. In this respect, the recursive style is a cross between the procedural and the declarative style.
However, this advantage turns into a major drawback if the interpreter cannot handle the use case because the available resources, e.g. temporary storage capacity, are insufficient. I launched List_Dir_1_1.pl to list the entire user directory of my OS X 10.10 computer. The program crashed: too many directories and files to manage in the background.
Control features that alleviate this drawback and ultimately prevent crashes were introduced in version List_Dir_1_2.pl. These additional parameters are defined as command line options.
Focus on the goal:
- Only files that match a given pattern are collected (they are the true data carriers).
Controlled abort:
- Search aborts if the number of processed directories or files exceeds a given limit.
- Search aborts if the number of collected files exceeds a given limit.
- Search aborts if the processing time exceeds a given limit (in seconds).
Controlled display:
- An intermediary status is output, e.g. one processed element out of every hundred is shown, indicating that the process is still running.
- Only a limited number of collected files (the final result) is displayed.
List_Dir_1_2.pl will not help if the user is also interested in directories. It should, however, be easy to modify the program code so that pattern matching and collection also apply to directories.
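The control features listed above can be sketched as follows. This is a minimal illustration and not the code of proc_ListDirR: the names ListDirControlled and NewState, the layout of the state hash and the small temporary test tree are all assumptions.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Recursive search that gives up as soon as one limit is exceeded.
# $State holds limits, counters, results and the abort reason (all names assumed).
sub ListDirControlled {
    my ($Dir, $Pattern, $State) = @_;
    return if $State->{Abort};
    if (++$State->{Dirs} > $State->{maxDir}) { $State->{Abort} = 'maxDir'; return; }
    opendir(my $Handle, $Dir) or return;
    my @Names = sort grep { $_ ne '.' && $_ ne '..' } readdir($Handle);
    closedir($Handle);
    for my $Name (@Names) {
        $State->{Abort} ||= 'maxTim' if time() - $State->{Start} > $State->{maxTim};
        return if $State->{Abort};              # keep everything gathered so far
        my $Path = "$Dir/$Name";
        if (-d $Path) { ListDirControlled($Path, $Pattern, $State); next; }
        if (++$State->{Files} > $State->{maxFil}) { $State->{Abort} = 'maxFil'; return; }
        # Intermediary status: shown whenever the element count is a multiple of modDisp
        print "... $Path\n" if ($State->{Dirs} + $State->{Files}) % $State->{modDisp} == 0;
        next unless $Name =~ $Pattern;          # focus on the goal: matching files only
        if (@{ $State->{Found} } >= $State->{maxFM}) { $State->{Abort} = 'maxFM'; return; }
        push @{ $State->{Found} }, $Path;
    }
    return;
}

# Fresh state; limits passed as arguments override the assumed defaults.
sub NewState {
    my (%Limit) = @_;
    return { maxDir => 20000, maxFil => 20000, maxFM => 10000, maxTim => 300, modDisp => 500,
             %Limit,
             Dirs => 0, Files => 0, Found => [], Abort => '', Start => time() };
}

# Small temporary tree: four .txt files and one .pdf file (assumed test setup).
my $Root = tempdir(CLEANUP => 1);
mkdir "$Root/Sub" or die "Cannot create Sub: $!";
for my $File (qw(a.txt b.txt c.pdf Sub/d.txt Sub/e.txt)) {
    open(my $Out, '>', "$Root/$File") or die "Cannot create $File: $!";
    close($Out);
}

my $Full = NewState();
ListDirControlled($Root, qr/\.txt$/, $Full);
my $Cut = NewState(maxFM => 2);
ListDirControlled($Root, qr/\.txt$/, $Cut);
printf "full: %d files, abort=[%s]\n", scalar @{ $Full->{Found} }, $Full->{Abort};
printf "cut : %d files, abort=[%s]\n", scalar @{ $Cut->{Found} },  $Cut->{Abort};
```

The second run stops once two matching files are collected and a third match turns up; the files found up to that point stay in the result list. Keeping all counters in one shared hash is only one possible design; it lets every recursion level see the abort reason without global variables.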
-
Fig.: Flow chart of List_Dir_1_3.pl
The flow chart has deliberately been kept close to the Perl source: treatment steps are linked to their associated procedures. proc_ListDirR encapsulates the tree search. Compared with its equivalent RecursiveResolveContentOf in List_Dir_1_1.pl, it comprises many more lines of code, but the bulk of the code manages (reads, checks, displays) the control information itself. Therefore an effort has been made to comply with the coding principles.
-
The program can be invoked without any argument: perl List_Dir_1_2.pl. In this case, default values apply for all required options.
In most cases, however, the user will enter options to communicate his intentions: option=value. (Meta-options are options that operate at a higher abstraction level: they control the option reading process itself.)
(Meta-)Option Value Meaning
-syn
Meta-option show_syntax.
If set, perl List_Dir_1_2.pl only displays the list of all available options with their legal values and exits.
-OptFil FileName1[,FileName2[,..]]
Meta-option option_file.
- If set, perl List_Dir_1_2.pl attempts to read a file containing option values.
- If no value is given, perl List_Dir_1_2.pl attempts to read file List_Dir_1_2.txt.
- Option files are expected to be TXT files (no attempt is made to verify this assumption). If no file can be retrieved or if no valid option values can be read, the program may display warnings but does not abort.
For the formatting of option files, see the chapter below.
-ListDir Dir1[,Dir2[,..]]
Option Directories_List
specifies the directories to search. If no value is set, the current directory is assumed to be the target.
-ListFP Pattern1[,Pattern2[,..]]
Option Patterns_List.
- If set, the values correspond to patterns that files to retrieve must match.
- If not set, perl List_Dir_1_2.pl does not filter the content of the directories to search.
-ModFP 0 (default value)
1
Option Pattern_Matching_Mode.
- If set to 0 (default value), patterns refer to file name extensions.
- If set to 1, patterns refer to the entire file name.
-ScoDis 0 (default value)
1
2
3
Option Display_Scope determines the amount of information sent to STDOUT during tree search.
- If set to 0 (default value = minimum output): Only the main results are displayed.
- If set to 1, all directories are additionally displayed.
- If set to 2, all matching files are additionally displayed.
- If set to 3, all files (matching or not) are additionally displayed.
-FV 0 (default value)
1
Option File_Visibility
- If set to 0 (default value): Only elements (files, directories) visible to the user are treated.
- If set to 1, elements invisible to the user are also accounted for.
In the current version, the 'visibility' of files and directories is interpreted according to Unix conventions, i.e. this option does not behave properly on other operating systems (see also Hidden file / directory).
-maxDir 20000 (default value)
Option MAX_Number_of_directories_to_search
If the limit is exceeded, tree search is aborted. A warning is issued. All results gathered so far are kept.
-maxFil 20000 (default value)
Option MAX_Number_of_files_to_screen
If the limit is exceeded, tree search is aborted. A warning is issued. All results gathered so far are kept.
-maxFM 10000 (default value)
Option MAX_Number_of_matching_files_to_collect
If the limit is exceeded, tree search is aborted. A warning is issued. All results gathered so far are kept.
-maxTim 300 (default value)
Option MAX_Duration_of_tree_search_in_seconds
If the limit is exceeded, tree search is aborted. A warning is issued. All results gathered so far are kept.
-maxDisp 100 (default value)
Option MAX_Number_of_matching_files_to_display
The number of matching files (= results) displayed does not exceed this limit.
Note: This option has no influence on the problem resolution itself.
-modDisp 500 (default value)
Option MODULO_display_element_currently_treated
This display option may be seen as an alternative to -ScoDis, especially if -ScoDis=0: during treatment, every modDisp-th element (file or directory) is displayed, so that the user can monitor the progress.
Note: This option has no influence on the problem resolution itself.
Rem:
Option names and the code that reads them have been designed for maximum flexibility at the lowest possible coding cost. Among others, the following conditions apply:
- Letter case (upper case, lower case or combinations) is not significant.
- Option hyphens are optional (they may be entered for the sake of readability).
- Recommended option names only represent denomination cores, e.g. instead of syn, the tags syntax and even show_syntax will also be interpreted correctly.
- If an option bears a value, it is strongly recommended to write option-name=option-value.
In some circumstances, a separator other than = between option-name and option-value may also be interpreted correctly.
If several values are entered, they must be separated with the symbol , (comma) or ; (semicolon). In particular, common separators such as blanks or underscores are not accepted, primarily because they may be part of file or directory names.
-
If several values are set for the same option, the program distinguishes the following cases:
- If the option is by nature connected to a single value, the last value treated applies. All options bearing integer values belong to this category. Although irrelevant in most cases, since users are expected to enter a single value, this feature proves advantageous e.g. when options are read from files as well as from the command line (see next item).
- If the option is by nature connected to a list of values, all values are taken. This is presently the case for all options bearing string values: list of directories, list of patterns, list of option files.
- Option values were originally entered on the command line only. As their number grew, a meta-option read_from_file was coded. If options are read both from option files and from the command line, the file information is processed first, which means that in case of conflict the command line values prevail.
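The reading rules above can be illustrated with a minimal sketch. It is not the option reader of List_Dir_1_2.pl: the function ParseOptions, the %IsList table and the rule that a name is recognized when it contains a known core are assumptions made for illustration, and defaults (such as the default option file) are left out.

```perl
use strict;
use warnings;

# Option cores taken from the table above, in lower case.
# 1 = the option collects a list of values, 0 = the last value applies.
my %IsList = (
    syn => 0, optfil => 1, listdir => 1, listfp => 1, modfp => 0, scodis => 0, fv => 0,
    maxdir => 0, maxfil => 0, maxfm => 0, maxtim => 0, maxdisp => 0, moddisp => 0,
);

sub ParseOptions {
    my ($Options, @Arguments) = @_;
    for my $Argument (@Arguments) {
        my ($Name, $Value) = split /=/, $Argument, 2;
        (my $Key = lc $Name) =~ s/^-+//;    # letter case and leading hyphens are ignored
        # The name only has to contain a known core: syn, syntax, show_syntax ...
        # Longer cores are tried first so that a short core cannot shadow them.
        my ($Core) = grep { index($Key, $_) >= 0 }
                     sort { length($b) <=> length($a) || $a cmp $b } keys %IsList;
        if (!defined $Core) { warn "Unknown option ignored: $Argument\n"; next; }
        # Several values are separated with a comma or a semicolon
        my @Values = defined $Value ? (grep { length } split /[,;]/, $Value) : ();
        if ($IsList{$Core}) { push @{ $Options->{$Core} }, @Values; }      # all values are taken
        else                { $Options->{$Core} = @Values ? $Values[-1] : 1; }  # last value applies
    }
    return $Options;
}

my %Options;
# File information is processed first ...
ParseOptions(\%Options, 'maxTim=60', 'ListDir=/tmp/a');
# ... so that command line values prevail in case of conflict.
ParseOptions(\%Options, '-MAXTIM=300', 'listdir=/tmp/b;/tmp/c', '-show_syntax');

print "maxtim  = $Options{maxtim}\n";          # 300
print "listdir = @{ $Options{listdir} }\n";    # /tmp/a /tmp/b /tmp/c
print "syn     = $Options{syn}\n";             # 1
```

The directory names /tmp/a, /tmp/b and /tmp/c are placeholders. Processing the file values and the command line values with the same routine, in that order, is what makes "last value applies" resolve conflicts in favor of the command line.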
-
An example option file, List_Dir_1_2.txt, is included in the download package.
The file reading principles are described in its first section, called GENERAL_CONDITIONS. The main points are:
- Comment lines start with the symbol # (pound sign). They are discarded.
- The file is divided into data sections delimited with the tags __BEGIN_SECTION_NAME__ (begin of section) and __END_SECTION_NAME__ (end of section). Data associated with a section must be inserted within the section limits (otherwise they will not be retrieved).
-
Each effective (i.e. not commented) line matches the following pattern: option-tag=option-value1[,value2[,..]].
Option tags must not be changed or otherwise corrupted, otherwise the program will not recognize them.
Currently, there is only one effective data section, called OPTIONS. Here is its content (corresponding to the use case published below):
__BEGIN_OPTIONS__
OPT_DIR ="/Users/amarkhelil/Downloads/DirName"
OPT_FPAT =txt
OPT_OUT =0
OPT_FV =0
OPT_MPAT =0
OPT_MAXDIR =20000
OPT_MAXFIL =20000
OPT_MAXFMAT =10000
OPT_MAXTIM =60
OPT_MAXDISP =100
OPT_MODDISP =500
__END_OPTIONS__
- The directory to search is explicitly stated.
- The files to collect must have extension TXT.
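A reader for this file format could look like the following minimal sketch. It is not the reader used by the program: the function ReadOptionSection, the treatment of enclosing double quotes and the in-memory file used for the demonstration are assumptions.

```perl
use strict;
use warnings;

# Read one data section of an option file into a hash: tag => [values].
sub ReadOptionSection {
    my ($Handle, $Section) = @_;
    my (%Values, $Inside);
    while (my $Line = <$Handle>) {
        chomp $Line;
        next if $Line =~ /^\s*(?:#|$)/;                  # comment or empty line
        if ($Line =~ /^\s*__BEGIN_\Q$Section\E__\s*$/) { $Inside = 1; next; }
        last if $Line =~ /^\s*__END_\Q$Section\E_*\s*$/; # end tag closes the section
        next unless $Inside;                             # data outside the section is not retrieved
        # option-tag=option-value1[,value2[,..]]; other lines are skipped
        my ($Tag, $Value) = $Line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/ or next;
        $Value =~ s/^"(.*)"$/$1/;                        # drop enclosing double quotes (assumption)
        $Values{$Tag} = [ split /[,;]/, $Value ];
    }
    return %Values;
}

# Shortened copy of the OPTIONS section, read from an in-memory file.
my $Text = <<'END_OF_FILE';
# GENERAL_CONDITIONS: comment lines are discarded
__BEGIN_OPTIONS__
OPT_DIR ="/Users/amarkhelil/Downloads/DirName"
OPT_FPAT =txt
OPT_MAXTIM =60
__END_OPTIONS__
END_OF_FILE

open(my $Handle, '<', \$Text) or die "Cannot open in-memory file: $!";
my %Values = ReadOptionSection($Handle, 'OPTIONS');
close($Handle);

print "$_ = @{ $Values{$_} }\n" for sort keys %Values;
```

With a real option file, the in-memory handle would simply be replaced by a handle opened on the file name.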
-
I ran a number of tests to verify the behavior of the program under different conditions (especially on directories that store many thousands of files). There is no time to document all results now. In short, the program did well. However, there is no such thing as a perfect program: users who discover bugs or have specific wishes are welcome to contact me and describe their concern.
To demonstrate how the program works, I will select the same directory as for List_Dir_1_1.pl: DirName (see also DirName Structure or DirName Listing). DirName is tiny, so none of the abort conditions will be raised. Nonetheless, the substantial differences between versions 1_1 and 1_2 will become apparent.
The following command lines produce the same results:
- perl -w List_Dir_1_2.pl -ListDir=/Users/amarkhelil/Downloads/DirName -ListFP=txt
- perl -w List_Dir_1_2.pl -OptFil
- perl -w List_Dir_1_2.pl -OptFil=List_Dir_1_2.txt
Summary of results
#
Number of all dirs screened [ 8] ; limit to abort recursion [20000]
Number of all files screened [ 8] ; limit to abort recursion [20000]
Number of matching files [ 8] ; limit to abort recursion [10000]
Distribution of file patterns
Extension [.\.txt$ ] 8 occurences
#
Time needed to process the command line [ 0.001] Sec [ 16.28] percent
Time needed to collect the files [ 0.002] Sec [ 49.61] percent
Time needed to sort the files [ 0.001] Sec [ 33.73] percent
Time needed for all processes [ 0.004] Sec [100.00] percent
#
List of the matching files after sorting
[ 1] DirName/Dir_2/fil_2_1.txt
[ 2] DirName/Dir_2/fil_2_2.txt
[ 3] DirName/Dir_3/Dir_3_1/fil_3_1_1.txt
[ 4] DirName/Dir_3/Dir_3_3/fil_3_3_1.txt
[ 5] DirName/Dir_3/fil_3_1.txt
[ 6] DirName/fil_1.txt
[ 7] DirName/fil_2.txt
[ 8] DirName/fil_3.txt
In version 1_2 only the files are retained (directories are ignored). If another file extension is searched for, e.g. pdf as in the command line perl -w List_Dir_1_2.pl -ListDir=/Users/amarkhelil/Downloads/DirName -ListFP=pdf, no file is collected at all.
-
-
We first demonstrated the feasibility of a recursive algorithm, List_Dir_1_1, for searching directory trees. On this basis, we developed a practical program, List_Dir_1_2, that prevents uncontrolled crashes when the available resources do not suffice. (The user may determine beforehand the option values that fit his computer system.)
The next step is to reuse the code base for an extended treatment of the collected files, in one way or another.
An example of such an extension would be the Count_Lines_Of_Code program, which would cover all source files pertaining to a given development project.