/* * regexp.sli * * This file is part of NEST. * * Copyright (C) 2004 The NEST Initiative * * NEST is free software: you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by * the Free Software Foundation, either version 2 of the License, or * (at your option) any later version. * * NEST is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU General Public License for more details. * * You should have received a copy of the GNU General Public License * along with NEST. If not, see <http://www.gnu.org/licenses/>. * */ /regexp /SLI ($Revision: 9988 $) provide-component /regexp /C++ (1.3) require-component /* BeginDocumentation Name: regcomp - Create a regular expression Synopsis: string integer regcomp -> regex string regcomp -> regex Description: regcomp will prepare a regular expression to be used with regexec. Any allowed flags are found in dictionary regexdict. Parameters: in: string: defining the regular expression, integer: flag, see Remarks. If in doubt, choose second call of regcomp which presets integer to REG_EXTENDED. out: the regular expression object Examples: 1) (.*) regcomp -> <regextype> 2) regexdict begin (\() REG_EXTENDED regcomp -> <ERROR> Diagnostics: If the string cannot be converted to a regular expression, an error message is displayed and /InvalidRegexError is raised. Variants: The variant "regcomp_" never raises an error, but returns: -> regex true -> regex integer false In case of an error, the regex error code is returned as an integer. This error code can be translated to a string using ":regerror" (see there). Author: Diesmann & Hehl, R Kupper (added error handling) FirstVersion: 27.9.99 Remarks: See man regcomp for futher details on POSIX regcomp. SeeAlso: regexec, :regerror */ /regcomp [/stringtype /integertype] { % The regcomp_ function does not itself raise an error. % In case of an error, it returns an error code and false. % This error code can be decoded by :regerror. % % Synopsis: string integer regcomp_ -> regex true % regex integer false % string regcomp -> regex true % regex integer false regcomp_ not { :regerror % leaves a string on stack M_ERROR (regcomp) rolld message /regcomp /InvalidRegexError raiseerror } if } def /regcomp [/stringtype] {regexdict /REG_EXTENDED get regcomp} def /* BeginDocumentation Name: :regerror - return message of what went wrong with "regcomp_" Synopsis: regex integer -> string Description: :regerror will decode the integer error code and return the related error description. Parameters: in: regex: a regular expression generated by "regcomp_" integer code of what went wrong. out: string of error description. Examples: preparation: (\() regcomp_ -> <regextype> 8 False pop now there's a wrong regex and an errorcode on the stack :regerror = -> Unmatched ( or \( Bugs: no known ;-) Diagnostics: no errors raised - this _is_ a command to check an error! Author: Diesmann & Hehl FirstVersion: 27.9.99 Remarks: See man regerror for further details on POSIX regerror. Note that the command "regcomp" performs automatic error checking. You should probably use "regcomp", not "regcomp_". SeeAlso: regexec, regcomp */ /:regerror trie [/regextype /integertype] /regerror_ load addtotrie def /* BeginDocumentation Name: regexdict - dictionary with flags and codes for regular expressions Synopsis: - Description: This dictionary provides flags and codes signaling properties of or results from regular expression evaluations. SeeAlso: regexec */ /* BeginDocumentation Name: regexec - compare string and regular expression Synopsis: regex string integer integer -> array integer regex string 0 integer -> integer Description: regexec evaluates the given regular expression and looks whether it is contained within a given string. It returns failure/success and the offsets of the position where the regular expression was matched. The same is true for any paranthesized subexpression: Those offsets are stored in an array up to the chosen depth. Flags and other codes REG_... are found in dictionary regexdict. Parameters: in: string is the string where te regular expression should be looked for; regex is a regular expression generated by regcomp first integer is the depth of paranthesized subexpression offsets that should be returned, second integer is a flag, for details see POSIX regexec (set to zero if in doubt!) out: integer is 0 if it matched, nonzero else (POSIX regexec error code) array: an array of [beginoffset,endoffset] arrays indicating the matches of the regex an any paranthesized subregexes within the string. Examples: (remember regexdict begin for these examples!) 1) (simple) REG_EXTENDED regcomp -> <regextype> ( simple ) 0 0 regexec -> 0 %this 0 indicating success 2) (simple) REG_EXTENDED regcomp -> <regextype> ( simple ) 1 0 regexec -> [[3 9]] 0 % this 3,9 are begin and end offsets of matched regex! 3) ((paranthesize)+.*example) regcomp -> <regextype> (This is a paranthesize using example) 2 0 regexec -> 0 [[10 36] [10 22]] Bugs: no knwon ;-) Diagnostics: no errors raised Author: Diesmann & Hehl FirstVersion: 27.9.99 Remarks: See man regexec for further details on POSIX regexec. SeeAlso: regexec, regcomp */ /regexec trie [/regextype /stringtype /integertype /integertype] /regexec_ load addtotrie def /* BeginDocumentation Name: regex_find_sf - Check if a regex is included in a stream Synopsis: string istream -> boolean Description: Takes the first argument. Converts to regex and calls regexec to find out if this regex matches the stream. Reports success/failure in a boolean true/false. Parameters: in: first argument : a string which will be converted to a regex by a regcomp call. second argument : an istream where this regex should be matched. out: true/false telling if there is/is no match. Examples: See examples of regex_find, exchange second string with a file. Bugs: - Diagnostics: Will raise an /InvalidRegexError if regcomp cannot compile the regex. Try immidiate :regerror = to find out why! Author: Hehl FirstVersion: 1.10.99 Remarks: Does _not_ return any information about the matched expression more than matched/not matched; use lower level commands regcomp, regexec if in need! SeeAlso: regexec, regcomp, regex_replace */ /regex_find_sf { exch regcomp exch regex_find_rf } bind def /* BeginDocumentation Name: regex_find_rf - Check if a regex is included in a stream Synopsis: regex istream -> boolean Description: Takes the first argument. Calls regexec to find out if this regex matches the stream. Reports success/failure in a boolean true/false. Parameters: in: first argument : a regex generated by regcomp second argument : an istream where this regex should be matched. out: true/false telling if there is a/is no match. Examples: See examples of regex_find, exchange second string with a file. Bugs: - Diagnostics: no errors raised Author: Hehl FirstVersion: 1.10.99 Remarks: Does _not_ return any information about the matched expression more than matched/not matched; use lower level commands regcomp, regexec if in need! SeeAlso: regexec, regcomp, regex_replace */ /regex_find_rf { << >> begin /Where Set /TheRegex Set /regex_found false def { Where getline not {pop exit} if exch pop TheRegex exch regex_find_r {/regex_found true def exit} if } loop regex_found end } bind def /* BeginDocumentation Name: regex_find_s - Check if a regex is included in a string Synopsis: string string -> boolean Description: Takes the first argument. Converts to regex and calls regexec to find out if this regex matches the string. Reports success/failure in a boolean true/false. Parameters: in: first argument :a string which will be converted to a regex by a regcomp call. second argument : a string where this regex should be matched. out: true/false telling if there is/is no match. Examples: (hello) (is there a hello hiding) regex_find -> true (hello) (is there a HeLlO hiding) regex_find -> false Bugs: - Diagnostics: Will raise an /InvalidRegexError if regcomp cannot compile the regex. Try immidiate :regerror = to find out why! Author: Hehl FirstVersion: 1.10.99 Remarks: Compiles regex and calls regex_find_s. Does _not_ return any information about the matched expression more than matched/not matched; use lower level commands regcomp, regexec if in need! SeeAlso: regexec, regcomp, regex_replace */ /regex_find_s { exch regcomp exch regex_find_r } bind def /* BeginDocumentation Name: regex_find_r - Check if a regex is included in a string Synopsis: regex string -> boolean Description: Takes the first argument and calls regexec to find out if this regex matches the string. Reports success/failure in a boolean true/false. Parameters: in: first argument : a regex generated by regcomp second argument : a string where this regex should be matched. out: true/false telling if there is/is no match. Examples: (hello) regexdict /REG_ICASE get regcomp pop (is there a HeLlO hiding) regex_find -> true Bugs: - Diagnostics: no errors raised Author: Hehl FirstVersion: 1.10.99 Remarks: Does _not_ return any information about the matched expression more than matched/not matched; use lower level commands regcomp, regexec if in need! SeeAlso: regexec, regcomp, regex_replace */ /regex_find_r { 0 0 regexec 0 eq {true}{false}ifelse } bind def /* BeginDocumentation Name: regex_find - Check if a regex is included in a string or stream Synopsis: string istream -> boolean string string -> boolean regex istream -> boolean regex string -> boolean Description: Takes the first argument. Converts to regex, if necessary, and calls regexec to find out if this regex matches the string/stream. Reports success/failure in a boolean true/false. Parameters: in: first argument : a regex generated by regcomp OR a string which will be converted to a regex by a regcomp call. second argument : an istream or a string where this regex should be matched. out: true/false telling if there is/is no match. Examples: (hello) (is there a hello hiding) regex_find -> true (hello) (is there a HeLlO hiding) regex_find -> false (hello) regexdict /REG_ICASE get regcomp pop (is there a HeLlO hiding) regex_find -> true Bugs: - Diagnostics: If called with a string as first argument, will raise an /InvalidRegexError if regcomp cannot compile the regex. Try immidiate :regerror = to find out why! Author: Hehl FirstVersion: 1.10.99 Remarks: Does _not_ return any information about the matched expression more than matched/not matched; use lower level commands regcomp, regexec if in need! SeeAlso: regexec, regcomp, regex_replace */ /regex_find trie [/stringtype /istreamtype] /regex_find_sf load addtotrie [/regextype /istreamtype] /regex_find_rf load addtotrie [/stringtype /stringtype] /regex_find_s load addtotrie [/regextype /stringtype] /regex_find_r load addtotrie def /regex_replace_sf { 4 -1 roll regcomp 4 1 roll regex_replace_rf } bind def /regex_replace_rf { << >> begin /DestFile Set /SourceFile Set /ReplaceString Set /TheRegex Set { SourceFile getline not {pop exit} if exch pop TheRegex ReplaceString 3 -1 roll regex_replace DestFile exch <- endl ; } loop end } bind def /regex_replace_s { 3 -1 roll regcomp 3 1 roll regex_replace_r } bind def /regex_replace_r { << >> begin /SourceString Set /ReplaceString Set /Regex Set /DestString () def { Regex SourceString 1 0 regexec 0 eq not {pop exit} { 0 get /offsets Set SourceString offsets 0 get SourceString length offsets 0 get sub erase_s ReplaceString join_s DestString exch join /DestString Set SourceString 0 offsets 1 get erase_s /SourceString Set } ifelse } loop DestString SourceString join_s end } bind def /* BeginDocumentation Name: regex_replace - replace all occurences of a regex Synopsis: string string istreamtype ostreamtype -> - regex string istreamtype ostreamtype -> - string string string -> string regex string string -> string Description: regex_replace tries to match the regex in istream/string. Any occurence of regex is replaced by the given string. Parameters: in: first argument : a regex generated by regcomp OR a string which will be converted to a regex by a regcomp call. second argument : the string with which regex should be replaced. third argument : an istream or a string where this regex should be matched. fourth argument (if any): an ostream where the changes are saved. out: either a string or, if called with streams, the ostream will now contain replaced stream. Examples: Bugs: Diagnostics: if called with a string as first argument, will raise an /InvalidRegexError if regcomp fails to compile a regex. Author: Hehl FirstVersion: 4.10.99 Remarks: SeeAlso: regexec, regcomp, regex_find */ /regex_replace trie [/stringtype /stringtype /istreamtype /ostreamtype] /regex_replace_sf load addtotrie [/regextype /stringtype /istreamtype/ostreamtype] /regex_replace_rf load addtotrie [/stringtype /stringtype /stringtype] /regex_replace_s load addtotrie [/regextype /stringtype /stringtype] /regex_replace_r load addtotrie def /* BeginDocumentation Name: grep - extract lines matching a regular expression pattern Synopsis: (filename) (expression) grep -> [(line1) (line2) ...] [(string1) (string2) ...] (expression) grep -> [(match1) (match2) ...] Description: "grep" is similar to the Unix command "grep". It performs a regular expression match, either on the lines of a file, or on the strings in an array of strings. A valid regular expression must be passed as a string as second argument. "grep" returns an array of matching lines or strings. It returns the full lines or strings that matched. If not match was found, the empty array is returned. Parameters: (filename) - name of the file to search for the pattern [(string1) ...] - array of strings to search for the pattern [(match1) ...] - result: array of matching strings Options: By default, "grep" uses extended regular expressions and performs case sensitive matching on the single lines or strings. This behaviour can be customized via the "SetOptions" command. "grep" has the following options that can be set via the "SetOptions" command: /flags_regcomp (integer) - flags passed to the "regcomp" command. Must be the logical AND of any of regexdict::REG_{EXTENDED,ICASE,NOSUB,NEWLINE}. Default: REG_EXTENDED /flags_regexec (integer) - flags passed to the "regexec" command. Must be the logical AND of any of regexdict::REG_{NOTBOL,NOTEOL}. Default: 0 Please read the Unix manpage for "regcomp" for explanation of these flags. To perform case insensitive matches, add regexdict::REG_ICASE to /flags_regcomp. Examples: [(hello) (world)] (^hell) grep -> [(hello)] [(hello) (world)] (not in here) grep -> [] [(hello) (world)] () grep -> [(hello) (world)] statusdict /prgdocdir get (LICENSE) joinpath (http) grep -> [(or visit http://www.nest-initiative.org)] Diagnostics: Raises /InvalidRegexError if the expression string is not a valid regular expression. Author: R Kupper FirstVersion: 23-jul-2008 Availability: standard SLI References: Unix manpage for "regcomp" SeeAlso: regex_find, regcomp, regexec, regexdict */ /grep << /flags_regcomp regexdict/REG_EXTENDED :: /flags_regexec 0 >> Options /grep[/stringtype /filename /stringtype /regexstr] { /regex regexstr /grep /flags_regcomp GetOption regcomp def /result [] def filename (r) file { getline not {exit} if /line Set regex line 0 /grep /flags_regexec GetOption regexec 0 eq {/result result line append def} if } loop pop % the stream result } SLIFunctionWrapper /grep[/arraytype /a /stringtype /regexstr] { /regex regexstr /grep /flags_regcomp GetOption regcomp def /result [] def a { /line Set regex line 0 /grep /flags_regexec GetOption regexec 0 eq {/result result line append def} if } forall result } SLIFunctionWrapper