function [A,ffn,numHeader,repChar,hl,fpos] = txt2mat(varargin) % TXT2MAT read an ascii file and convert a data table to a matrix % % Syntax: % A = txt2mat % A = txt2mat(fn) % [A,ffn,nh,SR,hl,fpos] = txt2mat(fn [,nh,nc,fmt,SR,SX] ) % [A, ...] = txt2mat(fn, ... 'param', value, ...) % [A, ...] = txt2mat(fn, instruct) % % with % % A output data matrix % ffn full file name % nh number of header lines % hl header lines (as a string) % fpos file position of last character read and converted from ascii file % % fn file or path name ('*' is allowed as wildcard in file name) % nh number of header lines % nc number of data columns % fmt format string % SR cell array of replacement strings sr, SR = {sr1,sr2,...} % SX cell array of invalid line strings sx, SX = {sx1,sx2,...} % % 'param', value see below for input parameter/value-pairs % % instruct input struct (each field name corresponds to an input % parameter name) % % TXT2MAT reads the ascii file and extracts the values found in a % data table with columns to a matrix, skipping header lines. % When extracting the data, is used as format string for each line % (see sscanf online doc for details about the format string). % % If is an existing directory, or contains an asterisk wildcard in the % file name, or is an empty string, a file selection dialogue is displayed. % % Additional strings ,,.. can be supplied within a cell array % to perform single character substitution before the data is % converted: each of the first n-1 characters of an character string is % replaced by the -th character. % % A further optional input argument is a cell array containing strings % ,,.. that mark "bad" lines containing invalid data. If every % line containing invalid data can be caught by the , TXT2MAT will % speed up significantly (see EXAMPLE 3a). Any lines that are recognized to % be invalid are completely ignored (and there is no corresponding row in % A). % % If the number of header lines or the number of data columns are % not provided, TXT2MAT performs some automatic analysis of the file format. % This will need the numbers in the file to be decimals (with decimal point % or comma) and the data arrangement to be more or less regular (see also % remark 1). % If is negative, TXT2MAT internally initializes the output matrix % with || columns, but allows for expanding if more numeric values % are found in any line of the file. To this end, TXT2MAT is forced to % switch to line by line conversion. % % If some lines of the data table can not be (fully) converted, the % corresponding rows in A are padded with NaNs. % % For further options and to facilitate the argument assignment, the % param/value notation or an input struct can be used instead of the single % argument syntax txt2mat(ffn,nh,nc,fmt,SR,SX). For usage see EXAMPLE 3a. % The following table lists the param/value-pairs and their corresponding % single argument, if existing: % % Param-string Value type Example value single arg. % 'NumHeaderLines' Scalar 13 nh % 'NumColumns' Scalar 9 nc % 'Format' String ['%d.%d.%d' repmat('%f',1,6)] fmt % 'ReplaceChar' Cell {')Rx ',';: '} SR % 'BadLineString' Cell {'Warng', 'Bad'} SX % 'GoodLineString' Cell {'2009-08-17'} - % 'SelectLineFun' FunHandle @(lineNo) rem(lineNo,2) == 0 - % 'ReplaceStr' Cell {{'True','1'},{'#NaN','#Inf','NaN'}} - % 'ReplaceRegExpr' Cell {{';\s*(?=;)','; NaN'}} - % 'NumericType' String 'single' - % 'RowRange' 2x1-vector [2501 5000] - % 'FilePos' Scalar 0 - % 'ReadMode' String 'auto' - % 'DialogString' String 'Now choose a log file' - % 'InfoLevel' Scalar 1 - % 'MemPar' Scalar 2^17 - % % The param/value-pairs may follow the usual arguments in any order, e.g. % txt2mat('file.txt',13,9,'BadLineString',{'Bad'},'Format','%f'). Only the % single file name argument must be given as the first input. % % Param/value-pairs with additional functionality: % % ˇ 'GoodLineString': ignore all lines that do not contain at least one of % the strings in the cell array (line filtering analogous to % 'BadLineString'; see EXAMPLE 3b). % % ˇ 'SelectLineFun': a single argument element-wise Boolean function that % is applied to the line numbers. If the function returns 'false' for a % certain line number, that line is skipped. Line number counting starts % with 1 (one) after the header lines. When using this option, the number % of header lines should be passed to txt2mat, too. See EXAMPLE 3c. % % ˇ The 'ReplaceStr' argument works similar to the 'ReplaceChar' argument. % It just replaces character sequences instead of single characters. A % cell array containing at least one cell array of strings must be % provided. Such a cell array of strings consists of strings, each of % the first strings is replaced by the -th string. For example, % with {{'R1a','R1b, 'S1'}, {'R2a','R2b','R2c', 'S2'}} % all the 'R'-strings are replaced by the corresponding 'S' string. % In general, replacing whole strings takes more time than 'ReplaceChar', % especially if the strings differ in size. % Expression replacements are performed before character replacements. % % ˇ By the help of the 'ReplaceRegExpr' argument regular expressions can be % replaced. The usage is analogous to 'ReplaceStr'. Regular expression % replacements are carried out before any other replacement (see % EXAMPLE 4 and EXAMPLE 5). % % ˇ 'NumericType' is one of 'int8', 'int16', 'int32', 'int64', 'uint8', % 'uint16', 'uint32', 'uint64', 'single', or 'double' (default), % determining the numeric class of the output matrix A. If the numeric % class does not support NaNs, missing elements are padded with zeros % instead. Reduce memory consumption by choosing an appropriate numeric % class, if needed. % % ˇ The 'RowRange' value is a sorted positive integer two element vector % defining an interval of data rows to be converted (header lines do not % count, but lines that will be recognized as invalid - see above - do). % If the vector's second element exceeds the number of valid data rows in % the file, the data is extracted up to the end of the file. Inf is % allowed as second element. It may save memory and computation time if % only a small part of data has to be extracted from a huge text file. % % ˇ The 'FilePos' value is a nonnegative integer scalar. % characters from the beginning of the file will be ignored, i.e. not be % read. If you run TXT2MAT with a 'RowRange' argument, you may % use the output as an 'FilePos' input during the next run in % order to continue from where you stopped. By that you can split up the % conversion process e.g. when the file is too big to be read as a whole % (see EXAMPLE 6). % % ˇ 'ReadMode': % 'matrix' Read and convert sections of multiple lines simultaneously, % requiring each line to contain the same number of values. % Finding an improper number of values in such a section will % cause an error (see also remark 2). % 'line' Read and convert text line by line, allowing different % numbers of values per line (slower than 'matrix' mode). % 'auto' Try 'matrix' first, continue with 'line' if an error occurs % (default). % 'block' Read and convert sections of multiple lines simultaneously % and fill up the data matrix regardless of how many values % occur in each text line (EXAMPLE 7). Only a warning is issued % if a section's number of values is not a multiple of the % number of columns of the output data matrix. This is the % fastest mode involving numeric conversion. % 'char' Do not convert into a numeric array, but return char vector % of the contents including omission of the header lines, % replacements, line filtering, and file position and row range % selection. Useful for reading and manipulating text files % with non-numeric contents, see EXAMPLE 3b. % With read mode 'char' the file format analysis is disabled. % 'cell' Same as 'char', but put each line of text into a separate % cell of the output, see EXAMPLE 3b. % % ˇ The 'DialogString' argument provides the text shown in the title bar of % the file selection dialogue that may appear. % % ˇ The 'InfoLevel' argument controls the verbosity of TXT2MAT's outputs in % the command window and the message boxes. Currently known values are: % 0, 1, 2 (default) % % ˇ The 'MemPar' argument provides the minimum amount of characters TXT2MAT % will process simultaneously as an internal text section (= a set of % text lines). It must be a positive integer. The value does not affect % the outputs, but computation time and memory usage. The roughly % optimized default is 65536; usually there is no need to change it. % % ------------------------------------------------------------------------- % % REMARKS % % 1) prerequisites for the automatic file format analysis before the % numeric conversion (if the number of header lines and data columns is % not given): % ˇ header lines can be detected by either non-numeric characters or % a strongly deviating number of numeric items in relation to the % data section (<10%) % ˇ tab, space, slash, comma, colon, and semicolon are accepted as % delimiters (e.g. "10/11/2006 08:30 1; 3.3; 0.52" is ok) % ˇ after the optional line filtering and user supplied replacements % have been carried out, the data section must contain the delimiters % and the decimal numbers only (point or comma are accepted as decimal % character). % Note I: if you do not trigger the internal file format analysis, i.e. % you do provide both the number of header lines and the number of data % columns, you also have to care for an eventual decimal _comma_ and % non-whitespace delimiters. Such a comma can be replaced with a '.', % and the whitespaces can either be included into a suitable format % string or be replaced with whitespaces (see e.g. the 'ReplaceChar' % argument). % Note II: if only the number of header lines is given, any character % except '+-1234567890aAeEfFiInN.,' (signs, decimals, NaN, Inf, dot, and % comma) that is found during file analysis is regarded as a possible % separator and therefore replaced by ' ' (space). % % 2) In matrix mode, txt2mat checks that the format string is suitable % and that the number of values read from a section of the file is the % product of the number of text lines and the number of columns. This % may be true even if the number of values per line is not uniform and % txt2mat may be misled. So using matrix mode you should be sure that % all lines that can't be sorted out by a bad line marker string contain % the same number of values. % % 3) Since txt2mat.m is a comparatively large file, generating a preparsed % file txt2mat.p once will speed up the first call during a matlab % session. Set the current directory to where you saved txt2mat.m and % type % >> pcode txt2mat % For further information, see the 'pcode' documentation. % % ========================================================================= % EXAMPLE 1: basic usage % ------------------------------------------------------------------------- % % A = txt2mat; % choose a file and let TXT2MAT analyse its format % % ========================================================================= % EXAMPLE 2: automatic file format analysis % ------------------------------------------------------------------------- % % Supposed your ascii file C:\mydata.log contains the following lines % ť % 10 11 2006 08 35.225 1 3.3 0.52 % 31 05 2008 12 12 0 0.0 0.00 % 7 01 2010 15 23.5 -1 3.3 0.535 % Ť % type % % A = txt2mat('C:\mydata.log',0,8); % % or just % % A = txt2mat('C:\mydata.log'); % % Below, TXT2MAT uses its automatic file layout detection as the header % line and column number is not given. With the file looking like this: % ť % some example data % plus another header line % 10/11/2006 08:35,225 1; 3,3; 0,52 % 31/05/2008 12:12 0; 0,0; 0,00 % 7/01/2010 15:23,5 -1; 3,3; 0,535 % Ť % txt2mat('C:\mydata.log') returns the same output data matrix as above. % % ========================================================================= % EXAMPLE 3a: line filtering by 'bad' markers; replacements % ------------------------------------------------------------------------- % % ť % ;$FILEVERSION=1.1 % ;$STARTTIME=38546.6741619815 % ;---+-- ----+---- --+-- ----+--- + -+ -- -- -- % 3) 7,2 Rx 0300 8 01 A3 58 4D % 4) 7,3 Rx 0310 8 06 6E 2B 9F % 5) 9,5 Warng FFFFFFFF 4 00 00 00 08 BUSHEAVY % 6) 12,9 Rx 0320 8 02 E1 F6 EF % Ť % % You may specify % nh = 3 % header lines, % nc = 12 % data columns, % fmt = '%f %f %x %x %x %x %x %x' % as format string for floats % % and hexadecimals, % sr1 = ')Rx ' % as first replacement string to blank the % % characters ')','R', and 'x' (if you don't want to % % include them in the format string), and % sr2 = ',.' % to replace the decimal comma with a dot, and % sx1 = 'Warng' % as a marker for invalid lines % % A = txt2mat('C:\mydata.log', nh, nc, fmt, {sr1,sr2}, {'Warng'}); % % A = % 3 7.2 768 8 1 163 88 77 % 4 7.3 784 8 6 110 43 159 % 6 12.9 800 8 2 225 246 239 % ... % % If you make use of the param/value-pairs, you can also write more clearly % % t2mOpts = {'NumHeaderLines', 3 , ... % 'NumColumns' , 12 , ... % 'ReplaceChar' , {')Rx ',',.'} , ... % 'Format' , '%f %f %x %x %x %x %x %x' , ... % 'BadLineString' , {'Warng'} }; % % A = txt2mat('C:\mydata.log', t2mOpts{:}); % % ... or you simply use an input struct % % t2mIns.NumHeaderLines = 3; % t2mIns.NumColumns = 12; % t2mIns.ReplaceChar = {')Rx ',',.'}; % t2mIns.Format = '%f %f %x %x %x %x %x %x'; % t2mIns.BadLineString = {'Warng'}; % % A = txt2mat('C:\mydata.log', t2mIns); % % Without the {'Warng'} argument, A would have been % % 3 7.2 768 8 1 163 88 77 % 4 7.3 784 8 6 110 43 159 % 5 9.5 NaN NaN NaN NaN NaN NaN % 6 12.9 800 8 2 225 246 239 % ... % % ========================================================================= % EXAMPLE 3b: line filtering by 'good' markers; return char or cell % ------------------------------------------------------------------------- % % ť % Some colours and numbers % 1 yellow 1 0 0 % 2 green 7 8 10 % 3 red 0 0 0 % 4 green 8 8 9 % 5 green 9 7 7 % 6 yellow 0 2 1 % Ť % % If you only want the data from the lines containing the string 'green': % % t2mOpts = {'NumHeaderLines', 1 , ... % 'NumColumns' , 4 , ... % 'Format' , '%f %*s %f %f %f', ... % 'GoodLineString', {'green'} }; % % A = txt2mat('C:\mydata.log', t2mOpts{:}); % % A = % 2 7 8 10 % 4 8 8 9 % 5 9 7 7 % % If you want to obtain those lines as text, use read mode 'char': % % t2mOpts = {'NumHeaderLines', 1 , ... % 'GoodLineString', {'green'} , ... % 'ReadMode' , 'char' }; % % [A,ffn,nh,SR,hl] = txt2mat('C:\mydata.log', t2mOpts{:}); % % A = % 2 green 7 8 10 % 4 green 8 8 9 % 5 green 9 7 7 % % whos A % Name Size Bytes Class Attributes % A 1x49 98 char % % Some examples of what you could do with the char vector A: % % - write it to a new file: % % fid = fopen('C:\mynewdata.log','w'); % fwrite(fid, hl); % write header % fwrite(fid, A); % write data % fclose(fid); % % ť % Some colours and numbers % 2 green 7 8 10 % 4 green 8 8 9 % 5 green 9 7 7 % Ť % % - proceed with functions like textscan: % % C = textscan(A,'%f %s %f %f %f'); % % C = {[2;4;5], {'green';'green';'green'}, [7;8;9], [8;8;7], [10;9;7]} % % To put each line into a separate cell of a cell array of strings, use the % very similar read mode 'cell': % % t2mOpts = {'NumHeaderLines', 1 , ... % 'GoodLineString', {'green'} , ... % 'ReadMode' , 'cell' }; % % [A,ffn,nh,SR,hl] = txt2mat('C:\mydata.log', t2mOpts{:}); % % A = % '2 green 7 8 10' % '4 green 8 8 9' % '5 green 9 7 7' % % ========================================================================= % EXAMPLE 3c: line filtering by line number % ------------------------------------------------------------------------- % % ť % line number and magic % 1 30 39 48 1 10 19 28 % 2 38 47 7 9 18 27 29 % 3 46 6 8 17 26 35 37 % 4 5 14 16 25 34 36 45 % 5 13 15 24 33 42 44 4 % 6 21 23 32 41 43 3 12 % 7 22 31 40 49 2 11 20 % Ť % % If you only want to read every 3rd line starting from line 4: % % N = 3; % n1 = 4; % selFun = @(L) rem(L,N)==rem(n1,N) & L>=n1; % % fn = 'C:\mydata.txt'; % t2mOpts = {'NumHeaderLines', 1 , ... % 'SelectLineFun' , selFun }; % % [A,ffn,nh,SR,hl] = txt2mat(fn, t2mOpts{:}); % % A = % 4 5 14 16 25 34 36 45 % 7 22 31 40 49 2 11 20 % % % Reading every 2nd line from line 3 to 6: % % N = 2; % selFun = @(L) rem(L,N)==1; % % t2mOpts = {'NumHeaderLines', 1 , ... % 'RowRange' , [3,6] , ... % 'SelectLineFun' , selFun }; % % [A,ffn,nh,SR,hl] = txt2mat_06_56(fn, t2mOpts{:}); % % A = % 3 46 6 8 17 26 35 37 % 5 13 15 24 33 42 44 4 % % ========================================================================= % EXAMPLE 4: regular expression replacements % ------------------------------------------------------------------------- % % Supposed your ascii file C:\mydata.log begins with the following lines: % ť % datetime % ppm % ppm Nm % datetime real8 real8 real8 % 30.10.2006 14:24:06,131 6,4459 478,519 6,5343 % 30.10.2006 14:24:17,400 6,4093 484,959 6,5343 % 30.10.2006 14:24:17,499 6,4093 484,959 6,5343 % Ť % you might specify % nh = 2 % header lines, % nc = 9 % data columns, % fmt = ['%d.%d.%d' repmat('%f',1,6)] % as format string for % % integers and hexadecimals, % sr1 = ': ' % as first replacement string to blank the ':' % sr2 = ',.' % to replace the decimal comma with a dot, and % % A = txt2mat('C:\mydata.log', nh, nc, fmt, {sr1,sr2}); % % A = % 30 10 2006 14 24 6.131 6.4459 478.519 6.5343 % 30 10 2006 14 24 17.4 6.4093 484.959 6.5343 % 30 10 2006 14 24 17.499 6.4093 484.959 6.5343 % ... % % % A = txt2mat('C:\mydata.log','ReplaceRegExpr',{{'\.(\d+)\.',' $1 '}}); % % yields the same result, but uses the built-in file layout analysis to % determine the number of header lines, the number of columns, the % delimiters, and the decimal character. You only help TXT2MAT by % telling it to replace dots surrounding the month number with spaces via % the regular expression replacement. So you can use the latter command on % similar files which have a different or previously unknown number of % header lines etc., too. % % ========================================================================= % EXAMPLE 5: regular expression replacements % ------------------------------------------------------------------------- % % If the data table of your file contains some gaps that can be identified % by some repeated delimiters (here ;) % ť % ; 02; 03; 04; 05; % 11; ; 13; 14; 15; % 21; ; 23; ;; % ; 32; 33; 34; 35; % Ť % you can fill them with NaNs by the help of 'ReplaceRegExpr': % % A = txt2mat('C:\mydata.log','ReplaceRegExpr',... % {{'((?<=;\s*);)|(^\s*;)','NaN;'}}); % % A = % NaN 2 3 4 5 % 11 NaN 13 14 15 % 21 NaN 23 NaN NaN % NaN 32 33 34 35 % % ========================================================================= % EXAMPLE 6: processing a file in chunks % ------------------------------------------------------------------------- % % If you want to process the contents of mydata.log step by step, % converting one million lines at a time: % % fp = 0; % file position to start with (beginning of file) % A = NaN; % initialize output matrix % nhl = 12; % number of header lines for the first call % % while numel(A)>0 % [A,ffn,nh,SR,hl,fp] = txt2mat('C:\mydata.log','RowRange',[1,1e6], ... % 'FilePos',fp,'NumHeaderLines',nhl); % nhl = 0; % there are no further header lines % % % process intermediate results... % end % % ========================================================================= % EXAMPLE 7: read mode 'block' and 'line' % ------------------------------------------------------------------------- % % You can use the read mode 'block' on very large files with a constant % number of values per line to save some import time compared to the % 'matrix' mode. Besides, since TXT2MAT then does not check for line breaks % within the (internally processed) sections of a file, you can use the % block mode to fill up any output matrix with a fixed number of columns. % ť % 1 2 3 4 5 % 6 7 8 9 10 % % 11 12 13 14 15 % 16 17 18 19 20 % 21 22 % 23 24 25 % 26 27 28 29 30 % % Ť % % A = txt2mat('C:\mydata.txt',0,5,'ReadMode','block') % % A = % 1 2 3 4 5 % 6 7 8 9 10 % 11 12 13 14 15 % 16 17 18 19 20 % 21 22 23 24 25 % 26 27 28 29 30 % % % Instead, if you want to preserve the line break information, use the % (slower) read mode 'line': % % A = txt2mat('C:\mydata.txt',0,5,'ReadMode','line') % % or % % A = txt2mat('C:\mydata.txt',0,-1) % % A = % 1 2 3 4 5 % 6 7 8 9 10 % NaN NaN NaN NaN NaN % 11 12 13 14 15 % 16 17 18 19 20 % 21 22 NaN NaN NaN % 23 24 25 NaN NaN % 26 27 28 29 30 % % The first command reads up to 5 elements per line, starting from the % first, and puts them to a Nx5 matrix, whereas the second one % automatically expands the column size of the output to fit in the maximum % number of elements occuring in a line. This is effected by the negative % column number argument that also implies read mode 'line' here. % % ========================================================================= % % See also SSCANF % --- Author: ------------------------------------------------------------- % Copyright 2005-2014 Andres % $Revision: 6.60.0 $ $Date: 2014/03/23 21:52:03 $ % --- E-Mail: ------------------------------------------------------------- % x=-2:3; % disp(char(round([polyval([-0.32,0.43,1.75,-5.90,-0.95,116],x),... % polyval([-4.44,9.12,29.8,-33.6,-52.9, 98],x)]))) % you may also contact me via the author page % http://www.mathworks.com/matlabcentral/fileexchange/authors/30255 % --- History ------------------------------------------------------------- % 05.61 % ˇ fixed bug: possible wrong headerlines output when using 'FilePos' % ˇ fixed bug: produced an error if a bad line marker string was already % found in the first data line % ˇ corrected user information if sscanf fails in matrix mode % ˇ added some more help lines % 05.62 % ˇ allow negative NumColumns argument to capture a priori unknown % numbers of values per line % 05.82 beta % ˇ support regular expression replacements ('ReplaceRegExpr' argument) % ˇ consider user supplied replacements when analysing the file layout % 05.86 beta % ˇ some code clean-up (argincheck subfunction, ...) % 05.86.1 % ˇ fixed bug: possibly wrong numeric matlab version number detection % 05.90 % ˇ consider skippable lines when analysing the file layout % ˇ code rearrangements (subfun for line termination detection, ...) % 05.96 % ˇ subfuns to find line breaks / bad-line pos and to initialize output A % ˇ better handling of errors and 'degenerate' files, e.g. exit without % an error if the file selection dialogue was cancelled % 05.97 % ˇ fixed bug: error in file analysis if first line contains bad line % marker % ˇ fixed bug: a bad line marker is ignored if the string is split up by % two consecutive internal sections % ˇ better code readability in findLineBreaks subfunction % 05.97.1 % ˇ simplifications by skipping the header when reading from the file; % the header is now read separately and is not affected by any % replacements % ˇ corrected handling of bad line markers that already appear in header % 05.98 % ˇ corrected search for long bad line marker strings that could exceed % text dimensions % ˇ speed-up by improved finding of line break positions % 06.00 % ˇ introduction of 'high speed' read mode "block" requiring less line % break information % ˇ 'MemPar' buffer value changed to scalar % ˇ reduced memory demand by translating smaller text portions to char % ˇ modified help % 06.01 % ˇ fixed bug: possible error message in file analysis when only header % line number is given % 06.04 % ˇ better handling of replacement strings containing line breaks % ˇ allow '*' in file name to use file name as open file dialogue filter % 06.12 % ˇ 'good line' filter as requested by Val Schmidt % 06.17.1 % ˇ enable 'good line' filtering during automatic file analysis % ˇ new read modes 'char' and 'cell' to provide txt2mat's preprocessing % features esp. for non-numeric data, too % 06.17.3 % ˇ version number workaround for MCR execution (Leonard's remark) % 06.40 % ˇ input argument check by inputparser (R2007a), allowing input struct % ˇ minor changes in code and documentation % 06.60 % ˇ added option to select lines by line number ('SelectLineFun'), e.g. % to skip every n-th line as suggested by Kaare % ˇ reduced memory footprint and improved speed during good/bad line % filtering by working in chunks % % --- Wish list ----------------------------------------------------------- %% Get input arguments % check the arguments by argincheck: arg = argincheck(varargin); % returns % arg.val.(argname) -> value of the input % arg.has.(argname) -> T/F argument was given % arg.num.(argname) -> number of values for some non-scalar inputs % some abbreviations ffn = arg.val.FileName; numHeader = arg.val.NumHeaderLines; numColon = arg.val.NumColumns; readMode = arg.val.ReadMode; formatStr = arg.val.Format; repChar = arg.val.ReplaceChar; filePos = arg.val.FilePos; memPar = arg.val.MemPar; numRC = arg.num.ReplaceChar; numRS = arg.num.ReplaceStr; numRR = arg.num.ReplaceRegExpr; numBL = arg.num.BadLineString; numGL = arg.num.GoodLineString; % ~~~~~ special handling of file name argument arg.val.FileName ~~~~~~~~~~~ % 1) no file or path name is given -> open file dialogue if ~arg.has.FileName || isempty(ffn) [fn,pn] = uigetfile('*.*', arg.val.DialogString); ffn = fullfile(pn,fn); % 2) a path name is given -> open file dialogue with *.* filter spec elseif exist(ffn,'dir') == 7 curcd = cd; cd(ffn); [fn,pn] = uigetfile('*.*', arg.val.DialogString); ffn = fullfile(pn,fn); cd(curcd); % 3) a valid file name is given -> take it as it is elseif exist(ffn,'file') [fname,fname,ext] = fileparts(ffn); %#ok fn = [fname,ext]; % 4) an asterisk in the file name -> open file dialogue, use filter spec % - OR - % nonexisting file -> produce error message and return else [pathstr, fname, ext] = fileparts(ffn); doOpenDialog = (isempty(pathstr) || exist(pathstr,'dir')==7) && ... numel(strfind([fname, ext], '*')) > 0; if doOpenDialog if ~isempty(pathstr) curcd = cd; cd(pathstr) end [fn,pn] = uigetfile({[fname, ext];'*.*'}, arg.val.DialogString); ffn = fullfile(pn,fn); if ~isempty(pathstr) cd(curcd); end else % wrong name error('txt2mat:invalidFileName','no such file or directory'); end end % recheck file name (necessary e.g. after ESC in open file dialogue) if exist(ffn,'file')~=2 [A,ffn,numHeader,repChar,hl,fpos] = deal([]); if arg.val.InfoLevel>=1 disp('Exiting txt2mat: No existing file given.') end return end % generate a shortened form of the file name: if length(fn) < 28 fnShort = fn; else fnShort = ['...' fn(end-24:end)]; end arg.val.FileName = ffn; % ~~~~~ special handling of file name argument arg.val.FileName ~~~~~~~end~ clear varargin %% Analyze data format % try some automatic data format analysis if needed (by function anatxt) doAnalyzeFile = ~all([arg.has.NumHeaderLines, arg.has.NumColumns]); %, is_argin_conv_str]); % commented out as so far anatxt's formatStr is only '%f' % switch off file analysis if read mode is 'char' or 'cell' doAnalyzeFile = doAnalyzeFile && ~strcmpi(arg.val.ReadMode,'char') && ~strcmpi(arg.val.ReadMode,'cell'); if doAnalyzeFile % call subfunction anatxt: [anaNumHeader, anaNumColon, ~, anaRepChar, anaReadMode, ... anaNumAnalyzed, anaHeader, anaFileErr, anaErr] = anatxt(arg); % quit if errors occurred if ~isempty(anaErr) [A,repChar,fpos,hl] = deal([]); numHeader = anaNumHeader; if arg.val.InfoLevel>=1 disp(['Exiting txt2mat: file analysis: ' anaErr]) end return end % accept required results from anatxt: if ~arg.has.NumHeaderLines numHeader = anaNumHeader; end if ~arg.has.NumColumns numColon = anaNumColon; end %if ~arg.has.Format % unused % formatStr = anaFormat; %end if ~arg.has.ReadMode readMode = anaReadMode; end % add new replacement character strings from anatxt: isNewRC = ~ismember(anaRepChar, repChar); numRC = numRC + sum(isNewRC); repChar = [repChar,anaRepChar(isNewRC)]; % display information: if arg.val.InfoLevel >= 1 disp(repmat('*',1,length(ffn)+2)); disp(['* ' ffn]); if numel(anaFileErr)==0 sr_display_str = ''; for idx = 1:numRC; sr_display_str = [sr_display_str ' ť' repChar{idx} 'Ť']; %#ok end disp(['* read mode: ' readMode]); disp(['* ' num2str(anaNumAnalyzed) ' data lines analysed' ]); disp(['* ' num2str(numHeader) ' header line(s)']); disp(['* ' num2str(abs(numColon)) ' data column(s)']); disp(['* ' num2str(numRC) ' string replacement(s)' sr_display_str]); else disp(['* fread error: ' anaFileErr '.']); end disp(repmat('*',1,length(ffn)+2)); end % if % return if anatxt did not detect valid data if anaNumColon==0 A = []; hl = ''; fpos = filePos; return end end %% Detect line termination character if arg.val.InfoLevel >= 1 hw = waitbar(0,'detect line termination character ...'); set(hw,'Name',[mfilename ' - ' fnShort]); hasWaitbar = true; else hasWaitbar = false; end lbfull = detectLineBreakCharacters(ffn); % lbfull line break character(s) as uint8, i.e. % [13 10] (cr+lf) for standard DOS / Windows files % [10] (lf) for Unix files % [13] (cr) for Mac files % The DOS style values are returned as defaults if no such line breaks are % found. lbuint = lbfull(end); lbchar = char(lbuint); numLbfull = numel(lbfull); %% Open file and set position indicator to end of header % ... and extract header separately if not already done logfid = fopen(ffn); if numHeader > 0 if doAnalyzeFile % header lines have already been extracted hl = anaHeader; lenHeader = numel(hl); fseek(logfid,filePos+lenHeader,'bof'); else if arg.has.FilePos fseek(logfid,filePos,'bof'); end %*% todo: use function getLines here read_len = 65536; % (quite small) size of text sections just for header line extraction do_read = true; num_lb_curr = 0; countLoop = 0; while do_read [f8p,lenf8p] = fread(logfid,read_len,'*uint8'); % current text section ldcp_curr = find(f8p==lbuint); % line break positions in current text section num_lb_curr = num_lb_curr + numel(ldcp_curr); % number of line breaks so far do_read = (lenf8p == read_len) && (num_lb_curr < numHeader); countLoop = countLoop + 1; end if num_lb_curr >= numHeader lenHeader = ldcp_curr(end-(num_lb_curr-numHeader)) + (countLoop-1)*read_len; if countLoop == 1 % take the complete header from the first section hl = char(f8p(1:lenHeader)).'; fseek(logfid,filePos+lenHeader,'bof'); else % the header did not fit into a single section, so re-read % it as a whole fseek(logfid,filePos,'bof'); hl = char(fread(logfid,lenHeader).'); end else % exit here as we have found less line breaks than the given % number of header lines! fseek(logfid,filePos,'bof'); hl = char(fread(logfid).'); fpos = ftell(logfid); fclose(logfid); [A,repChar] = deal([]); if arg.val.InfoLevel>=1 disp(['Exiting txt2mat: ' num2str(numHeader) ' header lines expected, but only ' num2str(num_lb_curr) ' line breaks found.']) close(hw) end return end end else lenHeader = 0; hl = ''; if arg.has.FilePos fseek(logfid,filePos,'bof'); end end %% Read in ASCII file - case 1: portions only, as RowRange is given. % RowRange should be given if the file is too huge to be read at once by % fread. In this case multiple freads are used to read in consecutive % sections of the text. By counting the line breaks those rows of the text % that match the RowRange argument are added to the 'core' variable f8 that % is later used for the numeric conversion. % By definition, a line begins with its first character and ends with its % last termination character. if hasWaitbar waitbar(0.01,hw,'reading file ...'); end if arg.has.RowRange do_read = true; % loop condition num_lb_prev = 0; read_len = memPar; f8 = []; while do_read [f8p,lenf8p] = fread(logfid,read_len,'*uint8'); % current text section ldcp_curr = find(f8p==lbuint); % line break positions in current text section num_lb_curr = numel(ldcp_curr); % add lines of interest to f8 if (arg.val.RowRange(1) <= num_lb_prev+num_lb_curr+1) && (num_lb_prev < arg.val.RowRange(2)) if arg.val.RowRange(1) <= num_lb_prev + 1 % lines of interest started before current section sdx = 1; % start index is beginning of section => the part of the section to be added to f8 includes the start of the section else % lines of interest start within current section num_lines_to_omit = arg.val.RowRange(1)-1-num_lb_prev; % how many lines not to add sdx = ldcp_curr(num_lines_to_omit)+1; % start right after the omitted lines end if arg.val.RowRange(2) > num_lb_curr+num_lb_prev % lines of interest end beyond current section edx = lenf8p; % end index is length of section => the part of the section to be added to f8 includes the end of the section else % lines of interest end within current section num_lines_to_add = arg.val.RowRange(2)-num_lb_prev; % how many lines to add edx = ldcp_curr(num_lines_to_add); % corresponding end index end f8 = [f8; f8p(sdx:edx)]; %#ok fpos = ftell(logfid)-lenf8p+edx; % position of the latest added character end % quit loop if all rows of interest are read or if end of file is reached if num_lb_prev >= arg.val.RowRange(2) || lenf8p=1 disp('Exiting txt2mat: no numeric data found.') close(hw) end return end %% Clean up whitespaces at the end of file f8 = cleanUpFinalWhitespace(f8,lbfull); %% check line break position awareness hasReplacements = any([numRC,numRS,numRR] > 0 ); % as finding the line breaks is time-critical, "LbAwareness" is % introduced to tell us what we know about line break positions: % 0: nothing % 1: the positions of the final line break in every section % 2: the above + the number of lines up to each of those line breaks % 3: all line break positions % determine the minimum reqired LbAwareness, and set a waitbar progress % factor >1 if there's no sscanf read: wbFactor = 1; switch lower(readMode) case 'char' minLbAwareness = double(hasReplacements); wbFactor = 2; case 'block' minLbAwareness = 1; case {'matrix','auto'} minLbAwareness = 2; case 'line' minLbAwareness = 3; case 'cell' minLbAwareness = 3; wbFactor = 1.9; end %% filter lines (rows) % select lines by line number and 'bad' and/or 'good' marker strings if arg.has.SelectLineFun || (numBL + numGL > 0) if hasWaitbar waitbar(wbFactor*0.10,hw,'filtering lines ...'); end [f8, idcLb, cntLb, secLbIdc] = filterLines(f8, lbuint, memPar, arg); LbAwareness = 3; else LbAwareness = 0; end %% Find line break positions if necessary if LbAwareness < minLbAwareness if hasWaitbar waitbar(wbFactor*0.20,hw,'updating line break positions ...'); end % Find out if we have to expect text length changes due to the % replacemets doExpectLengthChange = false; % default if numRR > 0 % always expect changes by regular expressions doExpectLengthChange = true; else % check for string replacements that will change the length for edx = 1:numRS if any(diff(cellfun('length', arg.val.ReplaceStr{edx}))) doExpectLengthChange = true; break end end end if doExpectLengthChange || strcmpi(readMode,'block') % - make K1 doFindAll = false; doCount = false; LbAwareness = 1; else if strcmpi(readMode,'line') || strcmpi(readMode,'cell') % - make K3 doFindAll = true; doCount = true; LbAwareness = 3; else % readmode is 'auto' or 'matrix' % - make K2 doFindAll = false; doCount = true; LbAwareness = 2; end end [idcLb,cntLb,secLbIdc] = findLineBreaks(f8, lbuint, memPar, doFindAll, doCount); end %% Replace (regular) expressions and characters doReplaceLb = false; % default, to be checked below if numRR > 0 has_length_changed = true; else has_length_changed = false; % flag for changes of length of f8 by replacements end if hasReplacements if hasWaitbar waitbar(wbFactor*0.20,hw,'replacing strings ...'); end numSectionLb = numel(secLbIdc); % If a ReplaceStr begins with a line break character, such a character % will temporarily be prepended to each replacement section to apply % the replacement to the _first_ line of a section, too. % Besides, check for any occurence of the break character in the % ReplaceStr in order to preventively trigger an update of the line % break positions afterwards. % Set defaults before checking: doPrependLb = false; numPrepend = 0; if numRS>0 % put all the characters from the ReplaceStr strings into an % uint8-array: uint8Replace = uint8(char([arg.val.ReplaceStr{:}])); % check if any row starts with a line break: if any(uint8Replace(:,1)==lbuint) doPrependLb = true; numPrepend = 1; end if any(uint8Replace(:)==lbuint) doReplaceLb = true; end end for sdx = 2:numSectionLb if doPrependLb f8_akt = char([lbuint, f8(idcLb(secLbIdc(sdx-1))+1 : idcLb(secLbIdc(sdx))).']); else f8_akt = char(f8(idcLb(secLbIdc(sdx-1))+1 : idcLb(secLbIdc(sdx))).'); end if numRS > 0 || numRR > 0 len_f8_akt = idcLb(secLbIdc(sdx)) - idcLb(secLbIdc(sdx-1)); % length of current section before replacements % Replacements, e.g. {'odd','one','1'} replaces 'odd' and 'one' by '1' % Regular Expression Replacements: ============================ for vdx = 1:numRR % step through replacements arguments srarg = arg.val.ReplaceRegExpr{vdx}; % pick a single argument... for xdx = 1:(numel(srarg)-1) f8_akt = regexprep(f8_akt, srarg{xdx}, srarg{end}); % ... and perform replacements end % for end % for % Expression Replacements: ==================================== for vdx = 1:numRS % step through replacements arguments srarg = arg.val.ReplaceStr{vdx}; % pick a single argument... for xdx = 1:(numel(srarg)-1) f8_akt = strrep(f8_akt, srarg{xdx}, srarg{end}); % ... and perform replacements if ~has_length_changed && (len_f8_akt~=numel(f8_akt)) has_length_changed = true; % detect a change of length of f8 end end % for end % for % update f8-sections by f8_akt ================================ exten = numel(f8_akt) - len_f8_akt; % extension by replacements if exten == 0 if doPrependLb f8( idcLb(secLbIdc(sdx-1))+1 : idcLb(secLbIdc(sdx)) ) = uint8(f8_akt(1+numPrepend:end)).'; else f8( idcLb(secLbIdc(sdx-1))+1 : idcLb(secLbIdc(sdx)) ) = uint8(f8_akt).'; end else if doPrependLb f8 = [f8(1:idcLb(secLbIdc(sdx-1))); uint8(f8_akt(1+numPrepend:end)).'; f8(idcLb(secLbIdc(sdx))+1:end)]; else f8 = [f8(1:idcLb(secLbIdc(sdx-1))); uint8(f8_akt).' ; f8(idcLb(secLbIdc(sdx))+1:end)]; end % update linebreak indices of the following sections % (but we don't know the lb indices of the current one anymore): idcLb(secLbIdc(sdx:end)) = idcLb(secLbIdc(sdx:end)) + exten; end end % if numRS > 0 || numRR > 0 % Character Replacements: ========================================= for vdx = 1:numRC % step through replacement arguments srarg = repChar{vdx}; % pick a single argument for xdx = 1:(numel(srarg)-1) rep_idx = idcLb(secLbIdc(sdx-1))+strfind(f8_akt,srarg(xdx))-numPrepend; f8(rep_idx) = uint8(srarg(end)); % perform replacement end % for end if hasWaitbar && ~mod(sdx,256) waitbar(wbFactor*(0.20+0.25*((sdx-1)/(numSectionLb-1))),hw) end end clear f8_akt end % if %% ReadMode 'char': exit here with char array if strcmpi(readMode,'char') A=char(f8.'); if arg.val.InfoLevel>=1 close(hw) end return end %% Update linebreak indices % see above... % if the final line break might have changed, clean up trailing whitespaces % here again if doReplaceLb || numRR > 0 f8 = cleanUpFinalWhitespace(f8,lbfull); end if has_length_changed || (LbAwareness < minLbAwareness) || doReplaceLb if hasWaitbar waitbar(0.45,hw,'updating line break positions ...'); end if strcmpi(readMode,'block') % - make K1 doFindAll = false; doCount = false; LbAwareness = 1; elseif strcmpi(readMode,'line') || strcmpi(readMode,'cell') % - make K3 doFindAll = true; doCount = true; LbAwareness = 3; else % readmode is 'auto' or 'matrix' % - make K2 doFindAll = false; doCount = true; LbAwareness = 2; end [idcLb,cntLb,secLbIdc] = findLineBreaks(f8, lbuint, memPar, doFindAll, doCount); end % Determine the total number of line breaks (including the leading 'zero' % line break and the eventually added final line break) depending on % LbAwareness. If LbAwareness is less than 2, we can't know that number. if LbAwareness == 2 num_lf = cntLb(end)+1; elseif LbAwareness == 3 num_lf = numel(idcLb); else num_lf = NaN; end %% ReadMode 'cell': return lines in a cell array if strcmpi(readMode,'cell') f8 = char(f8).'; % A = arrayfun(@(m,n) {(f8(m:n)}, ... % lf_idc(1:end-1)+1,lf_idc(2:end)-num_lbfull); % but arrayfun is slower here, so use V6 code (for loop) only: A = repmat({''},num_lf-1,1); for m = 1:num_lf-1 A{m} = f8(idcLb(m)+1:idcLb(m+1)-numLbfull); end if arg.val.InfoLevel>=1 close(hw) end return end %% ReadMode 'block': wilfully fill up output matrix if strcmpi(readMode,'block') if hasWaitbar waitbar(0.5,hw,'converting in ''block'' mode ...'); end numColonBlock = abs(numColon); % number of columns in output matrix isNumelOk = true; % initialize flag "in every section the number of elements is a multiple of number of columns" numSectionLb = numel(secLbIdc); % 1 + number of sections to process doSetNan = true; % flag "output matrix will be initialized with NaNs" % convert first section ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ startIdcF8 = idcLb(secLbIdc(1))+1; endIdcF8 = idcLb(secLbIdc(2)); % THE conversion of this section by sscanf: [Atmp,count,errmsg,nextindex] = ... sscanf(char(f8(startIdcF8 : endIdcF8)), formatStr); %#ok numAtmp = numel(Atmp); % examine how many elements we found in this section numRowsCurr = ceil(numAtmp/numColonBlock); % how many rows will contain these elements numelMissing = numRowsCurr*numColonBlock-numAtmp; % how many elements are missing to fill up the last of these rows a = initializeMatrix(1,1,arg.val.NumericType,doSetNan); if numSectionLb < 3 % there is only one section, so just generate the final output % matrix here: A = reshape([Atmp;repmat(a,numelMissing,1)],numColonBlock,numRowsCurr).'; if numelMissing>0 isNumelOk = false; end else % there are multiple sections, so initialize the output matrix % first ... if isnan(num_lf) % guess final size of A for preallocating expandFactor = diff(idcLb(secLbIdc([1,end])))/diff(idcLb(secLbIdc([1,2]))); numRowsGuessed = round(numRowsCurr * expandFactor); else numRowsGuessed = num_lf; end A = initializeMatrix(numRowsGuessed,numColonBlock,arg.val.NumericType,doSetNan); % ... and put the first elements to it: startRow = 1; endRow = numRowsCurr; Atmp = reshape([Atmp;repmat(a,numelMissing,1)],numColonBlock,numRowsCurr).'; A(startRow:endRow,1:numColonBlock) = Atmp; % If the first section was incomplete, the first elements of the % second section will be added to the last row of the first % section. So keep in mind the elements of the incomplete row here: if numelMissing>0 isNumelOk = false; repeatRow = 1; ARepeat = A(endRow,1:(numColonBlock-numelMissing)).'; else repeatRow = 0; ARepeat = []; end % now step through the following sections for sdx = 2:numSectionLb-1 % the text positions of the current section: startIdcF8 = idcLb(secLbIdc(sdx))+1; endIdcF8 = idcLb(secLbIdc(sdx+1)); % THE conversion of this section by sscanf: [Atmp,count,errmsg,nextindex] = ... sscanf(char(f8(startIdcF8 : endIdcF8)), formatStr); %#ok numAtmp = numel(Atmp); if numAtmp == 0 Atmp = double(Atmp); end % as with the first section, add the new values the output % matrrix numRowsCurr = ceil( (numAtmp-numelMissing) / numColonBlock ); numelMissing = numRowsCurr*numColonBlock-(numAtmp-numelMissing); startRow = endRow+1-repeatRow; endRow = endRow+numRowsCurr; Atmp = reshape([ARepeat;Atmp;repmat(a,numelMissing,1)],numColonBlock,numRowsCurr+repeatRow).'; A(startRow:endRow,1:numColonBlock) = Atmp; % remember elements of an incomplete row for the next section if numelMissing>0 isNumelOk = false; repeatRow = 1; ARepeat = A(endRow,1:(numColonBlock-numelMissing)).'; else repeatRow = 0; ARepeat = []; end if hasWaitbar && ~mod(sdx,256) waitbar(0.5+0.5*((sdx-1)/(numSectionLb-1)),hw) end end if numRowsGuessed > endRow A = A(1:endRow,:); % A(endRow+1:numRowsGuessed,:) = []; end end if ~isNumelOk warning('txt2mat:NumberOfElements', 'Number of elements did not fill up a complete row') end end %% ReadMode 'matrix': try converting large sections % sscanf will be applied to consecutive working sections consisting of % rows. The number of numeric values must then be a multiple of % the number of columns. Otherwise, or if sscanf produces an error, inform % the user and eventually proceed to the (slower) line-by-line conversion. errmsg = ''; % Init. error message variable if strcmpi(readMode,'auto') || strcmpi(readMode,'matrix') if hasWaitbar waitbar(0.5,hw,'converting in ''matrix'' mode ...'); end try numColonMatrix = abs(numColon); errorType = 'none'; % A = initializeMatrix(num_lf-1,numColonMatrix,arg.val.NumericType,false); % Usually, in 'matrix' mode, we have LbAwareness == 2. As the way % we calculate the number of rows in a section depends on % LbAwareness, we check that here: hasNotAllLb = LbAwareness < 3; numSectionLb = numel(secLbIdc); %*% for testing purposes: aggregate multiple sections to a larger one %sectionStep =1; % how many sections to aggregate %selectedSectionIdc = min(2:sectionStep:numSectionLb+sectionStep-1, numSectionLb); %*% in this case, use max(1,sdx-sectionStep) instead of sdx-1 below selectedSectionIdc = 2:numSectionLb; for sdx = selectedSectionIdc % start and end indices of the current section in the text: startIdcF8 = idcLb(secLbIdc(sdx-1))+1; endIdcF8 = idcLb(secLbIdc(sdx)); % THE conversion of this section by sscanf: [Atmp,count,errmsg,nextindex] = ... sscanf(char(f8(startIdcF8 : endIdcF8)), formatStr); % the correponding row indices of the output matrix: if hasNotAllLb startRow = cntLb(sdx-1)+1; endRow = cntLb(sdx); else startRow = secLbIdc(sdx-1); endRow = secLbIdc(sdx)-1; end num_lines_loop = endRow - startRow + 1; %~% error handling ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ if ~isempty(errmsg) % there's an sscanf error message errorType = 'sscanf'; break elseif numel(Atmp) ~= numColonMatrix * num_lines_loop % we did not read the expected number of numeric elements errorType = 'numel'; numelExpected = numColonMatrix * num_lines_loop; numelFound = numel(Atmp); break end %~% end error handling ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ % put the values to the right dimensions and add them to A Atmp = reshape(Atmp,numColonMatrix,num_lines_loop)'; A(startRow:endRow,:) = Atmp; if hasWaitbar && ~mod(sdx,256) waitbar(0.5+0.5*((sdx-1)/(numSectionLb-1)),hw) end end % for sdx = 2:numSectionLb % error diagnosis and user information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ switch errorType case 'sscanf' if (arg.val.InfoLevel >= 2) && ( nextindex <= endIdcF8 - startIdcF8 + 1 ) % If sscanf did not process the whole string, display % the text line where it stopped. % line break indices in the current section idcLbCurr = [0, strfind(f8(startIdcF8 : endIdcF8).', lbchar)]; % find line break index of the abortion line idxErrorLine = find(idcLbCurr-nextindex > 0, 1 ); % text content of the abortion line errorLineText = f8(startIdcF8 + (idcLbCurr(idxErrorLine-1):idcLbCurr(idxErrorLine)-numLbfull-1) ).'; % display information about the error cause disp(['Sscanf error after reading ' num2str((startRow-1)*numColonMatrix+count) ' numeric values.']) disp(['Text content of the critical row (no. ' num2str(numHeader+startRow-1+idxErrorLine-1) ' without deleted lines): ']) disp(errorLineText) end % if case 'numel' if arg.val.InfoLevel >= 2 % We don't know the exact lines containing the wrong % number of values. As a guess, just display the % positions of the longest or the shortest text lines % (by simply counting characters). % line break indices in the current section idcLbCurr = [0, strfind(f8(startIdcF8 : endIdcF8).', lbchar)]; % corresponding text line lengths lenLine = diff(idcLbCurr); [lenLineSorted,idclenLineSorted] = sort(lenLine); maxNumDisplayed = min(5,numel(lenLine)); if numelFound < numelExpected disp(['Found less elements (' num2str(numelFound) ') than expected (' num2str(numelExpected) ') in the current section.']) disp('As a hint, these are the text lines containing the least characters:') disp(['lines no. [' num2str(numHeader+startRow-1+idclenLineSorted(1:maxNumDisplayed)) '] having [' num2str(lenLineSorted(1:maxNumDisplayed),' %i') '] characters, resp.']) else disp(['Found more elements (' num2str(numelFound) ') than expected (' num2str(numelExpected) ') in the current section.']) disp('As a hint, these are the text lines containing the most characters:') disp(['lines no. [' num2str(numHeader+startRow-1+idclenLineSorted(end:-1:end-maxNumDisplayed+1)) '] having [' num2str(lenLineSorted(end:-1:end-maxNumDisplayed+1),' %i') '] characters, resp.']) end end error('Unexpected number of elements in read mode ''matrix''.') end % end error diagnosis and user information ~~~~~~~~~~~~~~~~~~~~~~~~ catch %#ok % catch further errors (old catch style) if ~exist('errmsg','var') || isempty(errmsg) errmsg = lasterr; %#ok (old catch style) end end % try end % Quit on error if 'matrix'-mode was enforced: if strcmpi(readMode,'matrix') && ~isempty(errmsg) if arg.val.InfoLevel >= 1 close(hw) end error(errmsg); end %% ReadMode 'line': convert line-by-line clear Atmp if strcmpi(readMode,'line') || ~isempty(errmsg) num_data_per_row = zeros(num_lf-1,1); if ~strcmpi(readMode,'line') numColon = -abs(numColon); if arg.val.InfoLevel >= 2 disp('Due to error') disp(strrep([' ' errmsg],char(10),char([10 32 32]))) disp('txt2mat will now try to read line by line...') end % if end if LbAwareness < 3 idcLb = findLineBreaks(f8, lbuint, memPar, true, false); num_lf = numel(idcLb); end % initialize result matrix A depending on matlab version: width_A = max(abs(numColon),1); [A,A1] = initializeMatrix(num_lf-1,width_A,arg.val.NumericType,true); if hasWaitbar if strcmpi(readMode,'line') waitbar(0.5,hw,{'reading line-by-line ...'}) else poshw = get(hw,'Position'); set(hw,'Position',[poshw(1), poshw(2)-4/7*poshw(4), poshw(3), 11/7*poshw(4)]); waitbar(0.5,hw,{'now reading line-by-line because of error:';['[' errmsg ']']}) set(findall(hw,'Type','text'),'interpreter','none'); end drawnow end % extract numeric values line-by-line: for ldx = 1:(num_lf-1) a = sscanf(char(f8( (idcLb(ldx)+1) : idcLb(ldx+1)-1 )),formatStr)'; num_data_per_row(ldx) = numel(a); % If necessary, expand A along second dimension (allowed if % numColon < 0) if (num_data_per_row(ldx) > width_A) && (numColon < 0) A = [A, repmat(A1,size(A,1),... num_data_per_row(ldx)-width_A)]; %#ok width_A = num_data_per_row(ldx); end A(ldx,1:min(num_data_per_row(ldx),width_A)) = a(1:min(num_data_per_row(ldx),width_A)); % display waitbar: if hasWaitbar && ~mod(ldx,10000) waitbar(0.5+0.5*(ldx./(num_lf-1)),hw) end % if end % for % display info about number of numeric values per line if arg.val.InfoLevel >= 2 if numColon>=0 reference = numColon; elseif numColon == -1; reference = width_A; else reference = -numColon; end disp('txt2mat row length info:') idc_less_data = find(num_data_per_rowreference); num_less_data = numel(idc_less_data); num_more_data = numel(idc_more_data); num_equal_data = num_lf-1 - num_less_data - num_more_data; info_ca(1:3,1) = {[' ' num2str(num_equal_data)];[' ' num2str(num_less_data)];[' ' num2str(num_more_data)]}; info_ca(1:3,2) = {[' row(s) found with ' num2str(reference) ' values'],... ' row(s) found with less values',... ' row(s) found with more values'}; info_ca(1:3,3) = {' ';' ';' '}; if num_less_data>0 info_ca{2,3} = [' (row no. ', num2str(numHeader+idc_less_data(1:min(10,num_less_data))'), repmat(' ...',1,num_less_data>10), ')']; end if num_more_data>0 info_ca{3,3} = [' (row no. ', num2str(numHeader+idc_more_data(1:min(10,num_more_data))'), repmat(' ...',1,num_more_data>10), ')']; end disp(strcatcell(info_ca)); end % if arg.val.InfoLevel >= 2 end % if if arg.val.InfoLevel >= 1 close(hw) end %% : : : : : subfunction ANATXT : : : : : function [anaNumHeader, anaNumColon, anaFormat, anaRepChar, anaReadMode, ... anaNumAnalyzed, anaHeader, anaFileErr, anaErr] = anatxt(arg) % ANATXT analyse data layout in a text file for txt2mat % % Usage: % [nh, nc, fmt, SR, RM, llta, hl, ferrmsg, aerrmsg] = ... % anatxt(arg); % % nh number of header lines % nc number of columns % fmt format string (curr. always '%f') % SR character replacement string % RM recommended read mode % llta lines analysed after header % hl header line characters % ferrmsg file operation error message % aerrmsg other error messages from this function % % arg txt2mat's input argument struct % Copyright 2006-2014 Andres % $Revision: 4.00 $ $Date: 2014/03/18 14:05:08 $ % todo: especially this function needs a cleanup... % some preparations ffn = arg.val.FileName; filePos = arg.val.FilePos; repChar = arg.val.ReplaceChar; repStr = arg.val.ReplaceStr; repReg = arg.val.ReplaceRegExpr; numHeader = arg.val.NumHeaderLines; numRR = arg.num.ReplaceRegExpr; numRS = arg.num.ReplaceStr; numRC = arg.num.ReplaceChar; numBL = arg.num.BadLineString; numGL = arg.num.GoodLineString; [anaNumColon, anaNumAnalyzed] = deal(0); [anaReadMode, anaHeader, anaErr] = deal(''); anaRepChar = {}; anaNumHeader = numHeader; %% Read in file % definitions numCharRead = 65536; % minimum number of characters to read minLines = 10; % minimum number of lines to read if isfinite(numHeader) minLines = minLines + numHeader; end valueRatio = 0.1; % this ratio will tell if a row has enough values anaFormat = '%f'; % assume floats only (so far) hasFileErr = false; % init anaFileErr = ''; % init fid = fopen(ffn); if filePos > 0 status = fseek(fid,filePos,'bof'); if status ~= 0 hasFileErr = true; anaFileErr = ferror(fid,'clear'); end end if ~hasFileErr % detect line termination character lbfull = detectLineBreakCharacters(ffn); lbuint = lbfull(end); lbchar = char(lbuint); % read in the first part of the file [f8,numLb,posLb] = getLines(fid, minLines, numCharRead, 0, 0, false, lbfull); % getLines get a set of consecutive lines from file % [hl,numLb,posLb,isAtEnd] = getLines(fid, minLines, minChars, offset, ... % origin, inclWsAtEnd, lbfull, lenSection) end fclose(fid); % care for some exceptions if hasFileErr anaErr = 'file operation error'; return end if isempty(f8) anaErr = 'empty file'; return end if numLb <= numHeader anaErr = 'file has not more lines than given number of header lines'; return end % remember the original text before deletions and replacements f8Orig = char(f8.'); posLbOrig = posLb; if numHeader > 0 % select post-header-part of f8 f8 = f8(posLb(numHeader+1):end); numLb = numLb - numHeader; posLb = posLb(numHeader+1:end) - posLb(numHeader+1); end %% filter lines (rows) doFilter = arg.has.SelectLineFun || (numBL + numGL > 0); if doFilter [f8, posLb, numLb, ~, isOk] = filterLines(f8, lbuint, numCharRead, arg); end %% Replace regular expressions, strings, and characters, if needed if numRS>0 || numRC>0 || numRR>0 % If a ReplaceStr begins with a line break character, such a character % will temporarily be prepended to apply the replacement to the _first_ % line, too. prependChar = ''; % prepend nothing by default if numRS>0 % put all the characters from the ReplaceStr strings into an % uint8-array: uint8Replace = uint8(char([repStr{:}])); % check if any row starts with a line break: if any(uint8Replace(:,1)==lbuint) prependChar = lbchar; end end numPrepend = numel(prependChar); f8=[prependChar, char(f8.')]; if numRR>0 for vdx = 1:numRR % step through regex replacement arguments srarg = repReg{vdx}; % pick a single replacement argument for sdx = 1:(numel(srarg)-1) f8 = regexprep(f8, srarg{sdx}, srarg{end}); % replace it end end end if numRS>0 for vdx = 1:numRS % step through string replacement arguments srarg = repStr{vdx}; % pick a single replacement argument for sdx = 1:(numel(srarg)-1) f8 = strrep(f8, srarg{sdx}, srarg{end}); % replace it end end end if numRC>0 for vdx = 1:numRC % step through char replacement arguments srarg = repChar{vdx}; % pick a single replacement argument for sdx = 1:(numel(srarg)-1) f8( strfind(f8,srarg(sdx)) ) = srarg(end); % replace it end end end f8 = uint8(f8(1+numPrepend:end).'); % update line break indices isLB = f8==lbuint; posLb = [0;find(isLB)]; numLb = numel(posLb)-1; end %% Find character types % further representations of the text as required below f8c = char(f8.'); f8d = double(f8.'); % types of characters: prnAscii = uint8([32:127, 128+32:255]); % printable ASCIIs dec_nr_p = sort(uint8('+-1234567890dDeE.NanIiFfA')); % decimals with NaN, Inf, signs and . sep_wo_k = uint8([9 32 47 58 59]); % separators excluding comma sep_wi_k = uint8([9 32 44 47 58 59]); % separators including comma (Tab Space ,/:;) komma = uint8(','); % , other = setdiff(prnAscii, [sep_wi_k, dec_nr_p]); % printables without separators and decimals % characters not expected to appear in the data lines: is_othr = ismembc(f8d,double(other)); % switch to double for compatibility is_beg_othr = diff([false, is_othr]); % true where groups of such characters begin idc_beg_othr = find(is_beg_othr==1); % start indices of these groups [~, sidx] = sort([posLb(2:end).',idc_beg_othr]); % in sidx, the numbers (1:num_lb) representing the linebreaks are placed between the indices of the start indices from above num_beg_othr_per_line = diff([0,find(sidx<=numLb)]) - 1; % number of character groups per line % numbers enclosing a dot: % idc_digdotdig = regexp(f8c, '[\+\-]?\d+\.\d+([deDE][\+\-]?\d+)?', 'start'); idc_digdotdig = regexp(f8c, '[\+\-]?\d+\.\d+([deDE][\+\-]?\d+)?'); [~, sidx] = sort([posLb(2:end).',idc_digdotdig]); num_beg_digdotdig_per_line = diff([0,find(sidx<=numLb)]) - 1; % numbers enclosing a comma: % idc_digkomdig = regexp(f8c, '[\+\-]?\d+,\d+([eE][\+\-]?\d+)?', 'start'); idc_digkomdig = regexp(f8c, '[\+\-]?\d+,\d+([eE][\+\-]?\d+)?'); [~, sidx] = sort([posLb(2:end).',idc_digkomdig]); num_beg_digkomdig_per_line = diff([0,find(sidx<=numLb)]) - 1; % numbers without a dot or a comma: % idc_numbers = regexp(f8c, '[\+\-]?\d+([eE][\+\-]?\d+)?', 'start'); idc_numbers = regexp(f8c, '[\+\-]?\d+([eE][\+\-]?\d+)?'); [~, sidx] = sort([posLb(2:end).',idc_numbers]); num_beg_numbers_per_line = diff([0,find(sidx<=numLb)]) - 1; % NaN and Inf items : idc_nan = regexpi(f8c, '\<[\+\-]?(nan|inf)\>'); [~, sidx] = sort([posLb(2:end).',idc_nan]); num_beg_nan_per_line = diff([0,find(sidx<=numLb)]) - 1; % commas enclosed by numeric digits % idc_kombd = regexp(f8c, '(?<=[\d]),(?=[\d])', 'start'); % if compareversion(vn,7) % idc_kombd = regexp(f8c, '(?<=[\d]),(?=[\d])'); % lookaround new to v7.0?? % else idc_kombd = 1+regexp(f8c, '\d,\d'); % end [~, sidx] = sort([posLb(2:end).',idc_kombd]); num_beg_kombd_per_line = diff([0,find(sidx<=numLb)]) - 1; % two sequential commas without a (different) separator inbetween % idc_2kom = regexp(f8c, ',[^\s:;],', 'start'); idc_2kom = regexp(f8c, ',[^\s:;/],'); % commas: is_kom = f8.'==komma; idc_kom = find(is_kom); [~, sidx] = sort([posLb(2:end).',idc_kom]); num_kom_per_line = diff([0,find(sidx<=numLb)]) - 1; %% Analyze if isnan(numHeader) % ~~~~~ there's no user-supplied number of header lines % determine number of header lines: numHeader = max([0, find(num_beg_othr_per_line>0)]); % for now, take the last line containing an 'other'-character if numHeader>=numLb anaErr = 'no numeric data found'; if numHeader>0 anaHeader = char(f8(1:posLb(numHeader+1))); end return end num_beg_numbers_ph = num_beg_numbers_per_line(numHeader+1:end)+num_beg_nan_per_line(numHeader+1:end); % number of lines following % by definition, a line is a valid data line if it contains enough % numbers compared to the average: has_enough_numbers = num_beg_numbers_ph>valueRatio.*mean(num_beg_numbers_ph); numHeader = numHeader + find(has_enough_numbers, 1 ) - 1; % extract header and data section if numHeader>0 f8v_idx1 = posLb(numHeader+1)+1; % beginning of the data section in f8 if doFilter % reconstruct number of header lines from the original text anaNumHeader = find(cumsum(isOk)==numHeader,1,'first'); else anaNumHeader = numHeader; end anaHeader = f8Orig(1:posLbOrig(anaNumHeader+1)); else f8v_idx1 = 1; anaHeader = []; anaNumHeader = 0; end f8 = f8(f8v_idx1:end); % valid data section of f8 anaNumAnalyzed = numLb - numHeader; % number of non-header lines to analyse else % ~~~~~~~~~~~~~~~ a number of header lines was given as input argument if numHeader>0 anaHeader = f8Orig(1:posLbOrig(numHeader+1)); else anaHeader = []; end anaNumAnalyzed = numLb; end % find out decimal separator character ('.' or ',') anaRepChar = {}; % Init. replacement character string SR_idx = 0; % Init. counter of the above sepchar = ''; % Init. separator (delimiter) character decchar = '.'; % Init. decimal character (default) num_values_per_line = -num_beg_digdotdig_per_line + num_beg_numbers_per_line; % Are there commas? If yes, are they decimal commas or delimiters? if any( num_kom_per_line(numHeader+1:end) > 0 ) sepchar = ','; % preliminary take comma for delimiter % Decimal commas are neighboured by two numeric digits ... % and between two commas there has to be another separator if all(num_kom_per_line(numHeader+1:end) == num_beg_kombd_per_line(numHeader+1:end)) ... % Are all commas enclosed by numeric digits? && ~any(num_beg_digdotdig_per_line(numHeader+1:end) > 0) ... % There are no numbers with dots? && ~any(idc_2kom(numHeader+1:end) > 0) % There is no pair of commas with no other separator inbetween? decchar = ','; sepchar = ''; num_values_per_line = -num_beg_digkomdig_per_line + num_beg_numbers_per_line; % number of values per line end end % replacement string for replacements by spaces % other separators is_wo_k_found = ismember(sep_wo_k, f8); % Tab Space : ; is_other_found= ismember(other,f8); % other printable ASCIIs % possible replacement string to replace : and ; sr1 = [sepchar, char(sep_wo_k([0 0 1 1 1]&is_wo_k_found))]; % possible replacement string to replace other characters sr2 = char(other(is_other_found)); % still obsolete as such lines are treated as header lines % Wrong! The above is not true if % the number of header lines is % given by the user. if numel([sr1,sr2])>0 SR_idx = SR_idx + 1; anaRepChar{SR_idx} = [sr1, sr2, ' ']; end % possible replacement string to replace the decimal character if strcmp(decchar,',') SR_idx = SR_idx + 1; anaRepChar{SR_idx} = ',.'; end num_items_per_line = num_values_per_line + num_beg_nan_per_line; anaNumColon = max(num_items_per_line(numHeader+1:end)); % proposed number of columns if isempty(anaNumColon) anaErr = 'no numeric data found'; return end % suggest a proper read mode depending on uniformity of the number of values per % line if numel(unique(num_items_per_line(numHeader+1:end))) > 1 anaReadMode = 'line'; anaNumColon = -anaNumColon; else anaReadMode = 'auto'; end %% : : : : : further subfunctions : : : : : function s = strcatcell(C) % STRCATCELL Concatenate strings of a 1D/2D cell array of strings % % C = {'a ','123';'b','12'} % C = % 'a ' '123' % 'b' '12' % s = strcatcell(C) % s = % a 123 % b 12 num_col = size(C,2); D = cell(1,num_col); for idx = 1:num_col D{idx} = char(C{:,idx}); end s = [D{:}]; function [w, newidcoi, vi] = cutvec(v,li,hi,doKeep,varargin) % CUTVEC remove multiple sections from a vector by linear index intervals % % Syntax: % w = cutvec(v,li,hi,doKeep) % OR % [w, new_idc_oi, vi] = ... % cutvec(v,li,hi,doKeep,old_idc_oi) % % v input vector % li lower endpoints of linear index intervals (sorted vector) % hi upper endpoints of linear index intervals (sorted vector) % doKeep true: remove values outside all intervals % false: remove values within all intervals % old_idc_oi indices of interest in v (optional) % % w output vector consisting of v-sections % new_idc_oi corresponding indices of interest in w % vi logical matrix with w=v(vi) % % Inputs li, hi and doKeep may also be cell arrays of equal size holding % multiple sets of index endpoints and logicals. % % EXAMPLE: % % w = cutvec([1:20],[3,10,16],[7,12,19],1) % % => w = [3 4 5 6 7 10 11 12 16 17 18 19] % % w = cutvec([1:20],[3,10,16],[7,12,19],0) % % => w = [1 2 8 9 13 14 15 20] % % w = cutvec([1:20],{[3,10,16],[1,15]},... % {[7,12,19],[5,20]},{0,1}) % % => w = [1 2 15 20] % % tic, w = cutvec([1:5000000]',[100:500:5000000],[200:500:5000000],0); toc % % % Elapsed time is 0.202949 seconds. % % v = 1:20; % li= [10,18]; % hi= [12,19]; % doKeep = 0; % idcoi = [1,4,7,10,13,18,20]; % % [w, newidcoi, vi] = cutvec(v,li,hi,doKeep,idcoi) % $Revision: 1.23 $ lenV = numel(v); has_idcoi = false; newidcoi=[]; if nargin == 5 % indices of interest are provided idcoi = int32(varargin{1}); if ~issorted(idcoi) error([mfilename ': vector of indices of interest must be sorted!']) end has_idcoi = true; end if iscell(li) vi = endpoint2logical(lenV,li{1},hi{1},logical(doKeep{1})); for ci = 2:numel(li) vi = vi & endpoint2logical(lenV,li{ci},hi{ci},logical(doKeep{ci})); end else vi = endpoint2logical(lenV,li,hi,logical(doKeep)); end if has_idcoi remidc = int32(find(vi)); newidcoi = ismembc2(idcoi,remidc); end w = v(vi); function vi = endpoint2logical(len,li,hi,doInclude) % ENDPOINT2LOGICAL convert endpoints of index intervals to logical index % % Syntax: % vi = endpoint2logical(len,li,hi,doInclude) % % with % % len length of logical index vector % li vector with lower endpoints of linear index intervals % hi vector with upper endpoints of linear index intervals % doInclude true: logical indices are 1 only inside the intervals % false: logical indices are 1 only outside the intervals % % vi logical index vector % initialize output: if doInclude vi = false(len,1); else vi = true(len,1); end for i = 1:numel(li) vi(li(i):hi(i)) = doInclude; end function arg = argincheck(allargin) % ARGINCHECK get input arguments for txt2mat % % arg = argincheck(allargin) % provides input argument information in struct arg with fields % arg.val.(argname) -> value of the input % arg.has.(argname) -> T/F argument was given % arg.num.(argname) -> number of values for some non-scalar inputs % Check input argument occurence (Property/Value-pairs) % 1 'NumHeaderLines', Scalar, 13 % 2 'NumColumns', Scalar, 100 % 3 'Format', String, ['%d.%d.%d' repmat('%f',1,6)] % 4 'ReplaceChar', CellAString {')Rx ',';: '} % 5 'BadLineString' CellAString {'Warng', 'Bad'} % 6 'ReplaceStr', CellAString {{'True','1'},{'False','0'},{'#Inf','Inf'}} % 7 'DialogString' String 'Now choose a Labview-Logfile' % 8 'MemPar' 2x1-Vector [2e7, 2e5] % 9 'InfoLevel' Scalar 2 % 10 'ReadMode' String 'Auto' % 11 'NumericType' String 'single' % 12 'RowRange' 2x1-Vector [1,Inf] % 13 'FilePos' Scalar 1e5 % 14 'ReplaceRegExpr' CellArOfStr {{'True','1'},{'False','0'},{'#Inf','Inf'}} % 15 'GoodLineString' CellAString {'OK'} % 16 'SelectLineFun' FunHandle @(rowNo) rem(rowNo-1,2) < 1 % check for validated argument struct as last input to bypass further input % parsing (undocumented, untested -> todo) hasValidatedArgStruct = false; % default if ~isempty(allargin) && isstruct(allargin{end}) argStruct = allargin{end}; if isfield(argStruct,'isValidated') && argStruct.isValidated hasValidatedArgStruct = true; end end if hasValidatedArgStruct % carry over validated inputs arg = argStruct; else %-- main input parsing p = inputParser; p.KeepUnmatched = true; p.FunctionName = 'txt2mat'; %-- optional inputs that follow the file name p.addOptional( 'NumHeaderLines', NaN , @(x)isempty(x)||(isnumeric(x)&&isscalar(x))) p.addOptional( 'NumColumns' , [] , @(x)isempty(x)||(isnumeric(x)&&isscalar(x))) p.addOptional( 'Format' , '%f', @(x)isempty(x)||(ischar(x)&&any(x=='%'))) p.addOptional( 'ReplaceChar' , {} , @(x)isempty(x)||iscellstr(x)||ischar(x)) p.addOptional( 'BadLineString' , {} , @(x)isempty(x)||iscellstr(x)) %-- param/value only inputs: p.addParamValue('SelectLineFun' , {} , @(x)isa(x,'function_handle')) p.addParamValue('GoodLineString', {} , @(x)isempty(x)||iscellstr(x)) p.addParamValue('ReplaceStr' , {} , @(x)isempty(x)||iscell(x)) p.addParamValue('ReplaceRegExpr', {} , @(x)isempty(x)||iscell(x)) p.addParamValue('NumericType' , 'double', @(x)ischar(x)) p.addParamValue('RowRange' , [1,Inf] , @(x)isnumeric(x)&&(numel(x)==2)) p.addParamValue('FilePos' , 0 , @(x)isnumeric(x)&&isscalar(x)) p.addParamValue('ReadMode' , 'auto', @(x)ischar(x)) p.addParamValue('DialogString' , 'Select File', @(x)ischar(x)) p.addParamValue('InfoLevel' , 2 , @(x)isnumeric(x)&&isscalar(x)) p.addParamValue('MemPar' , 65536, @(x)isnumeric(x)&&isscalar(x)) %-- older param names, still accepted p.addParamValue('ConvString' , '%f', @(x)isempty(x)||ischar(x)) p.addParamValue('ReplaceExpr' , {} , @(x)isempty(x)||iscell(x)) %-- parse inputs: p.parse(allargin{2:end}) % rearrange input argument parsing results to a nested struct called % 'arg', with % arg.val.(name) holding the values of the inputs % arg.has.(name) indicating whether the input was given (t/f) if ~isempty(fieldnames(p.Unmatched)) ufn = fieldnames(p.Unmatched); warning('txt2mat:unmatchedArg', ... ['Unmatched input parameter names were ignored ("' ufn{1} '").']) end arg.val = p.Results; defnames = p.UsingDefaults; argnames = fieldnames(arg.val); isdefcell = [argnames.'; repmat({true},1,numel(argnames))]; arg.has = struct(isdefcell{:}); for k = defnames arg.has.(k{:}) = false; end % ~~~ additional checks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ % error if both old and new param names occur, accept if solely old one % 'ConvString' -> 'Format' if arg.has.ConvString && arg.has.Format error('txt2mat:deprecatedConvString', ... 'use param name ''Format'' only (instead of ''ConvString'')'); elseif arg.has.ConvString arg.has.Format = arg.has.ConvString; arg.val.Format = arg.val.ConvString; end % 'ReplaceExpr' -> 'ReplaceStr' if arg.has.ReplaceExpr && arg.has.ReplaceStr error('txt2mat:deprecatedReplaceExpr', ... 'use param name ''ReplaceStr'' only (instead of ''ReplaceExpr'')'); elseif arg.has.ReplaceExpr arg.has.ReplaceStr = arg.has.ReplaceExpr; arg.val.ReplaceStr = arg.val.ReplaceExpr; end % NumHeaderLines must be a nonnegative integer. if arg.has.NumHeaderLines && arg.val.NumHeaderLines < 0 && ... arg.val.NumHeaderLines ~= round(arg.val.NumHeaderLines) error('txt2mat:wrongNumHeaderLines', ... 'NumHeaderLines must be a nonnegative integer.') end % NumColumns must be an integer scalar. if arg.has.NumColumns && ... arg.val.NumColumns ~= round(arg.val.NumColumns) error('txt2mat:wrongColumns', ... 'NumColumns must be integer.') end % change empty format string to default if isempty(arg.val.Format) arg.val.Format = '%f'; end % wrap a single string ReplaceChar into a cell if ischar(arg.val.ReplaceChar) arg.val.ReplaceChar = {arg.val.ReplaceChar}; %warning('txt2mat:ineptReadmode', ... % 'for future versions, please use a cell array of strings for character replacements.') end arg.num.ReplaceChar = numel(arg.val.ReplaceChar); % add number of bad and good line strings arg.num.BadLineString = numel(arg.val.BadLineString); arg.num.GoodLineString = numel(arg.val.GoodLineString); % add numbers of string and regular expression replacements arg.num.ReplaceStr = numel(arg.val.ReplaceStr); arg.num.ReplaceRegExpr = numel(arg.val.ReplaceRegExpr); % check if ReplaceStr is empty or has correct Find+Replace string pairs if arg.has.ReplaceStr && ~isempty(arg.val.ReplaceStr) && ~( ... ( iscellstr(arg.val.ReplaceStr)&&(numel(arg.val.ReplaceStr)==2) ) ... || all(cellfun(@(x)iscellstr(x)&&(numel(x)==2),arg.val.ReplaceStr)) ) error('txt2mat:ReplaceStr', ... 'ReplaceStr must be a cell array of two-element cell arrays of strings.') end % check if ReplaceRegExpr is empty or has correct Find+Replace string pairs if arg.has.ReplaceRegExpr && ~isempty(arg.val.ReplaceRegExpr) && ~( ... (iscellstr(arg.val.ReplaceRegExpr)&&(numel(arg.val.ReplaceRegExpr)==2)) ... || all(cellfun(@(x)iscellstr(x)&&(numel(x)==2),arg.val.ReplaceRegExpr)) ) error('txt2mat:ReplaceRegExpr', ... 'ReplaceRegExpr must be a cell array of two-element cell arrays of strings.') end % force ReadMode to 'line' if NumColumns < 0 if arg.has.NumColumns && arg.val.NumColumns < 0 arg.val.ReadMode = 'line'; if arg.has.ReadMode && ~strcmpi(arg.val.ReadMode,'line') warning('txt2mat:changedReadmode', ... 'ReadMode is changed to ''line'' as NumColumns is negative.') end end % further checks on RowRange if arg.has.RowRange && ~issorted(arg.val.RowRange) && ... any(arg.val.RowRange ~= round(arg.val.RowRange)) && ... arg.val.RowRange(1) < 1 error('txt2mat:wrongRowRange', ... 'RowRange must be a sorted positive integer 2x1 vector.') end % further checks on FilePos if arg.has.FilePos && arg.val.FilePos < 0 && ... any(arg.val.FilePos ~= round(arg.val.FilePos)) error('txt2mat:wrongFilePos', ... 'FilePos must be a nonnegative integer.') end % further checks on SelectLineFun if arg.has.SelectLineFun try SlfTest = arg.val.SelectLineFun((1:8).'); catch err error('txt2mat:errorSelectLineFun', ... 'SelectLineFun error on test data (1:8).''') end if numel(SlfTest)~=8 error('txt2mat:wrongSelectLineFun', ... 'SelectLineFun must preserve length of input (test data: (1:8).'')') elseif ~islogical(SlfTest) warning('txt2mat:SelectLineFunNonLogical', ... ['SelectLineFun output should be logical, but it is ' class(SlfTest) ]) end end % ~~~ additional checks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ end ~ % confirm arg struct validation for repeated usage (future version) arg.isValidated = true; end % add file name if numel(allargin) >= 1 arg.val.FileName = allargin{1}; arg.has.FileName = true; else arg.val.FileName = ''; arg.has.FileName = false; end function lb = detectLineBreakCharacters(ffn) % DETECTLINEBREAKCHARACTERS find out type of line termination of a file % % lb = detectLineBreakCharacters(ffn) % % with % ffn ascii file name % lb line break character(s) as uint8, i.e. % [13 10] (cr+lf) for standard DOS / Windows files % [10] (lf) for Unix files % [13] (cr) for Mac files % % The DOS style values are returned as defaults if no such line breaks are % found. % www.editpadpro.com/tricklinebreak.html : % Line Breaks in Windows, UNIX & Macintosh Text Files % A problem that often bites people working with different platforms, such % as a PC running Windows and a web server running Linux, is the different % character codes used to terminate lines in text files. % % Windows, and DOS before it, uses a pair of CR and LF characters to % terminate lines. UNIX (Including Linux and FreeBSD) uses an LF character % only. The Apple Macintosh, finally, uses a CR character only. In other % words: a complete mess. lfuint = uint8(10); % LineFeed cruint = uint8(13); % CarriageReturn crlfuint = [cruint,lfuint]; lfchar = char(10); crchar = char(13); crlfchar = [crchar,lfchar]; readlen = 16384; % Cycle through file and read until we find line termination characters or % we reach the end of file. % Possible line breaks are: cr+lf (default), lf, cr logfid = fopen(ffn); has_found_lbs = false; while ~has_found_lbs [f8,cntr] = fread(logfid,readlen,'*char'); pos_crlf = strfind(f8',crlfchar); pos_lf = strfind(f8',lfchar); pos_cr = strfind(f8(1:end-1)',crchar); % here we ignored a cr at the end as it might belong to a cr+lf % combination (later we'll step back one byte in the file position to % avoid overlooking such a single cr) num_lbs = [numel(pos_crlf),numel(pos_lf),numel(pos_cr)]; if all(num_lbs==0) fseek(logfid, -1, 0); % step back one byte % if we reached the end of file without finding any special % character, set the endmost line break character and the complete % line break character to DOS values as defaults if cntr < readlen has_found_lbs = true; % just to exit the while loop lb = crlfuint; % complete line break character set end elseif num_lbs(1)>0 has_found_lbs = true; lb = crlfuint; elseif num_lbs(2)>0 has_found_lbs = true; lb = lfuint; elseif num_lbs(3)>0 has_found_lbs = true; lb = cruint; end end fclose(logfid); function [txt, posLb0, cntLbMod, idxSecEndLb, isOk] = ... filterLines(txt, uintLb, startLenSbs, arg) % FILTERLINES loop through sections of txt and remove unwanted lines % % [txt, posLb0, cntLbMod, idxSecEndLb, isOk] = ... % filterLines(txt, uintLb, startLenSbs, arg) % % Inputs: % txt uint8 representation of the original char string % uintLb line break character as uint8 % startLenSbs initial subsection length % arg struct with fields % has.SelectLineFun logical, tells if a selection function exists % val.SelectLineFun function for line selection % val.GoodLineString cell with good line marker strings % val.BadLineString cell with bad line marker strings % % Outputs: % txt uint8 representation of modified char string % posLb0 line break positions in modified txt (starting with 0) % cntLbMod number of lines in modified txt % idxSecEndLb posLb0(idxSecEndLb) are the positions of the line breaks % at the section borders % isOk true: line from input txt is kept, false: line is removed % some abbreviations: goodStr = arg.val.GoodLineString; badStr = arg.val.BadLineString; numGood = numel(goodStr); numBad = numel(badStr); % initializations for the text section loop doRead = true; idxRdLo = 0; % idxRdLo: start index of a full section to be read from % inside original txt (equals last line break position % from previous section) % iTxtLo, iTxtHi: indices of a subsection inside txt % iSecLo, iSecHi: indices of a subsection inside a section idxWrLo = 0; % start index of a modified section to be written to txt cntLbTxt = 0; % counts lines in original txt cntLbMod = 0; % counts lines in modified txt cntSec = 0; % counts sections in original txt cntSecMod= 0; % counts sections in modified txt lenWork = max(2,startLenSbs); % set initial subsection length to at least 2 numTxt = numel(txt); while doRead % loop through text sections cntSec = cntSec + 1; % prepare building a section of txt that contains line breaks workSec = zeros(lenWork,1, 'uint8'); % initialize content of section isLbSec = false(lenWork,1); % will be true at line break positions % loop through subsections until at least one line break is found to % ensure we have complete lines in the current section hasNoLb = true; iSecHi = 0; while hasNoLb iSecLo = iSecHi+1; iSecHi = iSecLo + lenWork - 1; iTxtLo = idxRdLo + iSecLo; [iTxtHi,ci] = min([numTxt,idxRdLo + iSecHi]); workSbs = txt(iTxtLo:iTxtHi); % current subsection isLbSbs = workSbs==uintLb; workSec(iSecLo:iTxtHi-idxRdLo) = workSbs; % add text to section isLbSec(iSecLo:iTxtHi-idxRdLo) = isLbSbs; % add line break t/f doRead = ci > 1; hasNoLb = ~any(isLbSbs) & ci > 1; end lenWork = iTxtHi-idxRdLo; % adapt future length of subsections posLbSec = find(isLbSec); % line break positions in current section posLbSec0 = [0; posLbSec]; % ", prepend zero lenLineSec = diff(posLbSec0); % length of line (in characters) numLbSec = numel(posLbSec); % number of line breaks in current section lenSec = posLbSec(end); % position of last line break workSec = workSec(1:lenSec); % crop section to last line break idxRdLo = idxRdLo + lenSec; % set start index for next iteration % initialize vector holding all line break positions in modified txt, % including a zero at the beginning (posLb0), and a vector indexing the % end-of-section line breaks inside it (idxSecEndLb) if cntSec == 1 posLb0 = zeros(ceil(numLbSec/lenSec*numTxt)+1,1); idxSecEndLb = ones(max(2,ceil(numTxt/lenSec)),1); end % ~~~ line selection by function and good/bad line strings ~~~~~~~~~~~~ % apply the selection function on the current line numbers if arg.has.SelectLineFun % test line numbers to decide which lines (i.e. rows) to keep % we must use the original line numbers from txt here isLineSel = arg.val.SelectLineFun((cntLbTxt+1:cntLbTxt+numLbSec).'); else isLineSel = true(numLbSec,1); % do not remove any lines end % Find lines marked good or bad. % Start with the good line marker strings. if numGood > 0 idcGoodCurr = cell(numGood,1); for k = 1:numGood % find positions of the current marker in the current section: idcGoodCurr{k} = strfind(char(workSec.'),goodStr{k}).'; end % ~~ get the corresponding line break positions... ~~~~~~~~~~~~~~~~ % sort all marker positions found and remove doublets idcGoodAll = unique(cat(1,idcGoodCurr{:})); % see how they sort into the sorted vector of line break positions [~,ix] = sort([idcGoodAll; posLbSec0]); % a line break position that is no longer followed directly by % another line break position but by one or more marker positions % denotes a line containing at least one marker string [~,ix] = sort(ix); isLineGood = diff(ix(numel(idcGoodAll)+1:end))>1; % ~~ ...done. (Is there a faster solution??) ~~~~~~~~~~~~~~~~~~~~~~ else isLineGood = true(numLbSec,1); end % Then do the same for the bad line marker strings. if numBad > 0 idcBadCurr = cell(numBad,1); for k = 1:numBad idcBadCurr{k} = strfind(char(workSec.'),badStr{k}).'; end idcBadAll = unique(cat(1,idcBadCurr{:})); [~,ix] = sort([idcBadAll; posLbSec0]); [~,ix] = sort(ix); isLineNotBad = diff(ix(numel(idcBadAll)+1:end))==1; else isLineNotBad = true(numLbSec,1); end % ~~~ combine selection critera and update txt ~~~~~~~~~~~~~~~~~~~~~~~~ isLineOk = isLineSel & isLineGood & isLineNotBad; if nargout > 4 if cntSec == 1 isOk = false(ceil(numLbSec/lenSec*numTxt)+1,1); end isOk(cntLbTxt+1:cntLbTxt+numLbSec) = isLineOk; end % update line break counter for original txt (to remind the numbers for % the selection function and to have the indices for isOk output) cntLbTxt = cntLbTxt + numLbSec; if all(isLineOk) % no lines of the current section will be removed if cntSec > 1 % write section to new position into txt txt(idxWrLo+1:idxWrLo+lenSec) = workSec; end % collect line break positions of modified txt posLb0(cntLbMod+2:cntLbMod+numLbSec+1) = idxWrLo+posLbSec; % count line breaks in modified txt cntLbMod = cntLbMod + numLbSec; % start index of next txt write idxWrLo = idxWrLo + lenSec; % define a new section inside the modified txt cntSecMod = cntSecMod + 1; % index for the section ends in posLb0 idxSecEndLb(cntSecMod+1) = idxSecEndLb(cntSecMod)+numLbSec; elseif any(isLineOk) % some lines will be removed lenLineOk = lenLineSec(isLineOk); % start and end indices of groups of continguous lines that will remain selL = posLbSec0( [isLineOk(1:end) & ~[false;isLineOk(1:end-1)]; false] ); selR = posLbSec0( [false; isLineOk(1:end) & ~[isLineOk(2:end); false]] ); % update section: workSec = cutvec(workSec,selL+1,selR,true); posLbMod = cumsum(lenLineOk); numLbMod = numel(lenLineOk); % write modified section back into txt txt(idxWrLo+1:idxWrLo+posLbMod(end)) = workSec; % collect line break positions of modified txt posLb0(cntLbMod+2:cntLbMod+numLbMod+1) = idxWrLo+posLbMod; % count line breaks in modified txt cntLbMod = cntLbMod + numLbMod; % start index of next txt write idxWrLo = idxWrLo + posLbMod(end); % define a new section inside the modified txt cntSecMod = cntSecMod + 1; % index for the section ends in posLb0 idxSecEndLb(cntSecMod+1) = idxSecEndLb(cntSecMod)+numLbMod; end end % remove overallocated parts from the outputs: txt = txt(1:idxWrLo); posLb0 = posLb0(1:cntLbMod+1); idxSecEndLb = idxSecEndLb(1:cntSecMod+1); function [idcLb, cntLb, secLbIdc] = ... findLineBreaks(f8, uintLb, lenSection, doFindAll, doCount) % FINDLINEBREAKS find line break indices % % [idcLb, cntLb, secLbIdc, idcBad] = ... % findLineBreaks(f8, uintLb, lenSection, doFindAll, doCount) % % This function cycles through a text by manageable sections and finds line % break characters - either all or just the last one in each section. If % only the last line break in each section is to be found, findLineBreaks % can provide the corresponding consecutive number of this line break in % the text. % % idcLb (nx1)-vector. Zero + some or all line break positions in f8 % cntLb empty or (nx1)-vector. If not all line breaks have to be % found, but doCount is true, this is the number of of each % line break in f8 that is listed in idcLb (with a zero put % in front). Otherwise cntLb is left empty, as cntLb would % just be trivially [0:numel(idcLb)] % secLbIdc idcLb(secLbIdc) are the positions of the last line % break in each section (including the "zero" line break) % % f8 the text as an uint8 (Nx1)-vector % uintLb uint8-scalar representation of the line break character to % be found (10 or 13; could actually be any character). % lenSection character length of a section % doFindAll true: find and index every line break; false: find only the % last one in a section % doCount count number of every line break in cntLb - this is active % only in the non-trivial case when only the last line % break in a section has to be found % $Revision: 4.00 $ lenF8 = numel(f8); idxLo = 1; % init., start index of a section processed in a loop cntLb = []; numSection = ceil(lenF8/lenSection); if doFindAll % ~~~~~~~~~~~~ find all line break positions ~~~~~~~~~~~~~~ % In what follows, the text will repeatedly be processed in consecutive % sections of length to help avoid memory problems. secLbIdc = ones(numSection+1,1); loopCntr = 0; lbCntr = 0; while idxLo <= lenF8 loopCntr = loopCntr + 1; idxHi = min(idxLo - 1 + lenSection,lenF8); % end index of current section % find line breaks in current section isLb = f8(idxLo:idxHi)==uintLb; crPosLb = find(isLb)+idxLo-1; numCrLb = numel(crPosLb); if loopCntr == 1 % preallocate idcLb with estimated number of line breaks idcLb = zeros((1+numSection+1)*(numCrLb+1), 1); end % collect line break indices idcLb(lbCntr+2:lbCntr+numCrLb+1) = crPosLb; secLbIdc(loopCntr+1) = numel(idcLb); idxLo = idxHi + 1; % start index for the following loop lbCntr = lbCntr + numCrLb; end % while idcLb = idcLb(1:lbCntr+1); else % ~~~~~~ find last line break position of each section only ~~~~~~~ % Preallocate maximum space for output variables: if doCount cntLb = zeros(numSection+1,1); end idcLb = zeros(numSection+1,1); lbCntr = 0; % keep in mind how many line breaks have been found, % as some sections might not contain a line break at all % Find line break indices within lenSection distance while idxLo <= lenF8 idxHi = min(idxLo - 1 + lenSection,lenF8); % end index of current section % parse backwards to find the last line break of the section cntr = 0; doKeepOnLooking = true; while doKeepOnLooking hasNotFound = (f8(idxHi-cntr) ~= uintLb); cntr = cntr+1; doKeepOnLooking = hasNotFound && (cntr < lenSection); end if ~hasNotFound lbCntr = lbCntr + 1; % add the line break to the list idcLb(lbCntr+1) = idxHi-cntr+1; % if desired, count all line breaks of the section if doCount cntLb(lbCntr+1)= cntLb(lbCntr) + sum(f8(idxLo:idxHi)==uintLb); %#ok end end idxLo = idxHi + 1; end % while % if too much space was preallocated, shorten the outputs: if lbCntr= num_lbfull % replace endmost space(s) by a line break: f8(end-num_lbfull+(1:num_lbfull)) = lbfull; else % append a final line break: f8(end+(1:num_lbfull)) = lbfull; end is_ws_at_end = false; end end % while function [f8, l, pLb, isAtEnd] = getLines(fid,minLines,varargin) % getLines get a set of consecutive lines from file % % [hl,numLb,posLb,isAtEnd] = getLines(fid, minLines, minChars, offset, ... % origin, inclWsAtEnd, lbfull, lenSection) % % fid file identifier % minLines minimum number of lines to retrieve % minChars minimum number of characters to retrieve % (optional, default 0) % offset file position to start at, relative to origin % (optional, default 0) % origin (optional) a string whose legal values are % 'bof' Beginning of file % 'cof' Current position in file (default) % 'eof' End of file % inclWsAtEnd include a trailing line in the file that consists of white- % space only and that is not terminated by a line break % (optional, default false) % lbfull line termination character(s) as uint8 % (optional, default [13,10]) % lenSection (internal) length of a processed section % (optional, default 65536) % % hl the lines from the file as uint8 vector - each line is % terminated by a line break, even a final line that was not % terminated in the file % numLb number of lines, i.e. number of line breaks in hl % posLb line break positions in hl, including an added leading zero % isAtEnd true if the end of hl corresponds to the end of file % $Revision: 2.11 $ % set defaults for opt. inputs: % minChars, offset,origin,inclWsAtEnd,lbfull,lenSection optargs = {0, 0, 'cof', false, uint8([13 10]), 65536}; % skip any new inputs if they are empty isEmptyArg = cellfun('isempty', varargin); % overwrite defaults with non-emty arguments in varargin optargs(~isEmptyArg) = varargin(~isEmptyArg); % assign to variables [minChars, offset, origin, inclWsAtEnd, lbfull, lenSection] = optargs{:}; % move to requested file position if offset ~= 0 || ~strcmpi(origin,'cof') fseek(fid,offset,origin); end % get number of the file's remaining bytes from current position: bytePos = ftell(fid); fseek(fid, 0, 'eof'); byteEnd = ftell(fid); fseek(fid, bytePos, 'bof'); numByte = byteEnd-bytePos; spuint = uint8(32); % Space (= ascii whitespace limit) as uint8 lbuint = lbfull(end); lenLb = numel(lbfull); [f8c,nF8c] = fread(fid,lenSection,'*uint8'); % current text section isIn = nF8c < numByte; % not at the end of text? pLbc = find(f8c==lbuint); % current line break positions nLbc = numel(pLbc); % number of line breaks so far gLb = max([0;pLbc]); % greatest line break position % continue reading if not enough line breaks or characters have been read % and if we're not at the end of the file doRead = ((nLbc < minLines) || gLb < minChars) && isIn; if doRead % estimate output sizes and preallocate f8 = zeros(min( max(ceil(nF8c/nLbc*minLines), minChars), ... numByte),1,'uint8'); pLb = zeros(max(1+minLines+ceil(nLbc/nF8c), ... ceil(nLbc/nF8c*minChars) ), 1); end % start to write to outputs f8(1:nF8c,1) = f8c; pLb(1:nLbc+1,1) = [0;pLbc]; f = nF8c; % counts number of characters read from fid l = nLbc; % counts number of line breaks while doRead % continue to read [f8c,nF8c] = fread(fid,lenSection,'*uint8'); % current text section pLbc = find(f8c==lbuint); % position of current line breaks nLbc = numel(pLbc); % number of current line breaks % continue to write to outputs f8(f+1:f+nF8c) = f8c; pLb(l+2:l+nLbc+1,1) = f + pLbc; % prepare reading next section if nLbc > 0 % if there are new line breaks... gLb = pLb(l+nLbc+1); % ...update greatest line break position end f = f + nF8c; l = l + nLbc; isIn = f < numByte; % end of text? doRead = ((l < minLines) || (gLb < minChars) ) && isIn; end if ((l < minLines) || (gLb < minChars)) && f > 0 && ... f8(f) ~= lbuint && ... ( inclWsAtEnd || any(f8(pLb(l+1)+1:f) > spuint ) ) % We're at the end of file but have not yet enough lines and/or chars. % Add a line break to the last line if it has none and if it has % at least some non-white-space characters (i.e. ignore a final line % without a line break that contains only white-space-chars) f8(f+1:f+lenLb) = lbfull; f = f+lenLb; l = l + 1; pLb(l+1,1) = f; end if l < minLines || pLb(1+l) < minChars % having read up to the end of the file, delete overallocated parts of % pLb and f8 pLb = pLb(1:l+1); f8 = f8(1:pLb(l+1)); isAtEnd = true; else % first line break index satisfying minLines and minChars k = minLines + find( pLb(1+minLines:end) >= minChars, 1, 'first'); % select desired parts and correct file position f8 = f8(1:pLb(k)); pLb = pLb(1:k); l = k-1; fseek(fid,pLb(k)-f,'cof'); isAtEnd = pLb(k) >= numByte; end