URL.biz - where people find experts

 

Unix Fundamentals Module #4 Data Manipulation
The fourth in a series of Unix Fundamentals modules from JayrConsulting Ltd. A free copy of this module in MS Word format (including the diagrams missing here) can be obtained from john.roberts@jayrconsulting.co.uk


UNIX Fundamentals

MODULE 4 – Data Manipulation



You will cover

* File redirection

* Use of Pipes and tee

* Sorting Data

* Extracting Data Fields

* Extracting Lines of Data

* Searching for Data

* Quoting Characters



























































MODULE 4 – CONTENTS

4 DATA MANIPULATION

4.1 FILE REDIRECTION

4.2 PIPES

4.2.1 USING TEE IN PIPES

4.3 SORTING DATA

4.4 EXTRACTING DATA FIELDS

4.5 EXTRACTING LINES OF DATA

4.5.1 USING THE TAIL COMMAND

4.5.2 USING THE HEAD COMMAND

4.5.3 USING WC TO COUNT LINES

4.5.4 USING TR TO CHANGE DATA

4.6 SEARCHING FOR DATA WITH GREP

4.6.1 REGULAR EXPRESSIONS

4.7 QUOTING CHARACTERS

4.8 DISPLAYING FILES

4.8.1 DISPLAYING LARGE FILES

4.8.2 DISPLAYING NON TEXT FILES























4 Data Manipulation



4.1 File Redirection



The shell is always aware of three files. These are called the standard input, the standard output and standard error. They have internal file descriptors 0, 1 and 2 respectively. By default, all three files are associated with the user's terminal session. This means that commands will be read from the keyboard and any output or error messages will be displayed on the screen. All three file descriptors may be redirected to any named file, as follows.



command > filea    redirect output to filea. If the file does not exist, it will be created. If the file does exist, it will be overwritten.



ls -l > list    create a list of the files in the current directory and write it to a file called list



cat filea > fileb    write the contents of filea to fileb



cat > filea    write 'text' to filea; the input is typed at the keyboard and ended with Ctrl-D

text

<^D>



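command >> filea    redirect output and append to filea. If the file does not exist, it will be created. If the file does exist, the output will be appended to it.



command < filea    take input from filea instead of the keyboard



mail fred < letter_file    mail the contents of letter_file to user fred



command << word    take input from the lines that follow, up to a line containing only the marker word (a 'here document')



vedit filename << stop

:g/^$/d

:wq!

stop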



Note that the terminating marker word must be at the beginning of a new line. This script would edit the designated file, delete any blank lines, write the file back and then return you to the command prompt.
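
A here document can feed any command that reads its standard input, not only an editor. The following is a minimal sketch using 'sort'; the marker word 'stop' follows the example above, while the variable name 'sorted' and the three input lines are made up for illustration.

```shell
# Sort three made-up lines supplied by a here document.
# 'stop' is the marker word; it must start a line by itself.
sorted=$(sort << stop
pear
apple
mango
stop
)

# Show the sorted result.
printf '%s\n' "$sorted"
```

Everything between the command line and the marker word is passed to 'sort' as if it had been typed at the keyboard.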



2> filename    redirects any error messages to the named file



find / -user steve -print > myfiles 2> /dev/null



The list of files produced by the 'find' command would be written to the file 'myfiles'. Any error messages produced would be written to the system file '/dev/null', which is the UNIX 'rubbish bin'.



2>&1    redirects error messages to the same file that the standard output has previously been directed to. Because redirections are processed from left to right, 2>&1 must come after the output redirection (command > filea 2>&1).
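
The redirections described in this section can be tried safely in a scratch directory. The sketch below is illustrative only: the helper function 'both' and the file names 'out', 'err' and 'all' are invented for the example, and 'mktemp -d' is assumed to be available.

```shell
workdir=$(mktemp -d)

# Hypothetical helper that writes one line to each output stream.
both() {
    echo "to stdout"
    echo "to stderr" >&2
}

both > "$workdir/out" 2> "$workdir/err"   # each stream to its own file
both > "$workdir/all" 2>&1                # both streams to the same file
both >> "$workdir/out" 2> /dev/null       # append stdout, discard stderr

# Collect the results and display them.
out=$(cat "$workdir/out")
err=$(cat "$workdir/err")
all=$(cat "$workdir/all")
printf 'out:\n%s\nerr:\n%s\nall:\n%s\n' "$out" "$err" "$all"

rm -rf "$workdir"
```

After the three calls, 'out' holds two lines (the second was appended), 'err' holds the single error line from the first call, and 'all' holds one line from each stream.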







4.2 Pipes



On UNIX systems, the output of one command can be supplied as the input of the next command by using a 'pipe' between them. The symbol for this is '|'.



ls -l | pg

would produce a listing of the current directory and then pass it to the 'pg' command so it can be paged to the screen.

Several commands can be strung together using pipes to perform a complex set of actions on a set of data.



4.2.1 Using tee in Pipes



Normally each command in a 'pipeline' passes its output to the next command until the end of the pipeline is reached. However, it is possible to divert a copy of the intermediate output from any stage of a pipe by using the tee command.



ls -l | tee testfile | lp



would produce a long listing of the current directory, send a copy of it to the file 'testfile' and then produce a print job of the same listing.
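
The same pattern can be tried without a printer. In this sketch, 'printf' stands in for 'ls -l' and 'wc -l' stands in for 'lp'; both substitutions, the sample data and the variable names are assumptions made purely for illustration.

```shell
workdir=$(mktemp -d)

# tee writes a copy of its input to testfile and also passes it down the pipe.
count=$(printf 'alpha\nbeta\ngamma\n' | tee "$workdir/testfile" | wc -l | tr -d ' ')
saved=$(cat "$workdir/testfile")

echo "lines passed down the pipe: $count"
printf 'copy kept by tee:\n%s\n' "$saved"

rm -rf "$workdir"
```

The final command still receives all three lines, and the file holds an identical copy of them.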



















4.3 Sorting Data



By default, the UNIX 'sort' command performs a left justified, ascending, alphanumeric, whole line sort on any data it is presented with. The data may be in files or may be delivered to the 'sort' command via a pipe.



sort list.file    would sort the contents of list.file and display the results on the screen. The original file would remain unaltered.



sort -o list.file list.file



would sort the contents of list.file and write the result back to the original file.



who | sort > wholist    would produce a list of logged on users, sort it and then send the result to 'wholist'



Options are available allowing normal or reverse order sorting of alpha or numeric fields. The fields specified may be variable length, separated by a field delimiter character. In the following example, a numeric sort on field three of the file '/etc/passwd' is specified, with the field separator specified as being a colon (:). The '+2 -3' form counts fields from zero (start after field two, stop after field three); the POSIX form of the same key is '-k 3,3'.



sort -t: -n +2 -3 /etc/passwd
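
A numeric sort on one delimited field can be sketched with the POSIX '-k' key syntax, which selects field three as '-k3,3'. The records below are made-up passwd-style lines, not real accounts, and the file and variable names are hypothetical.

```shell
workdir=$(mktemp -d)

# Made-up records in the form name:password:uid:gid
cat > "$workdir/users" << 'EOF'
carol:x:1002:100
root:x:0:0
bob:x:25:100
EOF

# -t: sets the field delimiter; -k3,3n sorts numerically on field 3 only.
by_uid=$(sort -t: -k3,3n "$workdir/users")

# Adding the r modifier reverses the order.
by_uid_desc=$(sort -t: -k3,3nr "$workdir/users")

printf 'ascending:\n%s\ndescending:\n%s\n' "$by_uid" "$by_uid_desc"

rm -rf "$workdir"
```

Without the 'n' modifier the user ids would be compared as text, so 1002 would sort before 25.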

















4.4 Extracting Data Fields



Individual fields can be extracted from lines of data by using the 'cut' command. The following example would cut fields 1 and 3 from the system file /etc/passwd. Note that the field separator (:) must also be specified.



cut -f1,3 -d: /etc/passwd



The 'cut' utility will also work with fixed length fields with no field separator. In this case, the column numbers are given. The example shown would extract columns 1 to 6 and 20 to 30 from a 'who -u' listing.



who -u | cut -c1-6,20-30
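
Both forms of 'cut' can be tried on a small sample file. The two records below are invented passwd-style lines, and the file and variable names are hypothetical.

```shell
workdir=$(mktemp -d)

# Made-up passwd-style records.
cat > "$workdir/users" << 'EOF'
root:x:0:0:Super User:/root:/bin/sh
bob:x:25:100:Bob Smith:/home/bob:/bin/csh
EOF

# Fields 1 and 3, colon-delimited; the delimiter is kept between them.
fields=$(cut -f1,3 -d: "$workdir/users")

# Character columns 1 to 4 of each line, ignoring any delimiter.
cols=$(cut -c1-4 "$workdir/users")

printf 'fields:\n%s\ncolumns:\n%s\n' "$fields" "$cols"

rm -rf "$workdir"
```

Note that the column form counts characters, so the second record yields 'bob:' because the colon falls within the first four columns.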





4.5 Extracting Lines of Data



4.5.1 Using the tail command



The tail command extracts the last few lines of data from a file or piped input (default 10). Other values can be specified for the number of lines to extract.



tail /etc/passwd    produces the last 10 lines of /etc/passwd



tail -15 /etc/passwd    produces the last 15 lines of /etc/passwd



Using a '+' sign with tail produces a different result, as follows.



ps -t console | tail +2    produces the output from the listing starting from the 2nd line onwards

(this can be useful to remove headings from listings). The POSIX form of this option is 'tail -n +2'.



4.5.2 Using the head command



The head command displays the first few lines of a file (default 10).



head -3 /etc/passwd    shows the first 3 lines of /etc/passwd
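
The sketch below exercises 'tail' and 'head' together on a five-line sample, using the POSIX '-n' form of the line count. The file contents and variable names are made up for illustration.

```shell
workdir=$(mktemp -d)

# A small listing with a heading line, as many commands produce.
printf 'HEADING\none\ntwo\nthree\nfour\n' > "$workdir/listing"

last2=$(tail -n 2 "$workdir/listing")         # the last two lines
no_heading=$(tail -n +2 "$workdir/listing")   # from line 2 onwards
first2=$(head -n 2 "$workdir/listing")        # the first two lines

# Combining the two picks out a range: lines 2 to 3.
middle=$(head -n 3 "$workdir/listing" | tail -n +2)

printf '%s\n' "$middle"

rm -rf "$workdir"
```

Piping 'head' into 'tail' in this way is a simple method of extracting a block of lines from the middle of a file.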



4.5.3 Using wc to count lines



This is a 'word count' command, which has options to count characters (-c), words (-w) or lines (-l) in a file. With no options, it reports all three counts.



wc -l /etc/passwd    displays how many entries there are in /etc/passwd

who | wc -l    displays a count of logged on users
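
The three counts can be checked against a file whose contents are known. The sample text and names below are invented; the 'tr -d' step is only there because some versions of 'wc' pad the number with spaces.

```shell
workdir=$(mktemp -d)

# Two lines, three words, fourteen bytes including the newlines.
printf 'one two\nthree\n' > "$workdir/sample"

# With no options wc prints lines, words and bytes, then the file name.
wc "$workdir/sample"

# Reading from standard input suppresses the file name.
lines=$(wc -l < "$workdir/sample" | tr -d ' ')
words=$(wc -w < "$workdir/sample" | tr -d ' ')
chars=$(wc -c < "$workdir/sample" | tr -d ' ')

echo "lines=$lines words=$words chars=$chars"

rm -rf "$workdir"
```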



4.5.4 Using tr to change data



The 'tr' command is used to translate characters. For example, the following would convert all of the output display from lowercase to uppercase.



who -u | tr '[a-z]' '[A-Z]'

or

who -u | tr 'a-z' 'A-Z'



When used with the '-s' option, tr squeezes each run of a repeated character down to a single occurrence.



who -u | tr -s ' '



would display the output with only one space between each data field. This would mean that commands such as sort and cut could then utilise the space as a field delimiter.
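
The effect of squeezing spaces before using 'cut' can be seen on a single made-up line that imitates 'who' output. The sample line and variable names are assumptions for illustration only.

```shell
# One made-up line with several spaces between its fields.
line='fred   console   Jan 5'

upper=$(printf '%s\n' "$line" | tr 'a-z' 'A-Z')     # translate to uppercase
squeezed=$(printf '%s\n' "$line" | tr -s ' ')       # one space between fields

# With single spaces, cut can use the space as its delimiter.
third=$(printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f3)

printf '%s\n%s\n%s\n' "$upper" "$squeezed" "$third"
```

Without the 'tr -s' step, each extra space would count as an empty field and field three would be blank.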























4.6 Searching for Data with grep



The grep command globally searches the named file(s) for the expression given as the first argument, and by default, prints all matching lines on the standard output.

Commonly used options include '-v' (print NON matching lines only), '-c' (count matching lines only) and '-n' (print line numbers).



grep ":0:" /etc/passwd    select all users with a '0' user id or a '0' group id from /etc/passwd



grep "^b" /etc/passwd    select all lines that start with 'b' from /etc/passwd



ps -ef | grep -v 'root'    select all lines from the 'ps' listing that DO NOT contain the word 'root'



grep "^[^ab]" test    select all lines from the file 'test' that DO NOT start with 'a' or 'b'





Note: Wildcards used with 'grep' expressions do NOT work the same as wildcards used with file names. A single character match is given by '.' and the multiple character match '*' has a different effect. With grep it means "match ZERO or more of the preceding character".
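
The options listed above can be tried on a small sample file. The three records are invented passwd-style lines and the variable names are hypothetical.

```shell
workdir=$(mktemp -d)

# Made-up passwd-style records.
cat > "$workdir/users" << 'EOF'
root:x:0:0:Super User:/root:/bin/sh
bob:x:25:100:Bob Smith:/home/bob:/bin/csh
bin:x:2:2:Binaries:/bin:/bin/false
EOF

zero_ids=$(grep ':0:' "$workdir/users")       # lines with a 0 uid or gid
starts_b=$(grep -c '^b' "$workdir/users")     # count of lines starting with b
not_root=$(grep -v 'root' "$workdir/users")   # lines NOT containing root
numbered=$(grep -n 'csh' "$workdir/users")    # matching lines with line numbers

printf '%s\n' "$zero_ids" "$starts_b" "$not_root" "$numbered"

rm -rf "$workdir"
```

The '-n' output prefixes each matching line with its line number and a colon, which is convenient when the file is to be edited afterwards.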













4.6.1 Regular Expressions



There are several methods of manipulating data within UNIX shells, including commands like 'grep' and 'sed'. They make use of a pattern matching facility known as 'regular expressions' (re).



The following are some of the characters that have special meaning when used within a regular expression for pattern matching.



Character    Meaning

.            period, matches ANY single character

*            matches ZERO OR MORE of the preceding character

^            following pattern must be at start of line

$            preceding pattern must be at end of line

[char]       match character (or range) shown in brackets

[^char]      do NOT match characters shown in brackets



The following are some examples of the use of regular expressions. As always, more information can be found in the UNIX 'man' pages.



grep csh\$ /etc/passwd | cut -d: -f1



would produce a list of user names that have the expression 'csh' at the end of their account definition in /etc/passwd. Note the '\$'. The backslash is to escape the '$' from the shell.



ls /etc | grep '^[dp]'



would select all files from /etc which start with either a 'd' or a 'p', followed by zero or more of any other character (e.g. it would find a file called 'd' and also a file called 'disktidy').



ls /etc | grep '^[^dp]....'



would select all files from /etc which DO NOT start with 'd' or 'p' and have at least five characters in their name. To select names of exactly five characters, anchor the end of the pattern with '$', as in '^[^dp]....$'.
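
The difference an end-of-line anchor makes can be seen by running the patterns against a fixed list of names rather than a live /etc directory. The names and records below are made up for illustration.

```shell
# A made-up list of file names, one per line.
names=$(printf 'd\ndisktidy\npasswd\nhosts\ngroup\nmotd\nservices\n')

# Names starting with d or p.
dp=$(printf '%s\n' "$names" | grep '^[dp]')

# Not starting with d or p, and at least five characters long.
at_least5=$(printf '%s\n' "$names" | grep '^[^dp]....')

# The same, but anchored with $ so the name is exactly five characters.
exactly5=$(printf '%s\n' "$names" | grep '^[^dp]....$')

# User names whose record ends in csh (made-up records).
csh_users=$(printf 'bob:x:25:100::/home/bob:/bin/csh\nroot:x:0:0::/root:/bin/sh\n' |
    grep 'csh$' | cut -d: -f1)

printf '%s\n' "$exactly5" "$csh_users"
```

Here 'services' matches the unanchored pattern but not the anchored one, while the four-character 'motd' matches neither.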



4.7 Quoting Characters



Quotes can be used to mask certain characters, so that they are accepted literally by the shell.

Single quotes '…..' mask ALL characters within them



Double quotes "…" mask most characters but allow expansion of special characters, such as '$'



A backslash '\' masks any following SINGLE character.
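
The three quoting methods can be compared by echoing the same text in each style. The variable 'name' and its value are invented for the example.

```shell
name=fred

single=$(echo 'user is $name')     # single quotes: $name is kept literally
double=$(echo "user is $name")     # double quotes: $name is expanded
escaped=$(echo "user is \$name")   # backslash: the $ is kept literally
star=$(echo \*)                    # backslash stops file name expansion of *

printf '%s\n' "$single" "$double" "$escaped" "$star"
```

Only the double-quoted form substitutes the value of the variable; the other two leave the dollar sign untouched.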



4.8 Displaying Files



Files can be displayed to the screen by using the ‘cat' command. However, there may be some problems encountered when using this method. One is that the file may be very large and therefore scrolls off the screen, the other is that the file may contain unprintable characters that cause strange effects on the screen.



4.8.1 Displaying large files



Large files can be displayed by using either of the following commands.



more filename    will page the file to the screen. At the pause prompt there are several options that can be taken. These can be found by entering 'h' at the pause prompt. To quit the listing, enter 'q' at the prompt.



pg filename    similar action to the more command.



ps -ef | pg    will page the output of the 'ps' command



4.8.2 Displaying non text files



Sometimes, when attempting to display a file, there may be problems caused by unprintable characters within the file. To prevent this you should first check to see what the file contains. This can be done with the ‘file' command as follows.



file filename    will interpret what the contents of the file are and display the results on the screen.





ctatrnr@dopey> file *

bin: directory

change_local_user.shl: awk program text

copy.out: ascii text

create.shl: commands text

dcfg.datsql: ascii text

log: directory

ctatrnr@dopey>





Any file, as shown above, that DOES NOT have the word 'text' in its description should NOT be displayed with 'cat', 'more' or 'pg'.



These files can be displayed using the Octal Dump command 'od', as follows.



od -cx filename    will display the file in dump format, showing each byte as an ASCII character where possible (-c) and the same data as hexadecimal values (-x).
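
The character form of the dump can be tried on a short string containing a tab and a newline. This is only a sketch: the input is made up, and only the '-c' option is used because the layout of the hexadecimal column varies between systems.

```shell
# Dump a short string; the tab and newline are shown as \t and \n.
dump=$(printf 'AB\tC\n' | od -c)

printf '%s\n' "$dump"
```

Unprintable characters appear as backslash escapes or octal codes instead of being sent raw to the terminal, which is what makes 'od' safe for non text files.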




 


Copyright © 2003 - 2010 URL.biz. All rights reserved.