Showing posts with label AWK. Show all posts
Showing posts with label AWK. Show all posts

An Awk Primer


An Awk Primer

From vectorsite.net
The Awk text-processing programming language is a useful and simple tool for manipulating text. This document provides a quick tutorial for Awk. The Awk syntax used in this document corresponds to that used on UN*X. It may vary slightly on other platforms.
The Awk text-processing language is useful for such tasks as:
  • Tallying information from text files and creating reports from the results.
  • Adding additional functions to text editors like "vi".
  • Translating files from one format to another.
  • Creating small databases.
  • Performing mathematical operations on files of numeric data.
Awk has two faces: it is a utility for performing simple text-processing tasks, and it is a programming language for performing complex text-processing tasks.
The two faces are really the same, however. Awk uses the same mechanisms for handling any text-processing task, but these mechanisms are flexible enough to allow useful Awk programs to be entered on the command line, or to implement complicated programs containing dozens of lines of Awk statements.
Awk statements comprise a programming language. In fact, Awk is useful for simple, quick-and-dirty computational programming. Anybody who can write a BASIC program can use Awk, although Awk's syntax is different from that of BASIC. Anybody who can write a C program can use Awk with little difficulty, and those who would like to learn C may find Awk a useful stepping stone, with the caution that Awk and C have significant differences beyond their many similarities.
There are, however, things that Awk is not. It is not really well suited for extremely large, complicated tasks. It is also an "interpreted" language -- that is, an Awk program cannot run on its own, it must be executed by the Awk utility itself. That means that it is relatively slow, though it is efficient as interpretive languages go, and that the program can only be used on systems that have Awk. There are translators available that can convert Awk programs into C code for compilation as stand-alone programs, but such translators have to be purchased separately.
One last item before proceeding: What does the name "Awk" mean? Awk actually stands for the names of its authors: "Aho, Weinberger, & Kernighan". Kernighan later noted: "Naming a language after its authors ... shows a certain poverty of imagination." The name is reminiscent of that of an oceanic bird known as an "auk", and so the picture of an auk often shows up on the cover of books on Awk.

Awk by example, Part 1 An intro to the great language with the strange name By Daniel Robbins


Awk by example, Part 1

An intro to the great language with the strange name
By Daniel Robbins
Awk is a very nice language with a very strange name. In this first article of a three-part series, Daniel Robbins will quickly get your awk programming skills up to speed. As the series progresses, more advanced topics will be covered, culminating with an advanced real-world awk application demo.
In this series of articles, I'm going to turn you into a proficient awk coder. I'll admit, awk doesn't have a very pretty or particularly "hip" name, and the GNU version of awk, called gawk, sounds downright weird. Those unfamiliar with the language may hear "awk" and think of a mess of code so backwards and antiquated that it's capable of driving even the most knowledgeable UNIX guru to the brink of insanity (causing him to repeatedly yelp "kill -9!" as he runs for coffee machine).
Sure, awk doesn't have a great name. But it is a great language. Awk is geared toward text processing and report generation, yet features many well-designed features that allow for serious programming. And, unlike some languages, awk's syntax is familiar, and borrows some of the best parts of languages like C, python, and bash (although, technically, awk was created before both python and bash). Awk is one of those languages that, once learned, will become a key part of your strategic coding arsenal.

Awk by example, Part 2 By Daniel Robbins


Awk by example, Part 2

By Daniel Robbins
In this sequel to his previous intro to awk, Daniel Robbins continues to explore awk, a great language with a strange name. Daniel will show you how to handle multi-line records, use looping constructs, and create and use awk arrays. By the end of this article, you'll be well versed in a wide range of awk features, and you'll be ready to write your own powerful awk scripts.
Multi-line records
Awk is an excellent tool for reading in and processing structured data, such as the system's /etc/passwd file. /etc/passwd is the UNIX user database, and is a colon-delimited text file, containing a lot of important information, including all existing user accounts and user IDs, among other things. In my previous article, I showed you how awk could easily parse this file. All we had to do was to set the FS (field separator) variable to ":".
By setting the FS variable correctly, awk can be configured to parse almost any kind of structured data, as long as there is one record per line. However, just setting FS won't do us any good if we want to parse a record that exists over multiple lines. In these situations, we also need to modify the RS record separator variable. The RS variable tells awk when the current record ends and a new record begins.

Awk by example, Part 3 String functions and ... checkbooks By Daniel Robbins


Awk by example, Part 3

String functions and ... checkbooks
By Daniel Robbins
In this conclusion to the awk series, Daniel introduces you to awk's important string functions, and then shows you how to write a complete checkbook-balancing program from scratch. Along the way, you'll learn how to write your own functions and use awk's multidimensional arrays. By the end of this article, you'll have even more awk experience, allowing you to create more powerful scripts.
Formatting output
While awk's print statement does do the job most of the time, sometimes more is needed. For those times, awk offers two good old friends called printf() and sprintf(). Yes, these functions, like so many other awk parts, are identical to their C counterparts. printf() will print a formatted string to stdout, while sprintf() returns a formatted string that can be assigned to a variable. If you're not familiar with printf() and sprintf(), an introductory C text will quickly get you up to speed on these two essential printing functions. You can view the printf() man page by typing "man 3 printf" on your Linux system.

Gawk By Paul Rubin and Jay Fenlason


Gawk

By Paul Rubin and Jay Fenlason
Gawk is the GNU Project's implementation of the AWK programming language. It conforms to the definition of the language in the POSIX 1003.2 Command Language And Utilities Standard. This version in turn is based on the description in The AWK Programming Language, by Aho, Kernighan, and Weinberger, with the additional features found in the System V Release 4 version of UNIX awk. Gawk also provides more recent Bell Laboratories awk extensions, and a number of GNU-specific extensions.
Pgawk is the profiling version of gawk. It is identical in every way to gawk, except that programs run more slowly, and it automatically produces an execution profile in the file awkprof.out when done. See the --profile option, below.
The command line consists of options to gawk itself, the AWK program text (if not supplied via the -f or --file options), and values to be made available in the ARGC and ARGV pre-defined AWK variables.

Effective AWK Programming A User's Guide for GNU Awk By Arnold D. Robbins


Effective AWK Programming

A User's Guide for GNU Awk
By Arnold D. Robbins
The name awk comes from the initials of its designers: Alfred V. Aho, Peter J. Weinberger, and Brian W. Kernighan. The original version of awk was written in 1977 at AT&T Bell Laboratories. In 1985 a new version made the programming language more powerful, introducing user-defined functions, multiple input streams, and computed regular expressions. This new version became generally available with Unix System V Release 3.1. The version in System V Release 4 added some new features and also cleaned up the behavior in some of the "dark corners" of the language. The specification for awk in the POSIX Command Language and Utilities standard further clarified the language based on feedback from both the gawk designers, and the original Bell Labs awk designers.
The GNU implementation, gawk, was written in 1986 by Paul Rubin and Jay Fenlason, with advice from Richard Stallman. John Woods contributed parts of the code as well. In 1988 and 1989, David Trueman, with help from Arnold Robbins, thoroughly reworked gawk for compatibility with the newer awk. Current development focuses on bug fixes, performance improvements, standards compliance, and occasionally, new features.

Getting started with AWK


Getting started with awk

HMC Computer Science Department
This qref is written for a semi-knowledgable UNIX user who has just come up against a problem and has been advised to use awk to solve it. Perhaps one of the examples can be quickly modified for immediate use.
awk reads from a file or from its standard input, and outputs to its standard output. You will generally want to redirect that into a file, but that is not done in these examples just because it takes up space. awk does not get along with non-text files, like executables and FrameMaker files. If you need to edit those, use a binary editor like hexl-mode in emacs.
The most frustrating thing about trying to learn awk is getting your program past the shell's parser. The proper way is to use single quotes around the program, like so:
>awk '{print $0}' filename
The single quotes protect almost everything from the shell. In csh or tcsh, you still have to watch out for exclamation marks, but other than that, you're safe.
The second most frustrating thing about trying to learn awk is the lovely error messages:
awk '{print $0,}' filename
awk: syntax error near line 1
awk: illegal statement near line 1
gawk generally has better error messages. At least it tells you where in the line something went wrong:
gawk '{print $0,}' filename
gawk: cmd. line:1: {print $0,}
gawk: cmd. line:1: ^ parse error
So, if you're having problems getting awk syntax correct, switch to gawk for a while.

The GNU Awk User's Guide By Arnold Robbins


The GNU Awk User's Guide

By Arnold Robbins
The name awk comes from the initials of its designers: Alfred V. Aho, Peter J. Weinberger and Brian W. Kernighan. The original version of awk was written in 1977 at AT&T Bell Laboratories. In 1985, a new version made the programming language more powerful, introducing user-defined functions, multiple input streams, and computed regular expressions. This new version became widely available with Unix System V Release 3.1 (SVR3.1). The version in SVR4 added some new features and cleaned up the behavior in some of the “dark corners” of the language. The specification for awk in the POSIX Command Language and Utilities standard further clarified the language. Both the gawk designers and the original Bell Laboratories awk designers provided feedback for the POSIX specification.
Paul Rubin wrote the GNU implementation, gawk, in 1986. Jay Fenlason completed it, with advice from Richard Stallman. John Woods contributed parts of the code as well. In 1988 and 1989, David Trueman, with help from me, thoroughly reworked gawk for compatibility with the newer awk. Circa 1995, I became the primary maintainer. Current development focuses on bug fixes, performance improvements, standards compliance, and occasionally, new features.
In May of 1997, Jürgen Kahrs felt the need for network access from awk, and with a little help from me, set about adding features to do this for gawk. At that time, he also wrote the bulk of TCP/IP Internetworking with gawk (a separate document, available as part of the gawk distribution). His code finally became part of the main gawk distribution with gawk version 3.1.