

getnews.sh: functions in a nutshell and the magic of sed

i like to read the news as a part of my morning routine, along with checking voice mail, instant messages and my email. i was using the javascript feeds from moreover and that worked out OK. i had to fill out a form and they gave me a script based on the questions i answered. that was fine, until i revamped the site. the javascript code no longer fit, so i had to either find a new provider or find a way to get the headlines myself.

that was when i discovered a unix command called wget. wget would go pick up the HTML headline files, and i could display them. this was better than the javascript, but not quite what i had in mind. i had moved the whole site to CSS so i didn't have to mess with font tags, and the moreover headlines were full of font tags that rendered my CSS useless. that was when i remembered sed, and our story begins:


# set variables here:
# (use short names so it's easy to read)
# the original values are not shown, so the script stops here if they are unset
fp=${fp:?set fp to the directory that holds the fetched files}
sp=${sp:?set sp to the directory that holds the sed scripts}

fetchnews () {
	# pick up a file and then wait a minute
	# lets the files get written to disk
	# and is courteous to your content provider
	wget -q -r -l1 -O "$fp/foo.fetch" "http://www.news.com/news?foo"
	sleep 60
	wget -q -r -l1 -O "$fp/bar.fetch" "http://www.news.com/news?bar"
	sleep 60
}

newsfiles () {
	# send the fetched files to the parsenews function
	parsenews "$fp/foo.fetch" "$fp/foo.sht"
	parsenews "$fp/bar.fetch" "$fp/bar.sht"
}

digestfiles () {
	# send the news SSI's to the parsedigest function
	parsedigest "$fp/foo.sht" "$fp/foo_digest.sht"
	parsedigest "$fp/bar.sht" "$fp/bar_digest.sht"
}

parsenews () {
	# calls a sed script to make
	# SSI files for news pages
	"$sp/parsenews.sed" "$1" > "$2"
}

parsedigest () {
	# calls a sed script to make
	# SSI files for news digest page
	"$sp/parsedigest.sed" "$1" > "$2"
}

main () {
	fetchnews
	# pick up files first
	# comment out if testing parsing functions
	for c in 1 2 3
	do
		# test a file to be sure it's not empty
		if test -s "$fp/foo.fetch"
		then
			# if not empty, then do the parsing
			newsfiles
			digestfiles
			# parsing is done, so stop retrying
			return 0
		else
			# if empty, redo the fetch and start again
			echo "attempt $c failed"
			fetchnews
		fi
	done
	return 1
}

main # calls the main function, starts the script

this script does three different things. first it picks up files from a website using wget. wget is a gnu utility that pulls files via HTTP or FTP and writes them locally. writing those files to disk causes I/O problems, which i will discuss in a minute. next it calls a sed script to strip out unnecessary tags and writes the results to server side include files. finally, those include files are passed through a second sed script that cuts out lines to make includes for a digest page. you will need to write the sed scripts yourself, since they do most of the work. sed scripts are just like shell scripts in that they are text files that are made executable and do something when called. sed is a powerful program for text manipulation and is one of the forerunners to perl. the sed tutorial linked below can help you figure it out much better than i ever could.
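the real parsenews.sed and parsedigest.sed are not shown here, so what follows is only a rough sketch of the kind of work they might do. the function names, the sample headline, and the five line digest are all assumptions made for illustration. the first function strips font tags so the CSS can do its job, and the second keeps only the top of a file for a digest.

```shell
#!/bin/sh
# hypothetical stand-in for parsenews.sed:
# delete opening <font ...> tags and closing </font> tags, upper or lower case
strip_font_tags () {
	sed -e 's|<[Ff][Oo][Nn][Tt][^>]*>||g' \
	    -e 's|</[Ff][Oo][Nn][Tt]>||g'
}

# hypothetical stand-in for parsedigest.sed:
# keep the first 5 lines and drop the rest (5 is an arbitrary choice)
make_digest () {
	sed -n '1,5p'
}

# made-up headline line, shaped like the font-heavy markup described above
printf '%s\n' '<font face="arial" size="2"><a href="/story1">headline one</a></font>' | strip_font_tags
# prints: <a href="/story1">headline one</a>
```

to turn either one into a standalone sed script like the ones getnews.sh calls, the same expressions would go in a text file that starts with a `#!/usr/bin/sed -f` line (the path to sed is an assumption) and is made executable with chmod +x.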

with all the stuff that this little script does, it almost looks like a full-blown program. it is quickly turning into something i should use perl for. i would like to stop using wget to write files to disk, since you need to put sleeps in the script to wait for the files to download and be written completely. using flat files as a means of communication is uncool. my next step is to figure out how to use PHP or perl to pull the content right off the wire and send it to sed. for whatever reason lynx -source doesn't work.
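one possible way to skip the .fetch files without leaving the shell is to have wget write to standard output with -O - and pipe that straight into sed. this is not the PHP or perl approach mentioned above, just a sketch of the same idea. pipe_news and the FETCH variable are hypothetical names, and FETCH exists only so the pipeline can be tried offline with cat.

```shell
#!/bin/sh
# hypothetical helper: stream a page through a sed script, no .fetch file needed
# FETCH is an assumption, overridable so the pipeline can be tested without a network
FETCH=${FETCH:-wget -q -O -}

pipe_news () {
	url=$1
	sedscript=$2
	out=$3
	# wget -O - sends the page to stdout, so sed reads it as it arrives
	# and there is no half-written file to sleep and wait for
	$FETCH "$url" | sed -f "$sedscript" > "$out.tmp"
	# only replace the include if something actually came back
	if test -s "$out.tmp"
	then
		mv "$out.tmp" "$out"
	else
		rm -f "$out.tmp"
		return 1
	fi
}
```

a call would look something like `pipe_news "http://www.news.com/news?foo" "$sp/parsenews.sed" "$fp/foo.sht"`, reusing the names from getnews.sh. writing to a .tmp file first means the old include stays in place if the fetch comes back empty.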

i want to do this whole thing with XML and XSL. i have a working XSLT stylesheet that changes the headlines provided in XML to HTML, and i have a command line XSLT processor (sablotron) all picked out (command line == shell script), but the one ported to openBSD isn't fully implemented (it borks my hyperlinks, not good) and the ones that are fully implemented (xalan) are not ported to openBSD. plus i have no authority to install anything on this server, so shell scripts it will have to be.

webbak, my first shell script
webbak 2, the sequel
getnews, your news authority
su.bat, superuser for NT
sed tutorial, very informative