NEWBIE GAME PROGRAMMERS

Archive for Tutorial

Using Perl and Regular Expressions to Process Html Files - Part 2

Author: John Dixon

In this article we will discuss how to change the contents of an HTML file by running a Perl script on it.

The file we are going to process is called file1.htm:

Note: To ensure that the code is displayed correctly, in the example code shown in this article, square brackets ‘[..]’ are used in HTML tags instead of angle brackets ‘<..>’.

[html]
[head][title]Sample HTML File[/title]
[link rel="stylesheet" type="text/css" href="style.css"]
[/head]
[body]
[h1]Introduction[/h1]
[p]Welcome to the world of Perl and regular expressions[/p]
[h2]Programming Languages[/h2]
[table border="1" width="400"]
[tr][th colspan="2"]Programming Languages[/th][/tr]
[tr][td]Language[/td][td]Typical use[/td][/tr]
[tr][td]JavaScript[/td][td]Client-side scripts[/td][/tr]
[tr][td]Perl[/td][td]Processing HTML files[/td][/tr]
[tr][td]PHP[/td][td]Server-side scripts[/td][/tr]
[/table]
[h1]Summary[/h1]
[p]JavaScript, Perl, and PHP are all interpreted programming languages.[/p]
[/body]
[/html]

Imagine that we need to change both occurrences of [h1]heading[/h1] to [h1 class="big"]heading[/h1]. It’s not a big change, and one that could easily be done by hand or with a simple search and replace. But we’re just getting started here.

To do this, we could use the following Perl script (script1.pl):

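1 open (IN, "file1.htm") or die "Cannot open file1.htm: $!";
2 open (OUT, ">new_file1.htm") or die "Cannot create new_file1.htm: $!";
3 while ($line = <IN>) {
4 $line =~ s/<h1>/<h1 class="big">/;   # real angle brackets here: in a regex, [h1] would match a single 'h' or '1'
5 print OUT $line;
6 }
7 close (IN);
8 close (OUT);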

Note: You don’t need to enter the line numbers. I’ve included them simply so that I can reference individual lines in the script.

Let’s look at each line of the script.

Line 1
In this line file1.htm is opened so that it can be processed by the script. In order to process the file, Perl uses something called a filehandle, which provides a kind of link between the script and the operating system, containing information about the file that is being processed. I’ve called this “opening” filehandle ‘IN’, but I could have used anything within reason. Filehandles are normally in capitals.

Line 2
This line creates a new file called ‘new_file1.htm’, which is written to by using another filehandle, OUT. The ‘>’ just before the filename indicates that the file will be written to.

Line 3
This line sets up a loop in which each line in file1.htm will be examined individually.

Line 4
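This line contains the regular expression. It searches each line of file1.htm for the first occurrence of [h1] and, if it finds one, changes it to [h1 class="big"]. Because no g (global) modifier is used, a second [h1] on the same line would be left unchanged.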

Looking at Line 4 in more detail:

  • $line - This is a variable that contains a line of text. It gets modified if the substitution is successful.
  • =~ is the binding operator: it applies the substitution on its right to the variable on its left.
  • s is the substitution operator.
  • [h1] is what needs to be substituted (replaced).
  • [h1 class="big"] is what [h1] has to be changed to.
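To see what the pieces of Line 4 do on their own, here is a small standalone sketch (the sample strings are made up for illustration). It uses real angle brackets rather than the article’s square-bracket convention, because inside a regular expression square brackets define a character class. It also shows the effect of the g modifier, which the script above doesn’t use.

```perl
use strict;
use warnings;

my $line = '<h1>One</h1> <h1>Two</h1>';

# Copy $line into $first, then bind the substitution to $first with =~.
# Without /g, only the first match on the line is changed.
(my $first = $line) =~ s/<h1>/<h1 class="big">/;
print "$first\n";    # <h1 class="big">One</h1> <h1>Two</h1>

# With /g, every match on the line is changed; s///g returns how many substitutions it made.
my $all   = $line;
my $count = ($all =~ s/<h1>/<h1 class="big">/g);
print "$all\n";      # <h1 class="big">One</h1> <h1 class="big">Two</h1>
print "$count\n";    # 2

# Square brackets make a character class: [h1] matches one 'h' or '1', not the tag.
(my $oops = 'hello') =~ s/[h1]/X/;
print "$oops\n";     # Xello
```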

Line 5
This line takes the contents of the $line variable and, via the OUT file handle, writes the line to new_file1.htm.

Line 6
This line closes the ‘while’ loop. The loop is repeated until all the lines in file1.htm have been examined.

Lines 7 and 8
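These two lines close the two filehandles that have been used in the script. If you left these lines out, the script would still work, because Perl closes any filehandles that are still open when the script ends. Even so, it’s good programming practice to close filehandles yourself: closing OUT makes sure all of the output has been written to new_file1.htm, and it frees up the filehandle names so they can be reused, for example, to open another file.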
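For comparison, here is a sketch of the same job written in the style most Perl programmers use today: strict and warnings turned on, lexical filehandles ($in, $out) instead of IN and OUT, the three-argument form of open, and a check on each open so the script stops with a message if a file can’t be opened. The sub name process_file is my own choice, not part of the original script, and the tags are again written with real angle brackets.

```perl
use strict;
use warnings;

# Copy $in_path to $out_path, adding class="big" to the first <h1> on each line.
# process_file is a name chosen for this sketch, not part of the original script.
sub process_file {
    my ($in_path, $out_path) = @_;

    # Three-argument open: the mode ('<' read, '>' write) is kept separate from the filename.
    open(my $in,  '<', $in_path)  or die "Cannot open $in_path: $!";
    open(my $out, '>', $out_path) or die "Cannot create $out_path: $!";

    # Read one line at a time, as in the original while loop.
    while (my $line = <$in>) {
        $line =~ s/<h1>/<h1 class="big">/;
        print {$out} $line;
    }

    close($in);
    # Closing the output handle flushes it; report a failure such as a full disk.
    close($out) or die "Cannot close $out_path: $!";
    return;
}

# Only run the conversion when the sample file is present in the current folder.
process_file('file1.htm', 'new_file1.htm') if -e 'file1.htm';
```

Because the work happens inside a sub, the same code can later be called once per file, which is handy when there are many files to process.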

Running the Script

As the purpose of this article is to explain how to use regular expressions to process HTML files, and not necessarily how to use Perl, I don’t want to spend too long describing how to run Perl scripts. Suffice it to say that you can run them in various ways: for example, from within a text editor such as TextPad, by double-clicking the Perl script (script1.pl), or by running the script from an MS-DOS window.

(The location of the Perl interpreter will need to be in your PATH statement so that you can run Perl scripts from any location on your computer and not just from within the directory where the interpreter (perl.exe) itself is installed.)

So, to run our script we could open an MS-DOS window and navigate to the location where the script and the HTML file are located. To keep life simple I’ve assumed that these two files are in the same folder (or directory). The command to run the script is:

C:\>perl script1.pl

If the script works (and hopefully it will), a new file, new_file1.htm, is created in the same folder as file1.htm. If you open the file, you’ll see that the two lines that contained [h1] tags have been modified so that they now read [h1 class="big"].

In Part 3 we’ll look at how to handle multiple files.

About the Author:
John is a web developer working for My Health Questions Matter, a company dedicated to helping patients to get the most out of their interaction with health care professionals such as doctors, midwives, and consultants by generating a set of health questions a patient can ask at an appointment.

Using Perl and Regular Expressions to Process Html Files - Part 1

Author: John Dixon

Like many web content authors, over the past few years I’ve had many occasions when I’ve needed to clean up a bunch of HTML files that have been generated by a word processor or publishing package. Initially, I used to clean up the files manually, opening each one in turn, and making the same set of updates to each one. This works fine when you only have a few files to fix, but when you have hundreds or even thousands to do, you can very quickly be looking at weeks or even months of work. A few years ago someone put me on to the idea of using Perl and regular expressions to perform this ‘cleaning up’ process.

Why write an article about Perl and regular expressions, I hear you say? Well, that’s a good point. After all, the web is full of tutorials on Perl and regular expressions. What I found, though, was that when I was trying to work out how to process HTML files, it was difficult to find tutorials that met my criteria. I’m not saying they don’t exist; I just couldn’t find them. Sure, I could find tutorials that explained everything I needed to know about regular expressions, plenty of tutorials about how to program in Perl, and even tutorials on how to use regular expressions within Perl scripts. What I couldn’t find, though, was a tutorial that explained how to open one or more HTML or text files, make updates to those files using regular expressions, and then save and close the files.

The Goal

When converting documents into HTML the goal is always to achieve a seamless conversion from the source document (for example, a word processor document) to HTML. The last thing you need is for your content authors to be spending hours, or even days, fixing untidy HTML code after it has been converted.

Many applications offer excellent tools for converting documents to HTML and, in combination with a well-designed cascading style sheet (CSS), can often produce perfect results. Sometimes, though, there are little bits of HTML code that are a bit messy, usually caused by authors not applying paragraph tags or styles correctly in the source document.

Why Perl?

Perl is such a good language for this task because it is excellent at processing text files, which, let’s face it, is all HTML files are. Perl is also the de facto standard for regular expressions (many other languages and tools advertise "Perl-compatible" regular expressions), which you can use to search for, and replace or change, bits of text or code in a file.

What is Perl?

Perl (Practical Extraction and Report Language) is a general-purpose programming language, which means it can be used to do anything that any other programming language can do. Having said that, Perl is very good at doing certain things, and not so good at others. Although you could do it, you wouldn’t normally develop a user interface in Perl, as it would be much easier to use a language like Visual Basic to do this. What Perl is really good at is processing text. This makes it a great choice for manipulating HTML files.

What is a Regular Expression?

A regular expression is a string that describes or matches a set of strings, according to certain syntax rules. Regular expressions are not unique to Perl (many languages, including JavaScript and PHP, can use them), but Perl arguably handles them better than any other language.
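As a small illustration of "a string that describes a set of strings", the Perl pattern below (my own example, not from a later part of this series) matches the six HTML heading tag names h1 to h6 and nothing else:

```perl
use strict;
use warnings;

# One pattern describes a whole set of strings: the heading tag names h1 to h6.
# ^ and $ anchor the match, so the entire name must fit the pattern.
my $heading = qr/^h[1-6]$/;

my @tags     = qw(h1 h3 h6 h7 p td);
my @headings = grep { $_ =~ $heading } @tags;
print "@headings\n";    # h1 h3 h6
```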

In Part 2, we’ll look at our first example Perl script.


Visual Basic and XNA Game Development

I found this great resource after reading some comments on my last article Beginning Game Development with MS .NET.

I must say I prefer the layout of these tutorials: they not only show you the code but also include examples with screenshots, so the walk-through for installing XNA and running your first game with VB looks like just a couple of simple steps.

I will delve further into these tutorials later, but for now, enjoy this great find.

2D Tutorials

 

  • Tutorial 1 - Install XNA for use with VB.NET
  • Tutorial 2 - Create the XNA 3D Device
  • Tutorial 3 - Display a 2D Texture
  • Tutorial 4 - Create Game Area
  • Tutorial 5 - Creating a Rotating 2D Texture
  • Tutorial 6 - Code Cleanup - Round 1
  • Tutorial 7 - Ball Class and Ball Movement
  • Tutorial 8 - Drawing Multiple Balls
  • Tutorial 9 - Installing XNA GSE Beta 2
  • Tutorial 10 - Defining Ball Collisions
  • Tutorial 11 - Ball Positioning
  • Tutorial 12 - Code Cleanup - Round 2
  • Tutorial 13 - Deleting Balls After a Collision
  • Tutorial 14 - Dropping Balls After a Collision
  • Tutorial 15 - The Particle System
  • Tutorial 16 - The GameState
  • Tutorial 17 - Code Cleanup - Round 3
  • Tutorial 18 - Sound and MultiThreading
  • Tutorial 19 - Game Aesthetics
  • Tutorial 20 - Microsoft.XNA.Framework.Game
  • Tutorial 21 - The Content Pipeline
  • Tutorial 22 - The Finishing Touches

3D Tutorials

  • Tutorial 1 - Display a Texture
  • Tutorial 2 - Rotating a 3D Model
  • Tutorial 3 - The 3D SkyBox
  • Tutorial 4 - The Quaternion Camera
  • Tutorial 5 - Basic Terrain
  • Tutorial 6 - Basic Terrain With Lighting

    Beginning Game Development with MS .NET

    Found a bunch of great learning tools for Microsoft .NET using DirectX over at MSDN. This is a great resource: the code is provided as he explains it, and also as a download, so you can save on the typing. Not sure if you have installed what you need to get started? Not to worry; in every article he provides links to download the free software you will need to continue the tutorials and learn.
    A quote from the site:

    This series is aimed at beginning programmers who are interested in developing a game for their own use with the .NET Framework and DirectX. The goal of this series is to have fun creating a game and learn game development and DirectX along the way. Game programming and DirectX have their own terms and definitions that can be difficult to understand, but after a while, you’ll crack the code and be able to explore a new world of possibilities. I will keep things as straightforward as possible and decode terms as they appear. Another part of the learning curve comes from the math you’ll need to deal with DirectX. I am going to point out some resources along the way that will help you brush up on, or learn, the math skills you’ll need to keep going in DirectX.

    In this series, we are going to build a simple game to illustrate the various components of a commercial game. We will cover how to create great looking graphics in 3D, how to handle user input, how to add sound to a game, how to create computer opponents using Artificial Intelligence, and how to model real-world physics. In addition we are going to cover how to make your game playable over the network and how to optimize your game for performance. Along the way, I will show you how to apply principles of object-oriented development and, as well, I will share some of my experience in creating well-organized and elegant code.

    1. Beginning Game Development Part 1 - Introduction
    2. Beginning Game Development Part II - Introduction to DirectX
    3. Beginning Game Development: Part III - DirectX II
    4. Beginning Game Development: Part IV - DirectInput
    5. Beginning Game Development: Part V - Adding Units
    6. Beginning Game Development: Part VI - Lights, Materials and Terrain
    7. Beginning Game Development: Part VII –Terrain and Collision Detection
    8. Beginning Game Development: Part VIII - DirectSound
    9. Beginning Game Development Part IX –Direct Sound Part II
    10. Beginning Game Development Part IX –Direct Sound Part III

    9 Newb AJAX Tutorials

    Recently came across several tutorials that explained AJAX very well.

    Jon Hughes over at Phazm.com has written a great tutorial called Easy as Pie - Ajax Requests. I believe this is the 2nd in Jon’s Easy as Pie series; if you liked the Ajax Requests tutorial, also check out Easy as Pie - Unobtrusive Javascript, another great tutorial. I find Jon explains things really well in his tutorials (line by line in the Ajax tutorial), and it would be easy for a Newb to pick up.

    Couple others worth checking out :

    You can’t go wrong with any tutorial at W3Schools; if you want to learn any web programming, W3Schools is the absolute best place to start. They have been around forever, and I think I even learned my beginner HTML and ASP from them. They have a great AJAX Tutorial, of course.

    AJAX:Getting Started over at the Mozilla Developer Center

    Found this one through Google; I thought it was nicely laid out and easy to understand. Check out this AJAX PDF Tutorial.

    And a couple more from the Freaks over at AjaxFreaks.com, these are 5 beginner level AJAX tutorials for various functions on a website : Simple Introduction to AJAX and XMLHttpRequest, Creating Live Data with AJAX, Practical Usage For AJAX: Random Image Block, Multiple Dynamic DIV Tags with AJAX, Making a Google Suggest-like application.


    Top 5 Online Education Resources

    This is going to be a listing of my Top 5 Online Education Resources. Among the links I provide, there are thousands upon thousands of links to sites with education resources. As you can see, just the first 3 links I provide contain over 500 links to free learning on the web.

    The first site, which has a great listing of 200 Free Online Classes to Learn Anything, is the Online Education Database website. This is a great resource if you are looking to learn something in Natural Science, Math, Engineering and Computer Science, Language, Arts & Design, Health, Agriculture & Veterinary Medicine, Law & Politics, Social Science, History, Theology, Business & Finance, Family & Education, and more. You have to love the online world when you can get so much education as a free, simple download.

    Also from the Online Education Database, I found another great page containing the Top 100 Open Courseware Projects. They break the list down into sections, so check out whichever you are interested in: Agriculture, Arts, Architecture, Archaeology, Audio & Video, Biology, Botany, Chemistry, Civil Engineering, Economics, Electronic Engineering, General Engineering, Earth Sciences, Geography & Geology, History, Languages & Linguistics, Law, Literature, Mechanical Engineering, Paleontology, Physics, Political Science, Psychology and Social Sciences.

    It seems the Online Education Database is an awesome resource on its own; maybe I should just make the whole post about them =) Another great page I found on their site is 236 Open Courseware Collections, Podcasts, and Videos, where you can find subjects under Archives, Broadcast Learning, Directories & Searches, eBooks & eTexts, Encyclopedias, Open Courseware - University, Open Courseware, Podcasts - University, Podcasts - Other, Research, Videos - Universities, Videos - Other, Video, Directories & Searches.

    Now let’s finally leave OED and move on to some other great resources. This next one is, I believe, one of the first big ones, from the smart people over at MIT. Check out the MIT OpenCourseWare website, where you will find thousands of full courseware downloads covering tonnes of different subjects. Here is a quick link to all 1800 courses they offer, and a quote from their front page:

    MIT is committed to advancing education and discovery through knowledge open to everyone.  OCW shares free lecture notes, exams, and other resources from more than 1800 courses spanning MIT’s entire curriculum.

    Lastly, there is the Open Learning Initiative from Carnegie Mellon. Although not as huge as the others, it is a good resource for introductory college-level courses. A quote from their front page:

    How theory, strategies, and methods from the learning sciences are applied to the design of open learning environments and how the use and evaluation of those environments inform the learning sciences.

    Colours, Colours, and even more Colours

    If you have ever been frustrated trying to find just the right color for a webpage or image you are working on, these next couple of links are for you.

    I found an excellent FREE Advanced HTML Color Picker which is pretty amazing. It lets you add a small icon or text link to your website so that you can see right off the bat whether the color you are picking works. A small unobtrusive box opens up with sliders and such that let you change the colors, and you can see the color changing before your eyes. Great little app, and easy to install into your webpage.

    A couple of other posts from ColourLovers.com that I found useful are: 32 Common Color Names for Easy Reference and Ultimate HTML Color Hex Code List.

    After digging into the site a bit, beyond the posts linked above, I believe ColourLovers.com is an all-round excellent resource for choosing colors. They let their users create Color Palettes and Patterns, and even provide downloads in several different formats (PS, HTML, etc.). If you click on a Color Palette, for example, a page loads showing that particular palette and breaks down each color into its HEX and RGB codes. This site is great: you could find a nice palette and have all the HEX and RGB codes you need to build a website with it.
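Converting between the two notations is simple arithmetic: each pair of hex digits is one channel, from 00 (0) to FF (255). A small Perl sketch, with sub names (hex_to_rgb, rgb_to_hex) of my own choosing rather than from any library:

```perl
use strict;
use warnings;

# "#RRGGBB" (the # is optional) -> list of three decimal values, 0-255.
sub hex_to_rgb {
    my ($color) = @_;
    $color =~ s/^#//;
    my @pairs = $color =~ /^([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})([0-9A-Fa-f]{2})$/
        or die "Not a six-digit hex color: $color\n";
    return map { hex($_) } @pairs;    # hex('FF') is 255
}

# Three decimal values -> "#RRGGBB" with uppercase hex digits.
sub rgb_to_hex {
    my ($red, $green, $blue) = @_;
    return sprintf('#%02X%02X%02X', $red, $green, $blue);
}

print join(',', hex_to_rgb('#FF8000')), "\n";    # 255,128,0
print rgb_to_hex(51, 102, 153), "\n";            # #336699
```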

    I made a small Web Safe Colors page myself a long, long time ago, which I kept on my desk whenever I was developing a website.
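For reference, the 216 "web safe" colors are the combinations in which each of red, green and blue is one of 00, 33, 66, 99, CC or FF (6 × 6 × 6 = 216). A short Perl sketch that generates the whole list:

```perl
use strict;
use warnings;

# Each channel of a web safe color is one of six values (multiples of hex 33).
my @levels = qw(00 33 66 99 CC FF);

my @web_safe;
for my $red (@levels) {
    for my $green (@levels) {
        for my $blue (@levels) {
            push @web_safe, "#$red$green$blue";
        }
    }
}

printf "%d colors, from %s to %s\n", scalar(@web_safe), $web_safe[0], $web_safe[-1];
# 216 colors, from #000000 to #FFFFFF
```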

    Photoshop - 5 Great Text Tutorials

    1. Glass Text Tutorial by Xiao of Gfx-Depot
    2. UFC Blood Sport Logo by TeamTutorials
      1. Blue and Chrome Text
      2. Darkness Logo
      3. Versus Inspired Logo

    Python/PyGame Tutorials for Newbs

    Doing my usual rounds checking for new tutorials, I found a bunch of new ones I hadn’t seen before. Most are very good for Newbs looking to get into game programming in Python/PyGame.

    First off, I found a nice 6-part tutorial on the Keeping it Small and Simple website. The webmaster is a self-professed lazybones, which most of us developers can understand =) He takes it very slow and simple, so you really should not feel rushed at all when working through his tutorials. Step by step, for total beginners. Check out PyGame Tutorial Part 1 - Getting Started :: PyGame Tutorial Part 2 - Drawing Lines :: PyGame Tutorial Part 3 - Mouse Events :: PyGame Tutorial Part 4 - More on Events :: PyGame Tutorial Part 5 - Pixels :: PyGame Tutorial Part 6 - From Pixel to Worm

    Next I found dCafe which lists a bunch of Python Programming Ebooks, a great resource of ebooks and site links.

    Couple others :

    Perl for Newbs

    Recently I had the good fortune of being sent on a Perl course, and I must say I have much respect for this robust scripting language. It is very versatile, and since I know C programming, I could understand and follow the code during the course. After reading a bit more, I found that Larry Wall (Perl’s inventor) derived Perl from C as well as from a couple of Unix utilities such as sed and awk.

    Below are a couple of quotes that explain what Perl is better than I can, so read ahead, then check out the tutorials.

    To borrow a quote from Doug Sheppard (see his beginner tutorial links below) explaining what Perl is:

    Perl is the Swiss Army chainsaw of scripting languages: powerful and adaptable. It was first developed by Larry Wall, a linguist working as a systems administrator for NASA in the late 1980s, as a way to make report processing easier. Since then, it has moved into a large number of roles: automating system administration, acting as glue between different computer systems; and, of course, being one of the most popular languages for CGI programming on the Web.

    And another quote explaining Perl taken from the documentation :

    Perl is a high-level programming language with an eclectic heritage written by Larry Wall and a cast of thousands. It derives from the ubiquitous C programming language and to a lesser extent from sed, awk, the Unix shell, and at least a dozen other tools and languages. Perl’s process, file, and text manipulation facilities make it particularly well-suited for tasks involving quick prototyping, system utilities, software tools, system management tasks, database access, graphical programming, networking, and world wide web programming. These strengths make it especially popular with system administrators and CGI script authors, but mathematicians, geneticists, journalists, and even managers also use Perl. Maybe you should, too.

    Beginners Introduction to Perl Part 1 :: Part 2 :: Part 3 :: Part 4 :: Part 5 :: Part 6 by Doug Sheppard over at Perl.com

    Perl 5 Tutorial (PDF) by Chan Bernard Ki Hong

    Perl and CGI by Abby Buell

    Compilation of Perl Documentation (large file 6+ megs)

    Although it’s not aimed at beginners, you should know that there is a large repository of everything that is Perl over at CPAN (Comprehensive Perl Archive Network). Before you start writing modules and such, head on over to CPAN and see whether someone else has already written what you want, then just install the module and load it into your Perl script with a use statement. There are tonnes and tonnes of excellent modules located here that let you do almost anything. Once you really start getting into Perl, this site is a must.
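As a minimal sketch of loading a module, the example below uses List::Util, which ships with Perl itself, so nothing needs installing. A module downloaded from CPAN (for instance with the cpan command-line client) is loaded with the same kind of use statement once it is installed.

```perl
use strict;
use warnings;

# List::Util is a core module, so it is already installed with Perl.
# qw(max sum) imports just those two functions into this script.
use List::Util qw(max sum);

my @scores = (12, 45, 7, 30);
print 'max: ', max(@scores), ', sum: ', sum(@scores), "\n";    # max: 45, sum: 94
```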
