August 2, 2007

Sometimes, when you’re browsing around the Internet, you find a page that blows you away. A poem, a statement, a joke, a diatribe, whatever. You love it and want to save a copy.

Or what about your eCommerce purchase receipt and registration code? Wouldn’t it be ideal to save those with the downloaded software?

You could print it, but then what do you do with all those pages? If you save it as a file, your desktop search tool can find it later much more easily, even if you misfile it or don’t file it at all.

If you go “File, Save”, your browser saves an HTML file and a separate folder containing all the images and other bits from the page. The saved file uses the page title as its file name, which can be very long or too generic, like “home”. The folder gets the same name. I’ve seen backup programs fail on these because the path, with that long folder name plus its contents, is too long to save. But that’s another story. The basic issue is that File, Save is simply messy.

I have a couple of suggestions.

1) Microsoft developed a handy file format called “MHT”. It’s short for MHTML, which is short for MIME HTML. Basically, external files such as images are brought inside the web page file, encoded using MIME. This is like an HTML email message. All the bits of the web page are packaged up inside a single file. It’s much tidier, and it doesn’t break when you lose the folder or rename it to something sensible.
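For the curious, here is a rough sketch of that packaging idea in Python, using only the standard library's email tools. It bundles a page and its images into one MIME "multipart/related" message, which is the general shape of an MHT file. The function name build_mht, the example.com addresses and the fake image bytes are all made up for illustration, and the PNG image type is an assumption. A real browser writes extra headers, so treat this as a picture of the concept rather than a file every browser is guaranteed to open.

```python
from email import message_from_bytes
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mht(html, images, page_url):
    """Pack an HTML page and its images into one multipart/related message.

    html:     the page source as a string
    images:   dict mapping each image URL to its raw bytes (assumed PNG here)
    page_url: the address the page was saved from
    """
    archive = MIMEMultipart("related", type="text/html")
    archive["Subject"] = "Saved page"

    # The page itself is the first part.
    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = page_url
    archive.attach(page)

    # Each image becomes another part, base64-encoded just like an
    # email attachment, and labelled with the URL the page refers to.
    for url, data in images.items():
        part = MIMEImage(data, _subtype="png")
        part["Content-Location"] = url
        archive.attach(part)

    return archive.as_bytes()


if __name__ == "__main__":
    # Hypothetical page and image data, for illustration only.
    raw = build_mht(
        "<html><body><img src='http://example.com/a.png'></body></html>",
        {"http://example.com/a.png": b"\x89PNG fake image bytes"},
        "http://example.com/",
    )
    # Read the single file back and list what is packed inside it.
    for part in message_from_bytes(raw).get_payload():
        print(part.get_content_type(), part["Content-Location"])
```

Everything ends up in one stream of bytes, so there is no separate folder to lose or rename.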

To use MHT:

In IE, go to “File, Save As”, then choose the “Web Archive” format from the File Types pick list at the bottom.

In Opera, go to File, Save As and choose Web Archive, as above.

In Firefox, you’ll need the “Mozilla Archive Format” add-on. Then use File, Save As, “MAF MHT format”.

Safari can save Web Archives but can’t open them.

By default, MHT files will open in IE on Windows. They can be opened in Firefox and Opera using File, Open and changing file type to All or MHT.

However, a few heavily scripted pages will not save correctly as an archive. For example, eCommerce pages showing your purchase receipt and registration code may not save properly (the browser will tell you). That’s when Plan B comes in:

2) Sometimes you want to save the page, but more as a document for later reference, like that registration code. Archive format doesn’t work because of active scripting. Here’s where printing the page to a file can help. That’s right: you use the print function of the browser to save a file. This is how “PDF” (Portable Document Format) files are typically created.

Adobe Acrobat has been creating PDF files for many years. It’s a common format for eBooks, manuals and so forth. But the Acrobat program that creates them is expensive. It can be used for sending high-end spreads to the printers, creating forms, gathering Internet-based comments and feedback on a document proposal, marking up construction documents, and much more. You don’t need all that for this. You just need a PDF printer, and you’re in luck: PostScript and PDF production are standards created by Adobe, and older versions are open.

All you need is a free PDF printer. It is also handy when you don’t have an actual page printer available. Say you are out in the field with a notebook and need to capture some material for later printing. Here is the answer.

Which free PDF printer is best varies over time. A couple of good ones have recently added advertising to their free tools. These days, I suggest Primo PDF, a powerful but easy-to-use free “printer” with no ads.

To use it, just download and install. With the web page (or Word document, or whatever) you want to save open, go to File, Print and choose Primo PDF as the printer. It will ask you where you want to save the file and what to call it, then produce the PDF file. Done. When you open the file, it will open in the free Acrobat Reader. You can use the text select tool there to, for example, copy and paste that complex registration code with no errors…

