Selected by Lycos as being in the top 5% of all Web sites.
Selected by Point Survey as being in the top 5% of all Web sites.
What is the point of this page?
The Aaron's Personal Newspaper page has two main reasons for existing. The first is to make it easier for me to sort through the material that appears daily on the Web that I find interesting (cool sites, sports, business, general news, comics on the Web, etc.). The second is that I don't know of anyone else on the WWW who is doing this: generating a page by sorting through the content of other pages. In recent years there has been a lot of discussion about intelligent agents. The script which generates this page is actually VERY STUPID, but it does go and search the Web to find material that is of interest to me, so I think it meets the definition of an "intelligent agent". I also hope to generate a discussion about what this page is doing: building a page by grabbing and editing the contents of other pages. I am doing this but am honestly unsure whether it is a good idea. And yet, without something like my script generating this newspaper, one is left with looking at either all of what another person presents or none of it. Very often one is only interested in a section of another person's page, and it is very nice to be able to avoid the parts you don't care about.
How is your Personal Newspaper generated?
A script runs daily at 8:45AM Eastern Standard Time to generate the page. The script is written in Perl and takes advantage of Perl's regular expression matching and search capabilities. Basically, for each daily page I am interested in (all URLs are hard-coded, as are the search patterns unique to each page), the script runs url_get (many thanks to Jack Lund, author of url_get) to get the source HTML code for that URL and put it in a string. Each string is then searched for the material I am interested in and sometimes edited on the fly (large heading tags reduced, for example), and the result is written to the daily newspaper page. For the few images I use, the images are grabbed and written to a local file, and the image file is referenced by the daily newspaper page via an IMG SRC tag. The script does all the waiting I would otherwise have to do while dozens of other pages load. As it is, I load only a single local page each morning when I come in and get all the same material I want (and in a nicely condensed format at that).
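The fetch, search, edit, and output steps described above can be sketched roughly as follows. This is only an illustration, not the actual script: the real one is written in Perl and fetches live pages with url_get, while this sketch uses Python and a made-up sample page so it can run without a network. The names SAMPLE_HTML, PATTERN, extract_section, demote_headings, and build_newspaper are all hypothetical, as is the choice to push headings down by two levels.

```python
import re

# Hypothetical stand-in for the HTML that url_get would return for one page.
SAMPLE_HTML = """<html><body>
<h1>Daily News</h1>
<!-- start headlines -->
<h2>Top Story</h2>
<p>Something happened today.</p>
<!-- end headlines -->
<p>Footer nobody wants.</p>
</body></html>"""

# Hypothetical hard-coded search pattern, unique to the sample page above.
PATTERN = r"<!-- start headlines -->(.*?)<!-- end headlines -->"


def extract_section(html, pattern):
    """Return the first capture group of `pattern`, or None if nothing matches."""
    match = re.search(pattern, html, re.DOTALL | re.IGNORECASE)
    return match.group(1).strip() if match else None


def demote_headings(fragment, levels=2):
    """Shift every <hN> / </hN> tag down by `levels`, never going past h6."""
    def shift(m):
        new_level = min(int(m.group(2)) + levels, 6)
        return f"<{m.group(1)}h{new_level}{m.group(3)}>"
    return re.sub(r"<(/?)h([1-6])([^>]*)>", shift, fragment, flags=re.IGNORECASE)


def build_newspaper(sources):
    """Combine the wanted section of each (title, html, pattern) into one page."""
    parts = ["<html><body>"]
    for title, html, pattern in sources:
        section = extract_section(html, pattern)
        if section is None:
            continue  # page did not match its pattern; leave the item out
        parts.append(f"<h3>{title}</h3>")
        parts.append(demote_headings(section))
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_newspaper([("News", SAMPLE_HTML, PATTERN)]))
```

Each source carries its own pattern because, as noted above, the search patterns are unique to each page; nothing in the sketch tries to be general.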
Something seems to be wrong with the page. What's going on?
Well, as mentioned above, the script which generates the Newspaper page is VERY stupid and is easily messed up. If any of the owners of the pages which the script references change the format of their pages in even minor ways, the script will not work properly. Email me about the problem and I will get around to fixing it eventually.
Also, if certain entries are simply missing, the reason is that the url_get script was unable to access the appropriate URL when the script was run this morning. When this happens, certain items don't make it into the newspaper. If it happens for several days in a row on the same item, let me know and I will look into the problem.
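The two failure modes described above (a page that changed its format, and a URL that could not be reached) can be illustrated with a small sketch. It is written in Python rather than the Perl of the real script, and everything in it is an assumption made for illustration: the gather_items function, the fake_fetch stand-in for url_get, and the example.com URLs and patterns are hypothetical.

```python
import re


def gather_items(sources, fetch):
    """Try every source and return (found, missing).

    sources: list of (name, url, pattern) tuples, all hard-coded.
    fetch:   callable that takes a URL and returns HTML text, raising
             OSError when the page cannot be reached (a stand-in for url_get).
    """
    found, missing = {}, []
    for name, url, pattern in sources:
        try:
            html = fetch(url)
        except OSError as err:
            # URL unreachable this morning: the item is simply left out.
            missing.append((name, f"fetch failed: {err}"))
            continue
        match = re.search(pattern, html, re.DOTALL)
        if match is None:
            # The page loaded, but its owner probably changed the format.
            missing.append((name, "pattern did not match"))
            continue
        found[name] = match.group(1).strip()
    return found, missing


def fake_fetch(url):
    """Hypothetical fetcher backed by canned pages instead of the network."""
    pages = {
        "http://example.com/news": "<b>Headline:</b> Rain expected<br>",
        "http://example.com/comic": "<div>layout changed</div>",
    }
    if url not in pages:
        raise OSError("host unreachable")
    return pages[url]


SOURCES = [
    ("news", "http://example.com/news", r"<b>Headline:</b>(.*?)<br>"),
    ("comic", "http://example.com/comic", r'<img src="(.*?)">'),
    ("sports", "http://example.com/sports", r"<h2>(.*?)</h2>"),
]

found, missing = gather_items(SOURCES, fake_fetch)
```

Here only the "news" item makes it into the paper; "comic" is dropped because its pattern no longer matches, and "sports" because the fetch failed. Keeping a list of what went missing, and why, would make it easier to tell a one-day outage from a page whose format has changed for good.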
I own one of the pages you are referencing and I have a complaint!
I am referencing and grabbing the content out of other people's pages without permission. This is not really cool. If you own one or more of the pages I am referencing and don't wish me to do so, please let me know and I will immediately remove any references to your page(s). Email me your complaint. If, on a lesser note, you don't mind me using your page but would like me to add a stronger acknowledgment/notice/comment/whatever than I already have, please let me know what you would like added and I will either add it as you wish or remove the reference to your page(s).
I have a daily page which you might like to add a reference to.
These are the comments I like. Email me the URL of your page and I will see about adding your page to the list of URLs my script accesses for information. This might take some days to do.
Why are some items for Yesterday and some for Today?
Well, there are actually a couple of reasons for this. First of all, with items like "Cool Sites...", there is no real advantage to being totally up-to-date, and it happens that many "Cool Sites..." authors (including Glenn Davis himself) don't give a descriptive title for today's chosen site. Titles are only added when the item is moved to the "Previous ..." list, and I like to have the titles. Other items, like News, are not nearly as useful unless they are for today. Also, the script runs at 8:45AM EST, so information cannot be more recent than that.
Can I get a copy of your Perl script?
No, unfortunately I don't give out the source code. If you are interested in the technical issues behind the script, however, write me and I might give you a small sample of the script to give you an idea of what it does and how it does it.
I have a comment for you.
Great! Email me your comment and later, if there are enough comments to warrant it, I will make a page which incorporates your comments into a discussion on the merits of this project. Do you think this is a good or bad idea? Do you want your own personalized daily WWW Paper? Do you have any suggestions for ways to do this basic idea while still giving full credit to the original page authors and making it worth their while to allow people to grab selected content from their pages?