Selected by Lycos as being in the top 5% of all Web sites. See the Lycos review.
Selected by Point Survey as being in the top 5% of all Web sites.
What is the point of this page?
The Aaron's Personal Newspaper
page has two main reasons for existing. The first is to make it
easier for me to sort through the material that appears daily on the
Web that I find interesting (cool sites, sports, business, general
news, comics on the Web, etc.). The second is that I don't know of
anyone else on the WWW who is doing this: generating a page by
sorting through the content of other pages. In recent
years there has been a lot of discussion about intelligent
agents. The script which generates this page is actually VERY
STUPID, but it does go and search the Web to find material that is of
interest to me, so I think it meets the definition of an "intelligent
agent". I also hope to generate a discussion about what this page is
doing: building a page by grabbing and editing the contents of other
pages. I am doing this but am honestly unsure whether it is a good idea.
And yet, without something like my script generating this newspaper,
one is left with looking at either all of what another person presents
or none of it. Very often, one is only interested in a section of
another person's page, and it is very nice to be able to avoid
the parts you don't care about.
How is your Personal Newspaper Generated?
A script runs daily at 8:45AM Eastern Standard Time to generate the
page. The script is written in Perl and takes advantage of Perl's
regular expression matching and search capabilities. Basically, for
each daily page I am interested in (all URLs are hard-coded, as are the
search patterns unique to each page), the script runs url_get
(many thanks to Jack Lund, author of url_get) to get the HTML
source for each URL and put it in a string. Each string is then
searched for the material I am interested in, sometimes edited
on the fly (strong heading tags reduced, for example), and then the
result is written to the daily newspaper page. For the few images I
use, the images are grabbed, saved to a local file, and the image
file is referenced by the daily newspaper page via an IMG SRC tag. The
script does all the work I would otherwise have to do of waiting for dozens
of other pages to load. As it is, I load only a single local page each morning
when I come in and get all the same material I want (and in a nicely condensed
format at that).
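
The real script is written in Perl and is not published, so the following is only a rough sketch, in Python, of the "search each string, then edit it" steps described above. The function names, the begin/end marker comments, and the sample HTML are all assumptions made up for illustration; they are not taken from the actual script or from any real page.

```python
import re


def extract_section(html, start_pattern, end_pattern):
    """Return the text between the first start_pattern match and the next
    end_pattern match, or None if either pattern is not found."""
    flags = re.IGNORECASE | re.DOTALL
    start = re.search(start_pattern, html, flags)
    if start is None:
        return None
    # Only look for the end marker after the start marker.
    end = re.compile(end_pattern, flags).search(html, start.end())
    if end is None:
        return None
    return html[start.end():end.start()].strip()


def demote_headings(html, levels=2):
    """Reduce heading strength, e.g. <H1> becomes <H3> when levels=2.
    Levels are capped at 6, the weakest HTML heading."""
    def shift(match):
        slash, letter, level, rest = match.groups()
        new_level = min(int(level) + levels, 6)
        return "<{}{}{}{}>".format(slash, letter, new_level, rest)

    return re.sub(r"<(/?)([hH])([1-6])\b([^>]*)>", shift, html)


# Hypothetical page source; a real run would get this string from url_get.
sample = (
    "<HTML><BODY><H1>Cool Site of the Day</H1>"
    "<!-- begin pick --><H2>Today</H2>"
    '<A HREF="http://example.com/">Example</A><!-- end pick -->'
    "<P>Other stuff</P></BODY></HTML>"
)

section = extract_section(sample, r"<!-- begin pick -->", r"<!-- end pick -->")
print(demote_headings(section))
```

Each source page would need its own pair of patterns, which is why the patterns are described above as hard-coded per page.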
Something seems to be wrong with the page. What's going on?
Well, as mentioned above, the script which generates the Newspaper page
is VERY stupid and is easily messed up. If any of the owners of the pages
which the script references change the format of their pages in even minor
ways, the script will not work properly. Email me
about the problem and I will get around to fixing it eventually.
Also, if certain entries are simply missing, the reason is that the url_get script was unable to access the appropriate URL when the script was run this morning. When this happens, certain items don't make it into the newspaper. If it happens for several days in a row on the same item, let me know and I will look into the problem.
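
To illustrate the two failure modes just described (a page that cannot be reached, and a page whose format has changed so the pattern no longer matches), here is a minimal Python sketch. It is not the actual Perl script: the `fetch` helper merely stands in for url_get, and `build_newspaper`, the source tuples, and the output layout are hypothetical.

```python
import re
import urllib.request


def fetch(url, timeout=30):
    """Download a page and return its source as text (stand-in for url_get)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("latin-1")


def build_newspaper(sources, fetcher=fetch):
    """Build newspaper fragments from (title, url, pattern) tuples.

    Returns (fragments, missing), where missing lists the titles that were
    skipped because the page was unreachable or the pattern did not match.
    """
    fragments, missing = [], []
    for title, url, pattern in sources:
        try:
            page = fetcher(url)
        except OSError:
            # urllib's URLError and socket timeouts are OSError subclasses:
            # the page could not be reached this morning, so skip the item.
            missing.append(title)
            continue
        match = re.search(pattern, page, re.IGNORECASE | re.DOTALL)
        if match is None:
            # The page owner probably changed the layout.
            missing.append(title)
            continue
        fragments.append("<H3>{}</H3>\n{}".format(title, match.group(1).strip()))
    return fragments, missing
```

Passing the fetcher in as a parameter is just a convenience for trying the logic without network access. A list of missing titles like this could also be used to notice when the same item fails several days in a row.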
I own one of the pages you are referencing and I have a complaint!
I am referencing and grabbing the content out of other people's pages
without permission. This is not really cool. If you own one or more of the
pages I am referencing and don't wish me to do so, please let me know and I
will immediately remove any references to your page(s).
Email me your complaint. If, on a lesser note,
you don't mind me utilizing your page but would like me to add a stronger
acknowledgment/notice/comment/whatever than I already have, please let me know
what you would like added and I will either add it as you wish or remove the
reference to your page(s).
I have a daily page which you might like to add a reference to.
These are the comments I like. Email me
the URL of your page and I will see about adding your page to the list of
URLs my script accesses for information. This might take some days to do.
Why are some items for Yesterday and some for Today?
Well, there are actually a couple of reasons for this. First of all,
with items like "Cool Sites...", there is no real advantage to being totally
up-to-date and it happens that many "Cool Sites..." authors (including Glenn
Davis himself) don't give a descriptive title for today's chosen site.
Titles are only added when the item is moved to the "Previous ..." list and
I like to have the titles. Other items, like News, are not nearly as useful
unless they are for today. Also, the script runs at 8:45AM EST, so information
cannot be more recent than that.
Can I get a copy of your Perl script?
No, unfortunately I don't give out the source code. If you are interested
in the technical issues behind the script, however, write me and I might give
you a small sample of the script to give you an idea of what it does and how
it does it.
I have a comment for you.
Great! Email me your comment and later, if
there are enough comments to warrant it, I will make a page which incorporates
your comments into a discussion on the merits of this project. Do you think
this is a good or bad idea? Do you want your own personalized daily WWW Paper?
Do you have any suggestions for ways to do this basic idea while still giving
full credit to the original page authors and making it worth their while to
allow people to grab selected content from their pages?