
OpenSource BigBlogZoo & SearchSaver

We are happy to release the BigBlogZoo and SearchSaver to the open source community. The BigBlogZoo is a semantic news aggregator. Imagine a news aggregator with 50,000 preloaded feeds that have been categorized using the DMOZ RDF categorization schema. It also does some other nifty things with the categorization schema, namely categorized crawling and reaggregation.

BigBlogZoo Downloads

Alternatively, you can try here:

BigBlogZoo No JRE
BigBlogZoo

Help

BigBlogZoo Help

SearchSaver is a nice little tool that saves search results as XML or PDF. When you are doing research it is often useful to keep your results, and SearchSaver re-aggregates them into either a new feed or a PDF. It is also handy for finding feeds that are related to each other and for quickly building a set of interesting feeds. You can get more info here.

Posted by admin on 05-18-2007 at 05:05 pm
Posted in General with 0 Comments

Timeline Aggregator

The brainy geezers at MIT have created a DHTML/JavaScript timeline widget. You can see an example of it in use here: Aerospace. As you can see, it lets you visualize events on a scrollable timeline.

I thought it would be interesting to see if I could quickly whip up a servlet that converts an arbitrary feed into the XML event format the timeline widget expects.

I am going to presume that you can get a simple servlet like this working (i.e. changes to web.xml, etc.). If I am wrong, leave a comment and I will try to help.

Here is the Servlet:

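import java.io.IOException;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class TimeProviderServlet extends HttpServlet
{
    private static final long serialVersionUID = 0;

    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException
    {
        try
        {
            // The feed address arrives as the "url" query parameter.
            // getParameter() has already URL-decoded it, so it is used as is.
            String feedURL = request.getParameter("url");
            if (feedURL == null)
            {
                response.sendError(HttpServletResponse.SC_BAD_REQUEST,
                        "Missing url parameter");
                return;
            }
            TimeFeedXmlAnalyzer fxa = new TimeFeedXmlAnalyzer(feedURL);
            TimeFeedData data = fxa.fetchFeed();

            // Turn every feed entry into one timeline event.
            TimeLiner timeLiner = new TimeLiner();
            TimeFeedContent[] entries = data.entries;

            for (int i = 0; i < entries.length; i++)
            {
                timeLiner.addEvent(entries[i].title, entries[i].content,
                        entries[i].publishedDate, entries[i].link);
            }

            response.setContentType("text/xml; charset=UTF-8");
            PrintWriter out = response.getWriter();
            out.println(timeLiner.out());
            out.close();
        }
        catch (Exception ex)
        {
            // Crude error reporting: dump the stack trace into the response.
            PrintWriter out = response.getWriter();
            ex.printStackTrace(out);
            out.close();
        }
    }
}
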
Here is the feed fetcher:

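import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.methods.GetMethod;

import com.sun.syndication.io.FeedException;
import com.sun.syndication.io.SyndFeedInput;
import com.sun.syndication.io.XmlReader;

public class TimeFeedXmlAnalyzer
{
    private final HttpClient httpClient;
    private final HttpMethod clientMethod;

    public TimeFeedXmlAnalyzer(String source) throws MalformedURLException
    {
        // Constructed only to validate the address; throws if it is malformed.
        new URL(source);
        this.httpClient = new HttpClient();
        this.clientMethod = new GetMethod(source);
        clientMethod.setFollowRedirects(true);
    }

    public TimeFeedData fetchFeed() throws IOException, FeedException,
            IllegalArgumentException
    {
        try
        {
            this.httpClient.executeMethod(this.clientMethod);
            final SyndFeedInput input = new SyndFeedInput();
            // XmlReader works out the character encoding of the feed.
            final XmlReader xmlreader = new XmlReader(this.clientMethod
                    .getResponseBodyAsStream());

            return new TimeFeedData(input.build(xmlreader));
        }
        finally
        {
            // Release the connection even if fetching or parsing fails.
            this.clientMethod.releaseConnection();
        }
    }

    public void finalize() throws Throwable
    {
        this.clientMethod.releaseConnection();
        super.finalize();
    }
}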

Here is the wrapper for the Feed:

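import java.util.List;

import com.sun.syndication.feed.synd.SyndEntry;
import com.sun.syndication.feed.synd.SyndFeed;

public final class TimeFeedData
{
    public String title = null;
    public String link = null;
    public String description = null;
    public final TimeFeedContent[] entries;

    public TimeFeedData(final SyndFeed feed)
    {
        this.title = feed.getTitle();
        this.link = feed.getLink();
        this.description = feed.getDescription();

        List entryList = feed.getEntries();
        this.entries = new TimeFeedContent[entryList.size()];
        SyndEntry temp;
        int len = entryList.size();
        // The listing was cut off after "i"; the loop below is the presumed
        // remainder, wrapping each entry as the servlet expects.
        for (int i = 0; i < len; i++)
        {
            temp = (SyndEntry) entryList.get(i);
            this.entries[i] = new TimeFeedContent(temp);
        }
    }
}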

Wrapper for the content:

import java.util.Date;

import com.sun.syndication.feed.synd.SyndEntry;

public final class TimeFeedContent
{
    public String title = null;
    public String link = null;
    public String content = null;
    public Date publishedDate = null;

    public TimeFeedContent(final SyndEntry entry)
    {
        this.title = entry.getTitle();
        this.link = entry.getLink();

        // Not every entry carries a description.
        if (entry.getDescription() != null)
        {
            this.content = entry.getDescription().getValue();
        }
        this.publishedDate = entry.getPublishedDate();
    }
}

And lastly the bit that converts:

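import java.io.StringWriter;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

import org.apache.commons.lang.StringEscapeUtils;
import org.jdom.Attribute;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.IllegalDataException;
import org.jdom.output.XMLOutputter;

public class TimeLiner
{
    private static final String EVENT = "event";
    private static final String TITLE = "title";
    private static final String LINK = "link";
    private static final String START = "start";
    private static final String GMT = " GMT";
    // HH is the 24-hour clock; hh would print 4 pm as 04 with no am/pm marker.
    private static final String MMMM_DD_YYYY_HH_MM_SS = "MMMM dd yyyy HH:mm:ss";

    private Element root = new Element("data");
    private Document doc = new Document(root);

    public void addEvent(String title, String description, Date date, String uri)
    {
        // Entries without a publication date are placed at the current time.
        if (date == null)
        {
            date = new Date();
        }
        String format = MMMM_DD_YYYY_HH_MM_SS;
        SimpleDateFormat sdf = new SimpleDateFormat(format, Locale.UK);
        // Format in GMT so the appended " GMT" suffix is actually true.
        sdf.setTimeZone(TimeZone.getTimeZone("GMT"));

        Element event = new Element(EVENT);
        Attribute start = new Attribute(START, sdf.format(date) + GMT);
        event.setAttribute(start);
        Attribute titleatt = new Attribute(TITLE, title);
        Attribute linkatt = new Attribute(LINK, uri);
        event.setAttribute(titleatt);
        event.setAttribute(linkatt);
        try
        {
            event.setText(StringEscapeUtils.escapeXml(description));
        }
        catch (IllegalDataException e)
        {
            // Description contains characters that are illegal in XML;
            // leave the event body empty.
        }
        root.addContent(event);
    }

    public String out()
    {
        try
        {
            StringWriter sw = new StringWriter();
            XMLOutputter outputter = new XMLOutputter();
            outputter.output(doc, sw);
            sw.close();
            return sw.toString();
        }
        catch (java.io.IOException e)
        {
            return "ERROR";
        }
    }
}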
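
The start attribute is the part most likely to trip you up, because the text is labelled GMT regardless of the server's own time zone. Here is a minimal sketch, using only the JDK, of one way to format a date so that the GMT label is accurate and the hour is on a 24-hour clock. The class name TimelineDates and the method startAttribute are made up for illustration and are not part of the code above.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

final class TimelineDates
{
    // Hypothetical helper: renders a date as "<month> dd yyyy HH:mm:ss GMT".
    static String startAttribute(Date date)
    {
        // HH is the 24-hour clock; Locale.UK gives English month names.
        SimpleDateFormat sdf = new SimpleDateFormat("MMMM dd yyyy HH:mm:ss", Locale.UK);
        // Format in GMT so that the appended suffix matches the printed time.
        sdf.setTimeZone(TimeZone.getTimeZone("GMT"));
        return sdf.format(date) + " GMT";
    }

    public static void main(String[] args)
    {
        // 1179158700000 ms after the epoch is 14 May 2007, 16:05:00 GMT.
        System.out.println(startAttribute(new Date(1179158700000L)));
    }
}
```

A SimpleDateFormat is not thread-safe, which is why this sketch creates a new one per call instead of sharing a static instance across servlet requests.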

To get this running you are going to need JDOM, ROME, Commons HttpClient, Commons Logging, Commons Lang, and Commons Codec.

To get it to work, simply pass the function Timeline.loadXML the URL of your servlet, with a parameter on that URL pointing to the feed, for example: Timeline Servlet

In other words, this is the URL of the servlet: http://www.syndicatescape.com/timeproviderservlet

Add ?url=YOUR_FEED to the end of that URL and you get XML.
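
One thing to watch: YOUR_FEED is itself a URL, so it should be URL-encoded before it is appended, otherwise any ? or & inside the feed address will be read as part of the servlet's own query string. Here is a minimal sketch of building the address in Java. The class name TimelineUrls, the method eventSourceUrl, and the example.com feed address are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class TimelineUrls
{
    // Hypothetical helper: appends the encoded feed address as the "url" parameter.
    static String eventSourceUrl(String servletUrl, String feedUrl)
    {
        try
        {
            return servletUrl + "?url=" + URLEncoder.encode(feedUrl, "UTF-8");
        }
        catch (UnsupportedEncodingException e)
        {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 is not available", e);
        }
    }

    public static void main(String[] args)
    {
        // The feed address here is a placeholder, not a real feed.
        System.out.println(eventSourceUrl(
                "http://www.syndicatescape.com/timeproviderservlet",
                "http://example.com/feed.xml?cat=news&page=2"));
    }
}
```

The resulting string is what you would hand to Timeline.loadXML.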

It could have been even easier; I am not sure those feed and content wrappers are really necessary.

You can get more info about the timelines here: SIMILE.

Posted by admin on 05-14-2007 at 04:05 pm
Posted in General with 0 Comments

Reaggregation

What does it mean to reaggregate? It means to re-form elements into a whole. It happens every day on the web, and not always lawfully. Technologies such as RSS (Really Simple Syndication) facilitate reaggregation and by their very nature are considered lawful. The XML structure of RSS is designed to summarize (not reproduce) web content, and therefore reproducing a feed is considered fair use. Although people often stuff an entire article into the description field of an RSS feed, reaggregating such a feed can still be considered fair use, because the person who created the feed knowingly violated the contract of a feed. In other words, if you don't want people to come along and reaggregate your entire article, use the description field correctly.

Posted by admin on 05-10-2007 at 06:05 pm
Posted in General with 0 Comments