
Richard Poynder interviews Jordan Hatcher

Jonathan Gray - October 19, 2010 in Interviews, Legal, OKF, Open Data, Open Data Commons, Open Definition, Open Government Data, Open Knowledge Definition, Public Domain, WG Open Licensing

Open Access journalist extraordinaire Richard Poynder recently interviewed the Open Knowledge Foundation’s Jordan Hatcher about data licensing, the public domain, and lots more. An excerpt is reproduced below. The full version is available on Richard’s website.

Over the past twenty years or so we have seen a rising tide of alternative copyright licences emerge — for software, music and most types of content. These include the Berkeley Software Distribution (BSD) licence, the General Public Licence (GPL), and the range of licences devised by Creative Commons (CC). More recently a number of open licences and “dedications” have also been developed to help people make data more freely available.

The various new licences have given rise to terms like “copyleft” and “libre” licensing, and to a growing social and political movement whose ultimate end-point remains to be established.

Why have these licences been developed? How do they differ from traditional copyright licences? And can we expect them to help or hinder reform of the traditional copyright system — which many now believe has got out of control? I discussed these and other questions in a recent email interview with Jordan Hatcher.

A UK-based Texas lawyer specialising in IT and intellectual property law, Jordan Hatcher is co-founder of OpenDataCommons.org, a board member of the Open Knowledge Foundation (OKF), and blogs under the name opencontentlawyer.


Jordan Hatcher

Big question

RP: Can you begin by saying something about yourself and your experience in the IP/copyright field?

JH: I’m a Texas lawyer living in the UK and focusing on IP and IT law. I concentrate on practical solutions and legal issues centred on the intersection of law and technology. While I like the entire field of IP, international IP and copyright are my favourite areas.

As to more formal qualifications, I have a BA in Radio/TV/Film, a JD in Law, and an LLM in Innovation, Technology and the Law. I’ve been on the team that helped bring Creative Commons licences to Scotland and have led, or been a team member on, a number of studies looking at open content licences and their use within universities and the cultural heritage sector.

I was formerly a researcher at the University of Edinburgh in IP/IT, and for the past 2.5 years have been providing IP strategy and IP due diligence services with a leading IP strategy consultancy in London.

I’m also the co-founder and principal legal drafter behind Open Data Commons, a project to provide legal tools for open data, and the Chair of the Advisory Council for the Open Definition. I sit on the board for the Open Knowledge Foundation.

More detail than you could ask for is available on my website here, and on my LinkedIn page here.

RP: It might also help if you reminded us what role copyright is supposed to play in society, how that role has changed over time (assuming that you feel it has) and whether you think it plays the role that society assigned to it successfully today.

JH: Wow, that’s a big question, and one whose answer has changed quite a bit since the origin of copyright. As with most law, I take a utilitarian / legal realist view that the law is there to encourage a set of behaviours.

Copyright law is often described as being created to encourage more production and dissemination of works, and, like any law, it’s imperfect in its execution.

I think what’s most interesting about copyright history is the technology side (without trying to sound like a technological determinist!). As new and potentially disruptive technologies have come along and changed the balance — from the printing press all the way to digital technology — the way we have reacted has been fairly consistent: some try to hang on to the old model as others eagerly adopt the new model.

For those interested in learning more about copyright’s history, I highly recommend the work of Ronan Deazley, and suggest people look at the first sections in Patry on Copyright. They could also usefully read Patry’s Moral Panics and the Copyright Wars. Additionally, there are many historical materials on copyright available at the homepage for a specific research project on the topic here.

Three tranches

RP: In the past twenty years or so we have seen a number of alternative approaches to licensing content develop — most notably through the General Public Licence and the set of licences developed by the Creative Commons. Why do you think these licences have emerged, and what are the implications of their emergence in your view?

JH: I see free and open licence development as happening within three tranches, each related to a specific area of use.

1. FOSS for software. Alongside the GPL, there have been a number of licences developed since the birth of the movement (and continuing to today), all aimed at software. These licences work best for software and tend to fall over when applied to other areas.

2. Open licences and Public licences for content. These are aimed at content, such as video, images, music, and so on. Creative Commons is certainly the most popular, but definitely not the first. The birth of CC does however represent a watershed moment in thinking about open licensing for content.

I distinguish open licences from public licences here, mostly because Creative Commons is so popular. “Open” has so many meanings to people (as does “free”) that it is critical to define from a legal perspective what is meant when one says “open”. The Open Knowledge Definition does this, and states that “open” means users have the right to use, reuse, and redistribute the content with very few restrictions — only attribution and share-alike are allowed restrictions, and commercial use must specifically be allowed.

The Open Definition means that only two out of the main six CC licences are open content licences — CC-BY and CC-BY-SA. The other four involve either the No Derivatives (ND) restriction (thus prohibiting reuse) or the Non Commercial (NC) restriction. Those four are what I refer to as “public licences”; in other words, they are licences provided for use by the general public.

Of course CC’s public domain tools, such as CC0, all meet the Open Definition as well because they have no restrictions on use, reuse, and redistribution.

I wrote about this in a bit more detail recently on my blog.

3. Open Data Licences. Databases are different from content and software — they are a little like both in what users want to do with them and in how licensors want to protect them, but they differ in both the legal rights that apply and in how database creators want to use open data licences.

As a result, there’s a need for specific open data licences, which is why we founded Open Data Commons. Today we have three tools available. It’s a new area of open licensing and we’re all still trying to work out all the questions and implications.

Open data

RP: As you say, data needs to be treated differently from other types of content, and for this reason a number of specific licences have been developed — including the Public Domain Dedication and Licence (PDDL), the Public Domain Dedication Certificate (PDDC) and Creative Commons Zero. Can you explain how these licences approach the issue of licensing data in an open way?

JH: The three you’ve mentioned are all aimed at placing work into the public domain. The public domain has a very specific meaning in a legal context: It means that there are no copyright or other IP rights over the work. This is the most open/free approach as the aim is to eliminate any restrictions from an IP perspective.

There are some rights that can be hard to eliminate, and so of course patents may still be an issue depending on the context (but perhaps that’s a conversation for another time).

In addition to these tools, we’ve created two additional specific tools for openly licensing databases — the ODbL and the ODC-Attribution licences.

RP: Can you say something about these tools, and what they bring to the party?

JH: All three are tools to help increase the public domain and make it better known and more accessible.

There’s some really exciting stuff going on with the public domain right now, including with PD calculators — tools to automatically determine whether a work is in the public domain. The great thing about work in the public domain is that it is completely legally interoperable, as it eliminates copyright restrictions.

See the rest of the interview on Open and Shut

Related posts:

  1. Interview with Jordan Hatcher on legal tools for open data
  2. Jordan Hatcher talk on Open Data Licensing at iSemantics
  3. Open Licenses vs Public Licenses

Workshop on Open Bibliographic Data and the Public Domain, 7th October 2010

Jonathan Gray - October 5, 2010 in Open Data, Public Domain Works, WG Open Bibliographic Data, WG Public Domain

A brief reminder that our workshop on Open Bibliographic Data and the Public Domain (which we blogged about a few months ago) is taking place on Thursday 7th October. Details are as follows:

Here’s the blurb:

This one day workshop will focus on open bibliographic data and the public domain. In particular it will address questions like:

  • What is the role of freely reusable metadata about works in calculating which works are in the public domain in different jurisdictions?
  • How can we use existing sources of open data to automate the calculation of which works are in the public domain?
  • What data sharing policies in libraries and cultural heritage institutions would support automated calculation of copyright status?
  • How can we connect databases of information about public domain works with digital copies of public domain works from different sources (Wikipedia, Europeana, Project Gutenberg, …)?
  • How can we map existing sources of public domain works in different countries/languages more effectively?

The day will be very much focused on productive discussion and ‘getting things done’ — rather than presentations. Sessions will include policy discussions about public domain calculation under the auspices of Communia (a European thematic network on the digital public domain), as well as hands-on coding sessions run by the Open Knowledge Foundation. The workshop is a satellite event to the 3rd Free Culture Research Conference on 8-9th October.

If you would like to participate, you can register at:

If you have ideas for things you’d like to discuss, please add them at:

To take part in discussion on these topics before and after this event, please join:

Related posts:

  1. Workshop on Open Bibliographic Data and the Public Domain
  2. Notes from Workshop on Open Bibliographic Data and the Public Domain
  3. Open bibliographic data promotes knowledge of the public domain

Workshop on Open Bibliographic Data and the Public Domain

Jonathan Gray - August 17, 2010 in Bibliographica, Events, OKF Projects, Open Data, Public Domain, Public Domain Works, WG Open Bibliographic Data, WG Public Domain, Working Groups

We are pleased to announce a one day workshop on Open Bibliographic Data and the Public Domain. Details are as follows:

Here’s the blurb:

This one day workshop will focus on open bibliographic data and the public domain. In particular it will address questions like:

  • What is the role of freely reusable metadata about works in calculating which works are in the public domain in different jurisdictions?
  • How can we use existing sources of open data to automate the calculation of which works are in the public domain?
  • What data sharing policies in libraries and cultural heritage institutions would support automated calculation of copyright status?
  • How can we connect databases of information about public domain works with digital copies of public domain works from different sources (Wikipedia, Europeana, Project Gutenberg, …)?
  • How can we map existing sources of public domain works in different countries/languages more effectively?

The day will be very much focused on productive discussion and ‘getting things done’ — rather than presentations. Sessions will include policy discussions about public domain calculation under the auspices of Communia (a European thematic network on the digital public domain), as well as hands-on coding sessions run by the Open Knowledge Foundation. The workshop is a satellite event to the 3rd Free Culture Research Conference on 8-9th October.

If you would like to participate, you can register at:

If you have ideas for things you’d like to discuss, please add them at:

To take part in discussion on these topics before and after this event, please join:

Related posts:

  1. Workshop on Open Bibliographic Data and the Public Domain, 7th October 2010
  2. Notes from Workshop on Open Bibliographic Data and the Public Domain
  3. Open bibliographic data promotes knowledge of the public domain

Opening up university infrastructure data

Jonathan Gray - July 20, 2010 in External, Guest post, Open Data, Public Domain

The following guest post is from Christopher Gutteridge, Web Projects Manager at the School of Electronics and Computer Science (ECS), University of Southampton, and member of the OKF’s Working Group on Open Bibliographic Data.

We announced on Tuesday (13th July 2010) that all the RDF made available about our school would be placed in the public domain.

Around five years ago we (the School of Electronics and Computer Science, University of Southampton) started a project to publish our infrastructure data as open data. This included staff, teaching modules, research groups, seminars and projects. This year we have been overhauling the site based on what we’ve learned in the interim. We made plenty of mistakes, but that’s fine; it’s what being a university is all about. We’ll continue to blog about what we’ve learned.

We have formally added a “CC0” public domain license to all our infrastructure RDF data, such as staff contact details, research groups and publication lists. One reason few people took an interest in working with our data is that we didn’t explicitly say what was and wasn’t OK, and people are disinclined to build anything on top of data which they have no explicit permission to use. Most people instinctively want to preserve some rights over their data, but we can see no value in restricting what this data can be used for. Restricting commercial use is not helpful, and restricting derivative works of data is nonsensical!

Here’s an example: someone is building a website to list academics by their research area, and they use our data to add our staff to it. How does it benefit us to force them to attribute our data to us? They are already assisting us by making our staff and pages more discoverable, so why would we want to impose a restriction? If they want to build a service that compiles and republishes data, they would need to track every license, and that’s going to be a bother of a similar scale to the original BSD Clause 3.

Our attitude is that we’d like attribution where convenient, but not if it’s a bother. Rather than making “must attribute” a legal requirement, we say “please attribute”. It’s our hope that this step will help other similar organisations take the same step with the confidence of not being the first to do so.

The CC0 license does not currently extend to our research publication documents (just the metadata) or to research data. It is my personal view that research funders should make it a requirement of funding that a project publishes all data produced, in open formats, along with any custom software used to produce or process it, including the source and (ideally) the complete CVS/Git/SVN history. This is beyond the scope of what we’ve done recently in ECS, but the University is taking the management of research data very seriously and it is my hope that this will result in more openness.

Another mistake we have learned from is that we made a huge effort to model and describe our data as semantically accurately as possible. Nobody cares enough about our data to explain to their tool what an “ECS Person” is. We’re in the process of adding in the more generic schemas like FOAF, SIOC and so on. The awesome thing about the RDF format is that we can do this gently and incrementally. So now everybody is both (is rdf:type of) an ecs:Person and a foaf:Person (example). The process of making this more generic will continue for a while, and we may eventually retire most of the extraneous ecs:xyz site-specific relationships except where no better ones exist.
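As a minimal sketch of what that dual typing looks like (using Python and the rdflib library, with a made-up namespace and person URI rather than our real ones), a single resource simply carries both the site-specific and the generic type:

from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import FOAF, RDF

# Hypothetical namespace and URI, for illustration only.
ECS = Namespace("http://id.example.ecs.soton.ac.uk/ns/")
person = URIRef("http://id.example.ecs.soton.ac.uk/person/1234")

g = Graph()
g.bind("ecs", ECS)
g.bind("foaf", FOAF)

g.add((person, RDF.type, ECS.Person))   # the original site-specific type
g.add((person, RDF.type, FOAF.Person))  # the generic type added alongside it
g.add((person, FOAF.name, Literal("Example Person")))

print(g.serialize(format="turtle"))

Generic tools that understand foaf:Person can now consume the data without knowing anything about our local vocabulary, while the site-specific type is still there for our own systems.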

The key turning point for us was when we started trying to use this data to solve our own problems. We frequently build websites for projects and research groups, and these want views on staff, projects, publications etc. Currently this is done with an SQL connection to the database, and we hope the postgrad running the site doesn’t make any cock-ups which result in data being made public which should not have been. We’ve never had any (major) problems with this approach, but we think that loading all our RDF data into a SPARQL server (like an SQL server, but for RDF data and accessed over HTTP) is a better approach. The SPARQL server only contains information we are making public, so the risk of leaks (e.g. staff who’ve not given formal permission to appear on our website) is minimised. We’ve taken our first faltering steps and discovered immediately that our data sucked (well, wasn’t as useful as we’d imagined). We’d modelled it with an eye to accuracy, not usefulness, believing that if you build it they will come. The process of “eating our own dogfood” rapidly revealed many typos and poor design decisions which had not come to light in the previous 4 or 5 years!
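For readers unfamiliar with SPARQL endpoints, the sketch below shows roughly what querying such a server over HTTP looks like from a consumer’s point of view. The endpoint URL is hypothetical, and the query assumes the generic FOAF typing described above (Python, using the SPARQLWrapper package):

from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoint URL; substitute the real public SPARQL server.
sparql = SPARQLWrapper("http://sparql.example.ac.uk/")
sparql.setQuery("""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?person ?name WHERE {
        ?person a foaf:Person ;
                foaf:name ?name .
    } LIMIT 10
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["person"]["value"], row["name"]["value"])

Because the endpoint only ever holds the public subset of our data, anyone can run queries like this without us having to vet each use.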

Currently we’re also thinking about what the best “boilerplate” data is to put in each document. Again, we’re now thinking about how to make it useful to other people rather than how to accurately model things.

There’s no definitive guidance on this. I’m interested to hear from people who wish to consume data like this to tell us what they *need* to be told, rather than what we want to tell them. Currently we’ve probably got overkill!

cc:attributionName “University of Southampton, School of Electronics and Computer Science”
dc11:description “This rdf document contains information about a person in the Department of Electronics and Computer Science at the University of Southampton.”
dc11:title “Southampton ECS People: Professor Nigel R Shadbolt”
dc:created “2010-07-17T10:01:14Z”
rdfs:comment “This data is freely available, but you may attribute it if you wish. If you’re using this data, we’d love to hear about it at webmaster@ecs.soton.ac.uk.”
rdfs:label “Southampton ECS People: Professor Nigel R Shadbolt”

One field I believe should be standard, which we don’t have, is where to send corrections. Some of the data.gov.uk data is out of date, and an instruction on how to correct it would be nice and would benefit everyone.

At the same time we have started making our research publication metadata available as RDF, also under CC0, via our EPrints server. It helps that I’m also lead developer for the EPrints project! By default, any site upgrading to EPrints 3.2.1 or later will have linked data made available automatically (albeit with an unspecified license).

Now let me tell you how open linked data can save a university time and money!

Scenario: the university cartography department provides open data in RDF form describing every building, its GPS coordinates and its ID number. (I was able to create such a file for 61 university buildings in less than an hour’s work. The information is already freely published on maps on our website, so it’s no big deal to make it available.)

The university teaching support team maintain a database of learning spaces, and the features they contain (projectors, seating layout, capacity etc.) and what building each one is in. They use the same identifier (URI) for buildings as the cartography dept. but don’t even need to talk to them, as the scheme is very simple. Let’s say:
http://data.exampleuniversity.ac.uk/location/building/23

Each team undertakes to keep their bit up to date, which is basically work they were doing anyway. They source all of their systems from this data so there’s only one place to maintain it. They maintain it in whatever form works for them (SQL, raw RDF, text file, Excel file in a shared directory!) and data.exampleuniversity.ac.uk knows how to get at this and provide it in well-formed RDF.
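As a rough sketch of how the two teams’ data joins up, here are two illustrative RDF fragments (loaded with Python and rdflib) that share nothing except the building URI. The property names and coordinates are invented for the example:

from rdflib import Graph

# Published by the cartography department (properties and values invented).
cartography_ttl = """
@prefix geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
<http://data.exampleuniversity.ac.uk/location/building/23>
    rdfs:label "Building 23" ;
    geo:lat "50.9371" ;
    geo:long "-1.3976" .
"""

# Published by the teaching support team, reusing the same building URI.
learning_spaces_ttl = """
@prefix ex: <http://data.exampleuniversity.ac.uk/ns/> .
<http://data.exampleuniversity.ac.uk/location/building/23/room/101>
    ex:capacity 120 ;
    ex:hasProjector true ;
    ex:inBuilding <http://data.exampleuniversity.ac.uk/location/building/23> .
"""

g = Graph()
g.parse(data=cartography_ttl, format="turtle")
g.parse(data=learning_spaces_ttl, format="turtle")
print(len(g), "triples merged with no coordination beyond the shared URI")

Neither team had to agree a schema with the other; the shared identifier is the whole interface.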

The timetabling team wants to build a service to allow lecturers and students to search for empty rooms with certain features, near where they are now. (This is a genuine request made of our Timetable team at Southampton, and one they would like a solution for.)

The coder tasked with this gets the list of empty rooms from the timetabling team; possibly this won’t be open data, but it still uses the same room IDs (URIs), e.g. http://data.exampleuniversity.ac.uk/location/building/23/room/101

She can then mash this up with the learning-space data and the building location data to build a search that shows empty rooms, filtered by required feature(s). She could even take the building you’re currently in and sort the results by distance from you. The key thing is that she doesn’t have to recreate any existing data, and as the data is open she doesn’t need to jump through any hoops to get it. She may wish to register her use so that she’s informed of any planned outages or changes to the data she’s using, but that’s about it. She has to do no additional maintenance, as the data is sourced directly from the owners. You could do all this with SQL, but this approach allows people to use the data with confidence without having to get a bunch of senior managers to agree a business case. An academic from another university, running a conference at exampleuniversity, can use the same information without having to navigate any of the politics and bureaucracy, and can improve their conference site’s value to delegates by joining each session to its accurate location. If they make the conference programme into linked data (see http://programme.ecs.soton.ac.uk/ for my work in this area!) then a third party could develop an iPhone app to mash up the programme and university building location datasets and help delegates navigate.
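To illustrate the kind of query the coder might run against the combined data, here is a sketch in the same spirit as the fragments above. The endpoint, the ex: properties and the freeNow flag are all invented for the example:

from SPARQLWrapper import SPARQLWrapper, JSON

# Hypothetical endpoint and vocabulary; adjust to whatever the university publishes.
sparql = SPARQLWrapper("http://data.exampleuniversity.ac.uk/sparql")
sparql.setQuery("""
    PREFIX ex:  <http://data.exampleuniversity.ac.uk/ns/>
    PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
    SELECT ?room ?capacity ?lat ?long WHERE {
        ?room ex:freeNow      true ;      # from the timetabling data
              ex:hasProjector true ;      # required feature, from the learning-space data
              ex:capacity     ?capacity ;
              ex:inBuilding   ?building .
        ?building geo:lat  ?lat ;         # from the cartography data
                  geo:long ?long .
    }
    ORDER BY DESC(?capacity)
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["room"]["value"], row["capacity"]["value"])

The query never duplicates anyone’s data; it simply follows the shared URIs across the three teams’ published fragments.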

But the key thing is that making your information machine-readable, discoverable and openly licensed is of most value to your own organisation’s members. It stops duplication of work and reduces time wasted trying to get a copy of data that other staff maintain.

“If HP knew what HP knows, we’d be three times more profitable.” – Hewlett-Packard Chairman and CEO Lew Platt

I’ve been working on a mindmap to brainstorm every potential entity a university may eventually want to identify with a URI. Many of these would benefit from open data. Please contact me if you’ve got ones to add! It would be potentially useful to start recommending styles for URIs for things like rooms, courses and seminars as most of our data will be of a similar shape, and it makes things easier if we can avoid needless inconsistency!

Related posts:

  1. 8.4 Million Grant to University of Manchester to Expand Semi-Open Data Repository
  2. Opening up linguistic data at the American National Corpus
  3. Cornell University Library keeps reproductions of public domain works in the public domain

Open bibliographic data promotes knowledge of the public domain

Jonathan Gray - April 6, 2010 in Guest post, OKF Projects, Open Data, Public Domain, Public Domain Works, WG Open Bibliographic Data, WG Public Domain, Working Groups

The following guest post is from John Mark Ockerbloom, library scientist at the University of Pennsylvania Libraries and editor of The Online Books Page. He blogs at Everybody’s Libraries.

I’ve recently gotten involved with two Open Knowledge Foundation working groups, one on open bibliographic data and one on identifying public domain materials. Folks who follow my Everybody’s Libraries blog have seen me write about the importance of the public domain and open bibliographic records to the future of library services. But it’s also worth noting how the two issues complement each other.

If you want to identify the set of works that are in the public domain in your jurisdiction, for instance, you’ll need to do a lot of bibliographic research. As I describe in a 2007 paper on copyright and provenance, to determine the copyright status of a work you may need to know details about the time and place of first publication, the authors and their lifespans, the copyright notices and registrations associated with a work, and the relationship of the work to other works. Much of this data is included in bibliographic records, or can be more easily located when you have these bibliographic records in hand. And as I’ve described in detail at an ALA presentation, the more open bibliographic data is available, the easier it is for lots of different people (and programs) to analyze it. So promoting open bibliographic data also promotes knowledge of the public domain.
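To make the connection concrete, here is a deliberately simplified sketch (in Python) of the kind of rule a public domain calculator applies to exactly those bibliographic facts. The thresholds below are rough illustrations for 2010, not legal advice, and real jurisdiction rules are far more involved:

from dataclasses import dataclass
from typing import Optional

CURRENT_YEAR = 2010  # matches the date of this post

@dataclass
class Work:
    first_published: int              # year of first publication
    author_death_year: Optional[int]  # None if the author is still alive
    copyright_renewed: bool = True    # US works from 1923-1963 needed renewal

def probably_public_domain_us(work: Work) -> bool:
    """Very rough US heuristic based on publication-based terms."""
    if work.first_published < 1923:
        return True
    if 1923 <= work.first_published <= 1963 and not work.copyright_renewed:
        return True  # renewal was required but never made
    return False

def probably_public_domain_life_plus_70(work: Work) -> bool:
    """Rough heuristic for life-plus-70 jurisdictions (e.g. much of Europe)."""
    if work.author_death_year is None:
        return False
    return CURRENT_YEAR - work.author_death_year > 70

# Example: a 1950 US book whose copyright was never renewed.
print(probably_public_domain_us(Work(1950, 1975, copyright_renewed=False)))

Every input to these functions is a bibliographic fact, which is why open bibliographic records make calculations like this possible at scale.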

Going the other way, information about the public domain also helps build open bibliographic data. Over the past several years, I’ve been compiling information on copyright registrations and renewals, which in the US are very important for determining public domain status. (As has been noted previously, many books, periodicals, and images from the mid-20th century did not have their copyrights renewed as required and are now in the public domain in the US.) The catalog of copyright registrations is a US government work, not subject to copyright restrictions. And the catalog itself is a rich source of bibliographic data, with information on book titles, authors, and even publication details. Moreover, this data includes descriptions and identifiers not for specific editions, but for the higher-level FRBR concept of expressions, which can encompass many editions. This higher-level data is increasingly important in the newer, comprehensive catalogs that many groups (ranging from OCLC to the Open Library Project) are now developing. And there’s still more that can be done to get this copyright registration data online, or into forms that can be easily searched and analyzed.

In short, open bibliographic information and copyright information reinforce each other. By joining the Open Knowledge Foundation working groups on these topics, I hope to promote the synergies between them, and between people and groups working on liberating this information. If you’re interested in any of these issues, I hope you get involved as well. More information can be found on the OKFN website.

Related posts:

  1. Workshop on Open Bibliographic Data and the Public Domain
  2. Notes from Workshop on Open Bibliographic Data and the Public Domain
  3. Workshop on Open Bibliographic Data and the Public Domain, 7th October 2010