TEDective
The overarching aim of TEDective is to make European public procurement data explorable for non-experts. To get started, head over to our Explorer UI or use the API directly. If you are a developer or want to self-host the TEDective stack, check out our developer documentation. If you are interested in doing network analysis on the data, check out our user documentation.
Why is this needed?
Despite a range of previous public efforts to parse and analyse TED data, there is currently no offering that fulfils all of the following requirements for the provision of TED data:
- It is built in the open and published under a free software license.
- It offers a current, cleaned and deduplicated (see below) version of TED data.
- Data is available in more formats for network analysis: Our custom format OCDSGraph is designed to make it easier to do graph analytics on the data. We also plan to release the data in the FollowTheMoney format.
- It is (currently) the only non-commercial eForms-to-OCDS mapper available.
Sustainably providing long-term access to European tender data in a way that fulfils these requirements enables numerous applications of interest to civil society, business and government, which could greatly enhance the transparency and accountability of European business activity. A range of interesting questions could be answered with this data if it were available in a well-documented and easy-to-understand format that is interoperable with tender data published elsewhere.
What data quality problems does TEDective solve?
- Deduplication of organisation data: TED data often contains multiple entries for the same organisation, which makes it difficult to analyse the data. We use Splink for our data linkage.
- EUR conversion: TED data often contains amounts in multiple currencies, which makes it difficult to compare contracts. We therefore convert all amounts to EUR using the historical conversion rate, which we obtain from frankfurter.app.
- Graph analytics: TED data is published in a complex XML format. The new eForms standard is even more extensive in order to accommodate complicated legal requirements. To understand relationships between public bodies and their supply networks, we need to simplify the data model. We have developed the OCDSGraph format for this purpose. OCDSGraph is an opinionated extension of OCDS that is designed to make it easier to do graph analytics on the data. We rely on KuzuDB for storing and querying OCDSGraph data.
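The three steps above can be illustrated with a minimal Python sketch. It is not TEDective's implementation: exact matching on normalised names is a crude stand-in for the probabilistic record linkage that Splink performs, the hard-coded rate table stands in for a historical-rate lookup, and a plain dictionary of weighted edges stands in for OCDSGraph data stored in KuzuDB. All names, records and rates below are hypothetical.

```python
from collections import defaultdict

# Hypothetical historical rates to EUR, keyed by (currency, date).
# A real pipeline would look these up from a rates service instead.
RATES_TO_EUR = {
    ("SEK", "2021-03-01"): 0.125,
    ("PLN", "2021-03-02"): 0.25,
}


def normalise_name(name: str) -> str:
    """Crude stand-in for record linkage: lowercase, keep letters and digits only."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def to_eur(amount: float, currency: str, date: str) -> float:
    """Convert an amount to EUR using the rate recorded for the given date."""
    if currency == "EUR":
        return amount
    return amount * RATES_TO_EUR[(currency, date)]


def build_edges(awards: list) -> dict:
    """Aggregate awards into buyer -> supplier edges weighted by total EUR value."""
    edges = defaultdict(float)
    for award in awards:
        buyer = normalise_name(award["buyer"])
        supplier = normalise_name(award["supplier"])
        edges[(buyer, supplier)] += to_eur(
            award["amount"], award["currency"], award["date"]
        )
    return dict(edges)


# Made-up award records; the two spellings of the supplier name collapse
# into one node after normalisation.
awards = [
    {"buyer": "Example City Council", "supplier": "Example Supplies AB",
     "amount": 1000.0, "currency": "SEK", "date": "2021-03-01"},
    {"buyer": "Example City Council", "supplier": "Example Supplies A.B.",
     "amount": 50.0, "currency": "EUR", "date": "2021-03-05"},
    {"buyer": "Example Municipality", "supplier": "Example Supplies AB",
     "amount": 400.0, "currency": "PLN", "date": "2021-03-02"},
]

edges = build_edges(awards)
for (buyer, supplier), total in sorted(edges.items()):
    print(f"{buyer} -> {supplier}: {total:.2f} EUR")
```

Once organisations are merged and amounts share one currency, questions such as "which suppliers receive the most from a given buyer" reduce to simple aggregations over edges.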
What's inside?
TED XML notices and TED eForms are downloaded and parsed by TEDective ETL into KuzuDB. An API built with FastAPI sits in front of this database and provides access to OCDS entities, such as organizations, awards, releases or contracts. On top of it, there is an experimental explorer UI built with Next.js and react-force-graph.
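To give a feel for the parsing step, here is a minimal Python sketch of turning a notice into an OCDS-shaped release. It is only an illustration under stated assumptions: the XML below is a made-up, heavily simplified notice (real TED XML and eForms documents are namespaced and far larger), and `notice_to_release` and the `ocds-example-` prefix are hypothetical names, not part of TEDective ETL. Only the output field names (`ocid`, `buyer`, `awards`, `suppliers`, `value`) follow OCDS.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified notice. Real TED XML and eForms
# documents use different, namespaced and much larger schemas.
NOTICE_XML = """
<notice id="2021-000001">
  <buyer id="ORG-1">Example City Council</buyer>
  <award id="AWD-1">
    <supplier id="ORG-2">Example Supplies Ltd</supplier>
    <value currency="EUR">125000.00</value>
  </award>
</notice>
"""


def notice_to_release(xml_text: str) -> dict:
    """Map the simplified notice above to an OCDS-shaped release dict."""
    root = ET.fromstring(xml_text.strip())
    buyer = root.find("buyer")

    awards = []
    for award in root.findall("award"):
        supplier = award.find("supplier")
        value = award.find("value")
        awards.append({
            "id": award.get("id"),
            "suppliers": [{"id": supplier.get("id"), "name": supplier.text}],
            "value": {
                "amount": float(value.text),
                "currency": value.get("currency"),
            },
        })

    return {
        # Hypothetical identifier prefix, used here only for illustration.
        "ocid": "ocds-example-" + root.get("id"),
        "buyer": {"id": buyer.get("id"), "name": buyer.text},
        "awards": awards,
    }


release = notice_to_release(NOTICE_XML)
print(release["buyer"]["name"], "->", release["awards"][0]["suppliers"][0]["name"])
```

In a setup like the one described above, records of this shape would then be loaded into the graph database and served through the API.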
What other tools exist?
- TheyBuyForYou (a project by "a consortium of 10 leading companies, universities, research centres, government departments and local authorities in the UK, Norway, Italy, Spain and Slovenia" funded by the EU Horizon 2020 programme. The project cost the EU around €3.3 million and was developed over two years until December 2020. It is now largely dysfunctional and out of date. Some code seems to be publicly available but is provided without an explicit license)
- DigiWhist's opentender.eu (seems somewhat abandoned, although the repo is still lightly maintained. Data is updated less than once a month and the frontend code is not open source. One of the DigiWhist researchers founded TenderX, a private for-profit tender/company data offering)
  ℹ️ This dataset seems to be used by the OCDS tool for scraping globally available OCDS data releases.
- TenderBase (seems abandoned, last commit in 2018, but still publishes up-to-date data on their website http://www.tenderbase.eu; warning: no SSL)
- OpenTED (seems abandoned, last commit 2015; didn't work with OCDS, as OCDS had not been developed at the time)
- opented (very old attempt at parsing TED data that did not turn out to work)
- OpenTED Browser (an academic paper about)
- ExtracTED (according to the README, this was used to parse data from 2014 to 2016; last commit 5 years ago)
- eu-hack (last commit 15 months ago; the author is a data scientist at Amazon and the target format is CSV. I could not run the code and achieve error-free parsing of more recent TED data)
History
Check out our blog for some of the project's history.
This section is work-in-progress. Please stay tuned!