Graph-native public procurement data format
TEDective's OCDSGraph format is a custom extension of the OCDS specification. With OCDSGraph, we simplify the data to focus on the flow of money and other organisational dynamics between public administrations and their supplier networks.
OCDSGraph is based on the property graph model, which is a natural fit for procurement data because it allows us to represent complex relationships between buyers and suppliers. Here is the basic structure of OCDSGraph:
- Nodes: Represent entities such as organizations, tenders, awards, and contracts.
- Edges: Represent relationships between entities. For example, an edge might connect a tender to the organization that issued it.
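The node and edge structure described above can be sketched in a few lines of Python. This is only an illustration of the property graph idea, not OCDSGraph's actual schema: the class names, the ids, and the `ISSUED_BY` edge type are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A labelled entity with arbitrary key/value properties."""
    id: str
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, typed relationship between two node ids."""
    source: str
    target: str
    type: str


# Hypothetical ids and edge type, chosen only for this illustration.
buyer = Node("org-1", "Organization", {"name": "Fusion for Energy"})
tender = Node("tender-1", "Tender", {"valueAmount": 360000, "valueCurrency": "EUR"})
edges = [Edge(source=tender.id, target=buyer.id, type="ISSUED_BY")]


def neighbours(node_id, edges, nodes):
    """Return the nodes reachable from node_id over one outgoing edge."""
    by_id = {n.id: n for n in nodes}
    return [by_id[e.target] for e in edges if e.source == node_id]


issuers = neighbours(tender.id, edges, [buyer, tender])
print(issuers[0].properties["name"])
```

Because relationships are first-class objects, a question such as "who issued this tender?" becomes a one-hop lookup instead of a search through nested documents.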
From TED to OCDSGraph
We will use the ITER project as an example for the whole process, from parsing XML to querying OCDSGraph.
ITER, which in Latin means ‘the way,’ will be the world’s biggest experiment on the path to fusion energy. It will be the first fusion device to generate more heat than is used to start the fusion reaction, relying on a range of technologies that are essential to deliver fusion power in the future. With TEDective, it is possible to explore projects like this and dig into their inner workings, funding, and spending.
TED data comes in two forms: XML and eForms. As of 10 April 2024, TEDective cannot parse eForms.
This is what the European Union uploads:
<CODED_DATA_SECTION>
<REF_OJS>
<COLL_OJ>S</COLL_OJ>
<NO_OJ>1</NO_OJ>
<DATE_PUB>20170103</DATE_PUB>
</REF_OJS>
<NOTICE_DATA>
<NO_DOC_OJS>2017/S 001-000001</NO_DOC_OJS>
<URI_LIST>
<URI_DOC LG="DA">http://ted.europa.eu/udl?uri=TED:NOTICE:000001-2017:TEXT:DA:HTML</URI_DOC>
...
<URI_DOC LG="HR">http://ted.europa.eu/udl?uri=TED:NOTICE:000001-2017:TEXT:HR:HTML</URI_DOC>
</URI_LIST>
<LG_ORIG>EN</LG_ORIG>
<ISO_COUNTRY VALUE="ES"/>
<IA_URL_GENERAL>http://www.fusionforenergy.europa.eu/</IA_URL_GENERAL>
<IA_URL_ETENDERING>https://industryportal.f4e.europa.eu/IP_PAGES/ADM.aspx</IA_URL_ETENDERING>
<ORIGINAL_CPV CODE="92100000">Motion picture and video services</ORIGINAL_CPV>
<ORIGINAL_CPV CODE="79960000">Photographic and ancillary services</ORIGINAL_CPV>
<ORIGINAL_NUTS CODE="FR">FRANCE</ORIGINAL_NUTS>
<CA_CE_NUTS CODE="ES511">Barcelona</CA_CE_NUTS>
<VALUES>
<VALUE TYPE="ESTIMATED_TOTAL" CURRENCY="EUR">360000.00</VALUE>
</VALUES>
</NOTICE_DATA>
<CODIF_DATA>
<DS_DATE_DISPATCH>20161216</DS_DATE_DISPATCH>
<DT_DATE_FOR_SUBMISSION>20170213 17:00</DT_DATE_FOR_SUBMISSION>
<AA_AUTHORITY_TYPE CODE="5">European Institution/Agency or International Organisation</AA_AUTHORITY_TYPE>
<TD_DOCUMENT_TYPE CODE="3">Contract notice</TD_DOCUMENT_TYPE>
<NC_CONTRACT_NATURE CODE="4">Services</NC_CONTRACT_NATURE>
<PR_PROC CODE="1">Open procedure</PR_PROC>
<RP_REGULATION CODE="3">European Institution/Agency or International Organisation</RP_REGULATION>
<TY_TYPE_BID CODE="1">Submission for all lots</TY_TYPE_BID>
<AC_AWARD_CRIT CODE="2">The most economic tender</AC_AWARD_CRIT>
<MA_MAIN_ACTIVITIES CODE="8">Other</MA_MAIN_ACTIVITIES>
<HEADING>AGC02</HEADING>
<INITIATOR>AG</INITIATOR>
<DIRECTIVE VALUE="2014/24/EU"/>
</CODIF_DATA>
</CODED_DATA_SECTION>
... Continuing with the other translations
Unfortunately, the name of the buyer and its contact information do not appear until about 500 lines further down, in the English translation part:
<CONTRACTING_BODY>
<ADDRESS_CONTRACTING_BODY>
<OFFICIALNAME>The European Joint Undertaking for ITER and the Development of Fusion Energy (‘Fusion for Energy’)</OFFICIALNAME>
<ADDRESS>Carrer Josep Pla, 2, Torres Diagonal Litoral, B3</ADDRESS>
<TOWN>Barcelona (Barcelona)</TOWN>
<POSTAL_CODE>08019</POSTAL_CODE>
<COUNTRY VALUE="ES"/>
<PHONE>+34 933201800</PHONE>
<E_MAIL>tenders-adm@f4e.europa.eu</E_MAIL>
<NUTS CODE="ES511"/>
<URL_GENERAL>http://www.fusionforenergy.europa.eu/</URL_GENERAL>
<URL_BUYER>https://industryportal.f4e.europa.eu/IP_PAGES/ADM.aspx</URL_BUYER>
</ADDRESS_CONTRACTING_BODY>
<DOCUMENT_FULL/>
<URL_DOCUMENT>https://industryportal.f4e.europa.eu/IP_PAGES/ADM.aspx</URL_DOCUMENT>
<ADDRESS_FURTHER_INFO_IDEM/>
<ADDRESS_PARTICIPATION_IDEM/>
<URL_TOOL>https://industryportal.f4e.europa.eu/IP_PAGES/ADM.aspx</URL_TOOL>
<CA_TYPE VALUE="EU_INSTITUTION"/>
<CA_ACTIVITY_OTHER>Fusion energy.</CA_ACTIVITY_OTHER>
</CONTRACTING_BODY>
<OBJECT_CONTRACT>
<TITLE>
<P>Provision of audiovisual and photographic services to Fusion for Energy.</P>
</TITLE>
<REFERENCE_NUMBER>F4E-AFC-0786.</REFERENCE_NUMBER>
<CPV_MAIN>
<CPV_CODE CODE="92100000"/>
</CPV_MAIN>
<TYPE_CONTRACT CTYPE="SERVICES"/>
<SHORT_DESCR>
<P>The objective of this call for tenders is to conclude a framework service contract (‘FWC’ or ‘contract’) to deliver audiovisual and photographic services that will report on the progress of Europe's contribution to ITER project, through short film clips/videos and pictures exported in various formats.</P>
<P>The contractor will be responsible for the video recording, pre- and post-production.</P>
<P>No purchase of equipment shall be financed under this contract: the contractor shall have the resources and equipment to execute the photo, video and audio recording in line with the latest industry standards.</P>
</SHORT_DESCR>
<VAL_ESTIMATED_TOTAL CURRENCY="EUR">360000.00</VAL_ESTIMATED_TOTAL>
<NO_LOT_DIVISION/>
<OBJECT_DESCR ITEM="1">
<CPV_ADDITIONAL>
<CPV_CODE CODE="79960000"/>
</CPV_ADDITIONAL>
<NUTS CODE="FR"/>
<MAIN_SITE>
<P>The contractor will be responsible for the filming and shooting at the ITER site in Cadarache (France) and in other locations where F4E components are manufactured within Europe.</P>
</MAIN_SITE>
<SHORT_DESCR>
<P>Estimated per year:</P>
<P>— 4 to 6 short videos to report the progress of the work on the ITER site in Cadarache (France),</P>
<P>— 2 to 8 short videos to report progress on the production of the ITER components,</P>
<P>— 1 longer and more creative video to report on a more in-depth topic (such as the achievement of the past year),</P>
<P>— 4 to 8 photo sessions on the ITER worksite in Cadarache (France).</P>
<P>For more details please consult the tender specifications,</P>
<P>— 2 to 8 photo sessions to report on the progress of ITER components' production.</P>
</SHORT_DESCR>
<AC_PROCUREMENT_DOC/>
<VAL_OBJECT CURRENCY="EUR">360000.00</VAL_OBJECT>
<DURATION TYPE="MONTH">48</DURATION>
<RENEWAL/>
<RENEWAL_DESCR>
<P>12 months, renewable up to 3 times for further periods of 12 months, with a total maximum duration of 48 months.</P>
</RENEWAL_DESCR>
<NO_ACCEPTED_VARIANTS/>
<NO_OPTIONS/>
<NO_EU_PROGR_RELATED/>
</OBJECT_DESCR>
</OBJECT_CONTRACT>
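To give a feel for what extracting the useful fields from such a notice involves, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It is not TEDective's parser. The sample is a trimmed copy of the excerpt above with a shortened buyer name, the `<FORM>` wrapper element and the `extract_summary` helper are assumptions made for this example, and real TED files declare XML namespaces that would have to be handled as well.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the excerpt above (buyer name shortened), wrapped in a
# hypothetical <FORM> root so that it parses on its own. Real TED files use
# XML namespaces, which would need to be passed to find()/findtext().
SAMPLE = """
<FORM>
  <CONTRACTING_BODY>
    <ADDRESS_CONTRACTING_BODY>
      <OFFICIALNAME>Fusion for Energy</OFFICIALNAME>
      <TOWN>Barcelona (Barcelona)</TOWN>
      <COUNTRY VALUE="ES"/>
      <E_MAIL>tenders-adm@f4e.europa.eu</E_MAIL>
    </ADDRESS_CONTRACTING_BODY>
  </CONTRACTING_BODY>
  <OBJECT_CONTRACT>
    <TITLE><P>Provision of audiovisual and photographic services to Fusion for Energy.</P></TITLE>
    <VAL_ESTIMATED_TOTAL CURRENCY="EUR">360000.00</VAL_ESTIMATED_TOTAL>
  </OBJECT_CONTRACT>
</FORM>
"""


def extract_summary(xml_text):
    """Pull buyer, title and estimated value out of one notice."""
    root = ET.fromstring(xml_text)
    body = root.find("CONTRACTING_BODY/ADDRESS_CONTRACTING_BODY")
    value = root.find("OBJECT_CONTRACT/VAL_ESTIMATED_TOTAL")
    return {
        "buyer": body.findtext("OFFICIALNAME"),
        "country": body.find("COUNTRY").get("VALUE"),  # stored as an attribute
        "email": body.findtext("E_MAIL"),
        "title": root.findtext("OBJECT_CONTRACT/TITLE/P"),
        "amount": float(value.text),
        "currency": value.get("CURRENCY"),
    }


summary = extract_summary(SAMPLE)
print(summary)
```

Note that some values live in element text (`OFFICIALNAME`) and others in attributes (`COUNTRY VALUE`, `CURRENCY`), which is part of what makes the raw format awkward to work with.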
- Apart from being quite difficult to read, the XML also contains a lot of superfluous data.
- It gets even more complex once we include relationships between tenders and the public bodies that announce them, or between awards and the organizations that win them. TED data is not deduplicated by organization, which makes it even more difficult to work with.
- TEDective tries to improve the situation by first converting to OCDS, and then loading a subset of this data into a graph database via the OCDSGraph format. The goal is to make this data more accessible for network analysis.
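The missing deduplication can be illustrated with a small sketch. One common approach, shown here purely as an assumption and not as TEDective's actual logic, is to derive a deterministic id from a normalised organization name and country code, so that spelling variants of the same buyer collapse onto one node. The namespace URL and the `organization_key` helper are hypothetical.

```python
import re
import uuid

# Hypothetical namespace for this sketch; any fixed UUID would do.
ORG_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/organizations")


def organization_key(name, country):
    """Derive a stable id from a normalised name and country code."""
    normalised = re.sub(r"[^\w\s]", " ", name.casefold())  # punctuation -> space
    normalised = " ".join(normalised.split())               # collapse whitespace
    return uuid.uuid5(ORG_NAMESPACE, f"{country.upper()}|{normalised}")


a = organization_key("Fusion for Energy", "ES")
b = organization_key("  FUSION FOR ENERGY. ", "es")
c = organization_key("Fusion for Energy", "FR")
print(a == b, a == c)
```

Exact-match keys like this only catch trivial variants such as case, punctuation, and spacing. Abbreviations or translated names would need fuzzy matching or a registry identifier.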
OCDS & OCDSGraph
The Open Contracting Data Standard (OCDS) is a data standard for public contracting. It is designed to make public procurement data more open, accessible, and usable. OCDS is a JSON-based format that is used to publish data on public contracting processes, including planning, tendering, awarding, and contracting.
However, OCDS is not designed for network analysis. To make graph analytics on the data easier, we have developed the OCDSGraph format, an opinionated extension of OCDS. We rely on KuzuDB for storing and querying OCDSGraph data.
Here is an example of several nodes (an Organization, a Release, and a Tender) as they are represented in OCDSGraph:
{
"nodes": [
{
"_label": "Organization",
"name": "The European Joint Undertaking for ITER and the Development of Fusion Energy ('Fusion for Energy')",
"id": "067473cf-166c-5d39-88f1-352f106cc448",
"roles": "buyer",
"identifierNationalID": null,
"addressStreetAddress": "Carrer Josep Pla, 2, Torres Diagonal Litoral, B3",
"addressLocality": "Barcelona (Barcelona)",
"addressRegion": "ES511",
"addressPostalCode": "08019",
"addressCountryName": "ES",
"contactPointName": null,
"contactPointEmail": "tenders-adm@f4e.europa.eu",
"contactPointTelephone": "+34 933 20 18 00",
"contactPointFaxNumber": null,
"contactPointUrl": "http://www.fusionforenergy.europa.eu/",
"detailsUrl": "http://www.fusionforenergy.europa.eu/"
},
{
"_label": "Release",
"id": "000001-2017",
"ocid": "ocds-jyvdv7-1f7ec846-d8d4-532f-864b-638fa815c1eb",
"date": "2017-01-03",
"initiationType": "tender",
"language": "en",
"tedURL": "https://ted.europa.eu/udl?uri=TED:NOTICE:000001-2017:TEXT:EN:HTML"
},
{
"_label": "Tender",
"id": "58620a55-4c4b-4f38-8ba3-cd08d1343fee",
"ocid": null,
"date": null,
"initiationType": null,
"language": null,
"tedURL": null,
"name": null,
"roles": null,
"identifierNationalID": null,
"addressStreetAddress": null,
"addressLocality": null,
"addressRegion": null,
"addressPostalCode": null,
"addressCountryName": null,
"contactPointName": null,
"contactPointEmail": null,
"contactPointTelephone": null,
"contactPointFaxNumber": null,
"contactPointUrl": null,
"detailsUrl": null,
"title": "Provision of audiovisual and photographic services to Fusion for Energy [Provision of audiovisual and photographic services to Fusion for Energy.]",
"description": "The objective of this call for tenders is to conclude a framework service contract ('FWC' or 'contract') to deliver audiovisual and photographic services that will report on the progress of Europe's contribution to ITER project, through short film clips/videos and pictures exported in various formats.",
"status": "active",
"valueAmount": 360000,
"valueCurrency": "EUR",
"dateSigned": null
}
]
}
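A document of this shape is straightforward to consume. The sketch below, which uses an abbreviated copy of the example and a hypothetical `nodes_by_label` helper, groups nodes by their `_label` and drops the `null` properties that pad each node. It assumes only the `nodes` array and `_label` key visible above.

```python
import json
from collections import defaultdict

# Abbreviated copy of the example document; most properties are omitted.
DOCUMENT = """
{
  "nodes": [
    {"_label": "Organization", "id": "067473cf-166c-5d39-88f1-352f106cc448",
     "roles": "buyer", "identifierNationalID": null},
    {"_label": "Release", "id": "000001-2017", "date": "2017-01-03", "language": "en"},
    {"_label": "Tender", "id": "58620a55-4c4b-4f38-8ba3-cd08d1343fee", "ocid": null,
     "status": "active", "valueAmount": 360000, "valueCurrency": "EUR"}
  ]
}
"""


def nodes_by_label(document):
    """Group nodes by label, keeping only properties that carry a value."""
    grouped = defaultdict(list)
    for node in document["nodes"]:
        props = {k: v for k, v in node.items() if k != "_label" and v is not None}
        grouped[node["_label"]].append(props)
    return dict(grouped)


grouped = nodes_by_label(json.loads(DOCUMENT))
for label, nodes in grouped.items():
    print(label, [sorted(n) for n in nodes])
```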
Here we can see the whole chain of connections. The example we have chosen to showcase consists only of an organization, a release, and a tender:
A more complete chain would look something like this:
In this more complete chain we can see all the elements:
- Release (Orange)
- Awards (Blue)
- Contract (Green)
- Tender (Purple)
- Supplier, which is the organization in the top left (Transparent Red)