
eForms processing

Processing starts when the Luigi scheduler executes the ProcessTEDNotices job, which depends on all notices having been downloaded and the currency rates having been fetched (more on that here).

The diagram below shows a high-level representation of the flow.

Figure: Processing flow

Form type detection

Form type detection searches for the following XPath patterns:

# For TED XML
.//NOTICE_DATA/NO_DOC_OJS

# For eForms
.//cbc:NoticeTypeCode

Supported TED XML form types are F01, F02, and F03.

Supported eForms types are:

"planning",
"competition",
"change",
"result",
"dir-awa-pre",
"cont-modif"

For more detailed documentation of the eForms types and what they mean, see the official eForms SDK documentation.
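The detection step can be sketched as follows. This is a minimal illustration rather than the ETL's actual code: it uses the standard library's xml.etree.ElementTree instead of lxml, the function name detect_form is hypothetical, the sample documents are stripped down to the relevant elements, and it assumes that the eForms form type is carried in the listName attribute of cbc:NoticeTypeCode.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Assumed namespace URI for the cbc prefix; verify against a real notice.
CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
NAMESPACES = {"cbc": CBC}

SUPPORTED_EFORMS_TYPES = {
    "planning", "competition", "change", "result", "dir-awa-pre", "cont-modif",
}


def detect_form(root: ET.Element) -> Optional[Tuple[str, Optional[str]]]:
    """Return (format, form_type), or None if neither pattern matches."""
    notice_type = root.find(".//cbc:NoticeTypeCode", NAMESPACES)
    if notice_type is not None:
        # Assumption: the form type lives in the listName attribute.
        form_type = notice_type.get("listName")
        if form_type in SUPPORTED_EFORMS_TYPES:
            return ("eforms", form_type)
        return ("eforms", None)  # eForms notice with an unsupported form type
    if root.find(".//NOTICE_DATA/NO_DOC_OJS") is not None:
        # TED XML notice; resolving F01/F02/F03 is outside this sketch.
        return ("ted-xml", None)
    return None


# Hypothetical, stripped-down sample documents.
EFORMS_SAMPLE = (
    f'<ContractNotice xmlns:cbc="{CBC}">'
    '<cbc:NoticeTypeCode listName="competition">cn-standard</cbc:NoticeTypeCode>'
    "</ContractNotice>"
)
TED_SAMPLE = (
    "<TED_EXPORT><CODED_DATA_SECTION><NOTICE_DATA>"
    "<NO_DOC_OJS>2020/S 001-000001</NO_DOC_OJS>"
    "</NOTICE_DATA></CODED_DATA_SECTION></TED_EXPORT>"
)

print(detect_form(ET.fromstring(EFORMS_SAMPLE)))
print(detect_form(ET.fromstring(TED_SAMPLE)))
```

Real TED XML files declare a default namespace, so the un-prefixed path only matches here because the sample has none.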

Transforming to OCDS

The process of transforming the TED XML data is described briefly in the OCDSGraph section.

Although eForms carry a lot of information about the tendering process, the ETL currently focuses on retrieving the most important OCDS objects from a single notice, as described in the sections below.

Finding and mapping Release

This is what the initial Release OCDS object looks like:

release = ocds.Release(
  id=str(uuid.uuid5(uuid.NAMESPACE_URL, doc_id)),
  ocid=ocid_prefix + str(uuid.uuid5(uuid.NAMESPACE_URL, doc_id)),
  language="en",
  date=date,
  initiationType=ocds.InitiationType.tender,
  tag=[ocds.TagEnum.tender],
  tedURL=ted_url,
)
Note that the XML parser needs to handle namespaces. This is done by the processing function, so in this document we treat namespaces as already taken care of.
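For readers unfamiliar with namespaced lookups, the sketch below shows the general idea: a prefix-to-URI map is passed alongside every path. The URIs are the ones we would expect for UBL and the eForms extensions, but they are assumptions here and should be checked against the root element of a real notice. The sketch uses the standard library's xml.etree.ElementTree rather than lxml, and the sample values are invented.

```python
import xml.etree.ElementTree as ET

# Assumed prefix-to-URI map; verify against a real notice.
CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
EFBC = "http://data.europa.eu/p27/eforms-ubl-extension-basic-components/1"
NAMESPACES = {"cbc": CBC, "efbc": EFBC}

# Hypothetical, stripped-down notice fragment.
SAMPLE = (
    f'<Notice xmlns:cbc="{CBC}" xmlns:efbc="{EFBC}">'
    "<efbc:NoticePublicationID>00000001-2024</efbc:NoticePublicationID>"
    "<cbc:IssueDate>2024-01-02+01:00</cbc:IssueDate>"
    "</Notice>"
)

form = ET.fromstring(SAMPLE)

# The second argument resolves the "efbc:" and "cbc:" prefixes in the path.
doc_id = form.find(".//efbc:NoticePublicationID", NAMESPACES).text
issue_date = form.find(".//cbc:IssueDate", NAMESPACES).text
print(doc_id, issue_date)
```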

The document ID is retrieved from .//efbc:NoticePublicationID and the issue date from .//cbc:IssueDate. Before proceeding to parse the Tender, we determine the tags and the Tender's status from the form type, using this table:

form_type_mapping = {
    "change": {"tags": [ocds.TagEnum.tenderUpdate], "tender_status": None},
    "competition": {"tags": [ocds.TagEnum.tender], "tender_status": ocds.Status.active},
    "cont-modif": {"tags": [ocds.TagEnum.awardUpdate, ocds.TagEnum.contractUpdate], "tender_status": None},
    "dir-awa-pre": {"tags": [ocds.TagEnum.award, ocds.TagEnum.contract], "tender_status": ocds.Status.complete},
    "planning": {"tags": [ocds.TagEnum.tender], "tender_status": ocds.Status.planned},
    "result": {"tags": [ocds.TagEnum.award, ocds.TagEnum.contract], "tender_status": ocds.Status.complete},
}
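To make the two steps above concrete, here is a small sketch of how the deterministic identifiers and the table lookup fit together. Plain strings stand in for the ocds enums, and the function names, the sample document ID and the ocid prefix are all hypothetical.

```python
import uuid
from typing import List, Optional, Tuple

# String stand-ins for the ocds.TagEnum and ocds.Status values in the table.
FORM_TYPE_MAPPING = {
    "change": {"tags": ["tenderUpdate"], "tender_status": None},
    "competition": {"tags": ["tender"], "tender_status": "active"},
    "cont-modif": {"tags": ["awardUpdate", "contractUpdate"], "tender_status": None},
    "dir-awa-pre": {"tags": ["award", "contract"], "tender_status": "complete"},
    "planning": {"tags": ["tender"], "tender_status": "planned"},
    "result": {"tags": ["award", "contract"], "tender_status": "complete"},
}


def release_identifiers(doc_id: str, ocid_prefix: str) -> Tuple[str, str]:
    """uuid5 is name-based, so the same doc_id always yields the same IDs."""
    release_id = str(uuid.uuid5(uuid.NAMESPACE_URL, doc_id))
    return release_id, ocid_prefix + release_id


def tags_and_status(form_type: str) -> Tuple[List[str], Optional[str]]:
    """Look up the tags and tender status for a supported form type."""
    entry = FORM_TYPE_MAPPING[form_type]  # KeyError for unsupported types
    return entry["tags"], entry["tender_status"]


# Hypothetical document ID and placeholder ocid prefix.
release_id, ocid = release_identifiers("00000001-2024", "ocds-xxxxxx-")
print(release_id, ocid)
print(tags_and_status("planning"))
```

Because uuid5 is deterministic, reprocessing the same notice yields the same Release id, which a random uuid4 would not.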

Finding and mapping Tender

This is what the Tender OCDS object looks like within the ETL:

tender = ocds.Tender(
  id=str(uuid.uuid4()),
  status=tender_status,
  title=form.find(".//cac:ProcurementProject/cbc:Name", namespaces).text,
  description=description,
  language="en",
  legalBasis=ocds.Classification(scheme="CELEX"),
  value=extractors.extract_eform_value(form, "tender", date, namespaces),
)

Note that Value is extracted separately, as described in a dedicated section. The Tender's title is obtained from .//cac:ProcurementProject/cbc:Name and its description from .//cac:ProcurementProject/cbc:Note, if present.
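Since the description is optional, the lookup has to tolerate a missing element. A minimal sketch of such a lookup is shown below; the helper name text_or_none is hypothetical, the namespace URIs are assumptions, and the sample notice is invented. It uses xml.etree.ElementTree from the standard library rather than lxml.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed namespace URIs; verify against a real notice.
CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
CAC = "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
NAMESPACES = {"cbc": CBC, "cac": CAC}


def text_or_none(elem: ET.Element, path: str) -> Optional[str]:
    """Return the stripped text at path, or None if the element is absent or empty."""
    found = elem.find(path, NAMESPACES)
    if found is None or found.text is None:
        return None
    return found.text.strip()


# Hypothetical notice fragment with a title but no description.
SAMPLE = (
    f'<Notice xmlns:cbc="{CBC}" xmlns:cac="{CAC}">'
    "<cac:ProcurementProject><cbc:Name>Road maintenance</cbc:Name></cac:ProcurementProject>"
    "</Notice>"
)

form = ET.fromstring(SAMPLE)
title = text_or_none(form, ".//cac:ProcurementProject/cbc:Name")
description = text_or_none(form, ".//cac:ProcurementProject/cbc:Note")
print(title, description)
```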

Finding buyer

Each release (and tender) specifies its buyer organization. To find it, the ETL first gathers all organizations listed in the notice (.//efac:Organizations/efac:Organization) and then iterates over them, looking for the organization whose ID (obtained from ./efac:Company/cac:PartyIdentification/cbc:ID) appears in .//cac:ContractingParty/cac:Party/cac:PartyIdentification/cbc:ID.
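The lookup described above might look roughly like this. It is a sketch under several assumptions: the function name find_buyer and the sample data are hypothetical, the namespace URIs need checking against a real notice, and returning the first match is our simplification. It uses xml.etree.ElementTree from the standard library rather than lxml.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed namespace URIs; verify against a real notice.
CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
CAC = "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
EFAC = "http://data.europa.eu/p27/eforms-ubl-extension-aggregate-components/1"
NAMESPACES = {"cbc": CBC, "cac": CAC, "efac": EFAC}


def find_buyer(form: ET.Element) -> Optional[ET.Element]:
    """Return the first listed organization whose ID is a contracting-party ID."""
    buyer_ids = {
        node.text
        for node in form.findall(
            ".//cac:ContractingParty/cac:Party/cac:PartyIdentification/cbc:ID",
            NAMESPACES,
        )
    }
    for org in form.findall(".//efac:Organizations/efac:Organization", NAMESPACES):
        org_id = org.find("./efac:Company/cac:PartyIdentification/cbc:ID", NAMESPACES)
        if org_id is not None and org_id.text in buyer_ids:
            return org
    return None


def _org(org_id: str, name: str) -> str:
    """Build a stripped-down, hypothetical efac:Organization fragment."""
    return (
        "<efac:Organization><efac:Company>"
        f"<cac:PartyIdentification><cbc:ID>{org_id}</cbc:ID></cac:PartyIdentification>"
        f"<cac:PartyName><cbc:Name>{name}</cbc:Name></cac:PartyName>"
        "</efac:Company></efac:Organization>"
    )


SAMPLE = (
    f'<Notice xmlns:cbc="{CBC}" xmlns:cac="{CAC}" xmlns:efac="{EFAC}">'
    "<cac:ContractingParty><cac:Party><cac:PartyIdentification>"
    "<cbc:ID>ORG-0001</cbc:ID>"
    "</cac:PartyIdentification></cac:Party></cac:ContractingParty>"
    "<efac:Organizations>"
    + _org("ORG-0002", "Example Supplier Ltd")
    + _org("ORG-0001", "City of Example")
    + "</efac:Organizations></Notice>"
)

buyer = find_buyer(ET.fromstring(SAMPLE))
print(buyer.find("./efac:Company/cac:PartyName/cbc:Name", NAMESPACES).text)
```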

Finding supplier(s)

Suppliers are extracted from .//efac:NoticeResult/efac:TenderingParty/efac:Tenderer/cbc:ID. If any are found, the following details are extracted:

name = org.find("./efac:Company/cac:PartyName/cbc:Name", namespaces).text
company_id = org.find("./efac:Company/cac:PartyLegalEntity/cbc:CompanyID", namespaces)
identifier = ocds.Identifier(
  scheme="National-ID",
  # Fall back to an empty string when the notice carries no company ID
  id=str(company_id.text) if company_id is not None else "",
)
address = extractors.extract_eform_address(org, namespaces)
contactPoint = extractors.extract_eform_contactPoint(org, namespaces)
details = None

supplier = ocds.Organization(
  id = str(uuid.uuid5(uuid.NAMESPACE_URL, org_id)),
  roles={"supplier"}
)
supplier.name = name
supplier.identifier = identifier
supplier.address = address
supplier.contactPoint = contactPoint
supplier.details = details

Note that, as with the Value of a Tender, ContactPoint and Address are extracted using dedicated functions. These attributes are OCDS objects in their own right, so we moved their extraction into separate extractor functions to improve readability.
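As a rough illustration of the first step, the sketch below collects the tenderer IDs from a notice and derives the Organization ID in the same way as the snippet above. The function names and sample data are hypothetical, the namespace URIs are assumptions, and removing duplicate IDs is our own choice rather than confirmed ETL behavior. It uses xml.etree.ElementTree from the standard library rather than lxml.

```python
import uuid
import xml.etree.ElementTree as ET
from typing import List

# Assumed namespace URIs; verify against a real notice.
CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
EFAC = "http://data.europa.eu/p27/eforms-ubl-extension-aggregate-components/1"
NAMESPACES = {"cbc": CBC, "efac": EFAC}

TENDERER_PATH = ".//efac:NoticeResult/efac:TenderingParty/efac:Tenderer/cbc:ID"


def supplier_org_ids(form: ET.Element) -> List[str]:
    """Collect tenderer organization IDs in first-seen order, without duplicates."""
    ids: List[str] = []
    for node in form.findall(TENDERER_PATH, NAMESPACES):
        if node.text and node.text not in ids:
            ids.append(node.text)
    return ids


def supplier_id(org_id: str) -> str:
    """Name-based UUID derived from the organization ID, as in the snippet above."""
    return str(uuid.uuid5(uuid.NAMESPACE_URL, org_id))


# Hypothetical fragment: ORG-0002 takes part in two tendering parties.
SAMPLE = (
    f'<Notice xmlns:cbc="{CBC}" xmlns:efac="{EFAC}"><efac:NoticeResult>'
    "<efac:TenderingParty>"
    "<efac:Tenderer><cbc:ID>ORG-0002</cbc:ID></efac:Tenderer>"
    "</efac:TenderingParty>"
    "<efac:TenderingParty>"
    "<efac:Tenderer><cbc:ID>ORG-0003</cbc:ID></efac:Tenderer>"
    "<efac:Tenderer><cbc:ID>ORG-0002</cbc:ID></efac:Tenderer>"
    "</efac:TenderingParty>"
    "</efac:NoticeResult></Notice>"
)

org_ids = supplier_org_ids(ET.fromstring(SAMPLE))
print([(org_id, supplier_id(org_id)) for org_id in org_ids])
```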

Finding Contract and Award(s)

This step is performed only if .//efac:NoticeResult/efac:LotResult is present in the notice. Otherwise both fields are left as empty sets; they are still present because the Release OCDS object requires them.

Finding Award(s)

An award needs an ID (generated during processing), a title (extracted from .//efac:NoticeResult/efac:SettledContract/cbc:Title), and a status, which is set to active if an award is present.
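The presence check and the award fields can be sketched as follows. A plain dict stands in for the OCDS Award object, the function name extract_awards is hypothetical, the namespace URIs are assumptions, and generating the ID with uuid4 is a guess at what "generated during processing" means. It uses xml.etree.ElementTree from the standard library rather than lxml.

```python
import uuid
import xml.etree.ElementTree as ET
from typing import List

# Assumed namespace URIs; verify against a real notice.
CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
EFAC = "http://data.europa.eu/p27/eforms-ubl-extension-aggregate-components/1"
NAMESPACES = {"cbc": CBC, "efac": EFAC}


def extract_awards(form: ET.Element) -> List[dict]:
    """Return one award if the notice has a LotResult, otherwise an empty list."""
    # Compare with None explicitly: an element without children is falsy.
    if form.find(".//efac:NoticeResult/efac:LotResult", NAMESPACES) is None:
        return []
    title = form.find(".//efac:NoticeResult/efac:SettledContract/cbc:Title", NAMESPACES)
    return [
        {
            "id": str(uuid.uuid4()),  # assumption: a random ID per processing run
            "title": title.text if title is not None else None,
            "status": "active",
        }
    ]


# Hypothetical, stripped-down notice fragments.
WITH_RESULT = (
    f'<Notice xmlns:cbc="{CBC}" xmlns:efac="{EFAC}"><efac:NoticeResult>'
    "<efac:LotResult/>"
    "<efac:SettledContract><cbc:Title>Road maintenance 2024</cbc:Title></efac:SettledContract>"
    "</efac:NoticeResult></Notice>"
)
WITHOUT_RESULT = f'<Notice xmlns:cbc="{CBC}" xmlns:efac="{EFAC}"/>'

print(extract_awards(ET.fromstring(WITH_RESULT)))
print(extract_awards(ET.fromstring(WITHOUT_RESULT)))
```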

Finding Contract

As an OCDS object, a Contract looks like this:

contract = ocds.Contract(
  id=str(uuid.uuid4()),
  awardID=str(award_id),
  title=award_title,
  status=ocds.ContractStatus("active"),
  period=None, # Not relevant to TEDective's current usage
  value=award_val,
  dateSigned=award_date,
  items=None, # Same as above
  documents=None, # Same as above
)

As can be seen, the Contract object is tightly coupled with the Award: its awardID, title, value and dateSigned are all taken from it.

Extracting Value, ContactPoint and Address

Value, ContactPoint and Address are also separate OCDS objects, even if they are only "useful" as attributes of other objects. Therefore they have to be extracted separately from their "parent" objects.

Value

The function for extracting the value is defined as follows:

extract_eform_value(elem: etree._Element, ocds_object: str, date, namespaces: dict) -> Optional[ocds.Value]

It first selects the search path based on the ocds_object argument, which identifies either a Tender (passed as "tender" in the Tender snippet above) or an Award.

For a Tender it extracts the following (if not empty):

  • amount: .//cac:ProcurementProject/cac:RequestedTenderTotal/cbc:EstimatedOverallContractAmount
  • currency: the currencyID attribute of the same element (.//cac:ProcurementProject/cac:RequestedTenderTotal/cbc:EstimatedOverallContractAmount/@currencyID)

For an Award it looks for these:

  • amount: .//efac:NoticeResult/cbc:TotalAmount
  • currency: the currencyID attribute of the same element (.//efac:NoticeResult/cbc:TotalAmount/@currencyID)

The extracted values are passed to a dedicated function that checks whether the currency is the euro and, if it is not, converts the amount to euros using the conversion rate for the date on which the Tender or Award was issued. This step is described in a dedicated section.
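The path selection and the euro check can be sketched as below. This is not the actual extract_eform_value: the names extract_amount and to_eur, the "award" key, the sample amount and the conversion rate are all hypothetical, the namespace URIs are assumptions, and the rate is assumed to mean "one unit of the currency equals this many euros". currencyID is read as an attribute of the amount element. The sketch uses xml.etree.ElementTree from the standard library rather than lxml.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Assumed namespace URIs; verify against a real notice.
CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
CAC = "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
EFAC = "http://data.europa.eu/p27/eforms-ubl-extension-aggregate-components/1"
NAMESPACES = {"cbc": CBC, "cac": CAC, "efac": EFAC}

# One search path per OCDS object; the "award" key is an assumed spelling.
VALUE_PATHS = {
    "tender": ".//cac:ProcurementProject/cac:RequestedTenderTotal"
              "/cbc:EstimatedOverallContractAmount",
    "award": ".//efac:NoticeResult/cbc:TotalAmount",
}


def extract_amount(elem: ET.Element, ocds_object: str) -> Optional[Tuple[float, Optional[str]]]:
    """Return (amount, currency) for the given object type, or None if absent."""
    node = elem.find(VALUE_PATHS[ocds_object], NAMESPACES)
    if node is None or not (node.text or "").strip():
        return None
    return float(node.text), node.get("currencyID")


def to_eur(amount: float, currency: str, rate_to_eur: float) -> float:
    """Leave euro amounts untouched; otherwise apply the (hypothetical) rate."""
    if currency == "EUR":
        return amount
    return round(amount * rate_to_eur, 2)


# Hypothetical notice fragment with a tender value but no award value.
SAMPLE = (
    f'<Notice xmlns:cbc="{CBC}" xmlns:cac="{CAC}" xmlns:efac="{EFAC}">'
    "<cac:ProcurementProject><cac:RequestedTenderTotal>"
    '<cbc:EstimatedOverallContractAmount currencyID="SEK">1000.00'
    "</cbc:EstimatedOverallContractAmount>"
    "</cac:RequestedTenderTotal></cac:ProcurementProject></Notice>"
)

form = ET.fromstring(SAMPLE)
amount, currency = extract_amount(form, "tender")
print(to_eur(amount, currency, rate_to_eur=0.1))  # 0.1 is an invented rate
```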

ContactPoint

The ContactPoint object consists of the following attributes:

ocds.ContactPoint(
  name=name,
  telephone=phone,
  email=email,
  faxNumber=None,
  url=None
)

Fax numbers are no longer supplied in eForms, but the field is still required by OCDS, so it is hardcoded to a null value here. The url is set to null for similar reasons.

The remaining values are extracted as follows:

  • name: ./efac:Company/cac:PartyName/cbc:Name
  • telephone: ./efac:Company/cac:Contact/cbc:Telephone
  • email: ./efac:Company/cac:Contact/cbc:ElectronicMail
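A minimal sketch of a contact-point extractor built on these three paths is shown below. The dataclass is a hypothetical stand-in for ocds.ContactPoint, the function name and sample organization are invented, and the namespace URIs are assumptions. It uses xml.etree.ElementTree from the standard library rather than lxml.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

# Assumed namespace URIs; verify against a real notice.
CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
CAC = "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
EFAC = "http://data.europa.eu/p27/eforms-ubl-extension-aggregate-components/1"
NAMESPACES = {"cbc": CBC, "cac": CAC, "efac": EFAC}


@dataclass
class ContactPoint:  # hypothetical stand-in for ocds.ContactPoint
    name: Optional[str]
    telephone: Optional[str]
    email: Optional[str]
    faxNumber: Optional[str] = None  # always null, as described above
    url: Optional[str] = None        # always null, as described above


def _text(org: ET.Element, path: str) -> Optional[str]:
    """Return the element text at path, or None if the element is missing."""
    node = org.find(path, NAMESPACES)
    return node.text if node is not None else None


def extract_contact_point(org: ET.Element) -> ContactPoint:
    return ContactPoint(
        name=_text(org, "./efac:Company/cac:PartyName/cbc:Name"),
        telephone=_text(org, "./efac:Company/cac:Contact/cbc:Telephone"),
        email=_text(org, "./efac:Company/cac:Contact/cbc:ElectronicMail"),
    )


# Hypothetical organization with a name and an email but no telephone.
SAMPLE = (
    f'<efac:Organization xmlns:cbc="{CBC}" xmlns:cac="{CAC}" xmlns:efac="{EFAC}">'
    "<efac:Company>"
    "<cac:PartyName><cbc:Name>City of Example</cbc:Name></cac:PartyName>"
    "<cac:Contact><cbc:ElectronicMail>procurement@example.org</cbc:ElectronicMail></cac:Contact>"
    "</efac:Company></efac:Organization>"
)

contact = extract_contact_point(ET.fromstring(SAMPLE))
print(contact)
```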

Address

Address fields are extracted from the following paths:

./efac:Company/cac:PostalAddress/cbc:StreetName
./efac:Company/cac:PostalAddress/cbc:CityName
./efac:Company/cac:PostalAddress/cbc:CountrySubentityCode
./efac:Company/cac:PostalAddress/cbc:PostalZone
./efac:Company/cac:PostalAddress/cac:Country/cbc:IdentificationCode
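One way to picture the address extractor is as a table from OCDS Address field names to the paths above. The pairing below is our assumption (in particular, mapping the subentity code to region and the country code to countryName), as are the function name, the namespace URIs and the sample data. A plain dict stands in for the OCDS Address object, and the sketch uses xml.etree.ElementTree from the standard library rather than lxml.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Assumed namespace URIs; verify against a real notice.
CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"
CAC = "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2"
EFAC = "http://data.europa.eu/p27/eforms-ubl-extension-aggregate-components/1"
NAMESPACES = {"cbc": CBC, "cac": CAC, "efac": EFAC}

POSTAL = "./efac:Company/cac:PostalAddress"

# Assumed pairing of OCDS Address fields with the eForms paths.
ADDRESS_PATHS = {
    "streetAddress": f"{POSTAL}/cbc:StreetName",
    "locality": f"{POSTAL}/cbc:CityName",
    "region": f"{POSTAL}/cbc:CountrySubentityCode",
    "postalCode": f"{POSTAL}/cbc:PostalZone",
    "countryName": f"{POSTAL}/cac:Country/cbc:IdentificationCode",
}


def extract_address(org: ET.Element) -> Dict[str, Optional[str]]:
    """Return one entry per address field; missing elements become None."""
    address: Dict[str, Optional[str]] = {}
    for field, path in ADDRESS_PATHS.items():
        node = org.find(path, NAMESPACES)
        address[field] = node.text if node is not None else None
    return address


# Hypothetical organization with a partial address.
SAMPLE = (
    f'<efac:Organization xmlns:cbc="{CBC}" xmlns:cac="{CAC}" xmlns:efac="{EFAC}">'
    "<efac:Company><cac:PostalAddress>"
    "<cbc:StreetName>1 Example Street</cbc:StreetName>"
    "<cbc:CityName>Exampleville</cbc:CityName>"
    "<cbc:PostalZone>12345</cbc:PostalZone>"
    "<cac:Country><cbc:IdentificationCode>DEU</cbc:IdentificationCode></cac:Country>"
    "</cac:PostalAddress></efac:Company></efac:Organization>"
)

print(extract_address(ET.fromstring(SAMPLE)))
```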