Awarded

Media Authentication Evaluation Datasets

Published

Supplier(s)

Advanced Skills Initiative Ltd

Value

298,650 GBP

Description

Summary of the work

This task should produce media authentication evaluation datasets - labelled, well-structured datasets of real and falsified media, across multiple modalities. These should include deepfakes, GAN-generated imagery, diffusion model outputs, image splicing, generated text, generated audio, and image-caption pairs. Additional subsets should be created by applying anti-forensic techniques.

Expected Contract Length

3-4 Months

Latest start date

Monday 9 January 2023

Budget Range

£300-350K

Why the Work is Being Done

The Machine Speed Strategic Analysis (MSSA) project seeks to apply AI to ISR of the sub-threshold information environment. Important aspects of this environment are online news and social media. Falsified content can be used within these domains to, e.g., spread disinformation or gather intelligence. We believe AI can be used to help check and validate content at scale, to help analysts in the open source intelligence space. To ensure that such AI techniques are trustworthy, we need to evaluate their performance using high-quality, unseen validation datasets.

Problem to Be Solved

Media authentication methods, including deepfake detection, often suffer from poor cross-dataset generalisation. They can appear to perform well on the standard datasets on which they are trained, but then lose effectiveness when applied to 'in the wild' data. Before we can give any media authentication tools to analysts, we need to ensure that they will generalise well. To do this, we need large, bespoke datasets created using a variety of media synthesis methods including deepfakes, diffusion models, and text generation. These data must not match those found in standard datasets (such as DFDC), and should include some anti-forensic techniques. See SOR for further detail.

Who Are the Users

The users will be Dstl technical staff. We will use the datasets to evaluate the performance of tools and techniques produced separately from this contract. To ensure effectiveness, the datasets will need to be well-labelled and split into subsets corresponding to separate modalities. Further, each modality should comprise clean / unsynthesised, synthesised (no anti-forensics) and synthesised (with anti-forensics) data.

Early Market Engagement

N/A

Work Already Done

N/A

Existing Team

No additional supplier. Regular correspondence with Dstl technical partner.

Current Phase

Not applicable

Skills & Experience

• AI & data science experience - track record of applying state-of-the-art techniques, particularly in the fields of computer vision, image processing, and natural language generation.
• Experience in the creation and curation of datasets, including an appreciation of licensing considerations and constraints.
• Experience with the verification and validation of AI outputs (to ensure that high quality data are created).

Nice to Haves

• AI & data science experience - track record of applying state-of-the-art techniques, particularly in the fields of computer vision, image processing, and natural language generation.
• Experience in the creation and curation of datasets, including an appreciation of licensing considerations and constraints.
• Experience with the verification and validation of AI outputs (to ensure that high quality data are created).
• Previous experience working with Dstl through other frameworks.
• Access to proprietary media dataset creation techniques, from which unique outputs can be produced.

Work Location

N/A

Working Arrangements

Remote work, with possibility for occasional face-to-face meetings if required. Regular catch-up via MS Teams (likely every two weeks or as discussed with Dstl technical partner).

Security Clearance

BPSC (work will be carried out at OFFICIAL)

Additional T&Cs

N/A

No. of Suppliers to Evaluate

5

Proposal Criteria

• Technical solution
• Approach and methodology
• How the approach or solution meets user needs
• How the approach or solution meets your organisation's policy or goal
• Estimated timeframes for the work
• How they've identified risks and dependencies and offered approaches to manage them
• Team structure
• Value for money

Cultural Fit Criteria

• A willingness to hold regular catch-up / sprint meetings with Dstl technical staff, and the ability to adjust work based on feedback in such meetings.
• Evidence of transparent and collaborative decision making
• Willingness to use MS Teams for remote meetings

Payment Approach

Fixed price

Evaluation Weighting

Technical competence 70%
Cultural fit 10%
Price 20%

Questions from Suppliers

1. Please confirm that only datasets will be a required output.
Answer: Deliverables will be: 1) Review of literature and open source tools to enable a plan for novel and unseen data to be produced. 2) Labelled, well-structured datasets of real and falsified media, across multiple modalities. 3) Report and presentation summarising the work undertaken.

2. Please confirm the expected classification level that curated and/or synthesised datasets would be held at.
Answer: Classification is OFFICIAL.

3. The first three questions in the nice to have section are the same as the essential skills question. Can you confirm if both sets of questions are to be answered or if the questions will be changed?
Answer: Apologies, the essential skills are: 1) AI & data science experience - track record of applying state-of-the-art techniques, particularly in the fields of computer vision, image processing, and natural language generation. 2) Experience in the creation and curation of datasets, including an appreciation of licensing considerations and constraints. 3) Experience with the verification and validation of AI outputs (to ensure that high quality data are created). Nice to have are: 1) Previous experience working with Dstl through other frameworks. 2) Access to proprietary media dataset creation techniques, from which unique outputs can be produced.

4. Please confirm that the repetition of the ‘essential skills and experience’ attributes under the ‘nice-to-have skills and experience’ criteria (with the addition of two further attributes) was made in error. Can we assume that repeating our responses to the previous heading will be acceptable?
Answer: This was made in error. Apologies, the essential skills are: 1) AI & data science experience - track record of applying state-of-the-art techniques, particularly in the fields of computer vision, image processing, and natural language generation. 2) Experience in the creation and curation of datasets, including an appreciation of licensing considerations and constraints. 3) Experience with the verification and validation of AI outputs (to ensure that high quality data are created).

5. Please also confirm your preference for how these are to be delivered, i.e. will a data platform for storage/retrieval be required?
Answer: 3rd party data is not required. Open source, unaltered data can be used to create the synthetic datasets. However, the use of data that has not previously been used, e.g. in deepfake datasets, would be advantageous, ensuring the authentication datasets are as representative of ‘in-the-wild’ conditions as possible. What we do not want is for our held back evaluation data to contain a large amount of data that are also likely to be used in the training of authentication algorithms. A data platform should not be required; we anticipate that a structure, e.g. zip archives, will be sufficient. Data may be delivered using an encrypted drive.

6. Please confirm whether plans for purchasing 3rd party data is a must-have in the proposal, or whether a mixture of open-source and synthetic datasets would be sufficient.
Answer: 3rd party data is not required. Open source, unaltered data can be used to create the synthetic datasets. However, the use of data that has not previously been used, e.g. in deepfake datasets, would be advantageous, ensuring the authentication datasets are as representative of ‘in-the-wild’ conditions as possible. What we do not want is for our held back evaluation data to contain a large amount of data that are also likely to be used in the training of authentication algorithms. A data platform should not be required; we anticipate that a structure, e.g. zip archives, will be sufficient. Data may be delivered using an encrypted drive.

7. Of the questions posed, 3 of them seem to be repeated; 3 of the ‘essential skills’ are repeated in the nice-to-haves. Can you clarify whether this is intentional and if we should respond to both sets?
Answer: Apologies, the essential skills are: 1) AI & data science experience - track record of applying state-of-the-art techniques, particularly in the fields of computer vision, image processing, and natural language generation. 2) Experience in the creation and curation of datasets, including an appreciation of licensing considerations and constraints. 3) Experience with the verification and validation of AI outputs (to ensure that high quality data are created). Nice to have are: 1) Previous experience working with Dstl through other frameworks. 2) Access to proprietary media dataset creation techniques, from which unique outputs can be produced.

8. Cost of procuring datasets: who will bear this, or do we factor it into the 300-350k budget?
Answer: All costs should be factored into the £300-350k budget. To be clear, the use of existing open data for the category of ‘real’ datasets is not prohibited, though the addition of unseen data would be considered beneficial.

9. Would you be able to provide further context on the statement ‘including an appreciation of licensing considerations and constraints’?
Answer: Some datasets, and some data generation / synthesis methods, have non-commercial clauses in their licence agreements that prevent their use in this task. In some cases, alternate implementations of the same / similar techniques exist which avoid this constraint. Further, some tools are copy-left, meaning that we cannot modify any code, or use any code to create a new tool, without open-sourcing the output (not an acceptable option for us). As such, these copy-left tools can only be used ‘as-is’ to create data (which should largely not be a problem for us as this is a dataset creation task).

10. Reference previous experience working with DSTL through other frameworks: can you confirm if this includes previous work with DSTL on DOS5?
Answer: Yes, the ‘other’ should be ignored in this instance. Experience working with Dstl through any commercial framework is welcomed.

11. ‘Access to proprietary media dataset creation techniques’. Please could you clarify if you have any proprietary techniques in mind, or whether you have already engaged with prospective suppliers who have proprietary techniques?
Answer: This was not written with any particular techniques in mind, nor have we engaged with suppliers in this regard. Rather, this means that if a supplier has already developed a proprietary generation technique (e.g., a deepfake creation algorithm) this would expedite the creation of bespoke dataset(s).
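
Illustrative note: the subset structure requested in the description (one subset per modality, each holding clean / unsynthesised, synthesised, and synthesised-with-anti-forensics data) could be recorded in a simple labelled manifest. The Python sketch below is a minimal illustration only. The modality names, the `Sample` fields, and the `validate` and `subset_counts` helpers are all assumptions made for this example; none of them come from the notice or the SOR.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed label vocabularies; the notice does not fix these names.
MODALITIES = frozenset({"image", "video", "audio", "text", "image_caption"})
CONDITIONS = frozenset({"clean", "synthesised", "synthesised_anti_forensic"})


@dataclass(frozen=True)
class Sample:
    """One manifest row describing a single media item (hypothetical schema)."""
    path: str                     # location inside the delivered archive
    modality: str                 # one of MODALITIES
    condition: str                # one of CONDITIONS
    method: Optional[str] = None  # synthesis method, e.g. "diffusion"; None if clean


def validate(sample: Sample) -> None:
    """Raise ValueError if the labels on a sample are inconsistent."""
    if sample.modality not in MODALITIES:
        raise ValueError(f"unknown modality: {sample.modality!r}")
    if sample.condition not in CONDITIONS:
        raise ValueError(f"unknown condition: {sample.condition!r}")
    if sample.condition == "clean" and sample.method is not None:
        raise ValueError("clean samples must not name a synthesis method")
    if sample.condition != "clean" and not sample.method:
        raise ValueError("synthesised samples must name a synthesis method")


def subset_counts(samples: Iterable[Sample]) -> Counter:
    """Validate every sample and count items per (modality, condition) subset."""
    counts: Counter = Counter()
    for sample in samples:
        validate(sample)
        counts[(sample.modality, sample.condition)] += 1
    return counts
```

Calling `subset_counts` on a list of `Sample` rows gives a quick check that every modality has all three conditions populated before the archives are packaged.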

Timeline

Publish date

a year ago

Award date

a year ago

Buyer information

Defence Science & Technology Laboratory
