Ways to Find Datasets: Dataset Search
data.nasa.gov: The NASA (National Aeronautics and Space Administration) open data portal offers tens of thousands of datasets, which are often used in the annual NASA Space Apps Challenge.
UNdata: You can find data about agriculture, crime, education, energy, industry, labor, national accounts, population and tourism at UNdata. The statistics available through UNdata are produced by United Nations Statistics and Population Divisions as well as other United Nations agencies.
License and Privacy Considerations
Factual datasets, such as measurements, tabular data, land mass, reservoirs, or weather, are easier to use than datasets containing personal data, such as names or pictures of people. Personal data may raise privacy concerns, and the rules vary from country to country.
Occasionally you will find datasets that state they are for academic use only. The owners are usually fine with the dataset being used in a hackathon setting, but it is best to check. An example is the multimodal (image and text) Deep Learning For Disaster Response dataset (https://gitlab.com/awadailab/crisis_multimodal), which states that it is available for download only for academic purposes. In this case, we have confirmed with the author that she agrees to the dataset being used in hackathons, particularly those for social good. You can take a similar approach. Note that if you later sell the software you created in the hackathon, or make it part of a product, you should not use datasets that are marked for academic use.
Many datasets that specify a license use a Creative Commons (CC) license. An example of such a dataset is the earthquake data EEW. Be aware that the CC BY-NC variant means that the dataset cannot be used for commercial purposes.
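As a rough illustration of the point about the NC (NonCommercial) variant, the check can be sketched in a few lines of Python. The function name allows_commercial_use and the way license strings are normalized are assumptions made for this example and are not part of any dataset portal's API. The sketch only looks for the NC element in a Creative Commons identifier and is no substitute for reading the actual license text.

```python
def allows_commercial_use(license_id: str) -> bool:
    """Return False if a Creative Commons license ID contains the NC element.

    Hypothetical helper for illustration only: it inspects the identifier
    string (e.g. "CC BY-NC 4.0") and does not read the license text itself.
    """
    # Normalize variants such as "cc by nc", "CC-BY-NC-SA-4.0" into tokens.
    tokens = license_id.upper().replace("-", " ").replace("_", " ").split()
    if not tokens or tokens[0] not in {"CC", "CC0"}:
        raise ValueError(f"not a Creative Commons identifier: {license_id!r}")
    # The NC element marks the license as non-commercial.
    return "NC" not in tokens


if __name__ == "__main__":
    for lic in ["CC BY 4.0", "CC BY-NC 4.0", "CC-BY-NC-SA-4.0"]:
        print(lic, "->", allows_commercial_use(lic))
```

A check like this can help flag datasets to review before a hackathon project turns into a product. Datasets marked for academic use only, or with no license at all, still need to be confirmed with the owner.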