This repo collects the code and examples needed to use the MOSTLY AI SDK in Amazon SageMaker.
The MOSTLY AI SDK is an open-source library that generates synthetic data that is highly representative, highly realistic, and considered 'as good as real'. Because the synthetic data maintains high accuracy while protecting the privacy of your subjects, you can openly process it and share it with others.
Benefits
- Trust and Risk Mitigation: By utilizing synthetic data generated by the platform, you can effectively mitigate privacy risks associated with sensitive information. Comply with data protection regulations and build trust with stakeholders.
- AI Explainability: Use the MOSTLY AI Synthetic Data Platform to enhance the explainability of your AI models. Leverage synthetic data that accurately represents your original data's statistical characteristics and referential integrity.
- Fraud and Anomaly Detection Training: Our product empowers you to train your fraud and anomaly detection models using the generated synthetic data. Enhance your ability to identify and mitigate potential threats effectively.
- Bias Mitigation: Leverage the power of the MOSTLY AI Synthetic Data Platform to address bias in your data. Generate diverse and representative synthetic datasets to foster fairness and inclusivity in your AI applications.
Use Cases
- Data Democratization: Empower your organization by democratizing access to privacy-compliant synthetic data. Enable stakeholders across departments to leverage valuable insights securely.
- Data Anonymization: Protect the privacy of your subjects and comply with data privacy regulations. Utilize synthetic data that is immune to re-identification attacks.
- Realistic Test Data: Generate synthetic data that accurately reflects real-world scenarios. Enable comprehensive testing and validation of your systems and algorithms.
- Fairness and Explainability: Use synthetic data to address bias and enhance transparency in AI models. Ensure fair and explainable outcomes.
- Cross-Border Data Sharing: Safely share synthetic data across borders, overcoming legal and privacy barriers. Preserve the value and representativeness of the original data.
- Data Augmentation: Amplify your datasets with synthetic data to increase sample sizes, improve model performance, and explore "what-if" scenarios.
- Data Diversity: Foster diversity in your datasets by generating synthetic data that captures a wide range of characteristics. Ensure comprehensive and unbiased analysis.
Copy the example folder to your notebook folder, then open the notebook mostlyai-sdk-sagemaker-example.ipynb to run through the example.
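
The copy step above can also be done from a terminal or a notebook cell. The following is a minimal sketch using only the Python standard library. The helper name copy_example is hypothetical, and both paths in the commented call are assumptions: "example" stands for the example folder in your local checkout of this repo, and /home/ec2-user/SageMaker is the usual home folder on a SageMaker notebook instance, which may differ in your environment.

```python
import shutil
from pathlib import Path


def copy_example(src: str, dst_root: str) -> Path:
    """Copy the folder `src` into `dst_root` and return the new folder's path.

    Hypothetical helper for illustration; it is not part of the MOSTLY AI SDK.
    """
    src_path = Path(src)
    if not src_path.is_dir():
        raise FileNotFoundError(f"example folder not found: {src_path}")
    target = Path(dst_root) / src_path.name
    # dirs_exist_ok=True (Python 3.8+) lets the copy be re-run without failing
    # when the target folder already exists; existing files are overwritten.
    shutil.copytree(src_path, target, dirs_exist_ok=True)
    return target


# Hypothetical call; both paths are assumptions, adjust them to your setup:
# copy_example("example", "/home/ec2-user/SageMaker")
```

After the copy, the notebook mostlyai-sdk-sagemaker-example.ipynb should appear inside the copied folder in your notebook file browser.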