GitHub jumps on the bandwagon and will use your data to train AI

GitHub updated how it uses data to improve AI-powered coding assistance. Starting April 24, interaction data from Copilot Free, Pro, and Pro+ users may be used to train and improve GitHub’s models unless users opt out. Copilot Business and Copilot Enterprise users are not included in this change.


Users who have already opted out do not need to take further action, as their preferences remain in place.

For users who do not opt out, GitHub may use interaction data to improve model performance. The stated goals are a better understanding of development workflows, more accurate code suggestions, and earlier identification of potential issues.

GitHub previously trained its models on publicly available data and curated code samples. The company now uses user interaction data, where permitted, to refine its models.

The company may collect prompts sent to Copilot, generated suggestions, accepted or modified outputs, code context, comments and documentation, file names, repository structure, and feedback on suggestions. This data supports service operation and, unless the user has opted out, model training.

Interaction data from Copilot Business and Enterprise users, users who have opted out, and enterprise-owned repositories is not used for training. GitHub states that content from private repositories, issues, and discussions “at rest” is not used to train models. Copilot processes code during use to provide suggestions, and data from these interactions may be used for training only if the user has not opted out.

The company may share data used for model improvement with its affiliates, including Microsoft. It does not share this data with independent third-party AI model providers.

“We believe the future of AI-assisted development depends on real-world interaction data from developers like you. It’s why we’re using Microsoft interaction data for model training and will begin using interaction data from GitHub employees as well,” Mario Rodriguez, CPO at GitHub, said.

“If you choose to help us improve our models with your interaction data, thank you. Your contributions make a meaningful difference in building AI tools that serve the entire developer community. If you prefer not to participate, that’s fine too—you will still be able to take full advantage of the AI features you know and love,” Rodriguez concluded.
