{"id":74617,"date":"2024-04-16T06:00:00","date_gmt":"2024-04-16T13:00:00","guid":{"rendered":"https:\/\/blog.mozilla.org\/?p=74617"},"modified":"2024-08-27T12:31:59","modified_gmt":"2024-08-27T19:31:59","slug":"open-source-llms-large-language-models-mozilla-ai","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/en\/mozilla\/ai\/open-source-llms-large-language-models-mozilla-ai\/","title":{"rendered":"Open source in the age of LLMs"},"content":{"rendered":"\n<p>(To read the complete Mozilla.ai publication featuring all our OSS contributions, please visit the <a href=\"https:\/\/blog.mozilla.ai\/open-source-in-the-age-of-llms\/\">Mozilla.ai blog<\/a>.)<\/p>\n\n\n\n<p class=\"has-text-align-left\">Like that of our parent company, Mozilla.ai\u2019s <a href=\"https:\/\/blog.mozilla.ai\/introducing-mozilla-ai-investing-in-trustworthy-ai\/\">founding story<\/a> is rooted in open-source principles and community collaboration. Since our start last year, our key focus has been exploring state-of-the-art methods for evaluating and fine-tuning large language models (LLMs).<\/p>\n\n\n\n<p>Throughout this process, we\u2019ve been diving into the open-source ecosystem around LLMs. What we\u2019ve found is an electric environment where everyone is building. 
As Nathan Lambert <a href=\"https:\/\/www.interconnects.ai\/p\/they-want-to-learn\">writes in his post<\/a>, \u201cIt\u2019s 2024, and they just want to learn.\u201d<\/p>\n\n\n\n<p>\u201cWhile everything is on track across multiple communities, that also unlocks the ability for people to tap into excitement and energy that they\u2019ve never experienced in their career (and maybe lives).\u201d<\/p>\n\n\n\n<p>The energy in the space, with <a href=\"https:\/\/originality.ai\/blog\/huggingface-statistics\">new model releases every day<\/a>, is made even more exciting by the promise of open source, where, as I\u2019ve <a href=\"https:\/\/twitter.com\/vboykis\/status\/1741270933979984052\">observed before<\/a>, anyone can make a contribution and have it be meaningful regardless of credentials, and there are plenty of contributions to be made. If the fundamental question of the web is, \u201c<a href=\"https:\/\/www.ftrain.com\/wwic\">Why wasn\u2019t I consulted<\/a>,\u201d open source in machine learning today offers the answer: \u201cYou are, as long as you can productively contribute PRs. Come have a seat at the table.\u201d<\/p>\n\n\n\n<p>Even though some of us have been active in open-source work for some time, building and contributing to it at a team and company level is a qualitatively different and rewarding feeling. And it&#8217;s been especially fun watching our upstream contributions make their way into both the communities and our own projects.&nbsp;<\/p>\n\n\n\n<p>At a high level, here\u2019s what we\u2019ve learned about the process of successful open-source contributions:&nbsp;<\/p>\n\n\n\n<p><strong>1. <a href=\"https:\/\/jvns.ca\/blog\/2017\/08\/06\/contributing-to-open-source\/\">Start small<\/a> when you\u2019re starting with a new project. 
<\/strong>If you\u2019re contributing to a new project for the first time, it takes time to understand the project\u2019s norms: how fast maintainers review, who the key people are, their preferences for communication, code review style, build systems, and more. It\u2019s like starting a new job entirely from scratch. <br><br>Be gentle with both yourself and the reviewers, and pick something like a documentation task or an issue labeled \u201cgood first issue,\u201d just to get a feel for how things work.<\/p>\n\n\n\n<p><strong>2. Be easy to work with. <\/strong>There are specific norms around working with open source, and they closely <a href=\"https:\/\/jacobian.org\/2017\/nov\/1\/you-have-two-jobs\/\">follow this fantastic post<\/a> on how to be an effective developer: \u201cAs a developer you have two jobs: to write code, and be easy to work with.\u201d<\/p>\n\n\n\n<p>In open source, being easy to work with means different things to different people, but I generally see it as:<\/p>\n\n\n\n<p>a. Submitting clean PRs with working code that passes tests or gets as close as possible. No one wants to fix your build.&nbsp;<\/p>\n\n\n\n<p>b. Making small code changes by yourself, and proposing larger architecture changes in a group before getting them down in code for approval. Asking \u201cWhat do you think about this?\u201d is fine, but always try to propose a solution as well instead of posing more problems to maintainers: they are busy!<\/p>\n\n\n\n<p>c. Writing unit tests if you\u2019re adding a significant feature, where significant is anything more than a single line of code.&nbsp;<\/p>\n\n\n\n<p>d. Remembering <a href=\"https:\/\/www.meyerperin.com\/posts\/2022-04-02-chestertons-fence.html\">Chesterton\u2019s fence<\/a>: that code is there for a reason, so study it before you suggest removing it.&nbsp;<\/p>\n\n\n\n<p><strong>3. Assume good intent, but make intent explicit. 
<\/strong>When you\u2019re working with people in writing, asynchronously, potentially in other countries or time zones, it\u2019s extremely easy for context, tone, and intent to get lost in translation. Implicit knowledge <a href=\"https:\/\/vickiboykis.com\/2021\/03\/26\/the-ghosts-in-the-data\/\">becomes rife.<\/a> Assume people are doing the best they can with what they have, and if you don\u2019t understand something, ask about it first.<\/p>\n\n\n\n<p><strong>4. The AI ecosystem moves quickly.<\/strong> Extremely quickly. New models come out every day and are implemented in downstream modules by the next. Make sure you\u2019re OK with this speed and can match the pace. Before you open PRs, follow issues on the repo, and follow the repo itself, so you get a sense of how quickly things move and are approved. If you\u2019re into fast-moving projects, jump in. Otherwise, pick one that moves at a slower cadence.<\/p>\n\n\n\n<p><strong>5. The LLM ecosystem is currently bifurcated between HuggingFace and OpenAI compatibility: <\/strong>An <a href=\"https:\/\/vickiboykis.com\/2024\/02\/28\/gguf-the-long-way-around\/#how-we-use-llm-artifacts\">interesting pattern has developed<\/a> in my development work on open source in LLMs. It\u2019s become clear to me that, in this new space of developer tooling around transformer-style language models at an industrial scale, you generally end up downstream of one of two interfaces:<\/p>\n\n\n\n<p>a. Models that are trained and hosted using HuggingFace libraries, and particularly the HuggingFace hub, as infrastructure.<\/p>\n\n\n\n<p>b. Models that are available via API endpoints, particularly as hosted by OpenAI.<br><br>If you want to be successful in this space today, you as a library or service provider have to be able to interface with both of these.&nbsp;<\/p>\n\n\n\n<p><strong>6. Sunshine is the best disinfectant. 
<\/strong>As the recent <a href=\"https:\/\/boehs.org\/node\/everything-i-know-about-the-xz-backdoor\">xz issue proved<\/a>, open code is better code and issues get fixed more quickly. This means: don\u2019t be afraid to work out in the open. All code has bugs, even yours and mine, and discovering those bugs is a natural process of learning and developing better code rather than a personal failing.&nbsp;<\/p>\n\n\n\n<p>We\u2019re looking forward to continuing our contributions, upstreaming them, and learning from them as we continue our product development work.&nbsp;<br><br>Read the whole publication and subscribe to future ones on the <a href=\"http:\/\/mozilla.ai\">Mozilla.ai blog<\/a>.<\/p>\n\n\n","protected":false},"excerpt":{"rendered":"<p>(To read the complete Mozilla.ai publication featuring all our OSS contributions, please visit the Mozilla.ai blog) Like our parent company, Mozilla.ai\u2019s founding story is rooted in open-source principles and community collaboration. Since our start last year, our key focus has been exploring state-of-the-art methods for evaluating and fine-tuning large language models (LLMs). Throughout this process, we\u2019ve [&hellip;]<\/p>\n","protected":false},"author":144,"featured_media":74618,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[464197],"tags":[317823,464225,464224],"coauthors":[464234],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v22.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Open Source In The Age Of LLMs<\/title>\n<meta name=\"description\" content=\"The process of providing successful open-source contributions has multiple important layers. 
Here&#039;s what we&#039;ve learned.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/blog.mozilla.org\/en\/mozilla\/ai\/open-source-llms-large-language-models-mozilla-ai\/\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/ai\/open-source-llms-large-language-models-mozilla-ai\/\",\"url\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/ai\/open-source-llms-large-language-models-mozilla-ai\/\",\"name\":\"Open Source In The Age Of LLMs\",\"isPartOf\":{\"@id\":\"https:\/\/blog.mozilla.org\/en\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/ai\/open-source-llms-large-language-models-mozilla-ai\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/ai\/open-source-llms-large-language-models-mozilla-ai\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2024\/04\/photo-1529061498291-c978a148ee6e.jpg\",\"datePublished\":\"2024-04-16T13:00:00+00:00\",\"dateModified\":\"2024-08-27T19:31:59+00:00\",\"author\":{\"@id\":\"https:\/\/blog.mozilla.org\/en\/#\/schema\/person\/33edd7d4d73723140487082573041c83\"},\"description\":\"The process of providing successful open-source contributions has multiple important layers. 
Here's what we've learned.\",\"breadcrumb\":{\"@id\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/ai\/open-source-llms-large-language-models-mozilla-ai\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/blog.mozilla.org\/en\/mozilla\/ai\/open-source-llms-large-language-models-mozilla-ai\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/ai\/open-source-llms-large-language-models-mozilla-ai\/#primaryimage\",\"url\":\"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2024\/04\/photo-1529061498291-c978a148ee6e.jpg\",\"contentUrl\":\"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2024\/04\/photo-1529061498291-c978a148ee6e.jpg\",\"width\":2000,\"height\":1333,\"caption\":\"Photo by Wes Hicks \/ Unsplash\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/blog.mozilla.org\/en\/mozilla\/ai\/open-source-llms-large-language-models-mozilla-ai\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/blog.mozilla.org\/en\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Open source in the age of LLMs\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/blog.mozilla.org\/en\/#website\",\"url\":\"https:\/\/blog.mozilla.org\/en\/\",\"name\":\"The Mozilla Blog\",\"description\":\"News and Updates about Mozilla\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/blog.mozilla.org\/en\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/blog.mozilla.org\/en\/#\/schema\/person\/33edd7d4d73723140487082573041c83\",\"name\":\"Mozilla\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/blog.mozilla.org\/en\/#\/schema\/person\/image\/f32381c01597770b1131dff44b9d6de1\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/f84bd67e8e3ab3bcc9676910aecf5700?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/f84bd67e8e3ab3bcc9676910aecf5700?s=96&d=mm&r=g\",\"caption\":\"Mozilla\"},\"url\":\"https:\/\/blog.mozilla.org\/en\/author\/mozilla\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Open Source In The Age Of LLMs","description":"The process of providing successful open-source contributions has multiple important layers. Here's what we've learned.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/blog.mozilla.org\/en\/mozilla\/ai\/open-source-llms-large-language-models-mozilla-ai\/","schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/blog.mozilla.org\/en\/mozilla\/ai\/open-source-llms-large-language-models-mozilla-ai\/","url":"https:\/\/blog.mozilla.org\/en\/mozilla\/ai\/open-source-llms-large-language-models-mozilla-ai\/","name":"Open Source In The Age Of 
LLMs","isPartOf":{"@id":"https:\/\/blog.mozilla.org\/en\/#website"},"primaryImageOfPage":{"@id":"https:\/\/blog.mozilla.org\/en\/mozilla\/ai\/open-source-llms-large-language-models-mozilla-ai\/#primaryimage"},"image":{"@id":"https:\/\/blog.mozilla.org\/en\/mozilla\/ai\/open-source-llms-large-language-models-mozilla-ai\/#primaryimage"},"thumbnailUrl":"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2024\/04\/photo-1529061498291-c978a148ee6e.jpg","datePublished":"2024-04-16T13:00:00+00:00","dateModified":"2024-08-27T19:31:59+00:00","author":{"@id":"https:\/\/blog.mozilla.org\/en\/#\/schema\/person\/33edd7d4d73723140487082573041c83"},"description":"The process of providing successful open-source contributions has multiple important layers. Here's what we've learned.","breadcrumb":{"@id":"https:\/\/blog.mozilla.org\/en\/mozilla\/ai\/open-source-llms-large-language-models-mozilla-ai\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/blog.mozilla.org\/en\/mozilla\/ai\/open-source-llms-large-language-models-mozilla-ai\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/blog.mozilla.org\/en\/mozilla\/ai\/open-source-llms-large-language-models-mozilla-ai\/#primaryimage","url":"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2024\/04\/photo-1529061498291-c978a148ee6e.jpg","contentUrl":"https:\/\/blog.mozilla.org\/wp-content\/blogs.dir\/278\/files\/2024\/04\/photo-1529061498291-c978a148ee6e.jpg","width":2000,"height":1333,"caption":"Photo by Wes Hicks \/ Unsplash"},{"@type":"BreadcrumbList","@id":"https:\/\/blog.mozilla.org\/en\/mozilla\/ai\/open-source-llms-large-language-models-mozilla-ai\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/blog.mozilla.org\/en\/"},{"@type":"ListItem","position":2,"name":"Open source in the age of 
LLMs"}]},{"@type":"WebSite","@id":"https:\/\/blog.mozilla.org\/en\/#website","url":"https:\/\/blog.mozilla.org\/en\/","name":"The Mozilla Blog","description":"News and Updates about Mozilla","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/blog.mozilla.org\/en\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/blog.mozilla.org\/en\/#\/schema\/person\/33edd7d4d73723140487082573041c83","name":"Mozilla","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/blog.mozilla.org\/en\/#\/schema\/person\/image\/f32381c01597770b1131dff44b9d6de1","url":"https:\/\/secure.gravatar.com\/avatar\/f84bd67e8e3ab3bcc9676910aecf5700?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/f84bd67e8e3ab3bcc9676910aecf5700?s=96&d=mm&r=g","caption":"Mozilla"},"url":"https:\/\/blog.mozilla.org\/en\/author\/mozilla\/"}]}},"_links":{"self":[{"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/posts\/74617"}],"collection":[{"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/users\/144"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/comments?post=74617"}],"version-history":[{"count":0,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/posts\/74617\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/media\/74618"}],"wp:attachment":[{"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/media?parent=74617"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/categories?post=74617"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/tags?post=7461
7"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/blog.mozilla.org\/en\/wp-json\/wp\/v2\/coauthors?post=74617"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}