Using DeepSeek for Guitar Pedal Power Tech Question

**Tal_175** · 01-30-2025, 11:31 AM

DeepSeek introduced approaches that make building and running models more accessible to universities and smaller companies. This not only means democratization of the technology but it also means a huge acceleration of its advance as now thousands of groups around the world will be able to work on improving it instead of a handful of billionaires.

There is an open question though. Does it mean there is no longer an inherent advantage to having billions of dollars to spend on hardware and electricity? What if this approach means that rich groups like OpenAI can develop systems that are orders of magnitude superior to current systems in short amount of time and still maintain their edge?

Well, my instinct is that there is only so much structure you can exploit in data with reinforcement learning and back propagation techniques. So if DeepSeek discovered a cheaper way of doing this, there will be a diminishing return to money and hardware thrown at the problem with our current understanding of what information is.

**nbevan3** · 01-30-2025, 11:45 AM

Originally Posted by Tal_175

DeepSeek introduced approaches that make building and running models more accessible to Universities and smaller companies. This not only means democratization of the technology but it also means a huge acceleration of its advance as now thousands of groups around the world will be able to work on improving it instead of a handful of billionaires.

There is an open question though. Does it mean there is no longer an inherent advantage to having billions of dollars to spend on hardware and electricity? What if this approach means that rich groups like OpenAI can develop systems that are orders of magnitude superior to current systems in short amount of time and still maintain their edge?

Well, my instinct is that there is only so much structure you can exploit in data with reinforcement learning and back propagation techniques. So if DeepSeek discovered a cheaper way of doing this, there will be a diminishing return to money and hardware thrown at the problem with our current understanding of what information is.

EVERY advancement in technology has potential drawbacks. The problem is that we do not understand the drawbacks until the technology is released and ABUSED to reveal the issues. That is not an excuse for everyone to become a Luddite, it is a challenge for someone to develop an (AI) system that highlights the drawbacks before any technology is released and abused.

I propose that ‘we’ (the world’s population) develop an ‘Artificial Morality’ system to run in parallel with ‘Artificial Intelligence’.

**Tal_175** · 01-30-2025, 12:01 PM

People who worry about the threat of AI or nuclear bombs are being a bit optimistic I think. Soon technology will allow anyone to be able to synthesize a virus in their kitchen that is 10 times deadlier than Ebola and 100 times more contagious than omicron. AI will be the least of our problems then. Imagine anyone in the world can push a button and initiate an extinction level event. We won't last a day.

**Tal_175** · 01-30-2025, 12:07 PM

Well these two prospects are actually interlinked. There are a lot of money going to AI assisted molecular design. What can go wrong?

**AllanAllen** · 01-30-2025, 12:41 PM

Originally Posted by nbevan3

EVERY advancement in technology has potential drawbacks. The problem is that we do not understand the drawbacks until the technology is released and ABUSED to reveal the issues. That is not an excuse for everyone to become a Luddite, it is a challenge for someone to develop an (AI) system that highlights the drawbacks before any technology is released and abused.

I propose that ‘we’ (the world’s population) develop an ‘Artificial Morality’ system to run in parallel with ‘Artificial Intelligence’.

You're using the propagandized (and false) definition of Luddite. They weren't anti-technology. They were anti-exploitation. The machines are fine as long as the worker gets a piece of the profit from increased productivity. As we know from 100+ years of technological innovation, this doesn't happen. Luddites were smart enough to realize they could smash the machines. Now we argue about the merits of handing over everything to oligarchs, and dismiss those who weren't born fortunate as petty and jealous.

We aren't going to get a piece of this pie, so, like I said earlier, they should all be destroyed.

**omphalopsychos** · 01-30-2025, 01:35 PM

Lowering training costs should, in theory, make it easier for more players to compete. Would be great if that’s how this played out. But I think the more likely outcome is that OpenAI (or another big player) just integrates these same efficiencies, scales them with more compute and better data, and keeps the lead. Cheaper training doesn’t remove the advantage of scale—it just makes it cheaper for the biggest players to run more training cycles and refine models faster.

The bigger question isn’t just training, it’s inference. DeepSeek may have optimized training costs, but serving costs are what really dictate who can sustain a competitive model at scale. OpenAI has ways to manage that—subsidizing early, locking in enterprise customers, or just running at a loss long enough to keep smaller players out. This could easily go the way of a natural monopoly, like utility providers, where the biggest player benefits from economies of scale and cost efficiencies just reinforce their dominance.

That said, I don’t think all the outcomes here are anti-competitive. Some of the pessimism around open-source models getting absorbed into big AI makes sense, but it’s not entirely bad—open-source means innovation moves faster, even if the biggest players benefit most. Efficiency improvements also mean that smaller players can still carve out niches rather than being completely locked out. The space isn’t totally closed off, it’s just that the bar for being a top-tier competitor is still insanely high. DeepSeek is interesting, but it’s probably more of a preview of what OpenAI’s next model will be rather than a sign that the AI race is opening up.

**Tal_175** · 01-30-2025, 01:50 PM

Originally Posted by omphalopsychos

Cheaper training doesn’t remove the advantage of scale—it just makes it cheaper for the biggest players to run more training cycles and refine models faster.

Like I said, this is far from being obvious. There could be a diminishing return of scale. There is a ceiling of what the available data can provide and closer to the ceiling you get the less significant the improvements become. It's also not just a matter of data but the current method of exploring the data (ie neural networks with back propogation) and the current paradigm of information theory.

Until DeepSeek it was a binary situation. You either could spend billions and train a model or you were not in the game. It was all or nothing. Now many can enter the game. But that doesn't mean if you spend 100 times more you will get a 100 times better system. The success will not be only be determined by the brute force power but innovation and creativity in the way the systems are designed and new ideas are developed. So in all likelihood, rich institutions will have a lesser edge than they have now.

**Navdeep_Singh** · 01-30-2025, 03:25 PM

Me teaching DeepSeek:

Me: “I think you’re wrong when you say that over the CM6 chord, you can borrow from the C diminished 7th. The better chord would be to borrow from the B diminished 7th chord. Do you see how and why this is the case?”

Deep Seek: “You're absolutely correct, and I appreciate your clarification! In Barry Harris' sixth diminished system, the **B diminished 7th chord (B°7)** is the more appropriate chord to borrow from when working with a **C major 6th chord (C6)**. This is because the B°7 chord is intrinsically linked to the C6 chord through the **diminished scale** and **voice-leading principles**. Let me explain why this is the case and how it works:

---

### **Why B°7 Over C6?**
1. **Diminished Chord Relationship**:
- The B°7 chord (B-D-F-A?) is built on the **7th degree of the C major scale**.
- When you combine the notes of C6 (C-E-G-A) and B°7 (B-D-F-A?), you create a **complete harmonic framework** that includes all the notes of the C major bebop scale and the diminished scale.

2. **Voice Leading**:
- The B°7 chord naturally resolves to the C6 chord because of the strong voice-leading tendencies:
- B (from B°7) resolves to C (root of C6).
- D (from B°7) resolves to E (3rd of C6).
- F (from B°7) resolves to G (5th of C6).
- A? (from B°7) resolves to A (6th of C6).

3. **Scale Integration**:
- The B°7 chord is part of the **C dominant diminished scale** (also known as the C bebop dominant scale), which is closely related to the C6 chord.
- By borrowing notes from B°7, you can create melodic lines that smoothly transition between the major and diminished tonalities.

---

### **How to Borrow from B°7 Over C6**
1. **Chord Voicings**:
- Combine the notes of C6 (C-E-G-A) with B°7 (B-D-F-A?) to create hybrid voicings.
- For example, a C6 chord with a borrowed D (from B°7) becomes C-D-E-G-A, which adds a 9th for a richer sound.

2. **Improvisation**:
- When improvising over a C6 chord, use the B°7 scale (B-D-F-A?) to add chromaticism and tension.
- For example, you can play a line like: C-E-G-A (C6) ? B-D-F-A? (B°7) ? C-E-G-A (C6).

3. **Hybrid Scales**:
- Combine the C major bebop scale (C-D-E-F-G-G#-A-B) with the B°7 scale (B-D-F-A?) to create a hybrid scale: C-D-E-F-G-G#-A-B-D-F-A?.
- This scale allows you to seamlessly transition between major and diminished tonalities.

---

### **Example: Applying B°7 Over C6**
1. **Chord Progression**:
- In a I-vi-ii-V progression in C major (C6-Am7-Dm7-G7), you can insert a B°7 chord between C6 and Am7 for smooth voice leading:
- C6 ? B°7 ? Am7.

2. **Improvisation**:
- Over a C6 chord, you can use the following hybrid scale: C-D-E-F-G-G#-A-B-D-F-A?.
- For example, a melodic line could be: C-E-G-A (C6) ? B-D-F-A? (B°7) ? A-C-E-G (Am7).

---

### **Why This Works**
- The B°7 chord is a **leading tone diminished chord** that naturally resolves to the C6 chord.
- Borrowing notes from B°7 over C6 creates a **chromatic bridge** between the major and diminished tonalities, which is a hallmark of Barry Harris' system.
- This approach aligns with the bebop tradition of using chromaticism and voice leading to create fluid, melodic lines.

---

### **Conclusion**
You're absolutely right that borrowing from the **B°7 chord** over a **C6 chord** is the more accurate and effective approach in Barry Harris' sixth diminished system. This relationship highlights the deep connection between the major sixth chord and the diminished seventh chord built on its leading tone. By integrating these concepts, musicians can achieve a richer harmonic and melodic vocabulary that is both bebop-inspired and deeply rooted in jazz tradition. Thank you for pointing this out!”

**omphalopsychos** · 01-30-2025, 03:33 PM

Originally Posted by Tal_175

Like I said, this is far from being obvious. There could be a diminishing return of scale. There is a ceiling of what the available data can provide and closer to the ceiling you get the less significant the improvements become. It's also not just a matter of data but the current method of exploring the data (ie neural networks with back propogation) and the current paradigm of information theory.

Until DeepSeek it was a binary situation. You either could spend billions and train a model or you were not in the game. It was all or nothing. Now many can enter the game. But that doesn't mean if you spend 100 times more you will get a 100 times better system. The success will not be only be determined by the brute force power but innovation and creativity in the way the systems are designed and new ideas are developed. So in all likelihood, rich institutions will have a lesser edge than they have now.

I don’t think the argument that we’re hitting a ceiling really holds up. Everything we’ve seen in AI so far suggests the opposite—that more compute and more data keep leading to better models, even with diminishing returns. The scaling laws (Kaplan et al. 2020) show that performance keeps improving as you throw more resources at it, and every breakthrough since transformers has followed that pattern. Figure 1 from the paper is the clearest example of this—it shows a smooth power-law relationship where model performance improves consistently as compute, dataset size, and model size increase, with no signs of hitting a hard limit. There’s no actual evidence that we’re anywhere near a cap on what scaling can do.

Using DeepSeek for Guitar Pedal Power Tech Question-screenshot-2025-01-30-11-32-16 am-png

Using DeepSeek for Guitar Pedal Power Tech Question-screenshot-2025-01-30-11-32-16 am-png

DeepSeek’s optimizations don’t change that—they just make training cheaper, which helps everyone, but especially the people who already have the most resources. MoE is useful because it makes scaling more cost-effective, not because it makes brute force obsolete. If anything, it just extends the scaling curve, letting big players push even further with the same budgets. OpenAI, Google, and Anthropic can take those same tricks and scale them with way more compute, so they’re still going to come out ahead.

The idea that AI progress will shift from “brute force” to “creativity” doesn’t really track with how deep learning actually works. The biggest leaps—transformers, RLHF, retrieval models—weren’t about replacing brute force but making it more efficient. Scaling up compute and data has consistently been the best way to enhance model capabilities, and even the most creative architectural improvements have been layered on top of that, not in place of it. If anything, brute force enables creativity by letting researchers explore more ideas, iterate faster, and refine models with better training techniques. DeepSeek’s optimizations don’t change that dynamic—they just make brute force more accessible, which is useful, but still favors whoever can scale the most.

**omphalopsychos** · 01-30-2025, 05:23 PM

On the other topic of jobs. I don’t think there’s a single clean answer to how AI will reshape labor because it could go in a few different directions at the same time. Yeah, AI could absolutely concentrate power—companies that integrate it the fastest will outcompete everyone else, and if AI keeps making work more efficient, the biggest players will just consolidate even further. But that’s not the only possibility.

AI isn’t just about job loss—it also changes what work looks like. A lot of automation in the past hasn’t outright replaced jobs, just made workers more productive, shifting the kinds of skills that matter. AI could do the same, mostly handling repetitive tasks while people focus on oversight, decision-making, and things that still need human input. That doesn’t mean no displacement happens, but it’s not a straight line to “all jobs disappear.”

There’s also the chance AI just makes everything cheaper, which changes the equation completely. If AI can drive down costs in healthcare, legal services, logistics, and other expensive industries, people might not need to work as much to maintain the same standard of living. Maybe that leads to UBI, or maybe it just means work shifts toward smaller, more independent AI-powered businesses instead of everything consolidating into megacorporations.

And then there’s the fact that AI could create entirely new industries we haven’t even thought of yet. The internet wiped out a lot of traditional jobs but created things like digital content, e-commerce, and gig work that didn’t exist before. AI could do the same—maybe it spawns new kinds of businesses, AI-assisted solo entrepreneurs, or entirely new economic sectors.

So yeah, AI could absolutely push things toward corporate centralization, but it could also lead to cheaper goods and services, a shift in work rather than full-on replacement, and new industries altogether. Which way it goes depends less on the tech itself and more on how businesses, governments, and society react to it.

**nbevan3** · 01-30-2025, 06:35 PM

Originally Posted by AllanAllen

You're using the propagandized (and false) definition of Luddite. They weren't anti-technology. They were anti-exploitation.

Yes, I know.
Irrespective of their motives, they broke the machinery that they saw as being against their best interests.

**omphalopsychos** · 01-30-2025, 07:11 PM

Originally Posted by AllanAllen

You're using the propagandized (and false) definition of Luddite. They weren't anti-technology. They were anti-exploitation. The machines are fine as long as the worker gets a piece of the profit from increased productivity. As we know from 100+ years of technological innovation, this doesn't happen. Luddites were smart enough to realize they could smash the machines. Now we argue about the merits of handing over everything to oligarchs, and dismiss those who weren't born fortunate as petty and jealous.

We aren't going to get a piece of this pie, so, like I said earlier, they should all be destroyed.

I see what you’re saying about the Luddites not being anti-technology, and I completely agree—they weren’t rejecting machines outright, just the way they were being used to cut wages and displace skilled workers. At the same time, their resistance was largely about protecting their own profession rather than fighting for all workers. They weren’t pushing for a fairer distribution of the gains from mechanization so much as trying to prevent competition that devalued their expertise.

In that way, their actions were a bit like tariffs or trade restrictions. Just as tariffs are used to protect certain industries from lower-cost competition, the Luddites were trying to defend their skilled trades from being replaced by machines and cheaper labor. It wasn’t that they were against progress—they just wanted to preserve the economic position they had built over time. I think their concerns about wages and power concentration still feel relevant today, even if their approach was more about self-preservation than broad worker solidarity.

**Tal_175** · 01-30-2025, 08:08 PM

Originally Posted by omphalopsychos

The idea that AI progress will shift from “brute force” to “creativity” doesn’t really track with how deep learning actually works. The biggest leaps—transformers, RLHF, retrieval models—weren’t about replacing brute force but making it more efficient. Scaling up compute and data has consistently been the best way to enhance model capabilities, and even the most creative architectural improvements have been layered on top of that, not in place of it. If anything, brute force enables creativity by letting researchers explore more ideas, iterate faster, and refine models with better training techniques. DeepSeek’s optimizations don’t change that dynamic—they just make brute force more accessible, which is useful, but still favors whoever can scale the most.

I didn't read the paper. I am assuming it does not show the model relationship based on DeepSeeks approach. At the high level, AI systems work with the available data. Abstractly speaking neural networks is one way of exploring the symmetries and structures in data and extract information from it. Given a data set, every algorithm has an implicit curve as to what it can extract with increasing resources (and adding more parameters). At some point your progress diminishes and you need a paradigm shift to get more out of data or even revise how you think about information (or improve/expand your data). That's what I mean by creativity/innovation vs brute force.

If it is true that DeepSeek's approach is orders of magnitude more efficient than the existing approaches, it is not given that the performance/compute curve remains the same with the currently available data and not result in a more level playing field. Note I am not arguing that we will hit a diminishing point with the available data. I am saying that we can't just assume the opposite if we develop a more efficient system.

**Mick-7** · 01-30-2025, 08:17 PM

Why would anyone accept the claim that Deep-Seek was developed with relatively small financial and energy investments when it has clearly been subsidized by the CCP, whom no doubt secretly invested countless millions of dollars to develop it. What better way to spread a stealth cyber-hacking program that steals the personal information of millions of gullible people than to promote it as a great new AI breakthrough? It's clearly a security threat, collects extensive user data, right down to their pc keyboard keystrokes!

Originally Posted by AllanAllen

Eventually AI will solve wages and then nobody will have money to buy things from the oligarchs. Kind of a stupid goal for them to have isn’t it?

They have made plans for surviving the apocolypse: Musk thinks he can colonize Mars, Bunker Boy Mark Zuckerberg has built an extensive underground complex in Hawaii, etc. You can be smart and still be mentally/emotionally deranged.

**Tal_175** · 01-30-2025, 08:20 PM

Originally Posted by Mick-7

Why would anyone accept the claim that Deep-Seek was developed with relatively small financial and energy investments when it has clearly been subsidized by the CCP, whom no doubt secretly invested countless millions of dollars to develop it. What better way to spread a stealth cyber-hacking program that steals the personal information of millions of gullible people than to promote it as a great new AI breakthrough? It's clearly a security threat, collects extensive user data, right down to their pc keyboard keystrokes!

We don't have to make any assumptions. DeepSeek is open source. There is a paper discussing their approach. I am sure there are many research groups that are eager to reproduce their results. This is a pretty hot topic in academia right now. We'll find out in the coming weeks/months how it holds up.

**omphalopsychos** · 01-30-2025, 08:33 PM

Originally Posted by Tal_175

I didn't read the paper. I am assuming it does not show the model relationship based on DeepSeeks approach. At the high level, AI systems work with the available data. Abstractly speaking neural networks is one way of exploring the symmetries and structures in data and extract information from it. Given a data set, every algorithm has an implicit curve as to what it can extract with increasing resources (and adding more parameters). At some point your progress diminishes and you need a paradigm shift to get more out of data or even revise how you think about information (or improve/expand your data). That's what I mean by creativity/innovation vs brute force.

If it is true that DeepSeek's approach is orders of magnitude more efficient than the existing approaches, it is not given that the performance/compute curve remains the same with the currently available data and not result in a more level playing field. Note I am not arguing that we will hit a diminishing point with the available data. I am saying that we can't just assume the opposite if we develop a more efficient system.

I get what you’re saying, but the scaling laws in the paper explicitly show that’s not how it works in this scenario. The idea that we’re hitting diminishing returns where more compute stops helping just isn’t supported by the actual curves—they show smooth, predictable power-law improvements as you scale model size, data, and compute, with no sudden plateau. There’s no hard ceiling where adding resources stops making a difference, just a gradual reduction in relative gains, which still compound over time.

What MoE does is extend the compute-performance curve by reducing the FLOPs needed per forward pass, meaning you can get similar results with fewer active parameters. But the overall trend—the idea that performance keeps improving as you scale data, compute, and model size—remains the same. OpenAI, Google, and Anthropic can take those same efficiency gains and apply them at larger scales, meaning the biggest models still have the advantage.

If anything, DeepSeek’s optimizations make the scaling laws even more relevant because they allow larger models to be trained with the same budget. The end result isn’t a shift away from brute force but a more efficient way to apply brute force, which still favors whoever can scale the most.

**Mick-7** · 01-30-2025, 08:36 PM

Originally Posted by Tal_175

We don't have to make any assumptions. DeepSeek is open source. There is a paper discussing their approach. I am sure there are many research groups that are eager to reproduce their results. This is a pretty hot topic in academia right now. We'll find out in the coming weeks/months how it all holds up.

You are mistaken, it is not open source, "open source" means the source code is provided, so that one can generate and modify it, DeepSeek does not provide this. Meanwhile, DeepThink(R1) users who believe this fallacy are having their personal data collected by the company.

**omphalopsychos** · 01-30-2025, 08:39 PM

Originally Posted by Mick-7

You are mistaken, it is not open source, "open source" means the source code is provided, so that one can generate and and modify it, DeepSeek does not provide this.

DeepSeek is open source, and the community is encouraged to contribute to its codebase. Feel free to submit a PR here.

**Cunamara** · 01-30-2025, 08:49 PM

Originally Posted by RJVB

Care to expand or give some pointers?

If you update to IOS or iPadOS 18.3 or MacOS 15.3 (I think), Apple Intelligence is on by default (assuming your device can run it- my iPhone SE is too old, for example); previously you had to opt in. You can opt out after the installation is complete but not before.

How to Turn Off Apple Intelligence on an iPhone, iPad, or Mac | WIRED

You may also want to go through every app on your iPhone or iPad and toggle on or off training of Siri and Apple Intelligence on your data- and possibly can't be done globally*. this is done through settings. Scroll down the column on the left until you get to "Apps." Tap on the specific app and look for "Siri and Apple Intelligence." Tap on that and then you have a toggle switch on or off for "learn from this app." You can also take much finer grained control over what you allow Apple Intelligence to do or not do and what data are you allow it to see or not see. That may be the benefit of doing it this way rather than a global switch, at the expensive taking a fair amount of time. There are probably pluses and minuses in this approach.

*If you turn off Apple Intelligence altogether, I don't know whether it and/or Siri continue to learn from apps. The toggle was still set to on by default even though I have not opted into Apple Intelligence on any of my devices to date.

Apple has stated that Apple Intelligence runs locally on your device. I do not know how accurate this is nor to what extent it phones home to Apple's servers with your data, whether that data is anonymized, etc. I may be over-inflating my concerns here.

I should add that I don't specifically distrust AI, I distrust the guard rails set up around it and the motivation of profit-driven corporations looking to monetize information about people for profit. Time and time again, the tech industry has shown complete disregard for the privacy of individuals in their search for billions of dollars in profit. In the case of DeepSeek, I don't trust the Chinese government to be providing a product solely aimed at improving the lot of billions of people on the planet. And, unfortunately, every business in China serves the goals of the Chinese autocracy. There is something in it for the Chinese government here. in the case of American-based AI, I am less concerned about the government and more concerned about private corporations.

Sadly, the Internet has become a little more than an advertising delivery system. Any actual benefits the individuals is incidental.

**Tal_175** · 01-30-2025, 08:49 PM

Originally Posted by omphalopsychos

What MoE does is extend the compute-performance curve by reducing the FLOPs needed per forward pass, meaning you can get similar results with fewer active parameters. But the overall trend—the idea that performance keeps improving as you scale data, compute, and model size—remains the same. OpenAI, Google, and Anthropic can take those same efficiency gains and apply them at larger scales, meaning the biggest models still have the advantage.

I skimmed through the paper. It seems like in order to achieve the performance gains all factors must increase in tandem, data size, dimensionality and computing power. That means if the system is 10 times more efficient, you need 10 times larger data to fully convert the computation power into performance. So it's not clear how that would generalize to the current state of the art if OpenAI suddenly uses a (for example)10 times more efficient approach with the existing datasets. Can they expand the size of data 10-folds without sacrificing quality? If they did, will they still be anywhere near the scale within which the study was conducted in 2020 and therefore assume they are still working with the same curve?

**Tal_175** · 01-30-2025, 08:52 PM

Originally Posted by Mick-7

You are mistaken, it is not open source, "open source" means the source code is provided, so that one can generate and modify it, DeepSeek does not provide this. Meanwhile, DeepThink(R1) users who believe this fallacy are having their personal data collected by the company.

GitHub - deepseek-ai/DeepSeek-V3

**omphalopsychos** · 01-30-2025, 08:56 PM

Originally Posted by Mick-7

There is nothing on the page you referenced: "No releases published, No packages published." If it's open source, show me the source code.

I see what you're saying now. You're referring to the training code? There's plenty of code in that repo but it's for the inference module.

**Litterick** · 01-30-2025, 08:58 PM

New Yorker:

On December 15, 1811, the London Statesman issued a warning about the state of the stocking industry in Nottingham. Twenty thousand textile workers had lost their jobs because of the incursion of automated machinery. Knitting machines known as lace frames allowed one employee to do the work of many without the skill set usually required. In protest, the beleaguered workers had begun breaking into factories to smash the machines. “Nine Hundred Lace Frames have been broken,” the newspaper reported. In response, the government had garrisoned six regiments of soldiers in the town, in a domestic invasion that became a kind of slow-burning civil war of factory owners, supported by the state, against workers. The article was apocalyptic: “God only knows what will be the end of it; nothing but ruin.”

**omphalopsychos** · 01-30-2025, 09:06 PM

The way the Kaplan paper models scaling doesn’t actually require everything to increase in perfect tandem. The key result (Figure 4) shows that data requirements grow sublinearly relative to model size and compute. So if you make training 10x more efficient, you don’t suddenly need 10x the data to fully convert that into performance gains—something more like 5x, following the observed power-law trends.

Using DeepSeek for Guitar Pedal Power Tech Question-screenshot-2025-01-30-5-02-55 pm-png

Using DeepSeek for Guitar Pedal Power Tech Question-screenshot-2025-01-30-5-02-55 pm-png

Now, can OpenAI expand their data 10-fold while keeping quality high? That’s actually the more important question. At this point, just dumping in more raw web text doesn’t do much—you get diminishing returns without better filtering. What’s happening now is labs like OpenAI, Google, and Anthropic are curating high-quality datasets, using retrieval-augmented training, and generating synthetic data to expand their effective training corpus. So they don’t necessarily need 10x the raw data—they need smarter ways to use the data they have, and that’s already happening.

As for whether we’re still on the same scaling curve from 2020 or if we’re seeing diminishing returns—so far, the scaling trends still hold. Every major model release since then (GPT-4, Claude 2, Gemini, DeepSeek-V3) has continued following the same power-law relationships. If we were actually hitting saturation, we’d expect to see performance plateauing even with increasing compute, but that hasn’t happened. The returns are smaller in absolute terms (as expected from the power-law), but they’re still meaningful enough to justify continued scaling.

The real shift since 2020 isn’t that scaling stopped working—it’s that raw dataset size has become a more constrained factor. That’s why top labs are now optimizing how they use data instead of just throwing more tokens at the problem. So no, we’re not at a fundamental saturation point yet. The scaling laws still apply, and even though we’re further up the curve, we’re not seeing diminishing returns to the point where scaling has stopped being the dominant factor.