Data Scaling or Processing Error? [resolved]

SimonPeiSimonPei South Korean
edited July 2025 in Ganglion

Hello OpenBCI Community,

I am conducting an N-back experiment using OpenBCI hardware and I'm running into a fundamental issue during my ERP analysis. After filtering, epoching, and averaging my data, the resulting ERP waveforms have an amplitude in the range of [-5000, 5000] µV, which is physiologically impossible and several orders of magnitude larger than what is reported in the literature (±15 µV).

I suspect I am making an error in my data processing pipeline, most likely related to unit conversion or baseline correction. I'm hoping the community can help me spot what I might be missing.

What I Expect vs. What I'm Getting

I am comparing my results to two papers with similar paradigms:

  1. CHI 2020 ("BrainCoDe") [1]:

    • Device: Brain Products LiveAmp (32-ch)
    • ERP Amplitude: Plotted in the [-4, 4] µV range.
    • Filtering: 1Hz high-pass, 125Hz low-pass, 50Hz notch filter.
  2. 2024 Paper ("Neurocognitive Impact of Icon Styles") [2]:

    • Device: Neuroelectrics StarStim (32-ch)
    • ERP Amplitude: Plotted in the [-14, 14] µV range.
    • Filtering: Used two different band-pass filters: 0.2-10 Hz (for P300) and 0.1-30 Hz (for N400).

My results, however, are consistently in the thousands of microvolts.


My Processing Attempts

My experiment is an N-back task. I have processed my data using the exact filtering methods from the papers above, but the amplitude problem persists in all cases.

  • My Hardware: [Please specify your board, e.g., Cyton, Ganglion, Cyton+Daisy]
  • My Software: [Please specify your software, e.g., Python with MNE/BrainFlow, OpenBCI GUI, MATLAB]

Here are the results from my filtering attempts:

  1. Attempt 1 (Following CHI 2020 paper): 1Hz high-pass, 125Hz low-pass, 50Hz notch filter.

    • Result: Amplitude is approximately +/- 5000 µV.
  2. Attempt 2 (Following 2024 paper's P300 filter): 0.2 - 10 Hz band-pass filter.

    • Result: Amplitude remains extremely high.
  3. Attempt 3 (Following 2024 paper's N400 filter): 0.1 - 30 Hz band-pass filter.

    • Result: Amplitude is still in the thousands of microvolts.

My Specific Questions

Given the massive discrepancy, my issue is likely not the filter choice, but something more fundamental.

Unit Conversion: This seems like the most probable cause. The raw OpenBCI data comes in ADC counts, not microvolts. What is the correct formula or scaling factor to convert the raw data from a Ganglion board into microvolts? I may be missing this step entirely or using the wrong factor.

Could anyone who has successfully generated ERPs with OpenBCI share the essential pre-processing steps they use, especially regarding scaling the raw data? Any guidance would be greatly appreciated.

Thank you!

References:

[1] BrainCoDe: Electroencephalography-based Comprehension Detection during Reading and Listening (CHI 2020)

[2] Neurocognitive Impact of Icon Styles and Semantics on Memory and Recall Speed: An N Back Study Using EEG (2024)

Comments

  • wjcroftwjcroft Mount Shasta, CA

    @SimonPei said:
    ...
    Given the massive discrepancy, my issue is likely not the filter choice, but something more fundamental.

    Unit Conversion: This seems like the most probable cause. The raw OpenBCI data comes in ADC counts, not microvolts. What is the correct formula or scaling factor to convert the raw data from a Ganglion board into microvolts? I may be missing this step entirely or using the wrong factor. ...

    You are incorrect. The Ganglion data stream provided by both the GUI and Brainflow is in microvolts already. No conversion is needed. You may be confusing the provided data stream with the values present in the radio packets. These are converted by Brainflow / GUI into microvolts.

    https://docs.openbci.com/Ganglion/GanglionDataFormat/

  • SimonPeiSimonPei South Korean

    Hello OpenBCI Community, and thank you for the prompt clarification.

    @wjcroft said:
    You are incorrect. The Ganglion data stream provided by both the GUI and Brainflow is in microvolts already. No conversion is needed.

    This is very helpful, as it confirms we should not be performing any additional scaling on the data from BrainFlow. Thank you for correcting my initial assumption.

    However, this brings me to my core confusion, which is about the magnitude of the raw signal itself. The academic papers we're referencing show final, averaged ERPs with amplitudes of only +/- 4 to 15 µV. In contrast, our raw data from the Ganglion has amplitudes that are orders of magnitude larger.

    To illustrate, we recorded 1-minute sessions from three subjects using a Ganglion with electrodes at F7, Fp1, Fp2, and F8. The table below summarizes the statistics of the raw data stream (which you confirmed is in µV).

    user       session  channel       max (µV)    min (µV)   mean (µV)    std (µV)
    subject01  1        F7-pin 1      20219.62   -25452.98    -1773.78     8478.31
    subject01  1        Fp1-pin 2     39912.55   -42839.53     1027.50    10618.74
    subject01  1        Fp2-pin 3     57443.05   -48090.47     -317.79    14931.52
    subject01  1        F8-pin 4      28800.64   -16990.00      845.96     8543.38
    subject01  2        F7            28229.97   -27171.91    -3177.43     8613.22
    subject01  2        Fp1           28411.55   -15515.77      515.38     6094.01
    subject01  2        Fp2           44333.62   -45077.19    -1992.31    12162.54
    subject01  2        F8            28770.60   -18937.91     1085.53     8535.86
    subject01  3        F7            12973.96   -24792.56    -3817.51     7433.17
    subject01  3        Fp1           22086.45   -16540.79      668.98     6626.13
    subject01  3        Fp2           27193.52   -38800.70    -1754.27    13478.17
    subject01  3        F8            27269.91   -19522.47     1008.42     9974.00
    subject02  1        F7              127.17    -2264.03    -2088.22      278.75
    subject02  1        Fp1             265.50    -2755.06    -2493.36      335.17
    subject02  1        Fp2             269.00    -2050.93    -1784.18      248.09
    subject02  1        F8              448.82    -3328.51    -1739.63      466.58
    subject02  2        F7               69.48    -2453.62    -2302.75      302.17
    subject02  2        Fp1             230.04    -2975.63    -2736.96      367.75
    subject02  2        Fp2             252.35    -2210.70    -1961.91      272.23
    subject02  2        F8              103.92    -4111.95    -1907.79      384.08
    subject02  3        F7              108.49    -2735.87    -2530.66      331.63
    subject02  3        Fp1             257.05    -3225.99    -3014.86      400.71
    subject02  3        Fp2             246.87    -2588.25    -2223.03      300.20
    subject02  3        F8               87.74    -2939.45    -2107.09      283.57
    subject03  1        F7            24109.96   -33864.20     -370.55     8206.52
    subject03  1        Fp1           35946.43   -22971.28     -350.71     8492.68
    subject03  1        Fp2           20818.14   -27716.07      199.37     7052.33
    subject03  1        F8            21847.33   -17432.11      -39.70     6052.48
    subject03  2        F7            23759.76   -33405.61      296.30     8551.03
    subject03  2        Fp1           37630.31   -40076.43      -45.94     9558.15
    subject03  2        Fp2           19050.16   -48636.05    -6012.83     9513.80
    subject03  2        F8            26264.45   -23326.08     -214.62     7432.04
    subject03  3        F7            29761.52   -33343.16    -2446.77     9401.96
    subject03  3        Fp1           39382.83   -21085.88     2909.31    11612.36
    subject03  3        Fp2           13600.42   -40241.50    -4914.07     8232.54
    subject03  3        F8            35663.94   -26989.00      274.56     9280.91

    As the table shows, the raw signal for Subject 01 and 03 regularly has peak-to-peak amplitudes exceeding 50,000 µV, and even our "cleanest" subject (Subject 02) has peaks of a few thousand µV.
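For readers who want to reproduce this kind of summary, here is a minimal sketch using only the Python standard library. The function names and the toy values are illustrative and do not come from the recordings above. A large mean combined with a small standard deviation and a modest peak-to-peak range would point to a baseline shift rather than large fluctuations, which is one way to read the Subject 02 rows.

```python
import statistics

def channel_stats(samples_uv):
    """Return max, min, mean, and population std of one channel, in µV."""
    return {
        "max": max(samples_uv),
        "min": min(samples_uv),
        "mean": statistics.fmean(samples_uv),
        "std": statistics.pstdev(samples_uv),
    }

def peak_to_peak(samples_uv):
    """Difference between the largest and smallest sample, in µV."""
    return max(samples_uv) - min(samples_uv)

# Made-up toy data (NOT from the thread): two channels sitting on a large
# negative baseline with only small fluctuations around it.
recording = {
    "F7": [-2100.0, -2080.0, -2095.0, -2070.0],
    "Fp1": [-2500.0, -2490.0, -2510.0, -2480.0],
}

for name, samples in recording.items():
    print(name, channel_stats(samples), "peak-to-peak:", peak_to_peak(samples))
```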

    This leads to our follow-up question:

    Is it expected that the continuous raw EEG signal from a Ganglion has amplitudes in the thousands or tens of thousands of microvolts, and that this large amplitude is reduced to just a few microvolts after the process of epoching, baseline correction, artifact rejection, and averaging many trials?

    Or, to put it another way:

    1. Raw vs. Averaged: Are we simply comparing "apples and oranges"—our raw signal amplitude versus a heavily processed and averaged ERP amplitude from a paper? Is this massive difference in magnitude primarily explained by the signal averaging process itself?
    2. Hardware Differences: Could the different devices used in the papers (e.g., Brain Products, Neuroelectrics) have vastly different raw voltage outputs compared to the Ganglion, or is the final ERP result more dependent on the processing pipeline than the initial raw amplitude?
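As a rough check on the "raw vs. averaged" question, the simulation below averages epochs that each contain a small fixed waveform plus independent Gaussian noise. All numbers here (a 5 µV bump, 50 µV noise, 100 trials) are made-up assumptions for illustration. Averaging N trials shrinks uncorrelated noise by roughly a factor of sqrt(N), so 100 trials buy about a tenfold reduction. On its own that cannot turn tens of thousands of microvolts into a few microvolts.

```python
import math
import random
import statistics

def simulate_trial(n_samples, erp_uv, noise_std_uv, rng):
    """One epoch: a fixed ERP waveform plus independent Gaussian noise."""
    return [erp_uv[i] + rng.gauss(0.0, noise_std_uv) for i in range(n_samples)]

def average_trials(trials):
    """Point-by-point mean across epochs."""
    return [statistics.fmean(column) for column in zip(*trials)]

rng = random.Random(0)          # fixed seed so the run is repeatable
n_samples = 200                 # one second at an assumed 200 Hz
# Hypothetical ERP: a 5 µV Gaussian-shaped bump in the middle of the epoch.
erp = [5.0 * math.exp(-((i - 100) / 20.0) ** 2) for i in range(n_samples)]
noise_std = 50.0                # assumed single-trial noise level in µV
n_trials = 100

trials = [simulate_trial(n_samples, erp, noise_std, rng) for _ in range(n_trials)]
avg = average_trials(trials)

# What is left after subtracting the known waveform is the residual noise.
residual = [a - e for a, e in zip(avg, erp)]
residual_std = statistics.pstdev(residual)

print(f"single-trial noise std: {noise_std:.1f} µV")
print(f"residual noise std after {n_trials} trials: {residual_std:.2f} µV")
print(f"sqrt(N) prediction: {noise_std / math.sqrt(n_trials):.2f} µV")
```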

    We are essentially looking for a "sanity check." Do the raw data values in our table seem plausible for a typical Ganglion recording (especially from frontal channels prone to EMG/EOG artifacts), or do they still indicate a fundamental problem with our setup, such as high impedance or noise?

    Thank you again for your time and expertise.

  • wjcroftwjcroft Mount Shasta, CA

    I'm having a hard time differentiating your other thread about 'broken hardware' (channel 4) from this thread, which shows universally wonky values. Are you sure you do not have some type of EMF (electromagnetic field) noise in your environment? One of your early threads from June showed HUGE EMF spikes.

    https://openbci.com/forum/index.php?p=/discussion/4026/biosensing-starter-bundle-ganglion-headband-noisy-data#latest

    NO EEG should be over 100 uV or so. The huge values you are seeing confirm that you have some anomalous noise or disruption in your lab environment. Try moving to an electrically quieter area, and also invest in an EMF meter (hardware or app), as I suggested before.

  • SimonPeiSimonPei South Korean

    Hello OpenBCI Community, and thank you for your invaluable advice.

    Following your strong recommendation, we have conducted a series of experiments to rigorously test for environmental EMF noise. We collected data in four distinct locations: our lab, an office, an electrically quiet storage room, and outdoors. In all locations, we used an EMF meter and confirmed that there were no significant EMF fields present.

    This process has led us to a critical new discovery:

    (1) The Core Finding: A Consistent Data Jump at the 1-Second Mark in Saved Files

    We have found a highly specific and repeatable pattern across all recordings, regardless of location. The issue is identical whether using the OpenBCI GUI to record or capturing the data directly with BrainFlow.

    1. For the first second of recording (approx. the first 200 data samples), the EEG values in the saved CSV file are in the expected, normal range (mostly under 100 µV).
    2. At almost exactly the 1-second mark, the data values in the CSV file abruptly and dramatically jump to an anomalous, high range (e.g., 2000+ µV for one subject, 300+ µV for another).

    This behavior is consistent across all tested subjects and locations.
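One way to locate and characterise such a jump in a saved recording is to summarise the data one second at a time. The sketch below assumes a 200 Hz sampling rate (matching the "first 200 data samples" figure above) and uses made-up values, and the function names are illustrative. Reporting both the mean and the peak-to-peak range per window helps tell a shift in baseline level apart from a genuine increase in fluctuation.

```python
import statistics

def per_second_summary(samples_uv, sampling_rate_hz):
    """(mean, peak-to-peak) in µV for each consecutive one-second window."""
    summary = []
    for start in range(0, len(samples_uv), sampling_rate_hz):
        window = samples_uv[start:start + sampling_rate_hz]
        summary.append((statistics.fmean(window), max(window) - min(window)))
    return summary

def first_jump(samples_uv, threshold_uv):
    """Index of the first sample whose magnitude exceeds the threshold, or None."""
    for index, value in enumerate(samples_uv):
        if abs(value) > threshold_uv:
            return index
    return None

SAMPLING_RATE_HZ = 200  # assumed sampling rate

# Made-up data: one second within ±50 µV, then two seconds shifted to about -2300 µV.
toy = [50.0 * (-1) ** i for i in range(200)]
toy += [-2300.0 + 50.0 * (-1) ** i for i in range(400)]

for second, (mean_uv, p2p_uv) in enumerate(per_second_summary(toy, SAMPLING_RATE_HZ)):
    print(f"second {second}: mean {mean_uv:.1f} µV, peak-to-peak {p2p_uv:.1f} µV")
print("first sample beyond 100 µV:", first_jump(toy, 100.0))
```

In this toy case the peak-to-peak range is the same in every window and only the mean moves, which is the signature of a level shift.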

    • Lab: EMF test results are normal. Data collected with the OpenBCI GUI (or BrainFlow) suddenly jumped to over 2000 µV starting at 2 s. [Figures: Subject 1; Subject 2 (affected by magnetic field noise interference, the EEG data exceeded 100 µV right from the start)]

    • Storage room: EMF test results are normal. Data collected with the OpenBCI GUI (or BrainFlow) suddenly jumped to over 2000 or 300 µV starting at 2 s. [Figures: Subject 1; Subject 2]

    • Office: EMF test results are normal. Data collected with the OpenBCI GUI (or BrainFlow) suddenly jumped to over 2000 or 300 µV starting at 2 s. [Figures: Subject 1; Subject 2]

    • Outdoor: EMF test results are normal. [Figures: Subject 1; Subject 2]

    (2) The Crucial Clue: Live GUI Data is Normal, Saved CSV Data is Corrupted

    Here is the most important piece of information we have uncovered:

    This data corruption issue ONLY exists in the final, saved CSV files.

    When we watch the live-scrolling time series plot within the OpenBCI GUI, the data remains in the normal, low-microvolt range for the entire duration of the recording. The sudden jump to high values is not visible in the live plot.

    This strongly suggests that:

    • The Ganglion board is likely functioning correctly and sending good data.
    • The problem is not environmental noise.
    • The issue appears to be a software problem related to the process of writing the data stream to the CSV file.

    Our New Question and Hypothesis

    Given this new evidence, could there be a bug in how the OpenBCI GUI and/or BrainFlow handles the Ganglion data stream when writing it to a file?

    Specifically, we suspect something might be going wrong after the first second of data collection (or after the first ~200 data packets). Could this be related to how the software handles the Ganglion's packet counter rollover, a data type conversion error, or a buffer issue during the file-write process?

    (3) Side Note: We also observe a large, brief oscillation in the very first ~0.5 seconds of every recording, which we assume is normal electrode settling time, but we are mentioning it for completeness.

    Thank you again for your continued guidance and expertise. We believe this new information brings us much closer to the root cause.

  • wjcroftwjcroft Mount Shasta, CA

    Simon, great detective work.

    I am going to mention some staff members here, who I hope will log this as an urgent issue in the OpenBCI GUI or Brainflow GitHub repositories: @Shirley, Richard @retiutut, @philip_pitts. As you observe, it appears the data stream from Ganglion is correct, but some corruption is happening before the GUI can record the stream to the CSV file.

    In the meantime, can you try some testing with the previous version of the GUI to the one you are using? The releases are listed here:

    https://github.com/OpenBCI/OpenBCI_GUI/releases

    Thanks again for your careful documentation. Please comment here if the previous GUI solves the issue for you.

    Regards, William

  • SimonPeiSimonPei South Korean

    Hello William,

    Thank you again for all your incredible support and for providing the previous version for us to test. We will proceed with testing that version shortly.

    Before we do, we have one more critical discovery to share that we believe might solve this entire mystery, and it confirms your initial points about the hardware likely being fine.

    Our Key Finding: Official Sample Data Shows the Same Pattern

    1. We used the "Playback" feature in the OpenBCI GUI to load one of the official sample data files provided with the software: OpenBCI_GUI-v6-meditation.txt.
    2. During playback, the live data displayed in the GUI looked perfect and was well within the expected range (under 100 µV).

    3. However, when we opened the OpenBCI_GUI-v6-meditation.txt file itself in a text editor, we were very surprised to find that the raw numbers in the EXG columns are huge—in the tens of thousands, looking exactly like our own recorded data.

    This leads us to a new and much clearer hypothesis: The OpenBCI GUI's live display shows correctly scaled microvolts (µV), but when it saves the session data to a text file, it is saving the raw, unscaled ADC "counts".

    Our Final Question

    This brings us back full circle to our very first question, but now with strong evidence. To get the correct microvolt values for our analysis, should we be processing the data from the saved files by multiplying them by the scale factor?

    Specifically, should we apply the formula from the Ganglion Data Format documentation that you shared previously?

    i.e., final_µV = value_from_file * 0.001869917138805

    (Link to docs: https://docs.openbci.com/Ganglion/GanglionDataFormat/)

    This would perfectly explain everything we have observed—the normal live data, the huge numbers in our saved files, and the huge numbers in the official sample files.

    A Minor Question on GUI Display

    As a separate, minor point, we noticed the unit displayed in the GUI's time series window is "µVms". Could you clarify what this unit means? We suspect it might be a small typo and should just be "µV".

    Thank you for guiding us through this entire troubleshooting process. We feel we are finally at the correct answer and just need this last confirmation.

  • wjcroftwjcroft Mount Shasta, CA

    No. The raw Cyton data CSV files are in microvolts also. However, Cyton has a DC offset, which must be removed with either a highpass at 0.5 Hz or, say, a bandpass from 0.5 to 45 Hz. Unlike Cyton, Ganglion raw data already has the highpass applied in the CSV.

    BOTH CSV files are always in microvolts. The scale factors only apply to the data in the radio packets.

    https://openbci.com/forum/index.php?p=/discussion/201/large-millivolt-data-values-fbeeg-full-band-eeg
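To illustrate what removing a DC offset with a high-pass filter does, here is a minimal sketch of a one-pole DC blocker in plain Python. This is a generic textbook filter swapped in for illustration, not the filter used by the GUI or BrainFlow. The 250 Hz sampling rate, the -30000 µV offset, and the 10 Hz test tone are assumptions, and the pole formula only approximates a 0.5 Hz cutoff.

```python
import math

def dc_blocker(samples_uv, sampling_rate_hz, cutoff_hz=0.5):
    """One-pole DC-blocking high-pass: y[n] = x[n] - x[n-1] + r * y[n-1]."""
    # Approximate pole radius for the requested cutoff (valid when cutoff << rate).
    r = 1.0 - 2.0 * math.pi * cutoff_hz / sampling_rate_hz
    out = []
    prev_x = samples_uv[0]  # start from the first sample to avoid a startup step
    prev_y = 0.0
    for x in samples_uv:
        y = x - prev_x + r * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out

fs = 250            # assumed sampling rate in Hz
offset = -30000.0   # made-up DC offset in µV
signal = [offset + 10.0 * math.sin(2.0 * math.pi * 10.0 * n / fs)
          for n in range(10 * fs)]

filtered = dc_blocker(signal, fs)
tail = filtered[5 * fs:]  # skip the first 5 s while the filter settles

print("raw range:      ", min(signal), max(signal))
print("filtered range: ", min(tail), max(tail))
```

The 10 µV test tone survives almost unchanged while the offset disappears, which is the behaviour described for a high-pass at about 0.5 Hz.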

  • SimonPeiSimonPei South Korean
    edited July 2025

    Hello William,

    Thank you very much for your detailed reply and for providing that forum link. This is incredibly helpful and we believe we are getting very close to the core issue.

    We have a couple of follow-up questions to ensure we are understanding these concepts correctly, particularly regarding the difference between the data displayed in the GUI and the data saved in the CSV file.

    1. Question on Filters and the GUI Display vs. Saved Data

    You mentioned that "Ganglion raw data already has the highpass in the CSV" to remove the DC offset. This led us to look closely at the OpenBCI GUI's default settings. We observed that in the "Filters" configuration section, the default is a Bandpass filter from 5 Hz to 50 Hz, along with a 50/60 Hz Notch filter.

    This has led us to a new critical hypothesis, and we would be grateful if you could confirm it:

    Is it correct to assume that the clean signal we see in the live GUI display (the data that is under 100 µV) is the data AFTER it has been processed by these default GUI filters (i.e., the 5-50 Hz bandpass)?

    And conversely, does this mean that the saved CSV file, while already in microvolts, contains the raw data BEFORE these aggressive filters are applied?

    If this is true, it would explain everything. This would mean that to get the clean, sub-100 µV results in our own analysis, we would need to manually apply a similar bandpass filter (e.g., 5 Hz to 50 Hz) to the data from the CSV file using our BrainFlow code. Is this the correct workflow?

    2. Clarification on "Radio Packets"

    To ensure we understand the terminology correctly: When you state, "The scale factors only apply to the data in the radio packets," could you confirm what this means for our setup? We are using the Ganglion board with its official USB Dongle for wireless transmission.

    Does "data in the radio packets" refer to the raw binary data that the Dongle receives before it is processed by the BrainFlow driver?

    So, our current understanding is that when our code calls a function like board.get_board_data(), BrainFlow has already handled the "radio packets" and applied the necessary scale factor, meaning the data array we receive is intended to be in microvolts. Is this correct?

    Is the following understanding correct?

    • The Ganglion board sends raw data wirelessly in the form of "radio packets".

    • Our USB Dongle receives these packets.

    • The conversion step (applying the scale factor to turn raw 'counts' into microvolts) is handled at a very low level inside the BrainFlow/GUI driver, immediately after the driver "unpacks" these raw radio packets.

    • Therefore, as end-users, we do not need to perform this conversion manually, because the data we receive from BrainFlow is intended to be the final, converted microvolt value.

    Could you please confirm if this description of the data pipeline is accurate?

    Thank you again for your patience and for guiding us through this learning process. We feel that understanding the role of the GUI's default filters is the key to finally resolving our confusion.

  • wjcroftwjcroft Mount Shasta, CA

    re: GUI filtering. Yes filters only apply to the GUI screens. The recording files and Brainflow stream are always raw and unfiltered.

    re: radio packets. I was just trying to underline (have said this before), that the scaling consideration only applies to the radio packet stream. The stream handed to you by Brainflow or GUI has already taken the hardware scale factor into account. Thus those streams or files are always in raw microvolts, unfiltered.
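The point about scaling can be shown with a few lines of arithmetic. The function below is a hypothetical stand-in for the conversion a driver performs on packet data. It is not BrainFlow's actual code, and the packet values are made up. The factor is the one quoted earlier in this thread from the Ganglion Data Format documentation. Applying it a second time to values that are already in microvolts shrinks them by a factor of roughly 535.

```python
# Scale factor as quoted earlier in this thread (µV per ADC count).
GANGLION_UV_PER_COUNT = 0.001869917138805

def counts_to_microvolts(counts):
    """Hypothetical stand-in for the driver-level conversion of packet data."""
    return [c * GANGLION_UV_PER_COUNT for c in counts]

raw_counts = [10000, -25000, 40000]           # made-up values from radio packets
stream_uv = counts_to_microvolts(raw_counts)  # what the stream or file would hold

# The mistake to avoid: treating already-converted values as counts.
double_scaled = counts_to_microvolts(stream_uv)

print("counts:          ", raw_counts)
print("microvolts:      ", [round(v, 3) for v in stream_uv])
print("scaled twice:    ", [round(v, 5) for v in double_scaled])
```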

  • wjcroftwjcroft Mount Shasta, CA

    @wjcroft said:
    ...
    In the meantime, can you try some testing with the previous version of the GUI to the one you are using? The releases are listed here:

    https://github.com/OpenBCI/OpenBCI_GUI/releases

    What happened to this test with the previous GUI version? Your image (above, July 10) showing values in the 2300 or 500 uV range for the initial packet numbers is suspicious. As I understand it, you said these initial high values fade away and the data returns to the normal low uV range shortly after, and that the high values repeat at one-second intervals, when the packet number resets to zero each second. Is this correct, or did I misinterpret your images?

  • wjcroftwjcroft Mount Shasta, CA

    @SimonPei, please update this thread with your testing results for a previous GUI.

    Thanks,

  • SimonPeiSimonPei South Korean

    Hello William,

    Thank you so much for your last reply. It was the final key we needed, and we've had a major breakthrough in our understanding and process. We are happy to report that the issue is now resolved.

    You were absolutely correct. The core of our misunderstanding was the nature of the raw data versus the filtered data displayed in the GUI.

    Following your explanation, we took our raw saved data (with the very high values) and applied a 5-50 Hz bandpass filter in our own code. It worked perfectly. The data amplitude immediately dropped into the expected <100 µV range, revealing the clear underlying physiological signals.

    This confirms your points entirely:

    1. The saved CSV/BrainFlow data is indeed raw, unfiltered microvolts.
    2. Filtering is the essential, mandatory step to remove DC offset and noise to get clean, analyzable data.
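For anyone who wants to try the same step without extra dependencies, here is a minimal sketch of a 5-50 Hz band-pass built from two second-order Butterworth-style sections (a high-pass followed by a low-pass) in plain Python. It is an illustration under assumptions, not the code used in this thread: the 200 Hz sampling rate, the -2000 µV baseline, and the 10 Hz test tone are made up, and the filter is causal, so it introduces phase shift. In a real analysis a library routine such as SciPy's `butter` with `sosfiltfilt`, or the filtering built into MNE, would be the usual choice.

```python
import math

def biquad_coefficients(kind, cutoff_hz, sampling_rate_hz, q=1.0 / math.sqrt(2.0)):
    """Normalised (b, a) for a second-order low-pass or high-pass section."""
    w0 = 2.0 * math.pi * cutoff_hz / sampling_rate_hz
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2.0 * q)
    if kind == "highpass":
        b = [(1.0 + cos_w0) / 2.0, -(1.0 + cos_w0), (1.0 + cos_w0) / 2.0]
    elif kind == "lowpass":
        b = [(1.0 - cos_w0) / 2.0, 1.0 - cos_w0, (1.0 - cos_w0) / 2.0]
    else:
        raise ValueError(f"unknown filter kind: {kind}")
    a0 = 1.0 + alpha
    a = [1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0]
    return [coeff / a0 for coeff in b], a

def apply_biquad(samples, b, a):
    """Direct-form I filtering with zero initial state."""
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def bandpass(samples_uv, sampling_rate_hz, low_hz=5.0, high_hz=50.0):
    """High-pass at low_hz followed by low-pass at high_hz."""
    hp = biquad_coefficients("highpass", low_hz, sampling_rate_hz)
    lp = biquad_coefficients("lowpass", high_hz, sampling_rate_hz)
    return apply_biquad(apply_biquad(samples_uv, *hp), *lp)

fs = 200  # assumed sampling rate in Hz
# Made-up signal: a 20 µV, 10 Hz oscillation riding on a -2000 µV baseline.
raw = [-2000.0 + 20.0 * math.sin(2.0 * math.pi * 10.0 * n / fs) for n in range(5 * fs)]
filtered = bandpass(raw, fs)
tail = filtered[2 * fs:]  # discard the first 2 s of filter transient

print("raw peak magnitude:     ", max(abs(v) for v in raw))
print("filtered peak magnitude:", max(abs(v) for v in tail))
```

The first seconds of filtered output still contain the filter's own settling transient, so they are dropped before the amplitudes are compared.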

    We also took your advice about the environment seriously. We have now found an electrically quiet location, and with our new filtering pipeline, the baseline data we are collecting is very clean. Furthermore, we can now clearly and reliably see the expected large spikes in the data when we perform intentional actions like clenching our jaw or rolling our eyes. This gives us great confidence that the hardware is working correctly.

    To clear up any confusion from my previous posts: the issue with the data "jumping" to high values after the first second is also resolved by this filtering process. We now believe this was likely a large DC offset or very low-frequency drift appearing after the initial connection, which the 5 Hz high-pass component of our new filter completely removes.

    Because applying the filter to the data from the current GUI version has completely solved our problem and is giving us clean, usable data for our research, we believe that testing a previous GUI version is no longer necessary at this time.

    We cannot thank you enough for your patience and for guiding us through this entire process. Your expertise has been invaluable. We consider this matter resolved.

  • wjcroftwjcroft Mount Shasta, CA

    Hmm, previously you showed that you were seeing large spikes in the raw data values at precisely 1 second intervals. Values in the 2300 uV range. These spikes lined up with the packet counter showing 0.

    re: DC offset, there IS NO DC OFFSET for Ganglion; that only applies to Cyton with the ADS1299 chip. For Ganglion, the hardware front end has a hardwired high-pass filter as part of the signal flow. The high pass is around 0.5 Hz.

    A question about your raw data CSV file: are you STILL seeing these large values at 1 Hz? If that was the case previously, but not now, then the spikes were likely from EMF noise in your environment.

    If you are STILL seeing 1 Hz spikes in the raw data, you need to further investigate.
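A simple way to check for spikes locked to the packet counter is to group samples by counter value and compare the typical magnitude in each group. The sketch below assumes a counter column that cycles 0 to 199 once per second, in line with the description above. The function names, the ratio threshold, and the toy values are illustrative.

```python
from collections import defaultdict
import statistics

def mean_abs_by_counter(counters, samples_uv):
    """Mean absolute amplitude (µV) for each packet-counter value."""
    groups = defaultdict(list)
    for counter, value in zip(counters, samples_uv):
        groups[counter].append(abs(value))
    return {counter: statistics.fmean(values) for counter, values in groups.items()}

def suspicious_counters(counters, samples_uv, ratio=5.0):
    """Counter values whose mean magnitude exceeds `ratio` times the median."""
    per_counter = mean_abs_by_counter(counters, samples_uv)
    typical = statistics.median(per_counter.values())
    return sorted(c for c, m in per_counter.items() if m > ratio * typical)

# Made-up data: 5 seconds, counter cycling 0..199, with a 2300 µV value
# every time the counter returns to 0 and 20 µV everywhere else.
counters = [n % 200 for n in range(5 * 200)]
values = [2300.0 if c == 0 else 20.0 for c in counters]

print("counter values with outsized amplitude:", suspicious_counters(counters, values))
```

An empty result on a real file would suggest the large values are not tied to the counter reset.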

  • SimonPeiSimonPei South Korean

    Hi William,

    Thank you for your continued support. Following your advice, we ran multiple recording sessions in electromagnetically quiet locations. As you can see in the figure below, the first few seconds of data contain no abnormal peaks, and the rest of the recording stays well within the expected range.

    This consistency under low-EMF conditions reassures us that both the Ganglion hardware and BrainFlow pipeline are operating correctly. Please let me know if you have any further suggestions or tests we should perform.

    Best regards,
    Simon

