yyrkoon

Any kernel hackers here ?


I was just wondering . . . as I recently ran into a situation where I have no clue how to proceed. The situation is this:

 

When a battery is connected to a beaglebone black and power ( barrel jack or USB ) is removed, then, depending on which image type the board is running, the PMIC sends an NMI to the am335x processor and the beaglebone shuts down.

 

So, I know how this works up to the point where the NMI is sent, but have no idea where to look in the huge source tree that is the Linux kernel. How can I find out ?

 

Now to be sure, I've found a few files that seem to be related, but nothing obvious as to how this interrupt ( NMI ) is acted upon, or even where it is acted upon.

 

How would you go about finding out ?


I believe you do not need to look in the code. This is what I get on an x86 machine:
 

# sysctl -a | grep nmi
kernel.nmi_watchdog = 1
kernel.panic_on_io_nmi = 0
kernel.panic_on_unrecovered_nmi = 0
kernel.unknown_nmi_panic = 0

On the BBB it should be the same.



Some kernel options do not work for non-x86 kernels, however . . .

william@beaglebone:~$ sudo sysctl -a | grep nmi
kernel.nmi_watchdog = 0

Here lies the problem. We do not want to just disable it. We want to track power outages, and if the power is still out after 5 minutes, then issue a shutdown. The reasoning is fairly simple: if the power does not come back on after 5 minutes, in many cases it will be a long while before it comes back up.

 

Additionally, we do not want to poll anything. So creating a userspace driver, while possible, is really not efficient.
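
The 5 minute rule above can be sketched as a small event-driven state object. This is only a sketch in Python: the class and method names are hypothetical, and it assumes something else (a driver, acpid, an external MCU) delivers the "power lost" and "power restored" events, so nothing here polls.

```python
from dataclasses import dataclass
from typing import Optional

GRACE_SECONDS = 5 * 60  # the 5 minute window mentioned in the post


@dataclass
class OutageTracker:
    """Tracks a single power outage. All names here are hypothetical."""
    grace_seconds: float = GRACE_SECONDS
    lost_at: Optional[float] = None  # time power went away, None while power is good

    def power_lost(self, now: float) -> None:
        # Keep only the start of the outage; repeated events do not restart the window.
        if self.lost_at is None:
            self.lost_at = now

    def power_restored(self) -> None:
        self.lost_at = None

    def seconds_until_shutdown(self, now: float) -> Optional[float]:
        """None while power is good, else time left in the window (0.0 once expired)."""
        if self.lost_at is None:
            return None
        return max(0.0, self.lost_at + self.grace_seconds - now)

    def should_shutdown(self, now: float) -> bool:
        return self.seconds_until_shutdown(now) == 0.0
```

The value from seconds_until_shutdown() could be used as the timeout of whatever blocking wait delivers the next event, so the process sleeps until either power returns or the window expires.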

 

Anyway, this is not an argument so much as a list of our concerns. Maybe you know something I don't ? Which would not be a surprise to me ;)


@bazuchan

 

Thanks for your help, but nmi_watchdog is not intended for what I described in my original post. It is meant to trap system hangs and create a core dump before rebooting. Additionally, the documentation claims this is an x86 / x86-64 only feature, although the /proc entry does seem to exist on the board.


Depending on how the interrupts are routed, the IRQ should go to the PMIC driver, which then decides what to do. The PMIC is a TPS65217; do you see something for that in the source tree? It could be under drivers/platform, drivers/power or arch/.



Yes, but there is no button press code, nor NMI code for power good. There is a patch that adds all this to the tps65217.c driver source, but the driver I'm using right now does not have it. It's enabled somewhere else, and I'm not sure where.


So, as a follow up of sorts: my buddy and I have been doing a bit of reading on the subject. He stumbled upon a beaglebone black article that talked about acpid controlling the push button system reset, which seems to be true. The proof is that the "console" images provided by beagleboard.org, which are built by Robert Nelson from Digikey, do not act on the push button press or the power good ( PMIC ) signal at all, but the LXDE / X11 images do.

 

However, if one installs the acpid package and then enables the service, push button and power good signals are acted upon as expected. How the system reacts to a push button event is somewhat configurable in the script that is run for the acpid service. Modifying the power good event behavior may require a deeper look into the daemon or the kernel source. I have not figured that out yet, as I have only begun to read, when I have the chance.


More of a follow up.

 

So, I found this link: http://linuxgazette.net/106/pramode.html

 

This article was written for a 2.6.x kernel, so much of the information is outdated. However, it did mention cat /proc/interrupts, which is a valid path on the beaglebone platform.

 

william@beaglebone:~$ cat /proc/interrupts
           CPU0
 . . .

179:          3      INTC   7 Level     tps65217
Err:          0
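
The lookup in the listing above can also be done in code. Here is a minimal Python sketch that picks the row for a device out of the text of /proc/interrupts; the function name and the SAMPLE string are made up for illustration, and it assumes the device name is the last column, as it is in the output above.

```python
from typing import Optional, Tuple


def find_irq(interrupts_text: str, device: str) -> Optional[Tuple[int, int]]:
    """Return (irq_number, total_count) for the first row whose last column is `device`."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        # Data rows look like "179:  3  INTC  7 Level  tps65217".
        if len(fields) < 3 or not fields[0].endswith(":"):
            continue
        label = fields[0][:-1]
        if not label.isdigit() or fields[-1] != device:
            continue
        # Per-CPU counters follow the label; sum the leading integer columns.
        total = 0
        for field in fields[1:]:
            if not field.isdigit():
                break
            total += int(field)
        return int(label), total
    return None


# Trimmed copy of the output shown in the post.
SAMPLE = """\
           CPU0
179:          3      INTC   7 Level     tps65217
Err:          0
"""
```

On the board itself the text would come from open("/proc/interrupts").read(). Rows where several drivers share one IRQ line are not handled by this sketch.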

 

So I deduced from the article that the IRQ number the kernel uses for the tps65217 is "179". However, the rest of the paths / files mentioned in that article do not exist on the beaglebone platform. So I did a little exploring until I ran into . . .

 

william@beaglebone:~$ ls /proc/irq/
121  154  155  156  157  158  16  166  167  169  17  170  172  174  176  177  178  179  19  21  22  29  55  88

 

william@beaglebone:~$ ls /proc/irq/179
spurious  tps65217

 

Which is curious because tps65217 is a directory, and it is empty. So . . .

william@beaglebone:~$ cat /proc/irq/179/spurious
count 0
unhandled 0
last_unhandled 0 ms


Ah ha ! But the problem is, how do I check whether these increment on a button press or power good event, when the acpi daemon shuts the board down in both cases ? The article to the rescue again: we kill the acpid daemon.

 

william@beaglebone:~$ ps -ax |grep acpi
warning: bad ps syntax, perhaps a bogus '-'?
See http://gitorious.org/procps/procps/blobs/master/Documentation/FAQ
 1879 ?        Ss     0:00 /usr/sbin/acpid
 2354 pts/0    S+     0:00 grep acpi
william@beaglebone:~$ sudo kill 1879


Yes, yes, ps options apparently do not need a preceding dash, and I forgot that . . . Anyway, I found the pid of the acpid process, and killed it. So now when I press the power button:

william@beaglebone:~$ cat /proc/irq/179/spurious
count 2
unhandled 1
last_unhandled 4294940086 ms


I'm still unclear as to what increases the unhandled count and the last_unhandled time for tps65217 interrupts. However, I can say with 100% certainty that each interrupt is tracked. When you press and hold the power button, the count increases by 1. When you release the power button, the count increases by 1 again. So the whole cycle of physically pressing the button gives you a count of 2: press, then release. The same goes for power good interrupts. When the power goes down, the count increases by 1, and when the power comes back it increases by 1 yet again, for a total count of 2 for the whole cycle.
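
The counting described above can be sketched in Python. The function names are hypothetical, and the sketch assumes the count field is not reset between the two readings and that every event produces exactly two interrupts, as observed in this post.

```python
from typing import Dict


def parse_spurious(text: str) -> Dict[str, int]:
    """Parse the contents of /proc/irq/<n>/spurious into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        # Lines look like "count 2" or "last_unhandled 4294940086 ms".
        if len(parts) >= 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def full_cycles(before: Dict[str, int], after: Dict[str, int]) -> int:
    """Press/release (or power lost/restored) cycles between two readings."""
    # Two interrupts per complete cycle, per the observation in the post.
    return (after["count"] - before["count"]) // 2


# The two readings shown in the post, before and after one button press.
BEFORE = "count 0\nunhandled 0\nlast_unhandled 0 ms\n"
AFTER = "count 2\nunhandled 1\nlast_unhandled 4294940086 ms\n"
```

Note that the count alone does not say whether an interrupt came from the button or from power good; telling them apart would need something else, such as reading the PMIC's own registers.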

 

Yes, both button press and power good events are tracked through this one file. As if that were not "cool enough", we do not even need the acpid package installed any longer to track these interrupts, and can then deal with them however we see fit in code.

 

For our purposes, we'll be using an external MCU to track AC power ( actually 5v DC . . . ) coming into our own cape, and to act as an external watchdog / reset manager for the beaglebone. The board was poorly designed for commercial applications: there is no test point, or any other connection, for performing a hard reset of the processor. So the next best option is to have a battery on the cape, a way for the beaglebone to know when power goes away, and a timed shutdown.

Here, there are two potential problems. First, if power just blips, or is down for less than a set amount of time, it may not be wise to shut down immediately. The second problem is tied to the first. *If* it is decided that a shutdown is not necessary for a set amount of time, and the power does indeed come back within that time, 5V on the USB bus still goes away and never comes back until a board reset is performed. So, if one is using USB power, a reset will be required anyway, and it may as well happen immediately after the first power loss, with some sort of power monitoring / detection for a short duration before bringing the board back up. That could be as simple as an R/C circuit, or as complex as an external MCU performing those duties, like we're doing.

 

EDIT:

 

I should amend my statement about losing input power while having a battery connected. If the battery is 5V or greater, USB should continue to run correctly. With that said, I'm not exactly sure whether you're allowed to have a battery with more than 3.7v connected to the PMIC test points. We use a standard cell phone battery from years ago, which is LiPo / 3.7v . . .



 

I wrote a mini I2C library by wrapping the Linux tool i2cget in Node.js. As an example of how to use said "library", I decided to read the PMIC's registers and parse out the data. As a result I had to read the tps65217 datasheet, and found that the PMIC can actually be adjusted to charge at 4.2v. But higher does not look like a possibility.
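
The wrapper idea can be sketched as follows. This is not the Node.js library from the post, just a Python illustration of the same approach. The function names are hypothetical, and the bus number and chip address ( bus 0, 0x24 ) are assumptions about where the PMIC sits on the beaglebone black, so check them with i2cdetect first.

```python
import subprocess
from typing import List

# Assumptions: the PMIC is on I2C bus 0 at address 0x24.
PMIC_BUS = 0
PMIC_ADDR = 0x24


def i2cget_command(bus: int, addr: int, reg: int) -> List[str]:
    """Build the i2cget argument list for a single byte read."""
    # -y skips the interactive confirmation; -f forces access even when
    # a kernel driver has already claimed the address.
    return ["i2cget", "-y", "-f", str(bus), f"0x{addr:02x}", f"0x{reg:02x}"]


def parse_i2cget(output: str) -> int:
    """i2cget prints a byte as hex text such as '0x3e'; convert it to an int."""
    return int(output.strip(), 16)


def read_register(reg: int, bus: int = PMIC_BUS, addr: int = PMIC_ADDR) -> int:
    """Run i2cget (needs root and the i2c-tools package) and return the byte."""
    result = subprocess.run(i2cget_command(bus, addr, reg),
                            capture_output=True, text=True, check=True)
    return parse_i2cget(result.stdout)
```

Keeping the command building and the output parsing in separate functions means both can be checked without touching real hardware.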
