Discussion:
Speed Issues on a slow CPU
Alan Cox
2007-10-03 12:56:36 UTC
Post by Amadeus
It cost me several months and a lot of gcc hacking to reclaim this
memory as general-purpose memory. So now we have a 36 MByte uClinux
system, in which 32 MByte of the memory is a bit slower than usual.
I couldn't find info on this to see what its performance hit was, or
whether you have put a small graphics accelerator library on the ARM7.
Post by Amadeus
I have nano-X and PIXIL up and running, but I am facing serious speed
issues. The reaction to a click on the touchscreen is slow, and the
calculator needs 1-2 seconds to display.
That's bad: slower than an original IBM XT.
Post by Amadeus
- define NDEBUG in nano-X drivers.
- add assembler code for horizontal and vertical lines in the driver.
- implement shared memory support.
Do you have gprof running on the system yet? Embedded systems can have
such strange bottlenecks that gprof can reveal a lot, and you only need
the profiling side on the DS. You can do the analysis with cross tools
on a PC.


There are a couple of oddities I noted on the web site too btw:

"No. Because the NDS has no MMU, DSLinux has no virtual memory, so it
cannot swap at all."

That's not entirely true: you can swap entire apps to/from secondary
storage if you have any kind of segmentation (e.g. FCSE, the Fast Context
Switch Extension, on some ARMs, although the granularity is a bit coarse)
and/or position-independent (PI) code. You've also presumably got
protection ranges?

BTW on

"Why doesn't DSLinux support reading from or writing to a CF"

if you've got specs for the CF interface and a tester, that's probably
easy to fix now.
Amadeus
2007-10-03 18:20:49 UTC
Alan,

glad to hear from you!
Post by Alan Cox
system, in which 32 MByte of the memory is a bit slower than usual.
I couldn't find info on this to see what its performance hit was
A burst read takes 120 ns per 16 bits. Not much...
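For a sense of scale, the 120 ns figure can be turned into a throughput and a CPU cycle count. The sketch below assumes one 16-bit transfer every 120 ns with no extra wait states or setup overhead, and uses the 66 MHz ARM9 clock from the original post; the function names are made up for illustration.

```c
#include <stdint.h>

/* Peak burst throughput in bytes per second, assuming one transfer of
 * bus_bits bits every ns_per_transfer nanoseconds (no extra wait states). */
uint32_t burst_bytes_per_sec(uint32_t ns_per_transfer, uint32_t bus_bits)
{
    uint64_t bytes = bus_bits / 8u;
    return (uint32_t)(bytes * 1000000000ull / ns_per_transfer);
}

/* CPU clock cycles spent per transfer, scaled by 10 to keep one decimal
 * place in integer arithmetic: cycles = ns * MHz / 1000. */
uint32_t cycles_per_transfer_x10(uint32_t ns_per_transfer, uint32_t cpu_mhz)
{
    return ns_per_transfer * cpu_mhz / 100u;
}
```

With those inputs, burst_bytes_per_sec(120, 16) gives 16666666 (about 16.7 MB/s) and cycles_per_transfer_x10(120, 66) gives 79, i.e. roughly 8 CPU cycles for every halfword fetched from the external RAM.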

I have not looked into running apps in Thumb mode.
Post by Alan Cox
or
if you have put a small graphics accelerator library on the ARM7
No. The video memory is exported as a framebuffer to the ARM9 running
nano-X in 16-bit RGB mode.
Post by Alan Cox
Do you have gprof running on the system yet - embedded can have such
strange bottlenecks that gprof can reveal a lot - and you only need
the profiling side on the DS. You can do the analysis with cross
tools on a PC.
I will look into gprof.
Post by Alan Cox
"No. Because the NDS has no MMU, DSLinux has no virtual memory, so it
cannot swap at all."
That's not entirely true: you can swap entire apps to/from secondary
storage if you have any kind of segmentation (e.g. FCSE, the Fast Context
Switch Extension, on some ARMs, although the granularity is a bit coarse)
and/or position-independent (PI) code.
Hmm.. swapping entire apps may be possible. The current state is that
the kernel and a minimal userland (busybox) are occupying 2 MBytes of
the internal RAM (XIP), and the other 2 MBytes are free for
applications.
Post by Alan Cox
You've also presumably got protection ranges ?
Yes. We use them for access control to the special memory regions of the
DS.
Post by Alan Cox
"Why doesn't DSLinux support reading from or writing to a CF"
if you've got specs for the CF interface and a tester, that's probably
easy to fix now.
Where did you find that? It's outdated. With the incorporation of the
DLDI interface (http://dldi.drunkencoders.com) we have access to most
SD/CF-based hardware on the DS.

There is one SERIOUS problem in this area I have not found a solution
for: as soon as FAT16 with a 32 KByte cluster size is used (needed for
the common 2 GByte SD cards), DSLinux has problems handling the cards.
There are data aborts during directory traversal. I have not heard of
any other embedded system having this problem, and it looks rather
strange to me.
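The 32 KByte cluster size follows from FAT16's limit of fewer than 65525 data clusters: with 512-byte sectors and the usual maximum of 64 sectors per cluster, the largest FAT16 cluster is 32 KByte, which caps a volume at about 2 GByte. A rough sketch of that calculation follows. It ignores the space used by reserved sectors, the FATs and the root directory, so real formatters may choose differently near the limits, and the function name is made up. It only explains why such cards end up with 32 KByte clusters; it says nothing about the cause of the data aborts.

```c
#include <stdint.h>

/* A FAT16 volume must have fewer than 65525 data clusters. */
#define FAT16_MAX_CLUSTERS 65524u

/* Smallest power-of-two cluster size (512 B .. 32 KB) that keeps a volume
 * of 'bytes' within the FAT16 cluster limit, or 0 if none does.
 * Simplification: metadata overhead (reserved sectors, FATs, root
 * directory) is ignored. */
uint32_t fat16_min_cluster_bytes(uint64_t bytes)
{
    for (uint32_t c = 512u; c <= 32768u; c *= 2u) {
        if (bytes / c <= FAT16_MAX_CLUSTERS)
            return c;
    }
    return 0;
}
```

For a card of 2,000,000,000 bytes this returns 32768, while a volume of exactly 2^31 bytes already returns 0, consistent with FAT16 topping out just under 2 GByte.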

regards
Amadeus
--
We're back to the times when men were men
and wrote their own device drivers.

(Linus Torvalds)
Alan Cox
2007-10-03 18:35:58 UTC
Post by Amadeus
A burst read takes 120 ns per 16 bits. Not much...
I have not looked into running apps in Thumb mode.
That may help, as may putting the blitter functions into assembler and
using the ability to lock them into cache. This is where gprof timing
can reveal the true hotspots.
Post by Amadeus
Hmm.. swapping entire apps may be possible. The current state is that
the kernel and a minimal userland (busybox) are occupying 2 MBytes of
the internal RAM (XIP), and the other 2 MBytes are free for
applications.
That's pretty tight. I'd assumed you were able to use the full 32 MB as
well.
Post by Amadeus
There is one SERIOUS problem in this area I have not found a solution
for: as soon as FAT16 with a 32 KByte cluster size is used (needed for
the common 2 GByte SD cards), DSLinux has problems handling the cards.
There are data aborts during directory traversal. I have not heard of
any other embedded system having this problem, and it looks rather
strange to me.
I've not seen similar reports at all, but I don't know how many people
are using FAT16 on such devices on a PC in Linux.

Alan
Amadeus
2007-10-04 17:20:42 UTC
Hello Alan,
Post by Alan Cox
Post by Amadeus
A burst read takes 120 ns per 16 bits. Not much...
I have not looked into running apps in Thumb mode.
That may help, as may putting the blitter functions into assembler and
using the ability to lock them into cache. This is where gprof timing
can reveal the true hotspots.
I have put the whole nano-X server into internal RAM. This has helped,
but the speed is still not acceptable. I will see if I can do the
blitting in assembler...
Post by Alan Cox
That's pretty tight. I'd assumed you were able to use the full 32 MB as
well.
Oops, a misunderstanding. I am able to use the full 32 MB as well.
Post by Alan Cox
Post by Amadeus
There is one SERIOUS problem in this area I have not found a
solution for: as soon as FAT16 with a 32 KByte cluster size is used
(needed for the common 2 GByte SD cards), DSLinux has problems
handling the cards. There are data aborts during directory traversal.
I have not heard of any other embedded system having this problem,
and it looks rather strange to me.
I've not seen similar reports at all, but I don't know how many
people are using FAT16 on such devices on a PC in Linux.
I can open and use these cards on my desktop PC without problems...

regards
Amadeus
Amadeus
2007-10-03 09:15:03 UTC
Hello,

this is my first post to this list. I am the guy who wants to get PIXIL
running on the Nintendo DS. This effort is hosted at www.dslinux.org.

The Nintendo DS is a dual-CPU machine: an ARM7 CPU @ 33 MHz handles the
sound, I/O and background tasks, and an ARM946E-S @ 66 MHz runs the main
system.

There are several limitations and shortcomings on this system...
First of all, there is no MMU, only an MPU, so it runs uClinux (kernel
2.6.14).

Second, there are only 4 MByte of internal memory. This memory can be
expanded by 32 MByte externally, but only over a 16-bit bus with only
ONE(!) write strobe.

It cost me several months and a lot of gcc hacking to reclaim this
memory as general-purpose memory. So now we have a 36 MByte uClinux
system, in which 32 MByte of the memory is a bit slower than usual.
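One way to picture the single-write-strobe problem: if the external bus cannot write just one byte of a 16-bit halfword, every byte store to that memory presumably has to become a 16-bit read-modify-write. The C sketch below illustrates that idea under this assumption only. It is not the gcc work actually done for DSLinux, store_byte_16bit_only is a hypothetical name, and it assumes little-endian byte order (which is how the DS's ARM cores run).

```c
#include <stdint.h>

/* Hypothetical helper: store one byte into memory that only accepts
 * 16-bit writes (no separate upper/lower byte strobes). It reads the
 * aligned halfword containing the target byte, replaces the relevant
 * half, and writes the whole halfword back.
 * Assumes little-endian byte order. Not atomic: a concurrent write to
 * the other byte of the same halfword (e.g. from an interrupt handler)
 * could be lost between the read and the write. */
void store_byte_16bit_only(volatile uint8_t *addr, uint8_t value)
{
    uintptr_t a = (uintptr_t)addr;
    volatile uint16_t *half = (volatile uint16_t *)(a & ~(uintptr_t)1);
    uint16_t old = *half;

    if (a & 1u)   /* odd address: upper byte of the halfword */
        *half = (uint16_t)((old & 0x00FFu) | ((uint16_t)value << 8));
    else          /* even address: lower byte of the halfword */
        *half = (uint16_t)((old & 0xFF00u) | value);
}
```

Each byte store thus costs a 16-bit read plus a 16-bit write on the slower bus, which is one reason code that writes bytes heavily would suffer there.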

The screen is 256 x 192 pixels, RGB 555.
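To make the screen numbers concrete: at 16 bits per pixel, one full 256 x 192 frame is 98304 bytes (96 KByte), and every full-screen redraw writes that much data. The sketch below packs 8-bit colour channels into a 15-bit pixel and computes the frame size. The bit layout (red in the low five bits, blue in the high five, bit 15 left at 0) is an assumption about this hardware and should be checked against the actual nano-X driver; the function names are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

#define SCREEN_W 256
#define SCREEN_H 192

/* Pack 8-bit-per-channel colour into 15-bit RGB 555 by keeping the top
 * five bits of each channel. Assumed layout: red bits 0-4, green bits
 * 5-9, blue bits 10-14, bit 15 left at 0. */
uint16_t rgb555(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)((r >> 3) | ((g >> 3) << 5) | ((b >> 3) << 10));
}

/* Bytes in one full frame at 16 bits per pixel. */
size_t frame_bytes(void)
{
    return (size_t)SCREEN_W * SCREEN_H * sizeof(uint16_t);
}
```

For example, rgb555(255, 255, 255) yields 0x7FFF and frame_bytes() yields 98304.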

There are no shared libraries and no dynamic loader on this system. I
have had some issues with PIXIL about this, but no big problems.

I have nano-X and PIXIL up and running, but I am facing serious speed
issues. The reaction to a click on the touchscreen is slow, and the
calculator needs 1-2 seconds to display.

What I have done so far:
- define NDEBUG in nano-X drivers.
- add assembler code for horizontal and vertical lines in the driver.
- implement shared memory support.
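The horizontal and vertical line primitives mentioned in that list were done in assembler. As a plain C reference to compare against, here is a sketch of the same two operations on a linear 16 bpp framebuffer. The names fb_hline/fb_vline and their signatures are hypothetical, not nano-X's actual driver interface, and coordinates are assumed to be clipped already by the caller.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers for a linear 16 bpp framebuffer; 'pitch' is the
 * number of pixels per row (256 for the DS screen). Coordinates are
 * inclusive and assumed to be clipped by the caller. */

/* Horizontal line: write one pixel if needed to reach 4-byte alignment,
 * then two pixels per 4-byte copy. memcpy keeps this well-defined C;
 * whether it becomes a single 32-bit store depends on the compiler and
 * on how much alignment it can prove, so check the generated code. */
void fb_hline(uint16_t *fb, size_t pitch, int x1, int x2, int y, uint16_t c)
{
    uint16_t *p = fb + (size_t)y * pitch + (size_t)x1;
    int n = x2 - x1 + 1;
    uint32_t pair = ((uint32_t)c << 16) | c;  /* same colour in both halves */

    if (n <= 0)
        return;
    if (((uintptr_t)p & 2u) != 0) {           /* not 4-byte aligned yet */
        *p++ = c;
        n--;
    }
    for (; n >= 2; n -= 2, p += 2)
        memcpy(p, &pair, sizeof pair);
    if (n == 1)                               /* odd pixel left over */
        *p = c;
}

/* Vertical line: one 16-bit store per row, stepping down by the pitch. */
void fb_vline(uint16_t *fb, size_t pitch, int x, int y1, int y2, uint16_t c)
{
    uint16_t *p = fb + (size_t)y1 * pitch + (size_t)x;
    for (int y = y1; y <= y2; y++, p += pitch)
        *p = c;
}
```

The vertical case has no equivalent trick because consecutive pixels are a full row apart, so its cost is dominated by the number of rows touched.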

There were small improvements in speed, but nothing worth mentioning.
The load monitor applet displays a constant load of about 40% CPU
usage (I think because of the frequent screen updates in the load
monitor window).

So, can someone with more experience in nano-X and PIXIL explain which
are the most performance-critical parts of the system and how to
improve them?

regards
Amadeus