Semantic patching with Coccinelle
We've all been there: You're tracking down some evil bug, and you have the sudden chilling realization that you're going to have to refactor an enormous chunk of code to fix it. You break out in a cold sweat as you run a quick grep over the source base: hundreds of lines of code to change! And the change is too complex to do with a script because it depends on the calling context, or requires adding a new variable to every caller.
This happened to me last month when I was adding support for 64-bit file systems to e2fsprogs. I thought I was nearly finished when I discovered I needed to write (yet another) new interface and convert (yet another) several hundred lines of code to it. The changes were complex enough that I couldn't use a script, and simple enough that I wanted to claw my eyes out with the soul-killing boredom of doing it by hand. That's when the maintainer, Theodore Ts'o, suggested I look at Coccinelle (a.k.a., spatch).
Coccinelle
Coccinelle is a tool to automatically analyze and rewrite C code. Coccinelle (pronounced cock'-see-nel) means "ladybug" in French, a name chosen because ladybugs eat other bugs. Coccinelle is not just another scripting language; it is aware of the structure of the C language and can make much more complex changes than are possible with pure string processing. For example, Coccinelle can make a particular change only in functions which are assigned to a function pointer in a particular type of array, say, the create member of struct inode_operations.
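To picture the kind of context meant here, the sketch below shows functions being assigned to members of an operations table. It is only an illustration: the structure is a much-simplified, hypothetical stand-in for the kernel's struct inode_operations (the real one has different members and signatures), and the myfs_* names are invented.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the kernel's operations table. */
struct inode_operations {
	int (*create)(const char *name, int mode);
	int (*unlink)(const char *name);
};

/* Ordinary helper: never stored in an operations table. */
static int name_is_valid(const char *name)
{
	assert(name != NULL);
	return strlen(name) > 0;
}

/* Assigned to .create below, so a change restricted to "create"
 * callbacks would apply to this function and not to the others. */
static int myfs_create(const char *name, int mode)
{
	(void)mode;
	return name_is_valid(name) ? 0 : -1;
}

static int myfs_unlink(const char *name)
{
	return name_is_valid(name) ? 0 : -1;
}

const struct inode_operations myfs_iops = {
	.create = myfs_create,
	.unlink = myfs_unlink,
};
```

Nothing in the text of myfs_create() itself says that it is a create callback; that fact only appears in the initializer of myfs_iops, which is why a purely textual tool has trouble selecting it.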
The input to the tool is the file(s) to be changed and a "semantic patch," written in SmPL (Semantic Patch Language). SmPL looks like a unified diff (a patch) with some C-like declarations mixed in. Here's an example:

    @@
    expression E;
    identifier fld;
    @@

    - !E && !E->fld
    + !E || !E->fld

This semantic patch fixes the bug in which a pointer is tested for NULL and then dereferenced when it is NULL. An example of a bug this semantic patch found in the Linux kernel (and automatically generated the fix for):

    --- a/drivers/pci/hotplug/cpqphp_ctrl.c
    +++ b/drivers/pci/hotplug/cpqphp_ctrl.c
    @@ -1139,7 +1139,7 @@ static u8 set_controller_speed(struct controller *ctrl, u8 adapter_speed, u8 hp_
         for(slot = ctrl->slot; slot; slot = slot->next) {
             if (slot->device == (hp_slot + ctrl->slot_device_offset))
                 continue;
    -        if (!slot->hotplug_slot && !slot->hotplug_slot->info)
    +        if (!slot->hotplug_slot || !slot->hotplug_slot->info)
                 continue;
             if (slot->hotplug_slot->info->adapter_status == 0)
                 continue;

(More on the semantic patch format later.)
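The difference between the two forms of the test can be sketched in plain C. The structures below are hypothetical, simplified stand-ins for the kernel ones, and the function names are invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct slot_info { int adapter_status; };
struct hotplug_slot { struct slot_info *info; };

/* Buggy form: with a NULL hs the second operand dereferences it; with a
 * non-NULL hs the first operand is false, so the result is always 0. */
int should_skip_buggy(const struct hotplug_slot *hs)
{
	assert(hs != NULL); /* guard for this demo only: NULL would crash below */
	return !hs && !hs->info;
}

/* Fixed form: || short-circuits, so hs->info is read only when hs is
 * known to be non-NULL. */
int should_skip_fixed(const struct hotplug_slot *hs)
{
	return !hs || !hs->info;
}
```

For a slot whose info pointer is NULL, should_skip_fixed() returns 1 while should_skip_buggy() returns 0, so the buggy loop would carry on and dereference the missing info on the next line.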
Coccinelle is designed, written, and maintained by Julia Lawall at the Department of Computer Science at the University of Copenhagen, Gilles Muller and Yoann Padioleau at the Ecole des Mines de Nantes, and René Rydhof Hansen at the Department of Computer Science of Aalborg University. Coccinelle is licensed under the GPL; however, it is written in OCaml, so the potential developer base is somewhat limited.
The original goal of Coccinelle was to automate as much as possible the task of keeping device drivers up to date with the latest kernel interfaces. But the end result can do far more than that, including finding and fixing bugs and coding style irregularities. Over 180 patches created using Coccinelle have been accepted into the Linux kernel to date.
Coccinelle quickstart
Like many languages, SmPL is best learned through example. We'll run through one simple example here just to get started. After that, the Coccinelle web page has some documentation and a plethora of examples.
First, download Coccinelle and install it. I used the source version rather than any of the precompiled options. The Coccinelle binary is called spatch.

As our example, say we have a program with a lot of calls to alloca() that we would like to replace with malloc(). alloca() allocates space on the stack and can be more efficient and convenient than malloc(), but it is also compiler-dependent, non-standard, easy to use incorrectly, and has undefined behavior on failure. (Replacing alloca() with malloc() isn't enough; we also have to check the return value, but that will come later.)
Here is the C file we are working on:
    #include <alloca.h>

    int
    main(int argc, char *argv[])
    {
        unsigned int bytes = 1024 * 1024;
        char *buf;

        /* allocate memory */
        buf = alloca(bytes);

        return 0;
    }

We could make the replacement using a scripting language like sed:

    $ sed -i 's/alloca/malloc/g' test.c

But this will replace the string "alloca" anywhere it appears. The resulting diff:

    --- test.c
    +++ /tmp/test.c
    @@ -1,4 +1,4 @@
    -#include <alloca.h>
    +#include <malloc.h>
     
     int
     main(int argc, char *argv[])
    @@ -6,8 +6,8 @@
         unsigned int bytes = 1024 * 1024;
         char *buf;
     
    -    /* allocate memory */
    -    buf = alloca(bytes);
    +    /* mallocte memory */
    +    buf = malloc(bytes);
     
         return 0;
     }

We can tweak our script to handle 90% of the cases:

    $ sed -i 's/alloca(/malloc(/g' test.c

But this script doesn't handle the case where a second function name has the first as a suffix, and it depends on a particular coding style in which no white space comes between the function name and the open parenthesis, etc., etc. By the time those cases are covered, our simple sed script is a hundred-character monster. It can be done, but it's a pain.
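The gap between matching raw strings and matching C tokens can be sketched in C itself. The two helpers below are hypothetical and written only for this illustration (neither is part of sed or Coccinelle): the first counts matches the way the first sed command sees them, the second accepts a match only when it is a whole identifier followed by an open parenthesis, with optional white space in between.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

static int is_ident_char(char c)
{
	return isalnum((unsigned char)c) || c == '_';
}

/* Count every raw occurrence of name, as 's/name/.../g' would see it. */
size_t count_substring(const char *src, const char *name)
{
	size_t n = 0, len = strlen(name);

	assert(len > 0);
	for (const char *p = strstr(src, name); p; p = strstr(p + len, name))
		n++;
	return n;
}

/* Count occurrences of name that form a whole identifier followed by '('. */
size_t count_calls(const char *src, const char *name)
{
	size_t n = 0, len = strlen(name);

	assert(len > 0);
	for (const char *p = strstr(src, name); p; p = strstr(p + len, name)) {
		const char *end = p + len;

		if (p > src && is_ident_char(p[-1]))
			continue;	/* suffix of a longer name, e.g. my_alloca */
		if (is_ident_char(*end))
			continue;	/* prefix of a longer word, e.g. allocate */
		while (isspace((unsigned char)*end))
			end++;		/* tolerate "alloca (n)" */
		if (*end == '(')
			n++;
	}
	return n;
}
```

Even count_calls() knows nothing about comments, string literals, or macros, so it would still match a call that appears inside a comment. Closing those gaps by hand is where the script starts turning into a parser.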
In Coccinelle, we'd use the following semantic patch:
    @@
    expression E;
    @@

    -alloca(E)
    +malloc(E)

Put the C file in test.c and the above semantic patch in test.cocci and run it like so:

    $ spatch -sp_file test.cocci test.c

It should produce the following diff:

    --- test.c
    +++ /tmp/cocci-output-17416-b5450d-test.c
    @@ -7,7 +7,7 @@ main(int argc, char *argv[])
         char *buf;
     
         /* allocate memory */
    -    buf = alloca(bytes);
    +    buf = malloc(bytes);
     
         return 0;
     }

Let's look at the semantic patch line by line.
    @@
    expression E;
    @@

This declares the "metavariable" E as a variable that can match any expression, e.g., 1 + 2, sizeof(x), strlen(name) + sizeof(x) * 72. When spatch processes the input, it sets the value of E to the argument to alloca(). The "@@ @@" syntax is chosen to resemble the line in a unified diff describing the lines to be patched. I don't find the resemblance particularly helpful, but the intention is well-taken.

    -alloca(E)

This line says to remove any call to the function alloca(), and to save its argument in the metavariable E for later use.

    +malloc(E)

And this line says to replace the call to alloca() with a call to malloc() and use the value of metavariable E as its argument.
Now, we also want to check the return value of malloc() and return an error if it failed. We can do that too:

    @@
    expression E;
    identifier ptr;
    @@

    -ptr = alloca(E);
    +ptr = malloc(E);
    +if (ptr == NULL)
    +    return 1;

The resulting diff:

    --- test.c
    +++ /tmp/cocci-output-17494-22a573-test.c
    @@ -7,7 +7,8 @@ main(int argc, char *argv[])
         char *buf;
     
         /* allocate memory */
    -    buf = alloca(bytes);
    +    buf = malloc(bytes);
    +    if (buf == NULL)
    +        return 1;
     
         return 0;
     }

Semantic patches can be far more complex. One of my favorite examples is the move of reference counting of the Scsi_Host structure out of drivers. Changing this required adding an argument to the function signature and removing a declaration and several other lines from each SCSI driver's proc_info function. The semantic patch, explained in detail in their OLS 2007 slides [PPT] [ODP], does all of this automatically. I recommend reading and re-reading this example until it sinks in.
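Going back to the alloca() conversion for a moment: the generated diff only touches the allocation site, so a couple of things are still left to the developer. The sketch below shows what the finished code might look like, assuming the buffer is used and released within the same function. The name use_buffer() and the memset() call are invented for illustration; the points it makes are that malloc() is declared in <stdlib.h> and that, unlike alloca() memory, the buffer has to be freed explicitly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns 0 on success and 1 if the allocation
 * fails, mirroring the "return 1" added by the semantic patch. */
int use_buffer(size_t bytes)
{
	char *buf;

	assert(bytes > 0);

	/* allocate memory */
	buf = malloc(bytes);
	if (buf == NULL)
		return 1;

	memset(buf, 0, bytes);	/* stand-in for real work on the buffer */

	/* alloca() memory disappeared on return; malloc() memory does not */
	free(buf);
	return 0;
}
```

Adding the free() on every exit path is itself a context-dependent change, which makes it another candidate for a semantic patch rather than a text substitution.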
Experience
My first experience with Coccinelle was mixed. In theory, Coccinelle does exactly what I want — automate complex changes to code — but in practice the implementation is beta quality. I successfully used Coccinelle to make hundreds of lines of changes with less than a hundred lines of semantic patches, but only after working directly with the developers to get bug fixes and help figuring out SmPL features. Coccinelle is one of those schizophrenic projects situated on the boundary between academic research and practical software development.
One of the first hurdles I had to overcome was teaching Coccinelle
about the macros in my code. Coccinelle has to do all its own parsing
and pre-processing — you can't just run the input C code through
cpp because then you'd have to map the post-processor output back to
the original code. Macros will sometimes confuse it enough that it
gives up parsing a function until it reaches the next safe grammatical
starting point (e.g., the next function) — which may mean that it
doesn't process most of the file. To get around this, you can create
a list of macros and feed them to spatch with
the -macro_file
option. (Yes, that's one dash — one
of my pet peeves about Coccinelle is the non-standard command-line
option style.) For example, here are a few lines from the macro file I
used for e2fsprogs:
    #define EXT2FS_ATTR(a)
    #define _INLINE_ inline
    #define ATTR(a)

You can build the list of macros by hand, but spatch has a feature that helps find them automatically. The -parse_c option makes spatch list the top ten parsing errors, which will include the macro name. For example, some of the output from running spatch -parse_c on e2fsprogs:

    EXT2FS_ATTR: present in 85 parsing errors
    example:
        static int check_and_change_inodes(ext2_ino_t dir, int entry EXT2FS_ATTR((unused)),
            struct ext2_dir_entry *dirent, int offset,
            int blocksize EXT2FS_ATTR((unused)),

Coccinelle has improved significantly in the past few weeks. The 0.1.2 release had a number of bugs that made spatch unusable for me. The next release, 0.1.3, fixed those bugs and with it I was able to make practical, real-world patches. The 0.1.4 release will be out shortly. The developers wrote and released more documentation, including a description of all the command-line options [PDF] and a grammar for SmPL. Many more example spatch scripts are available now. The best reference for learning Coccinelle continues to be the slides from their 2007 OLS tutorial and the associated paper [PDF]. White space handling is improving; originally Coccinelle didn't care much about white space and frequently mangled transformations involving it, which is a problem if you want to take the hand out of hand-editing. One of my semantic patches left a dangling semi-colon in the middle; the developers sent me a patch to fix it within a few days.
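To see why an unknown macro derails a parser that does not run the preprocessor, consider a declaration in the style of the e2fsprogs example above. This is only a sketch: the definition of EXT2FS_ATTR shown here is an assumption made for illustration (a GCC-style attribute wrapper), not copied from e2fsprogs, and count_entry() is a hypothetical function.

```c
#include <assert.h>

/* Assumed definition: a compiler attribute on GCC-compatible compilers,
 * nothing at all elsewhere. */
#if defined(__GNUC__)
#define EXT2FS_ATTR(a) __attribute__(a)
#else
#define EXT2FS_ATTR(a)
#endif

/* Hypothetical function with two parameters marked as unused. */
int count_entry(int dir, int entry EXT2FS_ATTR((unused)),
		int offset, int blocksize EXT2FS_ATTR((unused)))
{
	assert(offset >= 0);
	return dir + offset;
}
```

Read without the macro definition, "int entry EXT2FS_ATTR((unused))" is a type followed by two identifiers and a parenthesized expression, which is not a valid parameter declaration. The macro file line "#define EXT2FS_ATTR(a)" tells spatch to treat the macro as expanding to nothing, so the declaration parses as plain "int entry".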
One thing I am absolutely certain of: learning Coccinelle and writing semantic patches was way more fun than making the changes by hand or using regular expressions. I also had much greater confidence that my changes were correct; it is remarkably pleasant to make several hundred lines of changes and have the result compile cleanly and pass the regression tests the first time.
Related work
If you really want to, you can do everything Coccinelle can do by writing your own scripts; after all, code is code. But you have to deal with all the little corner cases: to C, white space is all the same, generally speaking, but regular expressions care intensely about the difference between a space, a newline, and a tab. Use the right tool for the job: if you're just replacing a variable name and your first script works, great. If you're changing a calling convention or moving the allocation and freeing of an object to another context, give a tool like Coccinelle a try.

In terms of power and flexibility, Coccinelle is similar to the Stanford compiler checker [PDF] (commercialized by Coverity). While the compiler checker is far more mature and has better flow analysis and parsing, Coccinelle can generate code to fix the bugs it finds. Most importantly, Coccinelle is open source, so developers can find and fix bugs themselves.
Some IDEs include tools to automatically refactor code, which is one aspect of what Coccinelle does. I have never personally used one of these IDE refactoring tools and can't compare it with Coccinelle, but my friends who have report that their stability leaves something to be desired. Xrefactory is a refactoring tool available on *NIX platforms which is fully integrated with Emacs and XEmacs. It is not open source and requires the purchase of a license, although one version is available for use free of charge.
Conclusion
Coccinelle is an open source tool that can analyze and transform C code according to specified rules, or semantic patches. Semantic patches are much more powerful than patches or regular expressions. The tool is beta quality right now but usable for practical tasks and the developers are very responsive. It's worth learning for any developer making a non-trivial interface change.

Index entries for this article | |
---|---|
Kernel | Development tools/Coccinelle |
GuestArticles | Aurora (Henson), Valerie |
Semantic patching with Coccinelle
Posted Jan 20, 2009 22:34 UTC (Tue) by biehl (subscriber, #14636) [Link]
Dehydra ( https://developer.mozilla.org/en/Dehydra )
GTK-rewriter ( http://people.imendio.com/richard/gtk-rewriter/ )
and maybe other tools ( http://blog.mozilla.com/tglek/2008/09/02/converging-elsa-... ) to Coccinelle?
Semantic patching with Coccinelle
Posted Jan 21, 2009 1:30 UTC (Wed) by padator (guest, #56235) [Link]
does not support any transformation (just analysis, and you have
to write javascripts code apparently to match over C code), and gtk-rewrite
have a few transformations hard-coded in a file. The goal of coccinelle
is to make it easy to specify code patterns and transformations. You
use a syntax you already know for that: C (not javascripts on ASTs), and
the patch syntax.
For elsa I can not speak, it's not clear what is their goal
and what they can do.
Semantic patching with Coccinelle
Posted Jan 20, 2009 22:41 UTC (Tue) by Thue (guest, #14277) [Link]
it is written in OCaml, so the potential developer base is somewhat limited
Every programmer should learn to write ML or another functional language (OCaml is an object-oriented version of ML). It is the first language taught at the Department of Computer Science at University of Copenhagen, so there are at least some people who know it.
For some kinds of programs ML code is 1/3 the size of an equivalent imperative program, as well as more readable and easier to verify for correctness. ML has excellent compile-time checking; if your ML program compiles, it will usually also run correctly.
Many of the computer science students who learn ML keep it as their favorite language. In my experience it is especially the best students who 'get it' and like ML.
Semantic patching with Coccinelle
Posted Jan 21, 2009 1:33 UTC (Wed) by padator (guest, #56235) [Link]
Semantic patching with Coccinelle
Posted Jan 21, 2009 18:45 UTC (Wed) by felixfix (subscriber, #242) [Link]
All comments to the contrary are wishful thinking (If I had some ham, I could have ham and eggs, if I had some eggs) and begging for a language war.
Semantic patching with Coccinelle
Posted Jan 21, 2009 19:45 UTC (Wed) by rwmj (subscriber, #5474) [Link]
Learn something new ...
Semantic patching with Coccinelle
Posted Jan 21, 2009 20:32 UTC (Wed) by felixfix (subscriber, #242) [Link]
You remind me of people who criticize where I go on vacation and what I do. "You could have gone to xxx and done yyy." Yeh, well, no matter where I go and what I do, I could have gone somewhere else and done something else.
Time and resources are limited. Some people would rather get on with the doing rather than learn new ways to not do things they don't have time for because they spend all their time learning new ways they won't use.
Semantic patching with Coccinelle
Posted Jan 21, 2009 22:23 UTC (Wed) by rwmj (subscriber, #5474) [Link]
always been done.
Semantic patching with Coccinelle
Posted Jan 21, 2009 23:49 UTC (Wed) by felixfix (subscriber, #242) [Link]
It's just a simple fact. It has nothing to do with the benefits of education, of new and improved ways of doing things. NOTHING.
Quit taking it personally. It has zero to do with you personally, your personal taste in languages or living styles, what would be best or ideal or anything. It is a simple fact of counting heads. More people know languages other than OCaml and could contribute in those languages.
Separate raw data from your personal wishes and dreams. Valerie wrote a fact. Dispute the fact if you want, but don't ramble on about learning and better ways and so on, those are not facts.
Semantic patching with Coccinelle
Posted Jan 23, 2009 1:37 UTC (Fri) by giraffedata (guest, #1954) [Link]
Looks to me like you're putting words in rwmj's mouth and then disagreeing with them. I don't see that rwmj has taken issue with Valerie's conclusions about the language choice.
Not every comment is a contradiction of the parent, and I wouldn't assume that "learn something new" was meant to say, "there's nothing unfortunate about the fact that this code is in OCaml."
Of course, I'm not really sure how "learn something new" does fit into the thread. The posts after it follow more obviously: you point out that learning something new isn't always the right thing and rwmj misreads that as learning something new is never the right thing and disagrees. While that position (learning something new is sometimes good) is obviously right, you respond as if he were arguing -- still -- that there's nothing unfortunate about the fact that this code is in OCaml.
Semantic patching with Coccinelle
Posted Jan 23, 2009 5:41 UTC (Fri) by rwmj (subscriber, #5474) [Link]
I was just being sarcastic in that second posting. OCaml and Haskell are something new. They're not just exotic scripting languages - in the way that Ruby is just Perl with a different syntax. They are something considerably more powerful and expressive that can take programming in new directions. Unfortunately explaining this is a bit like the Paul Graham explaining LISP to "Blub" programmers.
Semantic patching with Coccinelle
Posted Jan 23, 2009 8:09 UTC (Fri) by hppnq (guest, #14462) [Link]
Unfortunately explaining this is a bit like the Paul Graham explaining LISP to "Blub" programmers.
It's hard to find explanations that do a worse job of introducing people to Lisp. There's a good explanation of Haskell (PDF), including a bit of history, design choices and an overview of the functional programming paradigm.
Haskell, by the way, is roughly of the same age as Python, but is expected to become the next great programming language Any Moment Now. Or maybe not.
Semantic patching with Coccinelle
Posted Jan 22, 2009 6:49 UTC (Thu) by njs (guest, #40338) [Link]
Writing in a niche language *can* have the opposite effect on finding contributors, though. Darcs for instance benefited quite a bit from being written in Haskell, because there were many people who had learned the language out of interest and really wanted to work on something in Haskell, but not many real-world projects to go around. Its competitors were written in better known languages, but their potential contributor base was correspondingly diluted by all the other projects also written in those languages...
This effect does exist but it's short-lived...
Posted Jan 22, 2009 10:29 UTC (Thu) by khim (subscriber, #9252) [Link]
Yes, darcs benefited for a time from the fact that it could attract all these people - but the end result was the same: when the C crowd got its shiny new bauble (Git), all other projects were left in the dust...
Sometimes it's a good idea to use a non-mainstream language because it's the only way to produce something and you don't need many contributors: one of the most popular DFT libraries (FFTW) is written in OCaml (well, kinda). But it does limit the number of potential contributors! No way to avoid this...
Semantic patching with Coccinelle
Posted Jan 21, 2009 10:48 UTC (Wed) by Yorick (guest, #19241) [Link]
First, many thanks to Valerie Henson for an excellent article and for reminding us of the existence of Coccinelle which appears to be a fine tool.
But I must agree with Thue. Statements on the form X is written in Y, so the potential developer base is somewhat limited where Y is a language not well-known by the speaker are misleading. Any competent and motivated programmer will quickly learn a language such as ML, Scheme or Haskell in order to contribute to a project.
We are not talking about exoticisms like Befunge or Brainfuck here but standard, well-known, well-documented and widely-taught languages. A notation well suited to the task makes the task easier; for nontrivial applications, the complexity lies in the problem domain. Anyone who has worked with GCC will attest that the fact that it is (mostly) written in C does not make it easier to understand.
Semantic patching with Coccinelle
Posted Jan 21, 2009 12:35 UTC (Wed) by hppnq (guest, #14462) [Link]
Statements on the form X is written in Y, so the potential developer base is somewhat limited where Y is a language not well-known by the speaker are misleading. Any competent and motivated programmer will quickly learn a language such as ML, Scheme or Haskell in order to contribute to a project.
This is not the common practice, of course, if only because there are a lot more incompetent programmers than programmers who quickly learn Haskell. Whether this is true for any language Y and any project X is an interesting question.
But of the statements S in publication P, I would say yours is more sweeping than Valerie's. ;-)
Semantic patching with Coccinelle
Posted Jan 22, 2009 4:11 UTC (Thu) by ncm (guest, #165) [Link]
A formal definition and multiple implementations help to reassure coders that time spent learning the language and writing reams of code in it won't end up wasted when, e.g., developers of the sole implementation lose interest and leave it orphaned. (NB: I am not saying I expect this to happen to OCaml.) It is precisely this quality, and nothing about the details of the language design, that make apt Ms. Henson's remark about Coccinelle's potential developer base.
Semantic patching with Coccinelle
Posted Jan 22, 2009 6:10 UTC (Thu) by shimei (guest, #54776) [Link]
Semantic patching with Coccinelle
Posted Jan 22, 2009 5:43 UTC (Thu) by i3839 (guest, #31386) [Link]
That said, how the reception of outside contributions is by the main developers has a bigger impact than what language is used...
Semantic patching with Coccinelle
Posted Jan 22, 2009 10:37 UTC (Thu) by Yorick (guest, #19241) [Link]
But what I really wanted to challenge is the sad prevailing idea that some languages are "common" and the rest "strange". The statement in the article could be interpreted that way, although I am confident that Ms Henson does not suffer from that delusion herself. It is not helpful in making "uncommon" languages less so, even when this has great merit.
(Also, some "helpful" contributions that you receive as a maintainer of a free software package makes you wonder if the language-as-barrier is such a bad idea...)
Semantic patching with Coccinelle
Posted Jan 22, 2009 13:43 UTC (Thu) by hppnq (guest, #14462) [Link]
But what I really wanted to challenge is the sad prevailing idea that some languages are "common" and the rest "strange".
To prove your point, maybe you should write patches for Coccinelle so that it can produce semantic patches for OCaml?
Semantic patching with Coccinelle
Posted Jan 22, 2009 16:18 UTC (Thu) by rwmj (subscriber, #5474) [Link]
To prove your point, maybe you should write patches for Coccinelle so that it can produce semantic patches for OCaml?
OCaml actually supports the principle of semantic patching natively. You can perform almost arbitrary transformations of the abstract syntax tree at compile time, and this feature is used to implement interesting new features like Erlang-style bitstrings, type-safe access to databases, type-safe regular expressions, and much more.
Of course this is "strange" to many. (LISP programmers might recognise them as a very much more powerful version of LISP macros). But this is just one of the several ways that OCaml (and Haskell) are far beyond common programming languages.
Rich.
Semantic patching with Coccinelle
Posted Jan 22, 2009 16:28 UTC (Thu) by padator (guest, #56235) [Link]
This is not true. What you are talking about is different and is called
meta-programming. The need to refactor code is different. Even in OCaml
you often need to refactor code and there is no tool right now for OCaml
that does that. In fact we, in the coccinelle project, had in the past internally needed to refactor the coccinelle code and it was painful.
So I guess the comment of the other guy was right on the point; we decided to do
a semantic patching tool for C rather than a semantic patching tool for OCaml because there are more people writing C code :)
Semantic patching with Coccinelle
Posted Jan 23, 2009 0:42 UTC (Fri) by nix (subscriber, #2304) [Link]
tool (called, perhaps, ocamelle), in C, which carries out such
transformations on OCaml code. ;}
Semantic patching with Coccinelle
Posted Jan 23, 2009 5:38 UTC (Fri) by rwmj (subscriber, #5474) [Link]
I didn't mean that semantic patching was used in the same way as metaprogramming, but they are certainly analogous to each other. In one case, the transformed code is applied as a patch back on the source. In the other case, the transformed code is immediately passed to the compiler.
Anyhow .. for OCaml refactoring, Jane Street sponsored this project last summer. It's also something that Eclipse + the OCaml Eclipse plugin claims to do. I have not used either.
Semantic patching with Coccinelle
Posted Jan 22, 2009 6:36 UTC (Thu) by dirtyepic (guest, #30178) [Link]
Statements on the form X is written in Y, so the potential developer base is somewhat limited where Y is a language not well-known by the speaker are misleading. Any competent and motivated programmer will quickly learn a language such as ML, Scheme or Haskell in order to contribute to a project.
The point is that the total number of people who know ML or will learn it in the near future is less than the total number of people who know or will know C/python/etc, just as a book written in Ukrainian has a smaller potential audience than one written in English. You don't have to know Ukrainian to make that observation, just how to count.
Semantic patching with Coccinelle
Posted Jan 22, 2009 6:58 UTC (Thu) by padator (guest, #56235) [Link]
Semantic patching with Coccinelle
Posted Jan 22, 2009 17:40 UTC (Thu) by dirtyepic (guest, #30178) [Link]
Semantic patching with Coccinelle
Posted Jan 22, 2009 12:21 UTC (Thu) by mjg59 (subscriber, #23239) [Link]
Semantic patching with Coccinelle
Posted Jan 23, 2009 0:40 UTC (Fri) by lysse (guest, #3190) [Link]
Semantic patching with Coccinelle
Posted Jan 21, 2009 2:23 UTC (Wed) by zooko (guest, #2589) [Link]
Semantic patching with Coccinelle
Posted Jan 21, 2009 10:05 UTC (Wed) by wingo (guest, #26929) [Link]
I also like turning tedium into tools problems -- it probably takes the same amount of time but writing tools is much more fun.
Semantic patching with Coccinelle
Posted Jan 21, 2009 18:46 UTC (Wed) by ndk (subscriber, #43509) [Link]
"I'd rather write programs that write programs, than write programs."
Semantic patching with Coccinelle
Posted Jan 22, 2009 1:30 UTC (Thu) by sitaram (guest, #5959) [Link]
If it wasn't, it should be :-) Sounds like him...!
Semantic patching with Coccinelle
Posted Jan 22, 2009 2:32 UTC (Thu) by JoeBuck (subscriber, #2330) [Link]
It appears that Richard Sites, who was a student of Donald Knuth, said it first (or at least before Larry Wall did), sometime in the 1970s.
Semantic patching with Coccinelle
Posted Jan 22, 2009 5:46 UTC (Thu) by Mithrandir (guest, #3031) [Link]
Semantic patching with Coccinelle
Posted Jan 22, 2009 8:35 UTC (Thu) by nix (subscriber, #2304) [Link]
than doing the actual job, even when the job was a one-off, in _Last
Chance to See_. Given that this was a book about conservation this was a
somewhat strange choice :)
Semantic patching with Coccinelle
Posted Jan 23, 2009 0:47 UTC (Fri) by lysse (guest, #3190) [Link]
Semantic patching with Coccinelle in Fedora
Posted Jan 21, 2009 22:22 UTC (Wed) by rwmj (subscriber, #5474) [Link]
https://bugzilla.redhat.com/show_bug.cgi?id=481034
Semantic patching with Coccinelle in Fedora
Posted Jan 22, 2009 20:56 UTC (Thu) by vaurora (subscriber, #38407) [Link]
Semantic patching with Coccinelle in Fedora
Posted Jan 22, 2009 21:23 UTC (Thu) by eugeniy (subscriber, #24280) [Link]
Semantic patching with Coccinelle in Fedora
Posted Jan 23, 2009 3:49 UTC (Fri) by vaurora (subscriber, #38407) [Link]
coccinelle problems
Posted Jan 22, 2009 7:57 UTC (Thu) by Octavian (guest, #7462) [Link]
- struct foo my_foo[] = {
- .a = 1,
- .u.b = 42,
- }
+ FOO(1, 42)
However, after contacting Julia Lawall I can say that she was always very helpful at fixing the issues I found.
coccinelle problems
Posted Jan 22, 2009 12:42 UTC (Thu) by lawall (guest, #56234) [Link]
julia
coccinelle problems
Posted Jan 30, 2009 9:09 UTC (Fri) by lawall (guest, #56234) [Link]
Note that this refers to the support for structures in SmPL, ie the language in which semantic patches are written. The C parser supports all kinds of structures with no problem, and indeed all of C, with various extensions as used in Linux code.
julia
coccinelle problems
Posted Feb 5, 2009 8:46 UTC (Thu) by lawall (guest, #56234) [Link]
julia
Wrong Section?!
Posted Jan 22, 2009 12:07 UTC (Thu) by nikanth (guest, #50093) [Link]
Wrong Section?!
Posted Jan 22, 2009 19:34 UTC (Thu) by egoforth (subscriber, #2351) [Link]
I wonder why is this article published under Kernel section?!
I would posit that it's because this is the Kernel development section, and there has already been a measured impact.
The original goal of Coccinelle was to automate as much as possible the task of keeping device drivers up to date with the latest kernel interfaces. But the end result can do far more than that, including finding and fixing bugs and coding style irregularities. Over 180 patches created using Coccinelle have been accepted into the Linux kernel to date.
Semantic patching with Coccinelle
Posted Jan 22, 2009 13:26 UTC (Thu) by johill (subscriber, #25196) [Link]
Thanks for the article, I anticipate the .4 release and this is the first I heard of it :) One thing I've been battling with recently that unfortunately it doesn't support is modifying printf-style formats. I'd love to write something like

    @@
    string A, B;
    expression M, MBUF;
    @@
    -printk(... "%s" ..., ..., print_mac(MBUF, M), ...);
    +printk(... "%pM" ..., ..., M, ...);

but obviously parameter matching is quite a hard task and definitely needs different syntax than what I just invented on the spot. But even without that, I've used it a few times already, if only to detect problems, e.g. http://thread.gmane.org/gmane.linux.kernel.wireless.general/26371 (though that case required filtering manually for the correct places)
Semantic patching with Coccinelle
Posted Jan 22, 2009 13:27 UTC (Thu) by johill (subscriber, #25196) [Link]
Semantic patching with Coccinelle
Posted Jan 28, 2009 14:35 UTC (Wed) by dmk (guest, #50141) [Link]
(i feel so proud!)
...to wander further off topic..
Semantic patching with Coccinelle
Posted Jan 26, 2009 10:19 UTC (Mon) by jmmc (guest, #34939) [Link]
Related work on C/C++ refactoring
Posted Jan 29, 2009 8:29 UTC (Thu) by tglek (guest, #56374) [Link]
I wrote a blog post about it:
http://blog.mozilla.com/tglek/2009/01/29/semantic-rewriti...
As an interesting tidbit: since Pork itself is largely written in C++ I actually used it to refactor itself, I briefly ranted about it in http://blog.mozilla.com/tglek/2008/07/25/pull-pork-with-c...
With regards to C preprocessor issues mentioned, I worked with MCPP(http://sourceforge.net/projects/mcpp/) author to enable MCPP to produce a special form of preprocessed files that makes dealing with macros much simpler. See -K option in MCPP.
Related work on C/C++ refactoring
Posted Jan 30, 2009 7:03 UTC (Fri) by padator (guest, #56235) [Link]
code in arch/, in code protected by ifdef, etc.
Regarding CPP, how does MCPP handle ifdef? How much of the Linux kernel can you analyse? How many lines does MCPP skip to make your parsing job easier?
How do you handle iterator macros? In the case of Coccinelle we sometimes need to express transformations on macros, on iterators and declarers, so we must try not to expand macros and instead represent them directly in the internal AST.
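To make the iterator-macro point concrete, here is a minimal C sketch. The names `struct node`, `for_each_node` and `sum_nodes` are invented for illustration and are not taken from the kernel or from Coccinelle. In the source text the loop is a single macro call followed by a body; once the preprocessor has run, only an ordinary `for` loop remains, so a tool working on expanded code can no longer match or rewrite "uses of the iterator".

```c
#include <stddef.h>

/* Hypothetical singly linked list, for illustration only. */
struct node {
    int value;
    struct node *next;
};

/* Hypothetical iterator macro, similar in spirit to the kernel's
 * list_for_each(): it expands to a `for` header, so the statement
 * that follows it becomes the loop body. */
#define for_each_node(pos, head) \
    for ((pos) = (head); (pos) != NULL; (pos) = (pos)->next)

/* Sum the values stored in the list. As written, the loop is one
 * for_each_node(...) statement; after preprocessing it is a plain
 * `for` loop and the macro name no longer appears anywhere. */
int sum_nodes(struct node *head)
{
    struct node *pos;
    int total = 0;

    for_each_node(pos, head)
        total += pos->value;

    return total;
}
```

A transformation such as "add a check inside every `for_each_node` body" only has something to anchor on if the macro call survives into the tool's internal representation.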
Related work on C/C++ refactoring
Posted Jan 30, 2009 19:24 UTC (Fri) by tglek (guest, #56374) [Link]
So for the rename it'd be a matter of running
./renamer ::alloca ::malloc on the files of interest
For the other change, who knows: it could be a few lines of C++ or a few thousand, depending on the complexity of the attempted change (Pork's changes can be as complex as the user desires, since it has a fully elaborated AST).
Related work on C/C++ refactoring
Posted Jan 30, 2009 19:42 UTC (Fri) by tglek (guest, #56374) [Link]
Instead, Pork allows detection of code where macros interfere with a particular refactoring, so it can produce an error message and inform the user that some manual help is needed for that particular bad macro. Usually that turns out to be a trivial amount of effort (in the worst possible case it took a couple of days to compensate for it).
Related work on C/C++ refactoring
Posted Jan 30, 2009 19:50 UTC (Fri) by padator (guest, #56235) [Link]
protected by #ifdef DEBUG ?
Coccinelle also has a fully elaborated AST internally, and one can write any transformation in OCaml working on this AST, but this is precisely what we want to avoid with Coccinelle. We don't want users to express their transformations on the AST; we want them to express them easily using our SmPL patch syntax.
Also, you didn't really answer my question: how long would it take using Pork to express the second program transformation mentioned in val's article, which renames alloca to malloc and also adds the NULL check on the pointer?
Regular Expressions Plus?
Posted Feb 1, 2009 1:52 UTC (Sun) by ldo (guest, #40946) [Link]
Is this a job for a packrat parser?
Regular Expressions Plus?
Posted Feb 2, 2009 0:30 UTC (Mon) by padator (guest, #56235) [Link]
generic tools using grammars as parameters take time to implement
and are not always useful. You have to know more than just the
syntactic structure of a programming language to make something interesting.
Emacs/Eclipse know about the grammar of many programming languages.
Moreover, Coccinelle is not just a search/replace of syntactic constructs.
You have expression, function, and statement metavariables allowing
you to match and move code, and you can specify constraints about the
context of those entities. As val said:
"can make a particular change only in functions which are assigned to a function pointer in a particular type of array, say, the create member of struct inode_operations." You need a way to specify such a constraint.
I don't really understand how a packrat parser would help with that ...
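As a rough illustration of the kind of context such a constraint refers to, consider the C sketch below. Every name in it (`struct demo_inode_ops`, `demo_create`, `demo_log_name`, `demo_unlink`, `demo_ops`) is hypothetical; the struct is a cut-down stand-in, not the kernel's real struct inode_operations. Two functions share the same signature, but only one is assigned to the `create` member, and only the initializer tells them apart.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for an operations table. */
struct demo_inode_ops {
    int (*create)(const char *name, int mode);
    int (*unlink)(const char *name);
};

/* Installed as the .create callback below. */
int demo_create(const char *name, int mode)
{
    (void)mode;
    return (name != NULL && name[0] != '\0') ? 0 : -1;
}

/* Same signature as demo_create, but never used as a callback.
 * A signature-based text search cannot tell the two apart. */
int demo_log_name(const char *name, int level)
{
    (void)level;
    return name != NULL ? 0 : -1;
}

int demo_unlink(const char *name)
{
    return name != NULL ? 0 : -1;
}

/* The only place that says which function is "the create member". */
const struct demo_inode_ops demo_ops = {
    .create = demo_create,
    .unlink = demo_unlink,
};
```

A change meant only for `create` callbacks has to start from the initializer of `demo_ops`, follow the `.create` field to `demo_create`, and leave `demo_log_name` alone.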
Re: Regular Expressions Plus?
Posted Feb 2, 2009 1:34 UTC (Mon) by ldo (guest, #40946) [Link]
padator wrote:
Moreover, Coccinelle is not just a search/replace of syntactic constructs. You have expression, function, and statement metavariables allowing to match and move code and you can specify constraints about the context of those entities.
Yes, but I'm pretty sure that those constraints are all, in principle, expressible using well-known techniques like two-level grammars and attribute grammars.
I'm not saying it's a simple thing to do, but nevertheless it seems useful to have such prebuilt grammars for different languages, working with a common core of code, rather than writing different code for different languages.
Just a thought.
Semantic patching with Coccinelle
Posted Feb 3, 2009 13:56 UTC (Tue) by robbe (guest, #16131) [Link]
that leaks (remember that memory reserved with alloca() is automatically
freed, malloc() does not have this property).
All the while I was hoping to see a third run of the tool which added the
missing free() call before every return statement. Is that possible with
Coccinelle?
Semantic patching with Coccinelle
Posted Feb 3, 2009 21:27 UTC (Tue) by padator (guest, #56235) [Link]
Yes, Coccinelle can do that too.
Here is an example of a better semantic patch:
@@
expression E;
identifier ptr;
identifier func;
@@
func(...) {
...
- ptr = alloca(E);
+ ptr = malloc(E);
+ if (ptr == NULL)
+ return 1;
...
+ free(ptr);
return ...;
}
Note that the Coccinelle engine will take care to add the call to free() on every control-flow path before a return. Here is an example of a patch produced by spatch on a simple C file:
./spatch -sp_file demos/lwn.cocci demos/lwn.c
--- demos/lwn.c 2009-02-03 15:10:38.000000000 -0600
+++ /tmp/cocci-output-22113-f80295-lwn.c 2009-02-03 15:15:05.000000000 -0600
@@ -3,12 +3,17 @@ void main(int argc, char *argv[])
char *buf;
/* allocate memory */
- buf = alloca(bytes);
+ buf = malloc(bytes);
+ if (buf == NULL)
+ return 1;
- if(argc == 0)
+ if(argc == 0) {
+ free(buf);
return 0;
+ }
+ free(buf);
return 1;
}
Note also how (beautifully) Coccinelle adds the necessary { } after the if to make it a compound statement. Coccinelle also produces the correct indentation each time, even though the LWN HTML page does not show it, presumably because of HTML whitespace mangling.
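For readers who prefer to see the end state rather than the diff, the following is a hand-written sketch of what a function looks like after this kind of conversion. The function name `fill_buffer`, its parameters, and the `memset` call are assumptions made so that the example compiles on its own; this is not spatch output. The return values mirror those in the diff, and every return that follows a successful allocation is preceded by free().

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical function shown in its "after" state: the buffer comes
 * from malloc() instead of alloca(), the allocation is checked, and
 * each later exit path releases the buffer before returning. */
int fill_buffer(int argc, size_t bytes)
{
    char *buf;

    /* allocate memory */
    buf = malloc(bytes);
    if (buf == NULL)
        return 1;          /* nothing was allocated, nothing to free */

    memset(buf, 0, bytes); /* stand-in for real work on the buffer */

    if (argc == 0) {
        free(buf);         /* early-return path */
        return 0;
    }

    free(buf);             /* normal path */
    return 1;
}
```

The early `return 1` needs no free() because it is reached only when the allocation failed, which matches the placement of the added lines in the diff.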
Coccinelle output
Posted Jul 10, 2015 14:10 UTC (Fri) by bou6 (guest, #103486) [Link]
I am currently following the steps in this article:
1) I created the C file
2) I created the Coccinelle script
3) I ran it using
$ spatch -sp_file test.cocci test.c
In the terminal I got the expected result as mentioned in the article
--- test.c
+++ /tmp/cocci-output-17416-b5450d-test.c
@@ -7,7 +7,7 @@ main(int argc, char *argv[])
char *buf;
/* allocate memory */
- buf = alloca(bytes);
+ buf = malloc(bytes);
return 0;
}
However, the C file itself was not changed.
Can anybody tell me where I can find the changes made by the script?
Coccinelle output
Posted Oct 22, 2015 9:22 UTC (Thu) by mfrw (subscriber, #100251) [Link]
$ spatch --sp-file test.cocci --in-place test.c
By default spatch just prints the resulting patch on standard output; with --in-place it also modifies the file itself.