CosmoCoffee

Post by **Luca Amendola** » April 30 2006

Hi all

I hope this is the correct forum. I am experimenting with a web program that computes some bibliometric statistics (citations etc.) for a list of names (e.g. you can input the telephone directory of your department and get a ranking...). It uses the NASA-ADS databases. If you want to experiment with it and give me some feedback, here it is:

http://odisseo.mporzio.astro.it/bibliomatrix

Luca Amendola

Post by **Sarah Bridle** » May 01 2006

Quite fun, thanks!
Some feedback: it currently appears to strip off all but the first letter of the first name, e.g. "Bridle, Sarah" -> "Bridle, S", and seems to miss some papers (is it scanning PRD? Or does it miss papers on which I am Sarah instead of S?).
Sarah

Post by **Alessandro Melchiorri** » May 02 2006

Impressive! I think you should write to the NASA-ADS people: they should implement it on their website.

Post by **Alessandro Melchiorri** » May 02 2006

Hi,
Since we are entering a "citations madness mode", I think there is one interesting thing you could do (in my opinion) :-)
We know that first authorships are important. However, papers often list authors in alphabetical order, so people with names starting with A will statistically have more first authorships than people with names starting with Z.
If, with the power of bibliomatrix, you make a graph of first authorships as a function of the first letter of the name, you should see this correlation. If the correlation is strong, you might consider including a new prefactor that corrects the number of first authorships for the "alphabetical order effect".

Does it make any sense? :-)

Cheers
Alessandro
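
The tally proposed above (first authorships as a function of the first letter of the surname) could be sketched roughly as follows. This is only an illustration in Python: the input format (author lists written "Surname, Initials") and the name "Zeta, Q." are made up, and nothing here reflects how bibliomatrix itself works.

```python
from collections import Counter


def first_author_counts(papers):
    """Count first authorships per initial letter of the first author's surname.

    `papers` is a list of author lists, each in the order printed on the
    paper, with names written "Surname, Initials" (an assumed format).
    """
    counts = Counter()
    for authors in papers:
        if not authors:
            continue  # skip records without an author list
        surname = authors[0].split(",")[0].strip()
        if surname:
            counts[surname[0].upper()] += 1
    return counts


# Made-up author lists, for illustration only.
papers = [
    ["Amendola, L.", "Zeta, Q."],
    ["Bridle, S.", "Roukema, B."],
    ["Amendola, L.", "Melchiorri, A."],
    ["Zeta, Q."],
]

counts = first_author_counts(papers)
for letter in sorted(counts):
    print(letter, counts[letter])
```

Plotting these counts against the letter for a large sample would show whether the alphabetical effect is strong enough to justify a correction factor.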

Post by **Boud Roukema** » May 04 2006

Alessandro Melchiorri wrote: We know that first authorships are important. However, papers often list authors in alphabetical order, so people with names starting with A will statistically have more first authorships than people with names starting with Z.
If, with the power of bibliomatrix, you make a graph of first authorships as a function of the first letter of the name, you should see this correlation. If the correlation is strong, you might consider including a new prefactor that corrects the number of first authorships for the "alphabetical order effect".

Does it make any sense? :-)

Cheers
Alessandro

Well, I don't think people with names like Amendola or Bridle are likely to be unbiased in deciding the relevance of this issue ;).

Post by **Luca Amendola** » May 09 2006

Thanks for the feedback.
To Sarah: my program should retrieve all of your papers listed at NASA-ADS (and not Spires or anything else) with author Bridle, S., regardless of whether they were originally submitted as Sarah, Sara or any other S. name (but click on the checkboxes at the bottom for a complete search). If you find a mismatch, please let me know exactly which paper is missing.
To Alessandro: I got several suggestions for alternative weighting schemes. In fact, I am thinking of leaving the choice of weighting scheme to the user in the future (so we can have fun finding the weights that maximise our performance... MCMC lovers are invited).

Luca

Post by **David Fonseca Mota** » May 11 2006

Hi Luca,

Nice! Thanks for setting up the website.
I just wonder why it is not possible to also include SPIRES references
and citations in your scheme. NASA-ADS misses several citations when one
searches for more theoretical physics articles, which is a pity if one
works at the interface of particle physics and astrophysics... in my case
it misses about 30% of the citations... :)

Isn't it possible to use both databases and be sure a reference/citation
is only counted once in the search?

All the best,
David
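
One way to count each citation only once across two databases is to match citing records on a shared identifier. The sketch below is a rough Python illustration, not a description of bibliomatrix: it assumes each record is a dict with optional "doi" and "arxiv" fields (hypothetical field names) and uses invented identifiers. Records with no identifier at all cannot be matched and are simply kept.

```python
def identifiers(record):
    """Collect the normalised identifiers carried by one citing record."""
    ids = set()
    for field in ("doi", "arxiv"):  # assumed field names
        value = (record.get(field) or "").strip().lower()
        if value:
            ids.add((field, value))
    return ids


def merge_citations(*sources):
    """Merge citing records from several databases, skipping repeats.

    A record counts as a repeat if it shares at least one identifier
    with a record that has already been accepted.
    """
    seen = set()
    merged = []
    for source in sources:
        for record in source:
            ids = identifiers(record)
            if ids and ids & seen:
                continue  # already counted via another database
            seen |= ids
            merged.append(record)
    return merged


# Invented records, for illustration only.
ads = [
    {"doi": "10.1000/example.1", "arxiv": "astro-ph/0000001"},
    {"arxiv": "astro-ph/0000002"},
]
spires = [
    {"arxiv": "astro-ph/0000001"},  # same paper as the first ADS record
    {"arxiv": "hep-th/0000003"},
]

merged = merge_citations(ads, spires)
print(len(merged), "distinct citing papers")
```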

Post by **Sarah Bridle** » May 11 2006

Hi Luca,
Thanks for the tip about the check boxes. I had to check the physics one to get some PRD papers. But then it does find everything.
Sarah

Post by **Boud Roukema** » May 12 2006

Wishlist

The ADS has various options for disambiguation, e.g. to distinguish Bond, J. Richard from Bond, James The Name Is, or combining with institute names with AND, etc. - this would be a useful feature for people not wanting to have to correct this by hand.

features

bibliomatrix requires ascii characters, but since the ADS seems to treat ł and l as equivalent (ł is a Polish letter), this works OK (e.g. a test on Krełowski seems to give the same result either by hand through the ADS or through bibliomatrix).

debugging

Going through the ADS by hand, ticking astro + phys + arXiv, I get 378 citations for all my articles (Roukema B.), but with bibliomatrix I get 421. This is either a bug or a compliment ;). There's not much of an ambiguity problem with my name. There's a retired(?) US senator (I think) with my surname who gets a good Google ranking, but she doesn't seem to publish astronomy articles (luckily).

Luca - I'm just wondering if you could trace the difference here. From neural memory, I think that when I checked bibliomatrix a few days ago, it gave me about 376, so if you changed the code in the last few days, that might explain the difference of an extra 45 citations?

Post by **Luca Amendola** » May 16 2006

Boud Roukema wrote: Going through the ADS by hand, ticking astro + phys + arXiv, I get 378 citations for all my articles (Roukema B.), but with bibliomatrix I get 421. This is either a bug or a compliment ;). There's not much ambiguity problem with my name. There's a retired(?) US senator (I think) with my surname who gets a good google ranking, but she doesn't seem to publish astronomy articles (luckily).

The problem was in fact due to the way ADS dealt with the dates of preprints, assigning them both the date of the preprint and the date of the final publication. This, combined with Bibliomatrix's time-splitting procedure for long lists of names, caused double-counting of some papers in some rare cases. This problem has now been solved by the (incredibly efficient) staff at ADS and, as a consequence, Bibliomatrix is now working properly too. Unfortunately this means that there are no longer citations for free.
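
To illustrate how this kind of double-counting arises, and how a client could guard against it, here is a rough Python sketch. It is not the actual fix (which was made on the ADS side) and not Bibliomatrix's code: it assumes each query result carries a stable paper identifier, and all identifiers and citation counts below are invented.

```python
def count_distinct(windows):
    """Count papers and citations across per-time-window query results.

    `windows` is a list of result lists, one per time window; each result
    is a (paper_id, citations) pair. A paper that shows up in two windows
    (e.g. under its preprint date and its publication date) is kept once.
    """
    citations_by_paper = {}
    for results in windows:
        for paper_id, citations in results:
            citations_by_paper[paper_id] = citations
    return len(citations_by_paper), sum(citations_by_paper.values())


# Invented results: "paperB" falls into both time windows.
windows = [
    [("paperA", 10), ("paperB", 5)],
    [("paperB", 5), ("paperC", 2)],
]

naive_total = sum(c for results in windows for _, c in results)
n_papers, total = count_distinct(windows)
print("naive citation total:", naive_total)
print("distinct papers:", n_papers, "citations:", total)
```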

Post by **Boud Roukema** » May 16 2006

Thanks for debugging so quickly :). I checked and it looks OK now.

Luca Amendola wrote: Unfortunately this means that there are no longer citations for free.

Just to clarify this for people who haven't been following the last few comments: Luca means there are no longer "gratuitous" citations (extra citations due to a software bug ;).

Post by **Anze Slosar** » May 17 2006

In the banana republic of Slovenia there is this concept of "clean citations". A "clean citation" is a citation for which the citing and cited articles have no common authors. The "exclude self-citations" option on ADS doesn't quite achieve this effect. Does anybody know how to get this number easily?
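
Given author lists for the cited paper and for each citing paper, the "clean citation" count defined above amounts to a set-disjointness test. The following Python sketch assumes names written "Surname, Given" and matches on surname plus first initial, which is a simplification that can confuse namesakes; the names and records are invented, and fetching the author lists from a database is left out.

```python
def author_key(name):
    """Reduce "Surname, Given M." to (surname, first initial), lower-cased."""
    surname, _, given = name.partition(",")
    return (surname.strip().lower(), given.strip()[:1].lower())


def is_clean(cited_authors, citing_authors):
    """True if the cited and citing papers have no author in common."""
    cited = {author_key(a) for a in cited_authors}
    citing = {author_key(a) for a in citing_authors}
    return cited.isdisjoint(citing)


def clean_citation_count(cited_authors, citing_papers):
    """Count citing papers that share no author with the cited paper."""
    return sum(is_clean(cited_authors, authors) for authors in citing_papers)


# Invented example: one cited paper and three papers citing it.
cited = ["Slosar, A.", "Doe, J."]
citing_papers = [
    ["Smith, K."],               # clean
    ["Doe, Jane", "Smith, K."],  # shares a co-author
    ["Slosar, Anze"],            # plain self-citation
]

clean = clean_citation_count(cited, citing_papers)
print(clean, "clean citation(s) out of", len(citing_papers))
```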

Also, has anybody else noticed that those greedy idiots at SCI Web of Science not only have a terrible interface but also seem to be missing lots of citations, seemingly because they don't count papers that were cited as astro-ph papers (even if published later)? But maybe I am just using it in the wrong manner. (Again, in the BROS, SCI is the only thing that counts.)

Post by **Boud Roukema** » June 22 2006

hi luca,

non-ascii letters in names

Although the ADS interactive interface correctly deals with names with non-ascii characters (if you enter them as ascii), your script seems to ignore most of the names which are in non-ascii characters.

You can test this with names like

Katarzyński
Goździewski

or in ascii:

Katarzynski
Gozdziewski

I think it's OK to keep the input in ascii, but it's a problem that the result ignores the non-ascii entries.

The simplest solution would probably be, after retrieving the data from the ADS, to replace all letters like ń, ź, ł, ą, ę, ś, ć, ó, ż, and also non-Polish latin-2 and latin-1 letters, by their ascii equivalents.
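
As an illustration of that replacement step, here is a minimal sketch in Python (chosen only for the example; the language of the actual script is not stated in this thread). It strips combining accents after Unicode decomposition and uses a small explicit table, which is an assumption and far from complete, for letters such as ł that have no decomposition.

```python
import unicodedata

# Letters without a canonical decomposition need an explicit mapping.
# This table is illustrative, not exhaustive.
EXTRA = str.maketrans({"ł": "l", "Ł": "L", "ø": "o", "Ø": "O", "ß": "ss"})


def to_ascii(text):
    """Return `text` with accented Latin letters replaced by plain ascii."""
    text = text.translate(EXTRA)
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Drop anything that still is not ascii rather than failing.
    return stripped.encode("ascii", "ignore").decode("ascii")


for name in ["Katarzyński", "Goździewski", "Krełowski"]:
    print(name, "->", to_ascii(name))
```

Applying the same function to both the user's input and the names returned by the database keeps the two sides comparable.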

Here's a uuencoded sed script to convert latin-2 polish letters to ascii:

Code:

begin 744 iso2_drop.sh
M(R$O8FEN+W-H"B,C(&QI8V5N8V4Z($=03"!O<B!P=6)L:6,@9&]M86EN(&%S
M('EO=2!L:6ME("!"+B!2;W5K96UA(#(P,#8*(R,@<V-R:7!T('1O(')E<&QA
M8V4@25-/+3@X-3DM,B`H:7-O+6QA=&EN+3(I(&-H87)A8W1E<G,@8GD@<&QA
M:6X@87-C:6D*(R,@92YG+B`@L2!B96-O;65S(&$*(R,@"B,C('5S86=E.@HC
M(R!I<V\R7V1R;W`N<V@@:6YP=71F:6QE(`H*96-H;R`G)24@5T%23DE.1SH@
M5&AI<R!I<R!A(%9%4ED@<VEM<&QE('-C<FEP="X@270@:7,@;F]T('5S97(@
M9G)I96YD;'DN)PIE8VAO("<E)2!U<V%G93HG"F5C:&\@)R4E(&ES;S)S;&%S
M:"YS:"!I;G!U=&9I;&4@)PH*<V5D("UE("=S?+%\87PG9R`D,2!\(%P*<V5D
M("UE("=S?.9\8WPG9R`@?"!<"G-E9"`M92`G<WSJ?&5\)V<@('P@7`IS960@
M+64@)W-\LWQL?"=G("!\(%P*<V5D("UE("=S?/%\;GPG9R`@?"!<"G-E9"`M
M92`G<WSS?&]\)V<@('P@7`IS960@+64@)W-\MGQS?"=G("!\(%P*<V5D("UE
M("=S?+]\>GPG9R`@?"!<"G-E9"`M92`G<WR\?'A\)V<@('P@7`IS960@+64@
M)W-\H7Q!?"=G("!\(%P*<V5D("UE("=S?,9\0WPG9R`@?"!<"G-E9"`M92`G
M<WS*?$5\)V<@('P@7`IS960@+64@)W-\HWQ,?"=G("!\(%P*<V5D("UE("=S
M?-%\3GPG9R`@?"!<"G-E9"`M92`G<WS3?$]\)V<@('P@7`IS960@+64@)W-\
MIGQ3?"=G("!\(%P*<V5D("UE("=S?*]\6GPG9R`@?"!<"G-E9"`M92`G<WRL
0?%A\)V<@(`H*97AI="`Q"@``
`
end

iconv can also help, e.g.

Code:

iconv -f utf8 -t latin2

except that iconv crashes whenever it finds a letter that it doesn't recognise. Maybe there's an "ignore-errors" option - it's GPL so can be used freely provided your own code is GPL-compatible.

h-statistic

Jorge E. Hirsch's statistic looks pretty cool - http://arxiv.org/abs/physics/0508025 - any chance of also calculating scientific age (years since first published paper) = n and then calculating

m = h/n ?

Roughly speaking, m = 1.0 indicates a good scientist and m = 2.0 an excellent one.
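
For concreteness, h is the largest number such that h papers have at least h citations each, and m divides it by the scientific age n. A minimal Python sketch, with invented citation counts and years (the function names are made up for this example):

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def m_quotient(citations, first_year, current_year):
    """m = h / n, where n is the number of years since the first paper."""
    age = current_year - first_year
    if age <= 0:
        raise ValueError("scientific age must be positive")
    return h_index(citations) / age


# Invented example: six papers, first one published in 2000.
citations = [25, 8, 5, 3, 3, 1]
print("h =", h_index(citations))
print("m =", m_quotient(citations, first_year=2000, current_year=2006))
```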

Post by **Luca Amendola** » June 23 2006

Boud, thanks for the feedback.
I will soon add the Polish characters to my script's list of non-ascii characters so that they are consistently replaced with the corresponding ascii characters both in input and in output (of course this is what the script was meant to do, but some characters escaped my attention).
As for your m=h/age, you can in fact compute it easily from the results of Bibliomatrix: for a single author the output includes h and the number of publications ordered by year. You can also compute m'=h/authors/age, which I am sure is a measure of something, although I don't know what.
I could also do it for a list of authors, maybe in the future...

Post by **Boud Roukema** » March 01 2009

hi Luca,

A whole lot of HESS papers came out last year with about 140 or so authors. Some people at my institute are a bit puzzled that their 1/140-th or so contribution to these papers does not get recognised by the present bibliomatrix script. Any chance that the server can cope with doubling the limit from 80 to 160 authors to search among when deciding co-authorship?

cheers
boud
