Gini coefficient in economics and the ROC curve in machine learning

The Receiver Operating Characteristic (ROC) curve is used in data mining and machine learning. From the area under the ROC curve you can calculate the Gini coefficient. I have made an Excel template.

[Screenshot of the Excel template]

An example to show how it is calculated.

If AUC is the area under the curve, then:

G = 2 × AUC − 1
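For concreteness, here is a minimal sketch of that calculation in Python, assuming scikit-learn is available; the labels and scores are made up purely for illustration:

```python
# A minimal sketch, assuming scikit-learn; labels/scores are hypothetical.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 1, 0, 1, 1, 0]                     # hypothetical binary labels
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.3]   # hypothetical model scores

auc = roc_auc_score(y_true, y_score)   # area under the ROC curve
gini = 2 * auc - 1                     # G = 2*AUC - 1
print(f"AUC = {auc:.3f}, Gini = {gini:.3f}")
```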

The Gini coefficient is one of the most watched indicators in economics these days.
I wrote an article comparing different countries of the world with the available data:

https://sandyclassic.wordpress.com/2013/02/06/watch-gini-coefficient-only-show-income-distribution-not-lowhigh-income-distribution/

The Gini coefficient / AUC has some component of noise, which has called into question whether better measures used in machine learning, such as DeltaP, informedness, or the Matthews correlation coefficient, would be more suitable; each one is suited to its own field. Informedness = 1 indicates perfect performance, while −1 represents perverse (negative) performance. In economics, a Gini coefficient of zero shows perfect equality.
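A minimal sketch of how informedness (Youden's J) and the Matthews correlation coefficient are computed from a confusion matrix, assuming scikit-learn; the labels and predictions below are hypothetical:

```python
# A minimal sketch, assuming scikit-learn; labels/predictions are hypothetical.
from sklearn.metrics import confusion_matrix, matthews_corrcoef

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Informedness (Youden's J) = sensitivity + specificity - 1
informedness = tp / (tp + fn) + tn / (tn + fp) - 1
mcc = matthews_corrcoef(y_true, y_pred)

print(f"informedness = {informedness:.3f}, MCC = {mcc:.3f}")
```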

So the parameters keep improving; there is no end result, and there cannot be: as our understanding increases we arrive at better measures, and change is constant. What is truth today was mystery or magic for the ancients and will be a kind of half-truth for the future. But the subjects are interconnected; the branching of knowledge areas has been going on for the last 250 years. Earlier there was no engineering; everything came under philosophy in the time of Socrates. Socrates rightly said that you cannot say anything with absolute certainty, but you can make an informed decision. That is what informedness quantifies: how informed your decisions are.

See a case from Biometrics:

[Image: submitAssign1]

Big Data, Cloud, Business Intelligence and Analytics

There is a huge amount of data being generated as Big Data, characterized by the 3Vs (Variety, Volume, Velocity): different varieties (audio, video, text), huge volumes (large video feeds, audio feeds, etc.), and velocity (rapid change in data, with the new delta data arriving each day being larger than the existing data). Facebook, for example, keeps the latest feeds and posts on a first layer of storage, Memcached (memory-caching) servers, so that bandwidth is not clogged and posts can be fetched quickly and shown in real time, while the old archive data is stored not on the front storage servers but on a second layer of servers.
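As a rough illustration of that first-layer caching idea, here is a minimal cache-aside sketch, assuming a local memcached instance and the pymemcache library; fetch_post_from_archive() is a hypothetical stand-in for the slower archive layer:

```python
# A minimal cache-aside sketch, assuming a local memcached instance and the
# pymemcache library; fetch_post_from_archive() is hypothetical.
from pymemcache.client.base import Client

cache = Client(("localhost", 11211))

def fetch_post_from_archive(post_id):
    # Hypothetical slow lookup against second-layer (archive) storage.
    return f"post body for {post_id}".encode()

def get_post(post_id):
    key = f"post:{post_id}"
    body = cache.get(key)                        # try the fast first-layer cache
    if body is None:
        body = fetch_post_from_archive(post_id)  # fall back to archive storage
        cache.set(key, body, expire=300)         # keep hot data cached briefly
    return body

print(get_post(42))
```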
Big Data with these 3V characteristics is likewise stored in huge Storage Area Networks (SANs) of cloud storage, which can be controlled by IaaS (Infrastructure as a Service) software such as Eucalyptus to create a public or private cloud. PaaS (Platform as a Service) provides platform APIs to control, package, and integrate with other components using code, while SaaS (Software as a Service) provides seamless integration.
Big Data stored in the cloud can then be analyzed on Hadoop clusters using business intelligence and analytics software.
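As an illustration of that parallel processing, here is a minimal MapReduce-style word count, assuming the mrjob library (which can run locally or be submitted to a Hadoop cluster); the input file is whatever you pass on the command line:

```python
# A minimal MapReduce word count sketch, assuming the mrjob library.
from mrjob.job import MRJob

class MRWordCount(MRJob):
    def mapper(self, _, line):
        # Emit (word, 1) for every word in the input line.
        for word in line.split():
            yield word.lower(), 1

    def reducer(self, word, counts):
        # Sum the counts for each word across all mappers.
        yield word, sum(counts)

if __name__ == "__main__":
    # Run locally: python word_count.py input.txt
    MRWordCount.run()
```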
Data warehouse (DW): this can live in an RDBMS database or in Hadoop Hive. Using ETL tools (such as Informatica, DataStage, or SSIS), data can be fetched from operational systems into the data warehouse: into Hive for unstructured data or into an RDBMS for more structured data.
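A minimal extract-transform-load sketch, assuming pandas and SQLAlchemy; the file name, table name, and SQLite target are hypothetical stand-ins for an operational extract and an RDBMS warehouse:

```python
# A minimal ETL sketch, assuming pandas and SQLAlchemy; names are hypothetical.
import pandas as pd
from sqlalchemy import create_engine

# Extract: read a daily export from an operational system.
orders = pd.read_csv("operational_orders.csv", parse_dates=["order_date"])

# Transform: light cleanup and a derived column.
orders["order_total"] = orders["quantity"] * orders["unit_price"]
orders = orders.dropna(subset=["customer_id"])

# Load: append into a warehouse table (SQLite here, any RDBMS in practice).
engine = create_engine("sqlite:///warehouse.db")
orders.to_sql("fact_orders", engine, if_exists="append", index=False)
```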

BI over a cloud DW: BI tools can create very user-friendly, intuitive reports by giving users access to an SQL-generating software layer, called the semantic layer, which generates SQL queries on the fly depending on what the user drags and drops. Likewise, NoSQL and Hive help in analyzing unstructured data faster, such as social media data: long text, sentences, video feeds. At the same time, thanks to the parallelism of Hadoop clusters and the use of the MapReduce algorithm, calculations and processing can be a lot quicker, which is fuelling the entry of Hadoop and the cloud here.
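The semantic-layer idea can be sketched very simply: generate SQL on the fly from whatever dimensions and measures the user drags and drops. The table and column names below are hypothetical:

```python
# A minimal semantic-layer sketch: build a SQL query from user selections.
def build_query(table, dimensions, measures):
    select_parts = list(dimensions) + [f"SUM({m}) AS total_{m}" for m in measures]
    sql = (
        f"SELECT {', '.join(select_parts)} "
        f"FROM {table} "
        f"GROUP BY {', '.join(dimensions)}"
    )
    return sql

# The user dragged "region" and "product" as dimensions and "revenue" as a measure.
print(build_query("fact_orders", ["region", "product"], ["revenue"]))
# SELECT region, product, SUM(revenue) AS total_revenue FROM fact_orders GROUP BY region, product
```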
Analytics and data mining are an extension of BI. Social media data is mostly unstructured and hence cannot be analysed without first categorizing and quantifying it, and then running other algorithms on it for analysis. Hence analytics is the only way to get meaning from the terabytes of data being added to social media sites each day.
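A minimal sketch of categorizing short unstructured text before further analysis, assuming scikit-learn; the tiny labelled sample is hypothetical:

```python
# A minimal text categorization sketch, assuming scikit-learn; data is hypothetical.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

posts = [
    "great phone, love the camera",
    "battery died after a week, terrible",
    "amazing service, very happy",
    "worst purchase ever, do not buy",
]
labels = ["positive", "negative", "positive", "negative"]

# Quantify the text as word counts, then fit a simple classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(posts, labels)

print(model.predict(["happy with the camera", "terrible battery"]))
```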

Even something as simple as a test of hypothesis cannot be done on this vast unstructured data without using analytics. Analytics differentiates itself from data warehousing in that it requires data of much lower granularity, i.e. base/raw data, which is where traditional warehouses differ. Some provide a workaround by having a staging data warehouse, but data storage there still has limits, and it is only possible for structured data. So a traditional data warehouse solution does not fit the new 3V data analysis; here Hadoop takes its position, with Hive, HBase, and NoSQL for storage, and Mahout for mining.
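As a small illustration of why raw, low-granularity data matters, here is a hypothesis-test sketch, assuming SciPy; the per-post engagement samples are hypothetical:

```python
# A minimal hypothesis test sketch, assuming SciPy; the samples are hypothetical
# raw (per-post) data rather than rolled-up warehouse aggregates.
from scipy import stats

campaign_a = [12, 15, 9, 20, 14, 11, 17, 13]
campaign_b = [22, 18, 25, 19, 21, 24, 20, 23]

# Two-sample t-test: did campaign B really get higher engagement per post?
t_stat, p_value = stats.ttest_ind(campaign_a, campaign_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```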

How to maintain privacy with surveillance?

In recent months, questions have been raised on the topic of surveillance and the privacy of individuals. Surveillance is very important for the safety of society, even if to some extent it takes privacy away, because it gives safety in return.
But is there a way to give both at the same time?

What can be done: all three forms of input (sound, video, data) can be fed into the cloud on large Hadoop clusters.
Now if we are able to tag all inputs, or convert sound to text and then auto-tag the whole transcript, the data so collected can be analysed using Big Data analysis technology for suspicious keyword patterns, or for networks of words created by running social network analysis, market basket analysis, or Markov chain algorithms, which can decipher, arrange, and categorize the actors. Using this analysis we can go directly to the suspected traffic rather than scanning through the whole traffic. But there are problems (a sketch of the keyword-network idea follows the problems below).
Problem 1: there are many languages in the world.
Solution: translation software exists for each of those.

Problem 2: how do we tag voice traffic?
Solution: plenty of speech-to-text conversion software is available that can do this work quickly, and the tagged transcripts are then easy to search through.
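Once the traffic is tagged as text, the keyword-network idea mentioned above could look roughly like this, assuming networkx; the transcripts and watch-list keywords are hypothetical:

```python
# A minimal keyword co-occurrence network sketch, assuming networkx;
# transcripts and watch-list keywords are hypothetical.
from itertools import combinations
import networkx as nx

watchlist = {"transfer", "package", "meeting", "warehouse"}

transcripts = [
    "the package transfer happens after the meeting",
    "meeting moved to the warehouse on friday",
    "weather is nice, call me later",
]

graph = nx.Graph()
for text in transcripts:
    hits = sorted(watchlist & set(text.split()))
    # Connect keywords that co-occur in the same tagged transcript.
    for a, b in combinations(hits, 2):
        weight = graph[a][b]["weight"] + 1 if graph.has_edge(a, b) else 1
        graph.add_edge(a, b, weight=weight)

# Only transcripts that touch this network need a closer (manual) look.
print(graph.edges(data=True))
```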

Even the capability of predictive analytics can be exploited.

With this approach no one has direct access to the data, yet analysis is still possible in a better way. But surely there will be some constraints where only manual intervention will work, and those cannot be discounted.